bnew

Veteran · Joined Nov 1, 2015 · Messages: 55,228 · Reputation: 8,195 · Daps: 156,173

1/1
🚨CVPR 2024 Best Paper Runners-Up Alert 🚨

➡️Paper Title: EventPS: Real-Time Photometric Stereo Using an Event Camera

🌟Few pointers from the paper

🎯Photometric stereo is a well-established technique for estimating the surface normals of an object. However, the need to capture multiple high-dynamic-range images under different illumination conditions limits its speed and rules out most real-time applications.

🎯In this paper, the authors introduce “EventPS”, a novel approach to real-time photometric stereo using an event camera. Capitalizing on the exceptional temporal resolution, dynamic range, and low bandwidth of event cameras, EventPS estimates surface normals solely from radiance changes, significantly enhancing data efficiency.

🎯EventPS seamlessly integrates with both optimization-based and deep-learning-based photometric stereo techniques to offer a robust solution for non-Lambertian surfaces. Extensive experiments validate the effectiveness and efficiency of EventPS compared to frame-based counterparts.

🎯Their algorithm runs at over 30 fps in real-world scenarios, unleashing the potential of EventPS in time-sensitive and high-speed downstream applications.

🏢Organization: National Key Laboratory for Multimedia Information Processing, School of Computer Science, @PKU1898 , National Engineering Research Center of Visual Technology, School of Computer Science, @PKU1898 , School of Mechanical Engineering, @sjtu1896 , Graduate School of Information Science and Technology, @UTokyo_News_en , @jouhouken

🧙Paper Authors: Bohan Yu , Jieji Ren, Jin Han, Feishi Wang, Jinxiu Liang, Boxin Shi

1️⃣Read the Full Paper here: https://www.ybh1998.space/wp-conten..._Photometric_Stereo_Using_an_Event_Camera.pdf

2️⃣Project Page: EventPS: Real-Time Photometric Stereo Using an Event Camera – Bohan Yu's Homepage

3️⃣Code & Data: EventPS

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Music Unlimited from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024





A.I.-generated explanation:
(llama 3 sonar 32k large chat)


**Big News in Computer Vision Research**

A research paper called "EventPS" has been recognized as one of the best papers at a top conference in computer vision (CVPR 2024). Here's what it's about:

**What's the problem?**

Imagine you want to take a picture of an object and figure out its shape and orientation. One way to do this is called "photometric stereo", which involves taking multiple pictures of the object under different lighting conditions. However, this method is slow and can't be used in real-time applications.

**What's the solution?**

The researchers introduced a new approach called "EventPS", which uses a special type of camera called an "event camera". This camera can capture changes in light very quickly and efficiently, which allows it to estimate the shape and orientation of an object in real-time.

**How does it work?**

EventPS uses the event camera to detect changes in light and uses this information to figure out the shape and orientation of the object. It can work with different types of algorithms, including ones that use machine learning, to get accurate results. The researchers tested their approach and found that it works well and is much faster than other methods.
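
To make the baseline concrete, here is a minimal sketch of classic frame-based photometric stereo (the approach EventPS speeds up), assuming a Lambertian surface, at least three known light directions, and one image per light; the function and file names are illustrative, not from the paper's code.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Classic least-squares photometric stereo for a Lambertian surface.

    Model: I = albedo * (n . l), so stacking k >= 3 lights gives L @ G = I
    with G = albedo * n, solvable per pixel by least squares.

    images: (k, h, w) grayscale images, one per light direction
    light_dirs: (k, 3) unit light direction vectors
    returns: per-pixel unit normals (h, w, 3) and albedo (h, w)
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                               # (k, h*w) intensities
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)      # (3, h*w) scaled normals
    G = G.T.reshape(h, w, 3)
    albedo = np.linalg.norm(G, axis=-1)
    normals = G / np.clip(albedo[..., None], 1e-8, None)
    return normals, albedo

# Toy usage with random data, just to show the shapes involved.
lights = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
lights /= np.linalg.norm(lights, axis=1, keepdims=True)
imgs = np.random.rand(3, 4, 4)
normals, albedo = photometric_stereo(imgs, lights)
print(normals.shape, albedo.shape)                          # (4, 4, 3) (4, 4)
```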

**What are the benefits?**

EventPS can run at over 30 frames per second, which means it can be used in applications that require fast processing, such as robotics or self-driving cars. This technology has the potential to be used in many different areas, including computer vision, robotics, and more.

**Who did the research?**

The research was done by a team of scientists from several universities and research institutions, including Peking University, Shanghai Jiao Tong University, and the University of Tokyo.
 

bnew


1/1
🚨CVPR 2024 Best Student Paper Alert 🚨

➡️Paper Title: BIOCLIP: A Vision Foundation Model for the Tree of Life

🌟Few pointers from the paper

🎯Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation.

🎯Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A general-purpose vision model for organismal biology questions on images is therefore a timely need.

⚛️To approach this, the authors curated and released “TREEOFLIFE-10M”, the largest and most diverse ML-ready dataset of biology images.

⚛️They then developed “BIOCLIP”, a foundation model for the tree of life, leveraging the unique properties of biology captured by TREEOFLIFE-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge.

🎯They rigorously benchmark their approach on diverse fine-grained biology classification tasks and find that BIOCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute).

🎯Intrinsic evaluation reveals that BIOCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability.

🏢Organization: @OhioState , @Microsoft Research, @UCIrvine , @rpi

🧙Paper Authors: @samstevens6860 , Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, @luke_ch_song , David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, @weilunchao , @ysu_nlp

1️⃣Read the Full Paper here: [2311.18803] BioCLIP: A Vision Foundation Model for the Tree of Life

2️⃣Project Page: BioCLIP: A Vision Foundation Model for the Tree of Life

3️⃣Code: GitHub - Imageomics/bioclip: This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].

4️⃣Models: imageomics/bioclip · Hugging Face
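
As a quick taste of how the released model might be used for zero-shot species classification, here is a hedged sketch built on open_clip's Hugging Face Hub loading; the "hf-hub:imageomics/bioclip" identifier, label strings, and image path are assumptions based on the links above, so check the BioCLIP repo for the officially supported loading code.

```python
# Hedged sketch of zero-shot classification with BioCLIP via open_clip.
# The hub id, labels, and image path below are illustrative assumptions.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip")
tokenizer = open_clip.get_tokenizer("hf-hub:imageomics/bioclip")
model.eval()

labels = ["a photo of Danaus plexippus", "a photo of Papilio machaon"]
image = preprocess(Image.open("butterfly.jpg")).unsqueeze(0)   # hypothetical local image
text = tokenizer(labels)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```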

🥳Heartfelt congratulations to all the talented authors! 🥳

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Alex_Kizenkov from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


 

bnew










1/11
🚨 Chinese AI company SenseTime just revealed SenseNova 5.5, an AI model that claims to beat GPT-4o across key metrics

Plus, big developments from Apple, YouTube, KLING, Neuralink, and Google DeepMind.

Here's everything going on in AI right now:

2/11
At the World Artificial Intelligence Conference (WAIC) in Shanghai this weekend, SenseTime unveiled SenseNova 5.5.

The company claims the model outperforms GPT-4o in 5 out of 8 key metrics.

While I'd take it with a grain of salt, China's AI startups are showing major progress

3/11
SenseTime also revealed SenseNova 5o, a real-time multimodal model capable of processing audio, text, image, and video.

Here's a video of a live demonstration of SenseNova 5o in action (it's incredibly similar to the GPT-4o demo)

4/11
Per Bloomberg, Apple’s Siri upgrades are now expected to come in Spring 2025.

The voice assistant is not likely to be part of the Apple Intelligence rollout this Fall.

5/11
YouTube introduced a new AI-powered eraser tool.

It allows creators to quickly remove copyrighted music from videos while preserving other audio elements.

Between this and all the other AI tools YouTube is building, it seems like they're all-in.

6/11
Kuaishou’s KLING AI video generator model is now available as a web app for generations up to 10 seconds.

(Though a Chinese phone number is still required for access)

7/11
Noland Arbaugh, Neuralink's first human patient, hinted at the potential to use the brain implant to control a Tesla Optimus humanoid robot within the next year.

Insane thing to think about.

8/11
Google DeepMind researchers published new research introducing JEST.

It's a new method that accelerates AI model training while significantly reducing computing requirements.

Faster training capabilities = the acceleration of advanced model releases is just getting started

9/11
If you want to keep up with all the AI news, tools, and research, join 600,000+ subscribers reading my free newsletter.

Plus, you'll get a bite-sized AI tutorial with every email.

Get free access: The Rundown AI

10/11
That's it for today's news in the world of AI.

I share what's happening in AI every day, follow me @rowancheung to stay up to speed.

If you found this helpful, spare me a like/retweet on the first tweet of this thread to support my content

11/11
How is SenseTime doing it without NVDA chips? 😂


 

bnew


1/4
I am mind blown by this new technology!
AI is now embodied.
And we are open-sourcing it all.

Listen to @HaixuanT casually chatting with this cute robot at the @linuxfoundation:

🙂 What's your name?
> I am Reachy, a robot from @pollenrobotics, I have two arms.

😀 What do you see?
> A large venue with many people sitting at tables.

😃 Can you give me a high five?
> Yes of course!

🤯🤯🤯🤯

2/4
Bro, that’s barely Turing test level. We’ve only begun.

3/4
where can i get the video?

4/4
Awesome, congratulations to your amazing team!
And thank you for choosing Reachy for this project :smile:


 

bnew


1/2
[CL] T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
B Deiseroth, M Brack, P Schramowski, K Kersting, S Weinbach [IPAI & Technical University Darmstadt] (2024)
[2406.19223] T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings

- Tokenizers require dedicated training which adds computational overhead. Design choices and errors during tokenizer training negatively impact the downstream model.

- Tokenizer vocabularies are heavily optimized for the reference corpus, leading to poor performance on underrepresented languages.

- Up to 34% of tokens in vocabularies are near duplicates (differing only in capitalization or whitespace), with limited additional information but still independently trained embeddings.

- The resulting vocabulary expansion leads to large embedding and head layers, requiring advanced model parallelism techniques.

- Tokenization principles have remained largely unchanged despite recent advances.

- T-FREE eliminates the need for tokenizer training by directly embedding words as sparse activation patterns over hashed character trigrams (see the sketch after this list).

- T-FREE inherently exploits morphological similarities without independently training each word variant.

- Compressing similar words in T-FREE reduces embedding layer size by 87.5% and encoding length by 56% without performance loss.

- T-FREE shows better cross-lingual transfer as it does not depend on a biased reference corpus.
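
As a rough illustration of the hashed-trigram idea mentioned above (a minimal sketch under assumed hyperparameters, not the authors' implementation; the hash choice and table size are arbitrary):

```python
import hashlib

def trigram_indices(word: str, table_size: int = 8192) -> set:
    """Map a word to a sparse set of embedding-table rows via hashed character trigrams.

    The word embedding would then be the sum (or mean) of those rows, so no
    trained tokenizer vocabulary is needed.
    """
    padded = f"_{word.lower()}_"                              # mark word boundaries
    trigrams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    rows = set()
    for tg in trigrams:
        digest = hashlib.md5(tg.encode("utf-8")).digest()
        rows.add(int.from_bytes(digest[:4], "little") % table_size)
    return rows

# Morphologically similar words share most of their active rows, which is the
# property that lets similar word variants share parameters.
print(trigram_indices("token") & trigram_indices("tokens"))
```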

2/2
Dark mode for this paper for night readers 🌙 T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings


 

bnew


1/1
[CL] How Does Quantization Affect Multilingual LLMs?
K Marchisio, S Dash, H Chen, D Aumiller… [Cohere] (2024)
[2407.03211] How Does Quantization Affect Multilingual LLMs?

- Quantization is commonly used to deploy large LMs by reducing compute, but little work examines the impact on multilingual models. This paper conducts the first thorough analysis of quantized multilingual LMs.

- They evaluate 4 multilingual LLMs from 8B to 103B parameters under various quantization techniques using automatic benchmarks, LLM-as-a-Judge, and human evaluation across 20+ languages.

- Key findings are: 1) Damage from quantization is much worse than automatic metrics indicate - human evaluators notice significant drops not reflected in benchmarks. 2) Quantization affects languages differently, with non-Latin scripts harmed most. 3) Math reasoning and challenging tasks degrade fastest. But 4) Sometimes quantization helps model performance.
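
For readers unfamiliar with the setup, here is a minimal sketch of the simplest kind of post-training weight quantization being discussed (symmetric per-tensor int8); the specific schemes evaluated in the paper differ, so treat this only as an illustration of where rounding error comes from.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```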


 

bnew


1/1
[CL] RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
J Dang, A Ahmadian, K Marchisio… [Cohere] (2024)
[2407.02552] RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

- Preference optimization leads to large gains for LLMs, but most work has focused on English. Expanding to more languages is an urgent challenge.

- Multilingual modeling faces data scarcity and quality issues. High-quality multilingual preference data is especially lacking.

- They introduce a novel method to generate diverse, high-quality multilingual feedback data.

- Preference optimization exhibits cross-lingual transfer even with only English data. Adding more languages increases transfer.

- Increasing multilingual data consistently improves multilingual performance over English-only data.

- Online preference optimization with RLOO outperforms offline DPO and enables better cross-lingual transfer (a minimal DPO sketch follows below for reference).

- Their optimized model outperforms Aya 23 8B and other widely used models like Gemma, Llama, Mistral.

- They expand alignment techniques to 23 languages covering half the world's population.
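
For reference on the offline baseline mentioned above, here is a minimal sketch of the generic DPO objective on sequence log-probabilities (not the paper's training code; the β value and toy numbers are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Direct Preference Optimization loss.

    Encourages the policy to rank the chosen response above the rejected one,
    measured relative to a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy per-example sequence log-probabilities standing in for model outputs.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
print(loss.item())
```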


 

bnew


1/2
[LG] On the Anatomy of Attention
N Khatri, T Laakkonen, J Liu, V Wang-Maścianica [Quantinuum] (2024)
[2407.02423] On the Anatomy of Attention

- The paper introduces a category-theoretic diagrammatic formalism to systematically relate and reason about machine learning models.

- The diagrams present architectures intuitively but without loss of essential detail. Natural relationships between models are captured by graphical transformations.

- Important differences and similarities between models can be identified at a glance.

- In this paper, the authors focus on attention mechanisms - translating folklore into mathematical derivations and constructing a taxonomy of attention variants.

- As an example, they identify recurring anatomical components of attention and exhaustively recombine them to explore a space of variations on the attention mechanism.
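
For orientation, the standard scaled dot-product attention that these variants build on (a textbook NumPy sketch, not the paper's categorical formalism):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # (n_q, d_v) attended values

Q, K, V = np.random.randn(2, 8), np.random.randn(5, 8), np.random.randn(5, 16)
print(scaled_dot_product_attention(Q, K, V).shape)       # (2, 16)
```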

2/2
Dark mode for this paper 🌙 On the Anatomy of Attention


 

bnew


1/1
🚨 CVPR 2024 Paper Alert 🚨

➡️Paper Title: LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

🌟Few pointers from the paper

🎯Instance shape reconstruction from a 3D scene involves recovering the full geometries of multiple objects at the semantic instance level. Many methods leverage data-driven learning due to the intricacies of scene complexity and significant indoor occlusions.

🎯 Training these methods often requires a large-scale, high-quality dataset of shape annotations aligned and paired with real-world scans. Existing datasets are either synthetic or misaligned, restricting the performance of data-driven methods on real data.

⚛️ To this end, the authors introduce “LASA”, a Large-scale Aligned Shape Annotation Dataset comprising 10,412 high-quality CAD annotations aligned with 920 real-world scene scans from ARKitScenes, created manually by professional artists.

🎯On top of this, the authors also propose a novel Diffusion-based Cross-Modal Shape Reconstruction (DisCo) method. It is empowered by a hybrid feature aggregation design to fuse multi-modal inputs and recover high-fidelity object geometries.

🎯In addition, the authors present an Occupancy-Guided 3D Object Detection (OccGOD) method and demonstrate that their shape annotations provide scene occupancy clues that can further improve 3D object detection.

🎯Supported by LASA, extensive experiments show that their methods achieve state-of-the-art performance in both instance-level scene reconstruction and 3D object detection tasks.

🏢Organization: FNii, @cuhksz , SSE, @cuhksz , @TU_Muenchen

🧙Paper Authors: Haolin Liu, @ychngji6 , @yinyu_nie , Yingfan He, @XiaogHan

1️⃣Read the Full Paper here: [2312.12418] LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

2️⃣Project Page: LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

3️⃣Code: GitHub - GAP-LAB-CUHK-SZ/LASA: CVPR2024 | LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Heptatonic Tamilzha from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


 

bnew


Open Sora can create AI-generated videos on an RTX 3090 GPU, but memory capacity limits it to 4-second 240p clips​

News

By Jowi Morales

published June 27, 2024

It takes about a minute for the 24fps output.


GeForce RTX 3090

(Image credit: Shutterstock)

Backprop, a GPU cloud provider for AI processes, recently showcased AI-generated video using an environment based on Open-Sora V1.2. The company showed four examples using different prompts, and the results were generally of average quality. But the hardware requirements for even these relatively tame samples are quite high.

According to the company blog post, “On a 3090, you can generate videos up to 240p and 4s. Higher values than that use more memory than the card has. Generation takes around 30s for a 2s video and around 60s for a 4s video.” That's a decent amount of computational power for only 424x240 output — a four-second video has just under ten million pixels in total.
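
The pixel figure checks out; a quick back-of-the-envelope verification (assuming the stated 424x240 resolution at 24 fps):

```python
width, height = 424, 240
fps, seconds = 24, 4
total_pixels = width * height * fps * seconds
print(f"{total_pixels:,}")   # 9,768,960 -- just under ten million
```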

AI-generated museum video

(Image credit: Backprop)

The Nvidia RTX 3090 was once the most powerful GPU available, and it's still pretty potent today. It comes with 24GB of GDDR6X memory, a critical element for many complex AI workloads. While that's plenty of VRAM for any modern games, matching the newer RTX 4090 on capacity, it's still a limiting factor for Open Sora video generation. The only way to get more VRAM in a graphics card at present would be to move to professional or data center hardware.

A single Nvidia H100 PCIe GPU can have up to 94GB of HBM2e memory — and up to 141GB for the newer SXM-only Nvidia H200 with HBM3e. Aside from its massive capacity, which is more than triple that of top consumer GPUs, these data center GPUs also have much higher memory bandwidths, with next-gen HBM3s from Micron planned to achieve 2 TB/s. The H100 PCIe adapters currently cost in the ballpark of $30,000 retail, though licensed distributors might have them for slightly less money.

You could pick up an Nvidia RTX 6000 Ada Generation with 48GB of memory for a far more reasonable $6,800 direct from Nvidia. That's double the VRAM of any consumer GPU and would likely suffice for up to 512x512 video generation, though still in relatively short clips.

Nvidia H100 NVL dual GPU PCIe solution

(Image credit: Nvidia)

Alternatively, there's the H100 NVL, but that's a bit hard to find all on its own for the dual-GPU variant. Newegg has a Supermicro dual Grace Hopper server for $75,000, though, which would give you 186GB of shared VRAM. Then you could perhaps start making 720p video content.

Obviously, the biggest downside to acquiring any of the above GPUs is price. A single RTX 4090 starts at $1,599, which is a lot of money for most consumers. Professional GPUs cost four times as much, and data center AI GPUs are potentially 20 times as expensive. The H100 does have competition from Intel and AMD, but Intel’s Gaudi is still expected to cost over $15,000, while AMD’s MI300X is priced between $10,000 and $15,000. And then there's the Sohu AI chip that's supposed to be up to 20X faster than an H100, but which isn't actually available yet.

Even if you have the kind of cash needed, you can’t just walk into your nearest PC shop to grab most of these GPUs. Larger orders of the H100 have a lead time of two to three months between paying for your order and it arriving on your doorstep. Don't forget about power requirements either. The PCIe variant of the H100 can still draw 350W of power, so if it's generating video 24/7 that adds up to about 3 MWh per year — roughly $300 per year, which isn't much considering the price of the hardware.
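
The energy estimate also holds up, assuming an electricity price of roughly $0.10/kWh (the article does not state a rate):

```python
watts = 350
kwh_per_year = watts * 24 * 365 / 1000      # ≈ 3,066 kWh ≈ 3 MWh
cost_per_year = kwh_per_year * 0.10         # assumed $0.10 per kWh
print(round(kwh_per_year), round(cost_per_year))   # 3066 kWh, ~$307
```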

Getting Open Sora up and running may not be a trivial endeavor either, particularly if you want to run it on non-Nvidia solutions. And, like so many other AI generators, there are loads of copyright and fair use questions left unanswered. But even with the best hardware available, we suspect it will take a fair bit more than AI generators to create any epic movies.
 

bnew


AI researchers run AI chatbots at a lightbulb-esque 13 watts with no performance loss — stripping matrix multiplication from LLMs yields massive gains​

News

By Christopher Harper

published June 26, 2024

Data centers rejoice as Nvidia feels a strange chill in the air.


LED lightbulbs, which usually consume about 10 Watts of power a piece. (Image credit: Getty Images)

A research paper from UC Santa Cruz, and an accompanying writeup, discuss how AI researchers found a way to run modern, billion-parameter-scale LLMs on just 13 watts of power. That's about the same as a 100W-equivalent LED bulb, but more importantly, it's about 50 times more efficient than the 700W of power that's needed by data center GPUs like the Nvidia H100 and H200, never mind the upcoming Blackwell B200 that can use up to 1200W per GPU.

The work was done using custom FPGA hardware, but the researchers clarify that most of their efficiency gains can be applied through open-source software and tweaking of existing setups. Most of the gains come from the removal of matrix multiplication (MatMul) from the LLM training and inference processes.

How was MatMul removed from a neural network while maintaining the same performance and accuracy? The researchers combined two methods. First, they converted the numeric system to a "ternary" system using -1, 0, and 1. This makes computation possible with summing rather than multiplying numbers. They then introduced time-based computation to the equation, giving the network an effective "memory" to allow it to perform even faster with fewer operations being run.
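
To see why ternary weights remove multiplication, here is a simplified sketch of a matrix-vector product with weights restricted to {-1, 0, +1} (a general illustration, not the researchers' FPGA implementation; the time-based "memory" component is not shown):

```python
import numpy as np

def ternary_matvec(W_ternary, x):
    """Matrix-vector product with weights in {-1, 0, +1}.

    Each output is a sum of selected inputs minus a sum of others, so the whole
    product needs only additions and subtractions -- no multiplications.
    """
    out = np.zeros(W_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(W_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()   # zeros contribute nothing
    return out

W = np.random.choice([-1, 0, 1], size=(4, 8))
x = np.random.randn(8)
print(np.allclose(ternary_matvec(W, x), W @ x))           # True
```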

The mainstream model that the researchers used as a reference point is Meta's LLaMa LLM. The endeavor was inspired by a Microsoft paper on using ternary numbers in neural networks, though Microsoft did not go as far as removing matrix multiplication or open-sourcing their model like the UC Santa Cruz researchers did.

It boils down to an optimization problem. Rui-Jie Zhu, one of the graduate students working on the paper, says, "We replaced the expensive operation with cheaper operations." Whether the approach can be universally applied to AI and LLM solutions remains to be seen, but if viable it has the potential to radically alter the AI landscape.

We've witnessed a seemingly insatiable desire for power from leading AI companies over the past year. This research suggests that much of this has been a race to be first while using inefficient processing methods. We've heard comments from reputable figures like Arm's CEO warning that if AI power demands continue to increase at current rates, AI would consume one fourth of the United States' power by 2030. Cutting power use down to 1/50 of the current amount would represent a massive improvement.

Here's hoping Meta, OpenAI, Google, Nvidia, and all the other major players will find ways to leverage this open-source breakthrough. Faster and far more efficient processing of AI workloads would bring us closer to human brain levels of functionality — a brain gets by with approximately 0.3 kWh of energy per day by some estimates, or 1/56 of what an Nvidia H100 requires. Of course, many LLMs require tens of thousands of such GPUs and months of training, so our gray matter isn't quite outdated just yet.
 

bnew


1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

🌟Few pointers from the paper

🎯In this paper, the authors present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera. In contrast to existing approaches that use six or more expert-level IMU devices, their approach is much more cost-effective and convenient.

🎯Their method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments. As a key idea to overcome the extreme sparsity and ambiguities of sensor inputs with different modalities, they integrated 6D head poses obtained from the head-mounted cameras for motion estimation.

🎯To enable capture in expansive indoor and outdoor scenes, they proposed an algorithm to track and update floor level changes to define head poses, coupled with a multi-stage Transformer-based regression module.

🎯They also introduce novel strategies leveraging visual cues of egocentric images to further enhance the motion capture quality while reducing ambiguities.

🎯They have demonstrated the performance of their method on various challenging scenarios, including complex outdoor environments and everyday motions including object interactions and social interactions among multiple individuals.

🏢Organization: @SeoulNatlUni

🧙Paper Authors: @jiyewise , @jhugestar

1️⃣Read the Full Paper here: [2401.00847] Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

2️⃣Project Page: MocapEvery

3️⃣Code: GitHub - jiyewise/MocapEvery: Author's implementation of the paper Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera (CVPR 2024)

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024


 

bnew


1/1
🚨 @CVPR 2024 Paper Alert 🚨

➡️Paper Title: ReconFusion: 3D Reconstruction with Diffusion Priors

🌟Few pointers from the paper

🎯3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process.

⚛️In this paper, the authors present “ReconFusion”, a method for reconstructing real-world scenes from only a few photos. Their approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets, which regularizes a NeRF-based 3D reconstruction pipeline at novel camera poses beyond those captured by the set of input images.

🎯Their method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions.

🎯They performed an extensive evaluation across various real-world datasets, including forward-facing and 360-degree scenes, demonstrating significant performance improvements over previous few-view NeRF reconstruction approaches.

🏢Organization: @Columbia , @Google Research, @GoogleDeepMind

🧙Paper Authors: @ChrisWu6080 , @BenMildenhall , @philipphenzler , @KeunhongP , @RuiqiGao , Daniel Watson, @_pratul_ , @dorverbin , @jon_barron , @poolio , Aleksander Holynski

1️⃣Read the Full Paper here: ReconFusion: 3D Reconstruction with Diffusion Priors

2️⃣Project Page: ReconFusion: 3D Reconstruction with Diffusion Priors

3️⃣Data: ReconFusion_data – Google Drive

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Oleg Fedak from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #nerf


 

bnew


1/1
🚨 Open Source Alert 🚨

@Meta Fundamental AI Research (FAIR) team has just unveiled a treasure trove of cutting-edge AI models and research artifacts.

These resources are now publicly available, aiming to ignite innovation across the global community and propel responsible AI advancements.

Let’s tune in to @jpineau1's insights and explore the exciting possibilities this release brings! 🚀🤖

For more details, you can check out Meta’s official announcement here: Sharing new research, models, and datasets from Meta FAIR

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#opensource #AIatmeta


 