bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨CVPR 2024 (Highlight) Paper Alert 🚨

➡️Paper Title: Relightable and Animatable Neural Avatar from Sparse-View Video

🌟Few pointers from the paper

🎯This paper tackles the challenge of creating relightable and animatable neural avatars from sparse-view (or even monocular) videos of dynamic humans under unknown illumination.

🎯Compared to studio environments, this setting is more practical and accessible but poses an extremely challenging ill-posed problem.

🎯 Previous neural human reconstruction methods are able to reconstruct animatable avatars from sparse views using deformed Signed Distance Fields (SDF) but cannot recover material parameters for relighting.

🎯While differentiable inverse rendering-based methods have succeeded in material recovery of static objects, it is not straightforward to extend them to dynamic humans as it is computationally intensive to compute pixel-surface intersection and light visibility on deformed SDFs for inverse rendering.

🎯To solve this challenge, authors of this paper have proposed a Hierarchical Distance Query (HDQ) algorithm to approximate the world space distances under arbitrary human poses.

🎯Specifically, they estimate coarse distances based on a parametric human model and compute fine distances by exploiting the local deformation invariance of SDF.

🎯Based on the HDQ algorithm, they leveraged sphere tracing to efficiently estimate the surface intersection and light visibility. This allows them to develop the first system to recover animatable and relightable neural avatars from sparse view (or monocular) inputs.
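To make the sphere-tracing step above concrete, here is a minimal Python/NumPy sketch of the hierarchical query idea. It is not the authors' code: `coarse_distance` (the parametric-body proxy) and `fine_distance` (the deformed SDF) are hypothetical placeholders, and only the control flow of "coarse everywhere, fine near the surface, march rays by the queried distance" follows the description above.

```python
import numpy as np

def coarse_distance(points):
    # Placeholder for the parametric-body proxy distance (e.g. a capsule/SMPL
    # approximation). Here: a sphere of radius 0.5 at the origin.
    return np.linalg.norm(points, axis=-1) - 0.5

def fine_distance(points):
    # Placeholder for the learned SDF queried through the deformation field.
    # Reuses the sphere so the example runs end to end.
    return np.linalg.norm(points, axis=-1) - 0.5

def hierarchical_distance(points, coarse_band=0.1):
    """Coarse distances everywhere; fine distances only near the surface."""
    d = coarse_distance(points)
    near = np.abs(d) < coarse_band              # only these points need the expensive query
    if np.any(near):
        d[near] = fine_distance(points[near])
    return d

def sphere_trace(origins, directions, n_steps=64, eps=1e-3):
    """March each ray forward by the queried distance until it hits the surface."""
    t = np.zeros(origins.shape[0])
    hit = np.zeros(origins.shape[0], dtype=bool)
    for _ in range(n_steps):
        pts = origins + t[:, None] * directions
        d = hierarchical_distance(pts)
        hit |= d < eps
        t = np.where(hit, t, t + d)             # converged rays stop advancing
    return t, hit

# Toy usage: one ray that hits the sphere, one that misses.
origins = np.array([[0.0, 0.0, 2.0], [2.0, 0.0, 2.0]])
directions = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]])
print(sphere_trace(origins, directions))
```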

🏢Organization: @ZJU_China , @Stanford , @UofIllinois

🧙Paper Authors: @realzhenxu , @pengsida , @gengchen01 , @LinzhanMou , @yzihan_hci ,@JiamingSuen , Hujun Bao, @XiaoweiZhou5

1️⃣Read the Full Paper here: [2308.07903] Relightable and Animatable Neural Avatar from Sparse-View Video

2️⃣Project Page: Relightable and Animatable Neural Avatar from Sparse-View Video

3️⃣Code: GitHub - zju3dv/RelightableAvatar: [CVPR 2024 (Highlight)] Relightable and Animatable Neural Avatar from Sparse-View Video

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Yevgeniy Sorokin from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024highlight


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

A.I Generated explanation:


CVPR 2024 (Highlight) Paper Alert

➡️Paper Title: Creating a Realistic Digital Human from a Few Videos

Few pointers from the paper

* The Challenge: This paper tries to solve a difficult problem: creating a realistic digital human that can be relit (changed lighting) and animated (moved) from just a few videos of a person.
* The Problem: This is hard because the videos are taken from different angles and with different lighting, making it difficult to create a consistent digital human.
* Previous Methods: Earlier neural reconstruction methods can build animatable digital humans from a handful of views, but they can't recover the material properties needed to change the lighting.
* The Solution: The authors came up with a new algorithm called Hierarchical Distance Query (HDQ) that can estimate, for any point in space, how far it is from the person's surface, even while the person is moving.
* How it Works: HDQ combines a simple parametric human model (for a coarse distance estimate) with a detailed surface model (for a fine estimate near the body). These distances let the system trace rays to the surface and work out which lights are visible, which is what makes relighting and animation possible.
* The Result: The authors were able to create a system that can take a few videos of a person and create a realistic digital human that can be relit and animated in a realistic way.

Organization: The research was done by a team from Zhejiang University, Stanford University, and the University of Illinois.

Paper Authors: The authors of the paper are a team of researchers from these universities.

Read More:

1️⃣Full Paper: You can read the full paper here: [2308.07903] Relightable and Animatable Neural Avatar from Sparse-View Video
2️⃣Project Page: You can learn more about the project here: Relightable and Animatable Neural Avatar from Sparse-View Video
3️⃣Code: You can access the code used in the project here: GitHub - zju3dv/RelightableAvatar: [CVPR 2024 (Highlight)] Relightable and Animatable Neural Avatar from Sparse-View Video
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨 CVPR 2024 Paper Alert 🚨

➡️Paper Title: ♻️ Cache Me if You Can: Accelerating Diffusion Models through Block Caching

🌟Few pointers from the paper

🎯Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly.

🎯A large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce the number of required steps, they generally treat the underlying denoising network as a black box.

🎯 In this work, the authors investigate the behavior of the layers within the network and find that:
♠️The layers' output changes smoothly over time.
♠️The layers show distinct patterns of change.
♠️The change from step to step is often very small.

🎯The authors hypothesize that many layer computations in the denoising network are redundant. Leveraging this, they introduced block caching, in which they reuse outputs from layer blocks of previous steps to speed up inference.

🎯 Furthermore, they proposed a technique to automatically determine caching schedules based on each block's changes over timesteps.

🎯In their experiments, they showed through FID, human evaluation, and qualitative analysis that Block Caching makes it possible to generate images with higher visual quality at the same computational cost.
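For intuition, here is a hedged PyTorch toy sketch of the caching idea: a wrapped block recomputes its output only on designated refresh steps and reuses the cached result otherwise. The fixed `refresh_every` schedule and the class name are assumptions for illustration; the paper derives per-block schedules from measured change statistics rather than using a constant interval.

```python
import torch

class CachedBlock(torch.nn.Module):
    """Wraps a network block and reuses its previous output on non-refresh steps."""

    def __init__(self, block, refresh_every=3):
        super().__init__()
        self.block = block
        self.refresh_every = refresh_every   # stand-in for a per-block, automatically determined schedule
        self.cache = None

    def forward(self, x, step):
        if self.cache is None or step % self.refresh_every == 0:
            self.cache = self.block(x)       # full computation on refresh steps
        return self.cache                    # cheap reuse on all other steps

# Toy usage: a linear layer standing in for an expensive U-Net block.
block = CachedBlock(torch.nn.Linear(16, 16), refresh_every=3)
x = torch.randn(1, 16)
for step in range(6):
    y = block(x, step)   # recomputed at steps 0 and 3, reused at 1, 2, 4, 5
```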

🏢Organization: @Meta GenAI, @TU_Muenchen , MCML, @UniofOxford

🧙Paper Authors: @felixwimbauer , @wu_bichen , @schoenfeldedgar , Xiaoliang Dai, @j1h0u , Zijian He, @artsiom_s , Peizhao Zhang, Sam Tsai, Jonas Kohler, @chrirupp , Daniel Cremers, Peter Vajda, Jialiang Wang

1️⃣Read the Full Paper here: [2312.03209] Cache Me if You Can: Accelerating Diffusion Models through Block Caching

2️⃣Project Page: Cache Me if You Can: Accelerating Diffusion Models through Block Caching

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

A.I Generated explanation:

**Title:** Cache Me if You Can: Accelerating Diffusion Models through Block Caching

**Summary:** This paper is about making a type of AI model called diffusion models faster and more efficient. Diffusion models are really good at generating realistic images, but they can be slow and use a lot of computer power.

**The Problem:** To generate an image, diffusion models have to apply a large network many times to refine the image from random noise. This process can be slow and costly.

**The Solution:** The authors of the paper found that the layers within the network change smoothly over time and show distinct patterns of change. They also found that the changes from one step to the next are often very small. This means that many of the calculations done by the network are redundant and can be reused.

**Block Caching:** The authors introduced a technique called block caching, which reuses the outputs from previous steps to speed up the image generation process. They also developed a way to automatically determine when to reuse these outputs.

**Results:** The authors showed that block caching allows for generating higher-quality images at the same computational cost. This means that the model can produce better images without using more computer power.

**Authors and Organizations:** The paper was written by a team of researchers from Meta GenAI, Technical University of Munich, MCML, and University of Oxford.

**Read More:** You can read the full paper here: [2312.03209] Cache Me if You Can: Accelerating Diffusion Models through Block Caching or visit the project page here: Cache Me if You Can: Accelerating Diffusion Models through Block Caching
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622









1/11
I created this video effect in approx. 1 hour

It is much easier than it looks

Here's how: 👇

2/11
1) Retouch Tool

The core tool we will use to create this video effect is @freepik's new "Retouch" feature, which performs real-time image inpainting

Retouch can be accessed here:
Freepik AI image generator - Free text-to-image generator

#AIPartner

3/11
2) Choose your video

I used a stock video for my example, but an AI generated video with the subject in the center of the frame will also work well

Tip: Try to avoid using videos with a lot of arm/leg/hair movement

4/11
3) Export your video clip into single frames

Using any video editor, you will need to export your video clip into individual frames

I will use Adobe Premiere Pro for this example:

A) Drag your video into the timeline of Premiere Pro
B) Go to File -> Export -> Media
C) Provide a file name, select a file save location, and change the format to PNG.
D) Click export
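If you would rather script this step than click through Premiere, a rough Python/OpenCV sketch like the one below does the same frame export. The file and folder names are placeholders; adjust them to your project.

```python
import os
import cv2  # pip install opencv-python

os.makedirs("frames", exist_ok=True)
video = cv2.VideoCapture("input.mp4")       # placeholder path to your clip

frame_idx = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    # Zero-padded names keep the frames in the right order when you recompile them.
    cv2.imwrite(f"frames/frame_{frame_idx:04d}.png", frame)
    frame_idx += 1

video.release()
print(f"exported {frame_idx} frames")
```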

5/11
4) Begin inpainting with Freepik

Head back over to the @freepik Retouch feature & begin inpainting your exported frames

This video demonstrates my process for inpainting with Retouch.

You don't need to inpaint every frame, only the section of the video where you want the effect to occur. Take another look at my demo video in the first post; I only inpainted the first 60 frames

It took me approx. 30 minutes to inpaint and export the 60 frames

6/11
5) Export Inpainted images and recompile

Okay, now that you have inpainted your frames, you need to export them and rename the exports in the correct order

**This is the most complicated part. It is important that you name your inpainted frames in the correct numerical order. If the frames are out of order, your video will play incorrectly**
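Here is a small, hypothetical Python sketch for that renaming step. It assumes the files' modification times reflect the order you inpainted them in; if your inpainting tool keeps the original frame numbers in its file names, sort by name instead.

```python
import os
from pathlib import Path

src = Path("inpainted")   # folder of inpainted frames (placeholder)
dst = Path("renamed")
dst.mkdir(exist_ok=True)

# Assumption: modification time reflects the intended frame order.
files = sorted(src.glob("*.png"), key=lambda p: p.stat().st_mtime)
for i, path in enumerate(files):
    os.replace(path, dst / f"frame_{i:04d}.png")

print(f"renamed {len(files)} frames")
```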

7/11
6) Compile in video editor

Drag your renamed inpainted single frames into any video editor. If your frames are labeled numerically, they will be dragged into the video timeline in the correct order.

Since you are bringing image files into a video editor, you will need to shorten each image to 1 frame in duration

Most video editors allow you to trim all of the images to 1-frame at the same time

8/11
7) Export Video

Once you trim the individual images down to 1 frame, you are ready to export your video with the effect!

Great job following along & drop any questions in the comments below:



9/11
Access @freepik Retouch feature directly here:
Freepik Retouch - Free AI image editor

10/11
Faster than a speeding bullet... Is it a bird? Is it a plane? No, it's Changeroom Guy

11/11
Yes! 😂😂


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🔥 Product Update 🔥

🌟 Hold onto your hats! 🌟 @pika_labs has just unleashed their new mind-bending Image-to-Video Model! 🚀🎥

Watch the attached demo video and prepare to have your mind blown! 🤯🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622






1/12
360° ULTRA REALISM IS HERE with Skybox AI Model 3.1!

🔥 Model 3.1: 8K details that shame even Model 3
🎨 Remix Mode: Switch up your style, it’s back!
🖼️ Remix Your 360°: Unleash Model 3.1 on your own panos
🗺️ 3D Mesh & Depth Maps: Available for subscribers

🌐✨ Skybox AI

2/12
We’ve also added a few new realistic styles!

🎥 Cinematic Realism: Dramatic, filmic mood
🧙‍♂️ Magical Realism: Misty, glowy atmosphere
🚁 Drone Shot: Realistic aerial views

Read the full release notes on our blog 👇360° Ultra Realism is Here with Model 3.1 for Skybox AI Generator

3/12
looks amazing

4/12
🤝 yes.

5/12
does the Technowizard subscription allow us to create a 3D mesh and export it to Unity ?

6/12
Any of our subscriptions will allow you to generate mesh and export

7/12
so excited to try this.

have you done anything w education?

8/12
Yeah we see a lot being done in the edu space! Check out @ThingLink_EDU which has Skybox AI bundled in it

9/12
amazing, super realistic and magic too

10/12
Real genuine magic!

11/12
Can I curse here!?! OMG, what did I wake up to? 🔥🚒

12/12
Every curse word goes into the generation jar. Save up enough and you owe us a Technowizard subscription! 😂😇


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨CHI 2024 Paper Alert 🚨

➡️Paper Title: InflatableBots: Inflatable Shape-Changing Mobile Robots for Large-Scale Encountered-Type Haptics in VR

🌟Few pointers from the paper

🎯In this paper, the authors introduce “InflatableBots”, shape-changing inflatable robots for large-scale encountered-type haptics in VR.

🎯Unlike traditional inflatable shape displays, which are immobile and limited in interaction areas, their approach combines mobile robots with fan-based inflatable structures.

🎯This enables safe, scalable, and deployable haptic interactions on a large scale. They developed three coordinated inflatable mobile robots, each of which consists of an omni-directional mobile base and a reel-based inflatable structure.

🎯The robot can simultaneously change its height and position rapidly (horizontal: 58.5 cm/sec, vertical: 10.4 cm/sec, from 40 cm to 200 cm), which allows for quick and dynamic haptic rendering of multiple touch points to simulate various body-scale objects and surfaces in real-time across large spaces (3.5 m x 2.5 m). A quick timing sketch based on these numbers follows after these pointers.

🎯They evaluated their system with a user study (N = 12), which confirms the unique advantages in safety, deployability, and large-scale interactability to significantly improve realism in VR experiences.
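A quick back-of-envelope timing check in Python, based only on the speeds quoted above (my own arithmetic, not code from the paper):

```python
# Rough timings implied by the quoted speeds.
vertical_speed = 10.4     # cm/s
horizontal_speed = 58.5   # cm/s

full_height_change = (200 - 40) / vertical_speed   # ~15.4 s for the full 40 cm -> 200 cm travel
cross_workspace = 350 / horizontal_speed           # ~6.0 s to cross the 3.5 m side of the workspace

print(f"full height change: {full_height_change:.1f} s")
print(f"workspace crossing: {cross_workspace:.1f} s")
```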

🏢Organization: @TohokuUniPR , @UCalgary

🧙Paper Authors: Ryota Gomi, @ryosuzk, Kazuki Takashima, Kazuyuki Fujita, Yoshifumi Kitamura

1️⃣Read the Full Paper here: https://dl.acm.org/doi/pdf/10.1145/3613904.3642069

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#VR #AR #robots #CHI2024


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

CHI 2024 Paper Alert

This is an announcement about a new research paper that's been published.

Paper Title: InflatableBots: Inflatable Shape-Changing Mobile Robots for Large-Scale Encountered-Type Haptics in VR

Here are some key points from the paper:

What's it about?

The authors have created something called "InflatableBots", which are robots that can change shape and move around. They're designed to be used in Virtual Reality (VR) to create a more realistic experience.

How is it different?

Unlike other inflatable displays that can't move and are limited in what they can do, these robots can move around and change shape in real-time. This makes it possible to create a more immersive and interactive experience in VR.

How does it work?

The robots have a special base that can move in any direction, and a part that can inflate and deflate to change shape. They can move quickly and change shape rapidly, which allows them to simulate different objects and surfaces in VR.

What did they test?

The researchers tested their system with 12 people and found that it was safe, easy to set up, and allowed for a more realistic experience in VR.

Who did the research?

The research was done by a team from Tohoku University and the University of Calgary. The authors of the paper are Ryota Gomi, Ryo Suzuki, Kazuki Takashima, Kazuyuki Fujita, and Yoshifumi Kitamura.

Want to read more?

You can read the full paper here: https://dl.acm.org/doi/pdf/10.1145/3613904.3642069
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

🌟Few pointers from the paper

🎯DiffusionAvatars synthesizes a high-fidelity 3D head avatar of a person, offering intuitive control over both pose and expression.

🎯In this paper, the authors propose a diffusion-based neural renderer that leverages generic 2D priors to produce compelling images of faces.

🎯 For coarse guidance of the expression and head pose, they render a neural parametric head model (NPHM) from the target viewpoint, which acts as a proxy geometry of the person.

🎯Additionally, to enhance the modeling of intricate facial expressions, they conditioned DiffusionAvatars directly on the expression codes obtained from NPHM via cross-attention.

🎯Finally, to synthesize consistent surface details across different viewpoints and expressions, they rigged learnable spatial features to the head’s surface via TriPlane lookup in NPHM’s canonical space.

🎯They then trained DiffusionAvatars on RGB videos and corresponding fitted NPHM meshes of a person and tested the obtained avatars in both self-reenactment and animation scenarios.

🎯Their experiments demonstrate that DiffusionAvatars generates temporally consistent and visually appealing videos for novel poses and expressions of a person, outperforming existing approaches.
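As a rough illustration of the cross-attention conditioning described above, here is a minimal PyTorch sketch in which image features attend to a set of expression codes. The dimensions, module name, and residual wiring are assumptions for illustration only, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ExpressionCrossAttention(nn.Module):
    """Toy cross-attention: diffusion features attend to expression codes."""

    def __init__(self, feat_dim=64, code_dim=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, kdim=code_dim,
                                          vdim=code_dim, batch_first=True)

    def forward(self, image_feats, expr_codes):
        # image_feats: (B, H*W, feat_dim) denoiser features
        # expr_codes:  (B, n_codes, code_dim) NPHM-style expression codes
        out, _ = self.attn(query=image_feats, key=expr_codes, value=expr_codes)
        return image_feats + out   # residual conditioning

feats = torch.randn(1, 16 * 16, 64)
codes = torch.randn(1, 8, 32)
print(ExpressionCrossAttention()(feats, codes).shape)  # torch.Size([1, 256, 64])
```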

🏢Organization: @TU_Muenchen

🧙Paper Authors: @TobiasKirschst1 , @SGiebenhain , @MattNiessner

1️⃣Read the Full Paper here: [2311.18635] DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

2️⃣Project Page: DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

3️⃣Code: GitHub - tobias-kirschstein/diffusion-avatars

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Alex_Kizenkov from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

Title: DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

This paper is about creating highly realistic 3D avatars of people's heads, which can be controlled to change their pose and expression in a natural and intuitive way.

What does it do?

* It creates a 3D avatar of a person's head that looks very realistic.
* It allows you to control the avatar's pose (e.g. turning their head) and expression (e.g. smiling or frowning) in a natural way.
* It uses a special technique called "diffusion" to generate high-quality images of faces.

How does it work?

* The system uses a 2D image of a person's face as a starting point.
* It then uses a "neural parametric head model" (NPHM) to create a 3D model of the person's head, which is used as a guide for the avatar.
* The system uses "cross-attention" to focus on specific parts of the face, such as the eyes or mouth, to create more realistic expressions.
* It also uses a technique called "TriPlane lookup" to add detailed surface features to the avatar's head.
* The system is trained on videos of people's faces and corresponding 3D models, and can generate new videos of the avatar in different poses and expressions.

Results:

* The system can generate highly realistic and consistent videos of people's heads in different poses and expressions.
* It outperforms existing approaches in terms of quality and realism.

Who did it?

* The paper was written by researchers from the Technical University of Munich (TUM).
* The authors are Tobias Kirschstein, Simon Giebenhain, and Matthias Niessner.

Want to learn more?

* You can read the full paper here: [2311.18635] DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
* You can visit the project page here: DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
* You can access the code here: GitHub - tobias-kirschstein/diffusion-avatars
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨 SIGGRAPH 2024 Paper Alert 🚨

➡️Paper Title: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

🌟Few pointers from the paper

🎯In this paper, the authors present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence.

🎯Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion.

🎯The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics.

🎯To address this challenge, the authors developed a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech.

🎯To construct this motion library, they summarized a comprehensive list of commonly used semantic gestures based on findings in linguistics, and they also collected a high-quality motion dataset encompassing both body and hand movements.

🎯They also designed a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech.

🎯Furthermore, the authors have also proposed a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation.

🎯Their system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples.
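To give a flavor of the "retrieve suitable semantic gesture candidates from a motion library" step, here is a toy Python sketch that ranks library entries by embedding similarity to the input speech. The random embeddings and hash-based text encoder are stand-ins; the paper's LLM-based generative retrieval is not reproduced here.

```python
import numpy as np

# Toy motion library: gesture name -> embedding of the meaning it conveys.
rng = np.random.default_rng(0)
library = {name: rng.normal(size=16) for name in
           ["thumbs_up", "shrug", "point_forward", "counting", "wave"]}

def embed(text):
    # Placeholder text encoder: hashes words into the same 16-d space.
    vec = np.zeros(16)
    for word in text.lower().split():
        vec += np.random.default_rng(abs(hash(word)) % (2 ** 32)).normal(size=16)
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(speech, top_k=2):
    q = embed(speech)
    scores = {name: float(q @ v) / (np.linalg.norm(v) + 1e-8)
              for name, v in library.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(retrieve("I really don't know, maybe three of them"))
```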

🏢Organization: @PKU1898 , @RenminUniv , @ShandongU , State Key Lab of General AI, China

🧙Paper Authors: Zeyi Zhang, Tenglong Ao, Yuyao Zhang, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu

1️⃣Read the Full Paper here: [2405.09814] Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

2️⃣Project Page: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#SIGGRAPH2024 #gpt #realisticgestures


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:


**Title:** Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

**What's it about?** This paper is about creating a system that can generate realistic gestures that match what someone is saying. Think of it like a robot or avatar that can move its hands and body in a way that looks natural and matches the words it's speaking.

**The problem:** The problem is that it's hard to teach a computer to do this because there are so many different ways that people move their bodies when they talk. It's like trying to teach a robot to dance - it's hard to program all the different moves.

**The solution:** The authors of the paper came up with a new way to solve this problem. They created a system that can look at a big library of movements and pick the ones that match what someone is saying. It's like having a big book of dance moves that the robot can look at and say "oh, I need to do this move to match what I'm saying".

**How it works:** The system uses a special kind of computer model called a "large language model" to understand what someone is saying. It then looks at the library of movements and picks the ones that match the words. The system can also generate new movements that are similar to the ones in the library, so it can create new gestures that look natural.

**The result:** The system is really good at generating gestures that look natural and match what someone is saying. It's like having a robot that can talk and move its body in a way that looks like a real person.

**Who did it?** The paper was written by a team of researchers from several universities in China.

**Want to learn more?** You can read the full paper here: [2405.09814] Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis or check out the project page here: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨Paper Alert 🚨

➡️Paper Title: FacET : How Video Meetings Change Your Expression

🌟Few pointers from the paper

🎯Do our facial expressions change when we speak over video calls? Given two unpaired sets of videos of people, we seek to automatically find spatio-temporal patterns that are distinctive of each set.

🎯Existing methods use discriminative approaches and perform post-hoc explainability analysis. Such methods are insufficient as they are unable to provide insights beyond obvious dataset biases, and the explanations are useful only if humans themselves are good at the task.

🎯Instead, authors of this paper tackle the problem through the lens of generative domain translation: their method generates a detailed report of learned, input-dependent spatio-temporal features and the extent to which they vary between the domains.

🎯 They demonstrated that their method can discover behavioral differences between conversing face-to-face (F2F) and on video-calls (VCs). They also showed the applicability of their method on discovering differences in presidential communication styles.

🎯Additionally, they were able to predict temporal change-points in videos that decouple expressions in an unsupervised way, and increase the interpretability and usefulness of their model.

🎯Finally, their method, being generative, can be used to transform a video call to appear as if it were recorded in a F2F setting. Experiments and visualizations show their approach is able to discover a range of behaviors, taking a step towards deeper understanding of human behaviors.
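For intuition only, here is a toy Python sketch of the kind of "which facial behaviors differ between the two sets" report the paper produces. It uses plain per-feature mean differences on synthetic data as a stand-in; the paper's actual method is a generative domain-translation model, which this does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-frame expression features for two unpaired sets of videos
# (rows = frames, columns = features such as brow raise, smile, head nod, gaze shift).
f2f_frames = rng.normal(0.0, 1.0, size=(500, 4))
vc_frames = rng.normal(0.0, 1.0, size=(400, 4))
vc_frames[:, 2] += 0.8   # pretend "head_nod" is systematically stronger on video calls

feature_names = ["brow_raise", "smile", "head_nod", "gaze_shift"]
for i, name in enumerate(feature_names):
    diff = vc_frames[:, i].mean() - f2f_frames[:, i].mean()
    print(f"{name:>10s}: mean VC - F2F difference = {diff:+.2f}")
```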

🏢Organization: @Columbia

🧙Paper Authors: Sumit Sarin, @utkarshmall13 , Purva Tendulkar, @cvondrick

1️⃣Read the Full Paper here: [2406.00955] How Video Meetings Change Your Expression

2️⃣Project Page: How Video Meetings Change Your Expression

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by John Rush from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

A.I Generated explanation:

**Title:** FacET: How Video Meetings Change Your Expression

**Summary:** This paper is about how video calls can change the way we express ourselves through facial expressions. The researchers wanted to find out if there are any differences in how we look when we're talking to someone in person versus when we're on a video call.

**The Problem:** Existing methods for analyzing facial expressions on video calls are limited because they can only find obvious differences and don't provide much insight. The researchers wanted to find a better way to understand how video calls affect our facial expressions.

**The Solution:** The researchers developed a new method that uses a technique called "generative domain translation". This method can generate a detailed report of the facial expressions and how they change between different settings (like in-person vs. video call). They tested their method and found that it can discover differences in how people express themselves on video calls versus in-person conversations.

**What They Found:** The researchers found that their method can:

* Discover differences in how people express themselves on video calls versus in-person conversations
* Identify changes in facial expressions over time
* Even transform a video call to make it look like it was recorded in-person
* Be used to analyze differences in communication styles, like between politicians

**The Takeaway:** This research takes a step towards better understanding how video calls affect our behavior and can lead to more effective communication in online meetings.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨 CVPR 2024 Highlight Paper Alert 🚨

➡️Paper Title: 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

🌟Few pointers from the paper

🎯The studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics.

🎯Addressing this gap, the authors have introduced “4D-DRESS”, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes.

🎯4D-DRESS captures 64 outfits in 520 human motion sequences, amounting to 78k textured scans. Creating a real-world clothing dataset is challenging, particularly in annotating and segmenting the extensive and complex 4D human scans.

🎯To address this, the authors have developed a semi-automatic 4D human parsing pipeline. They efficiently combine a human-in-the-loop process with automation to accurately label 4D scans in diverse garments and body movements.

🎯Leveraging precise annotations and high-quality garment meshes, the authors established several benchmarks for clothing simulation and reconstruction. 4D-DRESS offers realistic and challenging data that complements synthetic sources, paving the way for advancements in research of lifelike human clothing.

🏢Organization: Department of Computer Science, @ETH_en , @MPI_IS

🧙Paper Authors: Wenbo Wang, @hohs_ETH , @ChenGuo96 , Boxiang Rong, @ArturGrigorev57 , Jie Song, Juan Jose Zarate, @OHilliges

1️⃣Read the Full Paper here: [2404.18630] 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

2️⃣Project Page: 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations.

3️⃣Code: GitHub - eth-ait/4d-dress: Official repository for CVPR 2024 highlight paper 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations.

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#cvpr2024highlight #digitalavatar #humanclothing


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

CVPR 2024 Highlight Paper Alert

This is a big deal in the world of computer science and artificial intelligence!

Paper Title: 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

Here are some key points from the paper:

* The Problem: When creating digital avatars, researchers have been using fake data (synthetic datasets) to study human clothing. But this data isn't very realistic and can't capture how clothes move and behave in real life.
* The Solution: The authors of this paper have created a new dataset called "4D-DRESS" that uses real-world data to study human clothing. This dataset includes high-quality 4D scans of people wearing different outfits and moving around.
* What's in the Dataset: 4D-DRESS has 64 different outfits and 520 motion sequences, which adds up to 78,000 individual scans. This is a huge amount of data!
* The Challenge: Creating a dataset like this is hard because it's difficult to label and segment the complex 4D scans of humans. To solve this, the authors developed a special pipeline that combines human input with automation to accurately label the scans.
* What's Next: With this new dataset, researchers can now create more realistic digital avatars and study how clothes move and behave in real life. This can lead to advancements in fields like fashion, gaming, and more.

The Team Behind the Paper:

* The paper was written by researchers from the Department of Computer Science at ETH Zurich (a university in Switzerland) and the Max Planck Institute for Intelligent Systems (a research institute in Germany).
* The authors are Wenbo Wang, Hsuan-I Ho, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, and Otmar Hilliges.

Want to Learn More?

* You can read the full paper here: [2404.18630] 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations
* Check out the project page here: 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations.
* Get the code here: GitHub - eth-ait/4d-dress: Official repository for CVPR 2024 highlight paper 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/11
Actually, really liked the Apple Intelligence announcement. It must be a very exciting time at Apple as they layer AI on top of the entire OS. A few of the major themes.

Step 1 Multimodal I/O. Enable text/audio/image/video capability, both read and write. These are the native human APIs, so to speak.
Step 2 Agentic. Allow all parts of the OS and apps to inter-operate via "function calling"; kernel process LLM that can schedule and coordinate work across them given user queries.
Step 3 Frictionless. Fully integrate these features in a highly frictionless, fast, "always on", and contextual way. No going around copy pasting information, prompt engineering, or etc. Adapt the UI accordingly.
Step 4 Initiative. Don't perform a task given a prompt, anticipate the prompt, suggest, initiate.
Step 5 Delegation hierarchy. Move as much intelligence as you can on device (Apple Silicon very helpful and well-suited), but allow optional dispatch of work to cloud.
Step 6 Modularity. Allow the OS to access and support an entire and growing ecosystem of LLMs (e.g. ChatGPT announcement).
Step 7 Privacy. <3

We're quickly heading into a world where you can open up your phone and just say stuff. It talks back and it knows you. And it just works. Super exciting and as a user, quite looking forward to it.
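Step 2's "function calling" pattern can be sketched in a few lines of Python. Everything below is hypothetical (a mocked model, a made-up intent registry); it only shows the shape of the idea: apps expose intents with typed arguments, the model emits a structured call, and the OS dispatches it.

```python
import json

# Hypothetical registry of app "intents" the system model can call.
TOOLS = {
    "create_reminder": {"params": ["title", "datetime"],
                        "fn": lambda title, datetime: f"reminder '{title}' set for {datetime}"},
    "send_message":    {"params": ["to", "body"],
                        "fn": lambda to, body: f"message to {to}: {body}"},
}

def mock_llm(user_query):
    """Stand-in for the on-device model: returns a tool call as JSON."""
    return json.dumps({"tool": "create_reminder",
                       "args": {"title": "call mom", "datetime": "tomorrow 9am"}})

def dispatch(user_query):
    call = json.loads(mock_llm(user_query))
    tool = TOOLS[call["tool"]]
    return tool["fn"](**call["args"])

print(dispatch("remind me to call mom tomorrow morning"))
```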

2/11
Never believe product demos. Google Assistant was supposed to be managing all my appointments 6 years ago.

3/11
100% agree, "the proof is in the pudding". It has to actually work. I will say that I think the technology exists today to actually make it work at the needed threshold. Actually making it work is still difficult. But 6 years ago I would have said the technology does not exist.

4/11
So can we never turn it off?

5/11
Any thoughts on the new Siri summoning ChatGPT? I understand the importance and opportunity for modularity, but I feel that this particular AI experience is unnecessary. Why do I need one assistant to access another assistant, adding friction? I really don't get it. Maybe it's just me, but I'm curious what others think.

6/11
i’m very very curious on how Apple did context management and tool use over ~unbounded set of app intents!

if you come across anyone willing to share please send them our way, i’d love to get the engineering stories out of them

7/11
AirPods are a key conduit of LLMs & massively underestimated

Great products.
Great user experience (physical / digital / handoff between devices)

Lots of companies talking about ‘owning the ears 👂’, but few do it as well as Apple

Her.

If AirPods were separated out as a business unit, revenue would be greater than Airbnb (2022)

Margins are likely more extreme

8/11
@readwise save thread

9/11
There is a race going on at Apple between how fast they can improve the hardware for local inference, and how secure they can make the idea of private cloud inference. The latter seems really hard to implement.

10/11
@daddyyjuul @iqanazmah Apple finally reigniting some interest after years of silence. The next iPhone OS is going to be 🔥

11/11
Calculator coming to iPad 😂 but with this new feature called Math notes!!!! How cool..


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/5
Today Sam Altman and Arianna Huffington announced the launch of Thrive AI Health, a new company whose mission is to use AI to democratize access to expert-level health coaching to improve health outcomes. Read more in @TimE: AI-Driven Behavior Change Could Transform Health Care

2/5
Love to see more builders in healthcare AI!

3/5
Sus tweet, Deepak.

4/5
@ariannahuff is bringing her master plan to life (again)!

5/5
We built a company to do that almost 1 year ago!

THRIVEbyAI | LinkedIn


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/1
Today Sam Altman and I published a piece in TIME sharing our vision for how AI-driven personalized behavior change can transform healthcare and announcing the launch of Thrive AI Health, a new company funded by the OpenAI Startup Fund and Thrive Global, which will be devoted to building an AI health coach. The company’s mission is to use AI to democratize access to expert-level health coaching to improve health outcomes and address growing health inequities.

As @sama and I write, AI could go well beyond efficiency and optimization to something much more fundamental: improving both our health spans and lifespans.

With AI-driven personalized behavior change, we have the chance to finally reverse the trend lines on chronic diseases like diabetes and cardiovascular diseases, which are directly related to daily behaviors but not distributed equally across demographics.

DeCarlos Love — a brilliant product leader passionate about improving health outcomes — has left Google to become Thrive AI Health’s CEO, and I’m very much looking forward to working with him. And The Alice L. Walton Foundation is joining us as a strategic investor to help us scale our impact to underserved communities and reduce health inequities.

AI has become central to @Thrive's mission to improve health and productivity outcomes, and I’m incredibly passionate about the opportunity to leverage AI to deliver hyper-personalized behavior change across the five key behaviors that Thrive focuses on and that govern our health: sleep, food, movement, stress management and connection. The AI health coach will be embedded in Thrive’s behavior change platform and we look forward to bringing this innovative offering to the market.

Read more in @TimE: AI-Driven Behavior Change Could Transform Health Care


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196



1/2
BREAKING NEWS 🔥🔥

OpenAI CEO Sam Altman and Thrive Global’s Arianna Huffington launch a health coach company

Meet Thrive AI Health

It will use artificial intelligence and help people take charge of their well-being and make expert health advice accessible to all.

As per them, AI could help improve “both our health spans and our lifespans”.

Thrive AI Health will primarily focus on promoting healthy behaviour, like: getting enough sleep, eating well, exercising, spending time in nature and meditating. Furthermore, this AI health coach wouldn't just tell you to eat better or exercise more, but promises to take your preferences, your schedule and your health data into account.

2/2
Courtesy: BI India


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

AI is effectively ‘useless’—and it’s created a ‘fake it till you make it’ bubble that could end in disaster, veteran market watcher warns


Fortune· Michael Nagle/Bloomberg via Getty Images

Will Daniel

Mon, Jul 8, 2024, 3:40 PM EDT · 4 min read

There’s no avoiding the hype surrounding AI these days. Promises of new developments like personal robot assistants and miracle cancer cures are ubiquitous as executives take every opportunity to emphasize their AI chops to enthusiastic investors—and slightly less enthusiastic consumers.

Not everyone has been blown away by the AI fanfare, however. James Ferguson, founding partner of the UK-based macroeconomic research firm MacroStrategy Partnership, fears investors’ AI exuberance has created a concentrated market bubble that’s reminiscent of the dot-com era.

“These historically end badly,” Ferguson told Bloomberg's Merryn Somerset Webb in the latest episode of the Merryn Talks Money podcast. “So anyone who's sort of a bit long in the tooth and has seen this sort of thing before is tempted to believe it'll end badly.”

The veteran analyst argued that hallucinations—large language models’ (LLMs) tendency to invent facts, sources, and more—may prove a more intractable problem than initially anticipated, leading AI to have far fewer viable applications.

“AI still remains, I would argue, completely unproven. And fake it till you make it may work in Silicon Valley, but for the rest of us, I think once bitten twice shy may be more appropriate for AI,” he said. “If AI cannot be trusted…then AI is effectively, in my mind, useless.”

Ferguson also noted AI may end up being too “energy hungry” to be a cost effective tool for many businesses. To his point, a recent study from the Amsterdam School of Business and Economics found that AI applications alone could use as much power as the Netherlands by 2027.

“Forget Nvidia charging more and more and more for its chips, you also have to pay more and more and more to run those chips on your servers. And therefore you end up with something that is very expensive and has yet to prove anywhere really, outside of some narrow applications, that it’s paying for this,” he said.

For investors, particularly those leaning into the AI enthusiasm, Ferguson warned that the excessive tech hype based on questionable promises is very similar to the period before the dot-com crash. He noted that during both of these periods, market returns were concentrated in tech stocks that traded based on Wall Street’s sky-high earnings growth estimates.

But despite those lofty forecasts, the dominant hardware giants of the dot-com era, Cisco and Intel, have largely disappointed investors ever since. Ferguson argued today’s AI hardware hero, Nvidia, might experience a similar fate, particularly given its elevated valuation.

“What multiple of sales is Nvidia a good deal on if you think that it might only have—no matter how stratospheric the growth rate at the moment—if you think that it's probably not going to be a player in a decade's time?” he asked, implying Nvidia might not be worth the current price tag of nearly 40 times sales investors are paying.

Despite his argument that AI-linked tech stocks like Nvidia are highly overvalued, Ferguson admitted that no one can predict when a bubble will end. This dynamic leads many bearish investors to feel “compelled to play” in the markets even when stocks look pricey, according to the analyst—and that’s a great way to get hurt.

“I mean, it's certainly what was happening in the dotcom [bubble], for example, where almost anybody who wasn't a retail punter was looking at these things and saying, 'well, it can't last, but having said that, if it lasts one more quarter and I'm not playing, I'll lose my job,'” he explained.

The good news, according to Ferguson, is that because the current stock market bubble is so concentrated in AI-linked stocks, there is still value out there.

Of course, there will be widespread pain for investors if the AI bubble bursts. But after that, Ferguson recommended looking at the currently unloved U.S. small-cap stocks, which may benefit from interest rate cuts and aren’t highly valued.

“There's a lot of value to be found in the U.S. The trouble is that that value is to be found in good old fashioned ways, trawling through small caps and looking for businesses that are growing in a good old fashioned, steady way,” he said.

This story was originally featured on Fortune.com



 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/11
Can LLMs invent better ways to train LLMs?

At Sakana AI, we’re pioneering AI-driven methods to automate AI research and discovery. We’re excited to release DiscoPOP: a new SOTA preference optimization algorithm that was discovered and written by an LLM!

Sakana AI

Our method leverages LLMs to propose and implement new preference optimization algorithms. We then train models with those algorithms and evaluate their performance, providing feedback to the LLM. By repeating this process for multiple generations in an evolutionary loop, the LLM discovers many highly-performant and novel preference optimization objectives!

Paper: [2406.08414] Discovering Preference Optimization Algorithms with and for Large Language Models
GitHub: GitHub - SakanaAI/DiscoPOP: Code for Discovering Preference Optimization Algorithms with and for Large Language Models
Model: SakanaAI/DiscoPOP-zephyr-7b-gemma · Hugging Face

We proudly collaborated with the @UniOfOxford (@FLAIR_Ox) and @Cambridge_Uni (@MihaelaVDS) on this groundbreaking project. Looking ahead, we envision a future where AI-driven research reduces the need for extensive human intervention and computational resources. This will accelerate scientific discoveries and innovation, pushing the boundaries of what AI can achieve.

2/11
This GitHub repository contains the code for our paper “Discovering Preference Optimization Algorithms with and for Large Language Models”
GitHub - SakanaAI/DiscoPOP: Code for Discovering Preference Optimization Algorithms with and for Large Language Models

3/11
Just apply perturbation theory to the training algorithm to do training in a forward pass. I've been working on this for a year now and decided to use such a 'Jiminy Cricket' LLM to evaluate loss/goodness, then propagate a 'goodness' value forward during a special pass. The backprop algorithm is computationally and spatially intensive, so a simpler function is appealing: take the activation results of an input vector at each layer and modify the weights relative to those activation values, the 'goodness' value (which is on [-1,1]), and a training hyperparameter for stability. The intuition is that perturbing the weight matrix relative to the resultant activation, goodness value, and training hyperparameter would asymptote to backprop, but allow in-situ training without the need to store activation and gradient values. In the example of 'Jiminy Cricket', you feed a fine-tuned LLM the next prompt as a result of the current inferred response, evaluate the 'sentiment' of the next prompt--the resultant human 'feedback'--map it to [-1,1], then re-run the inferred response through the network and perform the weight matrix perturbations layer by layer. This is a 'well, he meant well' approximation of Geoffrey Hinton's activation-based layer-by-layer adjustment on the forward pass. I was trying to GPL it so it could be used by as many models as possible, since my theory is to use a 'Sir Francis Galton' market-based approach to solving 'AI Safety'. This method would only hold for a pre-trained network.

4/11
Wow, incredible innovation from Sakana AI! Leveraging LLMs to pioneer AI-driven research is groundbreaking. Excited to see how this accelerates scientific discovery responsibly and ethically.

5/11
Thank you @SakanaAILabs for highlighting our groundbreaking work on DiscoPOP! 🎉 Thrilled to see our collaborative efforts with @chris_lu, Claudio Fanconi, @AlexJChan, @j_foerst, @MihaelaVDS and @RobertTLange recognized.

Our joint paper introduces the innovative LLM-driven objective discovery method and reveals DiscoPOP, a state-of-the-art preference optimization algorithm.

#RLHF #LLMFineTuning #OfflinePreferenceOptimization #LLMs

6/11
Hey @ericschmidt, you said once AIs begin to recursively self-improve, we should unplug them.

So it's time, right?

7/11
That's impressive, using LLMs to advance AI research sounds promising. This looks like a step in the right direction. Well done, Sakana.

8/11
Automating AI research is when we truly start to lose control over how this technology progresses. Time to pause, before it's too late.

9/11
Incredible innovation! LLMs discovering new optimization algorithms is next-level.

10/11
So, can they?

11/11
Congratulations @SakanaAILabs for this stimulating project ! Maybe next iteration includes automated curiosity-driven exploration and discovery with and for LLMs ? 😀 cc @hardmaru


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GMnw5YbasAAABM3.jpg


A.I Generated explanation:


**Title:** Discovering Preference Optimization Algorithms with and for Large Language Models

**Summary:** The authors propose a new method to automatically discover new algorithms for optimizing large language models (LLMs) to align with human preferences. They use a large language model (LLM) to generate new objective functions, which are then evaluated and refined through an iterative process. The best-performing objective function, called DiscoPOP, is discovered and shown to outperform existing state-of-the-art algorithms.

**Background:** Large language models (LLMs) are trained on large datasets and can generate text that is not aligned with human values. To address this, preference optimization algorithms are used to fine-tune LLMs to generate text that is more aligned with human preferences. However, existing algorithms are limited by human creativity and expertise.

**Method:** The authors propose a meta-optimization approach, where an LLM is used to generate new objective functions, which are then evaluated and refined through an iterative process. The LLM is prompted to generate new objective functions, which are then tested on a downstream task. The performance of each objective function is fed back to the LLM, which refines its proposals based on the feedback.
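A rough Python sketch of that evolutionary discovery loop (the `llm_propose_objective` and `train_and_evaluate` functions are placeholders, not Sakana's code):

```python
import random

def llm_propose_objective(history):
    """Placeholder: prompt an LLM with past (objective, score) pairs and get
    back code for a new preference-optimization loss."""
    return f"candidate_loss_{len(history)}"

def train_and_evaluate(objective):
    """Placeholder: fine-tune a model with the objective and score it
    (e.g. held-out preference accuracy or an MT-Bench-style evaluation)."""
    return random.random()

history = []
for generation in range(5):
    objective = llm_propose_objective(history)
    score = train_and_evaluate(objective)
    history.append((objective, score))   # fed back to the LLM next generation

best = max(history, key=lambda pair: pair[1])
print("best discovered objective:", best)
```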

**Results:** The authors discover several new objective functions, including DiscoPOP, which outperforms existing state-of-the-art algorithms on multiple tasks. DiscoPOP is a novel algorithm that adaptively blends logistic and exponential losses.

**Contributions:**

1. The authors propose a new method for discovering preference optimization algorithms using LLMs.
2. They discover multiple high-performing objective functions, including DiscoPOP.
3. DiscoPOP is shown to outperform existing state-of-the-art algorithms on multiple tasks.

**Implications:** This work has implications for the development of more advanced language models that can generate text that is more aligned with human values. The proposed method can be used to discover new algorithms for other applications beyond language models.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,787
Reputation
7,926
Daps
148,622

1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

🌟Few pointers from the paper

🎯Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes.

🎯 Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusion scenarios.

🎯In this paper, the authors have introduced “NeRF On-the-go”, a simple yet effective approach that enables the robust synthesis of novel views in complex, in-the-wild scenes from only casually captured image sequences.

🎯 Delving into uncertainty, their method not only efficiently eliminates distractors, even when they are predominant in captures, but also achieves a notably faster convergence speed.

🎯Through comprehensive experiments on various scenes, their method demonstrates a significant improvement over state-of-the-art techniques. This advancement opens new avenues for NeRF in diverse and dynamic real-world applications.
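One common way to use predicted uncertainty against distractors is to down-weight each ray's photometric loss by a predicted variance, with a log-variance penalty so the model cannot simply mark everything as uncertain. The PyTorch sketch below shows that generic heteroscedastic formulation; it illustrates the idea, not the paper's exact objective.

```python
import torch

def uncertainty_weighted_loss(pred_rgb, gt_rgb, log_var):
    """Per-ray color loss attenuated by predicted uncertainty.

    pred_rgb, gt_rgb: (N, 3) rendered and ground-truth colors
    log_var:          (N,)   predicted log-variance per ray (high = likely distractor)
    """
    sq_err = ((pred_rgb - gt_rgb) ** 2).sum(dim=-1)
    # Uncertain rays (e.g. a pedestrian crossing the shot) contribute less;
    # the +log_var term keeps the uncertainty from growing without bound.
    return (torch.exp(-log_var) * sq_err + log_var).mean()

pred = torch.rand(1024, 3)
gt = torch.rand(1024, 3)
log_var = torch.zeros(1024, requires_grad=True)
print(uncertainty_weighted_loss(pred, gt, log_var))
```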

🏢Organization: @ETH_en , @Microsoft , @MPI_IS

🧙Paper Authors: Weining Ren, @zhuzihan2000 , Boyang Sun, Jiaqi Chen, @mapo1 , @songyoupeng

1️⃣Read the Full Paper here: [2405.18715] NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

2️⃣Project Page: NeRF On-the-go

3️⃣Code: GitHub - cvg/nerf-on-the-go: [CVPR'24] NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Breakz Studios from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #nerf


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

CVPR 2024 Paper Alert

A new research paper has been published, and it's making waves in the field of computer vision!

Paper Title: NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

Here are some key points from the paper:

The Problem: Neural Radiance Fields (NeRFs) are great at creating realistic images from multiple photos of a static scene. However, they struggle when dealing with dynamic scenes that have moving objects, shadows, and changing lighting.

The Limitation: Current methods can handle controlled environments with few obstacles, but they don't work well in real-world scenarios with many obstacles.

The Solution: The authors of this paper have developed a new approach called "NeRF On-the-go". This method can create realistic images from casual photo sequences taken in complex, real-world scenes. It's able to remove distractions (like moving objects) and works faster than previous methods.

The Benefits: This new approach has been tested on various scenes and has shown significant improvement over existing techniques. This breakthrough opens up new possibilities for NeRF in real-world applications.

The Team: The research was conducted by a team from ETH Zurich, Microsoft, and the Max Planck Institute for Intelligent Systems.

The Authors: Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng.

Want to Learn More?

1️⃣ Read the full paper here: [2405.18715] NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

2️⃣ Check out the project page: NeRF On-the-go

3️⃣ Get the code: GitHub - cvg/nerf-on-the-go: [CVPR'24] NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
 