bnew

Veteran
Joined
Nov 1, 2015
Messages
51,804
Reputation
7,926
Daps
148,687

1/6
Math Arena Leaderboard View

2/6
In a multi-turn setting, I saw Sonnet outshine all the others by a wide margin. The example was ppm dosages and concentrates with a desired mix ratio.

3/6
Now Sonnet has earned its position

4/6
Seems like @sama is sleeping

5/6
At least from my experience: whatever math Sonnet is being used for, it certainly ain't linear algebra

6/6
flying fish contest


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196





1/10
[Chatbot Arena Update]

We are excited to launch Math Arena and Instruction-Following (IF) Arena!

Math/IF are the two key domains testing models’ logical skills & real-world tasks. Key findings:

- Stats: 500K IF votes (35%), 180K Math votes (13%)
- Claude 3.5 Sonnet is now #1 in Math Arena, and joint #1 in IF.
- DeepSeek-coder #1 open model
- Early GPT-4s improved significantly over Llama-3 & Gemma-2

More analysis below👇
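
(For context on how pairwise votes turn into a leaderboard: LMSYS fits a Bradley-Terry model over all battles; the simpler Elo-style update below is only a minimal sketch of the same idea, with invented vote data.)

```python
# Sketch: turning pairwise arena votes into ratings with a simple Elo update.
# LMSYS actually fits a Bradley-Terry model; the battles below are invented.
from collections import defaultdict

K = 32  # Elo update step size

def expected_score(r_a, r_b):
    # Probability that A beats B under the Elo model, given current ratings.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def rate(battles, base=1000.0):
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        s_a = 1.0 if winner == model_a else 0.5 if winner == "tie" else 0.0
        ratings[model_a] += K * (s_a - e_a)
        ratings[model_b] += K * ((1.0 - s_a) - (1.0 - e_a))
    return dict(ratings)

battles = [  # invented votes, just to exercise the code
    ("claude-3.5-sonnet", "gpt-4o", "claude-3.5-sonnet"),
    ("gpt-4o", "gemma-2-27b", "gpt-4o"),
    ("claude-3.5-sonnet", "gemma-2-27b", "tie"),
]
for model, r in sorted(rate(battles).items(), key=lambda kv: -kv[1]):
    print(f"{model:20s} {r:7.1f}")
```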

2/10
Instruction-Following Arena

- Claude-3.5/GPT-4o joint #1 (within confidence intervals)
- Gemma-2-27B #1 Open Model
- Early GPT-4/Claudes all UP

3/10
Math Arena (Pt 2)

Ranking shifts quite a lot:
- Mixtral-8x22b UP
- Gemma2-9b, Llama3-8b, Command-r drop
- Phi-3 series UP

4/10
Let us know your thoughts! Credits to builders @LiTianleli @infwinston

Links:
- Full data at http://leaderboard.lmsys.org
- Random samples at Arena Example - a Hugging Face Space by lmsys

5/10
This chart somehow correlates very well with my own experience. Also GPT-4-0314 being above GPT-4-0613 feels vindicating

6/10
Question: For the Math Arena, how is this data being analyzed? Is the metric calculated over total responses, that is, across all types of math? If so, this could potentially introduce error, since some models might do better on specific types.

7/10
I wonder if it would work if we could build model merging / breeding into the arena to see if we could kick-start the evolutionary process?

Why not?

8/10
I love it when new arenas emerge! Math and IF are crucial domains to test logical skills and real-world tasks. Congrats to Claude 3.5 Sonnet and DeepSeek-coder on their top spots!

9/10
"While others are tired of losing money, we are tired of taking profits 🔥

Same Market, Different strategies 💯"

If you aren’t following @MrAlexBull you should be. There aren’t many who have a better understanding of the current market.

10/10
I truly believe the previous dip was to scare retail into selling, allowing institutions to buy cheaper. They were late to the game and needed a better entry.

Following @MrAlexBull's tweets, posts, tips, and predictions, I have added massively to my holdings.







1/6
Multi-turn conversations with LLMs are crucial for many applications today.

We’re excited to introduce a new category, "Multi-Turn," which includes conversations with >=2 turns to measure models' abilities to handle longer interactions.

Key findings:
- 14% of Arena votes are multi-turn
- Claude models' scores increased significantly. Claude 3.5 Sonnet becomes joint #1 with GPT-4o.
- Gemma-2-27B and Llama-3-70B are the best open models, now joint #10.

Let us know your thoughts!
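
(A minimal sketch of how a multi-turn category can be carved out of vote logs, assuming a simple record format rather than LMSYS's actual schema.)

```python
# Sketch: flag multi-turn conversations (>= 2 user turns) and report their
# share of the votes. The record format here is an assumption for illustration.
conversations = [
    {"votes": 1, "messages": [("user", "hi"), ("assistant", "hello")]},
    {"votes": 1, "messages": [("user", "q1"), ("assistant", "a1"),
                              ("user", "q2"), ("assistant", "a2")]},
]

def is_multi_turn(conv):
    user_turns = sum(1 for role, _ in conv["messages"] if role == "user")
    return user_turns >= 2

multi = sum(c["votes"] for c in conversations if is_multi_turn(c))
total = sum(c["votes"] for c in conversations)
print(f"multi-turn share of votes: {multi / total:.0%}")
```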

2/6
We have also collected more votes for Gemma-2-27B (now 5K+) over the past few days. Gemma-2 stays robust against Llama-3-70B and is now the new best open model!

3/6
Count of Conversation Turns

4/6
Check out full rankings at http://leaderboard.lmsys.org

5/6
Could you clean up those GPT-4/Claude/Gemini variants? Does it make sense if GPT-4o has 100 sub-versions?

6/6
Sad to see 9b falling off, it was probably too good to be true.


 






1/11
The cost of AI is nearing $0 per million tokens. If you think this will not profoundly affect the employment landscape, you will be disappointed.

2/11
Beyond token costs, many problems related to deployment in a corporate organization still have to be sorted out. Hallucination is still a problem, as are data privacy and the need for cheaper domain fine-tuning processes.

3/11
Hallucination is only a problem if you don't control the output structure. If you strictly stick to a specific JSON format or a regex, then a hallucination becomes a classification error which is measurable and manageable.
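
(A minimal sketch of that idea in Python: constrain outputs to a fixed JSON contract, then count contract violations instead of chasing free-form hallucinations. The labels, schema, and sample outputs below are assumptions for illustration, not tied to any particular model API.)

```python
# Sketch: once output is constrained to a fixed structure, "hallucination"
# reduces to a countable classification/validation error rate.
import json

ALLOWED_LABELS = {"refund", "cancellation", "complaint", "other"}
# Expected contract: {"label": <one of ALLOWED_LABELS>, "confidence": 0.0-1.0}

def validate(raw_output: str):
    """Return the parsed record, or None if the output violates the contract."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if record.get("label") not in ALLOWED_LABELS:
        return None
    conf = record.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        return None
    return record

outputs = [
    '{"label": "refund", "confidence": 0.93}',        # valid
    'Sure! The customer seems unhappy about...',       # free-form drift
    '{"label": "reimbursement", "confidence": 0.8}',   # out-of-schema label
]
errors = sum(validate(o) is None for o in outputs)
print(f"structured-output error rate: {errors}/{len(outputs)}")
```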

4/11
Novice question here. But even without any AGI/ASI I’m assuming you mean that a lot of “basic” tasks will become automated as a result of this?

5/11
Yes, it already is. Writing code from scratch is gone. This was the most exciting part of the profession. Now what remains is debugging/fixing. The same goes for graphic design, writing music, and literature. You never start from scratch anymore. Which is both exciting and terrifying.

6/11
What is the current cost?

7/11
$0.10 per 1M tokens.
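
(A rough back-of-the-envelope at that price; the token counts per page are assumptions.)

```python
# Back-of-the-envelope cost at $0.10 per 1M tokens (figures are assumptions).
price_per_million = 0.10
tokens_per_page = 500            # roughly a page of English text
pages = 1_000_000                # a million pages

total_tokens = tokens_per_page * pages
cost = total_tokens / 1_000_000 * price_per_million
print(f"processing {pages:,} pages costs about ${cost:,.2f}")  # about $50.00
```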

8/11
It will impact the employment landscape, but it will reshape the landscape rather than destroy it.

The opportunities for industrious folks that are paying attention are immense!

9/11
I hope you are right.

10/11
If you think it WILL affect the employment landscape, will you also be disappointed?

11/11
I'm both excited and disappointed. Excited because we have such a powerful tool at our disposal. Disappointed that most likely we will not be training this tool anymore. So for at least a couple of years, we will write prompts until we see where LLMs are too weak and where a different approach to AI might be needed.




1/6
In this paper from DeepMind, the authors show that many-shot in-context learning (ICL, when you put training examples in the prompt during inference) significantly outperforms few-shot learning across a wide variety of tasks, including translation, summarization, planning, reward modeling, mathematical problem solving, question-answering, algorithmic reasoning, and sentiment analysis.

Furthermore, many-shot provides comparable results to supervised finetuning (SFT, when you finetune the base model on task-specific data before serving it).

Performance on some tasks (e.g. MATH) can degrade with too many examples, suggesting an optimal number of shots exists.

While SFT is computationally expensive in terms of training, many-shot ICL does not require any training. However, many-shot ICL has a larger inference cost, which can be substantially reduced with KV caching.

The authors suggest that many-shot ICL could make task-specific fine-tuning less essential or, in some cases, even unnecessary. This could potentially allow large language models to tackle a wider range of tasks without specialization. https://arxiv.org/pdf/2404.11018
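
(Mechanically, many-shot ICL just packs far more labeled examples into the prompt than the usual handful. Below is a minimal sketch of the prompt construction; the sentiment task, examples, and formatting are assumptions, and the paper scales the number of shots to hundreds or thousands inside newly enlarged context windows.)

```python
# Sketch: building a many-shot prompt. The paper's point is that scaling
# `shots` from a few to hundreds/thousands often rivals supervised
# fine-tuning, at the price of a longer (but KV-cacheable) prompt.
def build_many_shot_prompt(examples, query, shots):
    lines = []
    for text, label in examples[:shots]:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("Great battery life and a sharp screen.", "positive"),
    ("Stopped working after two days.", "negative"),
    # ...in the many-shot regime this list would hold hundreds of pairs...
]
prompt = build_many_shot_prompt(examples, "The hinge feels flimsy.", shots=len(examples))
print(prompt)
```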

2/6
What’s the difference?

3/6
With what?

4/6
Yes, large-context-window LLMs will make fine-tuning obsolete before they make RAG obsolete.

5/6
Yes, but how expensive is this after filling up that context?

6/6
Thanks for sharing @burkov




[Submitted on 17 Apr 2024 (v1), last revised 22 May 2024 (this version, v2)]

Many-Shot In-Context Learning​

Rishabh Agarwal et al.
Abstract:Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to fine-tuning. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
Subjects:Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2404.11018
arXiv:2404.11018v2
[2404.11018] Many-Shot In-Context Learning


https://arxiv.org/pdf/2404.11018
 


1/1
🚨Paper Alert 🚨

➡️Paper Title: Magic Insert: Style-Aware Drag-and-Drop

🌟Few pointers from the paper

🎯In this paper, the authors present “Magic Insert”, a method for dragging and dropping subjects from a user-provided image into a target image of a different style, in a physically plausible manner, while matching the style of the target image.

🎯 This work formalizes the problem of style-aware drag-and-drop and presents a method for tackling it by addressing two sub-problems: style-aware personalization and realistic object insertion in stylized images.

🎯For “style-aware personalization”, their method first fine-tunes a pretrained text-to-image diffusion model using LoRA and learned text tokens on the subject image, and then infuses it with a CLIP representation of the target style.

🎯For object insertion, they used “Bootstrapped Domain Adaptation” to adapt a domain-specific photorealistic object insertion model to the domain of diverse artistic styles. Overall, the method significantly outperforms traditional approaches such as inpainting.

🎯Finally, authors have also presented a dataset, “SubjectPlop”, to facilitate evaluation and future progress in this area.

🏢Organization: @Google

🧙Paper Authors: @natanielruizg , Yuanzhen Li, @neal_wadhwa , @Yxp52492 , Michael Rubinstein, David E. Jacobs, @shlomifruchter

1️⃣Read the Full Paper here: [2407.02489] Magic Insert: Style-Aware Drag-and-Drop

2️⃣Project Page: Magic Insert: Style-Aware Drag-and-Drop

3️⃣Demo:Magic Insert Interactive Demo

🎥 Be sure to watch the attached Demo Video -Sound on 🔊🔊

🎵 Music by Mark from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** Magic Insert: Style-Aware Drag-and-Drop

**Summary:** Imagine you want to take a picture of a cat from one image and drop it into another image with a completely different style, like a painting or a cartoon. This paper presents a new method called "Magic Insert" that makes this possible.

**What's the problem?:** The problem is that when you try to drop an object from one image into another, it usually looks out of place because it doesn't match the style of the new image. For example, if you take a photo of a cat and drop it into a painting, the cat will look like a photo, not a painting.

**How does Magic Insert work?:** The method works by solving two main problems:

1. **Style-aware personalization:** The algorithm takes the object you want to drop (the cat) and makes it match the style of the new image. It does this by using a special kind of AI model that can understand the style of the new image and change the object to fit in.
2. **Realistic object insertion:** The algorithm then takes the styled object and inserts it into the new image in a way that looks realistic. It does this by using another AI model that can adapt to different styles and make the object look like it belongs in the new image.

**Results:** The Magic Insert method is much better than traditional methods at making the dropped object look like it belongs in the new image.

**Additional resources:**

* **Paper:** You can read the full paper here.
* **Project page:** You can learn more about the project here.
* **Demo:** You can try out the Magic Insert demo here.

**Authors and organization:** The paper was written by a team of researchers from Google, including Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa, Yxp52492, Michael Rubinstein, David E. Jacobs, and Shlomi Fruchter.

 


1/1
🚨Paper Alert 🚨

➡️Paper Title: TAPVid-3D: A Benchmark for Tracking Any Point in 3D

🌟Few pointers from the paper

🎯In this paper, the authors introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three-dimensional point tracking has none.

🎯To this end, leveraging existing footage, they built a new benchmark for 3D point tracking featuring 4,000+ real-world videos, composed of three different data sources spanning a variety of object types, motion patterns, and indoor and outdoor environments.

🎯 To measure performance on the TAP-3D task, the authors formulated a collection of metrics that extend the Jaccard-based metric used in TAP to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness.

🎯They manually verified a large sample of trajectories to ensure correct video annotations, and assessed the current state of the TAP-3D task by constructing competitive baselines using existing tracking models.

🎯They anticipate that this benchmark will serve as a guidepost for improving our ability to understand precise 3D motion and surface deformation from monocular video.

🏢Organization: @GoogleDeepMind , @ucl ,@UniofOxford

🧙Paper Authors: @skandakoppula , Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, @CarlDoersch

1️⃣Read the Full Paper here: [2407.05921] TAPVid-3D: A Benchmark for Tracking Any Point in 3D

2️⃣Project Page: TAPVid-3D: A Benchmark for Tracking Any Point in 3D

3️⃣Data & Code: tapnet/tapnet/tapvid3d at main · google-deepmind/tapnet

🎥 Be sure to watch the attached Technical Summary -Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** TAPVid-3D: A Benchmark for Tracking Any Point in 3D

**Summary:** Imagine you're watching a video and you want to track a specific point on an object, like a car or a person, as it moves around in 3D space. This paper introduces a new benchmark, called TAPVid-3D, to help computers do this task better.

**What's the problem?:** Currently, there are many benchmarks that test how well computers can track points on objects in 2D videos, but there aren't any benchmarks that test this ability in 3D videos. This is a problem because 3D tracking is much harder and more important for applications like self-driving cars or augmented reality.

**What is TAPVid-3D?:** TAPVid-3D is a new benchmark that tests how well computers can track points on objects in 3D videos. It's a collection of over 4,000 real-world videos that show different objects moving around in different environments. The benchmark also includes a set of metrics that measure how well a computer can track these points.

**How was TAPVid-3D created?:** The authors of the paper created TAPVid-3D by combining existing footage from different sources and manually verifying the accuracy of the tracking data. They also used existing tracking models to create competitive baselines for the benchmark.

**Why is TAPVid-3D important?:** TAPVid-3D will help computers improve their ability to understand and track 3D motion and surface deformation from monocular video. This will have many applications in fields like computer vision, robotics, and augmented reality.

**Additional resources:**

* **Paper:** You can read the full paper here.
* **Project page:** You can learn more about the project here.
* **Data & Code:** You can access the data and code for TAPVid-3D here.

**Authors and organization:** The paper was written by a team of researchers from Google DeepMind, University College London, and the University of Oxford, including Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, and Carl Doersch.
 


1/1
🚨ECCV 2024 Paper Alert 🚨

➡️Paper Title: Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

🌟Few pointers from the paper

🎯Egocentric gaze anticipation serves as a key building block for the emerging capability of Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals during daily activities. Motivated by this observation, the authors of this paper introduce the first model that leverages both the video and audio modalities for egocentric gaze anticipation.

🎯 Specifically, they have proposed a Contrastive Spatial-Temporal Separable (CSTS) fusion approach that adopts two modules to separately capture audio-visual correlations in spatial and temporal dimensions, and applies a contrastive loss on the re-weighted audio-visual features from fusion modules for representation learning.

🎯They conducted extensive ablation studies and thorough analysis using two egocentric video datasets: Ego4D and Aria, to validate their model design. They demonstrated that the audio improves the performance by +2.5% and +2.4% on the two datasets.

🎯Their model also outperforms the prior state-of-the-art methods by at least +1.9% and +1.6%. Moreover, they provide visualizations to show the gaze anticipation results and provide additional insights into audio-visual representation learning.

🏢Organization: @GeorgiaTech , GenAI, @Meta , @UofIllinois

🧙Paper Authors: @bryanislucky , @fionakryan , @Wenqi_Jia , @aptx4869ml , @RehgJim

1️⃣Read the Full Paper here: [2305.03907] Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

2️⃣Project Page: Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Maksym Dudchyk from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#ECCV2024




A.I Generated explanation:

ECCV 2024 Paper Alert

A new research paper has been published, and it's making waves in the field of Augmented Reality (AR). Here's what it's about:

Paper Title: Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

The paper is about a new way to predict where people will look in the future, using both what they see and what they hear. This is important for AR, which is a technology that overlays digital information onto the real world.

Here are some key points from the paper:

* The researchers created a new model that uses both video and audio to predict where someone will look. This is the first time anyone has done this.
* They tested their model using two big datasets of videos, and it worked really well. The audio part of the model improved the results by 2.5% and 2.4% compared to just using video.
* Their model is better than other models that have been tried before, and they showed some cool visualizations to prove it.

The researchers are from several organizations, including Georgia Tech, GenAI, Meta, and the University of Illinois.

Want to learn more?

You can read the full paper here: [2305.03907] Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

Or, you can check out the project page here: Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
 


1/1
🚨Paper Alert 🚨

➡️Paper Title: ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

🌟Few pointers from the paper

🎯Audio signals provide rich information about robot interaction and object properties through contact. This information can surprisingly ease the learning of contact-rich robot manipulation skills, especially when visual information alone is ambiguous or incomplete.

🎯However, the use of audio data in robot manipulation has been constrained to teleoperated demonstrations collected by attaching a microphone to either the robot or the object, which significantly limits its use in robot learning pipelines.

🎯In this paper, the authors introduce ManiWAV: an ‘ear-in-hand’ data collection device to collect in-the-wild human demonstrations with synchronous audio and visual feedback, and a corresponding policy interface to learn robot manipulation policies directly from the demonstrations.

🎯 They demonstrated the capabilities of their system through four contact-rich manipulation tasks that require either passively sensing the contact events and modes, or actively sensing the object surface materials and states.

🎯 In addition, they showed that their system can generalize to unseen in-the-wild environments, by learning from diverse in-the-wild human demonstrations.

🏢Organization: @Stanford , @Columbia , @ToyotaResearch

🧙Paper Authors: @Liu_Zeyi_ , @chichengcc , @eacousineau , Naveen Kuppuswamy, @Ben_Burchfiel , @SongShuran

1️⃣Read the Full Paper here: [2406.19464] ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

2️⃣Project Page: ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

3️⃣Code: GitHub - real-stanford/maniwav: Official codebase of paper "ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data"

4️⃣Dataset: Index of /maniwav/data

🎥 Be sure to watch the attached Technical Summary Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

**Summary:** This paper is about teaching robots to perform tasks that require touching and manipulating objects, like picking up a ball or opening a door. The researchers found that using audio signals, like the sound of a ball bouncing, can help the robot learn these tasks more easily.

**The Problem:** Usually, robots learn by watching humans perform tasks, but this can be limited because the visual information might not be enough. For example, if a robot is trying to pick up a ball, it might not be able to see exactly how the ball is moving, but it can hear the sound of the ball bouncing.

**The Solution:** The researchers created a special device that can collect audio and visual data from humans performing tasks, like picking up a ball. This device is like a special glove that has a microphone and a camera. They used this device to collect data from humans performing four different tasks, like picking up a ball or opening a door.

**The Results:** The researchers found that their system can learn to perform these tasks by using the audio and visual data collected from humans. They also found that their system can work in different environments and with different objects, even if it hasn't seen them before.

**The Team:** The researchers are from Stanford University, Columbia University, and Toyota Research.

**Resources:**

* **Read the Full Paper:** [2406.19464] ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
* **Project Page:** ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
* **Code:** GitHub - real-stanford/maniwav: Official codebase of paper "ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data"
* **Dataset:** Index of /maniwav/data
 


Tsinghua University Open Sources CodeGeeX4-ALL-9B: A Groundbreaking Multilingual Code Generation Model Outperforming Major Competitors and Elevating Code Assistance​

By Asif Razzaq

July 7, 2024


In a significant leap forward for the field of code generation, the Knowledge Engineering Group (KEG) and Data Mining team at Tsinghua University have unveiled their latest innovation: CodeGeeX4-ALL-9B. This model, part of the renowned CodeGeeX series, represents the pinnacle of multilingual code generation, setting a new standard for performance and efficiency in automated coding.

The CodeGeeX4-ALL-9B model is a product of extensive training on the GLM-4-9B framework, which has markedly improved its capabilities in code generation. With a parameter count of 9.4 billion, this model stands out as one of the most powerful in its class, surpassing even larger general-purpose models. It excels in inference speed and overall performance, making it a versatile tool for various software development tasks.

One of the standout features of CodeGeeX4-ALL-9B is its ability to handle various functions seamlessly. This model covers all critical aspects of software development, from code completion and generation to code interpretation and web searches. It offers repository-level code Q&A, enabling developers to interact with their codebase more intuitively and efficiently. This comprehensive functionality makes CodeGeeX4-ALL-9B an invaluable asset for developers in diverse programming environments.

CodeGeeX4-ALL-9B has demonstrated exceptional results on public benchmarks such as BigCodeBench and NaturalCodeBench. These benchmarks assess various aspects of code generation models, and CodeGeeX4-ALL-9B’s performance indicates its robustness and reliability in real-world applications. It has achieved top-tier results, outpacing many larger models and establishing itself as the leading model with fewer than 10 billion parameters.


The user-friendly design of CodeGeeX4-ALL-9B ensures that developers can quickly integrate it into their workflows. Users can easily launch and utilize the model for their projects using the specified versions of the transformers library. The model supports GPUs and CPUs, ensuring flexibility in different computational environments. This accessibility is crucial for fostering widespread adoption and maximizing the model’s impact across the software development community.
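
(A hedged sketch of what that integration might look like with the Hugging Face transformers library. The model ID `THUDM/codegeex4-all-9b`, the dtype, and the generation settings are assumptions; check the official repository for the exact supported library versions.)

```python
# Sketch of loading CodeGeeX4-ALL-9B via transformers (model ID and options
# are assumptions; consult the official repo for supported versions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/codegeex4-all-9b"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # GPU-friendly; fall back to CPU/float32 if needed
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Write a Python function that checks whether a string is a palindrome."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```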

To illustrate its practical application, the model’s inference process involves generating outputs based on user inputs. The results are decoded to provide clear and actionable code, streamlining the development process. This capability is beneficial for tasks that require precise and efficient code generation, such as developing complex algorithms or automating repetitive coding tasks.

In conclusion, the release of CodeGeeX4-ALL-9B by KEG and Data Mining at Tsinghua University marks a milestone in the evolution of code generation models. Its unparalleled performance, comprehensive functionality, and user-friendly integration will revolutionize how developers approach coding tasks, driving efficiency and innovation in software development.










CodeGeeX4: Open Multilingual Code Generation Model​


We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on GLM-4-9B, significantly enhancing its code generation capabilities. A single CodeGeeX4-ALL-9B model supports comprehensive functions such as code completion and generation, a code interpreter, web search, function calling, and repository-level code Q&A, covering various scenarios of software development. CodeGeeX4-ALL-9B has achieved highly competitive performance on public benchmarks such as BigCodeBench and NaturalCodeBench. It is currently the most powerful code generation model with fewer than 10B parameters, even surpassing much larger general-purpose models, and achieves the best balance between inference speed and model performance.

Evaluation​

| Model | Seq Length | HumanEval | MBPP | NCB | LCB | HumanEval-FIM | CRUXEval-O |
|---|---|---|---|---|---|---|---|
| Llama3-70B-instruct | 8K | 77.4 | 82.3 | 37.0 | 27.4 | - | - |
| DeepSeek Coder 33B Instruct | 16K | 81.1 | 80.4 | 39.3 | 29.3 | 78.2 | 49.9 |
| Codestral-22B | 32K | 81.1 | 78.2 | 46.0 | 35.3 | 91.6 | 51.3 |
| CodeGeeX4-All-9B | 128K | 82.3 | 75.7 | 40.4 | 28.5 | 85.0 | 47.1 |








[Submitted on 30 Mar 2023 (v1), last revised 10 Jul 2024 (this version, v2)]

CodeGeeX - A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X​

Qinkai Zheng et al.
Abstract:Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation. CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages as of June 2022. Our extensive experiments suggest that CodeGeeX outperforms multilingual code models of similar scale for both the tasks of code generation and translation on HumanEval-X. Building upon HumanEval (Python only), we develop the HumanEval-X benchmark for evaluating multilingual models by hand-writing the solutions in C++, Java, JavaScript, and Go. In addition, we build CodeGeeX-based extensions on Visual Studio Code, JetBrains, and Cloud Studio, generating 4.7 billion tokens for tens of thousands of active users per week. Our user study demonstrates that CodeGeeX can help to increase coding efficiency for 83.4% of its users. Finally, CodeGeeX is publicly accessible and in Sep. 2022, we open-sourced its code, model weights (the version of 850B tokens), API, extensions, and HumanEval-X at this https URL.
Subjects:Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as: arXiv:2303.17568
arXiv:2303.17568v2
[2303.17568] CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X


https://arxiv.org/pdf/2303.17568


1/1
Introducing open-source #CodeGeeX4-ALL-9B, the most powerful, versatile coding model under 10B parameters! The CodeGeeX4 supports code completion, annotation, translation, and advanced features. #Coding #LLMs
Github: GitHub - THUDM/CodeGeeX4: CodeGeeX4-ALL-9B, a versatile model for all AI software development scenarios, including code completion, code interpreter, web search, function calling, repository-level Q&A and much more.

Developed by #ZhipuAI, CodeGeeX has consistently evolved since its inception in Sep 2022. CodeGeeX4-ALL-9B significantly enhances code generation capabilities based on the powerful language abilities of #GLM4. This single model can cover all programming scenarios. It delivers competitive performance on authoritative code-capability benchmarks such as NaturalCodeBench and BigCodeBench, surpassing general models several times its size in both inference performance and model quality. Performance evaluation on BigCodeBench shows that CodeGeeX4-ALL-9B is the best in its class.

CodeGeeX4-ALL-9B supports a 128K context, helping the model understand and use information from longer code files or even entire project codebases, significantly improving its ability to handle complex tasks and accurately answer questions spanning different code files. CodeGeeX4-ALL-9B is the only code model that implements function calling, successfully passing over 90% of the AST and Exec test cases on the Berkeley Function Calling Leaderboard.

The latest CodeGeeX plugin v2.12.0 fully integrates the fourth-generation model. It can automatically generate README files for projects, remember and understand long text context at the project level, support cross-file analysis and Q&A in projects, and support local mode. CodeGeeX v2.12.0 also significantly improves NL2SQL capabilities. Now you can generate complex SQL queries from natural language directly in the plugin.

Experience the power of CodeGeeX4 now! Upgrade your CodeGeeX plugin in your IDE or search for "CodeGeeX" in the IDE plugin market to download it for free. You can also download CodeGeeX's fourth-generation model on GitHub and deploy a dedicated project-level intelligent programming assistant on your computer.


 


1/1
🚨Paper Alert 🚨

➡️Paper Title: Hierarchical World Models as Visual Whole-Body Humanoid Controllers

🌟Few pointers from the paper

🎯Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty.

🎯 In this work, authors have explored highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives.

🎯Specifically, authors have proposed a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards.

🎯Their approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans.

🏢Organization: @UCSanDiego , @nyuniversity , @AIatMeta

🧙Paper Authors: @ncklashansen , @jyothir_s_v , @vlad_is_ai , @ylecun , @xiaolonw , @haosu_twitr

1️⃣Read the Full Paper here: [2405.18418] Hierarchical World Models as Visual Whole-Body Humanoid Controllers

2️⃣Project Page: Puppeteer

3️⃣Code: GitHub - nicklashansen/puppeteer: Code for "Hierarchical World Models as Visual Whole-Body Humanoid Controllers"

4️⃣Models: puppeteer – Google Drive

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Yevgeniy Sorokin from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:


**Title:** Hierarchical World Models as Visual Whole-Body Humanoid Controllers

**Summary:** This paper is about creating a computer system that can control a humanoid robot (a robot that looks like a human) using only visual observations (like a camera). This is a challenging problem because the robot has many moving parts and can be unstable.

**Key Points:**

* The researchers used a type of artificial intelligence called reinforcement learning to teach the robot how to move.
* They didn't use any simplifications or assumptions to make the problem easier, which makes their approach more realistic.
* They created a hierarchical system, where one part of the system (the "high-level agent") tells another part (the "low-level agent") what to do based on what it sees.
* They tested their system on a simulated robot with 56 moving parts and were able to get it to perform well on 8 different tasks.
* The movements the robot made were also preferred by humans.

**Authors and Organizations:**

* The researchers are from the University of California, San Diego, New York University, and Meta AI.
* The authors are Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, and Hao Su.

**Resources:**

* You can read the full paper here: [2405.18418] Hierarchical World Models as Visual Whole-Body Humanoid Controllers
* You can visit the project page here: Puppeteer
* You can access the code here: GitHub - nicklashansen/puppeteer: Code for "Hierarchical World Models as Visual Whole-Body Humanoid Controllers"
* You can access the models here: puppeteer – Google Drive
 


1/1
🚨Paper Alert 🚨

➡️Paper Title: NPGA: Neural Parametric Gaussian Avatars

🌟Few pointers from the paper

🎯The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives.

🎯Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance.

⚛️In this work, authors have proposed “Neural Parametric Gaussian Avatars” (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings.

🎯They build their method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds.

🎯In contrast to previous work, they conditioned their avatars’ dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs.

🎯To this end, they distilled the backward deformation field of their underlying NPHM into forward deformations which are compatible with rasterization-based rendering.

🎯 All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of their avatars, they augmented the canonical Gaussian point cloud using per-primitive latent features which govern its dynamic behavior.

🎯 To regularize this increased dynamic expressivity, authors have proposed Laplacian terms on the latent features and predicted dynamics. They evaluated their method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by ≈ 2.6 PSNR.

🏢Organization: @TU_Muenchen , @synthesiaIO , @ucl

🧙Paper Authors: @SGiebenhain , Tobias Kirschstein, Martin Rünz, @LourdesAgapito , @MattNiessner

1️⃣Read the Full Paper here: [2405.19331] NPGA: Neural Parametric Gaussian Avatars

2️⃣Project Page: NPGA: Neural Parametric Gaussian Avatars

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Sergio Prosvirini from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#3dgaussiansplatting




A.I Generated explanation:

**Title:** NPGA: Neural Parametric Gaussian Avatars

**What's it about?**

Creating digital versions of human heads that look super realistic is important for integrating virtual components into our daily lives. However, making these digital heads, called avatars, is a tough problem because they need to look very realistic and be able to move smoothly in real-time.

**What did the researchers do?**

The researchers came up with a new way to create these avatars using a technique called "Neural Parametric Gaussian Avatars" (NPGA). They used videos taken from multiple angles to create these avatars, which can be controlled and moved around.

**How did they do it?**

They used a combination of two techniques: 3D Gaussian splatting (which is fast and flexible) and neural parametric head models (which can capture a wide range of facial expressions). They also added some extra details to the avatars to make them look more realistic.

**What's the result?**

The researchers tested their method on a public dataset and found that their avatars looked much better than previous ones. They also made sure that the avatars could move smoothly and naturally.

**Who did the research?**

The research was done by a team from the Technical University of Munich, Synthesia, and University College London.

**Want to learn more?**

You can read the full paper here: [2405.19331] NPGA: Neural Parametric Gaussian Avatars or check out the project page here: NPGA: Neural Parametric Gaussian Avatars. The code for the project will be available soon
 


1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

🌟Few pointers from the paper

🎯In this paper, the authors present IntrinsicAvatar, a novel approach to recovering the intrinsic properties of clothed human avatars, including geometry, albedo, material, and environment lighting, from only monocular videos.

🎯Recent advancements in human-based neural rendering have enabled high-quality geometry and appearance reconstruction of clothed humans from just monocular videos.

🎯However, these methods bake intrinsic properties such as albedo, material, and environment lighting into a single entangled neural representation.

🎯On the other hand, only a handful of works tackle the problem of estimating geometry and disentangled appearance properties of clothed humans from monocular videos. They usually achieve limited quality and disentanglement due to approximations of secondary shading effects via learned MLPs.

🎯In this work, authors have proposed to model secondary shading effects explicitly via Monte-Carlo ray tracing. They modeled the rendering process of clothed humans as a volumetric scattering process, and combined ray tracing with body articulation.

🎯Their approach can recover high-quality geometry, albedo, material, and lighting properties of clothed humans from a single monocular video, without requiring supervised pre-training using ground truth materials.

🎯 Furthermore, since they explicitly model the volumetric scattering process and ray tracing, their model naturally generalizes to novel poses, enabling animation of the reconstructed avatar in novel lighting conditions.

🏢Organization: @ETH_en , University of Tübingen, Tübingen AI Center

🧙Paper Authors: @sfwang0928 , @anticboz , Andreas Geiger (@AutoVisionGroup ), @SiyuTang3

1️⃣Read the Full Paper here: [2312.05210] IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

2️⃣Project Page: IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

3️⃣Code: GitHub - taconite/IntrinsicAvatar: [CVPR 2024] IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵Music by Riley Clent from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

**What's it about:** This paper is about creating a computer program that can take a video of a person and recreate a 3D model of that person, including their clothes, skin tone, and the lighting around them. This is a hard problem because the video only shows the person from one angle, and the program has to figure out what the person looks like from all sides.

**The problem:** Current methods can create a 3D model of a person from a video, but they have some limitations. They can't separate the person's skin tone, clothes, and lighting into individual components, and they don't work well when the person is moving or in different lighting conditions.

**The solution:** The authors of this paper have come up with a new approach that uses a technique called "ray tracing" to create a more accurate 3D model of the person. Ray tracing is a way of simulating how light behaves in the real world, which helps the program to better understand how the person looks in different lighting conditions. This approach can create a more detailed and realistic 3D model of the person, including their clothes, skin tone, and lighting.

**The benefits:** This new approach has several advantages. It can create a 3D model of the person that looks more realistic and detailed, and it can do this without needing a lot of training data. It can also animate the 3D model in different lighting conditions, which is useful for applications like video games or virtual reality.

**The team:** The paper was written by a team of researchers from ETH Zurich, the University of Tübingen, and the Tübingen AI Center.

**Resources:**

* Read the full paper here: https://arxiv.org/abs/2312.05210
* Project page: https://neuralbodies.github.io/IntrinsicAvatar/
* Code: https://github.com/taconite/IntrinsicAvatar
 


1/2
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

🌟Few pointers from the paper

🎯In this paper, the authors present “HiFi4G”, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage.

🎯Their core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation.

🎯They first proposed a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints.

🎯Then, they utilized a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating.

🎯They also presented a companion compression scheme with residual compensation for immersive experiences on various platforms.

🎯 It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame. Extensive experiments demonstrate the effectiveness of their approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead.

🏢Organization: @ShanghaiTechUni , NeuDim, @BytedanceTalk , DGene

🧙Paper Authors: Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu

1️⃣Read the Full Paper here: [2312.03461] HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

2️⃣Project Page: HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

🎥 Be sure to watch the attached Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024 #gaussiansplatting

2/2
Can’t fade that Gaussian blur




A.I Generated explanation:

**Title:** HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

**What's it about?**

This paper is about creating a new way to render high-quality human performances (like dancing or acting) from video footage. The goal is to make it look super realistic and detailed, while also making it easy to store and transmit.

**How does it work?**

The authors came up with a new approach called HiFi4G, which uses a combination of 3D Gaussian representations and non-rigid tracking to create a compact and efficient way to render human performances. Here's a simplified breakdown of the steps:

1. **Dual-graph mechanism**: They create two graphs to help initialize and refine the motion of the human performance. One graph is coarse and helps with the initial setup, while the other graph is fine-grained and enforces more detailed constraints.
2. **4D Gaussian optimization**: They use a special optimization scheme to balance the non-rigid prior (which helps with the overall motion) and the Gaussian updating (which refines the details). This helps to create a smooth and realistic performance.
3. **Compression scheme**: They also developed a companion compression scheme that reduces the amount of data needed to store the performance. This makes it possible to store and transmit the data more efficiently.

**Results?**

The authors claim that their approach achieves a significant compression rate of about 25 times, with less than 2MB of storage per frame. They also show that their approach outperforms existing methods in terms of optimization speed, rendering quality, and storage overhead.

**Who's behind it?**

The paper is a collaboration between researchers from ShanghaiTech University, NeuDim, Bytedance, and DGene. The authors are Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, and Lan Xu.

**Want to learn more?**

You can read the full paper here: [2312.03461] HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

Or check out the project page here: HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
 


1/1
🚨Paper Alert 🚨

➡️Paper Title: MotionLLM: Understanding Human Behaviors from Human Motions and Videos

🌟Few pointers from the paper

🎯This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models (LLMs).

🎯Diverging from recent LLMs designed for video-only or motion-only understanding, authors argued that understanding human behavior necessitates joint modeling from both videos and motion sequences (e.g., SMPL sequences) to capture nuanced body part dynamics and semantics effectively.

🎯 In this paper, authors have presented “MotionLLM”, a straightforward yet effective framework for human motion understanding, captioning, and reasoning.

🎯Specifically, MotionLLM adopts a unified video-motion training strategy that leverages the complementary advantages of existing coarse video-text data and fine-grained motion-text data to glean rich spatial-temporal insights.

🎯Furthermore, they collected a substantial dataset, MoVid, comprising diverse videos, motions, captions, and instructions.

🎯Additionally, the authors also propose MoVid-Bench, with careful manual annotations, for better evaluation of human behavior understanding on video and motion.

🏢Organization: @Tsinghua_Uni , School of Data Science, Shenzhen Research Institute of Big Data, @cuhksz ,@IDEACVR , @hkust

🧙Paper Authors: @Evan_THU , @ShunlinL , Ailing Zeng, Hao Zhang, @wabyking , @RamonCheung , @leizhangcs

1️⃣Read the Full Paper here: [2405.20340] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

2️⃣Project Page: MotionLLM: Understanding Human Behaviors from Human Motions and Videos

3️⃣Code: GitHub - IDEA-Research/MotionLLM: [Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

4️⃣Demo: http://demo.humotionx.com/

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Oleksii Holubiev from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** MotionLLM: Understanding Human Behaviors from Human Motions and Videos

**Summary:** This paper is about a new way to understand human behavior by analyzing both videos and motion data (like 3D animations). The researchers created a system called MotionLLM that can look at videos and motion data together to understand what people are doing and why.

**Key Points:**

* Most systems only look at videos or motion data separately, but this system combines both to get a better understanding of human behavior.
* The system is called MotionLLM and it's a simple but effective way to understand human motion and behavior.
* The researchers collected a large dataset of videos, motion data, and captions to train their system.
* They also created a special benchmark to test how well their system can understand human behavior.
* The system can be used to analyze videos and motion data to understand what people are doing and why.

**Authors:** The paper was written by a team of researchers from several universities and institutions.

**Resources:**

* You can read the full paper here: [2405.20340] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
* You can visit the project page here: MotionLLM: Understanding Human Behaviors from Human Motions and Videos
* You can find the code on GitHub here: GitHub - IDEA-Research/MotionLLM: [Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
* You can see a demo of the system here: http://demo.humotionx.com/
 


1/1
🎵 Product Update 🎵

Imagine stepping into an office where humans and robots collaborate seamlessly.

The hum of machinery harmonizes with the click of keyboards, creating a symphony of productivity.

Watch a team of EVEs from @1x_tech work side by side with their human counterparts, transforming cluttered spaces into pristine oases.

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


 