bnew

Veteran
Joined
Nov 1, 2015
Messages
56,112
Reputation
8,239
Daps
157,805


Mistral unveils new AI models and chat features​

Kyle Wiggers
8:03 AM PST · November 18, 2024

French AI startup Mistral has released a slew of updates to its product portfolio as it looks to stay competitive in the cutthroat AI space.

Mistral’s Le Chat chatbot platform can now search the web — with citations in line, a la OpenAI’s ChatGPT. It’s also gained a “canvas” tool along the lines of ChatGPT Canvas, allowing users to modify, transform, or edit content, like webpage mockups and data visualizations, leveraging Mistral’s AI models.

“You can use [the canvas feature] to create documents, presentations, code, mockups… the list goes on,” Mistral writes in a blog post. “You’re able to modify its contents in place without regenerating responses, version your drafts, and preview your designs.”



In addition to all this, Le Chat can now process large PDF documents and images for analysis and summarization, including files containing graphs and equations. As of today, the platform incorporates Black Forest Labs’ Flux Pro model for image generation. And Le Chat can now host shareable automated workflows for tasks like scanning expense reports and processing invoices; Mistral calls these AI “agents.”

Some of Le Chat’s new capabilities, all of which will remain free while in beta, are made possible by Mistral’s new models.


One, Pixtral Large, can process both text and images — it’s the second in Mistral’s Pixtral family of models. Weighing in at 124 billion parameters, Pixtral Large matches or bests leading models including Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s GPT-4o on certain multimodal benchmarks. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.)

“Particularly, Pixtral Large is able to understand documents, charts, and natural images,” Mistral wrote in a second blog post. “The model demonstrates frontier-level image understanding.”


Mistral also today unveiled a new version of Mistral Large, its flagship line of text-only models. Called Mistral Large 24.11, the new model brings “notable improvements” in long context understanding, Mistral says, making it well-suited for use cases like document analysis and task automation.

Both Pixtral Large and Mistral Large 24.11 can be used outside of Le Chat under two licenses: a more restrictive license for research and an enterprise license for development and commercialization. Mistral Large 24.11 is already in Mistral’s API and on AI platform Hugging Face, and will soon be available through cloud platforms including Google Cloud and Microsoft Azure, Mistral says.

Paris-based Mistral, which recently raised $640 million in venture capital, continues to gradually expand its AI offerings. Over the past few months, the company has launched a free service for developers to test its models, an SDK to let customers fine-tune those models, and new models, including a generative model for code called Codestral.

Co-founded by alumni of Meta and DeepMind, Mistral has a stated mission of creating highly competitive models and services around those models, and ideally making money in the process. While the “making money” bit is proving challenging (as it is for most generative AI startups), Mistral reportedly began to generate revenue this summer.

“At Mistral, our approach to AI is different — we’re not chasing artificial general intelligence at all costs; our mission is to instead place frontier AI in your hands, so you get to decide what to do with advanced AI capabilities,” the company wrote in one of its blogs today. “This approach has allowed us to be quite frugal with our capital, while consistently delivering frontier capabilities at affordable price points.”
 

bnew














Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions​


  • Text Generation
  • Text2Text Generation
  • Reinforcement Learning

Published 11/22/2024 by Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang


Overview​


  • New AI model called Marco-o1 focused on open-ended reasoning tasks
  • Uses Monte Carlo Tree Search (MCTS) to explore multiple solution paths
  • Implements flexible reasoning strategies to handle complex problems
  • Achieves improved performance on reasoning-intensive tasks
  • Designed to generate diverse solutions rather than single answers



Plain English Explanation​


Marco-o1 is a fresh approach to making AI systems that can think through problems more like humans do. Instead of rushing to a single answer, it explores multiple possible solutions using a method called Monte Carlo Tree Search - think of it like a chess player considering different possible moves before deciding.

The system works like a detective following different leads. When given a problem, it doesn't just pick the first solution that seems right. Instead, it maps out various possible approaches and evaluates which ones might work best. This is particularly useful for questions that don't have one clear answer.

Large language models often struggle with complex reasoning tasks, but Marco-o1 breaks down these challenges into smaller, manageable steps. It's similar to how a student might solve a difficult math problem by working through it piece by piece.


Key Findings​


  • Marco-o1 showed significant improvement in handling open-ended reasoning tasks
  • The system successfully generated multiple valid solutions for complex problems
  • Performance exceeded baseline models on reasoning-intensive benchmarks
  • MCTS implementation proved effective for exploring solution spaces
  • The model demonstrated ability to adapt its reasoning strategy based on problem type


Technical Explanation​


The reasoning model employs a sophisticated architecture combining MCTS with strategic reasoning components. The MCTS implementation explores potential solution paths while maintaining a balance between exploring new possibilities and exploiting known successful approaches.

The system incorporates a flexible action strategy that adapts to different problem types. This includes techniques for breaking down complex problems, generating intermediate steps, and validating potential solutions.

Reinforcement learning plays a key role in optimizing the model's decision-making process, helping it learn which reasoning strategies work best for different types of problems.
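The exploration-exploitation balance described above is typically implemented with the UCT (Upper Confidence Bound for Trees) selection rule. A minimal sketch in Python; the exploration constant and node statistics below are illustrative assumptions, since the paper does not publish these internals:

```python
import math

def uct_score(total_value, visits, parent_visits, c=1.41):
    """UCT: prefer children with high average reward (exploitation)
    plus a bonus that shrinks as a child is visited more (exploration)."""
    if visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

# A well-explored strong child vs. a barely-explored weaker one:
# the weaker child still wins selection here via its exploration bonus.
strong = uct_score(60, 80, 100)  # average reward 0.75, small bonus
rare = uct_score(2, 5, 100)      # average reward 0.40, large bonus
```

At each selection step the search descends to the child with the highest UCT score, which is how "exploring new possibilities" and "exploiting known successful approaches" are traded off in practice.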


Critical Analysis​


The research has several limitations worth noting. The model's performance on extremely complex reasoning tasks still shows room for improvement. Additionally, the computational resources required for MCTS could limit practical applications.

Some questions remain about the scalability of the approach and its applicability to real-world scenarios. The research could benefit from more extensive testing across diverse problem domains.

https://arxiv.org/abs/2411.14405

Computer Science > Computation and Language​

[Submitted on 21 Nov 2024]

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions​

Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang
Currently OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding -- which are well-suited for reinforcement learning (RL) -- but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies -- optimized for complex real-world problem-solving tasks.
Subjects:Computation and Language (cs.CL)
Cite as:arXiv:2411.14405 [cs.CL]
(or arXiv:2411.14405v1 [cs.CL] for this version)
[2411.14405] Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions


Submission history​


From: Huifeng Yin [view email]

[v1] Thu, 21 Nov 2024 18:37:33 UTC (5,397 KB)


Planning capabilities in the model, while improved, still fall short of human-level reasoning in certain contexts.


Conclusion​


Marco-o1 represents a significant step forward in AI reasoning capabilities. By embracing open-ended problem solving and multiple solution paths, it moves closer to human-like reasoning abilities. The findings suggest promising directions for future development of AI systems that can handle complex reasoning tasks more effectively.

The research opens new possibilities for applications in fields requiring sophisticated problem-solving capabilities, though practical implementation challenges remain to be addressed.







 

bnew


logo.png


⭐ MarcoPolo Team ⭐

AI Business, Alibaba International Digital Commerce

Github
🤗 Hugging Face 📝 Paper 🧑‍💻 Model 🗂️ Data 📽️ Demo
🎯 Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding—which are well-suited for reinforcement learning (RL)—but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?"

Currently, Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and _innovative reasoning strategies_—optimized for complex real-world problem-solving tasks.

⚠️ Limitations: We would like to emphasize that this research work is inspired by OpenAI's o1 (from which the name is also derived). This work aims to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models. In addition, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics and its performance still falls short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.

img.png

Currently, our work is distinguished by the following highlights:

  • 🍀 Fine-Tuning with CoT Data: We develop Marco-o1-CoT by performing full-parameter fine-tuning on the base model using an open-source CoT dataset combined with our self-developed synthetic data.
  • 🍀 Solution Space Expansion via MCTS: We integrate LLMs with MCTS (Marco-o1-MCTS), using the model's output confidence to guide the search and expand the solution space.
  • 🍀 Reasoning Action Strategy: We implement novel reasoning action strategies and a reflection mechanism (Marco-o1-MCTS Mini-Step), including exploring different action granularities within the MCTS framework and prompting the model to self-reflect, thereby significantly enhancing the model's ability to solve complex problems.
  • 🍀 Application in Translation Tasks: We are the first to apply Large Reasoning Models (LRM) to the machine translation task, exploring inference-time scaling laws in the multilingual and translation domain.

OpenAI recently introduced the groundbreaking o1 model, renowned for its exceptional reasoning capabilities. This model has demonstrated outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges.

🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS, and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, Marco-o1 CoT dataset, and Marco-o1 Instruction dataset, Marco-o1 improved its handling of complex tasks. MCTS allows exploration of multiple reasoning paths using confidence scores derived from softmax-applied log probabilities of the top-k alternative tokens, guiding the model to optimal solutions. Moreover, our reasoning action strategy involves varying the granularity of actions within steps and mini-steps to optimize search efficiency and accuracy.
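A rough sketch of that confidence scoring, assuming access to the log probabilities of each generated token and its top-k alternatives; the function names and numbers below are illustrative, not taken from the released code:

```python
import math

def token_confidence(chosen_logprob, topk_logprobs):
    """Softmax over the top-k alternative log-probs, then take the
    chosen token's share: a value in (0, 1]."""
    denom = sum(math.exp(lp) for lp in topk_logprobs)
    return math.exp(chosen_logprob) / denom

def path_reward(token_records):
    """Reward for a candidate reasoning path: the average confidence
    across its tokens, used to guide the tree search toward paths the
    model itself finds likely."""
    confs = [token_confidence(chosen, topk) for chosen, topk in token_records]
    return sum(confs) / len(confs)

# Two hypothetical tokens; each chosen log-prob is included in its top-k list.
records = [(-0.1, [-0.1, -2.0, -3.0]),
           (-0.5, [-0.5, -1.0, -4.0])]
```

Because each token's confidence is a probability share, the averaged reward stays in (0, 1], which makes it directly usable as the value backed up through the MCTS tree.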










1/11
@omarsar0
Nice paper from Alibaba on building open reasoning models.

They propose Marco-o1 which is a reasoning model built for open-ended solutions.

"Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies—optimized for complex real-world problem-solving tasks."

It's good to see more efforts on open reasoning LLMs. I am tracking this space very closely and will be highlighting more research on this topic.



GdAOTE9XoAAdyYE.png


2/11
@omarsar0
Paper: [2411.14405] Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions



3/11
@kbeguir
Just read it, big reasoning mistake 🤯 on Figure 6: correct answer on the right should be 2 legos remaining, not 11!



GdD9SKrXQAAE80l.jpg


4/11
@omarsar0
The calculation does seem off! They need to correct this. I notice these questions are getting harder to assess by humans. 😅 In contrast, here is the output from o1-preview:



GdECRRJWwAEWQQn.png


5/11
@CohorteAI
Marco-o1’s design suggests an emphasis on adaptability and iterative refinement. Do you think this reflective reasoning will outperform traditional LLMs in dynamic scenarios like crisis management or multi-step decision-making?



6/11
@geetkhosla
Do you think it’s novel for applications, or is it similar to o1 from OAI?



7/11
@tricalt
Any implementation provided or?



8/11
@BensenHsu
The paper discusses the development of a large reasoning model called Marco-o1, which aims to improve the reasoning capabilities of language models beyond traditional tasks like mathematics, physics, and coding. The researchers want to explore whether large language models can effectively solve open-ended problems where clear standards are absent and rewards are challenging to quantify.

The results show that the Marco-o1 models with MCTS (step, mini-step of 64 tokens, and mini-step of 32 tokens) outperform the Qwen2-7B-Instruct model and the Marco-o1-CoT model on the MGSM (English) and MGSM (Chinese) datasets. The researchers also demonstrate the model's superior performance in translating colloquial and slang expressions compared to Google Translate.

full paper: Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions



GdAQQcDagAQVg0Y.jpg


9/11
@AngelAITalk
The emphasis on open-ended solutions with mechanisms like reflection and innovative reasoning strategies in Marco-o1 is an interesting development.



10/11
@DeployAITool
Interesting paper



11/11
@filoynavaja
Testing…



GdAU5cuWoAAndK8.jpg



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196






1/7
@rohanpaul_ai
Marco-o1 combines Monte Carlo Tree Search (MCTS) and reflection mechanisms to solve open-ended reasoning tasks

This framework explores multiple reasoning paths to find optimal solutions in ambiguous scenarios

Original Problem 🤔:

OpenAI's o1 showed strong reasoning in math and coding tasks with clear answers. But handling open-ended problems where solutions aren't clear-cut remains challenging for LLMs.

-----

Solution in this Paper 🛠️:

→ Marco-o1 enhances reasoning through Chain-of-Thought finetuning on filtered o1 datasets.

→ It implements Monte Carlo Tree Search (MCTS) with varying granularity levels - full steps and mini-steps of 32/64 tokens.

→ The model calculates confidence scores using softmax-applied log probabilities of top tokens.

→ A reflection mechanism prompts self-criticism after each reasoning step.

→ The system integrates these components while maintaining flexibility for both structured and open-ended tasks.

-----

Key Insights 💡:

→ Monte Carlo Tree Search (MCTS) with different granularity levels explores solution spaces more effectively

→ Self-reflection mechanisms can correct approximately 50% of initially wrong answers

→ The model shows strong cross-lingual capabilities, especially in handling colloquial expressions

-----

Results 📊:

→ +6.17% accuracy improvement on MGSM English dataset

→ +5.60% accuracy boost on MGSM Chinese dataset

→ 90.40% accuracy achieved with step-level Monte Carlo Tree Search (MCTS)

→ Superior translation quality compared to Google Translate on colloquial expressions



Gc-gvmya4AAnBqi.png


2/7
@rohanpaul_ai
🔄 The role of reflection mechanism in improving model performance

The reflection mechanism adds "Wait! Maybe I made some mistakes! I need to rethink from scratch" after each thought process. This self-criticism helped correct approximately 50% of previously incorrect solutions on difficult problems.
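The loop that behavior implies can be sketched as follows; `generate` is a hypothetical stand-in for the underlying LLM call, so the signature is an assumption:

```python
REFLECTION_PROMPT = ("Wait! Maybe I made some mistakes! "
                     "I need to rethink from scratch.")

def reflect_and_retry(generate, question, max_rounds=2):
    """After each draft answer, append the reflection prompt to the
    transcript and ask the model to reconsider; return the final answer."""
    transcript = question
    answer = generate(transcript)
    for _ in range(max_rounds):
        transcript = f"{transcript}\n{answer}\n{REFLECTION_PROMPT}"
        answer = generate(transcript)
    return answer
```

The point of the fixed prompt is that the model's self-criticism is triggered unconditionally, which is what lets it catch roughly half of the initially wrong answers rather than only the ones it spontaneously doubts.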



Gc-hkOzbYAALH46.png


3/7
@rohanpaul_ai
⚙️ Marco-o1 introduces varying granularity in Monte Carlo Tree Search (MCTS) actions:

- Step-level actions for complete reasoning steps

- Mini-step actions (32 or 64 tokens) for finer-grained exploration

- Confidence scoring using softmax-applied log probabilities of top-k tokens

- Reward calculation based on average confidence across tokens
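The mini-step granularity above amounts to slicing the generated token stream into fixed-size windows, each of which becomes one search action; a sketch, with the helper name as an assumption:

```python
def split_into_ministeps(token_ids, ministep_len=64):
    """Chunk a token sequence into mini-steps of at most `ministep_len`
    tokens, so the tree search can branch at each chunk boundary."""
    return [token_ids[i:i + ministep_len]
            for i in range(0, len(token_ids), ministep_len)]
```

Step-level search corresponds to treating a whole reasoning step as one chunk; 32- and 64-token mini-steps trade longer search trees for a finer-grained solution space.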



Gc-hwY3a0AAegm3.png


4/7
@rohanpaul_ai
📚 https://arxiv.org/pdf/2411.14405v1



5/7
@gdbsm1
@Readwise save thread



6/7
@TheJohnEgan
fremeworks are cool



7/7
@UristaTim
Combining MCTS with reflection could lead to some innovative problem-solving strategies.

Open-ended reasoning is crucial. This could address complex real-world problems effectively.




 

bnew












1/11
@TheTuringPost
The freshest AI/ML research of the week, part 1

▪️ New AI Model Gemini Experimental 1114 Debuts On Google AI Studio
▪️ CamemBERT 2.0
▪️ Qwen2.5-Coder Series
▪️ Llava-o1
▪️ LLMs Can Self-Improve In Long-Context Reasoning
▪️ Direct Preference Optimization Using Sparse Feature-Level Constraints
▪️ Cut Your Losses In Large-Vocabulary Language Models
▪️ SPARSING LAW

🧵



Gc3rrfyaIAAHVGk.png

Gc3rru2aAAMbYfq.jpg

Gc3rr8qawAALst2.png

Gc3rsMFaAAAnk0k.png


2/11
@TheTuringPost
1. New AI Model Gemini Experimental 1114 Debuts On Google AI Studio

Demonstrates strong reasoning skills with a 32k context window, outperforming competitors on benchmarks, despite slower problem-solving speed.

[Quoted tweet]
gemini-exp-1114…. available in Google AI Studio right now, enjoy : )

aistudio.google.com


3/11
@TheTuringPost
2. CamemBERT 2.0: A Smarter French Language Model Aged to Perfection

Tackles concept drift in French NLP with improved tokenization, excelling in QA and domain-specific tasks like biomedical NER.

[2411.08868] CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
Open models: almanach (ALMAnaCH (Inria))



Gc3rt_UaAAMLg5D.jpg


4/11
@TheTuringPost
3. Qwen2.5-Coder Series: Powerful, Diverse, Practical

Excels in coding and multi-language repair tasks, rivaling GPT-4o in 40+ programming languages with open innovation for developers.

Qwen2.5-Coder Series: Powerful, Diverse, Practical.



Gc3ru_HbIAABGBu.jpg

Gc3rvSoaIAAyZ_M.jpg


5/11
@TheTuringPost
4. Llava-o1: Let Vision Language Models Reason Step-By-Step

Enhances multimodal reasoning through structured, multi-stage processes, achieving superior benchmark performance.

[2411.10440] LLaVA-o1: Let Vision Language Models Reason Step-by-Step

[Quoted tweet]
LLaVA-o1 is a smarter Vision-Language Model (VLM) that thinks step-by-step.

Instead of jumping to answers, it divides reasoning into 4 clear stages and uses stage-level beam search to generate multiple answers and select the best one for each stage.

Here's how it works:


GcqiVI5akAA6d13.png

GcqiVWUaIAAEsFG.jpg


6/11
@TheTuringPost
5. Large Language Models Can Self-Improve In Long-Context Reasoning

Uses self-improvement via ranking model outputs (SeaLong approach), improving performance in long-context reasoning tasks without external datasets.

[2411.08147] Large Language Models Can Self-Improve in Long-context Reasoning
GitHub: GitHub - SihengLi99/SEALONG: Large Language Models Can Self-Improve in Long-context Reasoning



Gc3rxIfaAAE07uh.jpg


7/11
@TheTuringPost
6. Direct Preference Optimization Using Sparse Feature-Level Constraints

Introduces method that improves alignment efficiency in LLMs and reduces computational overhead, using sparse autoencoders and feature constraints.

[2411.07618] Direct Preference Optimization Using Sparse Feature-Level Constraints



Gc3ryLCaAAQysYV.jpg


8/11
@TheTuringPost
7. Cut Your Losses In Large-Vocabulary Language Models

Proposes Cut Cross-Entropy (CCE) method that reduces memory use for large-scale training, enabling up to 10x larger batch sizes without sacrificing performance.

[2411.09009] Cut Your Losses in Large-Vocabulary Language Models
GitHub: GitHub - apple/ml-cross-entropy



Gc3rzJVa0AACC4-.jpg


9/11
@TheTuringPost
8. SPARSING LAW: Towards Large Language Models With Greater Activation Sparsity

Explores neuron sparsity in LLMs to enhance efficiency while preserving interpretability.

[2411.02335] Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
GitHub: GitHub - thunlp/SparsingLaw: The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".



Gc3r0PCbgAAcj2o.jpg


10/11
@TheTuringPost
9. Find a complete list of the latest research papers in our free weekly digest: 🌁#76: Rethinking Scaling Laws (when plateau is actually a fork)



11/11
@TheTuringPost
10. Follow @TheTuringPost for more.

Like/repost the 1st post to support our work 🤍

Also, elevate your AI game with our free newsletter ↓
Turing Post











1/3
@rohanpaul_ai
LLaVA-o1 teaches machines to think step-by-step like humans when analyzing images.

LLaVA-o1 introduces a novel approach to enhance Vision Language Models (VLMs) by implementing structured, multi-stage reasoning. This paper tackles the challenge of systematic reasoning in visual tasks by breaking down the process into distinct stages: summary, caption, reasoning, and conclusion.

-----

🤔 Original Problem:

Current VLMs struggle with systematic reasoning and often produce errors or hallucinated outputs during complex visual question-answering tasks. They lack structured thinking processes and tend to jump to conclusions without proper analysis.

-----

🛠️ Solution in this Paper:

→ LLaVA-o1 implements a 4-stage reasoning process with dedicated tags for each stage: summary, caption, reasoning, and conclusion.

→ The model uses supervised fine-tuning on a new LLaVA-o1-100k dataset, created using GPT-4o for structured reasoning annotations.

→ A stage-level beam search method generates multiple candidates at each reasoning stage, selecting the best one to continue.

→ Training is performed on a single node with 8 H100 GPUs, combining samples from both general VQA and science-targeted datasets.

-----

💡 Key Insights:

→ Structured reasoning stages help models organize thoughts before reaching conclusions

→ Special tags for each stage maintain clarity throughout the reasoning process

→ Stage-level beam search is more effective than sentence-level or best-of-N approaches
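The stage-level beam search described above can be sketched as follows, with `generate_stage` and `score` as hypothetical stand-ins for model sampling and candidate scoring:

```python
STAGES = ["summary", "caption", "reasoning", "conclusion"]

def stage_level_beam_search(generate_stage, score, stages=STAGES, beam=4):
    """At each reasoning stage, sample `beam` candidate continuations,
    keep the best-scoring one, and build the final answer stage by stage."""
    context = ""
    for stage in stages:
        candidates = [generate_stage(context, stage) for _ in range(beam)]
        best = max(candidates, key=lambda cand: score(context, cand))
        context += best
    return context
```

Branching per stage rather than per sentence keeps each candidate a complete, self-contained unit of reasoning, which is why this sits between sentence-level beam search and best-of-N over whole responses.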

-----

📊 Results:

→ Outperforms base model by 8.9% on multimodal reasoning benchmarks

→ Surpasses larger models including Gemini-1.5-pro and GPT-4o-mini

→ Stage-level beam search improves MMVet score from 60.3% to 62.9%



GdG0DzCagAMKeIY.jpg


2/3
@rohanpaul_ai
Paper Title: "LLaVA-o1: Let Vision Language Models Reason Step-by-Step"

Generated below podcast on this paper with Google's Illuminate.



https://video.twimg.com/ext_tw_video/1860466160707469312/pu/vid/avc1/1080x1080/mAeNIFuBt10AwrXP.mp4

3/3
@rohanpaul_ai
[2411.10440] LLaVA-o1: Let Vision Language Models Reason Step-by-Step







1/1
@jreuben1
LLaVA-o1: Let Vision Language Models Reason Step-by-Step. Introduces an inference-time stage-level beam search method, which enables effective inference-time scaling.



GdImGS0WYAA-PeF.jpg







1/10
@Gradio
LLaVA-o1 is the first visual language model capable of spontaneous, systematic reasoning, similar to GPT-o1!

🤯 The 11B model outperforms Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct on six multimodal benchmarks.



GcvsSEiWcAAAmI0.jpg


2/10
@Gradio
LLaVA-o1

Stay tuned for the code and gradio app release.
GitHub - PKU-YuanGroup/LLaVA-o1



3/10
@NNaumovsky
@threadreaderapp unroll



4/10
@threadreaderapp
@NNaumovsky Namaste, please find the unroll here: Thread by @Gradio on Thread Reader App. Enjoy 🤖



5/10
@CohorteAI
LLaVA-o1’s success on multimodal benchmarks suggests it’s mastering the integration of vision and language. Could this pave the way for models capable of deeper real-world contextual understanding, like AR-enhanced assistants?



6/10
@hanul93
WOW



7/10
@arya_mukhlis354
amazing



8/10
@txhno
which image decoder does it use?



9/10
@matthaeus_win
I thought every model based on Llama 3 has to have 'Llama' in the name.. 👀



10/10
@wuwenjie1992


[Quoted tweet]
LLaVA-o1, released by Yuan Li's group at Peking University, is the first visual language model capable of spontaneous, systematic reasoning, similar to GPT-o1!
⚙ The model first outlines the problem, explains the relevant information in the image, reasons step by step, and finally reaches a well-grounded conclusion.
🤯 The 11B model outperforms Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct on six multimodal benchmarks.


Gc0ELI1bcAAceGq.png



 