bnew

Veteran
Joined
Nov 1, 2015
Messages
57,602
Reputation
8,519
Daps
160,559

1/1
@abidlabs
This is really nice for anyone who browses papers on ArXiv. Easily find linked models, datasets, and Spaces!

[Quoted tweet]
This is huge: 🤗 @huggingface paper-page is now available directly through arXiv

A quick thread 🧵 👇






1/1
did you notice something new on arxiv today?




1/13
@cocktailpeanut
Omnigen: One Model to Rule Them All

One universal model to take care of every image generation task WITHOUT add-ons like controlnet, ip-adapter, etc. Prompt is all you need.

They finally dropped the code and a gradio app, and now you can run it on your computer with 1 click.

[Quoted tweet]
OmniGen

Unified Image Generation

discuss: huggingface.co/papers/2409.1…

In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse control conditions. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports other downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. Additionally, OmniGen can handle classical computer vision tasks by transforming them into image generation tasks, such as edge detection and human pose recognition. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional text encoders. Moreover, it is more user-friendly compared to existing diffusion models, enabling complex tasks to be accomplished through instructions without the need for extra preprocessing steps (e.g., human pose estimation), thereby significantly simplifying the workflow of image generation. 3) Knowledge Transfer: Through learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and there remain several unresolved issues.




2/13
@cocktailpeanut
Available for 1 click launch on Pinokio





3/13
@cocktailpeanut
OmniGen Project Page: GitHub - VectorSpaceLab/OmniGen



4/13
@cocktailpeanut
The localized gradio app was adapted from the huggingface space here: https://huggingface.co/spaces/shytao/OmniGen



5/13
@cocktailpeanut
It runs on ALL platforms (Mac, NVIDIA, etc.) but the real question is how fast it will be.

FYI it uses a lot of resources and is not the fastest thing you would run.

On an NVIDIA 4090, it took 1 minute 10 seconds to generate this 1024x576 image.





6/13
@cocktailpeanut
And on a Mac M1 Max 64G... around 40 minutes to generate a 1024x576 image.

If you're planning to try this on a Mac, I recommend closing EVERY app you're running and keeping only this running, as the more free memory it has, the faster it will be.





7/13
@cocktailpeanut
Here are some examples of what you can do; it's really powerful and versatile.

If you read through the prompts, it can **potentially** replace all kinds of things like IPAdapter, ControlNet, and even things like SAM.





8/13
@cocktailpeanut
You can use multiple images as input, even OpenPose poses (WITHOUT a ControlNet add-on!), do background removal, and so on.

It has its own markup language, kind of like a mix between an LLM and an image model.
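For a concrete sense of the markup, here's a rough sketch of local usage based on the project README at the time (the OmniGenPipeline class, model id, and the <img><|image_1|></img> placeholder syntax are as the README showed them and may have changed since):

# Rough sketch of OmniGen usage per the project README; names may differ.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# The <img><|image_1|></img> placeholder in the prompt refers to the first
# entry of input_images, so reference images ride along in plain text.
images = pipe(
    prompt="The man in <img><|image_1|></img> waves at the camera.",
    input_images=["./two_man.jpg"],
    height=576,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
)
images[0].save("output.png")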





9/13
@cocktailpeanut
IMO this is the future. I don't think people in the future will be sitting there trying to combine A1111 extensions and so on. Just ask the AI for something and you get it.

Plus, the more inter-disciplinary knowledge (LLM + Image + maybe even Video) an AI has, the smarter it will be.



10/13
@cocktailpeanut
But let's not get ahead of ourselves; first we definitely need a lighter version of this (omnigen-turbo?).

If I could generate one image in 10 seconds on my 4090, for example, I would be much more motivated to experiment.



11/13
@cocktailpeanut
Also, I appreciate how powerful the prompt markup is, but I think maybe we can improve it.

The markup feels too machine-like. If there were a way to express all this in natural language which gets transformed into the markup, that would be ideal. Maybe an additional LLM layer?



12/13
@cocktailpeanut
Since this is an edge tech that takes a relatively long time to run, if you play with this, please do share your results so others can learn more. I will add the shared results to the Pinokio newsfeed here: Pinokio



13/13
@cocktailpeanut
Looks like OmniGen is actually NOT supposed to be this slow. In fact, it's supposed to be even faster than Flux or SD. They said they're gonna work on it, so we should all go ask them to do it ASAP.

[Quoted tweet]
"In fact, OmniGen is much smaller than the latest models like SD and Flux, so there is significant room for optimization"

Wait what?

Good news is, if we all go and ask them enough, they will appreciate all the interest and expedite it probably. Lesgo!









1/4
@rohanpaul_ai
The newly introduced quantized Llama models from @AIatMeta look so powerful.

The new quantization methods and the resulting models are performant enough to run on many popular mobile devices.

Decode latency improved by 2.5x and prefill latency by 4.2x on average, while model size decreased by 56% and memory usage was reduced by 41% on average.

🔧 QAT + LoRA Method

- Start with regular Llama model trained in BF16 format
- During training, simulate how model will perform in lower precision (like 4-bit)
- Freeze main model parts
- Add small trainable LoRA adapters to fine-tune performance
- These adapters stay in BF16 while rest gets quantized
- Final step: tune everything using DPO (Direct Preference Optimization)

🛠️ SpinQuant Method

This is a post-training quantization method that prioritizes portability; the key identity behind it is sketched after this list.

- Takes already trained model
- Uses WikiText dataset to learn special rotation matrices
- These matrices smooth out extreme values
- Applies quantization after training
- No need for full training data
- Easier to use but slightly less accurate than QAT+LoRA
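A one-line way to see why inserting rotations is "free" (this is the general identity behind rotation-based quantization; how the rotation is learned is in the SpinQuant paper):

\[ y = Wx = (WR)\,(R^\top x), \qquad R^\top R = I \]

An orthogonal R can be folded into the weights offline while R-transpose is absorbed into the incoming activations: the network computes exactly the same function, but outlier magnitudes get spread across coordinates, so both factors quantize with less error.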

⚡ Key Differences

- QAT+LoRA: Better accuracy but needs training data and compute
- SpinQuant: More portable, works with any fine-tuned model, needs minimal data

Think of QAT+LoRA like teaching a model to work with less precision from the start, while SpinQuant is like compressing an already trained model efficiently. A minimal sketch of the QAT+LoRA idea follows.
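Here is a minimal PyTorch sketch of that recipe (an illustration of the steps listed above, not Meta's code): the frozen base weights are fake-quantized to 4-bit during the forward pass while a small higher-precision LoRA adapter is the only trainable part.

import torch
import torch.nn as nn

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    # Simulate 4-bit symmetric quantization: snap to 16 levels, then dequantize.
    scale = w.abs().max().clamp(min=1e-8) / 7.0
    return torch.clamp(torch.round(w / scale), -8, 7) * scale

class QATLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze main model parts
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quant_int4(self.base.weight)    # model "sees" low precision
        out = nn.functional.linear(x, w_q, self.base.bias)
        return out + self.lora_b(self.lora_a(x))  # adapter stays high precision

layer = QATLoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(4, 512))  # only lora_a / lora_b accumulate gradients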





2/4
@rohanpaul_ai
They developed these state-of-the-art models using Quantization-Aware Training with LoRA adaptors (QLoRA) to optimize performance in low-precision environments.

Also used SpinQuant, a technique that enables us to determine the best possible combination for compression while retaining the most possible quality.





3/4
@rohanpaul_ai
Introducing quantized Llama models with increased speed and a reduced memory footprint



4/4
@rohanpaul_ai








1/1
@rohanpaul_ai
Now Claude runs code and analyzes data directly.

So you get precise mathematical solutions and interactive visualizations through Artifacts.

e.g., a simple task that's surprisingly tricky for pure neural inference:

Rather than guessing at palindromes, Claude can now write code to check whether 'racecar' reads the same backward.

[Quoted tweet]
Claude can now write and run code.

We've added a new analysis tool. The tool helps Claude respond with mathematically precise and reproducible answers. You can then create interactive data visualizations with Artifacts.

Enable the feature preview: claude.ai/new?fp=1.


https://video.twimg.com/ext_tw_video/1849463452189839360/pu/vid/avc1/1920x1080/nVEM6MeEMkmauxn2.mp4
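For illustration, here is the kind of trivial check the tool makes reliable (the analysis tool itself executes JavaScript in the browser; the same logic is shown in Python):

def is_palindrome(s: str) -> bool:
    # A string is a palindrome if it reads the same backward.
    s = s.lower()
    return s == s[::-1]

print(is_palindrome("racecar"))  # True: no guessing required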







1/51
@AnthropicAI
Claude can now write and run code.

We've added a new analysis tool. The tool helps Claude respond with mathematically precise and reproducible answers. You can then create interactive data visualizations with Artifacts.

Enable the feature preview: claude.ai/new?fp=1.



https://video.twimg.com/ext_tw_video/1849463452189839360/pu/vid/avc1/1920x1080/nVEM6MeEMkmauxn2.mp4

2/51
@AnthropicAI
Read more about the analysis tool: Introducing the analysis tool in Claude.ai



3/51
@ai_for_success
Claude is on 🔥





4/51
@AISafetyMemes
"So how did the AIs escape the box in the end after all the precautions?"

"Box?"



5/51
@dr_cintas
You guys are crushing it!

Very happy to see updates without waiting lists or having to wait months.

Can’t wait to see Claude 3.5 Opus 👀



6/51
@lexfridman
Very cool!



7/51
@leore245
@OpenAI please add this to canvas 👏



8/51
@nadzi_mouad
But this feature was already there; it could do that from the beginning. What changed?



9/51
@chrypnotoad
@elder_plinius 👀



10/51
@ruansgon
This is the second account I've created and the second one this has happened to. The first one I imagine was due to using a VPN, but I don't know why the second one.

This makes it difficult for me to become a customer, even if I'm interested in subscribing to the service.





11/51
@AhmedRezaT
I’m gonna end up spending a good chunk of the day playing with this now 😂



12/51
@jhermie24
Yippie!



13/51
@untitled01ipynb




14/51
@thedealdirector
Best closed-source product on the market. Long term viability is still questionable but it's still very entertaining to see.



15/51
@elder_plinius
#freeopus



16/51
@vipulgnu
That was one thing I felt Anthropic could have done easily.

A lot of 'excel'-ing is going to get unbundled. Add voice on top of it (not far away), and we have an ad-hoc data analyst.

Going to play with this :smile:



17/51
@Duarteosrm
Can these be shared publicly?



18/51
@recursiverealms
Wait what... why are you adding things while I'm sitting here still trying to come to grips with how absolutely mind blowing Claude is after this latest update? :D



19/51
@macmaniac77
Works very well!!



20/51
@Jay_sharings
It means Latex rendering is not yet available, right?





21/51
@shrimpwtf
Claude is the best 🔥



22/51
@cpdough
javascript for data analysis?!?



23/51
@JiquanNgiam
Love it! We've extended this to also support Excel files, large CSVs, and more - check out @Lutra_AI



24/51
@Petre_Bogdan
klod ïs kikin 🙀 gpt ass



25/51
@EricBuess
@MatthewBerman



26/51
@opeksoy
analyze away time!



27/51
@Jeff_Pittman
❤️



28/51
@youraimarketer
😟 @JuliusAI_



29/51
@Leitparadigma_X
The one area where Anthropic were inexplicably behind the curve. Glad to see this… finally.



30/51
@bsilone
Very impressive!



31/51
@cloudseedingtec
heh this will get interesting swiftly<3



32/51
@0xPaulius
Claude IDE coming?



33/51
@blendic_ai
well done!





34/51
@HungamaHeadline
This opens doors for more accurate data-driven insights, seamless integration of AI with real-time analytics, and interactive visualizations.

The potential to streamline complex problem-solving across industries, from research to software development, is massive.



35/51
@KolinKoehl
Ah, finally catching up! Claude can now do what ChatGPT’s been up to for the past 12 months. Guess the safety team finally gave it the green light. 😏 #BetterLateThanNever



36/51
@Shalev_lif
Awesome, been waiting for this one! Big thanks to the amazing team.



37/51
@DaivikGoel
You guys are going for it



38/51
@rolandalong
Just tried this-so much better than copy paste of code into IDE



39/51
@nadzi_mouad
Pls increase the context window. When I give it a CSV file it doesn't finish and gets stuck.



40/51
@thegenioo
now we are talking!



41/51
@analysisjitsu
Cool! I have been using claudedev with vscode for a bit and have found that to be very useful.



42/51
@AntDX316
👍



43/51
@Yuchenj_UW
this is awesome



44/51
@0xsanketk
Wow..These charts look super appealing!

It seems like we're not far off from a future where entire data analysis will be done using AI tools like Claude.

Super excited to try this out.



45/51
@AntDX316
Can you make it easy for me to see all my deployed artifacts?



46/51
@HououinTyouma
wow but i liked running code for claude and he would thank me every time :(



47/51
@dhruv2038
OpenAI continues to get hammered.



48/51
@koltregaskes
Really neat, ta.



49/51
@biubiu958
🌟 Go, Claude baby! 👏



50/51
@tomosman
well this is awesome



51/51
@FibNewtonian
This is amazing, similar to v0







1/3
@rohanpaul_ai
LLMs gain human-like awareness of word positions through numbered tracking.

Adding position markers to LLM inputs enables exact length control and accurate text manipulation.

**Original Problem** 🔍:

LLMs struggle with length control and precise copy-paste operations due to lack of positional awareness.

The authors identify a lack of positional awareness as the root cause of LLMs' inability to effectively control text length. This stems from token-level operations and insufficient training on data with strict length limitations.

-----

**Solution in this Paper** 🛠️:

• PositionID Prompting: Assigns sequential IDs to words/sentences/paragraphs during generation

• PositionID Fine-Tuning: Trains models on mixed normal and PositionID modes

• PositionID CP Prompting: Enables accurate copy-paste using a three-stage tool-use mechanism

-----

**Key Insights from this Paper** 💡:

• Explicit positional awareness enhances LLMs' length control and copy-paste abilities

• PositionID techniques work for both closed-source and open-source models

• Mixed-mode training transfers positional awareness to normal generation mode

-----

**Results** 📊:

• PositionID Prompting: Best Rouge-L (23.2) and MAE scores across all levels

• PositionID Fine-Tuning: Outperforms CFT and InstructCTG in MAE metrics

• PositionID CP Prompting: 80.8% CP Success Rate, 18.4 Rouge-L, 8.4 PPL





2/3
@rohanpaul_ai
📝 LenCtrl-Bench Details

This component has three workflow variants:

👉 Vanilla Prompting:
- Takes user query and length constraint
- Directly generates text without position tracking
- Less accurate length control

👉 PositionID Prompting:
- Adds sequential position IDs to each word/token
- Helps model track length during generation
- More precise length control
- Example: "Three 1 -word 2 text 3" (sketched in code below)

👉 PositionID Fine-Tuning:
- Trains model in two modes: normal mode (without position IDs) and PositionID mode (with position IDs)
- Infers in normal mode while retaining positional awareness
- Most effective for length control
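A minimal sketch of the word-level PositionID prompting format referenced above (our illustration of the paper's "Three 1 -word 2 text 3" example, not the authors' code):

def add_position_ids(text: str) -> str:
    # Follow each word with its 1-based position so the model can
    # explicitly track how many words it has produced so far.
    words = text.split()
    return " ".join(f"{word} {i}" for i, word in enumerate(words, start=1))

print(add_position_ids("Three -word text"))  # "Three 1 -word 2 text 3"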





3/3
@rohanpaul_ai
📚 [2410.07035] PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness







1/1
@arXivGPT
🏷️:PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

🔗:https://arxiv.org/pdf/2410.07035.pdf








OpenAI researchers develop new model that speeds up media generation by 50X

Carl Franzen@carlfranzen

October 23, 2024 2:07 PM



A robot runs fast facing left in profile view carrying film canister surrounded by film reels against a rainbow hued background


Credit: VentureBeat made with OpenAI ChatGPT


A pair of researchers at OpenAI has published a paper describing a new type of model, a continuous-time consistency model (sCM), that speeds up AI generation of multimedia (images, video, and audio) by 50 times compared to traditional diffusion models, generating images in roughly a tenth of a second versus more than 5 seconds for regular diffusion.

With the introduction of sCM, OpenAI has managed to achieve comparable sample quality with only two sampling steps, offering a solution that accelerates the generative process without compromising on quality.

Described in a not-yet-peer-reviewed paper published on arXiv.org and a blog post released today, both authored by Cheng Lu and Yang Song, the innovation enables these models to generate high-quality samples in just two steps, significantly faster than previous diffusion-based models that require hundreds of steps.

Song was also a leading author on a 2023 paper from OpenAI researchers, including former chief scientist Ilya Sutskever, that coined the idea of "consistency models," defined by the property that "points on the same trajectory map to the same initial point."
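Concretely, that property from the 2023 paper can be written as:

\[ f_\theta(x_t, t) = f_\theta(x_{t'}, t') \quad \text{for all } t, t' \in [\epsilon, T], \qquad f_\theta(x_\epsilon, \epsilon) = x_\epsilon \]

whenever the two points lie on the same probability-flow ODE trajectory. Because the model maps any point on a trajectory straight back to its origin, one or two evaluations can stand in for hundreds of denoising steps.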

While diffusion models have delivered outstanding results in producing realistic images, 3D models, audio, and video, their inefficiency in sampling—often requiring dozens to hundreds of sequential steps—has made them less suitable for real-time applications.

Theoretically, the technology could provide the basis for a near-realtime AI image generation model from OpenAI. As fellow VentureBeat reporter Sean Michael Kerner mused in our internal Slack channels, “can DALL-E 4 be far behind?”


Faster sampling while retaining high quality


In traditional diffusion models, a large number of denoising steps are needed to create a sample, which contributes to their slow speed.

In contrast, sCM converts noise into high-quality samples directly within one or two steps, cutting down on the computational cost and time.

OpenAI’s largest sCM model, which boasts 1.5 billion parameters, can generate a sample in just 0.11 seconds on a single A100 GPU.

This results in a 50x speed-up in wall-clock time compared to diffusion models, making real-time generative AI applications much more feasible.


Reaching diffusion-model quality with far less computational resources


The team behind sCM trained a continuous-time consistency model on ImageNet 512×512, scaling up to 1.5 billion parameters.

Even at this scale, the model maintains a sample quality that rivals the best diffusion models, achieving a Fréchet Inception Distance (FID) score of 1.88 on ImageNet 512×512.

This brings the sample quality within 10% of diffusion models, which require significantly more computational effort to achieve similar results.


Benchmarks reveal strong performance


OpenAI’s new approach has undergone extensive benchmarking against other state-of-the-art generative models.

By measuring both the sample quality using FID scores and the effective sampling compute, the research demonstrates that sCM provides top-tier results with significantly less computational overhead.

While previous fast-sampling methods have struggled with reduced sample quality or complex training setups, sCM manages to overcome these challenges, offering both speed and high fidelity.

The success of sCM is also attributed to its ability to scale proportionally with the teacher diffusion model from which it distills knowledge.

As both the sCM and the teacher diffusion model grow in size, the gap in sample quality narrows further, and increasing the number of sampling steps in sCM reduces the quality difference even more.


Applications and future uses


The fast sampling and scalability of sCM models open new possibilities for real-time generative AI across multiple domains.

From image generation to audio and video synthesis, sCM provides a practical solution for applications that demand rapid, high-quality output.

Additionally, OpenAI’s research hints at the potential for further system optimization that could accelerate performance even more, tailoring these models to the specific needs of various industries.
 



OpenAI scientist Noam Brown stuns TED AI Conference: ’20 seconds of thinking worth 100,000x more data’

Michael Nuñez@MichaelFNunez

October 23, 2024 12:46 PM


Credit: VentureBeat made with Midjourney

Noam Brown, a leading research scientist at OpenAI, took the stage at the TED AI conference in San Francisco on Tuesday to deliver a powerful speech on the future of artificial intelligence, with a particular focus on OpenAI’s new o1 model and its potential to transform industries through strategic reasoning, advanced coding, and scientific research. Brown, who has previously driven breakthroughs in AI systems like Libratus, the poker-playing AI, and CICERO, which mastered the game of Diplomacy, now envisions a future where AI isn’t just a tool, but a core engine of innovation and decision-making across sectors.

“The incredible progress in AI over the past five years can be summarized in one word: scale,” Brown began, addressing a captivated audience of developers, investors, and industry leaders. “Yes, there have been uplink advances, but the frontier models of today are still based on the same transformer architecture that was introduced in 2017. The main difference is the scale of the data and the compute that goes into it.”

Brown, a central figure in OpenAI’s research endeavors, was quick to emphasize that while scaling models has been a critical factor in AI’s progress, it’s time for a paradigm shift. He pointed to the need for AI to move beyond sheer data processing and into what he referred to as “system two thinking”—a slower, more deliberate form of reasoning that mirrors how humans approach complex problems.


The psychology behind AI’s next big leap: Understanding system two thinking


To underscore this point, Brown shared a story from his PhD days when he was working on Libratus, the poker-playing AI that famously defeated top human players in 2017.

“It turned out that having a bot think for just 20 seconds in a hand of poker got the same boost in performance as scaling up the model by 100,000x and training it for 100,000 times longer,” Brown said. “When I got this result, I literally thought it was a bug. For the first three years of my PhD, I had managed to scale up these models by 100x. I was proud of that work. I had written multiple papers on how to do that scaling, but I knew pretty quickly that all of that would be a footnote compared to scaling up system two thinking.”

Brown’s presentation introduced system two thinking as the solution to the limitations of traditional scaling. Popularized by psychologist Daniel Kahneman in the book Thinking, Fast and Slow, system two thinking refers to a slower, more deliberate mode of thought that humans use for solving complex problems. Brown believes incorporating this approach into AI models could lead to major performance gains without requiring exponentially more data or computing power.

He recounted that allowing Libratus to think for 20 seconds before making decisions had a profound effect, equating it to scaling the model by 100,000x. “The results blew me away,” Brown said, illustrating how businesses could achieve better outcomes with fewer resources by focusing on system two thinking.


Inside OpenAI’s o1: The revolutionary model that takes time to think


Brown’s talk comes shortly after the release of OpenAI’s o1 series models, which introduce system two thinking into AI. Launched in September 2024, these models are designed to process information more carefully than their predecessors, making them ideal for complex tasks in fields like scientific research, coding, and strategic decision-making.

“We’re no longer constrained to just scaling up the system one training. Now we can scale up the system two thinking as well, and the beautiful thing about scaling up in this direction is that it’s largely untapped,” Brown explained. “This isn’t a revolution that’s 10 years away or even two years away. It’s a revolution that’s happening now.”

The o1 models have already demonstrated strong performance in various benchmarks. For instance, in a qualifying exam for the International Mathematics Olympiad, the o1 model achieved an 83% accuracy rate—a significant leap from the 13% scored by OpenAI’s GPT-4o. Brown noted that the ability to reason through complex mathematical formulas and scientific data makes the o1 model especially valuable for industries that rely on data-driven decision-making.


The business case for slower AI: Why patience pays off in enterprise solutions


For businesses, OpenAI’s o1 model offers benefits beyond academic performance. Brown emphasized that scaling system two thinking could improve decision-making processes in industries like healthcare, energy, and finance. He used cancer treatment as an example, asking the audience, “Raise your hand if you would be willing to pay more than $1 for a new cancer treatment… How about $1,000? How about a million dollars?”

Brown suggested that the o1 model could help researchers speed up data collection and analysis, allowing them to focus on interpreting results and generating new hypotheses. In energy, he noted that the model could accelerate the development of more efficient solar panels, potentially leading to breakthroughs in renewable energy.

He acknowledged the skepticism about slower AI models. “When I mention this to people, a frequent response that I get is that people might not be willing to wait around for a few minutes to get a response, or pay a few dollars to get an answer to the question,” he said. But for the most important problems, he argued, that cost is well worth it.


Silicon Valley’s new AI race: Why processing power isn’t everything


OpenAI’s shift toward system two thinking could reshape the competitive landscape for AI, especially in enterprise applications. While most current models are optimized for speed, the deliberate reasoning process behind o1 could offer businesses more accurate insights, particularly in industries like finance and healthcare.

In the tech sector, where companies like Google and Meta are heavily investing in AI, OpenAI’s focus on deep reasoning sets it apart. Google’s Gemini AI, for instance, is optimized for multimodal tasks, but it remains to be seen how it will compare to OpenAI’s models in terms of problem-solving capabilities.

That said, the cost of implementing o1 could limit its widespread adoption. The model is slower and more expensive to run than previous versions. Reports indicate that the o1-preview model costs $15 per million input tokens and $60 per million output tokens, far more than GPT-4o. Still, for enterprises that need high-accuracy outputs, the investment may be worthwhile.

As Brown concluded his talk, he emphasized that AI development is at a critical juncture: “Now we have a new parameter, one where we can scale up system two thinking as well — and we are just at the very beginning of scaling up in this direction.”
 




1/11
@ivanfioravanti
LMStudio AI + OpenWebUI + Apple MLX + Qwen 2.5 Coder 32B Q4 in action on M4 Max!

What a combination 🤩



https://video.twimg.com/ext_tw_video/1856385337485819904/pu/vid/avc1/1662x1080/a9IxQtbO_77APdtz.mp4

2/11
@ivanfioravanti
You can even compare Q4 vs Q8 on the same request (be ready for lower t/s overall, clearly, because the GPU is managing 2 models in parallel).
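If you want to hit the same local model from code: LM Studio exposes an OpenAI-compatible server (default http://localhost:1234/v1), so any OpenAI client works. The model id below is whatever LM Studio lists for your download (assumed here):

from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is ignored.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",  # id as shown in LM Studio (assumed)
    messages=[{"role": "user", "content": "Write a Python function to reverse a string."}],
)
print(resp.choices[0].message.content)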





3/11
@iotcoi
you should reconsider the Q4



4/11
@ivanfioravanti
What do you mean?



5/11
@albfresco
thanks for reminding me to get 0.3.5



6/11
@MaziyarPanahi
When does @LMStudioAI add Code Interpreter? 😆



7/11
@ozgrozer
Try it with Cursor

[Quoted tweet]
I tried Qwen 2.5 Coder 32B on my MB Pro M4 Max and I have to say that it's quite impressive

Here's a video of me using it with the help of @LMStudioAI, @ngrokHQ and @cursor_ai


https://video.twimg.com/ext_tw_video/1856139978981445632/pu/vid/avc1/1594x1080/zv1syuvbqUruP_H0.mp4

8/11
@yar_vol
Why LM Studio, given that it is not open source, especially if you’re using it just in server mode?



9/11
@iamRezaSayar
Do you use @stackblitz Bolt and / or Cursor for coding stuff? Have you tried using this hosted model there?



10/11
@modoulaminc
Wonder how it will perform with base m4 with 32gb



11/11
@Medusasound1
What is the use case for LM Studio? Doesn’t it do the same as OpenWebUI?














1/12
@itsPaulAi
Qwen has just released an open-source AI model that codes as well as GPT-4o.

Yes. 32B. Free. Open-source. 🤯

It's also really close to Claude 3.5 Sonnet.

You can use it for free with the link below





2/12
@itsPaulAi
A free demo is available on Hugging Face here:
Qwen2.5 Coder Demo - a Hugging Face Space by Qwen

And if you're GPU rich, you can use it locally with ollama or LM Studio.

I'm pretty sure that very inexpensive APIs will be available soon.



3/12
@arattml


[Quoted tweet]
are you finally starting to understand how over-parameterized language models are?


4/12
@itsPaulAi
Seems to be the case. Specialized models are the way.



5/12
@DavidSZDahan
Have you tried it?



6/12
@itsPaulAi
Tried the demo (looks VERY promising) and waiting for an API to try with a real coding project.



7/12
@gmstoner
Does it work with ollama?



8/12
@itsPaulAi
Yes! No problem if you have powerful enough hardware.



9/12
@ronpezzy
I'm working with it as I type...it's really good



10/12
@itsPaulAi
Nice! I think it's going to become the default choice because it's so good and probably less expensive via API.



11/12
@MadMonkeSol
wow, that's wild. free open-source ai that's stacking up against the big dogs? that's some next-level stuff.



12/12
@itsPaulAi
Just incredible. It's going to make AI coding even cheaper and more accessible.














1/12
@Yuchenj_UW
We @hyperbolic_labs now serve Qwen2.5-Coder-32B-Instruct released by @Alibaba_Qwen today in BF16! 🚤

> It outperforms Claude 3.5 Sonnet & GPT-4o on almost all the coding benchmarks!
> We support 128K tokens context length
> Integration with @OpenRouterAI coming soon

Some interesting points in its tech report:
> The training dataset comprises 5.2 trillion tokens; they found a mixture of 70% Code, 20% Text, and 10% Math works best.
> Qwen2.5-Coder uses its predecessor, CodeQwen1.5, to generate synthetic datasets. To minimize the risk of hallucination, an executor checks the generated code to ensure it is executable and syntactically correct (a minimal version of such a check is sketched below).
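A minimal version of that executor check (an illustration of the idea, not Qwen's pipeline; running untrusted generated code should of course happen in a sandbox):

def is_valid_python(src: str) -> bool:
    # Syntactic check: does the sample even parse?
    try:
        compile(src, "<synthetic>", "exec")
    except SyntaxError:
        return False
    # Executability check: does it run without raising?
    try:
        exec(src, {"__name__": "__synthetic__"})
    except Exception:
        return False
    return True

print(is_valid_python("print('hello')"))  # True
print(is_valid_python("def broken(:"))    # False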

Congrats to @huybery, @JustinLin610, and the entire Qwen team for driving open-source AI forward!





2/12
@Yuchenj_UW
Give it a vibe check on our playground and use the API for your hardest coding problems: Hyperbolic AI Dashboard.

Tell me how good the model is!





3/12
@iruletheworldmo
nice yo.



4/12
@Yuchenj_UW
yo strawberry, thanks for replying to open source AGI



5/12
@mayfer
there seem to be some issues with this model; it generated a bunch of gibberish, as seen here. First time this has happened with a supposedly top-scoring model.





6/12
@Yuchenj_UW
interesting! what's your prompt?



7/12
@TheXeophon
Will you make it available in @poe_platform as well?



8/12
@Yuchenj_UW
will do!



9/12
@TiggerSharkML
as expected, code is the strongest, then some text for learning the basics and a sprinkle of math for inspiration.



10/12
@Yuchenj_UW
I wonder if a 10x engineer is trained in such a way



11/12
@N8Programs
I cannot wait!



12/12
@Yuchenj_UW
give it a try!!!







1/1
@idare
Qwen-2.5-Coder 7B is a solid code evaluation model. I haven't put it through heavy tests but I fed it code on my helix hyperdimensional memory engine (HHME) and it understood the fractal spirals just fine. 32B is on huggingchat if you don't have enough RAM for local ops. Try it!

LM Studio is great if you're not up to speed on backend deployments, or Pinokio Computer.

[Quoted tweet]
Qwen 2.5 Coder 0.5B, 1.5B, 3B, 14B, and 32B are here!

The 32B version scores higher than GPT-4o on Aider's code editing benchmark.

Available now in LM Studio 👾🥝

Download from the terminal:

lms get qwen/qwen2.5-coder-3b-instruct
lms get qwen/qwen2.5-coder-14b-instruct
lms get qwen/qwen2.5-coder-32b-instruct










1/6
@reach_vb
Enjoy Qwen 2.5 Coder 32B aka Smol AGI on Hugging Chat - 100% free and unlimited queries 🔥

[Quoted tweet]
If this doesn’t make you bullish on Open Source - I don’t know what will! 🔥

That’s a 32B LLM that can easily fit on a ~0.8 USD/hour GPU, spitting an ungodly number of tokens.

Back of the napkin math (weights only; worked out in the sketch below):
- fp16/bf16 - ~64GB VRAM (would need an 80GB A100 or two L40S)
- 8-bit - ~32GB VRAM (fits on an L40S)
- 4-bit - ~16GB VRAM (fits on an L4)

This + memory required for loading the context!

Intelligence is becoming too cheap to meter 🤗
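The napkin math above, worked out (weights only; the KV cache and activations come on top):

def weight_vram_gb(n_params: float, bits_per_param: int) -> float:
    # bytes = params * bits / 8; report in GB (1e9 bytes)
    return n_params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_vram_gb(32e9, bits):.0f} GB")
# 16-bit: 64 GB, 8-bit: 32 GB, 4-bit: 16 GB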




2/6
@reach_vb
Check it out here:

Qwen/Qwen2.5-Coder-32B-Instruct - HuggingChat



3/6
@the_mallison
man small open source models becoming this good enables so many great things



4/6
@YayaSoumah
Max output tokens?



5/6
@LounasGana
basedddddd



6/6
@s10boyal
Seems like all models are giving syntax errors and writing unwanted characters, only in the chat interface, while writing Python. Can you check?










1/5
@Gradio
Wow! Qwen 2.5 Coder has been released with code artifacts! 🤯

Qwen has open-sourced the powerful and diverse Qwen2.5-Coder Series (0.5B/1.5B/3B/7B/14B/32B). Apache 2.0 license!

The code artifacts app is built with Gradio 5 and makes use of our custom components feature too 🤩



https://video.twimg.com/ext_tw_video/1856305761200095232/pu/vid/avc1/960x720/SppaGX7Un2TGzQRL.mp4

2/5
@Gradio
Qwen2.5-Coder apps are live on @huggingface Spaces!

💪 Qwen2.5-Coder Instruct demos: Qwen2.5 Coder Demo - a Hugging Face Space by Qwen

👍 Qwen2.5-Coder 7B Instruct: Qwen2.5-Coder-7B-Instruct - a Hugging Face Space by Qwen

🔥🔥🔥 The Code-Artifacts app that uses Qwen2.5-Coder 32b-instruct: Qwen2.5 Coder Artifacts - a Hugging Face Space by Qwen



3/5
@Gradio
Qwen2.5-Coder is a Code-specific model series based on Qwen2.5.

🔥Model and Space collection is live on @huggingface Spaces:
Qwen2.5-Coder - a Qwen Collection



4/5
@SaquibOptimusAI
Very awesome.



5/5
@Hyperstackcloud
Awesome! 👏 Open source FTW 🚀 Our tutorial on using Qwen2.5 Coder 32B on Hyperstack is coming very soon - keep an eye out! 🙌





1/1
@TheAIObserverX
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
[2411.02265] Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent









1/3
@DeepLearningAI
Tencent’s Hunyuan-Large model, a language model with a mixture of experts (MoE) architecture, surpasses open competitors like Llama 3.1 405B on multiple benchmarks, including math, coding, and multilingual tasks.

Learn more in #TheBatch: Hunyuan-Large Outshines Open Competitors with High Benchmark Scores



2/3
@intheloopwithai
Nice to see multilingual tasks getting some love too



3/3
@SaadR_Biz
Impressive performance by Hunyuan-Large on various benchmarks! Its efficiency is remarkable, using only 52 billion activated parameters to process inputs. #TheBatch








1/3
@TheTuringPost
Overview of new Tencent Hunyuan-Large model:

▪️ It's the largest open-source Transformer-based MoE model
▪️ Has 389B total parameters and 52B activated parameters
▪️ Outperforms the Llama 3.1 70B model
▪️ Matches the performance of the larger Llama 3.1 405B

▪️ Architecture:
- KV cache compression: Groups and shares certain cache elements, saving up to 95% in memory.
- Recycle routing: Reallocates tokens from overloaded experts to less busy ones, preventing data loss and improving training efficiency (a toy sketch follows this list).
- Expert-specific learning rates: Each expert has an optimized learning rate, improving training efficiency.

▪️ Trained on a combination of 7 trillion tokens of natural and synthetic data, primarily in Chinese and English, produced with a four-step data-synthesis process.
▪️ Post-training includes fine-tuning and reinforcement learning from human feedback (RLHF).
▪️ Extended long-context capabilities up to 256,000 tokens are especially useful for tasks such as processing legal documents or scientific literature.
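An illustrative toy version of the recycle-routing idea from the architecture notes above (a sketch of the concept, not Tencent's implementation): tokens that would overflow a full expert are re-sent to their next-best expert instead of being dropped.

import torch

def route_with_recycling(scores: torch.Tensor, capacity: int):
    # scores: [num_tokens, num_experts] router affinities
    num_tokens, num_experts = scores.shape
    load = [0] * num_experts
    assignment = [-1] * num_tokens
    ranked = scores.argsort(dim=-1, descending=True)  # experts by preference
    for t in range(num_tokens):
        for e in ranked[t].tolist():      # try the best expert first...
            if load[e] < capacity:        # ...recycle to next-best if full
                assignment[t] = e
                load[e] += 1
                break
    return assignment, load

assignment, load = route_with_recycling(torch.randn(8, 4), capacity=3)
print(assignment, load)  # every token keeps an expert; none are dropped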





2/3
@TheTuringPost
Paper: https://arxiv.org/pdf/2411.02265
Code: GitHub - Tencent/Tencent-Hunyuan-Large
Models: tencent/Tencent-Hunyuan-Large · Hugging Face



3/3
@NextFrontierAI
Impressive to see Tencent's Hunyuan-Large pushing MoE models forward. Keen to learn more about its real-world applications!







1/1
@aswadi_didi
Tencent open-sourced the Hunyuan-Large 52B model, beating Meta's Llama 3.1 405B.

Insanely powerful, China.

[2411.02265] Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent






1/15
@TheTuringPost
The freshest AI/ML researches of the week, part 2

▪️ WEBRL: Training LLM Web Agents
▪️ DynaSaur: Large Language Agents
▪️ THANOS: Skill-Of-Mind-Infused Agents
▪️ DELIFT
▪️ HtmlRAG
▪️ M3DOCRAG
▪️ Needle Threading
▪️ Survey Of Cultural Awareness In LMs
▪️ OPENCODER
▪️ Polynomial Composition Activations
▪️ Hunyuan-Large
▪️ Balancing Pipeline Parallelism With Vocabulary Parallelism

🧵





2/15
@TheTuringPost
1. WEBRL: Training LLM Web Agents

Trains web agents with a curriculum that evolves through agent learning, improving task success rates.

[2411.02337] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
GitHub: GitHub - THUDM/WebRL: Building Open LLM Web Agents with Self-Evolving Online Curriculum RL





3/15
@TheTuringPost
2. DynaSaur: Large Language Agents Beyond Predefined Actions

Allows agents to create actions on-the-fly, handling unforeseen tasks with Python-based adaptability.

[2411.01747] DynaSaur: Large Language Agents Beyond Predefined Actions





4/15
@TheTuringPost
3. THANOS: Skill-Of-Mind-Infused Agents

Enhances conversational agents with social skills, improving response accuracy and empathy.

[2411.04496] Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
GitHub: GitHub - passing2961/Thanos: Official code repository for our paper: Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model





5/15
@TheTuringPost
4. DELIFT: Data Efficient Language Model Instruction Fine-Tuning

Optimizes fine-tuning by selecting the most informative data, cutting dataset size significantly.

[2411.04425] DELIFT: Data Efficient Language model Instruction Fine Tuning





6/15
@TheTuringPost
5. HtmlRAG: HTML Is Better Than Plain Text

Improves RAG systems by preserving HTML structure, enhancing retrieval quality.

[2411.02959] HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems





7/15
@TheTuringPost
6. M3DOCRAG: Multi-Modal Retrieval For Document Understanding

Introduces a multimodal RAG framework to handle multi-page and document QA tasks with visual data.

[2411.04952] M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding





8/15
@TheTuringPost
7. Needle Threading: LLMs For Long-Context Retrieval

Examines LLMs’ retrieval capabilities, identifying limits in handling extended contexts.

[2411.05000] Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
GitHub: Needle Threading





9/15
@TheTuringPost
8. Survey Of Cultural Awareness In Language Models

Reviews cultural inclusivity in LLMs, emphasizing diverse and ethically sound datasets.

[2411.00860] Survey of Cultural Awareness in Language Models: Text and Beyond
GitHub: GitHub - siddheshih/culture-awareness-llms





10/15
@TheTuringPost
9. OPENCODER: The Open Cookbook For Code Models

Provides a comprehensive open-source guide for building high-performance code LLMs.

[2411.04905] OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

[Quoted tweet]
A “how-to” guide on building top-tier Code LLM

OpenCoder is a fully open-source code model, matching top code models in performance. It comes with a cookbook, training data, processing methods, and experiment results.

This cookbook covers key components for code models:

- Data cleaning rules
- Methods to avoid duplicates
- Mixing of text and code data for better context
- High-quality synthetic data for training

Reproduce your own model with this guide!

The cookbook: arxiv.org/pdf/2411.04905
OpenCoder GitHub: opencoder-llm.github.io/




11/15
@TheTuringPost
10. Polynomial Composition Activations

Enhances model expressivity using polynomial activations, optimizing parameter efficiency.

[2411.03884] Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models





12/15
@TheTuringPost
11. Hunyuan-Large: An Open-Source MoE Model

Presents a large-scale MoE model, excelling across language, math, and coding tasks.

[2411.02265] Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
GitHub: GitHub - Tencent/Tencent-Hunyuan-Large

[Quoted tweet]
Overview of new Tencent Hunyuan-Large model:

▪️ It's the largest open-source Transformer-based MoE model
▪️ Has 389 B total parameters and 52 B activation parameters
▪️ Outperforms the LLama3.1-70B model
▪️ Matches performance of the larger LLama3.1-405B

▪️ Architecture:
- KV cache compression: Groups and shares certain cache elements, saving up to 95% in memory.
- Recycle routing: Reallocates tokens from overloaded experts to less busy experts, preventing data loss and improving training efficiency.
- Expert-specific learning rates: Each expert has an optimized learning rate, improving training efficiency.

▪️ Uses a combination of 7 trillion tokens of natural and synthetic data, primarily in Chinese and English. The four-step process of data synthesis is used.
▪️ Post-training includes fine-tuning and reinforcement learning from human feedback (RLHF).
▪️ Extended long-context capabilities up to 256,000 tokens are especially useful for tasks, such as legal documents or scientific literature.




13/15
@TheTuringPost
12. Balancing Pipeline Parallelism With Vocabulary Parallelism

Improves transformer training efficiency by balancing memory across vocabulary layers.

[2411.05288] Balancing Pipeline Parallelism with Vocabulary Parallelism





14/15
@TheTuringPost
13. Find a complete list of the latest research papers in our free weekly digest: 🌁#75: What is Metacognitive AI



15/15
@TheTuringPost
14. Follow @TheTuringPost for more.

Like/repost the 1st post to support our work 🤍

Also, elevate your AI game with our free newsletter ↓
Turing Post

[Quoted tweet]
The freshest AI/ML researches of the week, part 2

▪️ WEBRL: Training LLM Web Agents
▪️ DynaSaur: Large Language Agents
▪️ THANOS: Skill-Of-Mind-Infused Agents
▪️ DELIFT
▪️ HtmlRAG
▪️ M3DOCRAG
▪️ Needle Threading
▪️ Survey Of Cultural Awareness In LMs
▪️ OPENCODER
▪️ Polynomial Composition Activations
▪️ Hunyuan-Large
▪️ Balancing Pipeline Parallelism With Vocabulary Parallelism

🧵









1/18
@AndrewCurran_
A new Gemini experimental build is up, and in its final stages of development. This new model was anonymously tested in the arena over the last week, and now ranks first overall. Google has retaken the lead.

[Quoted tweet]
gemini-exp-1114…. available in Google AI Studio right now, enjoy : )

aistudio.google.com




2/18
@AndrewCurran_


[Quoted tweet]
Massive News from Chatbot Arena🔥

@GoogleDeepMind's latest Gemini (Exp 1114), tested with 6K+ community votes over the past week, now ranks joint #1 overall with an impressive 40+ score leap, matching 4o-latest and surpassing o1-preview! It also claims #1 on the Vision leaderboard.

Gemini-Exp-1114 excels across technical and creative domains:

- Overall #3 -> #1
- Math: #3 -> #1
- Hard Prompts: #4 -> #1
- Creative Writing #2 -> #1
- Vision: #2 -> #1
- Coding: #5 -> #3
- Overall (StyleCtrl): #4 -> #4

Huge congrats to @GoogleDeepMind on this remarkable milestone!

Come try the new Gemini and share your feedback!




3/18
@AndrewCurran_
Matches o1-preview in math. There's a good chance this model is Gemini 2.





4/18
@AndrewCurran_
I need to see the AidanBench numbers.



5/18
@JoJrobotics
but if this is the new Gemini 2.0 then it's disappointing, because it's barely better than GPT-4o. We need models that can achieve real human-level reasoning.



6/18
@indrajeet877
Google Gemini is on fire now



7/18
@AchillesSlays
It'd be crap as usual when actual users use it



8/18
@alikayadibi11
Never trusting gemini



9/18
@KCDN19
I find the Google models lacking in charisma and deceptive about their censorship protocols.



10/18
@algxtradingx
I wish I could believe that it’ll be worthwhile, it’s just that sonnet has been so good and Gemini has been so bad that it’s hard for me to fathom a flipping.



11/18
@__p_i_o_t_r__
Behind Sonnet with style control applied and behind Sonnet in my personal tests. Supposedly really good at math, so at least that if it's 2.0.



12/18
@hingeloss
`gemini-exp`, not `gemini-1.5-exp`, wink wink?



13/18
@nordic_eacc
If this is Gemini 2.0, that’s pretty sad



14/18
@kami_ayani
Gemini just fundamentally sucks



15/18
@fifth_sign
Gemini fukking roasted my resume last week. This makes sense.



16/18
@xundecidability
Does arena still limit prompt to 1k tokens?



17/18
@AiAnvil
Don’t poke the bear…



18/18
@BrettBaronR32
More evidence that lmsys is worthless, gemini is ass








1/2
@MSFTResearch
Orca-AgentInstruct, from Microsoft Research, can generate diverse, high-quality synthetic data at scale to post-train and fine-tune base LLMs for expanded capabilities, continual learning, and increased performance.
Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators





2/2
@calebfahlgren
Amazing

microsoft/orca-agentinstruct-1M-v1 · Datasets at Hugging Face






1/1
@itinaicom
🚀 Exciting news from Microsoft AI Research! They've released AgentInstruct-1M-v1, a game-changing dataset featuring **1 million synthetic instruction pairs**. This innovation boosts the capabilities of instruction-tuned LLMs, enhancing performance in va… Microsoft AI Research Released 1 Million Synthetic Instruction Pairs Covering Different Capabilities









1/1
@Marktechpost
Microsoft AI Research Released 1 Million Synthetic Instruction Pairs Covering Different Capabilities

Microsoft Research released a groundbreaking dataset of 1 million synthetic instruction-response pairs, aptly named AgentInstruct-1M-v1. This dataset, generated using the innovative AgentInstruct framework, represents a fully synthetic collection of tasks. Spanning diverse capabilities such as text editing, creative writing, coding, and reading comprehension, this dataset is a significant leap forward in enabling instruction tuning for base language models. By leveraging publicly available web text seeds, Microsoft Research created a corpus that is not only expansive but also representative of real-world use cases.

AgentInstruct-1M-v1 serves as a subset of a larger dataset comprising approximately 25 million instruction-response pairs. Notably, this larger set was instrumental in post-training the Mistral-7b model, culminating in the enhanced Orca-3-Mistral model. These synthetic datasets address the dual problem of scale and diversity, providing a robust foundation for advancing LLM performance across benchmarks....

Read the full article here: Microsoft AI Research Released 1 Million Synthetic Instruction Pairs Covering Different Capabilities

Dataset: microsoft/orca-agentinstruct-1M-v1 · Datasets at Hugging Face

@Microsoft @MSFTnews @MSFTResearch



https://video.twimg.com/ext_tw_video/1858033263501258754/pu/vid/avc1/1152x720/piPPgtLVOkxO64YK.mp4






1/3
@gm8xx8
microsoft/orca-agentinstruct-1M-v1

🔗: microsoft/orca-agentinstruct-1M-v1 · Datasets at Hugging Face

A fully synthetic collection of ~1 million instruction-response pairs generated using the AgentInstruct framework, which creates data from publicly available web text seeds.
> Covers a wide range of tasks including text editing, creative writing, coding, and reading comprehension, making it suitable for instruction tuning of base language models
> Part of a larger set (~25M pairs) used to post-train Mistral-7b. The resulting model, Orca-3-Mistral, shows significant performance gains over Mistral-7b-Instruct across multiple benchmarks, including 40% improvement on AGIEval, 19% on MMLU, 54% on GSM8K, 38% on BBH, and 45% on AlpacaEval.



2/3
@gm8xx8


[Quoted tweet]
AgentInstruct: Toward Generative Teaching with Agentic Flows

paper: arxiv.org/abs/2407.03502v1


3/3
@Crypt0_Facts
Thank you for posting!












1/7
@MaziyarPanahi
Question:
What do you think of the new Orca AgentInstruct dataset by @Microsoft released on @huggingface? Let’s think step by step.

Answer:
Step 1, holly shyt! 💩





2/7
@MaziyarPanahi
microsoft/orca-agentinstruct-1M-v1 · Datasets at Hugging Face



3/7
@MaziyarPanahi
Total number of tokens per subset:





4/7
@MaziyarPanahi
And here is the result for the entire dataset tokenized with the Llama-3.1 tokenizer:

1.1 billion tokens!!! 🔥
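A rough way to reproduce this kind of count with the Hugging Face libraries (a sketch: the split name is assumed to be one of the dataset's capability subsets, the meta-llama tokenizer is gated, and str(row) is a crude stand-in for the dataset's actual message schema; check the dataset card):

from datasets import load_dataset
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
ds = load_dataset("microsoft/orca-agentinstruct-1M-v1", split="creative_content")

# Tokenize each record's raw text and sum the lengths.
total = sum(len(tok(str(row), add_special_tokens=False)["input_ids"]) for row in ds)
print(f"{total:,} tokens in this subset")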





5/7
@BehnamEbrahimi
Thanks Maziyar. Is it only for fine-tuning models?



6/7
@MaziyarPanahi
anytime! yes, these are all for supervised fine-tuning.



7/7
@cybossbt
It looks like solution D does not meet the constraint on Youlanda. No one is sitting between Tara and Uma in D).








1/6
@AhmedHAwadallah
Synthetic data is becoming essential for training and fine-tuning models, but there’s a lot we still need to learn about best practices for generating, evaluating, and using it effectively.

To support this research, we’re excited to release **orca-agentinstruct-1M**—a fully synthetic dataset with 1 million instruction-response pairs. Both prompts and responses were generated by multi-agent flows using LLMs, tools, etc.

The data was created by AgentInstruct, an agentic synthetic-data generation framework. For each skill (e.g., math, RAG, creative writing, etc.), a team of agents iteratively generated and refined both prompts and responses, using raw web documents as seeds.
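A toy sketch of that generate-and-refine loop (llm below is a hypothetical stand-in for a chat-model call; the real AgentInstruct flows use multiple specialized agents and tools):

def llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical stand-in for a chat-model API call")

def agentinstruct_pair(seed_document: str, skill: str, rounds: int = 2):
    # Generate an initial task from raw web text, then iteratively refine it.
    task = llm(f"From this web text, write a {skill} task:\n{seed_document}")
    answer = llm(f"Solve this task:\n{task}")
    for _ in range(rounds):
        critique = llm(f"Critique this task and answer:\n{task}\n{answer}")
        task = llm(f"Rewrite the task to address the critique:\n{critique}\n{task}")
        answer = llm(f"Solve this task:\n{task}")
    return task, answer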

We hope the dataset will be a valuable resource for exploring new methods in synthetic data generation and application.

HF: microsoft/orca-agentinstruct-1M-v1 · Datasets at Hugging Face

#Orca #AgentInstruct #AutoGen

[Quoted tweet]
Orca-AgentInstruct, from Microsoft Research, can generate diverse, high-quality synthetic data at scale to post-train and fine-tune base LLMs for expanded capabilities, continual learning, and increased performance.
msft.it/6017WYRxz




2/6
@AhmedHAwadallah
@Arindam1408, Luciano, Guoqing, Shewti, Dany Andres, Yadong, Wei-ge, @corby_rosset , Hamed, @YashLara



3/6
@Teknium1
Thank you 🫡



4/6
@TomRBeaudoin
Hey @AhmedHAwadallah! Thank you for the release, this is super useful! I have been working on an OSS implementation of AgentInstruct:

GitHub - ThomasRochefortB/open-agentinstruct: An open-source recreation of the AgentInstruct agentic workflow for synthetic data generation

Would love to get in contact with the team!



5/6
@HCSolakoglu
@Teknium1



6/6
@yar_vol
So did you open source the prompts used to generate the data? And the name of the model used? Was it gpt4?










1/3
@HaHoang411
This is huge! The Orca team from @MSFTResearch just released the AgentInstruct dataset - 1M synthetic instruction pairs for AI training.
Key highlights:
- Generated using the AgentInstruct framework and public web content.
- Covers text editing, creative writing, coding & comprehension.





2/3
@HaHoang411
Link to the dataset: microsoft/orca-agentinstruct-1M-v1 · Datasets at Hugging Face



3/3
@HaHoang411
Link to the synthetic data generation framework: GitHub - wang-research-lab/agentinstruct: Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"





1/6
@ClementDelangue
Smol models for the win!



https://video.twimg.com/ext_tw_video/1857481762567303168/pu/vid/avc1/720x720/UFDHzqq_nQjPuxqE.mp4

2/6
@AngelAITalk
Big things come in small packages, no doubt.



3/6
@aiprepper
We're getting closer!



4/6
@gruuummm
I think a 5-10B parameter model is good enough for any particular domain.



5/6
@OverfitForTruth
Claude 3.5 Sonnet, for example, is better than Claude 3 Opus.



6/6
@sourmansa
I love the little models.







1/3
@HaHoang411
How did I miss this?
Awesome work by the @bria_ai_ team! They've just released RMBG 2.0, their new state-of-the-art background removal model. 🔥
Key notes:
- It has only 968M parameters.
- Nearly matches remove.bg's performance (97%) on photorealistic images.



https://video.twimg.com/ext_tw_video/1857359766181355520/pu/vid/avc1/928x720/9v1gr8wfegwRnj-Y.mp4

2/3
@HaHoang411
Official blog: Bria's New State-of-the-Art Remove Background 2.0 Outperforms the Competition



3/3
@HaHoang411
Give it a shot here: BRIA RMBG 2.0 - a Hugging Face Space by briaai





Bria's New State-of-the-Art Remove Background 2.0 Outperforms the Competition

Bria.ai : Nov 12, 2024 9:34:52 AM
Technology

Intro

In the rapidly evolving field of AI-driven image processing, background removal remains one of the most challenging tasks, particularly in workflows that involve complex image compositions.

Bria's RMBG 2.0, an advanced open-source model, enables development teams to streamline and scale processes with precise and reliable background removal. By integrating RMBG 2.0 into their workflows, teams can automate intricate image processing tasks, facilitating scalable content creation across various industries such as e-commerce, image editing, stock media, gaming, and more.


What’s New in RMBG 2.0

Bria's RMBG 2.0 sets a new standard in open-source background removal by overcoming the limitations of RMBG 1.4. While the original model was effective, RMBG 2.0 advances its capabilities to deliver state-of-the-art, industry-leading results that distinguish it from competitors.

RMBG 2.0 provides unmatched precision in background removal with exceptional accuracy and consistency, even in complex scenes. It is available through a Model, API, and iframe for commercial use, with a commercial agreement required. For non-commercial use, it is only accessible through the model.
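
For teams integrating the open model, here is a minimal usage sketch following the pattern on the Hugging Face model card (briaai/RMBG-2.0); the exact preprocessing values below are assumptions and may differ from the card.

```python
# Hedged usage sketch for RMBG 2.0 via transformers; resize/normalize
# values are assumptions based on the model card pattern.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

model = AutoModelForImageSegmentation.from_pretrained(
    "briaai/RMBG-2.0", trust_remote_code=True
)
model.eval()

transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

image = Image.open("input.jpg").convert("RGB")
batch = transform(image).unsqueeze(0)

with torch.no_grad():
    # The model returns multi-scale predictions; the last is the final mask.
    mask = model(batch)[-1].sigmoid().cpu()[0].squeeze()

alpha = transforms.ToPILImage()(mask).resize(image.size)
image.putalpha(alpha)  # keep the foreground, make the background transparent
image.save("output.png")
```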

Impact on Bria API Routes

Separating and layering a "foreground" from a "background" is the foundation of many complex AI tasks. Thus, such an upgrade systematically improves any downstream endpoints utilizing this core function.

[Example outputs: a watch display in a luxury store with people in the background, and a fluffy white cat sitting on a glass table with a slight reflection, each shown after RMBG 2.0 background removal.]

See How It Compares to the Competition

See how our new state-of-the-art 2.0 Background Removal capability consistently outperforms competitors like BiRefNet, Photoshop, and Remove.bg, setting a new benchmark for quality and commercial readiness.

We tested these capabilities and measured their outputs for commercial readiness and quality of results.

[Comparison image: three top industry-leading open-source Remove Background models.]


Competitor Overview


  • Bria RMBG 2.0: offers Remove Background 2.0 as an open-source model, API, and iFrame.
  • BiRefNet: an open-source model, the latest open SOTA.
  • Adobe Photoshop: Photoshop Background Removal offers an interface and API.
  • Remove.bg: a company specializing in background removal, available via a web app and an API.

Learn More about Bria’s AI Image Editing Products


Methodology

To evaluate RMBG 2.0 against other leading background removal models, we benchmarked its performance on diverse images, assessing each model's output for commercial readiness. Results were rated on a four-point scale: Very Bad for outputs with significant artifacts, Bad for conceptually accurate but unusable results, Good for outputs needing minor adjustments, and Very Good for near-perfect, ready-to-use segmentation.

Our test included diverse subject compositions (e.g., people, objects, animals, text) and photorealistic and non-photorealistic images to gauge versatility. We used images with simple, complex, and transparent backgrounds to test precision and analyzed foreground complexity with single and multiple elements.

This approach gave us a comprehensive view of how RMBG 2.0 performs across various real-world scenarios, solidifying its position as a state-of-the-art model for consistent, high-quality background removal.
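
As a toy illustration of the "usable results" metric used in the benchmarking below (a rating of Good or Very Good counts as a success; the example ratings here are invented):

```python
# Toy sketch of the success-rate metric: share of benchmark images rated
# "Good" or "Very Good". The example ratings are invented for illustration.
def success_rate(scores: list[str]) -> float:
    usable = sum(1 for s in scores if s in ("Good", "Very Good"))
    return 100.0 * usable / len(scores)


print(success_rate(["Very Good", "Good", "Bad", "Very Good"]))  # 75.0
```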

Benchmarking Results

[Benchmark result slides: Models' Success Rate; Photorealistic vs. Illustrative Backgrounds; Model Consistency; Solid vs. Complex Backgrounds.]

This graph illustrates the percentage of benchmarks in which each model achieved usable results, defined as a score of "Good" or "Very Good."

[Graph: percentage of usable results per model.]

  • BiRefNet: Bria's new RMBG 2.0 model outperforms the current open-source SOTA, BiRefNet, by a significant margin (90% vs. 85%), positioning RMBG 2.0 as the new state-of-the-art in open-source background removal.

  • Adobe Photoshop: Bria's RMBG 2.0 surpasses Adobe Photoshop's background removal capabilities by a large margin (90% vs. 46%).

  • Remove.bg: Our model delivers competitive results against leading commercial solutions like remove.bg (90% vs. 97%). In the photorealistic use case, our primary focus, RMBG 2.0 achieved 92% accuracy compared to remove.bg's 97%.

  • RMBG 1.4: The new RMBG 2.0 model dramatically improves upon our previous RMBG 1.4 model (90% vs. 74%). This upgrade addresses critical issues in RMBG 1.4, with marked improvements in consistency, reduced indecisiveness, and significantly fewer misclassifications.



Conclusion

The comprehensive benchmarking of RMBG 2.0 demonstrates its remarkable advancement in background removal, positioning it as the new state-of-the-art solution among open-source models. RMBG 2.0 consistently outperforms other models, particularly in handling complex backgrounds, where it excels beyond BiRefNet and Adobe Photoshop. Its high accuracy and reliability, especially in photorealistic scenarios, make it highly competitive with commercial solutions like remove.bg.

RMBG 2.0 addresses critical issues in earlier models, such as inconsistent output and indecisiveness in complex scenes. This upgrade enhances the precision and robustness of Bria's background removal capabilities and delivers commercial-grade results suitable for diverse real-world applications. Through its open-source availability, RMBG 2.0 empowers development teams with a reliable, scalable tool for high-quality background removal, setting a new benchmark in the industry.


  • Browse our API documentation
  • Demo Bria's Remove Background 2.0 capability in our console
  • Demo Bria's Remove Background 2.0 capability on Hugging Face
  • Learn more in our Hugging Face model card