1/10
Announcing RecurrentGemma!
github.com/google-deepmind/recurrentgemma: Open weights language model from Google DeepMind, based on Griffin.
- A 2B model with open weights based on Griffin
- Replaces transformer with mix of gated linear recurrences and local attention
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
2/10
Building on ideas from SSMs and LSTMs, Griffin matches transformer performance without global attention, achieving faster inference on long sequences.
https://arxiv.org/abs/2402.19427
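For intuition, here is a deliberately simplified sketch of a gated linear recurrence in JAX. The parameter names, gating and decay parameterisation below are illustrative choices of mine, not the exact recurrent layer from the paper or the released code:

```python
# Simplified gated linear recurrence, loosely following the Griffin paper.
# Illustrative only; not the released RecurrentGemma layer.
import jax
import jax.numpy as jnp

def gated_linear_recurrence(x, w_r, w_i, log_lambda, c=8.0):
    """x: [seq_len, dim] inputs -> [seq_len, dim] hidden states."""
    r = jax.nn.sigmoid(x @ w_r)                # recurrence gate in (0, 1)
    i = jax.nn.sigmoid(x @ w_i)                # input gate in (0, 1)
    a = jax.nn.sigmoid(log_lambda) ** (c * r)  # per-channel, per-step decay in (0, 1)
    gated_x = i * x

    def step(h, inputs):
        a_t, gx_t = inputs
        h = a_t * h + jnp.sqrt(1.0 - a_t ** 2) * gx_t  # decay old state, inject gated input
        return h, h

    _, hs = jax.lax.scan(step, jnp.zeros(x.shape[-1]), (a, gated_x))
    return hs

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (16, 8))
w_r = 0.1 * jax.random.normal(k2, (8, 8))
w_i = 0.1 * jax.random.normal(k3, (8, 8))
print(gated_linear_recurrence(x, w_r, w_i, log_lambda=jnp.ones(8)).shape)  # (16, 8)
```

In the full model, blocks built around a recurrence like this are interleaved with local (sliding-window) attention rather than global attention.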
See @sohamde_'s great thread for more details:
3/10
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params!
https://arxiv.org/abs/2402.19427
My co-authors have already posted about our amazing results, so here's a thread on how we got there!
4/10
In RecurrentGemma, we provide two 2B model checkpoints:
- A pretrained model, trained for 2T tokens
- An instruction-tuned model for dialogue
We train on the same data as Gemma-2B, but for fewer tokens, and achieve comparable performance.
Technical report:
https://storage.googleapis.com/deepmind-media/gemma/recurrentgemma-report.pdf
5/10
We provide efficient JAX code for RecurrentGemma, which can also be used for general Griffin models.
This includes a memory-efficient implementation of the linear recurrence in Pallas, with which we match the training speed of transformers on TPU.
github.com/google-deepmind/recurrentgemma
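Why a linear recurrence can be implemented efficiently: the update h_t = a_t * h_{t-1} + b_t is associative, so it can be computed with a scan. The released implementation does this in a custom Pallas kernel; purely as an illustration of the associativity, the same computation can be written with jax.lax.associative_scan (this is not the RecurrentGemma kernel):

```python
import jax
import jax.numpy as jnp

def linear_recurrence(a, b):
    """All h_t for h_t = a_t * h_{t-1} + b_t, with h_{-1} = 0.  a, b: [seq, dim]."""
    def combine(left, right):
        a_l, b_l = left
        a_r, b_r = right
        # Composing two affine updates h -> a*h + b gives another affine update.
        return a_l * a_r, a_r * b_l + b_r
    _, h = jax.lax.associative_scan(combine, (a, b), axis=0)
    return h

a = jnp.full((8, 2), 0.9)                             # per-step decays
b = jax.random.normal(jax.random.PRNGKey(0), (8, 2))  # gated inputs
print(linear_recurrence(a, b).shape)                  # (8, 2)
```

The memory efficiency of the Pallas kernel, rather than the scan itself, is what the training-speed parity on TPU comes from.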
6/10
We also provide a reference implementation in PyTorch, though we recommend the JAX code for best performance.
We hope RecurrentGemma provides an alternative to pre-trained transformers, and enables researchers to explore the capabilities of this exciting new model family.
7/10
A huge thank you to our brilliant team, especially @botev_mg, @sohamde_, Anushan Fernando, @GeorgeMuraru, @haroun_ruba and @LeonardBerrada
Finally, I'm sure there will be a few rough edges in the initial release; we will try to address these as fast as we can!
8/10
This is quite a complicated question to answer!
There is a way to efficiently expand the RNN width without slowing down training. But to do this you need to wrap the entire recurrent block in a single Pallas/CUDA kernel.
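To make the fusion idea concrete, here is a crude sketch in plain JAX (not an actual Pallas/CUDA kernel, and not the RecurrentGemma block): the gate projection and the state update are computed inside a single scan body, which is the shape of computation a fused kernel can keep in fast on-chip memory even when the RNN state is wider than the model width.

```python
# Illustrative fusion sketch only; a real implementation would be a handwritten kernel.
import jax
import jax.numpy as jnp

def fused_recurrent_block(x, w_r, w_in, decay):
    """x: [seq, d_model]; w_r, w_in: [d_model, d_rnn]; decay: [d_rnn] values in (0, 1)."""
    def step(h, x_t):
        r = jax.nn.sigmoid(x_t @ w_r)  # recurrence gate, computed inside the scan body
        u = x_t @ w_in                 # input projection up to the wider RNN state
        a = decay ** r                 # per-channel decay for this step
        h = a * h + (1.0 - a) * u      # simplified state update
        return h, h
    _, hs = jax.lax.scan(step, jnp.zeros(w_in.shape[-1]), x)
    return hs

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (32, 4))          # d_model = 4
w_r = 0.1 * jax.random.normal(k2, (4, 16))  # d_rnn = 16, wider than d_model
w_in = 0.1 * jax.random.normal(k3, (4, 16))
print(fused_recurrent_block(x, w_r, w_in, decay=jnp.full((16,), 0.9)).shape)  # (32, 16)
```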
9/10
We looked into the long-context performance of Griffin in the paper:
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
We found Griffin performs well on some long-context tasks (e.g. extrapolation/next-token loss on very long sequences). However, transformers are better at needle-in-a-haystack retrieval tasks.
10/10
Like Gemma, RecurrentGemma was trained on sequences of 8k tokens, although we also found in the Griffin paper (arXiv:2402.19427) that we can extrapolate beyond the training sequence length for some tasks.
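One way to see why sampling long sequences is cheap: the per-sequence state a Griffin-style model carries is bounded (a fixed recurrent state plus a bounded local-attention cache), whereas a global-attention transformer's KV cache grows linearly with sequence length. A back-of-the-envelope comparison with made-up dimensions (not RecurrentGemma's actual config):

```python
# Illustrative only: all dimensions below are invented, not RecurrentGemma's config.
def kv_cache_floats(seq_len, n_layers, n_heads, head_dim):
    # Global attention: keys + values cached for every past token in every layer.
    return 2 * seq_len * n_layers * n_heads * head_dim

def bounded_state_floats(window, n_layers, n_heads, head_dim, d_rnn):
    # Griffin-style: local-attention cache capped at the window size,
    # plus a fixed-size recurrent state per layer.
    return 2 * window * n_layers * n_heads * head_dim + n_layers * d_rnn

for seq_len in (2_048, 8_192, 65_536):
    print(seq_len,
          kv_cache_floats(seq_len, n_layers=26, n_heads=8, head_dim=256),
          bounded_state_floats(window=2_048, n_layers=26, n_heads=8, head_dim=256, d_rnn=2_560))
```

The KV-cache column keeps growing with sequence length, while the bounded-state column stays constant.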