bnew


1/1
SambaNova already outpaces Databricks DBRX


@SambaNovaAI released Samba-CoE v0.2 LLM and it's already leaving the competition in the dust.

The model is doing more with less.









1/7
Excited to announce Samba-CoE v0.2, which outperforms DBRX by @DbrxMosaicAI and @databricks, Mixtral-8x7B from @MistralAI, and Grok-1 by @grok at a breakneck speed of 330 tokens/s.


These breakthrough speeds were achieved without sacrificing precision and on only 8 sockets, showcasing the true capabilities of dataflow! Why would you buy 576 sockets and drop to 8-bit precision when you can run at 16 bits on just 8 sockets? Try out the model and check out the speed here - Streamlit.

We are also providing a sneak peek of our next model, Samba-CoE v0.3, available soon with our partners at @LeptonAI. Read more about this announcement at SambaNova Delivers Accurate Models At Blazing Speed

2/7
Extending the methodology used to create Samba-CoE v0.1, these models are built on top of open-source models in Samba-1 and Sambaverse (Find the best open source model for your project with Sambaverse) using a unique approach towards ensembling and model merging.

3/7
This model outperforms Gemma-7B from @GoogleAI and @GoogleDeepMind, Mixtral-8x7B from @MistralAI, Llama2-70B from @AIatMeta, Qwen-72B from the @AlibabaGroup Qwen team, Falcon-180B from @TIIuae, and BLOOM-176B from @BigscienceW.

4/7
The expert models are all open source; the routing strategy has not been open-sourced yet. Much more information to follow in the coming weeks.

5/7
@mattshumer_ @EvanKirstel @_akhaliq @rasbt @pmddomingos @emollick @GaryMarcus

6/7
@ylecun @mattmayo13 @alliekmiller @ValaAfshar @Andrew @rowancheung

7/7
The expert models are all open source; the routing strategy has not been open-sourced yet. Much more information to follow in the coming weeks.
 

bnew

The NBA's Indiana Pacers used Snapchat AI filters to make it look like Los Angeles Lakers fans were crying during the game.

 

bnew




1/2
We are releasing our first step in validating and independently confirming the claims of the Bitnet paper, a 1B model trained on the first 60B tokens of the Dolma dataset.

Comparisons made on the @weights_biases charts below are between the Bitnet implementation and a full FP16 run (all hyperparameters equivalent).

Model: NousResearch/OLMo-Bitnet-1B · Hugging Face
Weights & Biases: OLMo-Bitnet

2/2
This work is to independently validate and reproduce the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"

Paper available here:
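
For readers curious what "1.58 bits" means in practice, here is a minimal sketch of the "absmean" ternary quantizer described in the BitNet b1.58 paper: each weight tensor is scaled by its mean absolute value, rounded, and clipped to {-1, 0, +1}. This is an illustrative NumPy version under those assumptions, not the OLMo-Bitnet-1B training code.

```python
# Hedged sketch of the BitNet b1.58 absmean quantizer: weights become
# ternary values in {-1, 0, +1} with a single per-tensor scale.
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    scale = np.abs(w).mean() + eps                 # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)      # ternary weights
    return w_q, scale                              # dequantize as w_q * scale

w = np.random.randn(4, 4) * 0.02
w_q, scale = absmean_ternary_quantize(w)
print(w_q)                                         # entries are -1.0, 0.0 or 1.0
```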







 

bnew






1/5
Mistral is basically the leader in open-source LLMs. Mixtral is a very good model for fine-tuning; if you don't have the resources, Mistral 7B is pretty much the best starting point in my experience.

2/5
The license for Mistral models is much more permissive, allowing the outputs to be used with other models.

3/5
You can get pretty far with <= 10k samples using Mixtral, unless it's data in a specific language.

4/5
Sounds like pruning, but as far as I know it wouldn't work the way you are referring to.

5/5
It's useful when it makes sense, which is generally not the case for an MVP or early products just reaching product-market fit.







1/5
I ran the new #DBRX Instruct model through 4 benchmarks that have high correlation with the @lmsysorg Chatbot Arena and measure different capabilities:

MT Bench: a multi-turn chat benchmark that uses GPT-4 as a judge. Known to suffer from length bias & is somewhat noisy, but is still a rough proxy for "chattiness" that is cheap to run.

IFEval: a clever benchmark from Google which contains ~500 "verifiable instructions" like "write a poem about bricks, with less than 100 words and use no commas" that can be checked with string parsing. Avoids the issues with LLM-judge benchmarks and mostly measures instruction following aka "helpfulness".

BBH: a set of 23 hard tasks from the Big-Bench eval suite, targeting things like causal judgement from stories, navigation, and, humorously, questions about penguins on a table. Popularised by @Teknium1 and @NousResearch in training models like OpenHermes.

AGIEval: a benchmark focused on human-knowledge exams like the SAT and math competitions. Also popularised by @Teknium1 and many of the Chinese LLMs.

Overall we can see DBRX-Instruct is a very strong model, but the difference compared to Mixtral-Instruct is not large and only on AGIEval does DBRX do better.

Of course, these benchmarks don't portray a complete picture of model capabilities (e.g. code), but I do find it somewhat surprising that DBRX is not significantly better than Mixtral, which has far fewer params.
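
To make the IFEval-style "verifiable instructions" above concrete, here is a minimal sketch of how such a constraint can be checked with plain string parsing; the specific checks are illustrative assumptions, not the official IFEval harness.

```python
# Hypothetical checks for the example instruction "write a poem about bricks,
# with less than 100 words and use no commas" - no LLM judge required.
def check_response(response: str) -> dict:
    words = response.split()
    return {
        "under_100_words": len(words) < 100,
        "no_commas": "," not in response,
        "mentions_bricks": "brick" in response.lower(),
    }

print(check_response("A short poem about bricks with no commas at all."))
```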

2/5
All of these evals were run with LightEval - the internal suite we use at Hugging Face for evaluating LLMs

A big thank you to @clefourrier and @nathanhabib1011 for putting up with my endless feature requests!

Lib: GitHub - huggingface/lighteval: LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

3/5
Sure!

4/5
Very nice! That suggests the difference with Mixtral Instruct is largely due to the fine-tuning recipe - looking forward to seeing the community fine-tunes bear this out :smile:

5/5
Yes, I think it's definitely interesting to tune the model and see if the current perf in DBRX Instruct is due to an "alignment tax" from human feedback that nerfs capabilities

Unfortunately, the modelling code still needs some work as I'm hitting many issues fine-tuning like…
 

bnew





1/4
A 7B-parameter model that beats ChatGPT-3.5, Mixtral, Gemini Pro, and some of the best 30B and 70B models. Isn't this exciting? It means you can squeeze much more capability per parameter if you know what you are doing.

2/4
The Elo leaderboard is the result of pairwise blind tests by ordinary users.
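
For context, here is a minimal sketch of how an Elo-style leaderboard falls out of pairwise blind votes. The K-factor and starting rating are illustrative assumptions, not the Chatbot Arena's exact computation.

```python
# Minimal Elo update from pairwise wins/losses. K and START are assumptions.
K, START = 32, 1000

def update_elo(ratings: dict, winner: str, loser: str) -> None:
    ra, rb = ratings.get(winner, START), ratings.get(loser, START)
    expected_win = 1 / (1 + 10 ** ((rb - ra) / 400))   # expected score for the winner
    ratings[winner] = ra + K * (1 - expected_win)
    ratings[loser] = rb - K * (1 - expected_win)

ratings = {}
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
for winner, loser in votes:
    update_elo(ratings, winner, loser)
print(ratings)   # higher rating = preferred more often in blind comparisons
```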

3/4
ELO

4/4
I have statistically backed information. You have an opinion. Hmmm, hard to choose.
 

bnew







1/6
Are medical studies being written with ChatGPT?

Well, we all know ChatGPT overuses the word "delve".

Look below at how often the word 'delve' is used in papers on PubMed (2023 was the first full year of ChatGPT).
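
For anyone who wants to reproduce a chart like this, here is a hedged sketch that counts PubMed records per year matching "delve" via the public NCBI E-utilities esearch endpoint; the year range and field tags are illustrative assumptions.

```python
# Count PubMed hits per year for the word "delve" via NCBI E-utilities.
# Query syntax ([Title/Abstract], [dp] for publication date) follows the
# public PubMed conventions; adjust the year range as needed.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

counts = {}
for year in range(2018, 2024):
    params = {
        "db": "pubmed",
        "term": f"delve[Title/Abstract] AND {year}[dp]",
        "retmode": "json",
    }
    data = requests.get(ESEARCH, params=params, timeout=30).json()
    counts[year] = int(data["esearchresult"]["count"])

print(counts)  # yearly counts, ready to plot
```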

2/6
"Mosaic" is another one that comes up way too frequently.

Someone told me that the word "notable" is a dead giveaway of GPT

—I'm so bummed, because I use "notable" to describe interesting parts when I'm writing up studies all the time.

3/6
Thanks, William!

Where did you get the economics data from?

Yep, I used code interpreter to make my chart, too.

4/6
I'm definitely not saying it's a bad thing.

Full disclaimer: every single paper I've submitted this year has a statement of disclosure that I've used GPT-4 in editing words, sometimes for coding assistance

—and that if there are errors, they are my responsibility.

5/6
Good call.

I'll see if I can do that chart tomorrow

6/6
Completely agree.

Just got the count of papers data

—I'll post the normalised chart as soon as I finish handling some other things
 

bnew


1/1
This is how Tim Dettmers, Artidoro Pagnoni, et al. created QLoRA.

Original paper: [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs
Blog: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

QLoRA enables 4-bit quantization plus training and has been one of the most impactful papers of 2023.
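
As a rough illustration of the recipe described in the paper and blog above, the sketch below loads a base model in 4-bit NF4 via bitsandbytes and attaches trainable LoRA adapters with peft. The model name and LoRA hyperparameters are placeholder choices, not the paper's exact settings.

```python
# Hedged QLoRA-style setup: 4-bit NF4 quantized base model + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # computation happens in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",            # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
model.print_trainable_parameters()
```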
 

bnew












1/11
What are the LLMs with the most output tokens these days?

GPT-4 and Claude 3 are both 4,096. Gemini Pro 1.5 is 8,192.

This really matters for structured data extraction: even with 1M input tokens, you can't scrape a big webpage into a CSV file if you run out of output tokens.

2/11
An interesting trick that does work: you can send a prompt requesting "more" and have the LLM pick up again where it stopped.

That requires round-tripping the work it has done so far, but with a long enough context window (and a willingness to spend the money) it's quite feasible.
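
A minimal sketch of that round-tripping loop with the OpenAI Python client follows; the model name, the extraction prompt, and the cap on round trips are assumptions.

```python
# Keep asking the model to continue until it stops for a reason other than
# hitting the output-token limit, then stitch the chunks back together.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Extract every row of this page as CSV: ..."}]
chunks = []

for _ in range(10):  # cap the number of round trips
    resp = client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    choice = resp.choices[0]
    chunks.append(choice.message.content)
    if choice.finish_reason != "length":  # finished before the output cap
        break
    # Round-trip the partial output and ask the model to pick up where it stopped.
    messages.append({"role": "assistant", "content": choice.message.content})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_output = "".join(chunks)
```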

3/11
The bigger problem here is a usability one: explaining to end users why their extracted data was randomly cut off halfway through (or risking them not noticing) isn't great.

4/11
Yes, that's a very real risk. I'm starting to try and push the limits of what makes sense to pipe through these things

5/11
The documentation says it should cut off at 4096 tokens of output, but I haven't stress tested it myself yet

6/11
Do you know where they document their output token limit? I can't find that for any of their models

7/11
That's input though - the docs say output for gpt-4-turbo is limited to 4096

8/11
In data journalism the use-cases are mainly around structured data extraction and other forms of transformation

Inputting a 5MB HTML file to output a ~3MB CSV or JSON file for example

Also things like "translate this report from Spanish to English" where the report might be >4k

9/11
Whoa, is that a documented feature? That's really useful

10/11
The HTML thing was really just an illustrative example - the general challenge is that there are plenty of text-extraction tasks where the output is > 8,192 tokens, so the more output tokens we can have, the easier these things are to put into practice.

11/11
Definitely founded - I've had particular trouble getting table data out of screenshots of tables, but I don't trust it very much at all yet
 

bnew


1/1
Presenting Starling-LM-7B-beta, our new cutting-edge 7B language model fine-tuned with RLHF!

Also introducing Starling-RM-34B, the workhorse reward model behind Starling-LM-7B-beta, ranking #1 in the latest RewardBench from @natolambert and the @allenai_org team.

HuggingFace links:
[Starling-LM-7B-beta] Nexusflow/Starling-LM-7B-beta · Hugging Face
[Starling-RM-34B] Nexusflow/Starling-RM-34B · Hugging Face

Discord link: Join the Nexusflow Discord server!

RewardBench from @allenai_org: Reward Bench Leaderboard - a Hugging Face Space by allenai

Since the release of Starling-LM-7B-alpha, we've received numerous requests to make the model commercially viable. Therefore, we're licensing all models and datasets under Apache-2.0, with the condition that they are not used to compete with OpenAI. Enjoy!
 

bnew


1/1
An Apache 2.0-licensed dataset for LLM pretraining: 30.4T tokens of deduplicated documents. Languages: English, German, French, Italian, Spanish.
 

bnew










1/9
I tested Claude 3 Opus on one of the problems on the hardest software engineering benchmark for AI — real Github issues.

It took ~4 mins with 37.5k input tokens and 2.8k output tokens to *mostly* solve it, with only minor hiccups.

This changes software development.


2/9
Let's unpack the benchmark (SWE-Bench) first.

Devin's 13% beat the former leader, Claude 2, which scored 4% on the 2,294 problems in the benchmark.

These problems (test set) come from real-world Github issues of the following open-source repos:


3/9
I looked at issue #1834 in sqlfluff/sqlfluff, a SQL linter: adding quiet mode.

The benchmark is supplemented with a princeton-nlp BM25 retrieval dataset on Hugging Face, which adds the right file context for the change and assembles it into a huge prompt.


4/9
That prompt contains
- the text in the Github issue
- full text of top 5 relevant files from the repo (that number varies)
- natural language prompting "I need you to solve this issue.."

Here's the final prompt we feed into Opus.
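
As a rough sketch of what such a prompt assembly can look like (the tag names and instruction wording here are assumptions, not the exact princeton-nlp BM25 dataset format):

```python
# Hypothetical prompt builder: issue text + retrieved file contents + instruction.
def build_prompt(issue_text: str, retrieved_files: dict[str, str]) -> str:
    parts = ["<issue>", issue_text, "</issue>"]
    for path, content in retrieved_files.items():      # e.g. the top-5 BM25 hits
        parts += [f'<file path="{path}">', content, "</file>"]
    parts.append("I need you to solve this issue. Respond with a unified diff patch.")
    return "\n".join(parts)

prompt = build_prompt(
    "Enable quiet mode/no-verbose in CLI for use in pre-commit hook",
    {"src/sqlfluff/cli/commands.py": "..."},            # file contents elided
)
```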


5/9
It does generate a valid patch, but it's... corrupt.

The line numbers are wrong. The .diff can't be applied... unless you ignore them and align by matching the code context!

It produces a different solution from the real-world one, PR #4764! Here's a sample (of 317 lines).
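
The "ignore the line numbers and align by code context" trick amounts to something like the following sketch: search the file for the hunk's original lines and splice in the replacement, regardless of what the hunk header claims. This is a simplified illustration, not a full unified-diff parser.

```python
# Apply one hunk by matching its context block instead of trusting line numbers.
def apply_hunk_by_context(file_lines: list[str],
                          old_block: list[str],
                          new_block: list[str]) -> list[str]:
    n = len(old_block)
    for i in range(len(file_lines) - n + 1):
        if file_lines[i:i + n] == old_block:            # found the context in the file
            return file_lines[:i] + new_block + file_lines[i + n:]
    raise ValueError("context not found; hunk cannot be applied")
```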


6/9
It was only when I dug into the dataset that I appreciated the difficulty of the task, and just how far we've come.

SQLFluff is ~100k lines of Python. The real patch was +132 -39 lines and Claude did +93 -34.

It didn't update tests, but it caught most of the call sites!


7/9
A change like this would've taken a normal developer close to 2-8hrs and Claude 3 Opus just cranked out a reasonable fix in ~4mins.

This is pretty incredible to see when you get into the weeds of it.

We're about to see a major shift in the way software is built!


8/9
Links:
SWE-Bench: https://swebench.com
Dataset: princeton-nlp/SWE-bench_bm25_40K · Datasets at Hugging Face
Example issue: Enable quiet mode/no-verbose in CLI for use in pre-commit hook · Issue #1834 · sqlfluff/sqlfluff
Real-world fix:

9/9
There aren't many VCs who are engineers.

 

bnew







1/6
Google just dropped SIMA, and it's insane

It's literally ChatGPT for video games.

Here are 5 features of SIMA you don't want to miss

2/6
Versatile AI agent

Meet SIMA, the AI that's mastering video games by understanding natural language instructions.

3/6
SIMA stands out by learning to perform tasks in various 3D environments, proving AI can be more versatile and adaptable than ever before.

4/6
Learning from video games

SIMA was evaluated across 600 basic skills, spanning navigation, object interaction, and menu use.

5/6
SIMA’s performance relies on language.

In a control test without language training, the agent acts appropriately but aimlessly. Instead of following instructions, it might just gather resources.

6/6
Collaboration with eight game studios

DeepMind made this revolutionary research possible in collaboration with eight game studios.
 

bnew










1/9
Announcing 𝐕𝐨𝐢𝐜𝐞𝐂𝐫𝐚𝐟𝐭

SotA for both speech editing and zero-shot text-to-speech, outperforming VALL-E, XTTS-v2, etc.

VoiceCraft works on in-the-wild data such as movies, random videos and podcasts

We fully open source it at VoiceCraft

2/9
𝐕𝐨𝐢𝐜𝐞𝐂𝐫𝐚𝐟𝐭 works well on recordings with diverse accents, emotions, styles, content, background noise, and recording conditions.

Demo: VoiceCraft

Paper: https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf

Code, model, data:

3/9
No training needed - to clone or edit a voice, it only needs a 3-second reference of that voice at inference time.

4/9
Yea let’s do that!

5/9
will be available by the end of March

6/9
Weights will be available by the end of March

7/9
We evaluated VoiceCraft on internet videos and podcasts, which contain diverse accents, and the model handles them pretty well. Check out examples at VoiceCraft

8/9
We have been discussing the licensing issue and might change it in the coming days.

9/9
Thanks! Have been discussing the licensing issue, might change it in the coming days

 

bnew


Tennessee Makes A.I. an Outlaw to Protect Its Country Music and More​

Gov. Bill Lee on Thursday signed a first-in-the-nation bill to prevent the use of artificial intelligence to copy a performer’s “voice.”

Gov. Bill Lee took the stage at Robert’s Western World in Nashville to sign legislation offering new protections from A.I. Credit: Jason Kempin/Getty Images for Human Artistry


By Emily Cochrane

Reporting from a Nashville honky-tonk

March 21, 2024, 7:12 p.m. ET

The floor in front of the stage at Robert’s Western World, a beloved lower Broadway honky-tonk in Nashville, was packed on Thursday afternoon.

But even with the country music superstar Luke Bryan and multiple other musicians on hand, the center of attention was Gov. Bill Lee and his Elvis Act.

And Mr. Lee did not disappoint, signing into law the Ensuring Likeness, Voice and Image Security Act, a first-in-the-nation bill that aims to protect musicians from artificial intelligence by adding penalties for copying a performer’s “voice” without permission.

“There are certainly many things that are positive about what A.I. does,” Mr. Lee told the crowd. But, he added, “when fallen into the hands of bad actors, it can destroy this industry.”

Luke Bryan snapped a selfie with State Representative William Lamberth and Governor Lee. Credit: Jason Kempin/Getty Images for Human Artistry

The use of A.I. technology — and its rapid fire improvement in mimicking public figures — has led several legislatures to move to tighten regulations over A.I., particularly when it comes to election ads. The White House late last year imposed a sweeping executive order to push for more guardrails as Congress wrestles with federal regulations.

But since this is Tennessee, the focus was unsurprisingly on the toll it could take on musicians in Nashville, Memphis and beyond. Mr. Lee’s office said that the music industry generates billions of dollars for the state and supports more than 61,000 jobs and upward of 4,500 venues.

Several leading musicians, recording industry groups and artists alliances rallied around the bill this year, warning about the dire consequences of A.I.

“I’ve just gotten to where stuff comes in of my voice, on my phone, and I can’t tell it’s not me,” Mr. Bryan said on Thursday, adding that “hopefully this will curb it, slow it down.”

Chris Janson, a country singer and songwriter who recounted the time he spent working gigs on lower Broadway, the area downtown where many of the city’s honky-tonks are concentrated, told lawmakers and supporters that “we are grateful for you guys protecting, and you ladies protecting, our community, our artist community.”

Tennessee first intervened to protect an artist’s name, image and likeness with a 1984 law, which came as the Presley estate was battling in court to control how the musical legend’s name and likeness could be used commercially after his death. The version signed into law Thursday adds to that measure and will take effect July 1.

The new law passed through the legislature unanimously, a remarkable feat for a rancorous body that has spent weeks fighting — at one point, almost literally — over the smallest of slights and policy changes.

The decision to hold a bill signing at a honky-tonk was a first for many there, and it was an unusual scene for Mr. Lee, a more reserved public figure whose suited security detail visibly startled a couple of tourists outside the venue.

Inside, fried bologna sandwiches — the cornerstone of the Robert’s $6 recession special — sizzled on the stovetop as Mr. Lee spoke. Republicans and Democrats alike sported “ELVIS Act” pins and applauded when Mr. Lee and top Republicans received framed platinum records recognizing the act’s signing.

State Senator Jack Johnson, the majority leader, reminisced about celebrating his bachelor party at Robert’s, while Mr. Lee described a fondness for incognito date nights with his wife to listen to some music. And State Representative Justin Jones, a top Democratic foe of the Republican supermajority, later posted photos of the event on Instagram with the note that it feels good to have a bill “that’s not complete trash.”

The legislation’s broad definitions, however, have given some lawyers pause about whether it could inadvertently limit certain performances, including when an actor is playing a well-known artist. The law also makes a person liable for civil action if an audio recording or a reproduction of a person’s likeness was knowingly published without authorization.

Voice, under the law, is defined as a sound in a recording or other medium that is “readily identifiable and attributable to a particular individual,” whether the record contains a person’s voice or a simulation.

Those concerns led to some changes in the bill to create an exemption for such audiovisual representations unless they give “the false impression that the work is an authentic recording.”

And given the broad definition of voice, one legal expert wondered, what would this mean for tribute bands, or the men who have perfected an Elvis impersonation?

“It’s not what the bill is intended to do, but when a law is drafted in a way that allows people to make mischief with it, mischief tends to follow,” said Joseph Fishman, a professor of law at Vanderbilt University.

But Mr. Fishman emphasized that even if the measure requires some further tailoring in the coming years, it remained “a well intentioned bill that does do a lot of good.”

Ben Sisario contributed reporting.
 

bnew


AI and society

Apr 1, 2024


Deepmind chief doesn't see AI reaching its limits anytime soon - but still warns against hype​





Matthias Bastian

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.



AI is both overrated and underrated, says Deepmind founder Demis Hassabis.

According to Hassabis, the massive hype and huge sums of money currently being invested in AI are obscuring scientific progress.

The head of AI research at Google told the Financial Times that the billions being poured into AI start-ups and products "brings with it a whole attendant bunch of hype and maybe some grifting and some other things that you see in other hyped-up areas, crypto or whatever."

Hassabis expects many AI startups to fail because they will not be able to meet the technology's enormous demands on computing power. Even experiments with new architectures and techniques would have to be massive to keep up at the top.

"There doesn't seem to be any limit to how far you can push them [the models]. So one has to push that as hard as possible," Hassabis says of the massive AI models currently being researched.

This is difficult for small startups trying to build a business and a product at the same time, and Hassabis expects consolidation in the industry as a result.

"In a way, AI's not hyped enough but in some senses it's too hyped. We're talking about all sorts of things that are just not real," he says.

AI could lead to a new "golden era"

Despite the hype, Hassabis believes the potential of AI is far from exhausted: "We’re at the beginning, maybe, of a new golden era of scientific discovery, a new Renaissance," says the Deepmind founder.

The company has developed AlphaFold, an AI system for science, and is bringing this progress to the market in the form of new drugs with the startup Insilico and Google.


Hassabis puts the chances of an artificial general intelligence (AGI) in the next ten years at 50 percent, even if it still needs "one or two" decisive breakthroughs driven by scientific methodology. "I wouldn't be surprised if it happened in the next decade," he said.

When asked by The Sunday Times about reports that Apple is in talks to use Google's Gemini AI model in its smartphones, Hassabis declined to comment specifically.

However, he noted that "Google historically has had many very deep partnerships, from hardware to software products, and I expect that to continue."


Summary
  • Demis Hassabis, founder of Deepmind, believes that AI is both overrated and underrated. The hype and big investments have obscured real scientific progress. Many start-ups are likely to fail due to the enormous computational requirements.
  • According to Hassabis, AI could be the beginning of a golden age of scientific discovery. With AlphaFold and the startup Insilico, Deepmind wants to use this to develop new drugs.
  • Hassabis did not specifically comment on reports of a possible partnership between Apple and Google to use the Gemini AI model. But he emphasized Google's tradition of deep partnerships, from hardware to software, which he expects to continue in the future.
Sources: Financial Times, The Sunday Times. Image: George Gillams, Flickr, CC BY-SA 4.0
 