bnew · Veteran · Joined Nov 1, 2015 · Messages: 51,753 · Reputation: 7,916 · Daps: 148,556




1/5
Retraining generative models solely on their own synthetic data leads to model collapse. But what if the data was curated?

With @Qu3ntinB, @bose_joey, @gauthier_gidel we show that retraining on curated data implicitly optimizes for a reward model! 🚀

https://arxiv.org/pdf/2407.09499

2/5
Numerous generative models (like Midjourney or Stable Diffusion) return multiple samples for a single prompt. A human then usually picks their preferred sample and posts it on the web, eventually feeding the training of next-generation models.

3/5
We prove that the self-consuming loop with curation of synthetic data implicitly maximizes an underlying reward model. The learned distribution converges to maximum reward regions and the reward’s variance collapses. Eventually, only maximum reward samples end up being generated.

4/5
We nuance these conclusions by showing how keeping a positive fraction of real data provides stability. The overall process combines reward maximization through the curated data with KL regularization through the real data, bearing interesting links with alignment methods.

5/5
Finally, we illustrate our theory by conducting experiments on synthetic data and CIFAR10. We empirically demonstrate that data curation inside the self-consuming loop can incur biases in the underlying generative model.
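A minimal sketch of this curated self-consuming loop under toy assumptions (a 1-D Gaussian stand-in for the generative model, a hypothetical reward peaked at x = 2, and top-1 curation among K samples per prompt); it illustrates the claim above: the learned distribution drifts toward the maximum-reward region and the reward variance collapses.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Hypothetical fixed reward: prefers samples near x = 2.
    return -(x - 2.0) ** 2

mu, sigma = 0.0, 1.0            # generation-0 "model": a 1-D Gaussian
K, n_prompts = 8, 2000          # K samples per prompt; one curated pick kept per prompt

for gen in range(10):
    picks = []
    for _ in range(n_prompts):
        candidates = rng.normal(mu, sigma, size=K)                # model returns K samples
        picks.append(candidates[np.argmax(reward(candidates))])   # "human" keeps the best one
    picks = np.array(picks)
    mu, sigma = picks.mean(), picks.std()                         # retrain next generation on curated data
    print(f"gen {gen}: mean={mu:.3f}  std={sigma:.3f}  reward var={reward(picks).var():.5f}")
```

Mixing a fixed fraction of real data back into `picks` before refitting (the stabilizer described in tweet 4/5) keeps the variance from collapsing in this toy setup.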




1/11
We're excited to announce the release of the research paper for Stable Audio Open!

This open-weight text-to-audio model generates high-quality stereo audio at 44.1kHz from text prompts. Perfect for synthesizing realistic sounds and field recordings, it runs on consumer-grade GPUs, making it accessible for academic and artistic use.

Learn more here: Stable Audio Open: Research Paper — Stability AI
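For anyone who wants to try it locally, here is a hedged sketch assuming the Hugging Face diffusers StableAudioPipeline integration and access to the gated stabilityai/stable-audio-open-1.0 checkpoint; argument names follow the diffusers documentation and may differ across versions.

```python
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

# Load the open-weight text-to-audio model (requires accepting the model license on the Hub).
pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "Field recording of rain on a tin roof, distant thunder"
generator = torch.Generator("cuda").manual_seed(0)

audio = pipe(
    prompt,
    negative_prompt="low quality",
    num_inference_steps=100,
    audio_end_in_s=10.0,          # length of the generated clip in seconds
    num_waveforms_per_prompt=1,
    generator=generator,
).audios

# Output is stereo at the VAE's native sampling rate (44.1 kHz).
sf.write("rain.wav", audio[0].T.float().cpu().numpy(), pipe.vae.sampling_rate)
```

Per the announcement, fp16 inference like this is meant to fit on consumer-grade GPUs.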

2/11
thieves

3/11
Remarkable innovation! Replicating realistic audio from text inputs revolutionizes accessibility. How might this empower content creators?

4/11
@360

5/11
Wow, the diffusion model in the autoencoder works really well for audio and music, as it achieves these high parameters 👍

6/11
This is crazy

7/11
Imagine if you didn't fire all the competent people working on SD3!

8/11
GenAiSongs 🔥🎸🎸🎵

9/11
@MattVidPro

10/11
Awesome!

11/11
Oh boy, how much did you steal this time?





1/2
The paper presents a new open-weight text-to-audio model developed by Stability AI, highlighting its architecture and training with Creative Commons data. The model, a latent diffusion type, can generate variable-length stereo audio at 44.1kHz and competes wit...

2/2
Stable Audio Open








1/6
Stable Audio Open

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon.

2/6
Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the

3/6
reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.

4/6
paper page: Paper page - Stable Audio Open

5/6
daily papers: Daily Papers - Hugging Face

6/6
Oh dang.

I'm mentioned in this paper.

That basically makes me famous now. 🤩




1/3
FIFO-Diffusion - Bob Ross
Stable Audio Tools soundtrack
Created with Visions of Chaos Softology - Visions of Chaos
#ai #aiart #machinelearning #visionsofchaos #softology

2/3
Do us next. #ROSS

3/3
Hello, you have some clean artwork pics...
I would love to buy a collection of your artworks as an NFT. Message me if you are interested 🤝




1/1
FIFO-Diffusion - Salvador Dali
Stable Audio Tools soundtrack
Created with Visions of Chaos Softology - Visions of Chaos
#ai #aiart #machinelearning #visionsofchaos #softology








1/11
Super excited to announce Mistral Large 2
- 123B params - fits on a single H100 node
- Natively Multilingual
- Strong code & reasoning
- SOTA function calling
- Open-weights for non-commercial usage

Blog: Large Enough
Weights: mistralai/Mistral-Large-Instruct-2407 · Hugging Face

1/N

2/11
Code & Reasoning
- Trained with 80+ programming languages
- Mistral Large 2 (123B) comparable to GPT-4o, Opus-3 and Llama-3 405B at coding benchmarks
- As compared to Mistral Large 1, significantly reduced hallucinations, improved reliability

2/N

3/11
Instruction Following
Mistral Large 2 is particularly better at following precise instructions and handling long multi-turn conversations.
On Wild Bench, Arena Hard & MT Bench:
- Outperforms Llama 3.1 405B & Opus-3
- Comparable to Sonnet-3.5 and GPT-4o

3/N

4/11
Multilinguality
- Mistral Large 2 is trained on a large proportion of multilingual data.
- Excels in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
- Outperforms Llama-3.1 70B, Comparable to Llama-3.1 405B

4/N

5/11
Tool Use & Function Calling
Mistral Large 2 is equipped with enhanced function calling and retrieval skills and has undergone training to proficiently execute both parallel and sequential function calls. It achieves state-of-the-art function calling performance.

5/N

6/11
Mistral Large 2 - Access and Customization
- Available today on La Plateforme and Le Chat
- Available on Google Vertex AI, Azure AI Studio, Amazon Bedrock & IBM Watsonx
- Also available for fine-tuning on La Plateforme along with Mistral Nemo 12B and Codestral 22B

6/N

7/11
So uhhh..... @togethercompute could you get this one too 🥰

Looking forward to checking it out.

8/11
🔥🔥🔥 Wow, wow. Can't wait to see Mistral Large 2 in action! Its performance in better function calling than GPT-4o is impressive.

9/11
yeah , rock & roll !

10/11
Awesome! ✨👏✨

11/11
Open source comeback week!!!







1/9
Mistral just dropped Large 123B - Multilingual (11 languages), 128K context, trained on 80+ coding languages! Scores close to Meta Llama 405B 🤙

Some notes from the blogpost and the model card:

> MMLU - 84.0% vs 79.3% (70B) vs 85.2% (405B)
> HumanEval - 92% vs 80.5% (70B Ins) vs 89% (405B Ins)
> GSM8K - 93% vs 95.5% (70B Ins) vs 96.8% (405B Ins)

> Multilingual: English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch and Polish.
> Trained on 80+ coding languages, especially Swift + Fortran. Looking strong 💪
> Supports native function calling + structured output.
> Released under Mistral Research License (Non-Commercial)

> Integrated with Transformers 🤗

GPU requirements:

fp16/ bf16 - ~250GB VRAM
fp8/ int8 - ~125GB VRAM
int4 - ~60GB VRAM

GG Mistral, deffo looks impressive, especially the coding abilities! 🔥
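Those VRAM figures follow straight from bytes per parameter; a quick back-of-the-envelope check (weights only, ignoring KV cache and activation overhead):

```python
# Rough weight-memory estimate for a 123B-parameter model at different precisions.
params = 123e9
bytes_per_param = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    gb = params * b / 1e9   # decimal GB, matching the tweet's round numbers
    print(f"{fmt:>10}: ~{gb:.0f} GB")
# fp16/bf16: ~246 GB, fp8/int8: ~123 GB, int4: ~62 GB — roughly the ~250 / ~125 / ~60 GB quoted.
```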

2/9
Model checkpoint:

mistralai/Mistral-Large-Instruct-2407 · Hugging Face

3/9
Integrated w/ Transformers! 🤗

4/9
July seems to be a great month for open source!!

5/9
and.. it's not over yet! 🤗

6/9
Yay 😀 Awesome news for Open Source models

7/9
Amazing performance

8/9
Absolutely amazing

9/9






1/2
We're thrilled to expand our curated portfolio of models with the addition of @MistralAI's latest LLMs to Vertex AI Model Garden, generally available today via Model-as-a-service (MaaS):

1) Codestral
2) Mistral Large v2
3) Mistral Nemo

Learn more → Codestral and Mistral Large V2 on Vertex AI | Google Cloud Blog

2/2





1/2
RFNet is a training-free approach that brings better prompt understanding to image generation.

Adding support for prompt reasoning, conceptual and metaphorical thinking, imaginative scenarios and more.

The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

2/2
Too bad, no code released yet. Would be interesting to see how it compares to ELLA.





1/3
Outfit Anyone

2/3
Isn't this that disgusting project from way back that never open-sourced... why is everyone sharing it around again?

3/3
It's been more than half a year and it still hasn't genuinely been open-sourced.







1/6
OutfitAnyone

Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Virtual Try-On (VTON) has become a transformative technology, empowering users to experiment with fashion without ever having to physically try on clothing. However, existing methods often struggle with generating high-fidelity and detail-consistent results. While diffusion models, such as Stable Diffusion series, have shown their capability in creating high-quality and photorealistic images, they encounter formidable challenges in conditional generation scenarios like VTON. Specifically, these models struggle to maintain a balance between control and consistency when generating images for virtual clothing trials. OutfitAnyone addresses these limitations by leveraging a two-stream conditional diffusion model, enabling it to adeptly handle garment deformation for more lifelike results. It distinguishes itself with scalability-modulating factors such as pose, body shape and broad applicability, extending from anime to in-the-wild images. OutfitAnyone's performance in diverse scenarios underscores its utility and readiness for real-world deployment.

2/6
paper page: Paper page - OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

3/6
daily papers: Daily Papers - Hugging Face

4/6
demo: OutfitAnyone - a Hugging Face Space by HumanAIGC

5/6
Thank you AK! I am Daiheng, the corresponding author of the paper!

6/6
🥕I love #OutfitAnyone Thanks @samuel_ys92




1/2
IMAGDressing-v1 under an IDM VTON test.🤩

2/2
For reference 🤓:






1/3
IMAGDressing-v1 outputs are crazy🤪

The latest viral Virtual Try-On model allows for much more flexibility in showing different poses, faces, or backgrounds with the clothes try-ons [See attached video].

2/3
IMAGDressing-v1 released with a hosted Gradio demo: Gradio

Provides nice controllability:
- Allows flexible control over garments, faces, poses, scenes
- Utilizes text for controlling different aspects of generated images

Example shows @ylecun copying Mark's pose!

3/3
IMAGDressing-v1 model is released on @huggingface Hub: feishen29/IMAGDressing · Hugging Face 🤗

Code: GitHub - muzishen/IMAGDressing: 👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing

Stay tuned for 🤗 Spaces demo!




1/4
🪩 Cooking today: MusiConGen @gradio demo 🕺

Musicians, what's your favorite chord progression, so I can add good ones as examples? 😉

Stay tuned for demo link coming shortly 🤙

2/4
Here you go, enjoy MusiConGen @gradio demo on @huggingface ✨

Try this progression and tell me if you like it 😊
—› B:min D F#:min E

MusiConGen - a Hugging Face Space by fffiloni

3/4
Sounds pretty good

4/4
Would be the first chord progression I would test
4 Chords | Music Videos | The Axis Of Awesome









1/6
MusiConGen

Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Existing text-to-music models can produce high-quality audio with great diversity. However, textual prompts alone cannot precisely control temporal musical features such as chords

2/6
and rhythm of the generated music. To address this challenge, we introduce MusiConGen, a temporally-conditioned Transformer-based text-to-music model that builds upon the pretrained MusicGen framework. Our innovation lies in an efficient finetuning mechanism,

3/6
tailored for consumer-grade GPUs, that integrates automatically-extracted rhythm and chords as the condition signal. During inference, the condition can either be musical features extracted from a reference audio signal, or be user-defined symbolic chord


5/6
paper page: Paper page - MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

6/6
daily papers: Daily Papers - Hugging Face
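To make the chord conditioning concrete, here is a small illustrative sketch (not the authors' code; the frame rate, 4/4 meter, and one-chord-per-bar layout are hypothetical simplifications) that expands a user-defined symbolic progression and a BPM into the kind of frame-level condition signal described above.

```python
# Hypothetical helper: expand a symbolic chord progression into per-frame labels,
# the style of temporal condition MusiConGen attaches to a text prompt.
def chords_to_frames(progression, bpm=120, frame_rate=50, beats_per_bar=4):
    chords = progression.split()                 # e.g. "B:min D F#:min E"
    sec_per_bar = beats_per_bar * 60.0 / bpm     # one chord held for one bar (simplification)
    frames = []
    for chord in chords:
        frames.extend([chord] * round(sec_per_bar * frame_rate))
    return frames                                # one label per model frame

frames = chords_to_frames("B:min D F#:min E", bpm=100)
print(len(frames), frames[:3], frames[-3:])      # 480 frames for a 4-bar, 9.6 s clip
```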




1/9
We've already served half a million huggingchat requests to Llama 405B in the last 24 hours 🤯

Try it out!

2/9
Count me as request #500,001.

Forgot to put mine in yesterday.

3/9
Try it out? Say no more 👍

4/9
Hey I was wondering if the data is saved and released from HuggingChat? Seems like it could be a useful dataset for people to train models with

5/9
How do you guys afford this level of inference?

6/9
That's crazy! Half a million huggingchat requests in 24 hours? What's the secret to Llama 405B's success?

7/9
I’m surprised this thing is still responding to be honest.

8/9
so no need to worry about claude rate limits! lots of love to hf and meta 🖤

9/9
You, as HF team, are legend man














1/11
Quantized Llama3.1-405B 🤏🦙

It was a pleasure to collaborate with @AIatMeta, @vllm_project to release the official FP8 version of Llama3.1-405B !
The HF checkpoint is compatible with transformers, TGI, and vLLM from day 0!

Model: meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 · Hugging Face

Why is it good ?⬇️

2/11
We have created other quants (AWQ/GPTQ/BNB) for all model sizes here: Llama 3.1 GPTQ, AWQ, and BNB Quants - a hugging-quants Collection
credits to @alvarobartt @reach_vb @xenovacom

3/11
We have also released a bunch of notebooks to show you how to run these models easily with transformers: GitHub - huggingface/huggingface-llama-recipes

4/11
Now, let's come back to the FP8 model. A lot of things were done in order to optimize the accuracy and the speed of the model.

5/11
First, they decided to leverage dynamic scaling factors for better accuracy and optimize the kernels to reduce the overhead of calculating the scales.

6/11
Second, they introduced a static upper bound to cap the dynamic scaling factors. This makes the model more robust against outliers that they observed in some rare prompts. To calibrate this value, they used a diverse set of datasets.

7/11
The kernels used to perform the quantization and the inference are the fbgemm-gpu kernels. Check transformers integration for more details: Add new quant method by SunMarc · Pull Request #32047 · huggingface/transformers

8/11
Third, they opted for row-wise quantization, which computes the scale across the rows for the weights and across the tokens for the activations (A8W8). Their experiments show that this method preserves accuracy best.
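A minimal sketch of that row-wise dynamic scaling, with the static upper bound from the previous tweet applied to the per-row maxima (toy PyTorch, not the fbgemm-gpu kernels; the e4m3 maximum of 448 is the real format limit, while the cap value here is illustrative).

```python
import torch

FP8_E4M3_MAX = 448.0          # largest representable magnitude in float8 e4m3

def quantize_rowwise_fp8(w: torch.Tensor, scale_ub: float = 1200.0):
    """Per-row dynamic scaling with a static upper bound (scale_ub is an illustrative value)."""
    row_max = w.abs().amax(dim=1, keepdim=True)               # one dynamic scale per row
    row_max = row_max.clamp(max=scale_ub)                      # static cap to tame rare outliers
    scale = row_max / FP8_E4M3_MAX
    w_fp8 = (w / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return w_fp8, scale                                        # dequantize as w_fp8.float() * scale

# Activations would be scaled the same way, but per token instead of per weight row (A8W8).
w = torch.randn(1024, 1024) * 0.02
w_fp8, scale = quantize_rowwise_fp8(w)
err = (w_fp8.float() * scale - w).abs().max().item()
print(f"max abs reconstruction error: {err:.5f}")
```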

9/11
Fourth, FP8 quantization was only applied to the major linear operators of the model, such as the gate and up and down projections for the FFNs (covering 75% of the inference FLOPs) in order to reduce the accuracy degradation.

10/11
Self-attention FP8 quantization was skipped due to its negative impact on model accuracy. Additionally, they didn't quantize the linear layers in the attention blocks, as they account for less than 5% of the total FLOPs.

11/11
Finally, we end up with an FP8 version of the 405B model that takes around 480GB and can be run on a single 8×H100 node. Enjoy!





1/2

Curious about how Llama3.1 stacks up against its predecessor Llama3? We've got you covered!

Check out our community Space for a hands-on vibe check of the 8B models and see the differences for yourself 😄

2/2
Meta Llama3.1 8B V/s Meta Llama3 8B 🥊 🤼

A Gradio chatbot playground for Llama3.1 8b and Llama3 8b LLMs. Access the demo here: Llama3.1 Vs Llama3 - a Hugging Face Space by Gradio-Community





Intron Health gets backing for its speech recognition tool that recognizes African accents​


Annie Njanja



4:04 AM PDT • July 25, 2024




Voice recognition is getting integrated in nearly all facets of modern living, but there remains a big gap: speakers of minority languages, and those with thick accents or speech disorders like stuttering are typically less able to use speech recognition tools that control applications, transcribe or automate tasks, among other functions.

Tobi Olatunji, founder and CEO of clinical speech recognition startup Intron Health, wants to bridge this gap. He claims that Intron is Africa’s largest clinical database, with its algorithm trained on 3.5 million audio clips (16,000 hours) from over 18,000 contributors, mainly healthcare practitioners, representing 29 countries and 288 accents. Olatunji says that drawing most of its contributors from the healthcare sector ensures that medical terms are pronounced and captured correctly for his target markets.


“Because we’ve already trained on many African accents, it’s very likely that the baseline performance of their access will be much better than any other service they use,” he said, adding that data from Ghana, Uganda and South Africa is growing, and that the startup is confident about deploying the model there.

Olatunji’s interest in health-tech stems from two strands of his experience. First, he got training and practiced as a medical doctor in Nigeria, where he saw first-hand the inefficiencies of the systems in that market, including how much paperwork needed to be filled out, and how hard it was to track all of it.

“When I was a doctor in Nigeria a couple years ago, even during medical school and even now, I get irritated easily doing a repetitive task that is not deserving of human efforts,” he said. “An easy example is we had to write a patient’s name on every lab order you do. And just something that’s simple, let’s say I’m seeing the patients, and they need to get some prescriptions, they need to get some labs. I have to manually write out every order for them. It’s just frustrating for me to have to repeat the patient name over and over on each form, the age, the date, and all that… I’m always asking, how can we do things better? How can we make life easier for doctors? Can we take some tasks away and offload them to another system so that the doctor can spend their time doing things that are very valuable?”

Those questions propelled him to the next phase of his life. Olatunji moved to the U.S. to pursue, initially, a master's degree in medical informatics from the University of San Francisco, and then another in computer science at Georgia Tech.

He then cut his teeth at a number of tech companies. As a clinical natural language programming (NLP) scientist and researcher at Enlitic, a San Francisco Bay Area company, he built models to automate the extraction of information from radiology text reports. He also served Amazon Web Services as a machine learning scientist. At both Enlitic and Amazon, he focused on natural language processing for healthcare, shaping systems that enable hospitals to run better.



Throughout those experiences, he started to form ideas around how what was being developed and used in the U.S. could be used to improve healthcare in Nigeria and other emerging markets like it.



The original aim of Intron Health, launched in 2020, was to digitize hospital operations in Africa through an Electronic Medical Record (EMR) System. But take-up was challenging: it turned out physicians preferred writing to typing, said Olatunji.

That led him to explore how to improve that more basic problem: how to make physicians' basic data entry and writing work better. At first the company looked at third-party solutions for automating tasks such as note-taking, and at embedding existing speech-to-text technologies into his EMR program.

There were a lot of issues, however, because of constant mis-transcription. It became clear to Olatunji that thick African accents and the pronunciation of complicated medical terms and names made the adoption of existing foreign transcription tools impractical.

This marked the genesis of Intron Health’s speech recognition technology, which can recognize African accents, and can also be integrated in existing EMRs. The tool has to date been adopted in 30 hospitals across five markets, including Kenya and Nigeria.

There have been some immediate positive outcomes. In one case, Olatunji said, Intron Health has helped reduce the waiting time for radiology results at one of West Africa’s largest hospitals from 48 hours to 20 minutes. Such efficiencies are critical in healthcare provision, especially in Africa, where the doctor to patient ratio remains one of the lowest in the world.


“Hospitals have already spent so much on equipment and technology…Ensuring that they apply these tech is important. We’re able to provide value to help them improve the adoption of the EMR system,” he said.

Looking ahead, the startup is exploring new growth frontiers backed by a $1.6 million pre-seed round, led by Microtraction, with participation from Plug and Play Ventures, Jaza Rift Ventures, Octopus Ventures, Africa Health Ventures, OpenseedVC, Pi Campus, Alumni Angel, Baker Bridge Capital and several angel investors.



In terms of technology, Intron Health is working to perfect noise cancelation, as well as ensuring that the platform works well even in low bandwidths. This is in addition to enabling the transcription of multi-speaker conversations, and integrating text-to-speech capabilities.

The plan, Olatunji says, is to add intelligence systems or decision support tools for tasks such as prescription or lab tests. These tools, he adds, can help reduce doctor errors, and ensure adequate patient care besides speeding up their work.

Intron Health is among the growing number of generative AI startups in the medical space, including Microsoft’s DAX Express, which are reducing administrative tasks for clinicians by generating notes within seconds. The emergence and adoption of these technologies come as the global speech and voice recognition market is projected to be valued at $84.97 billion by 2032, following a CAGR of 23.7% from 2024, according to Fortune Business Insights.



Beyond building voice technologies, Intron is also playing a pivotal role in speech research in Africa, having recently partnered with Google Research, the Bill & Melinda Gates Foundation, and Digital Square at PATH to evaluate popular Large Language Models (LLMs) such as OpenAI’s GPT-4o, Google’s Gemini, and Anthropic’s Claude across 15 countries, to identify strengths, weaknesses, and risks of bias or harm in LLMs. This is all in a bid to ensure that culturally attuned models are available for African clinics and hospitals.

 



Oversight Board wants Meta to refine its policies around AI-generated explicit images​


Ivan Mehta

3:00 AM PDT • July 25, 2024



Following investigations into how Meta handles AI-generated explicit images, the company’s semi-independent observer body, the Oversight Board, is now urging the company to refine its policies around such images. The Board wants Meta to change the terminology it uses from “derogatory” to “non-consensual,” and move its policies on such images to the “Sexual Exploitation Community Standards” section from the “Bullying and Harassment” section.

Right now, Meta’s policies around AI-generated explicit images branch out from a “derogatory sexualized photoshop” rule in its Bullying and Harassment section. The Board also urged Meta to replace the word “photoshop” with a generalized term for manipulated media.



Additionally, Meta prohibits non-consensual imagery if it is “non-commercial or produced in a private setting.” The Board suggested that this clause shouldn’t be mandatory for removing or banning images that have been generated or manipulated by AI without consent.

These recommendations come in the wake of two high-profile cases where explicit, AI-generated images of public figures posted on Instagram and Facebook landed Meta in hot water.

One of these cases involved an AI-generated nude image of an Indian public figure that was posted on Instagram. Several users reported the image but Meta did not take it down, and in fact closed the ticket within 48 hours with no further review. Users appealed that decision but the ticket was closed again. The company only acted after the Oversight Board took up the case, removed the content, and banned the account.

The other AI-generated image resembled a public figure from the U.S. and was posted on Facebook. Meta already had the image in its Media Matching Service (MMS) repository (a bank of images that violate its terms of service that can be used to detect similar images) due to media reports, and it quickly removed the picture when another user uploaded it on Facebook.

Notably, Meta only added the image of the Indian public figure to the MMS bank after the Oversight Board nudged it to. The company apparently told the Board the repository didn’t have the image before then because there were no media reports around the issue.


“This is worrying because many victims of deepfake intimate images are not in the public eye and are either forced to accept the spread of their non-consensual depictions or report every instance,” the Board said in its note.



Breakthrough Trust, an Indian organization that campaigns to reduce online gender-based violence, noted that these issues and Meta’s policies have cultural implications. In comments submitted to the Oversight Board, Breakthrough said non-consensual imagery is often trivialized as an identity theft issue rather than gender-based violence.

“Victims often face secondary victimization while reporting such cases in police stations/courts (“why did you put your picture out etc.” even when it’s not their pictures such as deepfakes). Once on the internet, the picture goes beyond the source platform very fast, and merely taking it down on the source platform is not enough because it quickly spreads to other platforms,” Barsha Charkorborty, the head of media at the organization, wrote to the Oversight Board.

Over a call, Charkorborty told TechCrunch that users often don’t know that their reports have been automatically marked as “resolved” in 48 hours, and Meta shouldn’t apply the same timeline for all cases. Plus, she suggested that the company should also work on building more user awareness around such issues.

Devika Malik, a platform policy expert who previously worked in Meta’s South Asia policy team, told TechCrunch earlier this year that platforms largely rely on user reporting for taking down non-consensual imagery, which might not be a reliable approach when tackling AI-generated media.

“This places an unfair onus on the affected user to prove their identity and the lack of consent (as is the case with Meta’s policy). This can get more error-prone when it comes to synthetic media, and to say, the time taken to capture and verify these external signals enables the content to gain harmful traction,” Malik said.



Aparajita Bharti, Founding Partner of Delhi-based think tank The Quantum Hub (TQH), said that Meta should allow users to provide more context when reporting content, as they might not be aware of the different categories of rule violations under Meta’s policy.

“We hope that Meta goes over and above the final ruling [of the Oversight Board] to enable flexible and user-focused channels to report content of this nature,” she said.

“We acknowledge that users cannot be expected to have a perfect understanding of the nuanced difference between different heads of reporting, and advocated for systems that prevent real issues from falling through the cracks on account of technicalities of Meta content moderation policies.”
 



‘Model collapse’: Scientists warn against letting AI eat its own tail​


Devin Coldewey

8:01 AM PDT • July 24, 2024



When you see the mythical Ouroboros, it’s perfectly logical to think, “Well, that won’t last.” A potent symbol — swallowing your own tail — but difficult in practice. It may be the case for AI as well, which, according to a new study, may be at risk of “model collapse” after a few rounds of being trained on data it generated itself.

In a paper published in Nature, British and Canadian researchers led by Ilia Shumailov at Oxford show that today’s machine learning models are fundamentally vulnerable to a syndrome they call “model collapse.” As they write in the paper’s introduction:



We discover that indiscriminately learning from data produced by other models causes “model collapse” — a degenerative process whereby, over time, models forget the true underlying data distribution …

How does this happen, and why? The process is actually quite easy to understand.

AI models are pattern-matching systems at heart: They learn patterns in their training data, then match prompts to those patterns, filling in the most likely next dots on the line. Whether you ask, “What’s a good snickerdoodle recipe?” or “List the U.S. presidents in order of age at inauguration,” the model is basically just returning the most likely continuation of that series of words. (It’s different for image generators, but similar in many ways.)

But the thing is, models gravitate toward the most common output. It won’t give you a controversial snickerdoodle recipe but the most popular, ordinary one. And if you ask an image generator to make a picture of a dog, it won’t give you a rare breed it only saw two pictures of in its training data; you’ll probably get a golden retriever or a Lab.

Now, combine these two things with the fact that the web is being overrun by AI-generated content and that new AI models are likely to be ingesting and training on that content. That means they’re going to see a lot of goldens!



And once they’ve trained on this proliferation of goldens (or middle-of-the road blogspam, or fake faces, or generated songs), that is their new ground truth. They will think that 90% of dogs really are goldens, and therefore when asked to generate a dog, they will raise the proportion of goldens even higher — until they basically have lost track of what dogs are at all.



This wonderful illustration from Nature’s accompanying commentary article shows the process visually (image credits: Nature).

A similar thing happens with language models and others that, essentially, favor the most common data in their training set for answers — which, to be clear, is usually the right thing to do. It’s not really a problem until it meets up with the ocean of chum that is the public web right now.

Basically, if the models continue eating each other’s data, perhaps without even knowing it, they’ll progressively get weirder and dumber until they collapse. The researchers provide numerous examples and mitigation methods, but they go so far as to call model collapse “inevitable,” at least in theory.
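The narrowing is easy to reproduce with a toy “model”: fit a Gaussian to the data, sample a new training set from it, refit, and repeat. With a deliberately tiny sample size to exaggerate the finite-data effect, the fitted spread drifts toward zero and the tails vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                     # tiny training set each generation (exaggerates the effect)
data = rng.normal(0.0, 1.0, size=n)        # generation 0: "real" data

for gen in range(1, 101):
    mu, sigma = data.mean(), data.std()    # "train" a model on the current data...
    data = rng.normal(mu, sigma, size=n)   # ...then let the next generation learn only from its output
    if gen in (1, 10, 25, 50, 100):
        print(f"gen {gen:3d}: std={sigma:.3f}")
# Over many generations the fitted spread drifts toward zero:
# rare, tail behaviour is the first thing the chain forgets.
```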

Though it may not play out as the experiments they ran show it, the possibility should scare anyone in the AI space. Diversity and depth of training data is increasingly considered the single most important factor in the quality of a model. If you run out of data, but generating more risks model collapse, does that fundamentally limit today’s AI? If it does begin to happen, how will we know? And is there anything we can do to forestall or mitigate the problem?

The answer to the last question at least is probably yes, although that should not alleviate our concerns.



Qualitative and quantitative benchmarks of data sourcing and variety would help, but we’re far from standardizing those. Watermarks of AI-generated data would help other AIs avoid it, but so far no one has found a suitable way to mark imagery that way (well … I did).

In fact, companies may be disincentivized from sharing this kind of information, and instead hoard all the hyper-valuable original and human-generated data they can, retaining what Shumailov et al. call their “first mover advantage.”

[Model collapse] must be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.

It may become increasingly difficult to train newer versions of LLMs without access to data that were crawled from the Internet before the mass adoption of the technology or direct access to data generated by humans at scale.



Add it to the pile of potentially catastrophic challenges for AI models — and arguments against today’s methods producing tomorrow’s superintelligence.
 



After AgentGPT’s success, Reworkd pivots to web-scraping AI agents​


Maxwell Zeff

8:00 AM PDT • July 24, 2024


Reworkd’s founders went viral on GitHub last year with AgentGPT, a free tool to build AI agents that acquired more than 100,000 daily users in a week. This earned them a spot in Y Combinator’s summer 2023 cohort, but the co-founders quickly realized building general AI agents was too broad. So now Reworkd is a web-scraping company, specifically building AI agents to extract structured data from the public web.

AgentGPT provided a simple interface in a browser where users could create autonomous AI agents. Soon, everyone was raving about how agents were the future of computing.



When the tool took off, Asim Shrestha, Adam Watkins, and Srijan Subedi were still living in Canada and Reworkd didn’t exist. The massive user influx caught them off guard; Subedi, now Reworkd’s COO, said the tool was costing them $2,000 a day in API calls. For that reason, they had to create Reworkd and get funded fast. One of the most popular use cases for AgentGPT was creating web scrapers, a relatively simple but high-volume task, so Reworkd made this its singular focus.

Web scrapers have become invaluable in the AI era. The number one reason organizations use public web data in 2024 is to build AI models, according to Bright Data’s latest report. The problem is that web scrapers are traditionally built by humans and must be customized for specific web pages, making them expensive. But Reworkd’s AI agents can scrape more of the web with fewer humans in the loop.

Customers can give Reworkd a list of hundreds, or even thousands, of websites to scrape and then specify the types of data they’re interested in. Then Reworkd’s AI agents use multimodal code generation to turn this into structured data. Agents generate unique code to scrape each website and extract that data for customers to use as they please.

For example, say you want stats on every NFL player, but every team’s website has a different layout. Instead of building a scraper for each website, Reworkd’s agents do that for you given just links and a description of the data you want to extract. With 32 teams, that could save you hours — but if there were 1,000 teams, it could save you weeks.
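For a sense of what such agent-generated code looks like, here is a hypothetical, hand-written stand-in for a per-site extractor (requests + BeautifulSoup; the URL, CSS selector, and field names are placeholders, since in Reworkd's setup an agent would emit a different variant for each site).

```python
import requests
from bs4 import BeautifulSoup

def scrape_roster(url: str) -> list[dict]:
    """Extract structured player records from one (hypothetical) team roster page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for row in soup.select("table.roster tr")[1:]:      # placeholder selector, site-specific
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        if len(cells) >= 3:
            players.append({"name": cells[0], "position": cells[1], "number": cells[2]})
    return players

# An agent would generate one such function per roster layout, then merge the structured records.
```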

Reworkd raised a fresh $2.75 million in seed funding from Paul Graham, AI Grant (Nat Friedman and Daniel Gross’ startup accelerator), SV Angel, General Catalyst and Panache Ventures, among others, the startup exclusively told TechCrunch. Combined with a $1.25 million pre-seed investment last year from Panache Ventures and Y Combinator, this brings Reworkd’s total funding raised to date to $4 million.




AI that can use the internet​


Shortly after forming Reworkd and moving to San Francisco, the team hired Rohan Pandey as a founding research engineer. He currently lives in AGI House SF, one of the Bay Area’s most popular hacker houses for the AI era. One investor described Pandey as a “one person research lab within Reworkd.”


“We see ourselves as the culmination of this 30-year dream of the Semantic Web,” said Pandey in an interview with TechCrunch, referring to a vision of world wide web inventor Tim Berners-Lee in which computers can read the entire internet. “Even though some websites don’t have markup, LLMs can understand the websites in the same ways that humans can, in such that we can expose basically any website as an API. So in some sense, Reworkd is like the universal API layer for the internet.”

Reworkd says it’s able to capture the long tail end of customer data needs, meaning its AI agents are specifically good for scraping thousands of smaller public websites that large competitors often skip over. Others, such as Bright Data, have scrapers for large websites like LinkedIn or Amazon already built out, but it may not be worth the trouble for a human to build a scraper for every small website. Reworkd addresses this concern, but potentially raises others.


What exactly is “public” web data?​


Though web scrapers have existed for decades, they have attracted controversy in the AI era. Unfettered scraping of huge swathes of data has thrown OpenAI and Perplexity into legal trouble: News and media organizations allege the AI companies extracted intellectual property from behind a paywall, reproducing it widely without payment. Reworkd is taking precautions to avoid these issues.

“We look at it as uplifting the accessibility of publicly available information,” said Shrestha, co-founder and CEO of Reworkd, in an interview with TechCrunch. “We’re only allowing information that’s publicly available; we’re not going through sign-in walls or anything like that.”

To go a step further, Reworkd says it’s avoiding scraping news altogether, and being selective about who they work with. Watkins, the company’s CTO, says there are better tools for aggregating news content elsewhere, and it is not their focus.



As an example of what it does focus on, Reworkd described its work with Axis, a company that helps policy teams comply with government regulations. Axis uses Reworkd’s AI to extract data from thousands of government regulation documents for many countries across the European Union. Axis then trains and fine-tunes an AI model based on this data and offers it to clients as a product.

Starting a web-scraping company these days could be considered wading into dangerous territory, according to Aaron Fiske, partner at Silicon-Valley based law firm Gunderson Dettmer. The landscape is somewhat fluid right now, and the jury is still out on how “public” web data really is for AI models. However, Fiske says Reworkd’s approach, where customers decide what websites to scrape, may insulate them from legal liability.

“It’s like they invented the copying machine, and there’s this one use case for making copies that turned out to be hugely economically valuable, but also legally, really questionable,” said Fiske in an interview with TechCrunch. “It’s not like web scrapers servicing AI companies is necessarily risky, but working with AI companies that are really interested in harvesting copyrighted content is maybe an issue.”

That’s why Reworkd is being careful about who it works with. Web scrapers have obfuscated much of the blame in potential copyright infringement cases related to AI thus far. In the OpenAI case, Fiske points out that The New York Times did not sue the web scraper that collected its articles, but rather the company that allegedly reproduced its work. But even there, it’s yet to be decided if what OpenAI did was truly copyright infringement.

There’s more evidence that web scrapers are legally in the clear during the AI boom. A court recently ruled in favor of Bright Data after it scraped Facebook and Instagram profiles via the web. One example in the court case was a dataset of 615 million records of Instagram user data, which Bright Data sells for $860,000. Meta sued the company, alleging this violated its terms of service. But a court ruled that this data is public and therefore available to scrape.


Investors think Reworkd scales with the big guys​


Reworkd has attracted big names as early investors, from Y Combinator and Paul Graham to Daniel Gross and Nat Friedman. Some investors say this is because Reworkd’s technology stands to improve, and get cheaper, alongside new models. The startup says OpenAI’s GPT-4o is currently the best for its multimodal code generation and that a lot of Reworkd’s technology wasn’t possible until just a few months ago.


“If you try to compete with the rate of technology progress — not building on top of it — then I think that you’ll have a hard time as a founder,” General Catalyst’s Viet Le told TechCrunch. “Reworkd has the mindset of basing its solution on the rate of progress.”



Reworkd is creating AI agents that address a particular gap in the market; companies need more data because AI is advancing quickly. As more companies build custom AI models specific to their business, Reworkd stands to gain more customers. Fine-tuning models necessitates quality, structured data, and lots of it.

Reworkd says its approach is “self-healing,” meaning that its web scrapers won’t break down due to a web page update. The startup claims to avoid hallucination issues traditionally associated with AI models because Reworkd’s agents are generating code to scrape a website. It’s possible the AI could make a mistake and grab the wrong data from a website, but Reworkd’s team created Banana-lyzer, an open source evaluation framework, to regularly assess its accuracy.

Reworkd doesn’t have a large payroll — the team is just four people — but it does have to take on considerable inference costs for running its AI agents. The startup expects its pricing to get increasingly competitive as these costs trend downward. OpenAI just released GPT-4o mini, a smaller version of its industry-leading model with competitive benchmarks. Innovations like these could make Reworkd more competitive.

Paul Graham and AI Grant did not respond to TechCrunch’s request for comment.
 