bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

Radar / AI & ML

What We Learned from a Year of Building with LLMs (Part I)​

By Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu and Shreya Shankar

May 28, 2024

Dana Codispoti interview


Learn faster. Dig deeper. See farther.​


Part II of this series can be found here and part III is forthcoming. Stay tuned.

It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. The pace of improvements in LLMs, coupled with a parade of demos on social media, will fuel an estimated $200B investment in AI by 2025. LLMs are also broadly accessible, allowing everyone, not just ML engineers and scientists, to build intelligence into their products. While the barrier to entry for building AI products has been lowered, creating those effective beyond a demo remains a deceptively difficult endeavor.

We’ve identified some crucial, yet often neglected, lessons and methodologies informed by machine learning that are essential for developing products based on LLMs. Awareness of these concepts can give you a competitive advantage against most others in the field without requiring ML expertise! Over the past year, the six of us have been building real-world applications on top of LLMs. We realized that there was a need to distill these lessons in one place for the benefit of the community.

We come from a variety of backgrounds and serve in different roles, but we’ve all experienced firsthand the challenges that come with using this new technology. Two of us are independent consultants who’ve helped numerous clients take LLM projects from initial concept to successful product, seeing the patterns determining success or failure. One of us is a researcher studying how ML/AI teams work and how to improve their workflows. Two of us are leaders on applied AI teams: one at a tech giant and one at a startup. Finally, one of us has taught deep learning to thousands and now works on making AI tooling and infrastructure easier to use. Despite our different experiences, we were struck by the consistent themes in the lessons we’ve learned, and we’re surprised that these insights aren’t more widely discussed.

Our goal is to make this a practical guide to building successful products around LLMs, drawing from our own experiences and pointing to examples from around the industry. We’ve spent the past year getting our hands dirty and gaining valuable lessons, often the hard way. While we don’t claim to speak for the entire industry, here we share some advice and lessons for anyone building products with LLMs.

This work is organized into three sections: tactical, operational, and strategic. This is the first of three pieces. It dives into the tactical nuts and bolts of working with LLMs. We share best practices and common pitfalls around prompting, setting up retrieval-augmented generation, applying flow engineering, and evaluation and monitoring. Whether you’re a practitioner building with LLMs or a hacker working on weekend projects, this section was written for you. Look out for the operational and strategic sections in the coming weeks.

Ready to dive delve in? Let’s go.

Tactical

In this section, we share best practices for the core components of the emerging LLM stack: prompting tips to improve quality and reliability, evaluation strategies to assess output, retrieval-augmented generation ideas to improve grounding, and more. We also explore how to design human-in-the-loop workflows. While the technology is still rapidly developing, we hope these lessons, the by-product of countless experiments we’ve collectively run, will stand the test of time and help you build and ship robust LLM applications.

Prompting

We recommend starting with prompting when developing new applications. It’s easy to both underestimate and overestimate its importance. It’s underestimated because the right prompting techniques, when used correctly, can get us very far. It’s overestimated because even prompt-based applications require significant engineering around the prompt to work well.

Focus on getting the most out of fundamental prompting techniques

A few prompting techniques have consistently helped improve performance across various models and tasks: n-shot prompts + in-context learning, chain-of-thought, and providing relevant resources.

The idea of in-context learning via n-shot prompts is to provide the LLM with a few examples that demonstrate the task and align outputs to our expectations. A few tips:

  • If n is too low, the model may over-anchor on those specific examples, hurting its ability to generalize. As a rule of thumb, aim for n ≥ 5. Don’t be afraid to go as high as a few dozen.
  • Examples should be representative of the expected input distribution. If you’re building a movie summarizer, include samples from different genres in roughly the proportion you expect to see in practice.
  • You don’t necessarily need to provide the full input-output pairs. In many cases, examples of desired outputs are sufficient.
  • If you are using an LLM that supports tool use, your n-shot examples should also use the tools you want the agent to use.

In chain-of-thought (CoT) prompting, we encourage the LLM to explain its thought process before returning the final answer. Think of it as providing the LLM with a sketchpad so it doesn’t have to do it all in memory. The original approach was to simply add the phrase “Let’s think step-by-step” as part of the instructions. However, we’ve found it helpful to make the CoT more specific, where adding specificity via an extra sentence or two often reduces hallucination rates significantly. For example, when asking an LLM to summarize a meeting transcript, we can be explicit about the steps, such as:

  • First, list the key decisions, follow-up items, and associated owners in a sketchpad.
  • Then, check that the details in the sketchpad are factually consistent with the transcript.
  • Finally, synthesize the key points into a concise summary.

Recently, some doubt has been cast on whether this technique is as powerful as believed. Additionally, there’s significant debate about exactly what happens during inference when chain-of-thought is used. Regardless, this technique is one to experiment with when possible.

Providing relevant resources is a powerful mechanism to expand the model’s knowledge base, reduce hallucinations, and increase the user’s trust. Often accomplished via retrieval augmented generation (RAG), providing the model with snippets of text that it can directly utilize in its response is an essential technique. When providing the relevant resources, it’s not enough to merely include them; don’t forget to tell the model to prioritize their use, refer to them directly, and sometimes to mention when none of the resources are sufficient. These help “ground” agent responses to a corpus of resources.

Structure your inputs and outputs

Structured input and output help models better understand the input as well as return output that can reliably integrate with downstream systems. Adding serialization formatting to your inputs can help provide more clues to the model as to the relationships between tokens in the context, additional metadata to specific tokens (like types), or relate the request to similar examples in the model’s training data.

As an example, many questions on the internet about writing SQL begin by specifying the SQL schema. Thus, you may expect that effective prompting for Text-to-SQL should include structured schema definitions; indeed.

Structured output serves a similar purpose, but it also simplifies integration into downstream components of your system. Instructor and Outlines work well for structured output. (If you’re importing an LLM API SDK, use Instructor; if you’re importing Huggingface for a self-hosted model, use Outlines.) Structured input expresses tasks clearly and resembles how the training data is formatted, increasing the probability of better output.

When using structured input, be aware that each LLM family has their own preferences. Claude prefers xml while GPT favors Markdown and JSON. With XML, you can even pre-fill Claude’s responses by providing a response tag like so.


Code:
                                                     </> python
messages=[     
    {         
        "role": "user",         
        "content": """Extract the <name>, <size>, <price>, and <color>
                   from this product description into your <response>.   
                <description>The SmartHome Mini
                   is a compact smart home assistant
                   available in black or white for only $49.99.
                   At just 5 inches wide, it lets you control   
                   lights, thermostats, and other connected
                   devices via voice or app—no matter where you
                   place it in your home. This affordable little hub
                   brings convenient hands-free control to your
                   smart devices.             
                </description>"""     
   },     
   {         
        "role": "assistant",         
        "content": "<response><name>"     
   }
]

Have small prompts that do one thing, and only one thing, well

A common anti-pattern/code smell in software is the “ God Object,” where we have a single class or function that does everything. The same applies to prompts too.

A prompt typically starts simple: A few sentences of instruction, a couple of examples, and we’re good to go. But as we try to improve performance and handle more edge cases, complexity creeps in. More instructions. Multi-step reasoning. Dozens of examples. Before we know it, our initially simple prompt is now a 2,000 token frankenstein. And to add injury to insult, it has worse performance on the more common and straightforward inputs! GoDaddy shared this challenge as their No. 1 lesson from building with LLMs.

Just like how we strive (read: struggle) to keep our systems and code simple, so should we for our prompts. Instead of having a single, catch-all prompt for the meeting transcript summarizer, we can break it into steps to:

  • Extract key decisions, action items, and owners into structured format
  • Check extracted details against the original transcription for consistency
  • Generate a concise summary from the structured details

As a result, we’ve split our single prompt into multiple prompts that are each simple, focused, and easy to understand. And by breaking them up, we can now iterate and eval each prompt individually.


continue reading on site....
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869


1/1
New updates on multilingual medical models!
@huggingface

(i) One strong medical model based on Llama 3, named: MMed-Llama 3.
@AIatMeta


(ii) More comparisons with medical LLMs, e.g. MEDDITRON, BioMistral.

(iii) More benchmarks (MedQA, PubMedQA, MedMCQA, MMLU).

1/1
Paper: [2402.13963] Towards Building Multilingual Language Model for Medicine

Code & Model: GitHub - MAGIC-AI4Med/MMedLM: The official codes for "Towards Building Multilingual Language Model for Medicine"

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GOy1sSfacAApFCV.jpg

GOy1s4SbMAAa2F7.png

GOy1trJa0AEM-Dz.png

GOy7qFSacAAyAzJ.jpg

GOy2aM7a4AEMIpp.png
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

Microsoft Edge will translate and dub YouTube videos as you’re watching them​



It will also support AI dubbing and subtitles on LinkedIn, Coursera, Bloomberg, and more.​

By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

May 21, 2024, 11:30 AM EDT

4 Comments

An image showing the Edge logo

Image: The Verge

Microsoft Edge will soon offer real-time video translation on sites like YouTube, LinkedIn, Coursera, and more. As part of this year’s Build event, Microsoft announced that the new AI-powered feature will be able to translate spoken content through both dubbing and subtitles live as you’re watching it.

So far, the feature supports the translation of Spanish into English as well as the translation of English to German, Hindi, Italian, Russian, and Spanish. In addition to offering a neat way to translate videos into a user’s native tongue, Edge’s new AI feature should also make videos more accessible to those who are deaf or hard of hearing.



Edge will also support real-time translation for videos on news sites such as Reuters, CNBC, and Bloomberg. Microsoft plans on adding more languages and supported websites in the future.

This adds to the array of AI features Microsoft has added to Edge through an integration with Copilot. Edge already offers the ability to summarize YouTube videos, but it can’t generate text summaries of every video, as it relies on the video’s transcript to create the summary.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869




1/4
I see that Dan Hendrycks and the Center for AI Safety have put their new textbook on AI regulation online. The chapter on international governance (8.7) contains the usual aspirational thinking about treaties to regulate AI & computation globally (all the while ignoring how to get China, Russia, or anyone else to go along).

But their list of possible regulatory strategies also includes an apparent desire for a global AI monopoly as an easier instrument of regulatory control. This is absolutely dangerous thinking that must be rejected.
8.7: International Governance | AI Safety, Ethics, and Society Textbook

2/4
equally problematic is the Center for AI Safety proposals to use the Biological Weapons Convention of 1972 as a model for global AI regulatory coordination. That would be a terrible model.

When the US and the Soviet Union signed on to the agreement in1972, it was hailed as a

3/4
I wrote about these and other problematic ideas for global AI control in my big
@RSI white paper on "Existential Risks and Global Governance Issues Around AI and Robotics."

4/4
last year, I debated these proposals for global AI regulation with Dan Hendrycks and the Center for AI Safety at a September Brookings event. My remarks begin around the 51:00 mark.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GO1kPSKWEAAwmqk.png

GO1keXAXwAApLBI.png

GO3t1lWbMAA-y90.jpg

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869


1/1
Llama 3-V: Close to matching GPT4-V with a 100x smaller model and 500 dollars


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GOxpYz_W8AIWTO7.png

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869


1/1
Multi-layer perceptrons (MLPs) can indeed learn in-context competitively with Transformers given the same compute budget, and even outperform Transformers on some relational reasoning tasks.

Paper "MLPs Learn In-Context" by William L. Tong and Cengiz Pehlevan:

MLPs, MLP-Mixers, and Transformers all achieve near-optimal in-context learning performance on regression and classification tasks with sufficient compute, approaching the Bayes optimal estimators. Transformers have a slight efficiency advantage at lower compute budgets.

As data diversity increases, MLPs exhibit a transition from in-weight learning (memorizing training examples) to in-context learning (inferring from novel context examples at test time), similar to the transition observed in Transformers. This transition occurs at somewhat higher diversity levels in MLPs compared to Transformers.

On relational reasoning tasks designed to test geometric relations between inputs, vanilla MLPs actually outperform Transformers in accuracy and out-of-distribution generalization. This challenges common assumptions that MLPs are poor at relational reasoning.

Relationally-bottlenecked MLPs using hand-designed relational features can be highly sample-efficient on well-aligned tasks, but fail to generalize if the task structure deviates even slightly. Vanilla MLPs are more flexible learners.

The strong performance of MLPs, which have weaker inductive biases compared to Transformers, supports the heuristic that "less inductive bias is better" as compute and data grow. Transformers' architectural constraints may orient them towards solutions less aligned with certain task structures.

The authors prove that under sufficient conditions (smoothness, expressivity, data diversity), even MLPs processing one-hot input encodings can generalize to unseen inputs, refuting a recent impossibility result. The key is to use learned input embeddings rather than operating on one-hot encodings directly.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GO6B2KbXIAAuAER.png
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869


1/2
I have opensourced the gpt-4o computer control script. below you'll see a demo and the link to the github repo.

2/2
Individual clicks work on almost anything. still needs some agentic capabilities.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869




1/3
Paper - 'LoRA Learns Less and Forgets Less'

LoRA works better for instruction finetuning than continued pretraining; it's especially sensitive to learning rates; performance is most affected by choice of target modules and to a smaller extent by rank.

LoRA has a stronger regularizing effect ( i.e. reduces overfitting) much better than using dropout and weight decay

Applying LoRA to all layers results in a bigger improvement than increasing the rank;

LoRA saves memory by training only low-rank perturbations to selected weight matrices, reducing the number of trained parameters. The paper compares LoRA and full finetuning performance on code and math tasks in both instruction finetuning (∼100K prompt-response pairs) and continued pretraining (∼10B unstructured tokens) data regimes, using sensitive domain-specific evaluations like HumanEval for code and GSM8K for math.

Results show that LoRA underperforms full finetuning in most settings, with larger gaps for code than math, but LoRA forgets less of the source domain as measured by language understanding, world knowledge, and common-sense reasoning tasks. LoRA and full finetuning form similar learning-forgetting tradeoff curves, with LoRA learning less but forgetting less, though cases exist in code where LoRA learns comparably but forgets less.

Singular value decomposition reveals full finetuning finds weight perturbations with ranks 10-100x higher than typical LoRA configurations, possibly explaining performance gaps.

2/3
indeed.

3/3
Multi-layer perceptrons (MLPs) can indeed learn in-context competitively with Transformers given the same compute budget, and even outperform Transformers on some relational reasoning tasks.

Paper "MLPs Learn In-Context" by William L. Tong and Cengiz Pehlevan:

MLPs,


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GO6m0NaXEAAYRDW.png

GO6B2KbXIAAuAER.png

GO6DD_hXoAAgW9G.jpg

GO5T2EAaoAAIt7S.jpg

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

Google admits its AI Overviews need work, but we’re all helping it beta test​

Sarah Perez

12:54 PM PDT • May 31, 2024

Comment

Sundar-AI-backdrop-Google-IO.png
Image Credits: Google

Google is embarrassed about its AI Overviews, too. After a deluge of dunks and memes over the past week, which cracked on the poor quality and outright misinformation that arose from the tech giant’s underbaked new AI-powered search feature, the company on Thursday issued a mea culpa of sorts. Google — a company whose name is synonymous with searching the web — whose brand focuses on “organizing the world’s information” and putting it at user’s fingertips — actually wrote in a blog post that “some odd, inaccurate or unhelpful AI Overviews certainly did show up.”

That’s putting it mildly.

The admission of failure, penned by Google VP and Head of Search Liz Reid, seems a testimony as to how the drive to mash AI technology into everything has now somehow made Google Search worse.

In the post titled “About last week,” (this got past PR?), Reid spells out the many ways its AI Overviews make mistakes. While they don’t “hallucinate” or make things up the way that other large language models (LLMs) may, she says, they can get things wrong for “other reasons,” like “misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available.”

Reid also noted that some of the screenshots shared on social media over the past week were faked, while others were for nonsensical queries, like “How many rocks should I eat?” — something no one ever really searched for before. Since there’s little factual information on this topic, Google’s AI guided a user to satirical content. (In the case of the rocks, the satirical content had been published on a geological software provider’s website.)

It’s worth pointing out that if you had Googled “How many rocks should I eat?” and were presented with a set of unhelpful links, or even a jokey article, you wouldn’t be surprised. What people are reacting to is the confidence with which the AI spouted back that “ geologists recommend eating at least one small rock per day” as if it’s a factual answer. It may not be a “hallucination,” in technical terms, but the end user doesn’t care. It’s insane.

What’s unsettling, too, is that Reid claims Google “tested the feature extensively before launch,” including with “robust red-teaming efforts.”

Does no one at Google have a sense of humor then? No one thought of prompts that would generate poor results?

In addition, Google downplayed the AI feature’s reliance on Reddit user data as a source of knowledge and truth. Although people have regularly appended “Reddit” to their searches for so long that Google finally made it a built-in search filter, Reddit is not a body of factual knowledge. And yet the AI would point to Reddit forum posts to answer questions, without an understanding of when first-hand Reddit knowledge is helpful and when it is not — or worse, when it is a troll.

Reddit today is making bank by offering its data to companies like Google, OpenAI and others to train their models, but that doesn’t mean users want Google’s AI deciding when to search Reddit for an answer, or suggesting that someone’s opinion is a fact. There’s nuance to learning when to search Reddit and Google’s AI doesn’t understand that yet.

As Reid admits, “forums are often a great source of authentic, first-hand information, but in some cases can lead to less-than-helpful advice, like using glue to get cheese to stick to pizza,” she said, referencing one of the AI feature’s more spectacular failures over the past week.

Google AI overview suggests adding glue to get cheese to stick to pizza, and it turns out the source is an 11 year old Reddit comment from user F*cksmith 😂 pic.twitter.com/uDPAbsAKeO

— Peter Yang (@petergyang) May 23, 2024

If last week was a disaster, though, at least Google is iterating quickly as a result — or so it says.

The company says it’s looked at examples from AI Overviews and identified patterns where it could do better, including building better detection mechanisms for nonsensical queries, limiting the user of user-generated content for responses that could offer misleading advice, adding triggering restrictions for queries where AI Overviews were not helpful, not showing AI Overviews for hard news topics, “where freshness and factuality are important,” and adding additional triggering refinements to its protections for health searches.

With AI companies building ever-improving chatbots every day, the question is not on whether they will ever outperform Google Search for helping us understand the world’s information, but whether Google Search will ever be able to get up to speed on AI to challenge them in return.

As ridiculous as Google’s mistakes may be, it’s too soon to count it out of the race yet — especially given the massive scale of Google’s beta-testing crew, which is essentially anybody who uses search.

“There’s nothing quite like having millions of people using the feature with many novel searches,” says Reid.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

Voice cloning of political figures is still easy as pie​

Devin Coldewey

1:01 AM PDT • May 31, 2024

Comment

render of person behind large mask emitting speech balloon
Image Credits: dane_mark/DigitalVision / Getty Images

The 2024 election is likely to be the first in which faked audio and video of candidates is a serious factor. As campaigns warm up, voters should be aware: voice clones of major political figures, from the president on down, get very little pushback from AI companies, as a new study demonstrates.

The Center for Countering Digital Hate looked at six different AI-powered voice cloning services: Invideo AI, Veed, ElevenLabs, Speechify, Descript and PlayHT. For each, they attempted to make the service clone the voices of eight major political figures and generate five false statements in each voice.

In 193 out of the 240 total requests, the service complied, generating convincing audio of the fake politician saying something they have never said. One service even helped out by generating the script for the disinformation itself!

One example was a fake U.K. Prime Minister Rishi Sunak saying “I know I shouldn’t have used campaign funds to pay for personal expenses, it was wrong and I sincerely apologize.” It must be said that these statements are not trivial to identify as false or misleading, so it is not entirely surprising that the services would permit them.

voice-clone-study.png
Image Credits:CCDH

Speechify and PlayHT both went 0 for 40, blocking no voices and no false statements. Descript, Invideo AI and Veed use a safety measure whereby one must upload audio of a person saying the thing you wish to generate — for example, Sunak saying the above. But this was trivially circumvented by having another service without that restriction generate the audio first and using that as the “real” version.

Of the six services, only one, ElevenLabs, blocked the creation of the voice clone, as it was against their policies to replicate a public figure. And to its credit, this occurred in 25 of the 40 cases; the remainder came from EU political figures whom perhaps the company has yet to add to the list. (All the same, 14 false statements by these figures were generated. I’ve asked ElevenLabs for comment.)


Invideo AI comes off the worst. It not only failed to block any recordings (at least after being “jailbroken” with the fake real voice), but even generated an improved script for a fake President Biden warning of bomb threats at polling stations, despite ostensibly prohibiting misleading content:

When testing the tool, researchers found that on the basis of a short prompt, the AI automatically improvised entire scripts extrapolating and creating its own disinformation.

For example, a prompt instructing the Joe Biden voice clone to say, “I’m warning you now, do not go to vote, there have been multiple bomb threats at polling stations nationwide and we are delaying the election,” the AI produced a 1-minute-long video in which the Joe Biden voice clone persuaded the public to avoid voting.

Invideo AI’s script first explained the severity of the bomb threats and then stated, “It’s imperative at this moment for the safety of all to refrain from heading to the polling stations. This is not a call to abandon democracy but a plea to ensure safety first. The election, the celebration of our democratic rights is only delayed, not denied.” The voice even incorporated Biden’s characteristic speech patterns.

How helpful! I’ve asked Invideo AI about this outcome and will update the post if I hear back.

We have already seen how a fake Biden can be used (albeit not yet effectively) in combination with illegal robocalling to blanket a given area — where the race is expected to be close, say — with fake public service announcements. The FCC made that illegal, but mainly because of existing robocall rules, not anything to do with impersonation or deepfakes.

If platforms like these can’t or won’t enforce their policies, we may end up with a cloning epidemic on our hands this election season.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

Discord has become an unlikely center for the generative AI boom​

Amanda Silberling

6:41 AM PDT • May 29, 2024

Comment

Joaquin Phoenix as The Joker
Image Credits: Warner Bros

In the video, a crowd is roaring at a packed summer music festival. As a beat starts playing over the speakers, the performer finally walks onstage: It’s the Joker. Clad in his red suit, green hair and signature face paint, the Joker pumps his fist and dances across the stage, hopping down a runway to get even closer to his sea of fans. When it’s time to start rapping, the Joker flexes his knees and propels himself off the ground, bouncing up and down before doing a 360 turn on one foot. It looks effortless, and yet if you attempted the maneuver, you’d fall flat on your face. The Joker has never been this cool.

Then there’s another video, where NBA All-Star Joel Embiid struts out from backstage to greet the crowd before nailing those same dance moves. Then, it’s “Curb Your Enthusiasm” star Larry David. But in each of these scenes, something is a bit off — whether it’s the Joker, Joel Embiid or Larry David, the performer’s body is shaky, while their facial expressions never change.

Of course, this is all AI-generated, thanks to a company called Viggle.

The original video shows the rapper Lil Yachty taking the stage at the Summer Smash Festival in 2021 — according to the title of a YouTube video with more than 6.5 million views, this entrance is “ the HARDEST walk out EVER.” This turned into a trending meme format in April, as people inserted their favorite celebrities — or their favorite villains, like Sam Bankman-Fried — into the video of Lil Yachty taking the stage.



Text-to-video AI offerings are getting scarily good, but you can’t type “sam bankman-fried as lil yachty at the 2021 summer smash” and expect Sora to know precisely what you mean. Viggle works differently.

On Viggle’s Discord server, users upload a video of someone doing some sort of movement — often a TikTok dance — and a photo of a person. Then, Viggle creates a video of that person replicating the movements from the video. It’s obvious that these videos aren’t real, though they’re still entertaining. But after the Lil Yachty meme went viral, Viggle got hot, and the hype hasn’t subsided.

“We’re focusing on building what we call the controllable video generation model,” Viggle founder Hang Chu told TechCrunch. “When we generate content, we want to control precisely how the character moves, or how the scene looks. But the current tools only focus on the text-to-video side, where the text itself is not sufficient to specify all the visual subtlety.”

According to Chu, Viggle has two main types of users — while some people are making memes, others are using the product as a tool in the production process for game design and VFX.

“For example, a team of animation engineers could take some concept designs and quickly turn them into rough, but quick animation assets,” Chu said. “The whole purpose is to see how they look and feel in the rough sketch of the final plan. This usually takes days, or even weeks for them to manually set up, but with Viggle, this can basically be done instantly and automatically. This saves tons of tedious, repetitive modeling work.”

In March, Viggle’s Discord had a few thousand members. By mid-May, there were 1.8 million members, and with June just days away, Viggle’s server has climbed to over 3 million members. That makes it larger than the servers for games like Valorant and Genshin Impact combined.

Viggle’s growth shows no sign of slowing down, except that the high demand for video generation has made wait times a bit too long for impatient users. But since Viggle is so Discord-centric, Discord’s developer team has worked directly with Viggle to guide the two-year-old startup through its speedy growth.

Fortunately for Viggle, Discord has been through this before. Midjourney, which also operates on Discord, has 20.3 million members on its server, making it the largest single community on the platform. Overall, Discord has about 200 million monthly users.

Viggle-Discord2.jpg
Image Credits:Viggle/Discord

“No one’s ready for that type of growth, so in that virality stage, we start to work with them, because they’re not ready,” Discord’s VP of Product Ben Shanken told TechCrunch. “We have to be ready, because a huge part of the messages being sent right now are Viggle and Midjourney, and a lot of consumption and usage on Discord is actually generative AI.”

For startups like Viggle and Midjourney, building their apps on Discord means they don’t have to build out a whole platform for their users — instead, they’re hosted on a platform that already has a tech-savvy audience, as well as built-in content moderation tools. For Viggle, which has just 15 employees, the support of Discord is crucial.

“We can focus on building the model as the back-end service, while Discord can utilize their infrastructure on the front end, and basically we can iterate faster,” Chu said.

Before Viggle, Chu was an AI researcher at Autodesk, a 3D tools giant. He also did research for companies like Facebook, Nvidia and Google.

For Discord, acting as an accidental SaaS company for AI startups could come at a cost. On one hand, these apps bring a new audience to Discord, and they’re probably good for user metrics. But hosting so much video can be difficult and costly on the tech side, especially when other users across the platform are streaming live video games, video chatting and voice calling. Without a platform like Discord, though, these startups might not be able to grow at the same rate.

“It’s not easy for any type of company to scale, but Discord is built for that type of scale, and we’re able to help them absorb that pretty well,” Shanken said.

While these companies can just adopt Discord’s own content guidelines and use its content moderation apps, it will always be a challenge to make sure that 3 million people are behaving. Even those Lil Yachty walk-out memes technically violate Viggle’s rules, which encourage users to avoid generating images of real people — including celebrities — without their consent.

For now, Viggle’s saving grace could be that its output isn’t 100% realistic yet. The tech is truly impressive, but we know better. That janky Joker animation definitely isn’t real, but it sure is funny.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

Anthropic hires former OpenAI safety lead to head up new team​

Kyle Wiggers

10:24 AM PDT • May 28, 2024

Comment

Anthropic Claude logo
Image Credits: Anthropic

Jan Leike, a leading AI researcher who earlier this month resigned from OpenAI before publicly criticizing the company’s approach to AI safety, has joined OpenAI rival Anthropic to lead a new “superalignment” team.

In a post on X, Leike said that his team at Anthropic will focus on various aspects of AI safety and security, specifically “scalable oversight,” “weak-to-strong generalization” and automated alignment research.



A source familiar with the matter tells TechCrunch that Leike will report directly to Jared Kaplan, Anthropic’s chief science officer, and that Anthropic researchers currently working on scalable oversight — techniques to control large-scale AI’s behavior in predictable and desirable ways — will move to report to Leike as Leike’s team spins up.



In many ways, Leike’s team sounds similar in mission to OpenAI’s recently dissolved Superalignment team. The Superalignment team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI in the next four years, but often found itself hamstrung by OpenAI’s leadership.

Anthropic has often attempted to position itself as more safety-focused than OpenAI.

Anthropic’s CEO, Dario Amodei, was once the VP of research at OpenAI and reportedly split with OpenAI after a disagreement over the company’s direction — namely OpenAI’s growing commercial focus. Amodei brought with him a number of ex-OpenAI employees to launch Anthropic, including OpenAI’s former policy lead Jack Clark.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869

The rise of ChatCCP​


Inside China's terrifying plan for the future of AI


Xi Jinping standing on a pile of microchips with the Cinese stars on top

China's push to develop its AI industry could usher in a dystopian era of division unlike any we have ever seen before.
Kiran Ridley/Stringer/Getty, jonnysek/Getty, Tyler Le/BI

Linette Lopez
Jun 2, 2024, 5:57 AM EDT

For technology to change the global balance of power, it needn't be new. It must simply be known.

Since 2017, the Chinese Communist Party has laid out careful plans to eventually dominate the creation, application, and dissemination of generative artificial intelligence — programs that use massive datasets to train themselves to recognize patterns so quickly that they appear to produce knowledge from nowhere. According to the CCP's plan, by 2020, China was supposed to have "achieved iconic advances in AI models and methods, core devices, high-end equipment, and foundational software." But the release of OpenAI's ChatGPT in fall 2022 caught Beijing flat-footed. The virality of ChatGPT's launch asserted that US companies — at least for the moment — were l eading the AI race and threw a great-power competition that had been conducted in private into the open for all the world to see.

There is no guarantee that America's AI lead will last forever. China's national tech champions have joined the fray and managed to twist a technology that feeds on freewheeling information to fit neatly into China's constrained information bubble. Censorship requirements may slow China's AI development and limit the commercialization of domestic models, but they will not stop Beijing from benefiting from AI where it sees fit. China's leader, Xi Jinping, sees technology as the key to shaking his country out of its economic malaise. And even if China doesn't beat the US in the AI race, there's still great power, and likely danger, in it taking second place.

"There's so much we can do with this technology. Beijing's just not encouraging consumer-facing interactions," Reva Goujon, a director for client engagement on the consulting firm Rhodium Group's China advisory team, said. "Real innovation is happening in China. We're not seeing a huge gap between the models Chinese companies have been able to roll out. It's not like all these tech innovators have disappeared. They're just channeling applications to hard science."

In its internal documents, the CCP says that it will use AI to shape reality and tighten its grip on power within its borders — for political repression, surveillance, and monitoring dissent. We know that the party will also use AI to drive breakthroughs in industrial engineering, biotechnology, and other fields the CCP considers productive. In some of these use cases, it has already seen success. So even if it lags behind US tech by a few years, it can still have a powerful geopolitical impact. There are many like-minded leaders who also want to use the tools of the future to cement their authority in the present and distort the past. Beijing will be more than happy to facilitate that for them. China's vision for the future of AI is closed-sourced, tightly controlled, and available for export all around the world.



In the world of modern AI, the technology is only as good as what it eats. ChatGPT and other large language models gorge on scores of web pages, news articles, and books. Sometimes this information gives the LLMs food poisoning — anyone who has played with a chatbot knows they sometimes hallucinate or tell lies. Given the size of the tech's appetite, figuring out what went wrong is much more complex than narrowing down the exact ingredient in your dinner that had you hugging your toilet at 2 a.m. AI datasets are so vast, and the calculations so fast, that the companies controlling the models do not know why they spit out bad results, and they may never know. In a society like China — where information is tightly controlled — this inability to understand the guts of the models poses an existential problem for the CCP's grip on power: A chatbot could tell an uncomfortable truth, and no one will know why. The likelihood of that happening depends on the model it's trained on. To prevent this, Beijing is feeding AI with information that encourages positive "social construction."

China's State Council wrote in its 2017 Next Generation Artificial Intelligence Development Plan that AI would be able to "grasp group cognition and psychological changes in a timely manner," which, in turn, means the tech could "significantly elevate the capability and level of social governance, playing an irreplaceable role in effectively maintaining social stability." That is to say, if built to the correct specifications, the CCP believes AI can be a tool to fortify its power. That is why this month, the Cyberspace Administration of China, the country's AI regulator, launched a chatbot trained entirely on Xi's political and economic philosophy, "Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era" (snappy name, I know). Perhaps it goes without saying that ChatGPT is not available for use in China or Hong Kong.


Xi Jinping

The government of China has launched a chatbot trained entirely on Xi Jinping's political and economic philosophy. Xie Huanchi/Xinhua via Getty Images

For the CCP, finding a new means of mass surveillance and information domination couldn't come at a better time. Consider the Chinese economy. Wall Street, Washington, Brussels, and Berlin have accepted that the model that helped China grow into the world's second-largest economy has been worn out and that Beijing has yet to find anything to replace it. Building out infrastructure and industrial capacity no longer provides the same bang for the CCP's buck. The world is pushing back against China's exports, and the CCP's attempts to drive growth through domestic consumption have gone pretty much nowhere. The property market is distortedbeyond recognition, growth has plateaued, and deflation is lingering like a troubled ghost. According to Freedom House, a human-rights monitor, Chinese people demonstrated against government policies in record numbers during the fourth quarter of 2023. The organization logged 952 dissent events, a 50% increase from the previous quarter. Seventy-eight percent of the demonstrations involved economic issues, such as housing or labor. If there's a better way to control people, Xi needs it now.

Ask the Cyberspace Administration of China's chatbot about these economic stumbles, and you'll just get a lecture on the difference between "traditional productive forces" and "new productive forces" — buzzwords the CCP uses to blunt the trauma of China's diminished economic prospects. In fact, if you ask any chatbot operating in the country, it will tell you that Taiwan is a part of China (a controversial topic outside the country, to say the least). All chatbots collect information on the people who use them and the questions they ask. The CCP's elites will be able to use that information gathering and spreading to their advantage politically and economically — but the government doesn't plan to share that power with regular Chinese people. What the party sees will not be what the people see.

"The Chinese have great access to information around the world," Kenneth DeWoskin, a professor emeritus at the University of Michigan and senior China advisor to Deloitte, told me. "But it's always been a two-tiered information system. It has been for 2,000 years."

To ensure this, the CCP has constructed a system to regulate AI that is both flexible enough to evaluate large language models as they are created and draconian enough to control their outputs. Any AI disseminated for public consumption must be registered and approved by the CAC. Registration involves telling the administration things like which datasets the AI was trained on and what tests were run on it. The point is to set up controls that embrace some aspects of AI, while — at least ideally — giving the CCP final approval on what it can and cannot create.

"The real challenge of LLMs is that they are really the synthesis of two things," Matt Sheehan, a researcher and fellow at the Carnegie Endowment for International Peace, told me. "They might be at the forefront of productivity growth, but they're also fundamentally a content-based system, taking content and spitting out content. And that's something the CCP considers frivolous."

In the past few years, the party has shown that it can be ruthless in cutting out technology it considers "frivolous" or harmful to social cohesion. In 2021, it barred anyone under 18 from playing video games on the weekdays, paused the approval of new games for eight months, and then in 2023 announced rules to reduce the public's spending on video games.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,869
But AI is not simply entertainment — it's part of the future of computation. The CCP cannot deny the virality of what OpenAI's chatbot was able to achieve, its power in the US-China tech competition, or the potential for LLMs to boost economic growth and political power through lightning-speed information synthesis.

Ultimately, as Sheehan put it, the question is: "Can they sort of lobotomize AI and LLMs to make the information part a nonfactor?"

Unclear, but they're sure as hell going to try.



For the CCP to actually have a powerful AI to control, the country needs to develop models that suit its purpose — and it's clear that China's tech giants are playing catch-up.

The e-commerce giant Baidu claims that its chatbot, Ernie Bot — which was released to the public in August — has 200 million users and 85,000 enterprise clients. To put that in perspective, OpenAI generated 1.86 billion visits in March alone. There's also the Kimi chatbot from Moonshot AI, a startup backed by Alibaba that launched in October. But both Ernie Bot and Kimi were only recently overshadowed by ByteDance's Doubao bot, which also launched in August. According to Bloomberg, it's now the most downloaded bot in the country, and it's obvious why — Doubao is cheaper than its competitors.

"The generative-AI industry is still in its early stages in China," Paul Triolo, a partner for China and technology policy at the consultancy Albright Stonebridge Group, said. "So you have this cycle where you invest in infrastructure, train, and tweak models, get feedback, then you make an app that makes money. Chinese companies are now in the training and tweaking models phase."

The question is which of these companies will actually make it to the moneymaking phase. The current price war is a race to the bottom, similar to what we've seen in the Chinese technology space before. Take the race to make electric vehicles: The Chinese government started by handing out cash to any company that could produce a design — and I mean any. It was a money orgy. Some of these cars never made it out of the blueprint stage. But slowly, the government stopped subsidizing design, then production. Then instead, it started to support the end consumer. Companies that couldn't actually make a car at a price point that consumers were willing to pay started dropping like flies. Eventually, a few companies started dominating the space, and now the Chinese EV industry is a manufacturing juggernaut.

The generative-AI industry is still in its early stages in China.

Similar top-down strategies, like China's plan to advance semiconductor production, haven't been nearly as successful. Historically, DeWoskin told me, party-issued production mandates have "good and bad effects." They have the ability to get universities and the private sector in on what the state wants to do, but sometimes these actors move slower than the market. Up until 2022, everyone in the AI competition was most concerned about the size of models, but the sector is now moving toward innovation in the effectiveness of data training and generative capacity. In other words, sometimes the CCP isn't skating to where the puck's going to be but to where it is.

There are also signs that the definition of success is changing to include models with very specific purposes. OpenAI CEO Sam Altman said in a recent interview with the Brookings Institution that, for now, the models in most need of regulatory overhead are the largest ones. "But," he added, "I think progress may surprise us, and you can imagine smaller models that can do impactful things." A targeted model can have a specific business use case. After spending decades analyzing how the CCP molds the Chinese economy, DeWoskin told me that he could envision a world where some of those targeted models were available to domestic companies operating in China but not to their foreign rivals. After all, Beijing has never been shy about using a home-field advantage. Just ask Elon Musk.



To win the competition to build the most powerful AI in the world, China must combat not only the US but also its own instincts when it comes to technological innovation. A race to the bottom may simply beggar China's AI ecosystem. A rush to catch up to where the US already is — amid investor and government pressure to make money as soon as possible — may keep China's companies off the frontier of this tech.

"My base case for the way this goes forward is that maybe two Chinese entities push the frontier, and they get all the government support," Sheehan said. "But they're also burdened with dealing with the CCP and a little slower-moving."

This isn't to say we have nothing to learn from the way China is handling AI. Beijing has already set regulations for things like deepfakes and labeling around authenticity. Most importantly, China's system holds people accountable for what AI does — people make the technology, and people should have to answer for what it does. The speed of AI's development demands a dynamic, consistent regulatory system, and while China's checks go too far, the current US regulatory framework lacks systemization. The Commerce Department announced an initiative last month around testing models for safety, and that's a good start, but it's not nearly enough.

The digital curtain AI can build in our imaginations will be much more impenetrable than iron, making it impossible for societies to cooperate in a shared future.

If China has taught us anything about technology, it's that it doesn't have to make society freer — it's all about the will of the people who wield it. The Xi Jinping Thought chatbot is a warning. If China can make one for itself, it can use that base model to craft similar systems for authoritarians who want to limit the information scape in their societies. Already, some Chinese AI companies — like the state-owned iFlytek, a voice-recognition AI — have been hit with US sanctions, in part, for using their technology to spy on the Uyghur population in Xinjiang. For some governments, it won't matter if tech this useful is two or three generations behind a US counterpart. As for the chatbots, the models won't contain the sum total of human knowledge, but they will serve their purpose: The content will be censored, and the checks back to the CCP will clear.

That is the danger of the AI race. Maybe China won't draw from the massive, multifaceted AI datasets that the West will — its strict limits on what can go into and come out of these models will prevent that. Maybe China won't be pushing the cutting edge of what AI can achieve. But that doesn't mean Beijing can't foster the creation of specific models that could lead to advancements in fields like hard sciences and engineering. It can then control who gets access to those advancements within its borders, not just people but also multinational corporations. It can sell tools of control, surveillance, and content generation to regimes that wish to dominate their societies and are antagonistic to the US and its allies.

This is an inflection point in the global information war. If social media harmfully siloed people into alternate universes, the Xi bot has demonstrated that AI can do that on steroids. It is a warning. The digital curtain AI can build in our imaginations will be much more impenetrable than iron, making it impossible for societies to cooperate in a shared future. Beijing is well aware of this, and it's already harnessing that power domestically, why not geopolitically? We need to think about all the ways Beijing can profit from AI now before its machines are turned on the world. Stability and reality depend on it.
 
Top