bnew · Veteran · Joined Nov 1, 2015 · Messages 56,195 · Reputation 8,249 · Daps 157,876

META

Mark Zuckerberg’s new goal is creating artificial general intelligence​

And he wants Meta to open source it. Eventually. Maybe.​


By Alex Heath, a deputy editor and author of the Command Line newsletter. He’s covered the tech industry for over a decade at The Information and other outlets.

Jan 18, 2024, 12:59 PM EST




Cath Virginia / The Verge | Photos by Getty Images

Fueling the generative AI craze is a belief that the tech industry is on a path to achieving superhuman, god-like intelligence.

OpenAI’s stated mission is to create this artificial general intelligence, or AGI. Demis Hassabis, the leader of Google’s AI efforts, has the same goal.

Now, Meta CEO Mark Zuckerberg is entering the race. While he doesn’t have a timeline for when AGI will be reached, or even an exact definition for it, he wants to build it. At the same time, he’s shaking things up by moving Meta’s AI research group, FAIR, to the same part of the company as the team building generative AI products across Meta’s apps. The goal is for Meta’s AI breakthroughs to more directly reach its billions of users.

“We’ve come to this view that, in order to build the products that we want to build, we need to build for general intelligence,” Zuckerberg tells me in an exclusive interview. “I think that’s important to convey because a lot of the best researchers want to work on the more ambitious problems.”

Here, Zuckerberg is saying the quiet part aloud. The battle for AI talent has never been more fierce, with every company in the space vying for an extremely small pool of researchers and engineers. Those with the needed expertise can command eye-popping compensation packages to the tune of over $1 million a year. CEOs like Zuckerberg are routinely pulled in to try to win over a key recruit or keep a researcher from defecting to a competitor.

“We’re used to there being pretty intense talent wars,” he says. “But there are different dynamics here with multiple companies going for the same profile, [and] a lot of VCs and folks throwing money at different projects, making it easy for people to start different things externally.”

After talent, the scarcest resource in the AI field is the computing power needed to train and run large models. On this topic, Zuckerberg is ready to flex. He tells me that, by the end of this year, Meta will own more than 340,000 of Nvidia’s H100 GPUs — the industry’s chip of choice for building generative AI.

“We have built up the capacity to do this at a scale that may be larger than any other individual company”

External research has pegged Meta’s H100 shipments for 2023 at 150,000, a figure matched only by Microsoft and at least three times larger than everyone else’s. When its Nvidia A100s and other AI chips are accounted for, Meta will have a stockpile of almost 600,000 GPUs by the end of 2024, according to Zuckerberg.

“We have built up the capacity to do this at a scale that may be larger than any other individual company,” he says. “I think a lot of people may not appreciate that.”

The realization

No one working on AI, including Zuckerberg, seems to have a clear definition for AGI or an idea of when it will arrive.

“I don’t have a one-sentence, pithy definition,” he tells me. “You can quibble about if general intelligence is akin to human level intelligence, or is it like human-plus, or is it some far-future super intelligence. But to me, the important part is actually the breadth of it, which is that intelligence has all these different capabilities where you have to be able to reason and have intuition.”

Related

Inside Meta’s big AI reorg

He sees its eventual arrival as being a gradual process, rather than a single moment. “I’m not actually that sure that some specific threshold will feel that profound.”

As Zuckerberg explains it, Meta’s new, broader focus on AGI was influenced by the release of Llama 2, its latest large language model, last year. The company didn’t think that code generation made sense for how people would use an LLM in Meta’s apps, but it’s still an important skill to develop for building smarter AI, so Meta built it anyway.

“One hypothesis was that coding isn’t that important because it’s not like a lot of people are going to ask coding questions in WhatsApp,” he says. “It turns out that coding is actually really important structurally for having the LLMs be able to understand the rigor and hierarchical structure of knowledge, and just generally have more of an intuitive sense of logic.”

“Our ambition is to build things that are at the state of the art and eventually the leading models in the industry”

Meta is training Llama 3 now, and it will have code-generating capabilities, he says. Like Google’s new Gemini model, another focus is on more advanced reasoning and planning abilities.

“Llama 2 wasn’t an industry-leading model, but it was the best open-source model,” he says. “With Llama 3 and beyond, our ambition is to build things that are at the state of the art and eventually the leading models in the industry.”

Open versus closed

The question of who gets to eventually control AGI is a hotly debated one, as the near implosion of OpenAI recently showed the world.

Zuckerberg wields total power at Meta thanks to his voting control over the company’s stock. That puts him in a uniquely powerful position that could be dangerously amplified if AGI is ever achieved. His answer is the playbook that Meta has followed so far for Llama, which can — at least for most use cases — be considered open source.

“I tend to think that one of the bigger challenges here will be that if you build something that’s really valuable, then it ends up getting very concentrated,” Zuckerberg says. “Whereas, if you make it more open, then that addresses a large class of issues that might come about from unequal access to opportunity and value. So that’s a big part of the whole open-source vision.”

Without naming names, he contrasts Meta’s approach to that of OpenAI, which began with the intention of open sourcing its models but has become increasingly less transparent. “There were all these companies that used to be open, used to publish all their work, and used to talk about how they were going to open source all their work. I think you see the dynamic of people just realizing, ‘Hey, this is going to be a really valuable thing, let’s not share it.’”

While Sam Altman and others espouse the safety benefits of a more closed approach to AI development, Zuckerberg sees a shrewd business play. Meanwhile, the models that have been deployed so far have yet to cause catastrophic damage, he argues.

“The biggest companies that started off with the biggest leads are also, in a lot of cases, the ones calling the most for saying you need to put in place all these guardrails on how everyone else builds AI,” he tells me. “I’m sure some of them are legitimately concerned about safety, but it’s a hell of a thing how much it lines up with the strategy.”

“I’m sure some of them are legitimately concerned about safety, but it’s a hell of a thing how much it lines up with the strategy”

Zuckerberg has his own motivations, of course. The end result of his open vision for AI is still a concentration of power, just in a different shape. Meta already has more users than almost any company on Earth and a wildly profitable social media business. AI features can arguably make his platforms even stickier and more useful. And if Meta can effectively standardize the development of AI by releasing its models openly, its influence over the ecosystem will only grow.

There’s another wrinkle: If AGI is ever achieved at Meta, the call to open source it or not is ultimately Zuckerberg’s. He’s not ready to commit either way.

“For as long as it makes sense and is the safe and responsible thing to do, then I think we will generally want to lean towards open source,” he says. “Obviously, you don’t want to be locked into doing something because you said you would.”

Don’t call it a pivot

In the broader context of Meta, the timing of Zuckerberg’s new AGI push is a bit awkward.

It has been only two years since he changed the company name to focus on the metaverse. Meta’s latest smart glasses with Ray-Ban are showing early traction, but full-fledged AR glasses feel increasingly further out. Apple, meanwhile, has recently validated his bet on headsets with the launch of the Vision Pro, even though VR is still a niche industry.

Zuckerberg, of course, disagrees with the characterization of his focus on AI as a pivot.

“I don’t know how to more unequivocally state that we’re continuing to focus on Reality Labs and the metaverse,” he tells me, pointing to the fact that Meta is still spending north of $15 billion a year on the initiative. Its Ray-Ban smart glasses recently added a visual AI assistant that can identify objects and translate languages. He sees generative AI playing a more critical role in Meta’s hardware efforts going forward.

“I don’t know how to more unequivocally state that we’re continuing to focus on Reality Labs and the metaverse”

He sees a future in which virtual worlds are generated by AI and filled with AI characters that accompany real people. He says a new platform is coming this year to let anyone create their own AI characters and distribute them across Meta’s social apps. Perhaps, he suggests, these AIs will even be able to post their own content to the feeds of Facebook, Instagram, and Threads.

Meta is still a metaverse company. It’s the biggest social media company in the world. It’s now trying to build AGI. Zuckerberg frames all this around the overarching mission of “building the future of connection.”

To date, that connection has been mostly humans interacting with each other. Talking to Zuckerberg, it’s clear that, going forward, it’s increasingly going to be about humans talking to AIs, too. It’s obvious that he views this future as inevitable and exciting, whether the rest of us are ready for it or not.


bnew

Microsoft makes its AI-powered reading tutor free​

Kyle Wiggers @kyle_l_wiggers / 1:00 PM EST • January 18, 2024
A Microsoft store entrance with the company's logo

Image Credits: Nicolas Economou/NurPhoto / Getty Images

Microsoft today made Reading Coach, its AI-powered tool that provides learners with personalized reading practice, available at no cost to anyone with a Microsoft account.

As of this morning, Reading Coach is accessible on the web in preview — a Windows app is forthcoming. And soon (in late spring), Reading Coach will integrate with learning management systems such as Canvas, Microsoft says.

“It’s well known that reading is foundational to a student’s academic success; studies show that fluent readers are four times more likely to graduate high school and get better jobs,” Microsoft writes in a blog post. “With the latest AI technology, we have an opportunity to provide learners with personalized, engaging, and transformative reading experiences.”

Reading Coach builds on Reading Progress, a plug-in for the education-focused version of Microsoft Teams, Teams for Education, designed to help teachers foster reading fluency in their students. Inspired by the success of Reading Progress (evidently), Microsoft launched Reading Coach in 2022 as a part of Teams for Education and Immersive Reader, the company’s cross-platform assistive service for language and reading comprehension.

Microsoft Reading Coach

Image Credits: Microsoft

Reading Coach works by having learners identify words they struggle with the most and presenting them with tools to support independent, individualized practice. Based on an educator’s preferences, the tools available can include text to speech, syllable breaking and picture dictionaries.

After a learner practices in Reading Coach, educators can view their work, including which words the student practiced, how many attempts they made and which tools they used. Educators can also share this information with students if they choose.

Recently, Reading Coach received a spruce-up in the form of a “choose your own story” feature, powered by Microsoft’s Azure OpenAI Service, that lets learners tap AI to generate their own narrative adventure.

Akin to the AI-generated story tool on the Amazon Echo Show, Reading Coach’s “choose your own story” has learners select a character, setting and reading level and have AI create content based on these selections and the learner’s most challenging words. (Microsoft says that story content is moderated and filtered for things like “quality, safety and age appropriateness.”) Reading Coach provides feedback on pronunciation, listening to the learner read the story and awarding badges that unlock new characters and scenes as they progress.

Learners who opt not to create their own story can pick from curated passages in ReadWorks, a library of resources for reading comprehension.

“Reading Coach intrinsically motivates learners to continue advancing their skills in several ways,” Microsoft continues. “With the use of AI in an impactful, safe, responsible way, we believe that personalized learning at scale is within reach.”

It’s key to note that Microsoft’s rosy view of AI for teaching reading comprehension isn’t shared by all educators. Experts say that there isn’t a foolproof tool on the market for measuring comprehension, which involves assessing what students know and the strength of their vocabulary, as well as whether they can sound out and pronounce words. Students can inadvertently affect evaluations by pressing a wrong button. Or they might get bored with a task a tool presents to them and disengage, leading to a low score.

All that being said, teachers don’t think tools like Reading Coach can hurt. In a recent EdWeek Research Center survey, 44% of educators said that they think adaptive tech does a better job of accurately assessing a student’s reading level than non-adaptive software or pen-and-paper methods.
 

bnew

OpenAI partners with Arizona State University to use ChatGPT in classrooms​

Students and faculty can expect to use more AI on campus.​


By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

Jan 18, 2024, 6:20 PM EST


Illustration by Alex Castro / The Verge

Arizona State University (ASU) and OpenAI announced a partnership to bring ChatGPT into ASU’s classrooms.

In a press release, ASU stated that it wants to focus on “three key areas of concentration” where it can use ChatGPT Enterprise, like “enhancing student success, forging new avenues for innovative research, and streamlining organizational processes.”

ASU deputy chief information officer Kyle Bowen told The Verge, “Our faculty and staff were already using ChatGPT, and after the launch of ChatGPT Enterprise, which for us addressed a lot of the security concerns we had, we believed it made sense to connect with OpenAI.” He added that ASU faculty members, some of whom have expertise in AI, will help guide the usage of generative AI on campus.

The university will begin taking project submissions from faculty and students on where to use ChatGPT beginning in February. Anne Jones, vice provost for undergraduate education, said in an interview some professors already use generative AI in their classes. She mentioned some composition classes that use AI to improve writing and journalism classes that use AI platforms to make multimedia stories. There may even be room for chatbots to act as personalized tutors for ASU students, said Jones.

Jones and Bowen say that universities offer a live testing ground for many generative AI use cases.

“Universities hope to foster critical thinking, so we never considered closing ourselves off from the technology. We want to help determine the conditions in which this technology can be used in education,” Jones said.

Last year, ASU launched an AI accelerator program, bringing researchers and engineers together to create AI-powered services. The university also began offering prompt engineering classes to promote AI literacy.

This is the first partnership between OpenAI and an educational institution. The company has slowly begun to forge collaborations with more public-facing organizations. It announced a deal with the Pennsylvania state government to bring ChatGPT Enterprise to some state employees.
 

bnew

AI-Generated, Virtual Products Are Coming to a Podcast Near You​


A startup backed by United Talent Agency and L’Oreal is aiming to give podcasters a new line of revenue


A screenshot of the AsianBossGirl podcast and an AI-generated Garnier Fructis poster

By Ashley Carman

January 18, 2024 at 2:26 PM EST


Hello and welcome back to Soundbite. After a year of unrelenting bad podcast-industry news, we’re looking ahead today to new technology that could help podcasters make more money from their shows as they pivot to video.

As always, reach me through email, and if you haven’t yet subscribed to this newsletter, please do so here. Tell a friend to sign up, too!

First up, a few stories I’m reading:

Audible laid off around 5% of its workforce, or about 100 employees. The content teams were not impacted, and from a quick LinkedIn search, it seems folks on the recruitment team were among those cut.

Condé Nast announced it will fold Pitchfork under the GQ team and is laying off some staff, including Editor-In-Chief Puja Patel. It’s unclear whether this means the Pitchfork brand will cease to exist or what to expect of the site in the future, but as someone who religiously read and checked out every “Best New” music pick, it feels like the end of an era (and my youth).

My colleagues Devin Leonard and Dasha Afanasieva cover the “downfall of Diddy Inc.” A must read.

Podcasters’ newest money-making effort? Using AI to generate virtual products

When watching a somewhat recent episode of the AsianBossGirl podcast, you might not immediately register the lime green Garnier Fructis poster in the background, just behind the heads of the hosts. That is, until a disclaimer unfurls on the lower third of the screen disclosing Garnier’s sponsorship. But even then, you might be additionally surprised to learn that this poster neatly hanging on the wall doesn’t physically exist at all. In fact, you might have no idea something is up until the poster disappears midway through the episode, making you wonder how it just went...poof.

As it turns out, a company called Rembrand generated the image of the poster using artificial intelligence and placed it in the video of the show after the whole thing had been recorded.

Rembrand believes its technology could power the future of podcasting, and, more ambitiously, all video product-placement in general, moving the form beyond visual effects and relying entirely on AI to do the creating.

The company began formally selling and placing ads on shows this past June and worked consistently with around 50 creators last quarter, said Cory Treffiletti, Rembrand’s chief marketing officer.

To date, the team has raised $9 million, according to Omar Tawakol, chief executive officer. Investors include UTA.VC, an arm of United Talent Agency, and BOLD, the venture-capital fund from L’Oreal SA.

Some UTA-represented podcasts, including AsianBossGirl and Zane and Heath: Unfiltered , have embraced the technology. Janet Wang, co-host and co-founder of AsianBossGirl, said her UTA agent introduced her and her co-hosts to Rembrand. The revenue they make from the virtual products supplements their typical host-read ads. They only began regularly uploading a video version of their show nine months ago, so they hadn’t previously dealt with product placement.

Typically, they record their episodes in batches. With Rembrand, they don’t have to swap out the parts of the set or the items being promoted – they just send the final video to Rembrand, and its team places the ads.

“We thought it was really cool they could add a poster on the wall or products on tables, as long as the brands aligned with our branding and what our viewers would be receptive to,” she said.

The pitch for brands is to make product placement easier and more affordable because they don’t have to ship physical items.

It takes Rembrand’s team a few hours to generate a synthetic version of the product and place it in videos, said Tawakol, though they eventually hope to get the process down to a few minutes.

He said the team pays anywhere from $10 to $30 per thousand impressions, or CPM. Currently, the average podcast CPM sits at $25 for a 60-second ad, according to Advertisecast.

The podcast industry, still reeling from a tough year of tighter ad budgets, might see this as a salve. But Tawakol’s ambitions for the field come from a more practical place.

“What’s good about podcasts is they’re usually indoors and have less motion,” he said, adding that they initially started to work in the space because it’s a relatively easy environment in which to operate.

This isn’t the first time the podcast industry has seen potential in product placement. QCODE, a network known for its fiction podcasts starring A-list talent, forged partnerships with various brands to include in its shows. Its 2022 thriller, Listening In, starring Rachel Brosnahan, featured ads for Johnnie Walker.

Episodes began with the sound of ice cubes and liquid being poured into a glass and a producer asking a smart speaker to play the podcast. A fictional smart speaker voice then says, “Playing Listening In, presented by Johnnie Walker blended Scotch whiskey.”

QCODE also sells ads on AsianBossGirl, according to Wang.

At its core, this is what the pivot to video in podcasting has always been about – the ability to make more money off shows through additional audience and YouTube monetization. (Video also gives shows a better shot at being discovered through YouTube’s own recommendation system and short-form clip platforms.) And while the AI aspect of this technology is just getting started, for podcasters, any new revenue line feels like a win.

Odds and ends

EU Parliament calls for new rules around music streaming services


The EU Parliament is looking to ensure European artists not only receive more attention from music streaming services and their audiences but also are paid fairly as a result. Members of Parliament called for new rules, including a guarantee that European musical works are “visible, prominent and accessible” and a public label disclosing when a song is AI-generated. The push follows various efforts with similar aims in several other countries, including Uruguay and Canada.
 

bnew

Garbage AI on Google News​

JOSEPH COX

JAN 18, 2024 AT 9:33 AM

404 Media reviewed multiple examples of AI rip-offs making their way into Google News. Google said it doesn't focus on how an article was produced—by an AI or human—opening the way for more AI-generated articles.

The Google logo.


Google News is boosting sites that rip-off other outlets by using AI to rapidly churn out content, 404 Media has found. Google told 404 Media that although it tries to address spam on Google News, the company ultimately does not focus on whether a news article was written by an AI or a human, opening the way for more AI-generated content making its way onto Google News.

The presence of AI-generated content on Google News signals two things: first, the black-box nature of Google News, where entry into its rankings is an opaque, but apparently gameable, system. Second, Google may not be ready to moderate its News service in the age of consumer-access AI, where essentially anyone can churn out a mass of content with little to no regard for its quality or originality.

“I want to read the original stories written by journalists who actually researched them and spoke to primary sources. Any news junkie would,” Brian Penny, a ghostwriter who first flagged some of the seemingly AI-generated articles to 404 Media, said.


Do you know about any other AI-generated content farms? I would love to hear from you. Using a non-work device, you can message me securely on Signal at +44 20 8133 5190. Otherwise, send me an email at joseph@404media.co.

One example was a news site called Worldtimetodays.com, which is littered with full page and other ads. On Wednesday it published an article about Star Wars fandom. The article was very similar to one published a day earlier on the website Distractify, with even the same author photo. One major difference, though, was that Worldtimetodays.com wrote “Let’s be honest, war of stars fans,” rather than Star Wars fans. Another article is a clear rip-off of a piece from Heavy.com, with Worldtimetodays.com not even bothering to replace the Heavy.com watermarked artwork. Gary Graves, the listed author on Worldtimetodays.com, has published more than 40 articles in a 24 hour period.

Both of these rip-off articles appear in Google News search results. The first appears when searching for “Star Wars theory” and setting the results to the past 24 hours. The second appears when searching for the subject of the article with a similar 24 hour setting.

LEFT: THE DISTRACTIFY ARTICLE. RIGHT: THE ARTICLE ON WORLDTIMETODAYS.COM.​

Aaron Nobel, editor-in-chief of Heavy.com, told 404 Media in an email that “I was not aware of this particular ripoff or this particular website. But over the years we've encountered many other sites that rip and republish content at scale.” Neither Distractify or Worldtimetodays.com responded to a request for comment.

There are a few different ways to use Google News. One is to simply open the main Google News homepage, where Google surfaces what it thinks are the most important stories of the day. Another is to search for a particular outlet, where you’ll then see recent stories from just that site. A third is to search by “topic,” such as “artificial intelligence,” “Taylor Swift,” or whatever it is you’re interested in. Appearing in topic searches is especially important for outlets looking to garner more attention for their writings on particular beats. 404 Media, at the time of writing, does not appear in topic searches (except, funnily enough, people writing about 404 Media, like this Fast Company article about us and other worker-owned media outlets). As in, if you searched “CivitAI,” an artificial intelligence company we’ve investigated extensively, our investigations would not appear in Google News; only people aggregating our work or producing their own would.


In another example of AI-generated rip-off content, Penny sent screenshots of search results for news related to the AI tool “midjourney.” At one point, those included articles from sites such as “WatchdogWire” and “Examiner.com.” These articles appear to use the same images, very similar or identical headlines, and pockets of similar text.

The Examiner.com domain was once used by a legitimate news service and went through various owners and iterations. The site adopted its current branding in around 2022, according to archived versions of the site on the Wayback Machine. With that in mind, it’s worth remembering that some of these sites that more recently pivoted to AI-generated content may have been accepted into Google News long ago, even before the advent of consumer-level AI.

A SERIES OF GOOGLE NEWS SCREENSHOTS PROVIDED BY PENNY.​

Looking at WatchdogWire and Examiner.com more broadly, both sites regularly publish content with the same art and identical or very similar headlines in quick succession every day. Ahmed Baig, one of the listed authors on WatchdogWire, has published more than 500 articles in the past 30 days, according to his author page. Baig did not respond to a request for comment sent over LinkedIn asking whether he was taking work from other outlets and using AI to reword them. Baig lists himself as the editor-in-chief of WatchdogWire, as well as the head of SEO for a company called Sproutica. A contact email for Examiner.com uses the Sproutica domain.

Someone who replied to a request for comment to that address, and who signed off as “Nabeel,” confirmed Examiner.com is using AI to copy other peoples’ articles. “Sometimes it doesn’t perform well by answering out of context text, therefore, my writer proofread the content,” they wrote. “It's an experiment for now which isn't responding as expected in terms of Google Search. Despite publishing 400+ stories it attracted less than 1000 visits.”

The articles on WatchdogWire and Examiner.com are almost always very similar to those published on Watcher.Guru, another news site which also has a popular Twitter account with 2.1 million followers and which regularly goes viral on the platform. When asked if Watcher.Guru has any connection to WatchdogWire or Examiner.com, a person in control of the Watcher.Guru Twitter account told 404 Media in a direct message that “we are not affiliated with these sites. These sites are using AI to steal our content and featured images.”

In another case, Penny sent a screenshot of a Google News result that showed articles from CBC and another outlet called “PiPa News.” The PiPa News piece appears to be a rewrite of the CBC one, with a very similar headline and body of text. PiPa News did not respond to an emailed request for comment. Kerry Kelly from CBC’s public affairs department said in an email that “We are aware of an increase in outlets and individuals using CBC News articles without proper licensing or attribution, and are working to curb this trend through media monitoring, takedown requests for individual sites, and connecting with social media platforms when appropriate.”




A SCREENSHOT OF WATCHER.GURU'S WEBSITE ON THURSDAY.​




A SCREENSHOT OF EXAMINER.COM'S WEBSITE ON THURSDAY.​

A Google spokesperson said the company focuses on the quality of the content, and not how it was created. Their statement read: “Our focus when ranking content is on the quality of the content, rather than how it was produced. Automatically-generated content produced primarily for ranking purposes is considered spam, and we take action as appropriate under our policies.” Google reiterated that websites are automatically considered for Google News, and that it can take time for the system to identify new websites. The company added that its Google News ranking systems aim to reward original content that demonstrates things such as expertise and trustworthiness.

With that in mind, after 404 Media approached Google for comment, Penny found that the WatchdogWire and Examiner.com results had apparently been removed from search results for the “midjourney” query and another for “stable diffusion.” Google did not respond when asked multiple times to confirm if it took any action.

404 Media remains outside of news topics results for the beats we cover.

bnew

TikTok can generate AI songs, but it probably shouldn’t​


These AI-generated songs are no AI Drake — yet.

By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

Jan 18, 2024, 6:16 PM EST



Illustration by Alex Castro / The Verge

TikTok has launched many songs that have gone viral over the years, but now it’s testing a feature that lets more people exercise their songwriting skills… with some help from AI.

AI Song generates songs from text prompts with help from the large language model Bloom. Users can write out lyrics in the text field when making a post. TikTok will then recommend AI Song to add sounds to the post, and they can toggle the song’s genre.

The Verge reached out to TikTok for comment. The feature was first spotted last week.

AI Song doesn’t seem to be available to everyone yet, but some TikTok users have already begun experimenting with it. The results so far are not great. Many are out of tune despite the availability of auto-tuning vocals. Take this one from TikTok user Jonah Manzano, who created a song that somehow tried to make the word comedy have more syllables than it needs. Another user, Kristi Leilani, sort of recreated a Britney Spears song but, again, with severely out-of-tune vocals.




AI-generated songs, however, are not new to TikTok. The now infamous AI Drake and The Weeknd song “Heart on My Sleeve” gained virality on the platform. Bad Bunny also criticized people for listening to an AI sound-alike posted on TikTok.

TikTok is not the only platform to lean into generative AI features for its users. YouTube began testing a music creation functionality that lets users make songs from either text prompts or a hummed tune. Dream Track allows for 30-second snippets in the style of other popular artists.

To be more transparent, TikTok rolled out other features that help identify AI-created content on the app and updated its rules requiring users to be upfront about using AI in their content.
 

bnew

Computer Science > Computation and Language​

[Submitted on 18 Jan 2024]

ChatQA: Building GPT-4 Level Conversational QA Models​

Zihan Liu, Wei Ping, Rajarshi Roy, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro
In this work, we introduce ChatQA, a family of conversational question answering (QA) models, that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results from large language models (LLMs). To handle retrieval in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which provides comparable results to using the state-of-the-art query rewriting model while largely reducing deployment cost. Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2401.10225 [cs.CL]
(or arXiv:2401.10225v1 [cs.CL] for this version)

Submission history​

From: Wei Ping
[v1] Thu, 18 Jan 2024 18:59:11 UTC (558 KB)
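The retrieval idea in the abstract (fine-tuning a dense retriever so that the full multi-turn history can serve as the query, rather than rewriting the conversation into a standalone question) can be illustrated with a rough sketch. This is a generic illustration, not the authors’ code: the sentence-transformers model name and the toy documents below are placeholders.

```python
# Hedged sketch of dense retrieval for conversational QA: embed the whole
# dialogue history as the query instead of rewriting it first.
# The model name and documents are illustrative placeholders, not ChatQA's setup.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for a retriever fine-tuned on multi-turn QA

documents = [
    "The H100 is Nvidia's flagship data-center GPU for generative AI training.",
    "Meta expects to own more than 340,000 H100 GPUs by the end of 2024.",
]
doc_emb = retriever.encode(documents, convert_to_tensor=True)

dialogue = [
    "User: Which GPUs is Meta buying for generative AI?",
    "Assistant: Mostly Nvidia H100s.",
    "User: How many will they have by year end?",
]
# Concatenate the turns so the retriever sees the conversational context.
query_emb = retriever.encode(" ".join(dialogue), convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_emb)[0]
best = int(scores.argmax())
print(documents[best], float(scores[best]))
```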


 

bnew

Stability AI unveils smaller, more efficient 1.6B language model as part of ongoing innovation​

Sean Michael Kerner @TechJournalist

January 19, 2024 3:57 PM

Credit: VentureBeat made with Midjourney


Size certainly matters when it comes to large language models (LLMs) as it impacts where a model can run.


Stability AI, the vendor that is perhaps best known for its Stable Diffusion text-to-image generative AI technology, today released one of its smallest models yet with the debut of Stable LM 2 1.6B. Stable LM is a text content generation LLM that Stability AI first launched in April 2023 with both 3 billion and 7 billion parameter models. The new Stable LM model is actually the second model released in 2024 by Stability AI, following the company’s Stable Code 3B launched earlier this week.

The new compact yet powerful Stable LM model aims to lower barriers and enable more developers to participate in the generative AI ecosystem. It incorporates multilingual data in seven languages: English, Spanish, German, Italian, French, Portuguese, and Dutch. The model utilizes recent algorithmic advancements in language modeling to strike what Stability AI hopes is an optimal balance between speed and performance.

“In general, larger models trained on similar data with a similar training recipe tend to do better than smaller ones,” Carlos Riquelme, Head of the Language Team at Stability AI, told VentureBeat. “However, over time, as new models get to implement better algorithms and are trained on more and higher quality data, we sometimes witness recent smaller models outperforming older larger ones.”



Why smaller is better (this time) with Stable LM​

According to Stability AI, the model outperforms other small language models with under 2 billion parameters on most benchmarks, including Microsoft’s Phi-2 (2.7B), TinyLlama 1.1B, and Falcon 1B.

The new smaller Stable LM is even able to surpass some larger models, including Stability AI’s own earlier Stable LM 3B model.

“Stable LM 2 1.6B performs better than some larger models that were trained a few months ago,” Riquelme said. “If you think about computers, televisions or microchips, we could roughly see a similar trend, they got smaller, thinner and better over time.”

To be clear, the smaller Stable LM 2 1.6B does have some drawbacks due to its size. Stability AI, in its release for the new model, cautions that “… due to the nature of small, low-capacity language models, Stable LM 2 1.6B may similarly exhibit common issues such as high hallucination rates or potential toxic language.”
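For readers who want to try a model of this size locally, a minimal sketch with Hugging Face transformers follows. The repo id stabilityai/stablelm-2-1_6b and the trust_remote_code flag are assumptions based on how Stability AI has published earlier Stable LM checkpoints; check the model card before running.

```python
# Minimal text-generation sketch; the repo id is an assumption, verify it on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# A non-English prompt, since the model is trained on seven languages.
inputs = tokenizer("Las tres ciudades más grandes de España son", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```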



Transparency and more data are core to the new model release​

The move toward smaller, more powerful LLM options is one that Stability AI has been pursuing for the last few months.

In December 2023, the StableLM Zephyr 3B model was released, providing more performance to StableLM with a smaller size than the initial iteration back in April.

Riquelme explained that the new Stable LM 2 models are trained on more data, including multilingual documents in 6 languages in addition to English (Spanish, German, Italian, French, Portuguese and Dutch). Another interesting aspect highlighted by Riquelme is the order in which data is shown to the model during training. He noted that it may pay off to focus on different types of data during different training stages.

Going a step further, Stability AI is making the new models available with pre-trained and fine-tuned options, as well as in a format that the researchers describe as “…the last model checkpoint before the pre-training cooldown.”

“Our goal here is to provide more tools and artifacts for individual developers to innovate, transform and build on top of our current model,” Riquelme said. “Here we are providing a specific half-cooked model for people to play with.”

Riquelme explained that during training, the model gets sequentially updated and its performance increases. In that scenario, the very first model knows nothing, while the last one has consumed and hopefully learned most aspects of the data. At the same time, Riquelme said that models may become less malleable towards the end of their training as they are forced to wrap up learning.

“We decided to provide the model in its current form right before we started the last stage of training, so that, hopefully, it’s easier to specialize it to other tasks or datasets people may want to use,” he said. “We are not sure if this will work well, but we really believe in people’s ability to leverage new tools and models in awesome and surprising ways.”


bnew

Google’s new ASPIRE system teaches AI the value of saying ‘I don’t know’​

Michael Nuñez @MichaelFNunez

January 18, 2024 2:06 PM

Credit: VentureBeat made with Midjourney


Google researchers are shaking up the AI world by teaching artificial intelligence to say “I don’t know.” This new approach, dubbed ASPIRE, could revolutionize how we interact with our digital helpers by encouraging them to express doubt when they’re unsure of an answer.

The innovation, showcased at the EMNLP 2023 conference, is all about instilling a sense of caution in AI responses. ASPIRE, which stands for “Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs,” acts like a built-in confidence meter for AI, helping it to assess its own answers before offering them up.

Imagine you’re asking your smartphone for advice on a health issue. Instead of giving a potentially wrong answer, the AI might respond with, “I’m not sure,” thanks to ASPIRE. This system trains the AI to assign a confidence score to its answers, signaling how much trust we should put in its response.
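Mechanically, what the article describes is selective prediction: the model attaches a self-evaluated confidence score to each answer and abstains below a threshold. The sketch below shows only that thresholding logic with a stubbed scorer; it is not Google’s ASPIRE training recipe.

```python
# Generic selective-prediction sketch (not the ASPIRE method itself):
# answer only when a self-evaluated confidence score clears a threshold.
from dataclasses import dataclass

@dataclass
class ScoredAnswer:
    text: str
    confidence: float  # assumed to come from a learned self-evaluation step

def respond(question: str, generate_and_score, threshold: float = 0.7) -> str:
    """generate_and_score is a placeholder callable returning a ScoredAnswer."""
    candidate = generate_and_score(question)
    if candidate.confidence < threshold:
        return "I'm not sure."  # abstain instead of guessing
    return candidate.text

# Toy usage with a stubbed, low-confidence scorer.
stub = lambda q: ScoredAnswer(text="Sydney", confidence=0.42)
print(respond("What is the capital of Australia?", stub))  # -> "I'm not sure."
```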

The team behind this, including Jiefeng Chen and Jinsung Yoon from Google, is pioneering a shift towards more reliable digital decision-making. They argue that it’s crucial for AI, especially when it comes to critical information, to know its limits and communicate them clearly.

“LLMs can now understand and generate language at unprecedented levels, but their use in high-stakes applications is limited because they sometimes make mistakes with high confidence,” said Chen, a researcher at the University of Wisconsin-Madison and co-author of the paper.

Their research indicates that even smaller AI models equipped with ASPIRE can surpass larger ones that lack this introspective feature. This system essentially creates a more cautious and, ironically, a more reliable AI that can acknowledge when a human might be better suited to answer.

By promoting honesty over guesswork, ASPIRE is set to make AI interactions more trustworthy. It paves the way for a future where your AI assistant can be a thoughtful advisor rather than an all-knowing oracle, a future where saying “I don’t know” is actually a sign of advanced intelligence.
 

bnew

Beyond chatbots: The wide world of embeddings​

Ben Dickson @BenDee983

January 18, 2024 12:23 PM

Credit: VentureBeat made with Midjourney


The growing popularity of large language models (LLMs) has also created interest in embedding models, deep learning systems that compress the features of different data types into numerical representations.

Embedding models are one of the key components of retrieval augmented generation (RAG), one of the important applications of LLMs for the enterprise. But the potential of embedding models goes beyond current RAG applications. The past year has seen impressive advances in embedding applications, and 2024 promises to have even more in stock.



How embeddings work​

The basic idea of embeddings is to transform a piece of data such as an image or text document into a list of numbers representing its most important features. Embedding models are trained on large datasets to learn the most relevant features that can tell different types of data apart.

For example, in computer vision, embeddings can represent important features such as the presence of certain objects, shapes, colors, or other visual patterns. In text applications, embeddings can encode semantic information such as concepts, geographical locations, persons, companies, objects, and more.

In RAG applications, embedding models are used to encode the features of a company’s documents. The embedding of each document is then stored in a vector store, a database that specializes in recording and comparing embeddings. At inference time, the application computes the embedding of new prompts and sends them to the vector database to retrieve the documents whose embedding values are closest to that of the prompt. The content of the relevant documents is then inserted into the prompt and the LLM is instructed to generate its responses based on those documents.
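As a concrete illustration of that flow, here is a minimal sketch that embeds a few documents, retrieves the closest one for a prompt, and splices it into the instruction sent to an LLM. The embedding model name is a placeholder, the in-memory list stands in for a vector database, and ask_llm is a hypothetical call.

```python
# Minimal RAG sketch: embed documents, retrieve the closest one for a prompt,
# and insert it into the instruction sent to the LLM.
# The embedding model is a placeholder and ask_llm() is hypothetical.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

docs = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Our support line is open Monday through Friday, 9am to 5pm CET.",
]
doc_embeddings = embedder.encode(docs, convert_to_tensor=True)  # a vector store in real deployments

def build_prompt(question: str, top_k: int = 1) -> str:
    q_emb = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_embeddings, top_k=top_k)[0]
    context = "\n".join(docs[h["corpus_id"]] for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long do refunds take?")
print(prompt)  # would then be passed to ask_llm(prompt), a hypothetical LLM call
```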

This simple mechanism plays a great role in customizing LLMs to respond based on proprietary documents or information that was not included in their training data. It also helps address problems such as hallucinations, where LLMs generate false facts due to a lack of proper information.



Beyond basic RAG​

While RAG has been an important addition to LLMs, the benefits of retrieval and embeddings go beyond matching prompts to documents.

“Embeddings are primarily used for retrieval (and maybe for nice visualizations of concepts),” Jerry Liu, CEO of LlamaIndex, told VentureBeat. “But retrieval itself is actually quite broad and extends beyond simple chatbots for question-answering.”

Retrieval can be a core step in any LLM use case, Liu says. LlamaIndex has been creating tools and frameworks to allow users to match LLM prompts to other types of tasks and data, such as sending commands to SQL databases, extracting information from structured data, long-form generation, or agents that can automate workflows.

“[Retrieval] is a core step towards augmenting the LLM with relevant context, and I imagine most enterprise LLM use cases will need to have retrieval in at least some form,” Liu said.

Embeddings can also be used in applications beyond simple document retrieval. For example, in a recent study, researchers at the University of Illinois at Urbana-Champaign and Tsinghua University used embedding models to reduce the costs of training coding LLMs. They developed a technique that uses embeddings to choose the smallest subset of a dataset that is also diverse and representative of the different types of tasks that the LLM must accomplish. This allowed them to train the model at a high quality with fewer examples.
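The article doesn’t spell out the selection method, so the sketch below shows one common way to use embeddings to pick a small, diverse, representative subset: cluster the examples and keep the item nearest each cluster center. The embedding model and the toy data are placeholders, not the study’s actual pipeline.

```python
# One generic way to pick a diverse, representative training subset using embeddings
# (illustrative only; not the exact method from the study mentioned above).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

examples = [
    "Write a function that reverses a string.",
    "Implement binary search over a sorted list.",
    "Reverse the characters of a string in place.",
    "Parse a CSV file and sum one column.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
emb = model.encode(examples)

k = 3  # size of the subset we can afford to train on
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(emb)

# Keep the example closest to each cluster centroid.
subset = []
for c in range(k):
    idx = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(emb[idx] - km.cluster_centers_[c], axis=1)
    subset.append(examples[idx[int(dists.argmin())]])
print(subset)
```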



Embeddings for enterprise applications​

“Vector embeddings introduced the possibility of working with any unstructured and semi-structured data. Semantic search—and, to be honest, RAG is a type of semantic search application—is just one use case,” Andre Zayarni, CEO of Qdrant, told VentureBeat. “Working with data other than textual (image, audio, video) is a big topic, and new multimodal transformers will make it happen.”

Qdrant is already providing services for using embeddings in different applications, including anomaly detection, recommendation, and time-series processing.

“In general, there are a lot of untapped use cases, and the number will grow with upcoming embedding models,” Zayarni said.

More companies are exploring the use of embedding models to examine the large amounts of unstructured data they are generating. For example, embeddings can help companies categorize millions of customer feedback messages or social media posts to detect trends, common themes, and sentiment changes.

“Embeddings are ideal for enterprises looking to sort through huge amounts of data to identify trends and develop insights,” Nils Reimers, Embeddings Lead at Cohere, told VentureBeat.



Fine-tuned embeddings​

2023 saw a lot of progress around fine-tuning LLMs with custom datasets. However, fine-tuning remains a challenge, and few companies with great data and expertise are doing it so far.

“I think there will always be a funnel from RAG to finetuning; people will start with the easiest thing to use (RAG), and then look into fine-tuning as an optimization step,” Liu said. “I anticipate more people will do finetuning this year for LLMs/embeddings as open-source models themselves also improve, but this number will be smaller than the number of people that do RAG unless we somehow have a step-change in making fine-tuning super easy to use.”

Fine-tuning embeddings also has its challenges. For example, embeddings are sensitive to data shifts. If you train them on short search queries, they will not do as well on longer queries, and vice versa. Similarly, if you train them on “what” questions they will not perform as well on “why” questions.

“Currently, enterprises would need very strong in-house ML teams to make embedding finetuning effective, so it’s usually better to use out-of-the-box options, in contrast to other facets of LLM use cases,” Reimers said.

Nonetheless, there have been advances in making the training process for embedding models more efficient. For example, a recent study by Microsoft shows that pre-trained LLMs such as Mistral-7B can be fine-tuned for embedding tasks with a small dataset generated by a strong LLM. This is much simpler than the traditional multi-step process that requires heavy manual labor and expensive data acquisition.
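The general shape of that recipe (generate synthetic query-passage pairs with a strong LLM, then fine-tune an embedding model contrastively on them) can be sketched with sentence-transformers. This is not the Microsoft procedure or a Mistral-7B setup; the base encoder and the hand-written pairs below merely stand in for LLM-generated data.

```python
# Rough shape of contrastive fine-tuning on synthetic (query, passage) pairs.
# The base model and pairs are placeholders; the study described above fine-tuned
# Mistral-7B on LLM-generated data, which this sketch does not reproduce.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Imagine these pairs were produced by prompting a strong LLM.
synthetic_pairs = [
    InputExample(texts=["how do I reset my password", "To reset your password, open Settings > Account."]),
    InputExample(texts=["refund policy for damaged items", "Damaged items can be refunded within 30 days of delivery."]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base encoder
loader = DataLoader(synthetic_pairs, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)  # treats other in-batch passages as negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
model.save("./finetuned-embedder")
```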

Given the pace at which LLMs and embedding models are advancing, we can expect more exciting developments in the coming months.
 

bnew

Anthropic hits back at music publishers in AI copyright lawsuit, accusing them of ‘volitional conduct’​

Bryson Masse @Bryson_M

January 18, 2024 3:56 PM

A robot lawyer holds a document in front of a counsel bench in a courtroom in a graphic novel style image.

Credit: VentureBeat made with OpenAI DALL-E 3 via ChatGPT Plus

Anthropic, a major generative AI startup, laid out its case why accusations of copyright infringement from a group of music publishers and content owners are invalid in a new court filing on Wednesday.

In fall 2023, music publishers including Concord, Universal, and ABKCO filed a lawsuit against Anthropic accusing it of copyright infringement over its chatbot Claude (now supplanted by Claude 2).

The complaint, filed in federal court in Tennessee (home to Nashville, one of America’s “Music Cities,” and to many labels and musicians), alleges that Anthropic’s business profits from “unlawfully” scraping song lyrics from the internet to train its AI models, which then reproduce the copyrighted lyrics for users in the form of chatbot responses.

Responding to a motion for preliminary injunction — a measure that, if granted by the court, would force Anthropic to stop making its Claude AI model available — Anthropic laid out familiar arguments that have emerged in numerous other copyright disputes involving AI training data.

Gen AI companies like OpenAI and Anthropic rely heavily on scraping massive amounts of publicly available data, including copyrighted works, to train their models but they maintain this use constitutes fair use under the law. It’s expected the question of data scraping copyright will reach the Supreme Court.



Song lyrics only a ‘minuscule fraction’ of training data

In its response, Anthropic argues its “use of Plaintiffs’ lyrics to train Claude is a transformative use” that adds “a further purpose or different character” to the original works.

To support this, the filing directly quotes Anthropic research director Jared Kaplan, stating the purpose is to “create a dataset to teach a neural network how human language works.”

Anthropic contends its conduct “has no ‘substantially adverse impact’ on a legitimate market for Plaintiffs’ copyrighted works,” noting song lyrics make up “a minuscule fraction” of training data and that licensing at the required scale would not be workable.

Joining OpenAI, Anthropic claims licensing the vast troves of text needed to properly train neural networks like Claude is technically and financially infeasible. Training demands trillions of snippets across genres, a licensing scale that may be unachievable for any party.

Perhaps the filing’s most novel argument claims the plaintiffs themselves, not Anthropic, engaged in the “volitional conduct” required for direct infringement liability regarding outputs.

“Volitional conduct” in copyright law refers to the idea that a person accused of committing infringement must be shown to have control over the infringing content outputs. In this case, Anthropic is essentially saying that the label plaintiffs caused its AI model Claude to produce the infringing content, and thus are in control of and responsible for the infringement they report, as opposed to Anthropic or its Claude product, which reacts to users’ inputs autonomously.

The filing points to evidence the outputs were generated through the plaintiffs’ own “attacks” on Claude designed to elicit lyrics.



Irreparable harm?​

On top of contesting copyright liability, Anthropic maintains the plaintiffs cannot prove irreparable harm.

Citing a lack of evidence that song licensing revenues have decreased since Claude launched or that qualitative harms are “certain and immediate,” Anthropic pointed out that the publishers themselves believe monetary damages could make them whole, contradicting their own claims of “irreparable harm” (as, by definition, accepting monetary damages would indicate the harms do have a price that could be quantified and paid).

Anthropic asserts the “extraordinary relief” of an injunction against it and its AI models is unjustified given the plaintiffs’ weak showing of irreparable harm. It also argued that any output of lyrics by Claude was an unintentional “bug” that has now been fixed through new technological guardrails.

Specifically, Anthropic claims it has implemented additional safeguards in Claude to prevent any further display of the plaintiffs’ copyrighted song lyrics. Because the alleged infringing conduct cannot reasonably occur again, the model maker says the plaintiffs’ request for relief preventing Claude from outputting lyrics is moot.

It contends the music publishers’ request is overbroad, seeking to restrain use not just of the 500 representative works in the case, but millions of others that the publishers further claim to control.

The AI startup also took issue with the Tennessee venue, claiming the lawsuit was filed in the incorrect jurisdiction. Anthropic maintained that it has no relevant business connections to Tennessee. The company noted that its headquarters and principal operations are based in California.

Further, Anthropic stated that none of the allegedly infringing conduct cited in the suit, such as training its AI technology or providing user responses, took place within Tennessee’s borders.

The filing pointed out users of Anthropic’s products agreed any disputes would be litigated in California courts.



Copyright fight far from over​

The copyright battle in the burgeoning generative AI industry continues to intensify.

More artists joined lawsuits against art generators like Midjourney and OpenAI with the latter’s DALL-E model, bolstering evidence of infringement from diffusion model reconstructions.

The New York Times recently filed a copyright infringement lawsuit against OpenAI and Microsoft, alleging that their use of scraped Times’ content to train models for ChatGPT and other AI systems violated its copyrights. The suit calls for billions in damages and demands that any models or data trained on Times content be destroyed.

Amid these debates, a nonprofit group called “Fairly Trained” launched this week advocating for a “licensed model” certification for data used to train AI models — supported by Concord and Universal Music Group, among others.

Platforms have also stepped in, with Anthropic, Google and OpenAI as well as content companies like Shutterstock and Adobe pledging legal defenses for enterprise users of AI generated content.

Creators are undaunted, though, fighting bids to dismiss claims from authors like Sarah Silverman against OpenAI. Judges will need to weigh technological progress and statutory rights in nuanced disputes.

Furthermore, regulators are listening to worries over datamining scopes. Lawsuits and congressional hearings may decide whether fair use shelters proprietary appropriations, frustrating some while enabling others. Overall, negotiations seem inevitable to satisfy all involved as generative AI matures.

What comes next remains unclear, but this week’s filing suggests generative AI companies are coalescing around a core set of fair use and harm-based defenses, forcing courts to weigh technological progress against rights owners’ control.

As VentureBeat reported previously, no copyright plaintiffs so far have won a preliminary injunction in these types of AI disputes. Anthropic’s arguments aim to ensure this precedent will persist, at least through this stage in one of many ongoing legal battles. The endgame remains to be seen.
 

bnew

Stability AI releases Stable Code 3B to fill in blanks of AI-powered code generation​

Sean Michael Kerner @TechJournalist

January 16, 2024 4:24 PM

Computer monitor leaks strings of code out into the air.

Credit: VentureBeat made with Visual Electric/Stable Diffusion

Generative AI-powered code generation is getting more powerful and more compact.


Stability AI, the vendor still perhaps best known for its Stable Diffusion text-to-image generative AI technology, today announced its first new AI model of 2024: the commercially licensed (via membership) Stable Code 3B.

As the model name implies, Stable Code 3B is a 3-billion-parameter model focused on code completion capabilities for software development.

At only 3 billion parameters, Stable Code 3B can run locally on laptops without dedicated GPUs while still providing competitive performance and capabilities against larger models like Meta’s CodeLLaMA 7B.
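
For readers who want to try this locally, here is a minimal sketch using the Hugging Face transformers library. The repository name "stabilityai/stable-code-3b" and the loading flags are assumptions based on Stability AI's usual release pattern; check the official model card for the exact details.

```python
# Minimal sketch: running a ~3B-parameter code model locally with transformers.
# The model ID and loading flags below are assumptions; consult Stability AI's
# model card for the exact repository name and recommended settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"  # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision keeps memory within laptop range
    trust_remote_code=True,
)

prompt = "def fibonacci(n: int) -> int:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```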

The push toward smaller, more compact yet capable models is one Stability AI began at the end of 2023 with models like StableLM Zephyr 3B for text generation.

Stability AI first previewed Stable Code in August 2023 with the initial release of the code generation LLM and has been steadily improving the technology ever since.



How Stability AI improved Stable Code 3B​

Stability AI has improved Stable Code in a number of ways since the initial release.

The new Stable Code 3B not only suggests new lines of code but can also fill in larger missing sections of existing code.

The ability to fill in missing sections of code is an advanced code completion capability known as Fill in the Middle (FIM).
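
As an illustration of how FIM is typically exposed to such a model, the sketch below builds a prompt from the code before and after a gap using StarCoder-style sentinel tokens, reusing the tokenizer and model from the earlier sketch. The exact token names are an assumption here; Stable Code 3B's tokenizer configuration defines the real ones.

```python
# Illustrative Fill in the Middle (FIM) prompt. The sentinel tokens follow the
# StarCoder-style convention and are an assumption; the actual special tokens
# are defined by the model's tokenizer. Reuses tokenizer/model from above.
prefix = "def average(values):\n    total = sum(values)\n"
suffix = "\n    return result\n"

# The model sees the code before and after the gap and generates the middle.
fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(fim_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```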

The model’s training was also optimized with an expanded context size using a technique known as Rotary Position Embeddings (RoPE), optionally allowing a context length of up to 100k tokens. RoPE is a technique other LLMs also use, including Meta’s Llama 2 Long.
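
For the curious, here is a generic sketch of what rotary position embeddings do: each pair of channels in a query or key vector is rotated by an angle proportional to the token's position, so relative position information survives in attention dot products. This illustrates the general technique, not Stability AI's training code; long-context variants typically adjust the rotation base or interpolate positions.

```python
# Generic RoPE sketch (not Stability AI's implementation).
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per channel pair; earlier pairs rotate faster.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 64)   # 8 tokens, one 64-dimensional attention head
q_rot = rope(q)          # position information is now encoded in q itself
```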

Stable Code 3B is built on Stability AI’s Stable LM 3B natural language model. With further training focused on software engineering data, the model gained code completion skills while retaining strengths in general language tasks.

Its training data included code repositories, programmer forums, and other technical sources.

It was also trained on 18 different programming languages, and Stability AI claims that Stable Code 3B demonstrates leading performance on benchmark tests across multiple languages.

The model covers popular languages like Python, Java, JavaScript, Go, Ruby, and C++. Early benchmarks indicate it matches or exceeds the completion quality of models over twice its size.

The market for generative AI code generation tools is competitive, with Meta’s CodeLLaMA 7B among the larger and most popular options.

On the 3-billion parameter side, the StarCoder LLM — which is co-developed as an open source effort with the participation of IBM, HuggingFace and ServiceNow — is another popular option.

Stability AI claims Stable Code 3B outperforms StarCoder across Python, C++, JavaScript, Java, PHP and Rust programming languages.



Part of Stability AI’s membership subscription offering​

Stable Code 3B is being made available for commercial use as part of Stability AI’s new membership subscription service that was first announced in December.

Members gain access to Stable Code 3B alongside other AI tools in Stability AI’s portfolio, including the SDXL Stable Diffusion image generation tools, StableLM Zephyr 3B for text content generation, Stable Audio for audio generation, and Stable Video for video generation.

Image credit: Stability AI
 


Runway Gen-2 adds multiple motion controls to AI videos with Multi Motion Brush​

Shubham Sharma @mr_bumss

January 18, 2024 7:44 AM

Screenshot of Runway Motion Brush

Image Credit: Runway

AI video is still in its early days, but the tech is advancing fast. Case in point: New York City-based Runway, a generative AI startup enabling individuals and enterprises to produce videos in different styles, today updated its Gen-2 foundation model with a new tool, Multi Motion Brush, which allows creators to add multiple directions and types of motion to their AI video creations.

The advance is a first of its kind for commercially available AI video products: rival products on the market at this stage use AI to add motion to the entire image or a single selected area, not multiple areas.

The offering builds upon the Motion Brush feature that first debuted in November 2023, which allowed creators to add only a single type of motion to their video at a time.

Multi Motion Brush was previewed earlier this year through Runway’s Creative Partners Program, which rewards select power users with unlimited plans and pre-release features. But it’s now available for every user of Gen-2, adding to the 30+ tools the model already has on offer for creative video producers.

The move strengthens the product in the rapidly growing creative AI market, which includes players such as Pika Labs and Leonardo AI.



What to expect from Motion Brush?​

The idea with Multi Motion Brush is simple: give users better control over the AI videos they generate by allowing them to add independent motion to areas of choice.

This could be anything, from the movement of a face to the direction of clouds in the sky.

The user starts by uploading a still image as a prompt and “painting” it with a digital brush controlled by their computer cursor.

Slider controls in Runway’s web interface then let the user select which way the painted portions of the image should move and with how much intensity, with each paint color controlled independently.

The user can adjust the horizontal, vertical and proximity sliders to define the direction in which the motion is supposed to be executed – left/right, up/down, or closer/further – and hit save.

“Each slider is controlled with a decimal point value with a range from -10 to +10. You can manually input numerical value, drag the text field left or right or use the sliders. If you need to reset everything, click the ‘Clear’ button to reset everything back to 0,” Runway notes on its website.
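
To make the control scheme concrete, here is a purely illustrative sketch of how per-region motion settings like these could be represented in code. This is not Runway's API; the class and field names are hypothetical.

```python
# Hypothetical data model for per-region motion settings; not Runway's API.
from dataclasses import dataclass

def clamp(value: float, low: float = -10.0, high: float = 10.0) -> float:
    """Keep a slider value inside the -10 to +10 range Runway describes."""
    return max(low, min(high, value))

@dataclass
class BrushRegion:
    name: str            # e.g. "clouds" or "face"
    horizontal: float    # left (-) / right (+)
    vertical: float      # down (-) / up (+)
    proximity: float     # further (-) / closer (+)

    def __post_init__(self) -> None:
        self.horizontal = clamp(self.horizontal)
        self.vertical = clamp(self.vertical)
        self.proximity = clamp(self.proximity)

# Two independently controlled regions, analogous to two brush colors.
regions = [
    BrushRegion("clouds", horizontal=3.5, vertical=0.0, proximity=0.0),
    BrushRegion("face", horizontal=0.0, vertical=-1.2, proximity=2.0),
]
```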



Gen-2 has been getting improved controls​

The introduction of Multi Motion Brush strengthens the set of tools Runway has on offer to control the video outputs from the Gen-2 model.

Originally unveiled in March 2023, the model introduced text, video and image-based generation and came as a major upgrade over Gen-1, which only supported video-based outputs.

However, in the initial stage, it generated clips only up to four seconds. This changed in August when the company added the ability to extend clips up to 18 seconds.

It also debuted additional features such as a “Director Mode,” allowing users to choose the direction and intensity/speed of the camera movement in generated videos, as well as options to choose the style of the video to be produced – from 3D cartoon and render to cinematic to advertising.

In the space of AI-driven video generation, Runway takes on players like Pika Labs, which recently debuted its web platform Pika 1.0 for video generation, as well as Stability AI’s Stable Video Diffusion models.

The company also offers a text-to-image tool, which takes on offerings like Midjourney and DALL-E 3. However, it is important to note that while the outputs from these tools have improved over time, they are still not perfect and can generate images/videos that are blurred, incomplete or inconsistent in different ways.
 