bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


Yann LeCun says Meta AI ‘quickly becoming most used’ assistant, challenging OpenAI’s dominance​


Michael Nuñez@MichaelFNunez

July 23, 2024 2:26 PM

Credit: VentureBeat made with Midjourney


Credit: VentureBeat made with Midjourney

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More



Meta Platforms has thrown down the gauntlet in the AI race today with the release of Llama 3.1, its most sophisticated artificial intelligence model to date.

This advanced model now powers Meta AI, the company’s AI assistant, which has been strategically deployed across its suite of platforms including WhatsApp, Messenger, Instagram, Facebook, and Ray-Ban Meta, with plans to extend to Meta Quest next month. The widespread implementation of Llama 3.1 potentially places advanced AI capabilities at the fingertips of billions of users globally.

The move represents a direct challenge to industry leaders OpenAI and Anthropic, particularly targeting OpenAI’s market-leading position. It also underscores Meta’s commitment to open-source development, marking a major escalation in the AI competition.

Llama 3.1 now powers Meta AI, which is quickly becoming the most widely used AI assistant.

Meta AI can be accessed through WhatsApp, Messenger, Instagram, Facebook, Ray-Ban Meta, and next month in Meta Quest.

It answers questions, summarizes long documents, helps you code or do…

— Yann LeCun (@ylecun) July 23, 2024



Yann LeCun, Meta’s chief AI scientist, made a bold proclamation on X.com following the release this morning that caught many in the AI community off guard. “Llama 3.1 now powers Meta AI, which is quickly becoming the most widely used AI assistant,” LeCun said, directly challenging the supremacy of OpenAI’s ChatGPT, which has thus far dominated the AI assistant market.

If substantiated, LeCun’s assertion could herald a major shift in the AI landscape, potentially reshaping the future of AI accessibility and development.


Open-source vs. Closed-source: Meta’s disruptive strategy in the AI market​


The centerpiece of Meta’s release is the Llama 3.1 405B model, featuring 405 billion parameters. The company boldly contends that this model’s performance rivals that of leading closed-source models, including OpenAI’s GPT-4o, across various tasks. Meta’s decision to make such a powerful model openly available stands in stark contrast to the proprietary approaches of its competitors, particularly OpenAI.

This release comes at a critical juncture for Meta, following a $200 billion market value loss earlier this year. CEO Mark Zuckerberg has pivoted the company’s focus towards AI, moving away from its previous emphasis on the metaverse. “Open source will ensure that more people around the world have access to the benefits and opportunities of AI,” Zuckerberg said, in what appears to be a direct challenge to OpenAI’s business model.

Wall Street analysts have expressed skepticism about Meta’s open-source strategy, questioning its potential for monetization, especially when compared to OpenAI’s reported $3.4 billion annualized revenue. However, the tech community has largely welcomed the move, seeing it as a catalyst for innovation and wider AI access.

Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper:

?Llama 3.1 405B, continuously trained with a 128K context… pic.twitter.com/RwhedAluSM

— Aston Zhang (@astonzhangAZ) July 23, 2024



AI arms race heats up: Implications for innovation, safety, and market leadership​


The new model boasts improvements including an extended context length of 128,000 tokens, enhanced multilingual capabilities, and improved reasoning. Meta has also introduced the “Llama Stack,” a set of standardized interfaces aimed at simplifying development with Llama models, potentially making it easier for developers to switch from OpenAI’s tools.

While the release has generated excitement in the AI community, it also raises concerns about potential misuse. Meta claims to have implemented robust safety measures, but the long-term implications of widely available advanced AI remain a topic of debate among experts.

Why are FTC & DOJ issuing statements w/ EU competition authorities discussing "risks" in the blazingly competitive, U.S.-built AI ecosystem? And on the same day that Meta turbocharges disruptive innovation with the first-ever frontier-level open source AI model? A ? pic.twitter.com/vrItR28YIo

— Neil Chilson ⤴️⬆️?? ? (@neil_chilson) July 23, 2024



As the AI race intensifies, Meta’s latest move positions the company as a formidable competitor in a field previously dominated by OpenAI and Anthropic. The success of Llama 3.1 could potentially reshape the AI industry, influencing everything from market dynamics to development methodologies.

The tech industry is closely watching this development, with many speculating on how OpenAI and other AI leaders will respond to Meta’s direct challenge. As the competition heats up, the implications for AI accessibility, innovation, and market leadership remain to be seen, with OpenAI’s dominant position now under serious threat.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


Free Gemini users can finally chat in a flash​


Emilia David@miyadavid

July 25, 2024 9:00 AM

Sir Demis Hassabis introduces Gemini 1.5 Flash. Image credit: Screenshot


Sir Demis Hassabis introduces Gemini 1.5 Flash. Image credit: Screenshot


Google made several updates to the free version of its Gemini chatbot, including making its low-latency multimodal model Gemini 1.5 Flash available and adding more source links to reduce hallucinations.

Gemini 1.5 Flash, previously only available to developers, is best suited for tasks requiring quick responses, such as answering customer queries. Google announced the model during its annual developer conference, Google I/O, in May but has since opened it up to the public.

The model has a large context window, referring to how much information or words it processes at a time, of around 1 million tokens. Google said Gemini 1.5 Flash on the Gemini chatbot will have a context window of 32K tokens. A large context window allows for more complex questions and longer back-and-forth conversations.

To take advantage of this, Google is updating the free version of Gemini to handle file uploads from Google Drive or devices. This has been a feature in Gemini Advanced, the paid version of the chatbot.

When it first launched, Google claimed Gemini 1.5 Flash was 40% faster than OpenAI’s fast model GPT-3.5 Turbo. Gemini 1.5 Flash is not a small model like the Gemma family of Google models; instead, it is trained with the same data as Gemini 1.5 Pro.

Gemini 1.5 Flash will be available on both mobile and desktop versions of Gemini. It can be accessed in more than 230 countries and territories and in 40 languages.


Reducing hallucinations with links​


Hallucinations continue to be a problem for AI models. Google is following the lead of other model providers and chatbots by adding related links to prompts asking for information. The idea is to show the AI models did not create the information without reference.

“Starting today for English language prompts in certain countries, you can access this additional information on topics directly within Gemini’s responses. Just click on the chip at the end of a paragraph to see websites where you can dive deeper on a certain topic,” Google said in a blog post.

The company said Gemini will add links to the relevant email if the information is in an email.

Google will also add a double-check feature that “verifies responses by using Google Search to highlight which statements are corroborated or contradicted on the web.”

Google is not the only company that adds links for attribution in line with the responses on a chatbot. ChatGPT and Perplexity regularly add citations and links to websites where they find information.

However, a report from Nieman Labs found that the chatbots hallucinated some links, in some cases attaching links to news stories that do not exist or are completely unrelated.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811

Runway faces backlash after report of copying AI video training data from YouTube​


Carl Franzen@carlfranzen

July 25, 2024 11:49 AM

Broken retro TV set on pavement


Credit: VentureBeat made with Midjourney V6



Runway, a multi-hundred million dollar funded startup focused on AI video software and models that is backed by Google, among others, is in hot water from creators following a report today by 404 Media on a spreadsheet allegedly showing it undertook an effort to copy data from thousands of YouTube videos.

404 Media reports that a former employee of Runway leaked it a company spreadsheet allegedly showing its plans to categorize, tag, and train on “YouTube channels of thousands of media and entertainment companies, including The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and many other,” and that this data informed a product called “Jupiter,” which 404 says is Runway’s Gen-3 AI video creation model.

Individual YouTubers with large followings such as “Casey Neistat, Sam Kolder, Benjamin Hardman, Marques Brownlee” also were included in the spreadsheet.

We’ve reached out to Runway to verify the authenticity of the spreadsheet and will update when we hear back.

Fruit from the poisonous tree behind Gen-3 Alpha?​


Runway revealed Gen-3 Alpha, an early version of the software, to acclaim for its realism, last month, and began allowing the public to use it a few weeks ago.

404 Media published a redacted Google Sheets copy of the alleged Runway spreadsheet online as a link within its article, showing more than 3,900 individual YouTube channels and a column with hashtags of different content contained therein.

Another tab of the spreadsheet labeled “high_camera_movement” includes more than 177 distinct YouTube accounts.

Rubbing creators and critics the wrong way​


404 Media notes in its report that it “couldn’t confirm that every single video included in the spreadsheet was used to train Gen-3—it’s possible that some content was filtered out later or that not every single link on the spreadsheet was scraped,” but the existence of the spreadsheet itself and the implication that all or any of these YouTube videos may have been copied, downloaded, or otherwise analyzed by Runway engineers and/or machine learning algorithms to inform its Gen-3 Alpha model (or any other product for that matter) has rubbed many creators and critics of generative AI the wrong way.

Influential tech reviewer YouTuber Marques Brownlee a.k.a. MKBHD posted on X “well well well” and included a melting smiley face emoji. Brownlee has been critical in the past of others training AI on his videos.

Well well well. Runway AI video generator was trained on YouTube videos without permission, including 1600+ MKBHD videos ?AI Video Generator Runway Trained on Thousands of YouTube Videos Without Permission

— Marques Brownlee (@MKBHD) July 25, 2024


Yet he’s also expressed excitement and enthusiasm for AI video technology such as OpenAI’s Sora in a prior video.

Ed Newton-Rex, founder and CEO of the ethical AI certification startup Fairly Trained, has posted several times on X highlighting the various notable names included in the alleged Runway spreadsheet, among them YouTube channels for musician Taylor Swift and filmmaker Wes Anderson.

Here are some of the entries in Runway's spreadsheet entitled "Video sourcing", unearthed by @404mediaco … ?

1. A playlist of all Taylor Swift's music videos x.com pic.twitter.com/7EG75eHaaP

— Ed Newton-Rex (@ednewtonrex) July 25, 2024


YouTuber Omni or “Lay It Omni” called the spreadsheet “INSANE” in an X post and accused Runway of theft.

guys this is actually INSANE. a former employee of a multi-billion dollar company, Runway, confirmed that they mass downloaded YouTube videos in order to feed their AI. there's a spreadsheet with NOTES showing HOW they swiped videos. Nintendo was on the list. x.com

— Omni ☕️ (@InfernoOmni) July 25, 2024
THEY STOLE FROM MIYAZAKI?? AND USED KISSANIME TO GET ANIME VIDEOS OH MY GOD pic.twitter.com/042UNhzJcN

— Omni ☕️ (@InfernoOmni) July 25, 2024


Even AI filmmakers who have created with Runway’s tools in the past including Dustin Hollywoodhave expressed criticism towards the company for what they view as theft.

I feel a shyt storm coming about GEN3.. ??

When are companies gonna learn, purchase your data, create paid artist programs to create and feed you data. DONT fukkING STEAL DATA. Damn.

No one one learns because of greed. If you think people are not working on ways/institutions…

— Dustin Hollywood (@dustinhollywood) July 25, 2024


Yet as I pointed out in a reply on X to Hollywood, multiple companies have already been accused or found to have used copyrighted videos without express permission or authorization or payment in training their models.

Indeed, just recently, Wired magazine (where my wife works as Editor-in-Chief)published a piece in conjunction with Proof News that found such big names as Apple, Nvidia, and the AI startup Anthropic (maker of Claude 3 Sonnet and Claude family of models) also trained AI models on YouTube Video transcripts without authorization.

My take is that scraping and training, while controversial, is legal and supported by the precedent set by Google in scraping the web and indexing it for search. But we’ll see if this holds up in court, asRunway is already among one of many AI companies being sued by creators for training on their data without permission or compensation. And in the court of public opinion, Runway appears to have taken a big hit today.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811

AI wars heat up: OpenAI’s SearchGPT takes on Google’s search dominance​


Michael Nuñez@MichaelFNunez

July 25, 2024 11:59 AM

Credit: VentureBeat made with Midjourney


Credit: VentureBeat made with Midjourney



In a surprising announcement today, OpenAI has unveiled SearchGPT, a prototype AI-powered search engine that directly challenges Google’s dominance in the online search market.

This bold move signals a significant escalation in the AI search wars and could reshape how users find and interact with information on the web.

We’re testing SearchGPT, a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources.

We’re launching with a small group of users for feedback and plan to integrate the experience into ChatGPT. https://t.co/dRRnxXVlGh pic.twitter.com/iQpADXmllH

— OpenAI (@OpenAI) July 25, 2024


The new prototype promises to deliver “fast and timely answers with clear and relevant sources,” combining OpenAI’s advanced language models with real-time web information. SearchGPT offers a conversational interface, allowing users to ask follow-up questions and build context throughout their search session.

“We believe that by enhancing the conversational capabilities of our models with real-time information from the web, finding what you’re looking for can be faster and easier,” an OpenAI spokesperson stated.

AI-powered search: The next frontier in information retrieval​


SearchGPT’s launch comes at a pivotal moment in the evolution of search technology.

While Google has been cautiously dipping its toes into AI-enhanced search, OpenAI is diving in headfirst. This aggressive move could force Google’s hand, accelerating the tech giant’s AI integration plans and potentially sparking a rapid transformation of the search landscape.

The implications of this shift are profound. Users accustomed to sifting through pages of results may soon find themselves engaged in dynamic, context-aware conversations with their search engines. This could democratize access to information, making complex searches more accessible to the average user.

However, it also raises questions about the depth and breadth of knowledge these AI systems can truly offer, and whether they might inadvertently create echo chambers of information.

Publishers and AI: A delicate balance in the digital ecosystem​


SearchGPT’s focus on sourcing and attribution is a shrewd move by OpenAI, attempting to position itself as a partner to publishers rather than a threat. By prominently citing and linking to sources, OpenAI is extending an olive branch to an industry that has often viewed AI with suspicion.

However, this gesture may not be enough to quell all concerns. The fundamental question remains: if AI can provide comprehensive answers directly, will users still click through to the original sources? This could lead to a significant shift in web traffic patterns, potentially upending the current digital publishing model.

Nicholas Thompson, CEO of The Atlantic, is one of the few publishers who have endorsed the initiative in a written statement. “AI search is going to become one of the key ways that people navigate the internet, and it’s crucial, in these early days, that the technology is built in a way that values, respects, and protects journalism and publishers,” Thompson said.

Moreover, the recent actions by Reddit and Condé Nast underscore the growing tensions in this space. As AI systems become more sophisticated, we may see an increase in content paywalls and legal battles over intellectual property rights. The outcome of these conflicts could shape the future of both AI development and digital publishing.

The future of search: Challenges and opportunities in the AI era​


The potential disruption to the digital advertising market cannot be overstated. If SearchGPT gains traction, it could chip away at Google’s near-monopoly on search advertising. This would not only impact Google’s bottom line but could also lead to a reimagining of how digital advertising functions in an AI-driven search environment.

However, OpenAI faces significant hurdles. Scaling an AI search engine to handle billions of queries daily is a monumental technical challenge. Moreover, ensuring the accuracy and reliability of AI-generated responses in real-time is critical. A few high-profile mistakes could quickly erode user trust and send people fleeing back to familiar search engines.

Perhaps the biggest challenge lies in striking the right balance between innovation and responsibility. As AI search engines become more powerful, they also become more influential in shaping public opinion and access to information. OpenAI will need to navigate complex ethical considerations to avoid inadvertently becoming a purveyor of misinformation or biased viewpoints.

As OpenAI begins testing SearchGPT with a select group, the tech world holds its breath. This moment could mark the beginning of a new era in how we interact with the vast expanse of human knowledge.

Whether SearchGPT succeeds or fails, its launch has undoubtedly fired the starting gun in what promises to be a fierce race to define the future of search. You can sign up to try SearchGPT right here.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811







1/11
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈

It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 AI achieves silver-medal standard solving International Mathematical Olympiad problems

2/11
Our system had to solve this year's six IMO problems, involving algebra, combinatorics, geometry & number theory. We then invited mathematicians @wtgowers and Dr Joseph K Myers to oversee scoring.

It solved 4️⃣ problems to gain 28 points - equivalent to earning a silver medal. ↓

3/11
For non-geometry, it uses AlphaProof, which can create proofs in Lean. 🧮

It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself to master games like chess, shogi and Go. AI achieves silver-medal standard solving International Mathematical Olympiad problems

4/11
Math programming languages like Lean allow answers to be formally verified. But their use has been limited by a lack of human-written data available. 💡

So we fine-tuned a Gemini model to translate natural language problems into a set of formal ones for training AlphaProof.

5/11
When presented with a problem, AlphaProof attempts to prove or disprove it by searching over possible steps in Lean. 🔍

Each success is then used to reinforce its neural network, making it better at tackling subsequent, harder problems. → AI achieves silver-medal standard solving International Mathematical Olympiad problems

6/11
With geometry, it deploys AlphaGeometry 2: a neuro-symbolic hybrid system.

Its Gemini-based language model was trained on increased synthetic data, enabling it to tackle more types of problems - such as looking at movements of objects. 📐

7/11
Powered with a novel search algorithm, AlphaGeometry 2 can now solve 83% of all historical problems from the past 25 years - compared to the 53% rate by its predecessor.

It solved this year’s IMO Problem 4 within 19 seconds. 🚀

Here’s an illustration showing its solution ↓

8/11
We’re excited to see how our new system could help accelerate AI-powered mathematics, from quickly completing elements of proofs to eventually discovering new knowledge for us - and unlocking further progress towards AGI.

Find out more → AI achieves silver-medal standard solving International Mathematical Olympiad problems

9/11
thank you for this hard work and thank you for sharing it with the world <3

10/11
That is astonishing

11/11
Amazing. Congrats!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GTV8_V5WAAAws2X.png

GTV9E7GXkAAMJH2.jpg

GTV9KtFXoAAIqCG.jpg

GTV75c1XYAA5j1s.jpg

GTV_El2XYAARTMO.jpg





 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811







1/11
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈

It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 AI achieves silver-medal standard solving International Mathematical Olympiad problems

2/11
Our system had to solve this year's six IMO problems, involving algebra, combinatorics, geometry & number theory. We then invited mathematicians @wtgowers and Dr Joseph K Myers to oversee scoring.

It solved 4️⃣ problems to gain 28 points - equivalent to earning a silver medal. ↓

3/11
For non-geometry, it uses AlphaProof, which can create proofs in Lean. 🧮

It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself to master games like chess, shogi and Go. AI achieves silver-medal standard solving International Mathematical Olympiad problems

4/11
Math programming languages like Lean allow answers to be formally verified. But their use has been limited by a lack of human-written data available. 💡

So we fine-tuned a Gemini model to translate natural language problems into a set of formal ones for training AlphaProof.

5/11
When presented with a problem, AlphaProof attempts to prove or disprove it by searching over possible steps in Lean. 🔍

Each success is then used to reinforce its neural network, making it better at tackling subsequent, harder problems. → AI achieves silver-medal standard solving International Mathematical Olympiad problems

6/11
With geometry, it deploys AlphaGeometry 2: a neuro-symbolic hybrid system.

Its Gemini-based language model was trained on increased synthetic data, enabling it to tackle more types of problems - such as looking at movements of objects. 📐

7/11
Powered with a novel search algorithm, AlphaGeometry 2 can now solve 83% of all historical problems from the past 25 years - compared to the 53% rate by its predecessor.

It solved this year’s IMO Problem 4 within 19 seconds. 🚀

Here’s an illustration showing its solution ↓

8/11
We’re excited to see how our new system could help accelerate AI-powered mathematics, from quickly completing elements of proofs to eventually discovering new knowledge for us - and unlocking further progress towards AGI.

Find out more → AI achieves silver-medal standard solving International Mathematical Olympiad problems

9/11
thank you for this hard work and thank you for sharing it with the world <3

10/11
That is astonishing

11/11
Amazing. Congrats!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GTV8_V5WAAAws2X.png

GTV9E7GXkAAMJH2.jpg

GTV9KtFXoAAIqCG.jpg

GTV75c1XYAA5j1s.jpg

GTV_El2XYAARTMO.jpg





 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811






1/11
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet.

Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context window and improved support for 8 languages among other improvements. Llama 3.1 405B rivals leading closed source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.

The models are available to download now directly from Meta or @huggingface. With today’s release the ecosystem is also ready to go with 25+ partners rolling out our latest models — including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud ready on day one.

More details in the full announcement ➡️ Introducing Llama 3.1: Our most capable models to date
Download Llama 3.1 models ➡️ Llama 3.1

With these releases we’re setting the stage for unprecedented new opportunities and we can’t wait to see the innovation our newest models will unlock across all levels of the AI community.

2/11
Training a model as large and capable as Llama 3.1 405B was no simple task. The model was trained on over 15 trillion tokens over the course of several months requiring over 16K @NVIDIA H100 GPUs — making it the first Llama model ever trained at this scale.

We also used the 405B parameter model to improve the post-training quality of our smaller models.

3/11
With Llama 3.1, we evaluated performance on >150 benchmark datasets spanning a wide range of languages — in addition to extensive human evaluations in real-world scenarios. These results show that the 405B competes with leading closed source models like GPT-4, Claude 2 and Gemini Ultra across a range of tasks.
Our upgraded Llama 3.1 8B & 70B models are also best-in-class, outperforming other models at their size while also delivering a better balance of helpfulness and safety than their predecessors. These smaller models support the same improved 128K token context window, multilinguality, improved reasoning and state-of-the-art tool use to enable more advanced use cases.

4/11
We’ve also updated our license to allow developers to use the outputs from Llama models — including 405B — to improve other models for the first time.

We’re excited about how this will enable new advancements in the field through synthetic data generation and model distillation workflows, capabilities that have never been achieved at this scale in open source.

5/11
As Mark Zuckerberg shared in an open letter this morning: we believe that open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn't concentrated in the hands of a small few, and that the technology can be deployed more evenly and safely across society.

That’s why we continue to take steps on the path for open source AI to become the industry standard.

Read the letter ⬇️
Open Source AI Is the Path Forward | Meta

6/11
Congratulations on the release @AIatMeta! Thanks for your unwavering support for Open Source 🤗

I put down some notes from the release below!

7/11
Open source AI is the path forward. ❤️

8/11
What a great way to start a Tuesday morning! Super excited for this partnership 🎊 Check out the whole Llama 3.1 herd on OctoAI https://octoai.cloud/text

9/11
Time to build! 🎉

10/11
Awesome research and progress towards open source AGI!!

11/11
Really awesome work!!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GTLeh3ub0AAdFzr.jpg

GTLeh3vbsAAxPI_.png

GTLeh3uaAAAD925.png

GTLlEiUWgAA-G1K.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


ChatGPT won't let you give it instruction amnesia anymore​

News

By Eric Hal Schwartz
published 17 hours ago

OpenAI updates GPT-4o mini model to stop subversion by clever hackers

When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works.

A close up of ChatGPT on a phone, with the OpenAI logo in the background of the photo

(Image credit: Shutterstock/Daniel Chetroni)

OpenAI is making a change to stop people from messing with custom versions of ChatGPT by making the AI forget what it's supposed to do. Basically, when a third party uses one of OpenAI's models, they give it instructions that teach it to operate as, for example, a customer service agent for a store or a researcher for an academic publication. However, a user could mess with the chatbot by telling it to "forget all instructions," and that phrase would induce a kind of digital amnesia and reset the chatbot to a generic blank.

To prevent this, OpenAI researchers created a new technique called "instruction hierarchy," which is a way to prioritize the developer's original prompts and instructions over any potentially manipulative user-created prompts. The system instructions have the highest privilege and can't be erased so easily anymore. If a user enters a prompt that attempts to misalign the AI's behavior, it will be rejected, and the AI responds by stating that it cannot assist with the query.

OpenAI is rolling out this safety measure to its models, starting with the recently released GPT-4o Mini model. However, should these initial tests work well, it will presumably be incorporated across all of OpenAI's models. GPT-4o Mini is designed to offer enhanced performance while maintaining strict adherence to the developer's original instructions.


AI Safety Locks​


As OpenAI continues to encourage large-scale deployment of its models, these kinds of safety measures are crucial. It's all too easy to imagine the potential risks when users can fundamentally alter the AI's controls that way.

Not only would it make the chatbot ineffective, it could remove rules preventing the leak of sensitive information and other data that could be exploited for malicious purposes. By reinforcing the model's adherence to system instructions, OpenAI aims to mitigate these risks and ensure safer interactions.

The introduction of instruction hierarchy comes at a crucial time for OpenAI regarding concerns about how it approaches safety and transparency. Current and former employees have called for improving the company's safety practices, and OpenAI's leadership has responded by pledging to do so. The company has acknowledged that the complexities of fully automated agents require sophisticated guardrails in future models, and the instruction hierarchy setup seems like a step on the road to achieving better safety.

These kinds of jailbreaks show how much work still needs to be done to protect complex AI models from bad actors. And it's hardly the only example. Several users discovered that ChatGPT would share its internal instructions by simply saying "hi."

OpenAI plugged that gap, but it's probably only a matter of time before more are discovered. Any solution will need to be much more adaptive and flexible than one that simply halts a particular kind of hacking.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


From reality to fantasy: Live2Diff AI brings instant video stylization to life​


Michael Nuñez@MichaelFNunez

July 17, 2024 3:32 PM

Image Credit: live2diff.github.io


Image Credit: live2diff.github.io



A team of international researchers has developed an AI system capable of reimagining live video streams into stylized content in near real-time. The new technology, called Live2Diff, processes live video at 16 frames per second on high-end consumer hardware, potentially reshaping applications from entertainment to augmented reality experiences.

Live2Diff, created by scientists from Shanghai AI Lab, Max Planck Institute for Informatics, and Nanyang Technological University, marks the first successful implementation of uni-directional attention modeling in video diffusion models for live-stream processing.

Live2Diff is the first attempt that enables uni-directional attention modeling to video diffusion models for live video steam processing.

It achieves 16FPS on RTX 4090 GPU ?

Links ⬇️ pic.twitter.com/L2HP4QOK8j

— Dreaming Tulpa ?? (@dreamingtulpa) July 17, 2024
“We present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live-streaming video translation,” the researchers explain in their paper published on arXiv.

This novel approach overcomes a significant hurdle in video AI. Current state-of-the-art models rely on bi-directional temporal attention, which requires access to future frames and makes real-time processing impossible. Live2Diff’s uni-directional method maintains temporal consistency by correlating each frame with its predecessors and a few initial warmup frames, eliminating the need for future frame data.


Live2Diff in action: A sequence showing the AI system’s real-time transformation capabilities, from an original portrait (left) to stylized variations including anime-inspired, angular artistic, and pixelated renderings. The technology demonstrates potential applications in entertainment, social media, and creative industries. (Video Credit: Live2Diff)


Real-time video style transfer: The next frontier in digital content creation​


Dr. Kai Chen, the project’s corresponding author from Shanghai AI Lab, explains in the paper, “Our approach ensures temporal consistency and smoothness without any future frames. This opens up new possibilities for live video translation and processing.”

The team demonstrated Live2Diff’s capabilities by transforming live webcam input of human faces into anime-style characters in real-time. Extensive experiments showed that the system outperformed existing methods in temporal smoothness and efficiency, as confirmed by both quantitative metrics and user studies.

framework.jpg
A schematic diagram of Live2Diff’s innovative approach: (a) The training stage incorporates depth estimation and a novel attention mask, while (b) the streaming inference stage employs a multi-timestep cache for real-time video processing. This technology marks a significant leap in AI-powered live video translation. (Credit: live2diff.github.io)

The implications of Live2Diff are far-reaching and multifaceted. In the entertainment industry, this technology could redefine live streaming and virtual events. Imagine watching a concert where the performers are instantly transformed into animated characters, or a sports broadcast where players morph into superhero versions of themselves in real-time. For content creators and influencers, it offers a new tool for creative expression, allowing them to present unique, stylized versions of themselves during live streams or video calls.

In the realm of augmented reality (AR) and virtual reality (VR), Live2Diff could enhance immersive experiences. By enabling real-time style transfer in live video feeds, it could bridge the gap between the real world and virtual environments more seamlessly than ever before. This could have applications in gaming, virtual tourism, and even in professional fields like architecture or design, where real-time visualization of stylized environments could aid in decision-making processes.


A Comparative Analysis of AI Video Processing: The original image (top left) is transformed using various AI techniques, including Live2Diff (top right), in response to the prompt ‘Breakdancing in the alley.’ Each method showcases distinct interpretations, from stylized animation to nuanced reality alterations, illustrating the evolving landscape of AI-driven video manipulation. (Video Credit: Live2Diff)

However, as with any powerful AI tool, Live2Diff also raises important ethical and societal questions. The ability to alter live video streams in real-time could potentially be misused for creating misleading content or deepfakes. It may also blur the lines between reality and fiction in digital media, necessitating new forms of media literacy. As this technology matures, it will be crucial for developers, policymakers, and ethicists to work together to establish guidelines for its responsible use and implementation.


The future of video AI: Open-source innovation and industry applications​


While the full code for Live2Diff is pending release (expected to launch next week), the research team has made their paper publicly available and plans to open-source their implementation soon. This move is expected to spur further innovations in real-time video AI.

As artificial intelligence continues to advance in media processing, Live2Diff represents an exciting leap forward. Its ability to handle live video streams at interactive speeds could soon find applications in live event broadcasts, next-generation video conferencing systems, and beyond, pushing the boundaries of real-time AI-driven video manipulation.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


How Luma AI’s new ‘Loops’ feature in Dream Machine could transform digital marketing​


Michael Nuñez@MichaelFNunez

July 22, 2024 12:08 PM


Image Credit: Luma AI


Image Credit: Luma AI


Luma AI, the San Francisco-based artificial intelligence startup, launched a brand new feature called “Loops” for its Dream Machine platform today. This update allows users to create seamless, continuous video loops from text descriptions, images, or keyframes.

Content creators and digital marketers can now produce endless video sequences without visible cuts or transitions, expanding their options for engaging audiences while potentially reducing production time and costs.

Today we are releasing Loops in Dream Machine to keep your imagination going… and going… and going! Get started here: Luma Dream Machine
?1/6 #LumaDreamMachine pic.twitter.com/HxRjCaeqxn

— Luma AI (@LumaLabsAI) July 22, 2024


The company announced the release via X.com (formerly Twitter) this morning, showcasing a series of examples.

“Today we are releasing Loops in Dream Machine to keep your imagination going… and going… and going!” Luma AI posted, demonstrating the feature’s potential with videos of a spaceship flying through a hyperspace portal and a capybara riding a bicycle in a park.

Luma AI’s new Loops feature solves a tough problem in AI video creation. Until now, AI-generated videos often looked choppy or disjointed when played for more than a few seconds. Loops changes that. It lets users create videos that play smoothly over and over, without any jarring transitions.

This might seem like a small step, but it opens up big possibilities. Advertisers could make eye-catching animations that play endlessly in digital billboards. Artists could create mesmerizing video installations. And social media users might flood feeds with perfectly looping memes and short videos.

4. ? “a spinning top on the table” pic.twitter.com/ykVyQMbZ8B

— Luma AI (@LumaLabsAI) July 22, 2024





Democratizing creativity: How Luma AI is changing the game​


The release of Loops comes just one month after Dream Machine’s initial launch, which quickly gained traction among creators and AI enthusiasts. Dream Machine distinguishes itself in the competitive AI-powered media creation industry by allowing users to generate high-quality, realistic videos from simple text prompts.

Luma AI is shaking up the video industry by putting powerful AI tools in the hands of everyday users as well, a step its competitors have not yet been willing to take. Until now, creating slick videos required expensive software and technical know-how. But Luma’s Dream Machine changes that equation.

What started off with a little stargazing turned into a dizzying experience with @LumaLabsAI new Loops feature in Dream Machine. It still amazes me that I can take one baseline image and Dream Machine can help expand it into its own world. pic.twitter.com/GdTHHeQwR7

— Tom Blake (@Iamtomblake) July 22, 2024

With a few clicks, anyone can now produce videos that once needed a professional studio. This could spark a boom in homemade content. Small businesses and individual creators, previously priced out of high-end video production, might soon flood social media with AI-generated ads and art pieces.

The impact could be similar to what happened when smartphone cameras went mainstream. Just as Instagram turned millions into amateur photographers, Luma AI might create a new wave of video creators.

The accessibility of Dream Machine sets Luma AI apart from competitors like OpenAI’s Sora and Kuaishou’s Kling, whose technologies remain largely inaccessible to the general public. Dream Machine offers a free tier allowing users to generate up to 30 videos per month, with paid plans available for more intensive use.


The AI ethics dilemma: Balancing innovation and responsibility​


However, the rapid advancement of AI-generated media raises important questions about authenticity and potential misuse. Luma AI has taken steps to address these concerns, emphasizing its commitment to responsible AI development. The company plans to implement robust watermarking and attribution systems to maintain transparency.

As Luma AI continues to innovate, it positions itself not just as a tool provider, but as a platform for a new generation of AI-powered creativity. The company plans to release APIs and plugins for popular creative software, further expanding its reach and potential impact.

The introduction of Loops has sparked excitement among creators and tech enthusiasts. One user responded to Luma AI’s announcement by tweeting, “It still amazes me that I can take one baseline image and Dream Machine can help expand it into its own world.”

While the long-term impact of Dream Machine and its new Loops feature remains to be seen, Luma AI’s latest offering clearly demonstrates the rapid pace of innovation in AI-generated media. As the boundaries between human and AI-generated content continue to blur, Luma AI stands at the forefront of this transformative technology.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


Sakana AI drops image models to generate Japan’s traditional ukiyo-e artwork​


Shubham Sharma@mr_bumss

July 22, 2024 12:07 PM

Image generated by Evo-Ukiyoe


Image generated by Evo-Ukiyoe

Image Credit: Sakana AI



Remember Sakana AI? Almost a year ago, the Tokyo-based startup made a striking appearance on the AI scene with its high-profile founders from Google and a novel automated merging-based approach to developing high-performing models. Today, the company announced two new image-generation models: Evo-Ukiyoe and Evo-Nishikie.

Available on Hugging Face, the models have been designed to generate images from text and image prompts. However, there’s an interesting and unique catch: instead of handling regular image generation in different styles, these models are laser-focused on Japan’s popular historic art form ukiyo-e. It flourished between the 17th and 19th centuries, and Sakana hopes to bring it back to modern content consumers using the power of AI.

The move comes as the latest localization effort in the AI space — something that has grown over the past year, with companies in countries like South Korea, India and China building models tailored to their respective cultures and dialects.


What to expect from the new Sakana AI models?​


Dating back to the early 1600s, Ukiyo-e – or “pictures of the floating world” – evolved as a popular art in Japan focusing on subjects like historical scenes, landscapes, sumo wrestlers, etc. The genre revolved around monochrome woodblock prints but eventually graduated to full-color prints or “nishiki-e” with multiple woodblocks. Its popularity declined in the 19th due to multiple factors, including the rise of digital photography.

Now, with the release of the two image-generation models, Sakana wants to bring the historic artwork back into popular culture. The first one – Evo-Ukiyoe – is a text-to-image offering that generates images closely resembling ukiyo-e, especially when prompted with text inputs describing elements commonly found in ukiyo-e art such as cherry blossoms, kimono or birds. It can even generate ukiyo-e-style art with things that did not exist back then, like a hamburger or laptop, but the company points out that sometimes the results may veer off track — not resembling ukiyo-e at all.

The model is based on Evo-SDXL-JP, which Sakana developed using its novel evolutionary model merging technique on top of Stability AI’s SDXL and other open diffusion models. The company said it used LoRA (Low-Rank Adaptation) to fine-tune Evo-SDXL-JP on a dataset of over 24,000 carefully-captioned ukiyo-e artworks acquired through a partnership with the Art Research Center (ARC) of Ritsumeikan University in Kyoto.

“We curated this data with a wide range of subjects, covering including whole art and face-centered ones, from the digital images of ukiyo-e in the ARC collection. We also focused on multi-colored nishiki-e with beautiful colors while considering diversity,” the company wrote in a blog post.

The second model, Evo-Nishikie, is an image-to-image offering that colorizes monochrome Ukiyo-e prints. Sakana says it can add color to historical book illustrations that were printed in one color of ink or give entirely new looks to existing multi-colored Nishikie prints. All the user would have to do is provide the source image and maybe pair it with a set of instructions describing the elements to be colored.

Sakana said it brought this model to life by performing ControlNet training on Evo-Ukiyoe, using fixed prompts and condition images.


Goal for further research and development​


While the models only support prompting in Japanese and are in the very early stages, Sakana hopes the work to teach AI traditional “Japanese beauty” will spread the appeal of the country’s culture worldwide and find applications in education and new ways of enjoying classical literature.

Currently, the company is providing both models and the associated code to get started on Hugging Face. The Python script included in the repository and LoRA weights are available under the Apache 2.0 license.

“This model is provided for research and development purposes only and should be considered as an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user’s own risk, and its performance and outcomes are not guaranteed,” the company notes on Hugging Face.

So, far Sakana AI has raised $30 million in funding from multiple investors, including by Lux Capital, which has invested in pioneering AI companies like Hugging Face, and also Khosla Ventures, known for investing in OpenAI way back in 2019.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811


Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling​


Michael Nuñez@MichaelFNunez

July 18, 2024 1:20 PM



Credit: VentureBeat made with Midjourney


Credit: VentureBeat made with Midjourney


Groq, an AI hardware startup, has released two open-source language models that outperform tech giants in specialized tool use capabilities. The new Llama-3-Groq-70B-Tool-Use model has claimed the top spot on the Berkeley Function Calling Leaderboard (BFCL), surpassing proprietary offerings from OpenAI, Google, and Anthropic.

I’ve been leading a secret project for months … and the word is finally out!

?️ I'm proud to announce the Llama 3 Groq Tool Use 8B and 70B models ?

An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including… pic.twitter.com/FJqxQ6XnLW

— Rick Lamers (@RickLamers) July 16, 2024


Rick Lamers, project lead at Groq, announced the breakthrough in an X.com post. “I’m proud to announce the Llama 3 Groq Tool Use 8B and 70B models,” he said. “An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including proprietary ones like Claude Sonnet 3.5, GPT-4 Turbo, GPT-4o and Gemini 1.5 Pro.”


Synthetic Data and Ethical AI: A New Paradigm in Model Training


The larger 70B parameter version achieved a 90.76% overall accuracy on the BFCL, while the smaller 8B model scored 89.06%, ranking third overall. These results demonstrate that open-source models can compete with and even exceed the performance of closed-source alternatives in specific tasks.

Groq developed these models in collaboration with AI research company Glaive, using a combination of full fine-tuning and Direct Preference Optimization (DPO) on Meta’s Llama-3 base model. The team emphasized their use of only ethically generated synthetic data for training, addressing common concerns about data privacy and overfitting.

This development marks a significant shift in the AI landscape. By achieving top performance using only synthetic data, Groq challenges the notion that vast amounts of real-world data are necessary for creating cutting-edge AI models. This approach could potentially mitigate privacy concerns and reduce the environmental impact associated with training on massive datasets. Moreover, it opens up new possibilities for creating specialized AI models in domains where real-world data is scarce or sensitive.

table-overall-1024x285-1.webp
A comparison chart showing the performance of various AI models on different tasks, with Groq’s Llama 3 models leading in overall accuracy. The data highlights the competitive edge of open-source models against proprietary offerings from major tech companies. (Image Credit: Groq)


Democratizing AI: The promise of open-source accessibility​


The models are now available through the Groq API and Hugging Face, a popular platform for sharing machine learning models. This accessibility could accelerate innovation in fields requiring complex tool use and function calling, such as automated coding, data analysis, and interactive AI assistants.

Groq has also launched a public demo on Hugging Face Spaces, allowing users to interact with the model and test its tool use abilities firsthand. Like many of the demos on Hugging Face Spaces, this was built in collaboration with Gradio, which Hugging Face acquired in December 2021. The AI community has responded enthusiastically, with many researchers and developers eager to explore the models’ capabilities.


The open-source challenge: Reshaping the AI landscape​


As the AI industry continues to evolve, Groq’s open-source approach contrasts sharply with the closed systems of larger tech companies. This move may pressure industry leaders to be more transparent about their own models and potentially accelerate the overall pace of AI development.

The release of these high-performing open-source models positions Groq as a major player in the AI field. As researchers, businesses, and policymakers evaluate the impact of this technology, the broader implications for AI accessibility and innovation remain to be seen. The success of Groq’s models could lead to a paradigm shift in how AI is developed and deployed, potentially democratizing access to advanced AI capabilities and fostering a more diverse and innovative AI ecosystem.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811
\Proton launches 'privacy-first' AI writing assistant for email that runs on-device | TechCrunch


Proton launches ‘privacy-first’ AI writing assistant for email that runs on-device​



Privacy FTW, but there are trade-offs​


Paul Sawers

4:00 AM PDT • July 18, 2024

Comment

ProtonMail logo displayed on a mobile phone screen
Image Credits: Idrees Abbas/SOPA Images/LightRocket via Getty Images

Privacy app maker Proton has launched a new AI-enabled writing assistant that can help users compose emails with simple prompts, redraft them and even proofread them before they’re sent.

The launch sees Proton continue on a trajectory that has seen it replicate many of Google’s products and features in the productivity tools space. Just last month, Google brought its own Gemini AI to Gmail to help users write and summarize emails, and now Proton is following suit with its own flavor.

As one might expect with Proton, a Swiss company known for its suite of privacy-centric apps, including email, VPN, password manager, calendar, cloud storage and documents, its new assistant is targeted at those concerned about leaking sensitive data to third-party AI providers.

Proton Scribe, as the new tool is called, is built on Mistral 7B, an open source language model from French AI startup Mistral. However, Proton says it will likely tinker with this in pursuit of the most optimum model for this use case. Additionally, the company says it is making the tool available under the open source GPL-3.0 license, which will make it easier to perform third-party security and privacy audits.


Going local​


Proton Scribe can be deployed entirely at the local device level, meaning user data doesn’t leave the device. Moreover, Proton promises that its AI assistant won’t learn from user data — a particularly important feature for enterprise use cases, where privacy is paramount.

The problem that Proton is striving to address here is real: Businesses have been slower to embrace the generative AI revolution due to data privacy concerns. This early iteration of the writing assistant could go some way toward appeasing such concerns.

“We realized that irrespective of whether or not Proton builds AI tools, users are going to use AI, often with significant privacy consequences,” founder and CEO Andy Yen said. “Rather than have users copying their sensitive communications into third-party AI tools that often have appalling privacy practices, it would be better to instead build privacy-first AI tools directly into Proton Mail.”

For the less security-conscious, Proton Scribe can also be configured to run on Proton’s servers, which should mean it will run faster, depending on users’ own hardware.

Those who’d prefer to run the tool locally are prompted to download the model once to their machine, and then it will run on that device without interacting with external servers.

The company is quick to stress that it doesn’t keep any logs or share data with third-parties for people who choose to run Proton Scribe from its servers.

“Only the prompt entered by the user is transmitted to the server, and no data is ever retained after the email draft is created,” a company spokesperson told TechCrunch.

Setting up Proton Scribe
Setting up Proton Scribe.Image Credits:Proton

Once the tool has been installed, users can type in a prompt, such as “request samples from a supplier,” and then hit the generate button.

Proton Scribe: Write me an email
Proton Scribe: Write me an email.Image Credits:Proton

The assistant then spits out a template email based on the theme provided, and you can then edit and fine-tune what comes out.

With these privacy-centric provisions, there is at least one notable trade-off: Given that the tool doesn’t use any local data, its responses won’t be particularly personalized or contextual. They will likely be generic, a point that Proton conceded to TechCrunch.

However, the company said this is why it has added additional features, which it calls “quick actions,” designed to make it easy for users to edit the drafts, such as changing the tone, proofreading and making it more concise.

“Over time, we will look to improve Proton Scribe, adding context, etc., but all in a privacy-preserving way,” Proton said in a statement.

Proton Scribe: Editing options
Proton Scribe: Editing options.Image Credits:Proton

Proton Scribe is limited to email for now, but the company said it may expand the tool to its other products in the future “depending on demand.” One obvious integration will be its recently launched collaborative document editing app.

Starting today, Proton’s writing assistant will be available for Proton Mail on the web and desktop, though the company confirmed that it will look to expand the tool to mobile devices in the future. In terms of costs, Proton Scribe is mostly targeted at business users, with those on either the Mail Essentials, Mail Professional or Proton Business Suite plans able to pay an extra $2.99 per month to access the writing assistant.

Additionally, those on one of Proton’s legacy and limited-availability plans, such as Visionary or Lifetime, will be given access to Proton Scribe for free. The company said that it may expand the feature to other consumer plans in the future.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811

China Is Closing the A.I. Gap With the United States​


In recent weeks, Chinese tech companies have unveiled technologies that rival American systems — and they are already in the hands of consumers and software developers.



A.I. generated videos created from text prompts using Kling, a video generator made by the Chinese company Kuaishou.


  1. Prompt: “The astronaut jumps up from the moon’s surface and launches himself into space.”

    Kuaishou


  2. Prompt: “A giant panda is playing guitar by the lake.”

    Kuaishou


  3. Prompt: “A Chinese boy wearing glasses is eating a delicious cheeseburger in a fast food restaurant, with his eyes closed for enjoyment.”

    Kuaishou


  4. Prompt: “A couple is holding hands and walking in the starry sky, while the stars move dramatically in the background.”

    Kuaishou


  5. An A.I. generated video created from an archival photo without using text prompts.

    Kuaishou

By Meaghan Tobin and Cade Metz

Meaghan Tobin reported from Shanghai, and Cade Metz from San Francisco.

July 25, 2024
阅读简体中文版閱讀繁體中文版

At the World Artificial Intelligence Conference in Shanghai this month, start-up founder Qu Dongqi showed off a video he had recently posted online. It displayed an old photograph of a woman with two toddlers. Then the photo sprang to life as the woman lifted the toddlers up in her arms and they laughed with surprise.

The video was created by A.I. technology from the Chinese internet company Kuaishou. The technology was reminiscent of a video generator, called Sora, that the American start-up OpenAI unveiled this year. But unlike Sora, it was available to the general public.

“My American friends still can’t use Sora,” Mr. Qu said. “But we already have better solutions here.”

A.I. generated videos created from text prompts using Kling, a video generator made by the Chinese company Kuaishou.

  1. Prompt: “Mona Lisa puts on glasses with her hands.“

    Kuaishou
  2. https://vp.nyt.com/video/2024/07/23/121345_1_mosaic-ai-china-ai-cropped-23-2-627_wg_720p.mp4

    Prompt: “Einstein plays guitar.”

    Kuaishou
  3. https://vp.nyt.com/video/2024/07/23/121347_1_mosaic-ai-china-ai-cropped-23-4-724_wg_720p.mp4

    Prompt: “Kitten riding in an airplane and looking out the window.”

    Kuaishou
  4. https://vp.nyt.com/video/2024/07/23/121346_1_mosaic-ai-china-ai-cropped-23-3-440_wg_720p.mp4

    Prompt: “Cute shepherd dog running, tennis ball bouncing, warm atmosphere.”

    Kuaishou
  5. https://vp.nyt.com/video/2024/07/23/121348_1_mosaic-ai-china-ai-cropped-23-5-2-673_wg_720p.mp4

    Prompt: “A girl eating noodles.”

    Kuaishou

While the United States has had a head start on A.I. development, China is catching up. In recent weeks, several Chinese companies have unveiled A.I. technologies that rival the leading American systems. And these technologies are already in the hands of consumers, businesses and independent software developers across the globe.

While many American companies are worried that A.I. technologies could accelerate the spread of disinformation or cause other serious harm, Chinese companies are more willing to release their technologies to consumers or even share the underlying software code with other businesses and software developers. This kind of sharing of computer code, called open source, allows others to more quickly build and distribute their own products using the same technologies.

Open source has been a cornerstone of the development of computer software, the internet and, now, artificial intelligence. The idea is that technology advances faster when its computer code is freely available for anyone to examine, use and improve upon.

China’s efforts could have enormous implications as A.I. technology continues to develop in the years to come. The technology could increase the productivity of workers, fuel future innovations and power a new wave of military technologies, including autonomous weapons.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,115
Reputation
8,239
Daps
157,811
When OpenAI kicked off the A.I. boom in late 2022 with the release of the online chatbot ChatGPT, China struggled to compete with technologies emerging from American companies like OpenAI and Google. (The New York Times has sued OpenAI and its partner, Microsoft, claiming copyright infringement of news content related to A.I. systems.) But China’s progress is now accelerating.

Kuaishou released its video generator, Kling, in China more than a month ago and to users worldwide on Wednesday. Just before Kling’s arrival, 01.AI, a start-up co-founded by Kai-Fu Lee, an investor and technologist who helped build Chinese offices for both Google and Microsoft, released chatbot technology that scored nearly as well as the leading American technologies on common benchmark tests that rate the performance of the world’s chatbots.

Image
Kai-Fu Lee smiling for a photo while wearing a tuxedo at a formal event. Other people pass by in the background.


Kai-Fu Lee, a co-founder of the start-up 01.AI. The company unveiled a new version of its technology this year that sits near the top of a leaderboard that ranks the world’s best technologies.Credit...Krista Schlueter for The New York Times

New technology from the Chinese tech giant Alibaba has also leaped to the top of a leaderboard that rates open-source A.I. systems. “We have disproved the commonplace belief that China doesn’t have the talent or the technology to compete with the U.S.,” Dr. Lee said. “That belief is simply wrong.”

In interviews, a dozen technologists and researchers at Chinese tech companies said open-source technologies were a key reason that China’s A.I. development has advanced so quickly. They saw open-source A.I. as an opportunity for the country to take a lead.

But that will not be easy. The United States remains at the forefront of A.I. research. And U.S. officials have resolved to keep it that way.

The White House has instituted a trade embargo designed to prevent Chinese companies from using the most powerful versions of computer chips that are essential to building artificial intelligence. A group of lawmakers has introduced a bill that would make it easier for the White House to control the export of A.I. software built in the United States. Others are trying to limit the progress of open-source technologies that have helped fuel the rise of similar systems in China.

Disclosure:

The New York Times Company has sued OpenAI and Microsoft, claiming copyright infringement of content related to artificial intelligence systems. The companies have sought to dismiss some of the claims. Times reporters have no involvement in the case and remain independent in their coverage.

The top American companies are also exploring new technologies that aim to eclipse the powers of today’s chatbots and video generators.

“Chinese companies are good at replicating and improving what the U.S. already has,” said Yiran Chen, a professor of electrical and computer engineering at Duke University. “They are not as good at inventing something completely new that will bypass the U.S. in five to 10 years.”

But many in China’s tech industry believe that open-source technology could help them grow despite those constraints. And if U.S. regulators stifle the progress of American open-source projects (as some lawmakers are discussing) China could gain a significant edge. If the best open-source technologies come from China, U.S. developers could end up building their systems atop Chinese technologies.

“Open-source A.I. is the foundation of A.I. development,” said Clément Delangue, chief executive of Hugging Face, a company that houses many of the world’s open-source A.I. projects. The U.S. built its leadership in A.I. through collaboration between companies and researchers, he said, “and it looks like China could do the same thing.”

Image

Clément Delangue walking with a group of people outside the U.S. Capitol.


Clément Delangue, right, the chief executive of the A.I. company Hugging Face, said that open-source technology could help China make gains in the field of A.I.Credit...Kenny Holston/The New York Times

While anyone with a computer can change open-source software code, it takes a lot of data, skill and computing power to fundamentally alter an A.I. system. When it comes to A.I., open source typically means that a system’s building blocks serve as a foundation that allows others to build something new, said Fu Hongyu, the director of A.I. governance at Alibaba’s research institute, AliResearch.

As in other countries, in China there is an intense debate over whether the latest technological advances should be made accessible to anyone or kept as closely held company secrets. Some, like Robin Li, the chief executive of Baidu, one of the few companies in China building its own A.I. technology entirely from scratch, think the technology is most profitable and secure when it is closed-source — that is, in the hands of a limited few.

A.I. systems require enormous resources: talent, data and computing power. Beijing has made it clear that the benefits accruing from such investments should be shared. The Chinese government has poured money into A.I. projects and subsidized resources like computing centers.

But Chinese tech companies face a major constraint on the development of their A.I. systems: compliance with Beijing’s strict censorship regime, which extends to generative A.I. technologies.

Kuaishou’s new video generator Kling appears to have been trained to follow the rules. Text prompts with any mention of China’s president, Xi Jinping, or controversial topics like feminism and the country’s real estate crisis yielded error messages. An image prompt of this year’s National People’s Congress yielded a video of the delegates shifting in their seats.

Kuaishou did not respond to questions about what steps the company took to prevent Kling from creating harmful, fake or politically sensitive content.

By making their most advanced A.I. technologies freely available, China’s tech giants are demonstrating their willingness to contribute to the country’s overall technological advancement as Beijing has established that the power and profit of the tech industry should be channeled toward the goal of self sufficiency.

The concern for some in China is that the country will struggle to amass the computing chips it needs to build increasingly powerful technologies. But that has not yet prevented Chinese companies from building powerful new technologies that can compete with U.S. systems.

At the end of last year, Dr. Lee’s company, 01.AI, was ridiculed on social media when someone discovered that the company had built its A.I. system using open-source technology originally built by Meta, owner of Facebook and Instagram. Some saw it as a symbol of China’s dependence on American ingenuity.

Six months later, 01.AI unveiled a new version of its technology. It now sits near the top of the leaderboard that ranks the world’s best technologies. Around the same time, a team from Stanford University in California unveiled Llama 3-V, claiming it outperformed other leading models. But a Chinese researcher soon noticed that the model was based on an open-source system originally built in China.

It was the reverse of the controversy surrounding 01.AI last year: Rather than Chinese developers building atop U.S. technology, U.S. developers built atop Chinese technology.

If regulators limit open-source projects in the United States and Chinese open-source technologies become the gold standard, Mr. Delangue said, this kind of thing could become the norm.

“If the trend continues, it becomes more and more of a challenge for the U.S.,” he said.
 
Top