The A.I Megathread (LLM , GPT , Development)

bnew · Jul 25, 2024

AI wars heat up: OpenAI’s SearchGPT takes on Google’s search dominance

OpenAI launches SearchGPT, challenging Google's search dominance with AI-powered, conversational search that promises faster, easier access to information.

venturebeat.com

AI wars heat up: OpenAI’s SearchGPT takes on Google’s search dominance

Michael Nuñez @MichaelFNunez

July 25, 2024 11:59 AM

Credit: VentureBeat made with Midjourney

In a surprising announcement today, OpenAI has unveiled SearchGPT, a prototype AI-powered search engine that directly challenges Google’s dominance in the online search market.

This bold move signals a significant escalation in the AI search wars and could reshape how users find and interact with information on the web.

We’re testing SearchGPT, a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources.

We’re launching with a small group of users for feedback and plan to integrate the experience into ChatGPT. https://t.co/dRRnxXVlGh pic.twitter.com/iQpADXmllH

— OpenAI (@OpenAI) July 25, 2024

The new prototype promises to deliver “fast and timely answers with clear and relevant sources,” combining OpenAI’s advanced language models with real-time web information. SearchGPT offers a conversational interface, allowing users to ask follow-up questions and build context throughout their search session.

“We believe that by enhancing the conversational capabilities of our models with real-time information from the web, finding what you’re looking for can be faster and easier,” an OpenAI spokesperson stated.

AI-powered search: The next frontier in information retrieval

SearchGPT’s launch comes at a pivotal moment in the evolution of search technology.

While Google has been cautiously dipping its toes into AI-enhanced search, OpenAI is diving in headfirst. This aggressive move could force Google’s hand, accelerating the tech giant’s AI integration plans and potentially sparking a rapid transformation of the search landscape.

The implications of this shift are profound. Users accustomed to sifting through pages of results may soon find themselves engaged in dynamic, context-aware conversations with their search engines. This could democratize access to information, making complex searches more accessible to the average user.

However, it also raises questions about the depth and breadth of knowledge these AI systems can truly offer, and whether they might inadvertently create echo chambers of information.

Publishers and AI: A delicate balance in the digital ecosystem

SearchGPT’s focus on sourcing and attribution is a shrewd move by OpenAI, attempting to position itself as a partner to publishers rather than a threat. By prominently citing and linking to sources, OpenAI is extending an olive branch to an industry that has often viewed AI with suspicion.

However, this gesture may not be enough to quell all concerns. The fundamental question remains: if AI can provide comprehensive answers directly, will users still click through to the original sources? This could lead to a significant shift in web traffic patterns, potentially upending the current digital publishing model.

Nicholas Thompson, CEO of The Atlantic, is one of the few publishers who have endorsed the initiative in a written statement. “AI search is going to become one of the key ways that people navigate the internet, and it’s crucial, in these early days, that the technology is built in a way that values, respects, and protects journalism and publishers,” Thompson said.

Moreover, the recent actions by Reddit and Condé Nast underscore the growing tensions in this space. As AI systems become more sophisticated, we may see an increase in content paywalls and legal battles over intellectual property rights. The outcome of these conflicts could shape the future of both AI development and digital publishing.

The future of search: Challenges and opportunities in the AI era

The potential disruption to the digital advertising market cannot be overstated. If SearchGPT gains traction, it could chip away at Google’s near-monopoly on search advertising. This would not only impact Google’s bottom line but could also lead to a reimagining of how digital advertising functions in an AI-driven search environment.

However, OpenAI faces significant hurdles. Scaling an AI search engine to handle billions of queries daily is a monumental technical challenge. Moreover, ensuring the accuracy and reliability of AI-generated responses in real-time is critical. A few high-profile mistakes could quickly erode user trust and send people fleeing back to familiar search engines.

Perhaps the biggest challenge lies in striking the right balance between innovation and responsibility. As AI search engines become more powerful, they also become more influential in shaping public opinion and access to information. OpenAI will need to navigate complex ethical considerations to avoid inadvertently becoming a purveyor of misinformation or biased viewpoints.

As OpenAI begins testing SearchGPT with a select group, the tech world holds its breath. This moment could mark the beginning of a new era in how we interact with the vast expanse of human knowledge.

Whether SearchGPT succeeds or fails, its launch has undoubtedly fired the starting gun in what promises to be a fierce race to define the future of search. You can sign up to try SearchGPT right here.

bnew · Jul 25, 2024

1/11
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.

It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system.

AI achieves silver-medal standard solving International Mathematical Olympiad problems

2/11
Our system had to solve this year's six IMO problems, involving algebra, combinatorics, geometry & number theory. We then invited mathematicians @wtgowers and Dr Joseph K Myers to oversee scoring.

It solved

problems to gain 28 points - equivalent to earning a silver medal. ↓

3/11
For non-geometry, it uses AlphaProof, which can create proofs in Lean.

It couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, which previously taught itself to master games like chess, shogi and Go. AI achieves silver-medal standard solving International Mathematical Olympiad problems

4/11
Math programming languages like Lean allow answers to be formally verified. But their use has been limited by a lack of human-written data available.

So we fine-tuned a Gemini model to translate natural language problems into a set of formal ones for training AlphaProof.

5/11
When presented with a problem, AlphaProof attempts to prove or disprove it by searching over possible steps in Lean.

Each success is then used to reinforce its neural network, making it better at tackling subsequent, harder problems. → AI achieves silver-medal standard solving International Mathematical Olympiad problems

6/11
With geometry, it deploys AlphaGeometry 2: a neuro-symbolic hybrid system.

Its Gemini-based language model was trained on increased synthetic data, enabling it to tackle more types of problems - such as looking at movements of objects.

7/11
Powered with a novel search algorithm, AlphaGeometry 2 can now solve 83% of all historical problems from the past 25 years - compared to the 53% rate by its predecessor.

It solved this year’s IMO Problem 4 within 19 seconds.

Here’s an illustration showing its solution ↓

8/11
We’re excited to see how our new system could help accelerate AI-powered mathematics, from quickly completing elements of proofs to eventually discovering new knowledge for us - and unlocking further progress towards AGI.

Find out more → AI achieves silver-medal standard solving International Mathematical Olympiad problems

9/11
thank you for this hard work and thank you for sharing it with the world <3

10/11
That is astonishing

11/11
Amazing. Congrats!

To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

bnew · Jul 26, 2024

AI is confusing — here’s your cheat sheet

What the words behind generative AI tools actually mean.

www.theverge.com

AI is confusing — here’s your cheat sheet

If you can’t tell the difference between AGI and RAG, don’t worry! We’re here for you.

By Jay Peters, a news editor who writes about technology, video games, and virtual worlds. He’s submitted several accepted emoji proposals to the Unicode Consortium.

Jul 22, 2024, 8:00 AM EDT

Illustration of a computer teaching other computers how to learn.

Image: Hugo J. Herrera for The Verge

Artificial intelligence is the hot new thing in tech — it feels like every company is talking about how it’s making strides by using or developing AI. But the field of AI is also so filled with jargon that it can be remarkably difficult to understand what’s actually happening with each new development.

To help you better understand what’s going on, we’ve put together a list of some of the most common AI terms. We’ll do our best to explain what they mean and why they’re important.

What exactly

Artificial intelligence: Often shortened to AI, the term “artificial intelligence” is technically the discipline of computer science that’s dedicated to making computer systems that can think like a human.

But right now, we’re mostly hearing about AI as a technology and or even an entity, and what exactly that means is harder to pin down. It’s also frequently used as a marketing buzzword, which makes its definition more mutable than it should be.

Google, for example, talks a lot about how it’s been investing in AI for years. That refers to how many of its products are improved by artificial intelligence and how the company offers tools like Gemini that appear to be intelligent, for example. There are the underlying AI models that power many AI tools, like OpenAI’s GPT. Then, there’s Meta CEO Mark Zuckerberg, who has used AI as a noun to refer to individual chatbots.

As more companies try to sell AI as the next big thing, the ways they use the term and other related nomenclature might get even more confusing

As more companies try to sell AI as the next big thing, the ways they use the term and other related nomenclature might get even more confusing. There are a bunch of phrases you are likely to come across in articles or marketing about AI, so to help you better understand them, I’ve put together an overview of many of the key terms in artificial intelligence that are currently being bandied about. Ultimately, however, it all boils down to trying to make computers smarter.
(Note that I’m only giving a rudimentary overview of many of these terms. Many of them can often get very scientific, but this article should hopefully give you a grasp of the basics.)

Machine learning: Machine learning systems are trained (we’ll explain more about what training is later) on data so they can make predictions about new information. That way, they can “learn.” Machine learning is a field within artificial intelligence and is critical to many AI technologies.

Artificial general intelligence (AGI): Artificial intelligence that’s as smart or smarter than a human. (OpenAI in particular is investing heavily into AGI.) This could be incredibly powerful technology, but for a lot of people, it’s also potentially the most frightening prospect about the possibilities of AI — think of all the movies we’ve seen about superintelligent machines taking over the world! If that isn’t enough, there is also work being done on “superintelligence,” or AI that’s much smarter than a human.

Generative AI: An AI technology capable of generating new text, images, code, and more. Think of all the interesting (if occasionally problematic) answers and images that you’ve seen being produced by ChatGPT or Google’s Gemini. Generative AI tools are powered by AI models that are typically trained on vast amounts of data.

Hallucinations: No, we’re not talking about weird visions. It’s this: because generative AI tools are only as good as the data they’re trained on, they can “hallucinate,” or confidently make up what they think are the best responses to questions. These hallucinations (or, if you want to be completely honest, bullshyt) mean the systems can make factual errors or give gibberish answers. There’s even some controversy as to whether AI hallucinations can ever be “fixed.”

Bias: Hallucinations aren’t the only problems that have come up when dealing with AI — and this one might have been predicted since AIs are, after all, programmed by humans. As a result, depending on their training data, AI tools can demonstrate biases. For example, 2018 research from Joy Buolamwini, a computer scientist at MIT Media Lab, and Timnit Gebru, the founder and executive director of the Distributed Artificial Intelligence Research Institute (DAIR), co-authored a paper that illustrated how facial recognition software had higher error rates when attempting to identify the gender of darker-skinned women.

Illustration of wireframe figure inside a computer monitor.

Image: Hugo J. Herrera for The Verge

I keep hearing a lot of talk about models. What are those?

AI model: AI models are trained on data so that they can perform tasks or make decisions on their own.

Large language models, or LLMs: A type of AI model that can process and generate natural language text. Anthropic’s Claude, which, according to the company, is “a helpful, honest, and harmless assistant with a conversational tone,” is an example of an LLM.

Diffusion models: AI models that can be used for things like generating images from text prompts. They are trained by first adding noise — such as static — to an image and then reversing the process so that the AI has learned how to create a clear image. There are also diffusion models that work with audio and video.

Foundation models: These generative AI models are trained on a huge amount of data and, as a result, can be the foundation for a wide variety of applications without specific training for those tasks. (The term was coined by Stanford researchers in 2021.) OpenAI’s GPT, Google’s Gemini, Meta’s Llama, and Anthropic’s Claude are all examples of foundation models. Many companies are also marketing their AI models as multimodal, meaning they can process multiple types of data, such as text, images, and video.

Frontier models: In addition to foundation models, AI companies are working on what they call “frontier models,” which is basically just a marketing term for their unreleased future models. Theoretically, these models could be far more powerful than the AI models that are available today, though there are also concerns that they could pose significant risks.

Illustration of wireframe hands typing on a keyboard.

Image: Hugo J. Herrera for The Verge

bnew · Jul 26, 2024

But how do AI models get all that info?

Well, they’re trained. Training is a process by which AI models learn to understand data in specific ways by analyzing datasets so they can make predictions and recognize patterns. For example, large language models have been trained by “reading” vast amounts of text. That means that when AI tools like ChatGPT respond to your queries, they can “understand” what you are saying and generate answers that sound like human language and address what your query is about.

Training often requires a significant amount of resources and computing power, and many companies rely on powerful GPUs to help with this training. AI models can be fed different types of data, typically in vast quantities, such as text, images, music, and video. This is — logically enough — known as training data.

Parameters, in short, are the variables an AI model learns as part of its training. The best description I’ve found of what that actually means comes from Helen Toner, the director of strategy and foundational research grants at Georgetown’s Center for Security and Emerging Technology and a former OpenAI board member:

Parameters are the numbers inside an AI model that determine how an input (e.g., a chunk of prompt text) is converted into an output (e.g., the next word after the prompt). The process of ‘training’ an AI model consists in using mathematical optimization techniques to tweak the model’s parameter values over and over again until the model is very good at converting inputs to outputs.

In other words, an AI model’s parameters help determine the answers that they will then spit out to you. Companies sometimes boast about how many parameters a model has as a way to demonstrate that model’s complexity.

Illustration of wireframe figure flipping through the pages of a book.

Image: Hugo J. Herrera for The Verge

Are there any other terms I may come across?

Natural language processing (NLP): The ability for machines to understand human language thanks to machine learning. OpenAI’s ChatGPT is a basic example: it can understand your text queries and generate text in response. Another powerful tool that can do NLP is OpenAI’s Whisper speech recognition technology, which the company reportedly used to transcribe audio from more than 1 million hours of YouTube videos to help train GPT-4.

Inference: When a generative AI application actually generates something, like ChatGPT responding to a request about how to make chocolate chip cookies by sharing a recipe. This is the task your computer does when you execute local AI commands.

Tokens: Tokens refer to chunks of text, such as words, parts of words, or even individual characters. For example, LLMs will break text into tokens so that they can analyze them, determine how tokens relate to each other, and generate responses. The more tokens a model can process at once (a quantity known as its “context window”), the more sophisticated the results can be.

Neural network: A neural network is computer architecture that helps computers process data using nodes, which can be sort of compared to a human’s brain’s neurons. Neural networks are critical to popular generative AI systems because they can learn to understand complex patterns without explicit programming — for example, training on medical data to be able to make diagnoses.

Transformer: A transformer is a type of neural network architecture that uses an “attention” mechanism to process how parts of a sequence relate to each other. Amazon has a good example of what this means in practice:

Consider this input sequence: “What is the color of the sky?” The transformer model uses an internal mathematical representation that identifies the relevancy and relationship between the words color, sky, and blue. It uses that knowledge to generate the output: “The sky is blue.”

Not only are transformers very powerful, but they can also be trained faster than other types of neural networks. Since former Google employees published the first paper on transformers in 2017, they’ve become a huge reason why we’re talking about generative AI technologies so much right now. (The T in ChatGPT stands for transformer.)

RAG: This acronym stands for “retrieval-augmented generation.” When an AI model is generating something, RAG lets the model find and add context from beyond what it was trained on, which can improve accuracy of what it ultimately generates.

Let’s say you ask an AI chatbot something that, based on its training, it doesn’t actually know the answer to. Without RAG, the chatbot might just hallucinate a wrong answer. With RAG, however, it can check external sources — like, say, other sites on the internet — and use that data to help inform its answer.

Illustration of wireframe figure running over a circuitboard.

Image: Hugo J. Herrera for The Verge

How about hardware? What do AI systems run on?

Nvidia’s H100 chip: One of the most popular graphics processing units (GPUs) used for AI training. Companies are clamoring for the H100 because it’s seen as the best at handling AI workloads over other server-grade AI chips. However, while the extraordinary demand for Nvidia’s chips has made it among the world’s most valuable companies, many other tech companies are developing their own AI chips, which could eat away at Nvidia’s grasp on the market.

Neural processing units (NPUs): Dedicated processors in computers, tablets, and smartphones that can perform AI inference on your device. (Apple uses the term “neural engine.”) NPUs can be more efficient at doing many AI-powered tasks on your devices (like adding background blur during a video call) than a CPU or a GPU.

TOPS: This acronym, which stands for “trillion operations per second,” is a term tech vendors are using to boast about how capable their chips are at AI inference.

Illustration of wireframe frame tapping an icon on a phone.

Image: Hugo J. Herrera for The Verge

So what are all these different AI apps I keep hearing about?

There are many companies that have become leaders in developing AI and AI-powered tools. Some are entrenched tech giants, but others are newer startups. Here are a few of the players in the mix:

OpenAI / ChatGPT: The reason AI is such a big deal right now is arguably thanks to ChatGPT, the AI chatbot that OpenAI released in late 2022. The explosive popularity of the service largely caught big tech players off-guard, and now pretty much every other tech company is trying to boast about their AI prowess.
Microsoft / Copilot: Microsoft is baking Copilot, its AI assistant powered by OpenAI’s GPT models, into as many products as it can. The Seattle tech giant also has a 49 percent stake in OpenAI.
Google / Gemini: Google is racing to power its products with Gemini, which refers both to the company’s AI assistant and its various flavors of AI models.
Meta / Llama: Meta’s AI efforts are all around its Llama (Large Language Model Meta AI) model, which, unlike the models from other big tech companies, is open source.
Apple / Apple Intelligence: Apple is adding new AI-focused features into its products under the banner of Apple Intelligence. One big new feature is the availability of ChatGPT right inside Siri.
Anthropic / Claude: Anthropic is an AI company founded by former OpenAI employees that makes the Claude AI models. Amazon has invested $4 billion in the company, while Google has invested hundreds of millions (with the potential to invest $1.5 billion more). It recently hired Instagram cofounder Mike Krieger as its chief product officer.
xAI / Grok: This is Elon Musk’s AI company, which makes Grok, an LLM. It recently raised $6 billion in funding.
Perplexity: Perplexity is another AI company. It’s known for its AI-powered search engine, which has come under scrutiny for seemingly sketchy scraping practices.
Hugging Face: A platform that serves as a directory for AI models and datasets.

bnew · Jul 26, 2024

ChatGPT won't let you give it instruction amnesia anymore

OpenAI updates GPT-4o mini model to stop subversion by clever hackers

www.techradar.com

ChatGPT won't let you give it instruction amnesia anymore

News

By Eric Hal Schwartz
published 17 hours ago

OpenAI updates GPT-4o mini model to stop subversion by clever hackers

When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works.

(Image credit: Shutterstock/Daniel Chetroni)

OpenAI is making a change to stop people from messing with custom versions of ChatGPT by making the AI forget what it's supposed to do. Basically, when a third party uses one of OpenAI's models, they give it instructions that teach it to operate as, for example, a customer service agent for a store or a researcher for an academic publication. However, a user could mess with the chatbot by telling it to "forget all instructions," and that phrase would induce a kind of digital amnesia and reset the chatbot to a generic blank.

To prevent this, OpenAI researchers created a new technique called "instruction hierarchy," which is a way to prioritize the developer's original prompts and instructions over any potentially manipulative user-created prompts. The system instructions have the highest privilege and can't be erased so easily anymore. If a user enters a prompt that attempts to misalign the AI's behavior, it will be rejected, and the AI responds by stating that it cannot assist with the query.

OpenAI is rolling out this safety measure to its models, starting with the recently released GPT-4o Mini model. However, should these initial tests work well, it will presumably be incorporated across all of OpenAI's models. GPT-4o Mini is designed to offer enhanced performance while maintaining strict adherence to the developer's original instructions.

AI Safety Locks

As OpenAI continues to encourage large-scale deployment of its models, these kinds of safety measures are crucial. It's all too easy to imagine the potential risks when users can fundamentally alter the AI's controls that way.

Not only would it make the chatbot ineffective, it could remove rules preventing the leak of sensitive information and other data that could be exploited for malicious purposes. By reinforcing the model's adherence to system instructions, OpenAI aims to mitigate these risks and ensure safer interactions.

The introduction of instruction hierarchy comes at a crucial time for OpenAI regarding concerns about how it approaches safety and transparency. Current and former employees have called for improving the company's safety practices, and OpenAI's leadership has responded by pledging to do so. The company has acknowledged that the complexities of fully automated agents require sophisticated guardrails in future models, and the instruction hierarchy setup seems like a step on the road to achieving better safety.

These kinds of jailbreaks show how much work still needs to be done to protect complex AI models from bad actors. And it's hardly the only example. Several users discovered that ChatGPT would share its internal instructions by simply saying "hi."

OpenAI plugged that gap, but it's probably only a matter of time before more are discovered. Any solution will need to be much more adaptive and flexible than one that simply halts a particular kind of hacking.

bnew · Jul 26, 2024

OpenAI's massive operating costs could push it close to bankruptcy within 12 months

ChatGPT reached 100 million users in February 2023, just a few months after its official launch in November. That made it the fastest growing app of all...

www.techspot.com

OpenAI's massive operating costs could push it close to bankruptcy within 12 months

The ChatGPT maker could lose $5 billion this year

By Rob Thubron Today 6:57 AM11 comments

OpenAI's massive operating costs could push it close to bankruptcy within 12 months

Serving tech enthusiasts for over 25 years.

TechSpot means tech analysis and advice you can trust.

In brief: OpenAI, the firm that launched the generative AI revolution with the release of ChatGPT, burns through billions of dollars per year on its technology and staffing costs. The company is spending so much money that some analysts believe it could be on the verge on bankruptcy in just 12 months.

ChatGPT reached 100 million users in February 2023, just a few months after its official launch in November. That made it the fastest growing app of all time until Threads arrived. The chatbot's proliferation into popular culture – as illustrated by the South Park episode – not to mention Microsoft's $13 billion investment into OpenAI suggests Sam Altman's company is financially stable.

But driving OpenAI's success is the vast amount of money the company spends. Just keeping ChatGPT running costs a massive $700,000 per day, and that amount is likely to increase in the future.

According to a report by The Information based on previously undisclosed financial data, OpenAI is on its way to spending $7 billion on its AI training models, while its staffing costs are $1.5 billion. The publication writes that the company could lose $5 billion this year, and unless it raises more capital, may run out of cash in 12 months.

SCOOP: OpenAI may lose $5B this year & may run out of cash in 12 months, unless they raise more $, per analysis @theinformation.

Investors should ask: What is their moat? Unique tech? What is their route in profitability when Meta is giving away similar tech for free? Do they… pic.twitter.com/i5EkvEFEQd

// Related Stories

OpenAI unveils prototype search engine "SearchGPT"

OpenAI is replacing GPT-3.5 with new GPT-4o mini model for free users

– Gary Marcus (@GaryMarcus) July 24, 2024

This isn't the first report to suggest OpenAI could be facing financial problems. The Economic Times wrote last year that it could face bankruptcy over the high costs associated with training its AI models.

OpenAI generates up to $2 billion from ChatGPT and around $1 billion from LLM access fees, but these are said to barely cover its operational costs.

Sam Altman: I don't care if we burn $50 billion a year, we're building AGI and it's going to be worth it pic.twitter.com/zgC2cz3CxU

– Tsarathustra (@tsarnick) May 2, 2024

Something else looming over OpenAI's business is the increasing number of industry analysts who say that generative AI is a bubble that will burst within the next 12 months.

Some investors are already questioning how long Nvidia can continue to its massive growth levels that have propelled it to the third-most-valuable company in the world. The hype around generative AI doesn't seem to be slowing down, despite inherent issues around its inability to bring returns on investments, hallucinations, and the need for more data centers due to its massive power consumption.

bnew · Jul 26, 2024

Gen AI boosts individual creativity at the cost of collective diversity, study finds

A new study shows that AI can help writers become more creative, but they result in stories that are too similar.

venturebeat.com

Gen AI boosts individual creativity at the cost of collective diversity, study finds

Ben dikkson@BenDee983

July 23, 2024 1:56 PM

Image credit: VentureBeat with DALL-E 3

The rise of generative AI tools like ChatGPT has sparked debate about their impact on creativity and the creation of novel ideas.

A new study by researchers from the University of College London School of Management and the University of Exeter explores the effect of generative models on creative writing. The study examined how access to story ideas generated by large language models (LLMs) affected the creativity of short stories written by humans.

The results are nuanced. While generative AI led to stories that were rated as more creative, engaging and better written, it also resulted in stories that were more similar to each other.

Measuring the impact of generative AI on creative writing

The study focused on short story writing. The participants were asked to write a short, eight-sentence story about a randomly assigned subject.

The researchers measured creativity based on novelty and usefulness.

Novelty measures “the extent to which an idea departs from the status quo or expectations.” On the other hand, usefulness is “the practicality and relevance of an idea.” For short stories, usefulness might translate to the story becoming “a publishable product, such as a book, if developed further.”

The researchers hypothesized that generative AI can affect creative writing in two ways. On the one hand, it may be used as a “springboard for the human mind, providing potential starting points that can result in a ‘tree structure’ of different storylines” or help writers overcome writer’s block.

“If this is the case, we would expect generative AI to lead to more creative written output generated by human writers,” they write.

On the other hand, generative AI may anchor the writer to a specific idea and “restrict the variability of a writer’s own ideas from the start, inhibiting the extent of creative writing.”

“If this is the case, we would expect generative AI to lead to more similar stories and potentially less creative written output generated by human writers,” they write.

To investigate the impact of generative AI on these aspects of creativity, the researchers designed a two-phase online experiment. In the first phase, 293 participants were asked to write a short story about a random topic. The participants were divided into three groups:

Human-only: This group received no assistance or input from generative AI.

Human with one gen AI idea: This group could receive a three-sentence story idea generated by OpenAI’s GPT-4.

Human with five gen AI ideas: This group could request up to five ideas from GPT-4.

The writers then self-evaluated their stories based on novelty, usefulness and various emotional aspects. In the second phase, 600 evaluators assessed the stories on the same criteria without knowing which group the writers belonged to.

Generative AI enhances creativity

The study found that access to generative AI ideas improved both novelty and usefulness.

“We find that having access to generative AI causally increases the average novelty and usefulness… relative to human writers on their own,” the researchers write.

Interestingly, the group with access to five AI-generated ideas showed the most significant improvement. Access to more ideas allows writers to break away from their initial assumptions and explore a wider range of possibilities.

The study also found that writers who scored lower on baseline creativity assessment benefited more from generative AI. These writers showed significant improvement in their stories’ novelty and usefulness when they used AI-generated ideas. The researchers observe that in this case, generative AI acts as an equalizer that removes “any disadvantage or advantage based on the writers’ inherent creativity.”

Accordingly, evaluators found the AI-assisted stories to be more enjoyable, better written and more likely to have plot twists.

“Having access to generative AI ‘professionalizes’ the stories beyond what writers might have otherwise accomplished alone,” the researchers write.

Individual creativity versus collective novelty

While generative AI enhances individual creativity, the researchers also found that stories based on AI-generated ideas were more similar to each other compared to those written by the control group.

This finding raises concerns about the potential homogenization of creative content if generative AI becomes widely adopted.

“In short, writers in the two generative AI conditions are anchored to some extent on the generative AI idea presented to them,” the researchers write.

If writers rely too heavily on similar sets of prompts and ideas from a limited number of generative AI models, “there is risk of losing collective novelty,” the researchers warn.

“Specifically, if the publishing (and self-publishing) industry were to embrace more generative AI-inspired stories, our findings suggest that the produced stories would become less unique in aggregate and more similar to each other,” they write. “In short, our results suggest that despite the enhancement effect that generative AI had on individual creativity, there may be a cautionary note if generative AI were adopted more widely for creative tasks.”

The findings can be significant as more companies are offering AI-powered writing tools and some organizations are using LLMs to create content en-masse. And the long-term impact will be when the web becomes filled with content that have similar distributions, which will in turn be used to train the next generation of language models.

bnew · Jul 26, 2024

From reality to fantasy: Live2Diff AI brings instant video stylization to life

Live2Diff, developed by an international team from Shanghai AI Lab, Max Planck Institute, and Nanyang Technological University, pioneers uni-directional attention in video diffusion models, enabling near real-time stylization of live video streams.

venturebeat.com

From reality to fantasy: Live2Diff AI brings instant video stylization to life

Michael Nuñez@MichaelFNunez

July 17, 2024 3:32 PM

Image Credit: live2diff.github.io

A team of international researchers has developed an AI system capable of reimagining live video streams into stylized content in near real-time. The new technology, called Live2Diff, processes live video at 16 frames per second on high-end consumer hardware, potentially reshaping applications from entertainment to augmented reality experiences.

Live2Diff, created by scientists from Shanghai AI Lab, Max Planck Institute for Informatics, and Nanyang Technological University, marks the first successful implementation of uni-directional attention modeling in video diffusion models for live-stream processing.

Live2Diff is the first attempt that enables uni-directional attention modeling to video diffusion models for live video steam processing.

It achieves 16FPS on RTX 4090 GPU ?

Links pic.twitter.com/L2HP4QOK8j

— Dreaming Tulpa ?? (@dreamingtulpa) July 17, 2024

“We present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live-streaming video translation,” the researchers explain in their paper published on arXiv.

This novel approach overcomes a significant hurdle in video AI. Current state-of-the-art models rely on bi-directional temporal attention, which requires access to future frames and makes real-time processing impossible. Live2Diff’s uni-directional method maintains temporal consistency by correlating each frame with its predecessors and a few initial warmup frames, eliminating the need for future frame data.

https://venturebeat.com/wp-content/uploads/2024/07/leo-4.mp4

Live2Diff in action: A sequence showing the AI system’s real-time transformation capabilities, from an original portrait (left) to stylized variations including anime-inspired, angular artistic, and pixelated renderings. The technology demonstrates potential applications in entertainment, social media, and creative industries. (Video Credit: Live2Diff)

Real-time video style transfer: The next frontier in digital content creation

Dr. Kai Chen, the project’s corresponding author from Shanghai AI Lab, explains in the paper, “Our approach ensures temporal consistency and smoothness without any future frames. This opens up new possibilities for live video translation and processing.”

The team demonstrated Live2Diff’s capabilities by transforming live webcam input of human faces into anime-style characters in real-time. Extensive experiments showed that the system outperformed existing methods in temporal smoothness and efficiency, as confirmed by both quantitative metrics and user studies.

A schematic diagram of Live2Diff’s innovative approach: (a) The training stage incorporates depth estimation and a novel attention mask, while (b) the streaming inference stage employs a multi-timestep cache for real-time video processing. This technology marks a significant leap in AI-powered live video translation. (Credit: live2diff.github.io)

The implications of Live2Diff are far-reaching and multifaceted. In the entertainment industry, this technology could redefine live streaming and virtual events. Imagine watching a concert where the performers are instantly transformed into animated characters, or a sports broadcast where players morph into superhero versions of themselves in real-time. For content creators and influencers, it offers a new tool for creative expression, allowing them to present unique, stylized versions of themselves during live streams or video calls.

In the realm of augmented reality (AR) and virtual reality (VR), Live2Diff could enhance immersive experiences. By enabling real-time style transfer in live video feeds, it could bridge the gap between the real world and virtual environments more seamlessly than ever before. This could have applications in gaming, virtual tourism, and even in professional fields like architecture or design, where real-time visualization of stylized environments could aid in decision-making processes.

https://venturebeat.com/wp-content/uploads/2024/07/3-comparison-flat2d-9.mp4

A Comparative Analysis of AI Video Processing: The original image (top left) is transformed using various AI techniques, including Live2Diff (top right), in response to the prompt ‘Breakdancing in the alley.’ Each method showcases distinct interpretations, from stylized animation to nuanced reality alterations, illustrating the evolving landscape of AI-driven video manipulation. (Video Credit: Live2Diff)

However, as with any powerful AI tool, Live2Diff also raises important ethical and societal questions. The ability to alter live video streams in real-time could potentially be misused for creating misleading content or deepfakes. It may also blur the lines between reality and fiction in digital media, necessitating new forms of media literacy. As this technology matures, it will be crucial for developers, policymakers, and ethicists to work together to establish guidelines for its responsible use and implementation.

The future of video AI: Open-source innovation and industry applications

While the full code for Live2Diff is pending release (expected to launch next week), the research team has made their paper publicly available and plans to open-source their implementation soon. This move is expected to spur further innovations in real-time video AI.

As artificial intelligence continues to advance in media processing, Live2Diff represents an exciting leap forward. Its ability to handle live video streams at interactive speeds could soon find applications in live event broadcasts, next-generation video conferencing systems, and beyond, pushing the boundaries of real-time AI-driven video manipulation.

bnew · Jul 26, 2024

How Luma AI’s new ‘Loops’ feature in Dream Machine could transform digital marketing

Luma AI transforms video creation with 'Loops' feature, enabling seamless, AI-generated infinite videos from text or images, potentially transforming digital marketing and content creation.

venturebeat.com

How Luma AI’s new ‘Loops’ feature in Dream Machine could transform digital marketing

Michael Nuñez @MichaelFNunez

July 22, 2024 12:08 PM

Image Credit: Luma AI

Luma AI, the San Francisco-based artificial intelligence startup, launched a brand new feature called “Loops” for its Dream Machine platform today. This update allows users to create seamless, continuous video loops from text descriptions, images, or keyframes.

Content creators and digital marketers can now produce endless video sequences without visible cuts or transitions, expanding their options for engaging audiences while potentially reducing production time and costs.

Today we are releasing Loops in Dream Machine to keep your imagination going… and going… and going! Get started here: Luma Dream Machine
?1/6 #LumaDreamMachine pic.twitter.com/HxRjCaeqxn

— Luma AI (@LumaLabsAI) July 22, 2024

The company announced the release via X.com (formerly Twitter) this morning, showcasing a series of examples.

“Today we are releasing Loops in Dream Machine to keep your imagination going… and going… and going!” Luma AI posted, demonstrating the feature’s potential with videos of a spaceship flying through a hyperspace portal and a capybara riding a bicycle in a park.

Luma AI’s new Loops feature solves a tough problem in AI video creation. Until now, AI-generated videos often looked choppy or disjointed when played for more than a few seconds. Loops changes that. It lets users create videos that play smoothly over and over, without any jarring transitions.

This might seem like a small step, but it opens up big possibilities. Advertisers could make eye-catching animations that play endlessly in digital billboards. Artists could create mesmerizing video installations. And social media users might flood feeds with perfectly looping memes and short videos.

4. ? “a spinning top on the table” pic.twitter.com/ykVyQMbZ8B

— Luma AI (@LumaLabsAI) July 22, 2024

Democratizing creativity: How Luma AI is changing the game

The release of Loops comes just one month after Dream Machine’s initial launch, which quickly gained traction among creators and AI enthusiasts. Dream Machine distinguishes itself in the competitive AI-powered media creation industry by allowing users to generate high-quality, realistic videos from simple text prompts.

Luma AI is shaking up the video industry by putting powerful AI tools in the hands of everyday users as well, a step its competitors have not yet been willing to take. Until now, creating slick videos required expensive software and technical know-how. But Luma’s Dream Machine changes that equation.

What started off with a little stargazing turned into a dizzying experience with @LumaLabsAI new Loops feature in Dream Machine. It still amazes me that I can take one baseline image and Dream Machine can help expand it into its own world. pic.twitter.com/GdTHHeQwR7

— Tom Blake (@Iamtomblake) July 22, 2024

With a few clicks, anyone can now produce videos that once needed a professional studio. This could spark a boom in homemade content. Small businesses and individual creators, previously priced out of high-end video production, might soon flood social media with AI-generated ads and art pieces.

The impact could be similar to what happened when smartphone cameras went mainstream. Just as Instagram turned millions into amateur photographers, Luma AI might create a new wave of video creators.

The accessibility of Dream Machine sets Luma AI apart from competitors like OpenAI’s Sora and Kuaishou’s Kling, whose technologies remain largely inaccessible to the general public. Dream Machine offers a free tier allowing users to generate up to 30 videos per month, with paid plans available for more intensive use.

The AI ethics dilemma: Balancing innovation and responsibility

However, the rapid advancement of AI-generated media raises important questions about authenticity and potential misuse. Luma AI has taken steps to address these concerns, emphasizing its commitment to responsible AI development. The company plans to implement robust watermarking and attribution systems to maintain transparency.

As Luma AI continues to innovate, it positions itself not just as a tool provider, but as a platform for a new generation of AI-powered creativity. The company plans to release APIs and plugins for popular creative software, further expanding its reach and potential impact.

The introduction of Loops has sparked excitement among creators and tech enthusiasts. One user responded to Luma AI’s announcement by tweeting, “It still amazes me that I can take one baseline image and Dream Machine can help expand it into its own world.”

While the long-term impact of Dream Machine and its new Loops feature remains to be seen, Luma AI’s latest offering clearly demonstrates the rapid pace of innovation in AI-generated media. As the boundaries between human and AI-generated content continue to blur, Luma AI stands at the forefront of this transformative technology.

bnew · Jul 26, 2024

Sakana AI drops image models to generate Japan’s traditional ukiyo-e artwork

The artwork flourished between the 17th and 19th centuries and Sakana hopes to bring it back using the power of AI.

venturebeat.com

Sakana AI drops image models to generate Japan’s traditional ukiyo-e artwork

Shubham Sharma@mr_bumss

July 22, 2024 12:07 PM

Image generated by Evo-Ukiyoe

Image Credit: Sakana AI

Remember Sakana AI? Almost a year ago, the Tokyo-based startup made a striking appearance on the AI scene with its high-profile founders from Google and a novel automated merging-based approach to developing high-performing models. Today, the company announced two new image-generation models: Evo-Ukiyoe and Evo-Nishikie.

Available on Hugging Face, the models have been designed to generate images from text and image prompts. However, there’s an interesting and unique catch: instead of handling regular image generation in different styles, these models are laser-focused on Japan’s popular historic art form ukiyo-e. It flourished between the 17th and 19th centuries, and Sakana hopes to bring it back to modern content consumers using the power of AI.

The move comes as the latest localization effort in the AI space — something that has grown over the past year, with companies in countries like South Korea, India and China building models tailored to their respective cultures and dialects.

What to expect from the new Sakana AI models?

Dating back to the early 1600s, Ukiyo-e – or “pictures of the floating world” – evolved as a popular art in Japan focusing on subjects like historical scenes, landscapes, sumo wrestlers, etc. The genre revolved around monochrome woodblock prints but eventually graduated to full-color prints or “nishiki-e” with multiple woodblocks. Its popularity declined in the 19th due to multiple factors, including the rise of digital photography.

Now, with the release of the two image-generation models, Sakana wants to bring the historic artwork back into popular culture. The first one – Evo-Ukiyoe – is a text-to-image offering that generates images closely resembling ukiyo-e, especially when prompted with text inputs describing elements commonly found in ukiyo-e art such as cherry blossoms, kimono or birds. It can even generate ukiyo-e-style art with things that did not exist back then, like a hamburger or laptop, but the company points out that sometimes the results may veer off track — not resembling ukiyo-e at all.

The model is based on Evo-SDXL-JP, which Sakana developed using its novel evolutionary model merging technique on top of Stability AI’s SDXL and other open diffusion models. The company said it used LoRA (Low-Rank Adaptation) to fine-tune Evo-SDXL-JP on a dataset of over 24,000 carefully-captioned ukiyo-e artworks acquired through a partnership with the Art Research Center (ARC) of Ritsumeikan University in Kyoto.

“We curated this data with a wide range of subjects, covering including whole art and face-centered ones, from the digital images of ukiyo-e in the ARC collection. We also focused on multi-colored nishiki-e with beautiful colors while considering diversity,” the company wrote in a blog post.

The second model, Evo-Nishikie, is an image-to-image offering that colorizes monochrome Ukiyo-e prints. Sakana says it can add color to historical book illustrations that were printed in one color of ink or give entirely new looks to existing multi-colored Nishikie prints. All the user would have to do is provide the source image and maybe pair it with a set of instructions describing the elements to be colored.

Sakana said it brought this model to life by performing ControlNet training on Evo-Ukiyoe, using fixed prompts and condition images.

Goal for further research and development

While the models only support prompting in Japanese and are in the very early stages, Sakana hopes the work to teach AI traditional “Japanese beauty” will spread the appeal of the country’s culture worldwide and find applications in education and new ways of enjoying classical literature.

Currently, the company is providing both models and the associated code to get started on Hugging Face. The Python script included in the repository and LoRA weights are available under the Apache 2.0 license.

“This model is provided for research and development purposes only and should be considered as an experimental prototype. It is not intended for commercial use or deployment in mission-critical environments. Use of this model is at the user’s own risk, and its performance and outcomes are not guaranteed,” the company notes on Hugging Face.

So, far Sakana AI has raised $30 million in funding from multiple investors, including by Lux Capital, which has invested in pioneering AI companies like Hugging Face, and also Khosla Ventures, known for investing in OpenAI way back in 2019.

bnew · Jul 26, 2024

Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling

Groq releases open-source AI models outperforming tech giants in tool use, challenging industry norms with synthetic data training and democratizing access to advanced AI capabilities.

venturebeat.com

Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling

Michael Nuñez@MichaelFNunez

July 18, 2024 1:20 PM

Credit: VentureBeat made with Midjourney

Groq, an AI hardware startup, has released two open-source language models that outperform tech giants in specialized tool use capabilities. The new Llama-3-Groq-70B-Tool-Use model has claimed the top spot on the Berkeley Function Calling Leaderboard (BFCL), surpassing proprietary offerings from OpenAI, Google, and Anthropic.

I’ve been leading a secret project for months … and the word is finally out!

?️ I'm proud to announce the Llama 3 Groq Tool Use 8B and 70B models ?

An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including… pic.twitter.com/FJqxQ6XnLW

— Rick Lamers (@RickLamers) July 16, 2024

Rick Lamers, project lead at Groq, announced the breakthrough in an X.com post. “I’m proud to announce the Llama 3 Groq Tool Use 8B and 70B models,” he said. “An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including proprietary ones like Claude Sonnet 3.5, GPT-4 Turbo, GPT-4o and Gemini 1.5 Pro.”

Synthetic Data and Ethical AI: A New Paradigm in Model Training

The larger 70B parameter version achieved a 90.76% overall accuracy on the BFCL, while the smaller 8B model scored 89.06%, ranking third overall. These results demonstrate that open-source models can compete with and even exceed the performance of closed-source alternatives in specific tasks.

Groq developed these models in collaboration with AI research company Glaive, using a combination of full fine-tuning and Direct Preference Optimization (DPO) on Meta’s Llama-3 base model. The team emphasized their use of only ethically generated synthetic data for training, addressing common concerns about data privacy and overfitting.

This development marks a significant shift in the AI landscape. By achieving top performance using only synthetic data, Groq challenges the notion that vast amounts of real-world data are necessary for creating cutting-edge AI models. This approach could potentially mitigate privacy concerns and reduce the environmental impact associated with training on massive datasets. Moreover, it opens up new possibilities for creating specialized AI models in domains where real-world data is scarce or sensitive.

A comparison chart showing the performance of various AI models on different tasks, with Groq’s Llama 3 models leading in overall accuracy. The data highlights the competitive edge of open-source models against proprietary offerings from major tech companies. (Image Credit: Groq)

Democratizing AI: The promise of open-source accessibility

The models are now available through the Groq API and Hugging Face, a popular platform for sharing machine learning models. This accessibility could accelerate innovation in fields requiring complex tool use and function calling, such as automated coding, data analysis, and interactive AI assistants.

Groq has also launched a public demo on Hugging Face Spaces, allowing users to interact with the model and test its tool use abilities firsthand. Like many of the demos on Hugging Face Spaces, this was built in collaboration with Gradio, which Hugging Face acquired in December 2021. The AI community has responded enthusiastically, with many researchers and developers eager to explore the models’ capabilities.

The open-source challenge: Reshaping the AI landscape

As the AI industry continues to evolve, Groq’s open-source approach contrasts sharply with the closed systems of larger tech companies. This move may pressure industry leaders to be more transparent about their own models and potentially accelerate the overall pace of AI development.

The release of these high-performing open-source models positions Groq as a major player in the AI field. As researchers, businesses, and policymakers evaluate the impact of this technology, the broader implications for AI accessibility and innovation remain to be seen. The success of Groq’s models could lead to a paradigm shift in how AI is developed and deployed, potentially democratizing access to advanced AI capabilities and fostering a more diverse and innovative AI ecosystem.

bnew · Jul 27, 2024

\Proton launches 'privacy-first' AI writing assistant for email that runs on-device | TechCrunch

Proton launches ‘privacy-first’ AI writing assistant for email that runs on-device

Privacy FTW, but there are trade-offs

Paul Sawers

4:00 AM PDT • July 18, 2024

Comment

Image Credits: Idrees Abbas/SOPA Images/LightRocket via Getty Images

Privacy app maker Proton has launched a new AI-enabled writing assistant that can help users compose emails with simple prompts, redraft them and even proofread them before they’re sent.

The launch sees Proton continue on a trajectory that has seen it replicate many of Google’s products and features in the productivity tools space. Just last month, Google brought its own Gemini AI to Gmail to help users write and summarize emails, and now Proton is following suit with its own flavor.

As one might expect with Proton, a Swiss company known for its suite of privacy-centric apps, including email, VPN, password manager, calendar, cloud storage and documents, its new assistant is targeted at those concerned about leaking sensitive data to third-party AI providers.

Proton Scribe, as the new tool is called, is built on Mistral 7B, an open source language model from French AI startup Mistral. However, Proton says it will likely tinker with this in pursuit of the most optimum model for this use case. Additionally, the company says it is making the tool available under the open source GPL-3.0 license, which will make it easier to perform third-party security and privacy audits.

Going local

Proton Scribe can be deployed entirely at the local device level, meaning user data doesn’t leave the device. Moreover, Proton promises that its AI assistant won’t learn from user data — a particularly important feature for enterprise use cases, where privacy is paramount.

The problem that Proton is striving to address here is real: Businesses have been slower to embrace the generative AI revolution due to data privacy concerns. This early iteration of the writing assistant could go some way toward appeasing such concerns.

“We realized that irrespective of whether or not Proton builds AI tools, users are going to use AI, often with significant privacy consequences,” founder and CEO Andy Yen said. “Rather than have users copying their sensitive communications into third-party AI tools that often have appalling privacy practices, it would be better to instead build privacy-first AI tools directly into Proton Mail.”

For the less security-conscious, Proton Scribe can also be configured to run on Proton’s servers, which should mean it will run faster, depending on users’ own hardware.

Those who’d prefer to run the tool locally are prompted to download the model once to their machine, and then it will run on that device without interacting with external servers.

The company is quick to stress that it doesn’t keep any logs or share data with third-parties for people who choose to run Proton Scribe from its servers.

“Only the prompt entered by the user is transmitted to the server, and no data is ever retained after the email draft is created,” a company spokesperson told TechCrunch.

Setting up Proton Scribe.Image Credits:Proton

Once the tool has been installed, users can type in a prompt, such as “request samples from a supplier,” and then hit the generate button.

Proton Scribe: Write me an email.Image Credits:Proton

The assistant then spits out a template email based on the theme provided, and you can then edit and fine-tune what comes out.

With these privacy-centric provisions, there is at least one notable trade-off: Given that the tool doesn’t use any local data, its responses won’t be particularly personalized or contextual. They will likely be generic, a point that Proton conceded to TechCrunch.

However, the company said this is why it has added additional features, which it calls “quick actions,” designed to make it easy for users to edit the drafts, such as changing the tone, proofreading and making it more concise.

“Over time, we will look to improve Proton Scribe, adding context, etc., but all in a privacy-preserving way,” Proton said in a statement.

Proton Scribe: Editing options.Image Credits:Proton

Proton Scribe is limited to email for now, but the company said it may expand the tool to its other products in the future “depending on demand.” One obvious integration will be its recently launched collaborative document editing app.

Starting today, Proton’s writing assistant will be available for Proton Mail on the web and desktop, though the company confirmed that it will look to expand the tool to mobile devices in the future. In terms of costs, Proton Scribe is mostly targeted at business users, with those on either the Mail Essentials, Mail Professional or Proton Business Suite plans able to pay an extra $2.99 per month to access the writing assistant.

Additionally, those on one of Proton’s legacy and limited-availability plans, such as Visionary or Lifetime, will be given access to Proton Scribe for free. The company said that it may expand the feature to other consumer plans in the future.

bnew · Jul 28, 2024

China Is Closing the A.I. Gap With the United States

In recent weeks, Chinese tech companies have unveiled technologies that rival American systems — and they are already in the hands of consumers and software developers.

www.nytimes.com

China Is Closing the A.I. Gap With the United States

In recent weeks, Chinese tech companies have unveiled technologies that rival American systems — and they are already in the hands of consumers and software developers.

https://vp.nyt.com/video/2024/07/23/121352_1_00china-ai-VIDEO-top-1_wg_480p.mp4

A.I. generated videos created from text prompts using Kling, a video generator made by the Chinese company Kuaishou.

Prompt: “The astronaut jumps up from the moon’s surface and launches himself into space.”

Kuaishou
Prompt: “A giant panda is playing guitar by the lake.”

Kuaishou
Prompt: “A Chinese boy wearing glasses is eating a delicious cheeseburger in a fast food restaurant, with his eyes closed for enjoyment.”

Kuaishou
Prompt: “A couple is holding hands and walking in the starry sky, while the stars move dramatically in the background.”

Kuaishou
An A.I. generated video created from an archival photo without using text prompts.

Kuaishou

By Meaghan Tobin and Cade Metz

Meaghan Tobin reported from Shanghai, and Cade Metz from San Francisco.

July 25, 2024
阅读简体中文版閱讀繁體中文版

At the World Artificial Intelligence Conference in Shanghai this month, start-up founder Qu Dongqi showed off a video he had recently posted online. It displayed an old photograph of a woman with two toddlers. Then the photo sprang to life as the woman lifted the toddlers up in her arms and they laughed with surprise.

The video was created by A.I. technology from the Chinese internet company Kuaishou. The technology was reminiscent of a video generator, called Sora, that the American start-up OpenAI unveiled this year. But unlike Sora, it was available to the general public.

“My American friends still can’t use Sora,” Mr. Qu said. “But we already have better solutions here.”

A.I. generated videos created from text prompts using Kling, a video generator made by the Chinese company Kuaishou.

https://vp.nyt.com/video/2024/07/23/121339_1_mosaic-ai-china-ai-cropped-23-1-252_wg_720p.mp4
Prompt: “Mona Lisa puts on glasses with her hands.“

Kuaishou
https://vp.nyt.com/video/2024/07/23/121345_1_mosaic-ai-china-ai-cropped-23-2-627_wg_720p.mp4

Prompt: “Einstein plays guitar.”

Kuaishou
https://vp.nyt.com/video/2024/07/23/121347_1_mosaic-ai-china-ai-cropped-23-4-724_wg_720p.mp4

Prompt: “Kitten riding in an airplane and looking out the window.”

Kuaishou
https://vp.nyt.com/video/2024/07/23/121346_1_mosaic-ai-china-ai-cropped-23-3-440_wg_720p.mp4

Prompt: “Cute shepherd dog running, tennis ball bouncing, warm atmosphere.”

Kuaishou
https://vp.nyt.com/video/2024/07/23/121348_1_mosaic-ai-china-ai-cropped-23-5-2-673_wg_720p.mp4

Prompt: “A girl eating noodles.”

Kuaishou

While the United States has had a head start on A.I. development, China is catching up. In recent weeks, several Chinese companies have unveiled A.I. technologies that rival the leading American systems. And these technologies are already in the hands of consumers, businesses and independent software developers across the globe.

While many American companies are worried that A.I. technologies could accelerate the spread of disinformation or cause other serious harm, Chinese companies are more willing to release their technologies to consumers or even share the underlying software code with other businesses and software developers. This kind of sharing of computer code, called open source, allows others to more quickly build and distribute their own products using the same technologies.

Open source has been a cornerstone of the development of computer software, the internet and, now, artificial intelligence. The idea is that technology advances faster when its computer code is freely available for anyone to examine, use and improve upon.

China’s efforts could have enormous implications as A.I. technology continues to develop in the years to come. The technology could increase the productivity of workers, fuel future innovations and power a new wave of military technologies, including autonomous weapons.

When OpenAI kicked off the A.I. boom in late 2022 with the release of the online chatbot ChatGPT, China struggled to compete with technologies emerging from American companies like OpenAI and Google. (The New York Times has sued OpenAI and its partner, Microsoft, claiming copyright infringement of news content related to A.I. systems.) But China’s progress is now accelerating.

bnew · Jul 28, 2024

Kuaishou released its video generator, Kling, in China more than a month ago and to users worldwide on Wednesday. Just before Kling’s arrival, 01.AI, a start-up co-founded by Kai-Fu Lee, an investor and technologist who helped build Chinese offices for both Google and Microsoft, released chatbot technology that scored nearly as well as the leading American technologies on common benchmark tests that rate the performance of the world’s chatbots.

Image

Kai-Fu Lee smiling for a photo while wearing a tuxedo at a formal event. Other people pass by in the background.

Kai-Fu Lee, a co-founder of the start-up 01.AI. The company unveiled a new version of its technology this year that sits near the top of a leaderboard that ranks the world’s best technologies.Credit...Krista Schlueter for The New York Times

New technology from the Chinese tech giant Alibaba has also leaped to the top of a leaderboard that rates open-source A.I. systems. “We have disproved the commonplace belief that China doesn’t have the talent or the technology to compete with the U.S.,” Dr. Lee said. “That belief is simply wrong.”

In interviews, a dozen technologists and researchers at Chinese tech companies said open-source technologies were a key reason that China’s A.I. development has advanced so quickly. They saw open-source A.I. as an opportunity for the country to take a lead.

But that will not be easy. The United States remains at the forefront of A.I. research. And U.S. officials have resolved to keep it that way.

The White House has instituted a trade embargo designed to prevent Chinese companies from using the most powerful versions of computer chips that are essential to building artificial intelligence. A group of lawmakers has introduced a bill that would make it easier for the White House to control the export of A.I. software built in the United States. Others are trying to limit the progress of open-source technologies that have helped fuel the rise of similar systems in China.

Disclosure:

The New York Times Company has sued OpenAI and Microsoft, claiming copyright infringement of content related to artificial intelligence systems. The companies have sought to dismiss some of the claims. Times reporters have no involvement in the case and remain independent in their coverage.

The top American companies are also exploring new technologies that aim to eclipse the powers of today’s chatbots and video generators.

“Chinese companies are good at replicating and improving what the U.S. already has,” said Yiran Chen, a professor of electrical and computer engineering at Duke University. “They are not as good at inventing something completely new that will bypass the U.S. in five to 10 years.”

But many in China’s tech industry believe that open-source technology could help them grow despite those constraints. And if U.S. regulators stifle the progress of American open-source projects (as some lawmakers are discussing) China could gain a significant edge. If the best open-source technologies come from China, U.S. developers could end up building their systems atop Chinese technologies.

“Open-source A.I. is the foundation of A.I. development,” said Clément Delangue, chief executive of Hugging Face, a company that houses many of the world’s open-source A.I. projects. The U.S. built its leadership in A.I. through collaboration between companies and researchers, he said, “and it looks like China could do the same thing.”

Image

Clément Delangue walking with a group of people outside the U.S. Capitol.

Clément Delangue, right, the chief executive of the A.I. company Hugging Face, said that open-source technology could help China make gains in the field of A.I.Credit...Kenny Holston/The New York Times

While anyone with a computer can change open-source software code, it takes a lot of data, skill and computing power to fundamentally alter an A.I. system. When it comes to A.I., open source typically means that a system’s building blocks serve as a foundation that allows others to build something new, said Fu Hongyu, the director of A.I. governance at Alibaba’s research institute, AliResearch.

As in other countries, in China there is an intense debate over whether the latest technological advances should be made accessible to anyone or kept as closely held company secrets. Some, like Robin Li, the chief executive of Baidu, one of the few companies in China building its own A.I. technology entirely from scratch, think the technology is most profitable and secure when it is closed-source — that is, in the hands of a limited few.

A.I. systems require enormous resources: talent, data and computing power. Beijing has made it clear that the benefits accruing from such investments should be shared. The Chinese government has poured money into A.I. projects and subsidized resources like computing centers.

But Chinese tech companies face a major constraint on the development of their A.I. systems: compliance with Beijing’s strict censorship regime, which extends to generative A.I. technologies.

Kuaishou’s new video generator Kling appears to have been trained to follow the rules. Text prompts with any mention of China’s president, Xi Jinping, or controversial topics like feminism and the country’s real estate crisis yielded error messages. An image prompt of this year’s National People’s Congress yielded a video of the delegates shifting in their seats.

Kuaishou did not respond to questions about what steps the company took to prevent Kling from creating harmful, fake or politically sensitive content.

By making their most advanced A.I. technologies freely available, China’s tech giants are demonstrating their willingness to contribute to the country’s overall technological advancement as Beijing has established that the power and profit of the tech industry should be channeled toward the goal of self sufficiency.

The concern for some in China is that the country will struggle to amass the computing chips it needs to build increasingly powerful technologies. But that has not yet prevented Chinese companies from building powerful new technologies that can compete with U.S. systems.

At the end of last year, Dr. Lee’s company, 01.AI, was ridiculed on social media when someone discovered that the company had built its A.I. system using open-source technology originally built by Meta, owner of Facebook and Instagram. Some saw it as a symbol of China’s dependence on American ingenuity.

Six months later, 01.AI unveiled a new version of its technology. It now sits near the top of the leaderboard that ranks the world’s best technologies. Around the same time, a team from Stanford University in California unveiled Llama 3-V, claiming it outperformed other leading models. But a Chinese researcher soon noticed that the model was based on an open-source system originally built in China.

It was the reverse of the controversy surrounding 01.AI last year: Rather than Chinese developers building atop U.S. technology, U.S. developers built atop Chinese technology.

If regulators limit open-source projects in the United States and Chinese open-source technologies become the gold standard, Mr. Delangue said, this kind of thing could become the norm.

“If the trend continues, it becomes more and more of a challenge for the U.S.,” he said.

bnew · Jul 29, 2024

The A.I Megathread (LLM , GPT , Development)

Veteran

​

AI wars heat up: OpenAI’s SearchGPT takes on Google’s search dominance​

​

AI-powered search: The next frontier in information retrieval​

​

Publishers and AI: A delicate balance in the digital ecosystem​

​

The future of search: Challenges and opportunities in the AI era​

Veteran

Veteran

AI is confusing — here’s your cheat sheet​

​

If you can’t tell the difference between AGI and RAG, don’t worry! We’re here for you.​

What exactly​

I keep hearing a lot of talk about models. What are those?​

Veteran

But how do AI models get all that info?​

Are there any other terms I may come across?​

How about hardware? What do AI systems run on?​

So what are all these different AI apps I keep hearing about?​

Veteran

ChatGPT won't let you give it instruction amnesia anymore​

AI Safety Locks​

Veteran

OpenAI's massive operating costs could push it close to bankruptcy within 12 months​

The ChatGPT maker could lose $5 billion this year​

// Related Stories​

Veteran

Gen AI boosts individual creativity at the cost of collective diversity, study finds​

Measuring the impact of generative AI on creative writing​

Generative AI enhances creativity​

Individual creativity versus collective novelty​

Veteran

From reality to fantasy: Live2Diff AI brings instant video stylization to life​

Real-time video style transfer: The next frontier in digital content creation​

The future of video AI: Open-source innovation and industry applications​

Veteran

How Luma AI’s new ‘Loops’ feature in Dream Machine could transform digital marketing​

​

Democratizing creativity: How Luma AI is changing the game​

The AI ethics dilemma: Balancing innovation and responsibility​

Veteran

Sakana AI drops image models to generate Japan’s traditional ukiyo-e artwork​

What to expect from the new Sakana AI models?​

Goal for further research and development​

Veteran

Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling​

Synthetic Data and Ethical AI: A New Paradigm in Model Training​

Democratizing AI: The promise of open-source accessibility​

The open-source challenge: Reshaping the AI landscape​

Veteran

Proton launches ‘privacy-first’ AI writing assistant for email that runs on-device​

Privacy FTW, but there are trade-offs​

Going local​

Veteran

​

China Is Closing the A.I. Gap With the United States​

Veteran

Veteran

AI wars heat up: OpenAI’s SearchGPT takes on Google’s search dominance

AI-powered search: The next frontier in information retrieval

Publishers and AI: A delicate balance in the digital ecosystem

The future of search: Challenges and opportunities in the AI era

AI is confusing — here’s your cheat sheet

If you can’t tell the difference between AGI and RAG, don’t worry! We’re here for you.

What exactly

I keep hearing a lot of talk about models. What are those?

But how do AI models get all that info?

Are there any other terms I may come across?

How about hardware? What do AI systems run on?

So what are all these different AI apps I keep hearing about?

ChatGPT won't let you give it instruction amnesia anymore

AI Safety Locks

OpenAI's massive operating costs could push it close to bankruptcy within 12 months

The ChatGPT maker could lose $5 billion this year

// Related Stories

Gen AI boosts individual creativity at the cost of collective diversity, study finds

Measuring the impact of generative AI on creative writing

Generative AI enhances creativity

Individual creativity versus collective novelty

From reality to fantasy: Live2Diff AI brings instant video stylization to life

Real-time video style transfer: The next frontier in digital content creation

The future of video AI: Open-source innovation and industry applications

How Luma AI’s new ‘Loops’ feature in Dream Machine could transform digital marketing

Democratizing creativity: How Luma AI is changing the game

The AI ethics dilemma: Balancing innovation and responsibility

Sakana AI drops image models to generate Japan’s traditional ukiyo-e artwork

What to expect from the new Sakana AI models?

Goal for further research and development

Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling

Synthetic Data and Ethical AI: A New Paradigm in Model Training

Democratizing AI: The promise of open-source accessibility

The open-source challenge: Reshaping the AI landscape

Proton launches ‘privacy-first’ AI writing assistant for email that runs on-device

Privacy FTW, but there are trade-offs

Going local

China Is Closing the A.I. Gap With the United States