bnew

Veteran
Joined
Nov 1, 2015
Messages
56,206
Reputation
8,249
Daps
157,893







1/8
Today we're excited to introduce Devin, the first AI software engineer.

Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.

Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.

When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.

Check out what Devin can do in the thread below.


3/8
1/4 Devin can learn how to use unfamiliar technologies.

4/8
2/4 Devin can contribute to mature production repositories.

5/8
3/4 Devin can train and fine tune its own AI models.

6/8
4/4 We even tried giving Devin real jobs on Upwork and it could do those too!

7/8
For more details on Devin, check out our blog post here: See Devin in action
If you have any project ideas, drop them below and we'll forward them to Devin.

8/8
We'd like to thank all our supporters who have helped us get to where we are today, including @patrickc, @collision, @eladgil, @saranormous, Chris Re, @eglyman, @karimatiyeh, @bernhardsson, @t_xu, @FEhrsam, @foundersfund, and many more.

If you’re excited to solve some of the…












 

bnew


Midjourney bans all Stability AI employees over alleged data scraping​


Midjourney blamed a near 24-hour service outage on ‘botnet-like activity’ from two accounts linked to the Stable Diffusion creator.​


By Jess Weatherbed, a news writer focused on creative industries, computing, and internet culture. Jess started her career at TechRadar, covering news and hardware reviews.

Mar 11, 2024, 3:22 PM EDT


Stability AI CEO Emad Mostaque is currently investigating the situation.

Illustration: Beatrice Sala

Midjourney says it has banned Stability AI staffers from using its service, accusing employees at the rival generative AI company of causing a systems outage earlier this month during an attempt to scrape Midjourney’s data.

Midjourney posted an update to its Discord server on March 2nd that acknowledged an extended server outage was preventing generated images from appearing in user galleries. In a summary of a business update call on March 6th, Midjourney claimed that “botnet-like activity from paid accounts” — which the company specifically links to Stability AI employees — was behind the outage.



These meeting notes were posted to Midjourney’s official Discord channel following an “office hours” call on March 6th.
Image: Midjourney / Discord

According to Midjourney user Nick St. Pierre on X, who listened to the call, Midjourney said that the service was brought down because “someone at Stability AI was trying to grab all the prompt and image pairs in the middle of a night on Saturday.” St. Pierre said that Midjourney had linked multiple paid accounts to an individual on the Stability AI data team.

In its summary of the business update call on March 6th (which Midjourney refers to as “office hours”), the company says it’s banning all Stability AI employees from using its service “indefinitely” in response to the outage. Midjourney is also introducing a new policy that will similarly ban employees of any company that exercises “aggressive automation” or causes outages to the service.



St. Pierre flagged the accusations to Stability AI CEO Emad Mostaque, who replied on X, saying he was investigating the situation and that Stability hadn’t ordered the actions in question. “Very confusing how 2 accounts would do this team also hasn’t been scraping as we have been using synthetic & other data given SD3 outperforms all other models,” said Mostaque, referring to the Stable Diffusion 3 AI model currently in preview. He claimed that if the outage was caused by a Stability employee, then it was unintentional and “obviously not a DDoS attack.”

Midjourney founder David Holz responded to Mostaque in the same thread, claiming to have sent him “some information” to help with his internal investigation.

The situation is otherwise still developing, and no additional updates have been provided since that conversation on March 6th. At the time of writing, neither Midjourney nor Stability AI has responded to The Verge's request for comment.

It does seem odd that scraping activity from just two accounts allegedly managed to cause such an extended server outage. The irony of this situation also hasn’t been lost on online creatives, who have extensively criticized both companies (and generative AI systems in general) for training their models on masses of online data scraped from their works without consent. Stable Diffusion and Midjourney have both been targeted with several copyright lawsuits, with the latter being accused of creating an artist database for training purposes in December.
 

bnew


Midjourney is testing a highly requested “consistent characters” feature. The generative AI service’s new algorithm can now use the same character across multiple images and styles without deviating too far from their original design.
Instructions on how to use “consistent characters” can be found on Midjourney’s Discord channel. The feature isn’t designed to replicate real people from photographs, and works best on characters generated via Midjourney.





1/1
We're testing a new algorithm today to help you have "consistent characters" across your images. Check out our announcement channel for more instructions. It works for both MJ6 and Niji6 models. We hope this helps you play with telling stories and building new worlds <3









1/7
It's similar to the style reference feature, except instead of matching style, it makes your characters match your Character Reference (--cref) image

I used the image on the left as my character reference

Prompts in ALT


3/7
It also works across image styles, which is pretty sick and very fun to play with

4/7
You can use the Character Weight parameter (--cw N) to control the level of character detail you carry over.

At lower values like --cw 0 it will focus mostly on the face, but at higher values like --cw 100 it'll pull more of the outfit in too

Top left is ref image

5/7
On the left is the character reference

On the right is the character reference used in a totally different prompt that included style references

It's definitely not perfect, but it's wayyy better than any other solution we've had previously

6/7
You can use more than one reference too, and start to blend things together like I did here

I used both examples in a single prompt here (I'll go into this in more detail in a future post)

It also works through inpainting (I'll do a post on that too)

7/7
NOTES:
> precision is currently limited
> --cref works in niji 6 & v6 models
> --cw 100 is default (face, hair, & clothes)
> works best with MJ generated characters
> won't copy exact dimples, freckles, or logos

Messing w/ this all night tn
I'll let you know what else I figure out
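
As a concrete illustration of the flags described above, a minimal prompt might look like the line below; the image URL and the --cw value are placeholders, not taken from the thread.

/imagine prompt: the same heroine ice skating at a night market, watercolor style --cref https://example.com/heroine.png --cw 50 --v 6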
 

bnew


POSTED MAR 13, 2024
AT 7:34 AM EDT

JESS WEATHERBED

The EU has officially adopted its sweeping AI law.

After two years of debate and revisions, European Parliament members gave the Artificial Intelligence Act their final approval on Wednesday.

While the law officially comes into force 20 days after it’s published in the Official Journal (likely happening in May), some rules — like those impacting general-purpose AI systems like chatbots — will take effect 12 months later to give AI providers time to comply.


Artificial Intelligence Act: MEPs adopt landmark law | News | European Parliament
[WWW.EUROPARL.EUROPA.EU]





Artificial Intelligence Act: MEPs adopt landmark law​


  • Safeguards on general purpose artificial intelligence
  • Limits on the use of biometric identification systems by law enforcement
  • Bans on social scoring and AI used to manipulate or exploit user vulnerabilities
  • Right of consumers to launch complaints and receive meaningful explanations


The untargeted scraping of facial images from CCTV footage to create facial recognition databases will be banned © Alexander / Adobe Stock

On Wednesday, Parliament approved the Artificial Intelligence Act that ensures safety and compliance with fundamental rights, while boosting innovation.

The regulation, agreed in negotiations with member states in December 2023, was endorsed by MEPs with 523 votes in favour, 46 against and 49 abstentions.

It aims to protect fundamental rights, democracy, the rule of law and environmental sustainability from high-risk AI, while boosting innovation and establishing Europe as a leader in the field. The regulation establishes obligations for AI based on its potential risks and level of impact.

Banned applications

The new rules ban certain AI applications that threaten citizens’ rights, including biometric categorisation systems based on sensitive characteristics and untargeted scraping of facial images from the internet or CCTV footage to create facial recognition databases. Emotion recognition in the workplace and schools, social scoring, predictive policing (when it is based solely on profiling a person or assessing their characteristics), and AI that manipulates human behaviour or exploits people’s vulnerabilities will also be forbidden.

Law enforcement exemptions

The use of remote biometric identification (RBI) systems by law enforcement is prohibited in principle, except in exhaustively listed and narrowly defined situations. “Real-time” RBI can only be deployed if strict safeguards are met, e.g. its use is limited in time and geographic scope and subject to specific prior judicial or administrative authorisation. Such uses may include, for example, a targeted search of a missing person or preventing a terrorist attack. Using such systems post-facto (“post-remote RBI”) is considered a high-risk use case, requiring judicial authorisation linked to a criminal offence.

Obligations for high-risk systems

Clear obligations are also foreseen for other high-risk AI systems (due to their significant potential harm to health, safety, fundamental rights, environment, democracy and the rule of law). Examples of high-risk AI uses include critical infrastructure, education and vocational training, employment, essential private and public services (e.g. healthcare, banking), certain systems in law enforcement, migration and border management, justice and democratic processes (e.g. influencing elections). Such systems must assess and reduce risks, maintain use logs, be transparent and accurate, and ensure human oversight. Citizens will have a right to submit complaints about AI systems and receive explanations about decisions based on high-risk AI systems that affect their rights.

Transparency requirements

General-purpose AI (GPAI) systems, and the GPAI models they are based on, must meet certain transparency requirements, including compliance with EU copyright law and publishing detailed summaries of the content used for training. The more powerful GPAI models that could pose systemic risks will face additional requirements, including performing model evaluations, assessing and mitigating systemic risks, and reporting on incidents.

Additionally, artificial or manipulated images, audio or video content (“deepfakes”) need to be clearly labelled as such.

Measures to support innovation and SMEs

Regulatory sandboxes and real-world testing will have to be established at the national level, and made accessible to SMEs and start-ups, to develop and train innovative AI before its placement on the market.

Quotes

During the plenary debate on Tuesday, the Internal Market Committee co-rapporteur Brando Benifei (S&D, Italy) said: “We finally have the world’s first binding law on artificial intelligence, to reduce risks, create opportunities, combat discrimination, and bring transparency. Thanks to Parliament, unacceptable AI practices will be banned in Europe and the rights of workers and citizens will be protected. The AI Office will now be set up to support companies to start complying with the rules before they enter into force. We ensured that human beings and European values are at the very centre of AI’s development”.

Civil Liberties Committee co-rapporteur Dragos Tudorache (Renew, Romania) said: “The EU has delivered. We have linked the concept of artificial intelligence to the fundamental values that form the basis of our societies. However, much work lies ahead that goes beyond the AI Act itself. AI will push us to rethink the social contract at the heart of our democracies, our education models, labour markets, and the way we conduct warfare. The AI Act is a starting point for a new model of governance built around technology. We must now focus on putting this law into practice”.

Next steps

The regulation is still subject to a final lawyer-linguist check and is expected to be finally adopted before the end of the legislature (through the so-called corrigendum procedure). The law also needs to be formally endorsed by the Council.

It will enter into force twenty days after its publication in the Official Journal, and be fully applicable 24 months after its entry into force, except for: bans on prohibited practices, which will apply six months after the entry into force date; codes of practice (nine months after entry into force); general-purpose AI rules including governance (12 months after entry into force); and obligations for high-risk systems (36 months).

Background

The Artificial Intelligence Act responds directly to citizens’ proposals from the Conference on the Future of Europe (COFE), most concretely to proposal 12(10) on enhancing EU’s competitiveness in strategic sectors, proposal 33(5) on a safe and trustworthy society, including countering disinformation and ensuring humans are ultimately in control, proposal 35 on promoting digital innovation, (3) while ensuring human oversight and (8) trustworthy and responsible use of AI, setting safeguards and ensuring transparency, and proposal 37 (3) on using AI and digital tools to improve citizens’ access to information, including persons with disabilities.
 

bnew






1/6
South Korea's local governments are deploying around 7,000 AI-robot dolls to seniors and dementia patients.

The $1,800 robot doll by Hyodal can hold full conversations to tackle loneliness and remind users to take medication.

Dystopian, yes, but the data is fascinating:

1. Studies (with over 9,000 users) found that depression levels reduced from 5.73 to 3.14, and medicine intake improved from 2.69 to 2.87.

2. The doll comes with a companion app and web monitoring platform for caretakers to monitor remotely.

3. Safety features are installed to alert when no movement has been detected for a certain period, essentially always watching the user.

4. The doll also offers touch interaction, 24-hour voice reminders, check-ins, voice messages, a health coach, quizzes, exercise, music, and more.

5. Caregivers have access to the app, allowing them to send/receive voice messages, make group announcements, and monitor motion detection.

I'd definitely have some privacy and data collection concerns here before handing this off to my family, but the product actually seems really cool.

Will be interesting to watch the data to see if this idea has legs.

Keep in mind, SK has a rapidly aging population and one of the world's lowest birth rates, so it makes sense for the local governments to be early adopters here.


3/6
100%

4/6
TV doesn’t monitor you and collect your data, but very valid point

5/6
Yeah, also got to remember that this is the worst AI this will ever be. It’ll only get better.

Watching this AI loneliness space closely to see what the long term data looks like (if users stay interested, etc.)

6/6
Hyodol. Here's the website (it's in Korean)
 

bnew


Google’s new AI will play video games with you — but not to win​


Google DeepMind trained its video game playing AI agent on games like Valheim, No Man’s Sky, and Goat Simulator.​


By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

Mar 13, 2024, 10:00 AM EDT


Illustration: The Verge

Google DeepMind unveiled SIMA, an AI agent being trained to learn gaming skills so it plays more like a human, rather than an overpowered AI that does its own thing. SIMA, which stands for Scalable Instructable Multiworld Agent, is currently a research project only.

SIMA will eventually learn how to play any video game, even open-world games and games with no linear path to the end. It's not intended to replace existing game AI; think of it more as another player that meshes well with your party. It mixes natural-language instruction with an understanding of 3D worlds and image recognition.

“SIMA isn’t trained to win a game; it’s trained to run it and do what it’s told,” said Google DeepMind researcher and SIMA co-lead Tim Harley during a briefing with reporters.

Google worked with eight game developers, including Hello Games, Embracer, Tuxedo Labs, Coffee Stain, and others, to train and test SIMA. Researchers plugged SIMA into games like No Man’s Sky, Teardown, Valheim, and Goat Simulator 3 to teach the AI agent the basics of playing the games. In a blog post, Google said that SIMA doesn’t need a custom API or access to source code to play the games.

Harley said the team chose games that were more focused on open play than narrative to help SIMA learn general gaming skills. If you’ve played or watched a playthrough of Goat Simulator, you know that doing random, spontaneous things is the point of the game, and Harley said it was this kind of spontaneity they hoped SIMA would learn.

To do this, the team first built a new environment in the Unity engine where the agents needed to create sculptures to test their understanding of object manipulation. Then, Google recorded pairs of human players — one controlling the game and the other giving instructions on what to do next — to capture language instructions. Afterward, players played independently to show what led to their actions in the game. All of this was fed to the SIMA agents to learn to predict what would happen next on the screen.
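
As a rough mental model of that pipeline, a minimal behavioral-cloning sketch might look like the code below. The encoders, dimensions, batch fields, and the use of the roughly 600 basic skills as an action space are illustrative assumptions, not DeepMind's actual architecture or code.

# Illustrative sketch only: imitation learning of (screen frame, instruction) -> action,
# in the spirit of the SIMA setup described above. Encoders and sizes are stand-ins.
import torch
import torch.nn as nn

NUM_ACTIONS = 600  # the article mentions roughly 600 basic skills

class SimpleAgent(nn.Module):
    def __init__(self, vision_dim=512, text_dim=256):
        super().__init__()
        # stand-in encoders; a real agent would use pretrained vision/language models
        self.vision = nn.Sequential(nn.Flatten(), nn.LazyLinear(vision_dim), nn.ReLU())
        self.text = nn.EmbeddingBag(num_embeddings=10_000, embedding_dim=text_dim)
        self.policy = nn.Linear(vision_dim + text_dim, NUM_ACTIONS)

    def forward(self, frames, instruction_token_ids):
        v = self.vision(frames)                         # encode the screen capture
        t = self.text(instruction_token_ids)            # encode the language instruction
        return self.policy(torch.cat([v, t], dim=-1))   # logits over key/mouse actions

# One behavioral-cloning step on recorded human play (frame, instruction, action taken)
agent = SimpleAgent()
frames = torch.rand(8, 3, 64, 64)                   # batch of screen captures
instructions = torch.randint(0, 10_000, (8, 6))     # tokenized commands like "turn left"
actions = torch.randint(0, NUM_ACTIONS, (8,))       # actions the human demonstrator took
loss = nn.functional.cross_entropy(agent(frames, instructions), actions)
loss.backward()
print(loss.item())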

SIMA currently has about 600 basic skills, such as turning left, climbing a ladder, and opening the menu to use a map. Eventually, Harley said, SIMA could be instructed to do more complex functions within a game. Tasks like “find resources and build a camp” are still difficult because AI agents can’t perform actions for humans.

SIMA isn’t meant to be an AI-powered NPC like the ones from Nvidia and Convai, but another player in a game that impacts the result. SIMA project co-lead Frederic Besse said it’s too early to tell what kind of uses AI agents like it could bring to gaming outside of the research sphere.

Like AI NPCs, however, SIMA may eventually learn to talk, but it’s far from that. SIMA is still learning how to play games and adapt to ones it hasn’t played before. Google said that with more advanced AI models, SIMA may eventually be able to do more complex tasks and be the perfect AI party member to lead you to victory.
 

bnew


OpenAI’s Sora text-to-video generator will be publicly available later this year​


OpenAI CTO Mira Murati tells The Wall Street Journal that Sora will eventually incorporate sound as well.​


By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

Mar 13, 2024, 9:37 AM EDT


This is an example of the kind of content Sora can produce.
Image: OpenAI

You’ll soon get to try out OpenAI’s buzzy text-to-video generator for yourself. In an interview with The Wall Street Journal, OpenAI chief technology officer Mira Murati says Sora will be available “this year” and that it “could be a few months.”


OpenAI first showed off Sora, which is capable of generating hyperrealistic scenes based on a text prompt, in February. The company only made the tool available for visual artists, designers, and filmmakers to start, but that didn’t stop some Sora-generated videos from making their way onto platforms like X.

In addition to making the tool available to the public, Murati says OpenAI has plans to “eventually” incorporate audio, which has the potential to make the scenes even more realistic. The company also wants to allow users to edit the content in the videos Sora produces, as AI tools don’t always create accurate images. “We’re trying to figure out how to use this technology as a tool that people can edit and create with,” Murati tells the Journal.

When pressed on what data OpenAI used to train Sora, Murati didn’t get too specific and seemed to dodge the question. “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” she says. Murati also says she isn’t sure whether it used videos from YouTube, Facebook, and Instagram. She only confirmed to the Journal that Sora uses content from Shutterstock, with which OpenAI has a partnership.

Murati also told the Journal that Sora is “much more expensive” to power. OpenAI is trying to make the tool “available at similar costs” to DALL-E, the company’s AI text-to-image model, when it’s released to the public. You can see even more examples of what kinds of videos this tool can produce in the Journal’s report, including an animated bull in a China shop and a mermaid smartphone reviewer.

As we approach the 2024 presidential election, concerns about generative AI tools and their potential to create misinformation have only increased. When released, Murati says Sora likely won’t be able to produce images of public figures, similar to DALL-E’s policies. Videos will also have a watermark to distinguish them from the real thing, but as my colleague Emilia David points out, watermarks aren’t a perfect solution.
 

bnew


The EU AI Act passed — here’s what comes next​

The EU’s sweeping AI regulations have (almost) passed their final hurdle.


By Emilia David and Jess Weatherbed

Updated Mar 13, 2024, 8:30 AM EDT


Now EU MEPs just need to figure out how to implement and enforce it.
Cath Virginia / The Verge

European Union lawmakers have officially approved the bloc’s landmark AI regulation, paving the way for the EU to prohibit certain uses of the technology and demand transparency from providers. In a majority vote on Wednesday, 523 European Parliament members elected to formally adopt the Artificial Intelligence Act (AI Act), and will now work towards its enforcement and implementation.

The AI Act has been hotly debated since it was first proposed in 2021, with some of its strictest regulations — such as a proposed total ban on biometric systems for mass public surveillance — being softened by last-minute compromises. While Wednesday’s announcement means the law has almost passed its final hurdle, it will still take years for some rules to be enforced.

The legal language of the text is still awaiting final approval, either via a separate announcement or a plenary session vote on April 10th/11th, with the AI Act then officially coming into force 20 days after it’s published in the Official Journal — which is anticipated to happen in May or June this year. Provisions will then take effect in stages: countries will have six months to ban prohibited AI systems, 12 months to enforce rules against “general-purpose AI systems” like chatbots, and up to 36 months for AI systems the law has designated as “high risk.”

Prohibited systems include things like social scoring, emotion recognition at work or schools, or systems that are designed to influence behavior or exploit user vulnerabilities. Examples of “high-risk” AI systems include those applied to critical infrastructure, education, and vocational training, certain law enforcement systems, and those that can be used to influence democratic processes like elections.

“In the very short run, the compromise on the EU AI Act won’t have much direct effect on established AI designers based in the US, because, by its terms, it probably won’t take effect until 2025,” said Paul Barrett, deputy director of the NYU Stern Center for Business and Human Rights, back in December 2023 when the EU provisionally agreed on the landmark AI regulation. So for now, Barrett says major AI players like OpenAI, Microsoft, Google, and Meta will likely continue to fight for dominance, particularly as they navigate regulatory uncertainty in the US.

The AI Act got its start before the explosion in general-purpose AI (GPAI) tools like OpenAI’s GPT-4 large language model, and regulating them became a remarkably complicated sticking point in last-minute discussions. The act divides its rules on the level of risk an AI system has on society, or as the EU said in a statement, “the higher the risk, the stricter the rules.”

But some member states grew concerned that this strictness could make the EU an unattractive market for AI. France, Germany, and Italy all lobbied to water down restrictions on GPAI during negotiations. They won compromises, including limiting what can be considered “high-risk” systems, which would then be subject to some of the strictest rules. Instead of classifying all GPAI as high-risk, there will be a two-tier system and law enforcement exceptions for outright prohibited uses of AI like remote biometric identification.

That still hasn’t satisfied all critics. French President Emmanuel Macron attacked the rules, saying the AI Act creates a tough regulatory environment that hampers innovation. Barrett said some new European AI companies could find it challenging to raise capital with the current rules, which gives an advantage to American companies. Companies outside of Europe may even choose to avoid setting up shop in the region or block access to platforms so they don’t get fined for breaking the rules — a potential risk Europe has faced in the non-AI tech industry as well, following regulations like the Digital Markets Act and Digital Services Act.

But the rules also sidestep some of the most controversial issues around generative AI

AI models trained on publicly available — but sensitive and potentially copyrighted — data have become a big point of contention for organizations, for instance. The approved rules, however, do not create new laws around data collection. While the EU pioneered data protection laws through GDPR, its AI rules do not prohibit companies from gathering information, beyond requiring that it follow GDPR guidelines.

“Under the rules, companies may have to provide a transparency summary or data nutrition labels,” Susan Ariel Aaronson, director of the Digital Trade and Data Governance Hub and a research professor of international affairs at George Washington University, said when the EU provisionally approved the rules. “But it’s not really going to change the behavior of companies around data.”

Aaronson points out that the AI Act still hasn’t clarified how companies should treat copyrighted material that’s part of model training data, beyond stating that developers should follow existing copyright laws (which leave lots of gray areas around AI). So it offers no incentive for AI model developers to avoid using copyrighted data.

The AI Act also won’t apply its potentially stiff fines to open-source developers, researchers, and smaller companies working further down the value chain — a decision that’s been lauded by open-source developers in the field. GitHub chief legal officer Shelley McKinley said it is “a positive development for open innovation and developers working to help solve some of society’s most pressing problems.” (GitHub, a popular open-source development hub, is a subsidiary of Microsoft.)

Observers think the most concrete impact could be pressuring other political figures, particularly American policymakers, to move faster. It’s not the first major regulatory framework for AI — in July, China passed guidelines for businesses that want to sell AI services to the public. But the EU’s relatively transparent and heavily debated development process has given the AI industry a sense of what to expect. Aaronson said the provisional text (which has since been approved) at least shows that the EU has listened and responded to public concerns around the technology.

Lothar Determann, data privacy and information technology partner at law firm Baker McKenzie, says the fact that it builds on existing data rules could also encourage governments to take stock of what regulations they have in place. And Blake Brannon, chief strategy officer at data privacy platform OneTrust, said more mature AI companies set up privacy protection guidelines in compliance with laws like GDPR and in anticipation of stricter policies. He said that depending on the company, the AI Act is “an additional sprinkle” to strategies already in place.

The US, by contrast, has largely failed to get AI regulation off the ground — despite being home to major players like Meta, Amazon, Adobe, Google, Nvidia, and OpenAI. Its biggest move so far has been a Biden administration executive order directing government agencies to develop safety standards and build on voluntary, non-binding agreements signed by large AI players. The few bills introduced in the Senate have mostly revolved around deepfakes and watermarking, and the closed-door AI forums held by Sen. Chuck Schumer (D-NY) have offered little clarity on the government’s direction in governing the technology.

Now, policymakers may look at the EU’s approach and take lessons from it

This doesn’t mean the US will take the same risk-based approach, but it may look to expand data transparency rules or allow GPAI models a little more leniency.

Navrina Singh, founder of Credo AI and a national AI advisory committee member, believes that while the AI Act is a huge moment for AI governance, things will not change rapidly, and there’s still a ton of work ahead.

“The focus for regulators on both sides of the Atlantic should be on assisting organizations of all sizes in the safe design, development, and deployment of AI that are both transparent and accountable,” Singh told The Verge in December. She adds there’s still a lack of standards and benchmarking processes, particularly around transparency.

The act does not retroactively regulate existing models or apps, but future versions of OpenAI’s GPT, Meta’s Llama, or Google’s Gemini will need to take into account the transparency requirements set by the EU. It may not produce dramatic changes overnight — but it demonstrates where the EU stands on AI.

Update March 12th, 8:30AM ET: Updated the original article following the EU AI Act being officially adopted.
 

bnew






1/5
Introducing SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games.

It can complete tasks similar to a human, and outperforms an agent trained in just one setting. Introducing SIMA, a Scalable Instructable Multiworld Agent


3/5
We partnered with gaming studios to train SIMA (Scalable Instructable Multiworld Agent) on @NoMansSky, @Teardowngame, @ValheimGame and others.

These offer a wide range of distinct skills for it to learn, from flying a spaceship to crafting a helmet.

4/5
SIMA needs only the images provided by the 3D environment and natural-language instructions given by the user.

With mouse and keyboard outputs, it is evaluated across 600 skills, spanning areas like navigation and object interaction - such as "turn left" or "chop down tree."…

5/5
We found SIMA agents trained on all of our domains significantly outperformed those trained on just one world.

When it faced an unseen environment, it performed nearly as well as the specialized agent - highlighting its ability to generalize to new spaces. ↓
 

bnew



1/2
LMs Can Teach Themselves to Think Before Speaking

This paper presents a generalization of STaR, called Quiet-STaR, to enable language models (LMs) to learn to reason in more general and scalable ways.

Quiet-STaR enables LMs to generate rationales at each token to explain future text. It proposes a token-wise parallel sampling algorithm that helps improve LM predictions by efficiently generating internal thoughts. The rationale generation is improved using REINFORCE.

It's also interesting to see the use of meta-tokens to indicate when the model is generating rationale and when it's predicting based on the rationale.

Chain-of-thought, considered to be a "thinking out loud" approach, could potentially be improved further by allowing Quiet-STaR to "think quietly" and possibly generate more structured and coherent chains of thought.

Interesting findings from the paper: "Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM’s ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%→10.9%) and CommonsenseQA (36.3%→47.2%) and observe a perplexity improvement of difficult tokens in natural text."
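
As a toy illustration of the core signal (does a sampled "thought" make the true next token more likely?), a rough sketch using GPT-2 via Hugging Face transformers might look like the code below. This is an approximation for intuition only; it omits the paper's token-wise parallel sampling, learnable start/end-of-thought tokens, mixing head, and REINFORCE update.

# Toy illustration of the Quiet-STaR reward signal: sample a short "thought" after a
# prefix and check whether it raises the log-probability of the true next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")      # GPT-2 as a small stand-in base LM
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def next_token_logprob(input_ids, target_id):
    # log-probability the model assigns to target_id as the next token
    with torch.no_grad():
        logits = lm(input_ids).logits[:, -1, :]
    return torch.log_softmax(logits, dim=-1)[0, target_id].item()

def thought_reward(prefix_ids, next_id, thought_len=8):
    # sample a short continuation ("thought") after the prefix
    with_thought = lm.generate(prefix_ids, max_new_tokens=thought_len,
                               do_sample=True, pad_token_id=tok.eos_token_id)
    # positive reward => the thought made the true next token easier to predict
    return next_token_logprob(with_thought, next_id) - next_token_logprob(prefix_ids, next_id)

ids = tok("Natalia sold 48 clips in April and half as many in May, so in total she sold",
          return_tensors="pt").input_ids
print(thought_reward(ids[:, :-1], ids[0, -1].item()))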

2/2
Paper:





[Submitted on 14 Mar 2024]

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking​

Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%→10.9%) and CommonsenseQA (36.3%→47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2403.09629 [cs.CL]
(or arXiv:2403.09629v1 [cs.CL] for this version)
[2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Submission history​

From: Eric Zelikman
[v1] Thu, 14 Mar 2024 17:58:16 UTC (510 KB)




 

bnew







1/7
Introducing Maisa KPU: The next leap in AI reasoning capabilities.

The Knowledge Processing Unit is a Reasoning System for LLMs that leverages all their reasoning power and overcomes their intrinsic limitations.

2/7
For more details on the KPU, visit the technical report: KPU - Maisa

3/7
With a novel architecture, the system positions the LLM as the central reasoning engine, pushing the boundaries of AI capabilities. This design enables the KPU to adeptly tackle complex, end-to-end tasks, while eliminating hallucinations and context constraints.

4/7
We are pleased to show that the KPU has improved the performance for GSM8k, MATH, BBH and DROP benchmarks when evaluated against the most capable language models.

5/7
Join the waitlist for early access here: https://acvdq80a98m.typeform.com/to/t4orMXJK?typeform-source=maisa.ai

6/7
Some cool examples of the KPU capabilities:

7/7
DEMO TIME! Customer service:
Help a customer with a question about an order that did not arrive. This time the customer accidentally did not write the order ID correctly 😯
x.com/maisaAI_/status/1768757167807459697
 

bnew






1/5
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences

@AtonKamanda @sahraouh

📝Paper: [2403.09032] CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
💻Code: GitHub - martin-wey/CodeUltraFeedback: CodeUltraFeedback for aligning large language models to coding preferences
2/5
Datasets:
🤗CodeUltraFeedback: coseal/CodeUltraFeedback · Datasets at Hugging Face
🤗CodeUltraFeedback Binarized: coseal/CodeUltraFeedback_binarized · Datasets at Hugging Face
🤗CODAL-Bench:
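
A minimal sketch for taking a first look at the dataset listed above; the available splits and column names should be checked against the Hugging Face dataset card.

# Quick look at the CodeUltraFeedback dataset referenced above.
from datasets import load_dataset

ds = load_dataset("coseal/CodeUltraFeedback")
print(ds)                       # available splits, columns, and row counts
first_split = next(iter(ds.values()))
print(first_split[0])           # one instruction / responses / judge-feedback record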

3/5
People who may be interested in this work @LoubnaBenAllal1 @lvwerra @_lewtun @_akhaliq @BigCodeProject

4/5
I think this is very interesting and could help getting a better initial policy and mitigate distribution shift issues between the preference dataset and the initial LLM policy. But the latter issue remains in the current off-policy scenario.
Thanks for sharing!

5/5
Hey we missed that one, thanks for the pointer!
Also well done on your last survey paper :smile:
 

bnew









1/7
Excited to introduce our new work LiveCodeBench!

Live evaluations to ensure fairness and reliability
Holistic evaluations using 4 code-related scenarios
Insights from comparing 20+ code models

We use problem release dates to detect and prevent contamination
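
As a concrete illustration of the release-date idea, a minimal date-based decontamination sketch might look like the code below; the field names, problem IDs, and dates are made-up placeholders rather than LiveCodeBench's actual schema.

# Illustrative date-based contamination filter: evaluate a model only on problems
# released after its training-data cutoff.
from datetime import date

def uncontaminated(problems, model_cutoff):
    return [p for p in problems if p["release_date"] > model_cutoff]

problems = [
    {"id": "contest-a", "release_date": date(2023, 9, 10)},
    {"id": "contest-b", "release_date": date(2024, 2, 4)},
]
print(uncontaminated(problems, model_cutoff=date(2023, 10, 1)))  # keeps only contest-b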


3/7
Joint work from a super fun collaboration with @kingh0730 @minimario1729 @xu3kev @fanjia_yan @tianjun_zhang @sidawxyz Armando Solar-Lezama @koushik77 and Ion Stoica across UC Berkeley, MIT, and Cornell!

Paper URL - [2403.07974] LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Keep reading for the key takeaways!!

4/7
Overfitting to HumanEval
Models cluster into two groups when comparing performance on LiveCodeBench and HumanEval

Closed (API) models (GPT4 Claude3 Mistral Gemini) - perform similarly on both benchmarks

Fine-tuned open models - perform better on HumanEval

5/7
Holistic Model Comparisons
Relative performances change over scenarios!

GPT4T is better at generating code; Claude3-O is better at predicting test outputs

Closed models are better at NL reasoning.
Performance gap increases for execution and test prediction scenarios

6/7
OSS Coding Models for LCB

DeepSeek (33B), StarCoder2 (15B), and CodeLLaMa (34B) emerge as the top base models

Finetuning:
Boosts both LCB & HumanEval performance
May overfit to HumanEval-style problems
Need to diversify open fine-tuning data for robust gains

7/7
Check out our paper, datasets, and leaderboard for more details!

📜Paper - [2403.07974] LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
🤗Huggingface - livecodebench (Live Code Bench)
🥇Leaderboard - LiveCodeBench Leaderboard
 

bnew


OpenAI is expected to release a 'materially better' GPT-5 for its chatbot mid-year, sources say​

Kali Hays and Darius Rafieyan

Mar 19, 2024, 6:35 PM EDT

Sam Altman presenting onstage.
Justin Sullivan/Getty Images


  • OpenAI's main product is the popular generative AI tool ChatGPT.
  • Since its last major model upgrade to GPT-4, the tool has run into performance issues.
  • The next version of the model, GPT-5, is set to come out soon. It's said to be "materially better."

OpenAI is poised to release the next version of its ChatGPT model in the coming months. ChatGPT is the generative AI tool that kicked off the current wave of AI projects and investments.

The generative AI company helmed by Sam Altman is on track to put out GPT-5 sometime mid-year, likely during summer, according to two people familiar with the company. Some enterprise customers have recently received demos of the latest model and its related enhancements to the ChatGPT tool, another person familiar with the process said. These people, whose identities Business Insider has confirmed, asked to remain anonymous so they could speak freely.

"It's really good, like materially better," said one CEO who recently saw a version of GPT-5. OpenAI demonstrated the new model with use cases and data unique to his company, the CEO said. He said the company also alluded to other as-yet-unreleased capabilities of the model, including the ability to call AI agents being developed by OpenAI to perform tasks autonomously.

The company does not yet have a set release date for the new model, meaning current internal expectations for its release could change.

OpenAI is still training GPT-5, one of the people familiar said. After training is complete, it will be safety tested internally and further "red teamed," a process where employees and typically a selection of outsiders challenge the tool in various ways to find issues before it's made available to the public. There is no specific timeframe when safety testing needs to be completed, one of the people familiar noted, so that process could delay any release date.

Spokespeople for the company did not respond to an email requesting comment.

Sales to enterprise customers, which pay OpenAI for an enhanced version of ChatGPT for their work, are the company's main revenue stream as it builds out its business and Altman builds his growing AI empire.

OpenAI released its last major update to ChatGPT a year ago. GPT-4 was billed as being much faster and more accurate in its responses than its previous model, GPT-3. Later in 2023, OpenAI released GPT-4 Turbo, part of an effort to cure an issue sometimes referred to as "laziness," where the model would sometimes refuse to answer prompts.

Large language models like those of OpenAI are trained on massive sets of data scraped from across the web to respond to user prompts in an authoritative tone that evokes human speech patterns. That tone, along with the quality of the information it provides, can degrade depending on what training data is used for updates or other changes OpenAI may make in its development and maintenance work.

Several forums on Reddit have been dedicated to complaints of GPT-4 degradation and worse outputs from ChatGPT. People inside OpenAI hope GPT-5 will be more reliable and will impress the public and enterprise customers alike, one of the people familiar said.

Much of the most crucial training data for AI models is technically owned by copyright holders. OpenAI, along with many other tech companies, has argued against updated federal rules for how LLMs access and use such material.
 