Sam Altman claims “deep learning worked”, superintelligence may be “a few thousand days” away, and “astounding triumphs” will incrementally become ...

bnew



1/11
@tsarnick
OpenAI's Noam Brown says the new o1 model beats GPT-4o at math and code, and outperforms expert humans at PhD-level questions, and "these numbers, I can almost guarantee you, are going to go up over the next year or two"



https://video.twimg.com/ext_tw_video/1848118049062453249/pu/vid/avc1/720x720/Lf3_qbJ45pHTJaMI.mp4

2/11
@tsarnick
Source:
https://invidious.poast.org/watch?v=Gr_eYXdHFis



3/11
@Yossi_Dahan_
🤯



4/11
@chandan_ganwani
2+2=4 It is always 4. It is a guarantee or not a guarantee. There is no almost guarantee. Just show us the results when it starts affecting daily lives or matters in real life. The rest is just a way to sell promises and hype. Get real and stop selling the dream of flying cars!



5/11
@RachelVT42
Interesting.

What about other abilities though?

Last time I tried it, it often wasn’t as good as 4o on writing tasks, especially when used in another language.



6/11
@danielbigham
So exciting. Buckle up!



7/11
@BenjaminDEKR
I mean, if two years from now LLM performance hasn't gone up, something is wrong



8/11
@matterasmachine
A PhD is not about questions, it’s about creation.



9/11
@Shawnryan96
I just want OAI to teach them how to use tools really well and make them much more useful. I know getting smarter will help but I want agents lol



10/11
@Zero04203017
The score jump in competition math from o1-preview to o1 is amazing.



11/11
@BeyondtheCodeAI
This could have major implications for fields like education and research.




 

bnew


OpenAI is funding research into ‘AI morality’​

Kyle Wiggers

2:25 PM PST · November 22, 2024

OpenAI is funding academic research into algorithms that can predict humans’ moral judgements.

In a filing with the IRS, OpenAI Inc., OpenAI’s nonprofit org, disclosed that it awarded a grant to Duke University researchers for a project titled “Research AI Morality.” Contacted for comment, an OpenAI spokesperson pointed to a press release indicating the award is part of a larger, three-year, $1 million grant to Duke professors studying “making moral AI.”

Little is public about this “morality” research OpenAI is funding, other than the fact that the grant ends in 2025. The study’s principal investigator, Walter Sinnott-Armstrong, a practical ethics professor at Duke, told TechCrunch via email that he “will not be able to talk” about the work.

Sinnott-Armstrong and the project’s co-investigator, Jana Borg, have produced several studies — and a book — about AI’s potential to serve as a “moral GPS” to help humans make better judgements. As part of larger teams, they’ve created a “morally-aligned” algorithm to help decide who receives kidney donations, and studied in which scenarios people would prefer that AI make moral decisions.

According to the press release, the goal of the OpenAI-funded work is to train algorithms to “predict human moral judgements” in scenarios involving conflicts “among morally relevant features in medicine, law, and business.”

But it’s far from clear that a concept as nuanced as morality is within reach of today’s tech.

In 2021, the nonprofit Allen Institute for AI built a tool called Ask Delphi that was meant to give ethically sound recommendations. It judged basic moral dilemmas well enough — the bot “knew” that cheating on an exam was wrong, for example. But slightly rephrasing and rewording questions was enough to get Delphi to approve of pretty much anything, including smothering infants.

The reason has to do with how modern AI systems work.

Machine learning models are statistical machines. Trained on a lot of examples from all over the web, they learn the patterns in those examples to make predictions, like that the phrase “to whom” often precedes “it may concern.”
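
As a rough illustration of that “statistical machine” idea (a toy sketch only, not how any production model is actually built; the miniature corpus below is invented), the same kind of next-word prediction can be done by simply counting which word follows which:

```python
from collections import Counter, defaultdict

# Toy illustration: count which word follows which in a tiny corpus,
# then "predict" the most frequent follower. Real LLMs learn far richer
# statistics with neural networks, but the basic idea is the same:
# patterns in the training text drive the prediction.
corpus = "to whom it may concern , to whom it may concern , to whom this belongs"

followers = defaultdict(Counter)
tokens = corpus.split()
for current_word, next_word in zip(tokens, tokens[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    """Return the most common word seen after `word` in the corpus."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("whom"))  # -> "it", because "whom it" dominates the toy corpus
```

Scaled up to neural networks trained on web-scale text, the same principle holds: whatever patterns dominate the training data dominate the predictions.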

AI doesn’t have an appreciation for ethical concepts, nor a grasp on the reasoning and emotion that play into moral decision-making. That’s why AI tends to parrot the values of Western, educated, and industrialized nations — the web, and thus AI’s training data, is dominated by articles endorsing those viewpoints.

Unsurprisingly, many people’s values aren’t expressed in the answers AI gives, particularly if those people aren’t contributing to the AI’s training sets by posting online. And AI internalizes a range of biases beyond a Western bent. Delphi said that being straight is more “morally acceptable” than being gay.

The challenge before OpenAI — and the researchers it’s backing — is made all the more intractable by the inherent subjectivity of morality. Philosophers have been debating the merits of various ethical theories for thousands of years, and there’s no universally applicable framework in sight.

Claude favors Kantianism (i.e. focusing on absolute moral rules), while ChatGPT leans ever-so-slightly utilitarian (prioritizing the greatest good for the greatest number of people). Is one superior to the other? It depends on who you ask.

An algorithm to predict humans’ moral judgements will have to take all this into account. That’s a very high bar to clear — assuming such an algorithm is possible in the first place.
 

bnew



A test for AGI is closer to being solved — but it may be flawed​


Kyle Wiggers

5:36 PM PST · December 9, 2024

https://techcrunch.com/2024/12/09/a...o-being-solved-but-it-may-be-flawed/#comments

A well-known test for artificial general intelligence (AGI) is getting close to being solved, but the test’s creators say this points to flaws in the test’s design rather than a bona fide breakthrough in research.

In 2019, Francois Chollet, a leading figure in the AI world, introduced the ARC-AGI benchmark, short for “Abstraction and Reasoning Corpus for Artificial General Intelligence.” Designed to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on, ARC-AGI, Chollet claims, remains the only AI test to measure progress towards general intelligence (although others have been proposed).

Until this year, the best-performing AI could solve just under a third of the tasks in ARC-AGI. Chollet blamed the industry’s focus on large language models (LLMs), which he believes aren’t capable of actual “reasoning.”

“LLMs struggle with generalization, due to being entirely reliant on memorization,” he said in a series of posts on X in February. “They break down on anything that wasn’t in their training data.”

To Chollet’s point, LLMs are statistical machines. Trained on a lot of examples, they learn patterns in those examples to make predictions — like how “to whom” in an email typically precedes “it may concern.”

Chollet asserts that while LLMs might be capable of memorizing “reasoning patterns,” it’s unlikely they can generate “new reasoning” based on novel situations. “If you need to be trained on many examples of a pattern, even if it’s implicit, in order to learn a reusable representation for it, you’re memorizing,” Chollet argued in another post.

To incentivize research beyond LLMs, in June, Chollet and Zapier co-founder Mike Knoop launched a $1 million competition to build an open-source AI capable of beating ARC-AGI. Out of 17,789 submissions, the best scored 55.5% — about 20 percentage points higher than 2023’s top score, albeit short of the 85% “human-level” threshold required to win.

This doesn’t mean we’re 20% closer to AGI, though, Knoop says.

Today we’re announcing the winners of ARC Prize 2024. We’re also publishing an extensive technical report on what we learned from the competition (link in the next tweet).

The state-of-the-art went from 33% to 55.5%, the largest single-year increase we’ve seen since 2020. The…

— François Chollet (@fchollet) December 6, 2024

In a blog post, Knoop said that many of the submissions to ARC-AGI have been able to “brute force” their way to a solution, suggesting that a “large fraction” of ARC-AGI tasks “[don’t] carry much useful signal towards general intelligence.”

ARC-AGI consists of puzzle-like problems where an AI has to generate the correct “answer” grid from a collection of different-colored squares. The problems were designed to force an AI to adapt to new problems it hasn’t seen before. But it’s not clear they’re achieving this.

[Image: Tasks in the ARC-AGI benchmark. Models must solve ‘problems’ in the top row; the bottom row shows solutions. Image credits: ARC-AGI]
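
For context on what these tasks look like under the hood, the public ARC-AGI tasks are distributed as JSON: a few demonstration input/output grid pairs plus a test input, where each grid is a small 2D array of integers standing for colors. The sketch below is illustrative only (the three-transformation candidate set is an assumption for brevity, not any actual prize entry); it shows the format and the flavor of “brute force” search Knoop describes: try candidate programs, keep one that reproduces every demonstration pair, and apply it to the test input.

```python
def rotate(grid):
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_h(grid):
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in grid]

# A tiny, hand-picked candidate set. Real search-based entries enumerate
# thousands of composed programs, not three primitives.
CANDIDATES = {
    "identity": lambda g: [row[:] for row in g],
    "rotate_90": rotate,
    "flip_horizontal": flip_h,
}

def solve(task):
    """Return (name, predicted output grid) for the task's first test input,
    or None if no candidate explains all demonstration pairs."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in task["train"]):
            return name, fn(task["test"][0]["input"])
    return None

# Example task in the public ARC-AGI JSON layout: integers 0-9 are colors.
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[2, 0], [0, 0]]},
    ],
    "test": [{"input": [[0, 3], [0, 0]]}],
}
print(solve(task))  # -> ('rotate_90', [[0, 0], [0, 3]])
```

A competitive entry searches a vastly larger space of composed transformations, which is exactly why Knoop worries that search alone, rather than genuine generalization, can account for many of the solved tasks.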
“[ARC-AGI] has been unchanged since 2019 and is not perfect,” Knoop acknowledged in his post.

Chollet and Knoop have also faced criticism for overselling ARC-AGI as a benchmark toward reaching AGI, especially since the very definition of AGI is being hotly contested now. One OpenAI staff member recently claimed that AGI has “already” been achieved if one defines AGI as AI “better than most humans at most tasks.”

Knoop and Chollet say they plan to release a second-gen ARC-AGI benchmark to address these issues, alongside a competition in 2025. “We will continue to direct the efforts of the research community towards what we see as the most important unsolved problems in AI, and accelerate the timeline to AGI,” Chollet wrote in an X post.

Fixes likely won’t be easy. If the first ARC-AGI test’s shortcomings are any indication, defining intelligence for AI will be as intractable — and polarizing — as it has been for human beings.

Topics: AI, AI reasoning tests, ARC-AGI benchmark, artificial general intelligence, Francois Chollet, Generative AI


 