Y'all heard about ChatGPT yet? AI instantly generates question answers, entire essays etc.

bnew


ChatGPT’s AI search engine is rolling out to everyone​


OpenAI has also made some improvements to ChatGPT search on mobile.​


By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

Dec 16, 2024, 2:52 PM EST


Vector illustration of the ChatGPT logo. Image: The Verge

ChatGPT’s AI search engine is rolling out to all users starting today. OpenAI announced the news as part of its newest 12 days of ship-mas livestream, while also revealing an “optimized” version of the feature on mobile and the ability to search with advanced voice mode.
ChatGPT’s search engine first rolled out to paid subscribers in October. It is now available on the free tier, though you need an account and have to be logged in.

One of the improvements for search on mobile makes ChatGPT look more like a traditional search engine. When looking for a particular location, like restaurants or local attractions, ChatGPT will display a list of results with accompanying images, ratings, and hours. Clicking on a location will pull up more information about the spot, and you can also view a map with directions from directly within the app.



Another feature aims to make ChatGPT search faster when you’re looking for certain kinds of sites, such as “hotel booking websites.” Instead of generating a response right away, ChatGPT will surface links to websites before taking the time to provide more information about each option. ChatGPT can also automatically provide up-to-date information from the web when using Advanced Voice Mode, though that’s only available to paid users.

In earlier livestreams, OpenAI also announced the launch of its text-to-video model Sora and rolled out a $200 per month ChatGPT Pro subscription.
 

bnew


OpenAI announces o3 and o3-mini, its next simulated reasoning models​


o3 matches human levels on ARC-AGI benchmark, and o3-mini exceeds o1 at some tasks.


Benj Edwards – Dec 20, 2024 2:31 PM

Abstract illustration of many human head silhouettes with a small OpenAI logo in the very middle. Credit: Benj Edwards / Andriy Onufriyenko via Getty Images


On Friday, during Day 12 of its "12 days of OpenAI" event, CEO Sam Altman announced the company's latest AI "reasoning" models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make these models available for public safety testing and research access today.

The models use what OpenAI calls "private chain of thought," where the model pauses to examine its internal dialog and plan ahead before responding, which you might call "simulated reasoning" (SR)—a form of AI that goes beyond basic large language models (LLMs).

The company named the model family "o3" instead of "o2" to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday's livestream, Altman acknowledged his company's naming foibles, saying, "In the grand tradition of OpenAI being really, truly bad at names, it'll be called o3."

According to OpenAI, the o3 model earned a record-breaking score on the ARC-AGI benchmark, a visual reasoning benchmark that has gone unbeaten since its creation in 2019. In low-compute scenarios, o3 scored 75.7 percent, while in high-compute testing, it reached 87.5 percent—comparable to human performance at an 85 percent threshold.

OpenAI also reported that o3 scored 96.7 percent on the 2024 American Invitational Mathematics Exam, missing just one question. The model also reached 87.7 percent on GPQA Diamond, which contains graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent.



During the livestream, the president of the ARC Prize Foundation said, "When I see these results, I need to switch my worldview about what AI can do and what it is capable of."

The o3-mini variant, also announced Friday, includes an adaptive thinking time feature, offering low, medium, and high processing speeds. The company states that higher compute settings produce better results. OpenAI reports that o3-mini outperforms its predecessor, o1, on the Codeforces benchmark.
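As a rough illustration of how that setting might eventually look to developers, here is a hypothetical sketch using OpenAI's Python SDK; the model was not yet released at the time of the announcement, so the model name and the reasoning-effort parameter below are assumptions rather than a confirmed interface.

```python
# Hypothetical sketch only: how a low/medium/high "thinking time" setting might
# be exposed through OpenAI's Python SDK. Model name and parameter are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",              # assumed identifier once the model ships
    reasoning_effort="high",      # assumed knob for the low/medium/high setting
    messages=[{"role": "user", "content": "Solve this Codeforces-style problem: ..."}],
)
print(response.choices[0].message.content)
```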

Simulated reasoning on the rise​

OpenAI's announcement comes as other companies develop their own SR models, including Google, which announced Gemini 2.0 Flash Thinking Experimental on Thursday. In November, DeepSeek launched DeepSeek-R1, while Alibaba's Qwen team released QwQ, which they called the first "open" alternative to o1.

These new AI models are based on traditional LLMs, but with a twist: they are fine-tuned to produce an iterative chain-of-thought process that can evaluate its own results, simulating reasoning in an almost brute-force way that can be scaled at inference (running) time, rather than relying solely on improvements during model training, which has seen diminishing returns recently.
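A minimal sketch of that inference-time loop appears below. `generate` is a hypothetical stand-in for any LLM completion call, and the structure is purely illustrative, not a description of OpenAI's actual private chain of thought.

```python
# Minimal sketch of inference-time "simulated reasoning": draft, critique, and
# revise before answering. Illustrative only; `generate` is a hypothetical
# placeholder for an LLM completion call, not a real API.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM completion call here")

def simulated_reasoning(question: str, steps: int = 4) -> str:
    """More `steps` means more inference-time compute, the knob these models
    reportedly scale instead of relying only on bigger training runs."""
    thought = generate(f"Think step by step about: {question}")
    for _ in range(steps):
        critique = generate(f"Find flaws in this reasoning:\n{thought}")
        thought = generate(
            f"Question: {question}\nDraft reasoning: {thought}\n"
            f"Critique: {critique}\nRevise the reasoning."
        )
    return generate(f"Question: {question}\nReasoning: {thought}\nFinal answer:")
```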

OpenAI will make the new SR models available first to safety researchers for testing. Altman said the company plans to launch o3-mini in late January, with o3 following shortly after.
 

bnew


Call ChatGPT from any phone with OpenAI’s new 1-800 voice service​


1-800-CHATGPT telephone number lets any US caller talk to OpenAI's assistant—no smartphone required.


Benj Edwards – Dec 18, 2024 1:42 PM

Photo of two tin toy robots with a vintage telephone handset receiver. Credit: Charles Taylor via Getty Images


On Wednesday, OpenAI launched a 1-800-CHATGPT (1-800-242-8478) telephone number that anyone in the US can call to talk to ChatGPT via voice chat for up to 15 minutes for free. The company also says that people outside the US can send text messages to the same number for free using WhatsApp.

Upon calling, users hear a voice say, "Hello again, it's ChatGPT, an AI assistant. Our conversation may be reviewed for safety. How can I help you?" Callers can ask ChatGPT anything they would normally ask the AI assistant and have a live, interactive conversation.

During a livestream demo of "Calling with ChatGPT" during Day 10 of "12 Days of OpenAI," OpenAI employees demonstrated several examples of the telephone-based voice chat in action, asking ChatGPT to identify a distinctive house in California and for help in translating a message into Spanish for a friend. For fun, they showed calls from an iPhone, a flip phone, and a vintage rotary phone.


OpenAI developers demonstrate calling 1-800-CHATGPT during a livestream on December 18, 2024. Credit: OpenAI

OpenAI says the new features came out of an internal "hack week" project that a team built just a few weeks ago. The company says its goal is to make ChatGPT more accessible to anyone who does not have a smartphone or a computer handy.

During the livestream, an OpenAI employee explained that voice calls are limited to 15 minutes, after which users are prompted to return to their regular ChatGPT interface (website, mobile, or desktop app). The voice calling feature is built on OpenAI's Realtime API, while the WhatsApp text interface uses GPT-4o mini.
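For context on the model named here, below is a minimal sketch of generating a text reply with GPT-4o mini through OpenAI's Chat Completions API. It only illustrates the model; it is not OpenAI's actual WhatsApp or telephony integration, and the system prompt is made up.

```python
# Illustrative sketch: a GPT-4o mini text reply via OpenAI's Python SDK.
# Not OpenAI's actual WhatsApp/phone plumbing; the system prompt is invented.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def reply(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are ChatGPT answering a texted question concisely."},
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content

print(reply("What's a quick pasta recipe for two?"))
```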
 

bnew



1/11
@mikeknoop
o3 is really special and everyone will need to update their intuition about what AI can/cannot do.

while these are still early days, this system shows a genuine increase in intelligence, canaried by ARC-AGI

semiprivate v1 scores:

* GPT-2 (2019): 0%
* GPT-3 (2020): 0%
* GPT-4 (2023): 2%
* GPT-4o (2024): 5%
* o1-preview (2024): 21%
* o1 high (2024): 32%
* o1 Pro (2024): ~50%
* o3 tuned low (2024): 76%
* o3 tuned high (2024): 87%

given i put in the original $1M @arcprize, i'd like to re-affirm my previous commitment. we will keep running the grand prize competition until an efficient 85% solution is open sourced.

but our ambitions are greater! ARC Prize found its mission this year -- to be an enduring north star towards AGI.

the ARC benchmark design principle is to be easy for humans, hard for AI, and so long as there remain things in that category, there is more work to do for AGI.

there are >100 tasks from the v1 family unsolved by o3 even on the high compute config which is very curious.

successors to o3 will need to reckon with efficiency. i expect this to become a major focus for the field. for context, o3 high used 172x more compute than o3 low which itself used 100-1000x more compute than the grand prize competition target.

we also started work on v2 in earnest this summer (v2 is in the same grid domain as v1) and will launch it alongside ARC Prize 2025. early testing is promising even against o3 high compute. but the goal for v2 is not to make an adversarial benchmark, rather be interesting and high signal towards AGI.

we also want AGI benchmarks that can endure many years. i do not expect v2 will. and so we've also started turning attention to v3, which will be very different. im excited to work with OpenAI and other labs on designing v3.

given it's almost the end of the year, im in the mood for reflection.

as anyone who has spent time with the ARC dataset can tell you, there is something special about it. and even more so about a system that can fully beat it. we are seeing glimpses of that system with the o-series.

i mean it when i say these are early days. i believe o3 is the alexnet moment for program synthesis. we now have concrete evidence that deep-learning guided program search works.

we are staring up another mountain that, from my vantage point, looks equally tall and important as deep learning for AGI.

many things have surprised me this year, including o3. but the biggest surprise has been the increasing response to ARC Prize.

i've been surveying AI researchers about ARC for years. before ARC Prize launched in June, only one in ten had heard of it.

now it's objectively the spear tip benchmark, being used by spear tip labs, to demonstrate progress on the spear tip of AGI -- the most important technology in human history.

@fchollet deserves recognition for designing such an incredible benchmark.

i'm continually grateful for the opportunity to steward attention towards AGI with ARC Prize and we'll be back in 2025!

[Quoted tweet]
New verified ARC-AGI-Pub SoTA!

@OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation.

And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval.

1/4




2/11
@mckaywrigley
Perhaps the 50% number has already been floated and I just missed it, but this was a nice confirmation that o1 pro is indeed quite a bit better than even o1 high.



3/11
@mikeknoop
I use an approximate score for o1 Pro because we didn't get API access in time and it was run on a small sample size; I'd give error bounds of ±10%. In all cases, yes, o1 Pro was better than o1 high.



4/11
@abuchanlife
sounds like o3 is pushing some boundaries! what’s the big deal about it?



5/11
@RyanEndacott
Congrats Mike! Super exciting to see how important the ARC-AGI benchmark has become!



6/11
@creativedrewy
Can anyone give an example of one of the ARC benchmark tasks that would be easy for a human but hard for the AI?



7/11
@StonkyOli
What does "tuned" mean?



8/11
@JoelKreager
Reasoning isn't what is going on. In the computational space, it is possible to know absolutely everything. The best method in this case, is to store a weighted image of every possible outcome.



9/11
@paras_savnani
interesting 😶



10/11
@alienmilian
Incredible numbers.



11/11
@sriramk
Great work.













1/11
@8teAPi
OpenAI’s o3 model for laypeople

What it is and why it’s important

What
> o3 is an AI language model that, under the right set of circumstances, can solve PhD-level problems

It's smart
> it's a big deal because it has effectively solved
a) ARC-AGI, a picture-puzzle IQ test similar to the Raven's matrices used by Mensa
b) 25% of FrontierMath, a set of difficult grad-student-level math questions

There is no wall
> it’s also a really big deal because OpenAI only introduced its last o1 model 3 months ago. This means they reduced the cycle time to 3 months from 18 months
> Intel used to have a tick (chip die shrink) tock (architecture change) cycle during the height of Moore’s law.
OpenAI now effectively has a tick (new Nvidia chip training data center) 4 tocks (new chains of thought) cycle.
> This means potentially 5 (!) step ups in capability next year.

The machine that builds the machines
> OpenAI is also using its current generation of models to build its next generation
> The OpenAI staff themselves are somewhat bewildered by how well things are working

Fast, cheap models every tock
> OpenAI also introduced an o3-mini model which is small and fast and capable.
> Notably it was as capable as the much slower o1 full model.
> This means that every 3 months you can look forward to a cheap fast model as good as the smartest state of the art super genius model 3 months before that.

Reliability
> one big barrier to AI deployment has been hallucination and reliability.
> The o1 model had early indications of much higher reliability (in one test refusing to be tricked into giving up passwords 100% of the time to users).
> We don’t have a sense of how well the o3 models perform yet… but if this has been solved you will start seeing these models in service work next year…

By end 2025 (speculation)
> superhuman mathematician and programmer available at moderate prices
> reliable assistant for hotel booking, calendar management, passwords, general computer use

What will a superhuman mathematician/programmer do?
> Everywhere you use an algo, it will get better
> jump from 5G to 10G in cell phones
> credit default costs across economy will drop, leading to credit becoming much much cheaper. 0% interest rates for some, no credit for others
> search costs across economy drop: hotels, airlines, dating…
> quantitative trading will better allocate capital, more good ideas financed, fewer bad ideas funded

And then you get to 2026…





2/11
@8teAPi
Please follow me!



3/11
@8teAPi
This post was a response to

[Quoted tweet]
I have seen some of your posts about o3. Would love for you to do a little summary for the layman who doesn’t understand the technical nuances without context.


4/11
@8teAPi


[Quoted tweet]
The points wrt how models will impact markets is a great callout. Services with “hidden” knowledge (e.g., broker intermediaries) will go through normalization because models will be an information buffer, accessible to anyone. The work needed to arbitrage will drop significantly.


5/11
@AAbuhashem
even if things continue on a similar trajectory, it won't come anywhere near your predictions for the superhuman mathematician
even if AGI happens next year, it won't lead to what you're saying. you're talking about an ASI that is not bound by energy or real-world constraints



6/11
@8teAPi
An ASI is not God. To an AI of the year 3000, an ASI of 2030 is an imbecile. There is no ceiling to intelligence (it's just compute, and there's always more compute in the universe). But it is capped by energy and physical constraints.



7/11
@paul_cal
Agree mostly but comparison to Mensa for ARC-AGI isn't quite right. ARC is designed so median humans score highly

o3's performance on ARC is still v significant bc ARC stood as a benchmark since 2019. o3 has beaten all narrow model attempts w a general system (tho more $$/task)



8/11
@8teAPi
Had to contextualize somehow without too much jargon



9/11
@Yuriixyz
ai getting smarter but still cant flip jpegs like a true degen in the trenches



10/11
@sziq1713474
@readwise save it



11/11
@redneckbwana
I kinda wonder? Do the weights ultimately converge on something? Like some set of fractal coefficients? A grand unified model/theory of reality?




 

bnew




ChatGPT now lets you schedule reminders and recurring tasks​


Maxwell Zeff

10:00 AM PST · January 14, 2025



Paying users of OpenAI’s ChatGPT can now ask the AI assistant to schedule reminders or recurring requests. The new beta feature, called tasks, will start rolling out to ChatGPT Plus, Team, and Pro users around the globe this week.

With tasks, users can set simple reminders with ChatGPT such as, “Remind me when my passport expires in six months,” and the AI assistant will follow up with a push notification on whatever platform you have tasks enabled. Users can also now set recurring requests to ChatGPT, such as, “Every Friday, give me a weekend plan based on my location and the weather forecast,” or “Give me a news briefing every day at 7 a.m.”

The new tasks manager in ChatGPT’s web app. Image credits: OpenAI

The new task feature appears to be OpenAI’s first step into AI models that can act somewhat independently, also known as AI agents. OpenAI CEO Sam Altman says that 2025 will be big for AI agents, even claiming they will “join the workforce” this year. Tasks is a fairly limited version of an agentic system, but it allows users to set reminders with ChatGPT, a practical feature most people have come to expect from assistants like Siri and Alexa. The scheduled information requests are more novel, showing capabilities that previous digital assistants lacked.

Users can access tasks by selecting “4o with scheduled tasks” from a dropdown menu in ChatGPT. From there, they can send ChatGPT a message telling the AI assistant what reminder or action they want to create. At times, OpenAI says ChatGPT may suggest certain tasks based on chats. Users can set and manage tasks by chatting with the AI assistant on any platform, or through a dedicated tasks manager tab that’s only available on the web app.

Through the tasks feature, ChatGPT can now browse the web on a set schedule, but it will not run continuous searches in the background or make purchases. For example, you could instruct ChatGPT to check once a month for concert tickets to see your favorite artist in your area, but you can neither tell the AI assistant to alert you the moment the tickets go live, nor can ChatGPT buy tickets for you. That said, it’s a step toward those systems.
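To make that distinction concrete, here is a toy sketch of a check that runs on a schedule rather than polling continuously; `search_web` and `notify` are hypothetical placeholders, and none of this reflects OpenAI's implementation.

```python
# Toy sketch: a roughly monthly check, not continuous background monitoring.
# search_web and notify are hypothetical placeholders for illustration.
import time
from datetime import datetime, timedelta

def search_web(query: str) -> str:
    return f"(results for: {query})"   # placeholder

def notify(message: str) -> None:
    print(message)                     # stands in for a push notification

def monthly_ticket_check(artist: str, city: str) -> None:
    next_run = datetime.now()
    while True:
        if datetime.now() >= next_run:
            notify(f"Ticket check: {search_web(f'{artist} tickets {city}')}")
            next_run += timedelta(days=30)   # once a month, not the moment tickets drop
        time.sleep(3600)                     # wake hourly between scheduled runs
```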

OpenAI says it’s using this beta period to learn more about how people use tasks before it makes the feature broadly available on its mobile app and free tier of ChatGPT. For this beta launch, the company says you can’t set tasks through Advanced Voice Mode.

While AI assistants based on large language models have pushed the limits of what computers can do, they also struggle with some simple tasks that smartphones are capable of. OpenAI, Google, and other AI model developers have had to come up with clever workarounds to get their assistants to set timers and create reminders. While these tasks are relatively low stakes, OpenAI wants ChatGPT to do much more complicated tasks moving forward.

OpenAI is gearing up to release more advanced agentic systems, including an agent reportedly called Operator that can write code and book travel. That system could be coming in the next few weeks, according to Bloomberg.

With more advanced agentic systems come more potential problems. Tasks shows a fairly controlled selection of agentic abilities, but OpenAI’s safeguards may be tested in the coming months as it rolls out more independent AI systems.
 

bnew




OpenAI’s AI reasoning model ‘thinks’ in Chinese sometimes and no one really knows why​


Kyle Wiggers

7:05 AM PST · January 14, 2025



Shortly after OpenAI released o1, its first “reasoning” AI model, people began noting a curious phenomenon. The model would sometimes begin “thinking” in Chinese, Persian, or some other language — even when asked a question in English.

Given a problem to sort out — e.g. “How many R’s are in the word ‘strawberry?’” — o1 would begin its “thought” process, arriving at an answer by performing a series of reasoning steps. If the question was written in English, o1’s final response would be in English. But the model would perform some steps in another language before drawing its conclusion.

“[o1] randomly started thinking in Chinese halfway through,” one user on Reddit said.

“Why did [o1] randomly start thinking in Chinese?” a different user asked in a post on X. “No part of the conversation (5+ messages) was in Chinese.”

Why did o1 pro randomly start thinking in Chinese? No part of the conversation (5+ messages) was in Chinese… very interesting… training data influence pic.twitter.com/yZWCzoaiit

— Rishab Jain (@RishabJainK) January 9, 2025

OpenAI hasn’t provided an explanation for o1’s strange behavior — or even acknowledged it. So what might be going on?

Well, AI experts aren’t sure. But they have a few theories.

Several on X, including Hugging Face CEO Clément Delangue, alluded to the fact that reasoning models like o1 are trained on datasets containing a lot of Chinese characters. Ted Xiao, a researcher at Google DeepMind, claimed that companies including OpenAI use third-party Chinese data labeling services, and that o1 switching to Chinese is an example of “Chinese linguistic influence on reasoning.”

“[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding,” Xiao wrote in a post on X. “[F]or expert labor availability and cost reasons, many of these data providers are based in China.”

Labels, also known as tags or annotations, help models understand and interpret data during the training process. For example, labels to train an image recognition model might take the form of markings around objects or captions referring to each person, place, or object depicted in an image.

Studies have shown that biased labels can produce biased models. For example, the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on the labels to see AAVE as disproportionately toxic.
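As a toy illustration of that failure mode, the sketch below trains a deliberately simple word-count "model" on made-up, skewed labels; nothing here reflects a real annotation pipeline or dataset.

```python
# Toy illustration: biased labels produce a biased model. Data is invented and
# the "model" is just per-word label counts, purely for demonstration.
from collections import Counter

labeled_data = [
    ("I'm finna head out", "toxic"),    # mislabeled by a biased annotator
    ("he stay winning", "toxic"),       # mislabeled by a biased annotator
    ("I am leaving now", "not_toxic"),
    ("he keeps winning", "not_toxic"),
]

word_label_counts: dict[str, Counter] = {}
for text, label in labeled_data:
    for word in text.lower().split():
        word_label_counts.setdefault(word, Counter())[label] += 1

def predict(text: str) -> str:
    votes = Counter()
    for word in text.lower().split():
        votes.update(word_label_counts.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else "not_toxic"

print(predict("she finna stay home"))   # -> "toxic", a bias inherited from the labels
```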

Other experts don’t buy the o1 Chinese data labeling hypothesis, however. They point out that o1 is just as likely to switch to Hindi, Thai, or a language other than Chinese while teasing out a solution.

Rather, these experts say, o1 and other reasoning models might simply be using languages they find most efficient to achieve an objective (or hallucinating).

“The model doesn’t know what language is, or that languages are different,” Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch. “It’s all just text to it.”

Indeed, models don’t directly process words. They use tokens instead. Tokens can be words, such as “fantastic.” Or they can be syllables, like “fan,” “tas,” and “tic.” Or they can even be individual characters in words — e.g. “f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c.”

Like labeling, tokens can introduce biases. For example, many word-to-token translators assume a space in a sentence denotes a new word, despite the fact that not all languages use spaces to separate words.
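Here is a quick illustrative sketch of that whitespace assumption (this is not the tokenizer OpenAI actually uses; real tokenizers are subword-based):

```python
# Illustrative only: a naive word tokenizer that assumes spaces separate words,
# versus a character-level split that makes no such assumption.
def naive_word_tokens(text: str) -> list[str]:
    return text.split(" ")        # assumes spaces delimit words

def char_tokens(text: str) -> list[str]:
    return list(text)             # falls back to individual characters

print(naive_word_tokens("fantastic results today"))   # ['fantastic', 'results', 'today']
print(naive_word_tokens("日本語は空白を使わない"))       # one giant "token": no spaces appear
print(char_tokens("fantastic")[:4])                    # ['f', 'a', 'n', 't']
```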

Tiezhen Wang, a software engineer at AI startup Hugging Face, agrees with Guzdial that reasoning models’ language inconsistencies may be explained by associations the models made during training.

“By embracing every linguistic nuance, we expand the model’s worldview and allow it to learn from the full spectrum of human knowledge,” Wang wrote in a post on X. “For example, I prefer doing math in Chinese because each digit is just one syllable, which makes calculations crisp and efficient. But when it comes to topics like unconscious bias, I automatically switch to English, mainly because that’s where I first learned and absorbed those ideas.”

Wang’s theory is plausible. Models are probabilistic machines, after all. Trained on many examples, they learn patterns to make predictions, such as how “to whom” in an email typically precedes “it may concern.”

But Luca Soldaini, a research scientist at the nonprofit Allen Institute for AI, cautioned that we can’t know for certain. “This type of observation on a deployed AI system is impossible to back up due to how opaque these models are,” they told TechCrunch. “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”

Short of an answer from OpenAI, we’re left to muse about why o1 thinks of songs in French but synthetic biology in Mandarin.
 

bnew














1/13
@MatthewBerman
OpenAI just dropped Operator, their first agent, which can use web browsers to complete tasks for you.

For the first time, OpenAI's agents can directly impact the real world.

The AI industry had strong reactions!

Here’s a roundup of reactions and incredible use cases. 🧵👇





2/13
@MatthewBerman
Andrej Karpathy, cofounder of OpenAI, compares Operator to humanoid robots in the physical world.

Why? Because both are designed to interact with systems built for humans (browsers, factories, streets).

[Quoted tweet]
Projects like OpenAI’s Operator are to the digital world as Humanoid robots are to the physical world. One general setting (monitor keyboard and mouse, or human body) that can in principle gradually perform arbitrarily general tasks, via an I/O interface originally designed for humans. In both cases, it leads to a gradually mixed autonomy world, where humans become high-level supervisors of low-level automation. A bit like a driver monitoring the Autopilot. This will happen faster in digital world than in physical world because flipping bits is somewhere around 1000X less expensive than moving atoms. Though the market size and opportunity feels a lot bigger in physical world.

We actually worked on this idea in very early OpenAI (see Universe and World of Bits projects), but it was incorrectly sequenced - LLMs had to happen first. Even now I am not 100% sure if it is ready. Multimodal (images, video, audio) just barely got integrated with LLMs last 1-2 years, often bolted on as adapters. Worse, we haven’t really been to the territory of very very long task horizons. E.g. videos are a huge amount of information and I’m not sure that we can expect to just stuff it all into context windows (current paradigm) and then expect it to also work. I could imagine a breakthrough or two needed here, as an example.

People on my TL are saying 2025 is the year of agents. Personally I think 2025-2035 is the decade of agents. I feel a huge amount of work across the board to make it actually work. But it *should* work. Today, Operator can find you lunch on DoorDash or check a hotel etc, sometimes and maybe. Tomorrow, you’ll spin up organizations of Operators for long-running tasks of your choice (eg running a whole company). You could be a kind of CEO monitoring 10 of them at once, maybe dropping in to the trenches sometimes to unblock something. And things will get pretty interesting.


3/13
@MatthewBerman
👀 Greg Brockman, president of OpenAI, hints that Operator is just the start. Expect agents that can control your desktop, phone, and more.

[Quoted tweet]
Operator — research preview of an agent that can use its own browser to perform tasks for you.

2025 is the year of agents.




4/13
@MatthewBerman
💡 Aaron Levie, CEO of Box, believes giving agents full browser access unlocks 100x more use cases.

Most web tasks lack APIs—agents solve this gap.

[Quoted tweet]
AI Agents having full browser access is going to open up 100x more use cases for AI. The web doesn’t have APIs for the long tail of tasks that we do every day on computers, and browser use is a major missing link. Another building block for AI is here.


https://video.twimg.com/ext_tw_video/1882496963976871936/pu/vid/avc1/1162x720/EWV5Y5IqcAMMRg2B.mp4

5/13
@MatthewBerman
Open source takes on Operator! 👇

• @_akhaliq shares BrowserGPT
• @hwchase17 recommends BrowserUse
• @pk_iv shares BrowserBase

[Quoted tweet]
You don't need to pay $200 for AI.

We're launching Open Operator - an open source reference project that shows how easy it is to add web browsing capabilities to your existing AI tool.

It's early, slow, and might not work everywhere. But it's free and open source! 🔗👇


https://video.twimg.com/ext_tw_video/1882837132450082817/pu/vid/avc1/1694x1080/ps39hpEL-nrdARdv.mp4

6/13
@MatthewBerman
Data advantage: Greg Kamradt, President of @arcprize points out that Operator collects procedural data as it learns to navigate websites, improving over time.

This “memory” gives OpenAI a major edge in the agent race.

[Quoted tweet]
Imagine the procedural memory OpenAI is building up about how to navigate every website operator touches

Once they jump out of the browser to the desktop, no app is safe.


7/13
@MatthewBerman
But it's not perfect...yet.

[Quoted tweet]
My favorite thing about the AI agents is that they can help me get something done in half an hour that used to take me less than a minute.


8/13
@MatthewBerman
🌟 Use cases highlight Operator’s potential:

@garrytan Planned an impromptu Vegas trip, navigating complex booking

[Quoted tweet]
OpenAI Operator is very impressive - planning an impromptu trip to Vegas — it's able to navigate JSX's website and handle unusual cases and basically figure out sold out scenarios, change dates and times, and now it's figuring out where to eat for Friday night for 2.

Bravo.




9/13
@MatthewBerman
.@omooretweets: Paid a bill from just a photo.

[Quoted tweet]
I just gave Operator a picture of a paper bill I got in the mail.

From only the bill picture, it navigated to the website, pulled up my account, entered my info, and asked for my credit card number to complete payment.

We are so back 🚀




10/13
@MatthewBerman
.@daniel_mac8: Built a website using Gemini AI + Operator.

[Quoted tweet]
😂 well played @OpenAI

tried to access Operator through Operator

check out the message that was waiting for me:




11/13
@MatthewBerman
🚀 The coolest demo? @kieranklaassen used Operator to QA test a local dev environment, tunneling it through for 24/7 bug checks.

Imagine having an Agent QA engineer ready at all times to work alongside you.

[Quoted tweet]
This is extremely promising and the best use case of Operator so far!
@OpenAI ChatGPT Operator is great for testing my local dev environment to see if my feature is working!

Tunnel Operator to your local dev env and let it test your feature. Waiting for an API and @cursor_ai to integrate it.


https://video.twimg.com/ext_tw_video/1882585578962817024/pu/vid/avc1/1112x720/lRjJyXbBTM7Bc7AT.mp4

12/13
@MatthewBerman
🤯 Interesting insight from @emollick:

Operator’s brand preferences (e.g., choosing Bing or 1-800-Flowers) may inadvertently create new SEO industries.

Agents may define how brands compete in the future.

[Quoted tweet]
Next big thing for brands: knowing what brands agents prefer.

If you ask for stock prices, Claude with Computer Use goes to Yahoo Finance while Operator does a Bing search

Operator loves buying from the top search result on Bing. Claude has direct preferences like 1-800-Flowers




13/13
@MatthewBerman
If you enjoy this kind of stuff, check out my newsletter: Forward Future Daily

And check out my full video breakdown of the industry's reactions here:

https://invidious.poast.org/watch?v=i9s4fqhSvz8








1/5
@Techmeme
OpenAI partners with DoorDash, Instacart, Priceline, StubHub, and Uber to ensure that Operator respects these businesses' terms of service agreements (@zeffmax / TechCrunch)

OpenAI launches Operator, an AI agent that performs tasks autonomously | TechCrunch

OpenAI partners with DoorDash, eBay, Instacart, Priceline, StubHub, Uber, and other companies to ensure that Operator respects their terms of service agreements





2/5
@FindKar
also making it easier to build AI agents — but more for one-off workflows vs. one-off tasks

[Quoted tweet]
Watch @BytespaceAI web-agents control the web

With a few prompts, I built a web-agent that:

- Finds prospects on LinkedIn
- Scrapes structured data about their profiles
- And uses Claude to send a personalized message

Whole sales team on auto-pilot. When? 👩‍💼


https://video.twimg.com/amplify_video/1878274284881399808/vid/avc1/1920x1080/_jbpiUi-UnRVx010.mp4

3/5
@evans4fintech
Regulatory compliance should be a top priority for Operator AI; it baffles me that US policymakers are not moving to regulate applications like Operator.



4/5
@JOSourcing
Sam Altman's own words:

Do NOT trust me.

https://invidious.poast.org/watch?v=dY1VK8oHj5s



5/5
@JOSourcing


[Quoted tweet]
'Suchir Killed By OpenAI': #SuchirBalaji's mother's explosive claim

'My son had documents against OpenAI. They have attacked him and killed him...: Poornima Rao, Suchir Balaji's mother.

@PriyaBahal22 shares more details.


https://video.twimg.com/amplify_video/1880319397006352384/vid/avc1/1280x720/qs_8Tc_qvVTtclRZ.mp4


 

bnew







1/21
@yoheinakajima
haha, i used @openai operator to build, deploy, and open source a tool on github using @replit agent.

took about 30 min, here's an ~8 min supercut video

thoughts:
- while working with replit agent, it actually deployed the app, tested it, and described the error back to replit agent for me
- operator asked me a few more Qs than i wanted, but it was mostly for safety (eg filling forms) so i guess okay with it
- it had trouble with a few things around UI like knowing it needs to scroll a page to see the rest of it, and it needed pointers to find the git feature in replit
- once it found the git feature it didn't need my assistance to create a repo and open source after having the agent write a readme

while a bit slower, this was even more automated than replit agent (especially testing features and working through errors) - which is impressive

would be nice to have: push notifications for when it needs my attention, and voice mode capabilities



https://video.twimg.com/ext_tw_video/1882706467029143552/pu/vid/avc1/1280x720/r_AlKu1wfDkUHNsj.mp4

2/21
@yoheinakajima
GitHub - yoheinakajima/pippin-tasks

this was a test so i went with the classic to-do app, but with a twist: it's for agents

- API for agent to create, read, update, delete tasks
- user web UI for manually managing tasks
- test UI for testing endpoints
- API performance metrics



3/21
@yoheinakajima
oh man i just realized i should have had it build an agent

[Quoted tweet]
agents using agents to build tools for agents


4/21
@shannonNullCode
Now we're talking! did it work out of the box or did you have to coerce it? I noticed with the anthropic computer use demo they intentionally crippled it, making automating things like cursor/windsurf a non starter.



5/21
@yoheinakajima
no coercing, you’re seeing the full video (except for when I logged into Replit at the beginning)



6/21
@iruletheworldmo
it’s the fact you can run ten in the background and they’ll quietly work away. human out of the loop.



7/21
@yoheinakajima
Going to try a day where I try running 4 operators and see if it’s more efficient than doing the actual work myself (I have my doubts)



8/21
@EricFriedman
Pretty good test idea. Even to “take over” an existing project (would that even work?) to get help



9/21
@yoheinakajima
probably works but I feel like for feeding it existing code, doing it via text *feels* more efficient than having it read from a screen



10/21
@HighRiskT
bullish



11/21
@DennisLund
@readwise save



12/21
@vishalsachdev
The best part :smile:





13/21
@NaturallyDragon
Quick and clean. As it should be.



14/21
@GerrardL_
Nice, was hoping someone would do this



15/21
@0xfanfaron
could Pippin spin off its own sentient creatures and learn from their activities?



16/21
@danwick
@readwise save thread



17/21
@jvivas_official
I keep thinking that a core step is to request the agent to create documentation as new information is added to the system. Love the idea of using replit. I am thinking of AI-first interfaces that make it super efficient for the operator to make the changes



18/21
@ter_pieter
@OpenAI operator told @Replit agent what to do. I am looking into the future 🧙‍♀️



19/21
@ShepOfKnowledge
@threadreaderapp unroll



20/21
@threadreaderapp
@ShepOfKnowledge Hi! here is your unroll: Thread by @yoheinakajima on Thread Reader App See you soon. 🤖



21/21
@brooksy4503
thanks for sharing the video :smile:




1/11
@levie
Operator from OpenAI can basically straight up use cloud software to do anything. Here it’s using Box to build out an entire file and folder structure then adding research into each doc. Imagine waking up to work being done for you by AI Agents.



https://video.twimg.com/ext_tw_video/1882867625329676288/pu/vid/avc1/960x720/QNkjImvQJmIAwZ35.mp4

2/11
@GodofTunder4
/search?q=#Astro AI is about to go live this month. The team have have been putting everything together to perfect the platform. A platform where All tokenizations will be scalable, allowing holders to benefit from a growing market cap.

/search?q=#Astro @Astro_sol_Ai

CA: C1odMKziGNXd8g9w6qxVfRvwuSfFaUW66XfKb6rTpump





3/11
@bryanking__
this feels like the self-parking cars of the early 2010s. sort of pointless in the grand scheme of things. Every auto manufacturer tried to push the tech, but nobody ever really seemed to use it.

then full self driving comes along, actually changing the game for good.



4/11
@_ricardovm91_
this might be a first glimpse of an interesting future. Just maybe.

However, no company should be paying someone to do the kind of work operator does.

So few people are talking about meaningful work done by AIs, that sometimes all this feels very Metaversy.



5/11
@AravSrinivas
I hope it’s not the same old hype Adept tried to do 2 years earlier.



6/11
@fillegar
This is similar to an automated test case in theory



7/11
@eerac
Makes me wonder if Box could make this interaction more efficient by exposing some frontend JavaScript functions Operator could call instead of having to click on everything sequentially.

I realize this sounds similar to exposing an API, but this would still live in a browser.



8/11
@forward_future_
This is smooth. Very smoooth. So many incredible use cases to explore.

We’ve got a special edition newsletter all about Operator coming out tomorrow—don’t miss it!



9/11
@fourguyses
Use case: Namecheap Hosting department helping with DNS config or other configuration on the user end, only input is a ppfo prompt and boom! Just talk to the customer while AI agent does the work



10/11
@danielkempe
Feels so inefficient for a computer though…



11/11
@The_Colonel__
Does it work out of the box?














1/22
@ChanningAllen
I'm in a state of disbelief that no one's talking about Operator right now.

Either very few people have access to OpenAI's Pro tier, or the people who do aren't experimenting very much. Because this thing is insane.

Here's just one example:

Operator can follow "monkey see, monkey do" instructions! Holy shyt!

But you probably don't get why this matters. So let me explain real quick.

It's easy to instruct AI agents to do basic out-of-the-box tasks like booking flights or ordering meals.

But what about tasks that are highly specific?

For example, we draft our newsletters in Notion, then convert them from Markdown format to HTML in Retool, then paste that HTML in Kit to send the broadcasts to our subscribers.

Writing a prompt to get an AI agent to do all those steps would be a pain in the ass. You'd have to get extremely granular, almost like you were writing code.

But with Operator, you don't have to write anything. Instead, you can simply do a screen recording (e.g. via Loom) of yourself doing the steps in question while explaining what you're doing.

Then you can link the recording to Operator and give it the following simple prompt: "Watch this video and follow the instructions that the person gives you."



2/22
@ChanningAllen
FYI here's a video walkthrough of Operator. It's a little rough around the edges because I recorded it as soon as I got my hands on the feature:

[Quoted tweet]
Want to see OpenAI's new AI agent in action?

Check out this live demo.

I didn't have a plan and didn't know what to expect. The moment Operator rolled out to me I pressed "Start recording" and shot my "unboxing" video:
loom.com/share/679b0ca35cb14…


3/22
@zmbnski
This is really amazing.

I wonder how it will work in practice with very complicated tasks though.

When using AI for app dev, it tends to make errors with a complicated codebase.



4/22
@ChanningAllen
it works fairly well for complex tasks after i record a walkthrough for it. the problem is that it's super slow



5/22
@DomWellsOnfolio
Presumably it can execute on the tasks so much faster too? Can you create a regular task with it? For example “hey our latest newsletter is ready to go” and it knows to follow the video and do the steps



6/22
@ChanningAllen
yeah once i give it instructions for the first time in a given chat, it's better at doing new iterations of the same task



7/22
@p_millerd
i dont have it yet!



8/22
@ChanningAllen
couple replies are telling me there are cheaper alternatives?



9/22
@0xkarmatic
That was my first thought too!

Insanely powerful stuff. Manual entry into CRMs or typing numbers into Excel can all be automated away without the user needing to be technical.

[Quoted tweet]
That's going to happen soon. They are releasing the desktop control API in Jan. Eventually you will be able to record workflows and ask it to automate it.


10/22
@ChanningAllen
100%



11/22
@deifosv
I don't know where you are hanging out man because on my side of the pond operator is already old school.

Open Operator

And...
...

[Quoted tweet]
Open AI releases operator to some users on the $200 plan.

while @BrowserUse is free and Bytedance just released.
📷UI-TARS-desktop 🤯

The race is heating up.
Repo on the post below.


https://video.twimg.com/ext_tw_video/1882595837399715840/pu/vid/avc1/1284x720/K68ys1DnlM_b-J7-.mp4

12/22
@ChanningAllen
thanks I'll check these out



13/22
@ienjoykit
Virtual assistant jobs gone, right?



14/22
@ChanningAllen
not yet, my assistant does lots more that ai still can't do



15/22
@AccountantMurph
Waaaaaaaaaat.

This is amazing. How does it deal with logging in between these platforms?



16/22
@ChanningAllen
it's great. the operator browser saves your login info for each app after you log in the first time. and the login process is quick and easy. i covered it here in a video:

[Quoted tweet]
FYI here's a video walkthrough of Operator. It's a little rough around the edges because I recorded it as soon as I got my hands on the feature:


17/22
@RobHackneyEsq
Can it do video editing with a desktop app like Davinci resolve?



18/22
@ChanningAllen
web apps only



19/22
@ColleenMBrady
Another thought: Maybe some of the convos are taking place privately.

Saw this from @shl.

[Quoted tweet]
Starting a group chat for o1 pro users - DM proof of usage for an invite!


20/22
@ChanningAllen
ha, yep I'm in the group



21/22
@andre1sk
There are non AI QA tools that can do this with no need to record video or explain anything.



22/22
@ChanningAllen
care to name an example or two?
 