bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

1/1
GPT-4o for transcribing historical handwritten documents:

GPT-4o is truly remarkable on 18th-century handwriting. I gave it the following letter and asked it for a transcription. A couple of very minor errors… amazing!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

1/34
To everyone disappointed by @openai today... don't be. The livestream was for a general consumer audience.

The cool stuff is "hidden" on their site.

I am really excited. (Text to 3D??)


2/34
1/ Lightyears ahead of anyone at having text in AI generated images. Gorgeous

3/34
2/ so confident in their text image abilities they can create fonts with #GPT4-o

4/34
3/ casual 3d rendering....

5/34
4/ sound effect synthesis, not just speech

6/34
5/ effectively one shot stable diffusion finetuning, in context!?

7/34
@willdepue why didn’t yall talk about this in the livestream

8/34
Follow for AI news

9/34
What you think about the apple rumor

10/34
Not sure, this is just from their blog

11/34
Follow for AI projects and sneak peeks on stuff!

12/34
So true

13/34
Finetuning difference? Or something else

14/34
Livestream was more product focused

15/34
Not available to anyone. The features shown in thread

16/34
I don't think all features are available by API yet

17/34
Follow me for AI news

18/34
thanks i try!

19/34
Not fully available yet

20/34
Thanks deepak

21/34
well i was also a bit disappointed bc they didn't talk too much about what exactly gpt4o could do, and mainly focused on the product changes

22/34
Yep

23/34
New tokenizer

24/34
Some features out now on chatgpt

25/34
Thanks!

26/34
What did you think

27/34
Thanks!!

28/34
!

29/34
Awesome

30/34
Nope

31/34
Craycray

32/34
That's mainly Python libraries like plotly tho

33/34
The gif is from their blogpost

34/34
Well, for now it does look like this is the first supermultimodal model that does voice2voice, not voice2text2text2voice


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265



1/3
For the last two years, my team and I have been publicly working on laying the foundations of early-fusion, multi-modal (MM) token-in token-out approaches, from the original CM3 paper to MM-scaling laws to CM3Leon to half a dozen or so more papers all around this space, to a couple more coming out soon.

2/3
While it's true we're behind, we're much closer to OpenAI than when GPT-4 launched. We've built recipes that scale, architectures aligned with multi-modality, science on how to train these models, and, most importantly, the strongest team outside of OpenAI in this research space.

3/3
I firmly believe in ~2 months, there will be enough knowledge in the open-source for folks to start pre-training their own gpt4o-like models. We're working hard to make this happen.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

1/1
Introducing Gemini 1.5 Flash

It’s a lighter-weight model, optimized for tasks where low latency and cost matter most. Starting today, developers can use it with up to 1 million tokens in Google AI Studio and Vertex AI. #GoogleIO




1/1
If 1 million tokens is a lot, how about 2 million?

Today we’re expanding the context window for Gemini 1.5 Pro to 2 million tokens and making it available for developers in private preview. It’s the next step towards the ultimate goal of infinite context. #GoogleIO





1/1
The Gemini era is here, bringing the magic of AI to the tools you use every day. Learn more about all the announcements from #GoogleIO → Google I/O 2024: An I/O for a new generation





1/2
The new Learning coach Gem uses LearnLM to provide step-by-step study guidance, helping you build understanding instead of just giving you an answer. It will launch in Gemini in the coming months. #GoogleIO

2/2
This is Search in the Gemini era. #GoogleIO











1/7
Introducing Veo: our most capable generative video model.

It can create high-quality, 1080p clips that can go beyond 60 seconds.

From photorealism to surrealism and animation, it can tackle a range of cinematic styles. #GoogleIO

2/7
Prompt: “Many spotted jellyfish pulsating under water. Their bodies are transparent and glowing in deep ocean.”

3/7
Prompt: “Timelapse of a water lily opening, dark background.”

4/7
Prompt: “A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors.”

5/7
Prompt: “A spaceship hurdles through the vastness of space, stars streaking past as it, high speed, sci-fi.”

6/7
Prompt: “A woman sitting alone in a dimly lit cafe, a half-finished novel open in front of her. Film noir aesthetic, mysterious atmosphere. Black and white.”

7/7
Prompt: “Extreme close-up of chicken and green pepper kebabs grilling on a barbeque with flames. Shallow focus and light smoke. vivid colours.”


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265





1/5
As someone who spent a lot of time making a browser and researching it, I can tell you that this integration of ChatGPT onto the computer belies a greater purpose: one where AI will eat the browser steadily. They will no longer have to be restricted by Google's platform limits.

2/5
Once you can access the desktop, you can control the desktop...

3/5
100%!

4/5
Tighter integration to mic, cam, files, auth info, screen, gpu, cpu, disk, network, etc

5/5
No - my point is that the browser is what gets eaten slowly but steadily; not quite completely though.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265


1/2
Ilya and OpenAI are going to part ways. This is very sad to me; Ilya is easily one of the greatest minds of our generation, a guiding light of our field, and a dear friend. His brilliance and vision are well known; his warmth and compassion are less well known but no less important.

OpenAI would not be what it is without him. Although he has something personally meaningful he is going to go work on, I am forever grateful for what he did here and committed to finishing the mission we started together. I am happy that for so long I got to be close to such genuinely remarkable genius, and someone so focused on getting to the best future for humanity.

Jakub is going to be our new Chief Scientist. Jakub is also easily one of the greatest minds of our generation; I am thrilled he is taking the baton here. He has run many of our most important projects, and I am very confident he will lead us to make rapid and safe progress towards our mission of ensuring that AGI benefits everyone.

2/2
congratulations to @oklo on going public, especially @jakedewitte and @caorilne, who i have worked with for a decade.

energy is one of the most important things to work on and i’m excited to help support that mission. onward!






1/1
After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the excellent research leadership of @merettm. It was an honor and a privilege to have worked together, and I will miss everyone dearly. So long, and thanks for everything. I am excited for what comes next — a project that is very personally meaningful to me about which I will share details in due time.


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265



1/3
Sir Demis Hassabis just showed a super low latency demo of Google’s multimodal AI assistant on your phone AND augmented reality glasses. Clearly they’ve been cooking this for a while. The race is on!

2/3
Video questions coming to Google Search. Multimodal agentic AI queries

3/3
I want it now. My Meta glasses and its Llama assistant feel… dated now




1/1
We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life.

Watch it in action, with two parts - each was captured in a single take, in real time. ↓ #GoogleIO


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

Google is redesigning its search engine — and it’s AI all the way down


From ‘AI Overviews’ to automatic categorization, Google is bringing AI to practically every part of the search process.

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

May 14, 2024, 1:56 PM EDT


A screenshot of Google search results for yoga studios.

Ask multipart questions and get a single answer – that’s AI search at work. Image: Google

A year ago, Google said that it believed AI was the future of search. That future is apparently here: Google is starting to roll out “AI Overviews,” previously known as the Search Generative Experience, or SGE, to users in the US and soon around the world. Pretty soon, billions of Google users will see an AI-generated summary at the top of many of their search results. And that’s only the beginning of how AI is changing search.

“What we see with generative AI is that Google can do more of the searching for you,” says Liz Reid, Google’s newly installed head of Search, who has been working on all parts of AI search for the last few years. “It can take a bunch of the hard work out of searching, so you can focus on the parts you want to do to get things done, or on the parts of exploring that you find exciting.”

Reid ticks off a list of features aimed at making that happen, all of which Google announced publicly on Tuesday at its I/O developer conference. There are the AI Overviews, of course, which are meant to give you a general sense of the answer to your query along with links to resources for more information. There’s also a new feature in Lens that lets you search by capturing a video. There’s a new planning tool designed to automatically generate a trip itinerary or a meal plan based on a single query. There’s a new AI-powered way to organize the results page itself so that when you want to see restaurants in a new city, it might offer you a bunch for date night and a bunch for a business meeting without you even having to ask.

This is nothing short of a full-stack AI-ification of search. Google is using its Gemini AI to figure out what you’re asking about, whether you’re typing, speaking, taking a picture, or shooting a video.



It’s using a new specialized Gemini model to summarize the web and show you an answer. It’s even using Gemini to design and populate the results page.

A screenshot of Google search results showing anniversary-worthy restaurants.

Google is using AI to both populate and organize your search results page. Image: Google

Not every search needs this much AI, though, Reid says, and not every search will get it. “If you just want to navigate to a URL, you search for Walmart and you want to get to walmart.com. It’s not really beneficial to add AI.” Where she figures Gemini can be most helpful is in more complex situations, the sort of things you’d either need to do a bunch of searches for or never even go to Google for in the first place.

One example Reid likes is local search. (You hear this one a lot in AI because it can be tricky to wade through tons of same-y listings and reviews to find something actually good.) With Gemini, she says, “we can do things like ‘Find the best yoga or pilates studio in Boston rated over four stars within a half-hour walk of Beacon Hill.’” Maybe, she continues, you also want details on which has the best offers for first-timers. “And so you can get information that’s combined, across the Knowledge Graph and across the web, and pull it together.”

That combination of the Knowledge Graph and AI — Google’s old search tool and its new one — is key for Reid and her team. Some things in search are a solved problem, like sports scores: “If you just actually want the score, the product works pretty well,” Reid says. Gemini’s job, in that case, is to make sure you get the score no matter how strangely you ask for it. “You can think about expanding the types of questions that would successfully trigger the scores,” she says, “but you still want that canonical sports data.”

A screenshot of a Google AI overview explaining the difference between thunder and lightning.

Not every search will get an AI overview, but a lot of them will. Image: Google

Getting good data is the whole ball game for Google and any other search engine. Part of the impetus for creating the new search-specific Gemini model, Reid tells me, was to focus it on getting things right. “There’s a balance between creativity and factuality” with any language model, she says. “We’re really going to skew it toward the factuality side.” AI Overviews may not be fun or charming, but as a result, they might get things right more often. (Though no model is perfect, and Google is surely going to face plenty of problems from hallucinated and just straight-up false overviews.)

As AI has come for search, products like Perplexity and Arc have come under scrutiny for combing and summarizing the web without directing users to the actual sources of information. Reid says it’s a tricky but important balance to strike and that one way Google is trying to do the right thing is by simply not triggering overviews on certain things. But she’s also convinced and says early data shows that this new way of searching will actually lead to more clicks to the open web. Sure, it may undercut low-value content, she says, but “if you think about [links] as digging deeper, websites that do a great job of providing perspective or color or experience or expertise — people still want that.” She notes that young users in particular are always looking for a human perspective on their query and says it’s still Google’s job to give that to them.

Over most of the last decade, Google has been trying to change the way you search. It started as a box where you type keywords; now, it wants to be an all-knowing being that you can query any way you want and get answers back in whatever way is most helpful to you. “You increase the richness, and let people ask the question they naturally would,” Reid says. For Google, that’s the trick to getting even more people to ask even more questions, which makes Google even more money. For users, it could mean a completely new way to interact with the internet: less typing, fewer tabs, and a whole lot more chatting with a search engine.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

Project Astra is the future of AI at Google


Siri and Alexa never managed to be useful assistants. But Google and others are convinced the next generation of bots is really going to work.

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

May 14, 2024, 1:56 PM EDT


A still from a video showing a phone identifying a bowl of markers.

Astra is meant to be a real-time, multimodal AI assistant. Image: Google

“I’ve had this vision in my mind for quite a while,” says Demis Hassabis, the head of Google DeepMind and the leader of Google’s AI efforts. Hassabis has been thinking about and working on AI for decades, but four or five years ago, something really crystallized. One day soon, he realized, “We would have this universal assistant. It’s multimodal, it’s with you all the time.” Call it the Star Trek Communicator; call it the voice from Her; call it whatever you want. “It’s that helper,” Hassabis continues, “that’s just useful. You get used to it being there whenever you need it.”

At Google I/O, the company’s annual developer conference, Hassabis showed off a very early version of what he hopes will become that universal assistant. Google calls it Project Astra, and it’s a real-time, multimodal AI assistant that can see the world, knows what things are and where you left them, and can answer questions or help you do almost anything. In an incredibly impressive demo video that Hassabis swears is not faked or doctored in any way, an Astra user in Google’s London office asks the system to identify a part of a speaker, find their missing glasses, review code, and more. It all works practically in real time and in a very conversational way.



Astra is just one of many Gemini announcements at this year’s I/O. There’s a new model, called Gemini 1.5 Flash, designed to be faster for common tasks like summarization and captioning. Another new model, called Veo, can generate video from a text prompt. Gemini Nano, the model designed to be used locally on devices like your phone, is supposedly faster than ever as well. The context window for Gemini Pro, which refers to how much information the model can consider in a given query, is doubling to 2 million tokens, and Google says the model is better at following instructions than ever. Google’s making fast progress both on the models themselves and on getting them in front of users.

A still from a video showing a phone identifying a speaker tweeter with AI.

Astra is multimodal by design — you can talk, type, draw, photograph, and video to chat with it. Image: Google

Going forward, Hassabis says, the story of AI will be less about the models themselves and all about what they can do for you. And that story is all about agents: bots that don’t just talk with you but actually accomplish stuff on your behalf. “Our history in agents is longer than our generalized model work,” he says, pointing to the game-playing AlphaGo system from nearly a decade ago. Some of those agents, he imagines, will be ultra-simple tools for getting things done, while others will be more like collaborators and companions. “I think it may even be down to personal preference at some point,” he says, “and understanding your context.”

Astra, Hassabis says, is much closer than previous products to the way a true real-time AI assistant ought to work. When Gemini 1.5 Pro, the latest version of Google’s mainstream large language model, was ready, Hassabis says he knew the underlying tech was good enough for something like Astra to begin to work well. But the model is only part of the product. “We had components of this six months ago,” he says, “but one of the issues was just speed and latency. Without that, the usability isn’t quite there.” So, for six months, speeding up the system has been one of the team’s most important jobs. That meant improving the model but also optimizing the rest of the infrastructure to work well and at scale. Luckily, Hassabis says with a laugh, “That’s something Google does very well!”



A lot of Google’s AI announcements at I/O are about giving you more and easier ways to use Gemini. A new product called Gemini Live is a voice-only assistant that lets you have easy back-and-forth conversations with the model, interrupting it when it gets long-winded or calling back to earlier parts of the conversation. A new feature in Google Lens allows you to search the web by shooting and narrating a video. A lot of this is enabled by Gemini’s large context window, which means it can access a huge amount of information at a time, and Hassabis says it’s crucial to making it feel normal and natural to interact with your assistant.

An image showing the benefits of Google’s new Gemini 1.5 Flash model.

Gemini 1.5 Flash exists to make AI assistants faster above all else. Image: Google

Know who agrees with that assessment, by the way? OpenAI, which has been talking about AI agents for a while now. In fact, the company demoed a product strikingly similar to Gemini Live barely an hour after Hassabis and I chatted. The two companies are increasingly fighting for the same territory and seem to share a vision for how AI might change your life and how you might use it over time.

How exactly will those assistants work, and how will you use them? Nobody knows for sure, not even Hassabis. One thing Google is focused on right now is trip planning — it built a new tool for using Gemini to build an itinerary for your vacation that you can then edit in tandem with the assistant. There will eventually be many more features like that. Hassabis says he’s bullish on phones and glasses as key devices for these agents but also says “there is probably room for some exciting form factors.” Astra is still in an early prototype phase and only represents one way you might want to interact with a system like Gemini. The DeepMind team is still researching how best to bring multimodal models together and how to balance ultra-huge general models with smaller and more focused ones.

We’re still very much in the “speeds and feeds” era of AI, in which every incremental model matters and we obsess over parameter sizes. But pretty quickly, at least according to Hassabis, we’re going to start asking different questions about AI. Better questions. Questions about what these assistants can do, how they do it, and how they can make our lives better. Because the tech is a long way from perfect, but it’s getting better really fast.
 

boogers

cats rule, dogs drool
Supporter
Joined
Mar 11, 2022
Messages
7,683
Reputation
2,999
Daps
22,621
Reppin
#catset
even i've come around on GPT. the openAI api is pretty fun. a friend and i collaborated on a discord/chatgpt3.5 bot in python and it was really fun to work on. been writing a ton of Python lately. gotta be my favorite language
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265





1/5
One more day until #GoogleIO! We’re feeling . See you tomorrow for the latest news about AI, Search and more.

2/5
Ready for you to experience everything

3/5
We're just as excited as you are

4/5
Stay tuned for what's to come

5/5
It's a privilege to share new ideas with such a passionate community.









1/6
Google is preparing the release of "Memory", a feature allowing you to save facts about yourself, or stuff you just want Gemini to remember.

This feature *may* be released in the next few days.

2/6
Google has also been testing a bunch of features for early testers, including the ability to enter up to 1M tokens, directly in Gemini itself (while the model supported it, Gemini itself was limiting the number of tokens).

3/6
They've also been testing "Gems", which Google describes as the ability to "Customize Gemini for your needs". Sounds like Gems is what bots & Motoko were.

4/6
Finally, they've also been testing the ability to upload up to 10 documents (PDFs, Word, Google Docs) at the same time (from Google Drive).

5/6
There's a lot more to come at the I/O. Stay tuned!

6/6
I spotted "Memory" a while back already. Gemini was not even Gemini yet at the time.






1/1
Google Messages wants to make sure you don't see texts from blocked contacts anywhere


In short - You will not see messages from blocked people in RCS group chats now.

#Google #Android


 