Bard gets its biggest upgrade yet with Gemini {Google A.I / LLM}

bnew



1/3
Sir Demis Hassabis just showed a super low latency demo of Google’s multimodal AI assistant on your phone AND augmented reality glasses. Clearly they’ve been cooking this for a while. The race is on!

2/3
Video questions are coming to Google Search: multimodal, agentic AI queries.

3/3
I want it now. My Meta glasses and their Llama assistant feel… dated now




1/1
We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life.

Watch it in action, in two parts - each captured in a single take, in real time. ↓ #GoogleIO


 

bnew

Google is redesigning its search engine — and it’s AI all the way down​


From ‘AI Overviews’ to automatic categorization, Google is bringing AI to practically every part of the search process.​

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

May 14, 2024, 1:56 PM EDT


[Image: Ask multipart questions and get a single answer - that's AI search at work. Credit: Google]

A year ago, Google said that it believed AI was the future of search. That future is apparently here: Google is starting to roll out “AI Overviews,” previously known as the Search Generative Experience, or SGE, to users in the US and soon around the world. Pretty soon, billions of Google users will see an AI-generated summary at the top of many of their search results. And that’s only the beginning of how AI is changing search.

“What we see with generative AI is that Google can do more of the searching for you,” says Liz Reid, Google’s newly installed head of Search, who has been working on all parts of AI search for the last few years. “It can take a bunch of the hard work out of searching, so you can focus on the parts you want to do to get things done, or on the parts of exploring that you find exciting.”

Reid ticks off a list of features aimed at making that happen, all of which Google announced publicly on Tuesday at its I/O developer conference. There are the AI Overviews, of course, which are meant to give you a general sense of the answer to your query along with links to resources for more information. There’s also a new feature in Lens that lets you search by capturing a video. There’s a new planning tool designed to automatically generate a trip itinerary or a meal plan based on a single query. There’s a new AI-powered way to organize the results page itself so that when you want to see restaurants in a new city, it might offer you a bunch for date night and a bunch for a business meeting without you even having to ask.

This is nothing short of a full-stack AI-ification of search. Google is using its Gemini AI to figure out what you’re asking about, whether you’re typing, speaking, taking a picture, or shooting a video. It’s using a new specialized Gemini model to summarize the web and show you an answer. It’s even using Gemini to design and populate the results page.

[Image: Google is using AI to both populate and organize your search results page. Credit: Google]

Not every search needs this much AI, though, Reid says, and not every search will get it. “If you just want to navigate to a URL, you search for Walmart and you want to get to walmart.com. It’s not really beneficial to add AI.” Where she figures Gemini can be most helpful is in more complex situations, the sort of things you’d either need to do a bunch of searches for or never even go to Google for in the first place.

One example Reid likes is local search. (You hear this one a lot in AI because it can be tricky to wade through tons of same-y listings and reviews to find something actually good.) With Gemini, she says, “we can do things like ‘Find the best yoga or pilates studio in Boston rated over four stars within a half-hour walk of Beacon Hill.’” Maybe, she continues, you also want details on which has the best offers for first-timers. “And so you can get information that’s combined, across the Knowledge Graph and across the web, and pull it together.”
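As a rough illustration of the pattern Reid describes, here is a minimal Python sketch against the public Gemini API, with the hard constraints (rating, walking distance) handled by ordinary code and the fuzzy comparison handed to the model. The studio data, field names, and prompt are invented for the example; Google's actual Knowledge Graph plumbing is not public.

```python
# Hedged sketch: structured filtering + LLM synthesis. All data is hypothetical.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a Gemini API key

# Hypothetical structured results, e.g. from a places / knowledge-graph lookup.
studios = [
    {"name": "Beacon Yoga", "rating": 4.6, "walk_minutes": 12, "offer": "first class free"},
    {"name": "Hill Pilates", "rating": 4.2, "walk_minutes": 25, "offer": "2 weeks for $20"},
    {"name": "Om Studio", "rating": 3.9, "walk_minutes": 8, "offer": None},
]

# Hard constraints ("over four stars", "half-hour walk") stay in ordinary code.
candidates = [s for s in studios if s["rating"] > 4.0 and s["walk_minutes"] <= 30]

# The fuzzy part (comparing first-timer offers in prose) goes to the model.
model = genai.GenerativeModel("gemini-1.5-flash")
prompt = "Compare these studios for a first-timer and recommend one:\n" + "\n".join(
    f"- {s['name']}: {s['rating']} stars, {s['walk_minutes']} min walk, "
    f"offer: {s['offer'] or 'none listed'}"
    for s in candidates
)
print(model.generate_content(prompt).text)
```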

That combination of the Knowledge Graph and AI — Google’s old search tool and its new one — is key for Reid and her team. Some things in search are a solved problem, like sports scores: “If you just actually want the score, the product works pretty well,” Reid says. Gemini’s job, in that case, is to make sure you get the score no matter how strangely you ask for it. “You can think about expanding the types of questions that would successfully trigger the scores,” she says, “but you still want that canonical sports data.”

[Image: Not every search will get an AI overview, but a lot of them will. Credit: Google]

Getting good data is the whole ball game for Google and any other search engine. Part of the impetus for creating the new search-specific Gemini model, Reid tells me, was to focus it on getting things right. “There’s a balance between creativity and factuality” with any language model, she says. “We’re really going to skew it toward the factuality side.” AI Overviews may not be fun or charming, but as a result, they might get things right more often. (Though no model is perfect, and Google is surely going to face plenty of problems from hallucinated and just straight-up false overviews.)
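The "skew toward factuality" has a loose public analogue in the Gemini API's sampling controls: lower temperature trades creative variety for more deterministic output. A minimal sketch, assuming an API key; this is only a stand-in for whatever tuning Google applied to the search-specific model.

```python
# Hedged sketch: biasing generation toward deterministic, factual-style output.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.1,  # low temperature: fewer creative flourishes
        top_p=0.8,        # narrow the sampling distribution further
    ),
)
print(model.generate_content("Explain the difference between thunder and lightning.").text)
```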

As AI has come for search, products like Perplexity and Arc have come under scrutiny for combing and summarizing the web without directing users to the actual sources of information. Reid says it’s a tricky but important balance to strike and that one way Google is trying to do the right thing is by simply not triggering overviews on certain things. But she’s also convinced and says early data shows that this new way of searching will actually lead to more clicks to the open web. Sure, it may undercut low-value content, she says, but “if you think about [links] as digging deeper, websites that do a great job of providing perspective or color or experience or expertise — people still want that.” She notes that young users in particular are always looking for a human perspective on their query and says it’s still Google’s job to give that to them.

Over most of the last decade, Google has been trying to change the way you search. It started as a box where you type keywords; now, it wants to be an all-knowing being that you can query any way you want and get answers back in whatever way is most helpful to you. “You increase the richness, and let people ask the question they naturally would,” Reid says. For Google, that’s the trick to getting even more people to ask even more questions, which makes Google even more money. For users, it could mean a completely new way to interact with the internet: less typing, fewer tabs, and a whole lot more chatting with a search engine.
 

bnew

Project Astra is the future of AI at Google​


Siri and Alexa never managed to be useful assistants. But Google and others are convinced the next generation of bots is really going to work.​

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

May 14, 2024, 1:56 PM EDT


[Image: Astra is meant to be a real-time, multimodal AI assistant. Credit: Google]

“I’ve had this vision in my mind for quite a while,” says Demis Hassabis, the head of Google DeepMind and the leader of Google’s AI efforts. Hassabis has been thinking about and working on AI for decades, but four or five years ago, something really crystallized. One day soon, he realized, “We would have this universal assistant. It’s multimodal, it’s with you all the time.” Call it the Star Trek Communicator; call it the voice from Her; call it whatever you want. “It’s that helper,” Hassabis continues, “that’s just useful. You get used to it being there whenever you need it.”

At Google I/O, the company’s annual developer conference, Hassabis showed off a very early version of what he hopes will become that universal assistant. Google calls it Project Astra, and it’s a real-time, multimodal AI assistant that can see the world, knows what things are and where you left them, and can answer questions or help you do almost anything. In an incredibly impressive demo video that Hassabis swears is not faked or doctored in any way, an Astra user in Google’s London office asks the system to identify a part of a speaker, find their missing glasses, review code, and more. It all works practically in real time and in a very conversational way.



Astra is just one of many Gemini announcements at this year’s I/O. There’s a new model, called Gemini 1.5 Flash, designed to be faster for common tasks like summarization and captioning. Another new model, called Veo, can generate video from a text prompt. Gemini Nano, the model designed to be used locally on devices like your phone, is supposedly faster than ever as well. The context window for Gemini Pro, which refers to how much information the model can consider in a given query, is doubling to 2 million tokens, and Google says the model is better at following instructions than ever. Google’s making fast progress both on the models themselves and on getting them in front of users.

[Image: Astra is multimodal by design — you can talk, type, draw, photograph, and video to chat with it. Credit: Google]

Going forward, Hassabis says, the story of AI will be less about the models themselves and all about what they can do for you. And that story is all about agents: bots that don’t just talk with you but actually accomplish stuff on your behalf. “Our history in agents is longer than our generalized model work,” he says, pointing to the game-playing AlphaGo system from nearly a decade ago. Some of those agents, he imagines, will be ultra-simple tools for getting things done, while others will be more like collaborators and companions. “I think it may even be down to personal preference at some point,” he says, “and understanding your context.”

Astra, Hassabis says, is much closer than previous products to the way a true real-time AI assistant ought to work. When Gemini 1.5 Pro, the latest version of Google’s mainstream large language model, was ready, Hassabis says he knew the underlying tech was good enough for something like Astra to begin to work well. But the model is only part of the product. “We had components of this six months ago,” he says, “but one of the issues was just speed and latency. Without that, the usability isn’t quite there.” So, for six months, speeding up the system has been one of the team’s most important jobs. That meant improving the model but also optimizing the rest of the infrastructure to work well and at scale. Luckily, Hassabis says with a laugh, “That’s something Google does very well!”



A lot of Google’s AI announcements at I/O are about giving you more and easier ways to use Gemini. A new product called Gemini Live is a voice-only assistant that lets you have easy back-and-forth conversations with the model, interrupting it when it gets long-winded or calling back to earlier parts of the conversation. A new feature in Google Lens allows you to search the web by shooting and narrating a video. A lot of this is enabled by Gemini’s large context window, which means it can access a huge amount of information at a time, and Hassabis says it’s crucial to making it feel normal and natural to interact with your assistant.

[Image: Gemini 1.5 Flash exists to make AI assistants faster above all else. Credit: Google]

Know who agrees with that assessment, by the way? OpenAI, which has been talking about AI agents for a while now. In fact, the company demoed a product strikingly similar to Gemini Live barely an hour after Hassabis and I chatted. The two companies are increasingly fighting for the same territory and seem to share a vision for how AI might change your life and how you might use it over time.

How exactly will those assistants work, and how will you use them? Nobody knows for sure, not even Hassabis. One thing Google is focused on right now is trip planning — it built a new tool for using Gemini to build an itinerary for your vacation that you can then edit in tandem with the assistant. There will eventually be many more features like that. Hassabis says he’s bullish on phones and glasses as key devices for these agents but also says “there is probably room for some exciting form factors.” Astra is still in an early prototype phase and only represents one way you might want to interact with a system like Gemini. The DeepMind team is still researching how best to bring multimodal models together and how to balance ultra-huge general models with smaller and more focused ones.

We’re still very much in the “speeds and feeds” era of AI, in which every incremental model matters and we obsess over parameter sizes. But pretty quickly, at least according to Hassabis, we’re going to start asking different questions about AI. Better questions. Questions about what these assistants can do, how they do it, and how they can make our lives better. Because the tech is a long way from perfect, but it’s getting better really fast.
 

bnew





1/5
One more day until #GoogleIO! We’re feeling . See you tomorrow for the latest news about AI, Search and more.

2/5
Ready for you to experience everything

3/5
We're just as excited as you are

4/5
Stay tuned for what's to come

5/5
It's a privilege to share new ideas with such a passionate community.









1/6
Google is preparing the release of "Memory", a feature allowing you to save facts about yourself, or stuff you just want Gemini to remember.

This feature *may* be released in the next few days.
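For a sense of how a feature like this could work in principle, here is a hypothetical sketch: persist user facts locally and prepend them to each request. Google hasn't published how Memory is actually implemented, so the file name and helpers here are invented.

```python
# Hypothetical "memory" layer; not Google's implementation.
import json
import pathlib

import google.generativeai as genai

MEMORY_FILE = pathlib.Path("memory.json")  # invented local store

def remember(fact: str) -> None:
    """Append a user fact to the local memory store."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    facts.append(fact)
    MEMORY_FILE.write_text(json.dumps(facts))

def ask(question: str) -> str:
    """Prepend stored facts to the prompt so the model can use them."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    preamble = "Known facts about the user:\n" + "\n".join(f"- {f}" for f in facts)
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(f"{preamble}\n\n{question}").text

genai.configure(api_key="YOUR_API_KEY")
remember("I'm vegetarian.")
print(ask("Suggest a quick dinner."))
```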

2/6
Google has also been testing a bunch of features with early testers, including the ability to enter up to 1M tokens directly in Gemini itself (the model already supported that many, but the Gemini app was capping the count).

3/6
They've also been testing "Gems", which Google describes as the ability to "Customize Gemini for your needs". Sounds like Gems is what the previously leaked bots & "Motoko" were.

4/6
Finally, they've also been testing the ability to upload up to 10 documents (PDFs, Word, Google Docs) at the same time (from Google Drive).
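The developer-facing Gemini File API already supports something similar; whether the consumer app shares that machinery is an assumption. A minimal sketch with placeholder file names:

```python
# Hedged sketch: multi-document Q&A via the Gemini File API. Paths are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Upload each document, then pass the file handles alongside the question.
files = [genai.upload_file(path) for path in ["report.pdf", "paper.pdf"]]
resp = model.generate_content([*files, "Summarize the common themes across these documents."])
print(resp.text)
```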

5/6
There's a lot more to come at I/O. Stay tuned!

6/6
I spotted "Memory" a while back already, when Gemini was not even called Gemini yet.

Read more:







1/1
Google Messages wants to make sure you don't see texts from blocked contacts anywhere


In short: you will no longer see messages from blocked people in RCS group chats.

#Google #Android


 

bnew

Google will use AI to help you detect a scam call as it's happening​


By Dave LeClair

published 14 hours ago

A handy alert will let you know that something seems off

[Image: Scam warnings in phone calls using AI. Credit: Google]

Google announced a lot of new stuff during its I/O keynote. Seriously, just read through our live blog and prepare to be overwhelmed by the sheer volume of new things. One of the most exciting announcements seems to have flown under the radar, as Google announced a new feature that would use its powerful Gemini AI to detect automatically when a phone call sounds like a scam.

Basically, the AI will constantly listen to calls from possible spam numbers and alert you if anything said during the conversation sounds like it could be a scam. The company showed off the feature by having a fake bank representative call the presenter. As soon as the scammer asked the presenter to transfer money to a different account to keep it safe, the AI alerted them that it sounded like a scam, helping protect their money.

The AI would also have alerted the presenter if the alleged representative had asked for a PIN or password, as such details aren't typically requested over the phone.
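Here is a toy illustration of the underlying pattern, classifying live-transcript snippets for scam signals, using the public Gemini API. The real feature runs on-device on Gemini Nano, and its prompts and heuristics are not public.

```python
# Toy scam-signal classifier over transcript snippets; not Google's pipeline.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def looks_like_scam(transcript_chunk: str) -> bool:
    """Ask the model for a YES/NO judgment on one snippet of call transcript."""
    prompt = (
        "You screen phone-call transcripts for fraud. Answer YES or NO only.\n"
        "Does this sound like a scam (e.g. urgent money transfers, requests "
        f"for PINs or passwords)?\n\nTranscript: {transcript_chunk}"
    )
    return model.generate_content(prompt).text.strip().upper().startswith("YES")

print(looks_like_scam("To keep your savings safe, transfer them to this account now."))
```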



While most of the features touted for Gemini involve its generative capabilities, this one shows another way the power of a trained AI model can be used. Sure, more savvy users might be able to spot a scam call quickly without AI intervention, but for more vulnerable people, a feature like this could save them from a massive headache or a life-ruining scam.

Unfortunately, Google said the feature won't launch with Android 15. The company said it will share more details later this year. Importantly, it also noted that users would need to opt in and that it would all be handled on their device, so it should be secure. Whether users want AI to listen to their calls remains to be seen, but this does seem like an excellent way for AI to be integrated into our daily lives outside of creating funny images that look a little off.

Google also announced a massive update to Gemini 1.5 Pro and even more Android 15 AI features that sound like game-changers.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,057
Reputation
8,239
Daps
157,733

Google’s new LearnLM AI model focuses on education​


LearnLM is already integrated into Google products like Android and YouTube.​

By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

May 14, 2024, 5:02 PM EDT


[Illustration: Google's wordmark in red and pink on a dark blue background. Credit: The Verge]

Google says its new AI model, LearnLM, will help students with their homework.

LearnLM, a family of AI models based on Google’s other large language model, Gemini, was built to be an expert in academic subjects, find and present examples in different formats (like a photo or video), coach students while they study, and, in Google’s words, “inspire engagement.”

Google has already integrated LearnLM into its products, bundling it with other services like Google Search, Android, YouTube, and the Gemini chatbot. For example, customers can use Circle to Search on Android to highlight a math or physics word problem, and LearnLM will help solve the question. On YouTube, while watching a lecture video, viewers can ask questions about the video, and the model will respond with an explanation.

Some AI models, such as Microsoft’s Orca-Math, can answer math questions pretty reliably — the ability to answer math questions is one of the benchmarks that measure LLM performance — and Google boasted that Gemini beat GPT-4 on the math benchmark.

Google says LearnLM was specifically “fine-tuned” to only respond and find answers based on educational research. In other words, LearnLM will not help someone plan a trip or find a restaurant.

Google says it’s working with educators in a new pilot program on Google Classroom so they can use LearnLM to simplify lesson planning. The company is also experimenting with Illuminate, a platform that will break down research papers into short audio clips with AI-generated voices. Ideally, this will help students understand complex information better.

Google also partnered with Columbia’s Teachers College, Arizona State University, NYU Tisch, and Khan Academy to provide feedback on LearnLM.
 

bnew








1/3
Google is changing the way we use Google Search through Gemini AI Agents.

They announced multi-step reasoning so Google can do the 'searching for you' through multiple steps.

It essentially breaks down your questions into parts and figures out which problems to solve first (and in what order).

2/3
Users can also search and ask questions about videos now

3/3
Full 'Search in the Gemini era' demo video
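The multi-step reasoning described in the first tweet can be roughly approximated with the public Gemini API: ask the model to split a multipart question into ordered sub-steps, then answer while following that plan. The prompts and plumbing below are invented; Google hasn't published its implementation.

```python
# Hedged sketch: prompt-driven query decomposition, then plan-following answer.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

question = ("Find a yoga studio near Beacon Hill rated over four stars "
            "and tell me its best intro offer.")

# Step 1: have the model break the question into ordered sub-problems.
plan = model.generate_content(
    "Break this question into a numbered list of minimal search steps, "
    f"in the order they must be solved:\n{question}"
).text

# Step 2: answer the original question while following the plan.
answer = model.generate_content(
    f"Question: {question}\n\nFollow this plan step by step:\n{plan}"
).text
print(plan, answer, sep="\n---\n")
```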




 

BBG
WHAT DOES ALL OF THIS MEAN ?!:damn::damn:


I fukkin hate getting old, I remember being a kid laughing at my people nem cuz they didn't get computers and the Internet, now here I am not able to fukkin keep up. And unlike my people before me I actually have an interest in this shyt, I just don't understand how to utilize the shyt for personal gain but knowing I could if I knew how bothers the shyt outta me
 

bnew
BBG said: "WHAT DOES ALL OF THIS MEAN ?!…"

Go to https://gemini.google.com/app, type "WHAT DOES ALL OF THIS MEAN ?!", and paste whatever text you don't understand. The AI will help you make sense of it.
 

bnew












1/11
I was so impressed with the Astra demo at Google I/O yesterday that I decided to build my own version using Gemini 1.5 Flash.

It's so fast and really good.

It was even able to detect the gate! Content is streamed directly from my camera.

Voice via
@elevenlabsio

2/11
Also, note that this script is not optimized at all. I wrote it in a rush at the gate. In the demo, you can see the image only gets saved after the voice finishes speaking, things like that.

I am optimizing it more to make it even more magical and fast before I release it.
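A minimal reconstruction of the kind of script described above: grab a camera frame, send it to Gemini 1.5 Flash, speak the reply. This is not the author's actual code; the ElevenLabs calls and voice name reflect their Python SDK as of mid-2024 as I understand it, and the API keys are placeholders.

```python
# Hedged sketch of a camera -> Gemini 1.5 Flash -> TTS loop; not the author's code.
import cv2
from PIL import Image

import google.generativeai as genai
from elevenlabs import play
from elevenlabs.client import ElevenLabs

genai.configure(api_key="GEMINI_API_KEY")        # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")
tts = ElevenLabs(api_key="ELEVENLABS_API_KEY")   # placeholder

cap = cv2.VideoCapture(0)  # default webcam
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("camera read failed")

# OpenCV returns BGR arrays; Gemini accepts PIL images in RGB.
image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
reply = model.generate_content([image, "Briefly: what am I looking at?"]).text

audio = tts.generate(text=reply, voice="Rachel")  # assumed SDK call and stock voice
play(audio)
```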

3/11
It's amazing. Incredible work, team!

4/11
Thank you, and great work!

5/11
Thanks, man! I love showing what these things can do. We live in such exciting times.

6/11
Releasing the code soon!

7/11
Gemini Flash is like 30 cents per million tokens, so basically nothing. ElevenLabs is pretty cheap too.
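(Taking that ballpark at face value: at $0.30 per million tokens, a 20,000-token prompt works out to 20,000 × $0.30 / 1,000,000 ≈ $0.006, i.e. well under a cent per call.)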

8/11
Way better for the vision part. Look how fast the text is generated; it's basically instant, and at that quality! Quality is what I care about.

9/11
That Jarvis voice tho

10/11
Thank you!

11/11
You could definitely do it. The response is really, really fast. Check out how fast the text is being generated.


 

bnew


1/2
I hate to acknowledge this, but Gemini 1.5 Flash is better than llama-3-70b on long-context tasks. It's way faster than my locally hosted 70b model (on 4×A6000) and hallucinates less. The free-of-charge plan is good enough for me to do prompt engineering for prototyping.

2/2
The API response doesn't seem to contain token-count or time-to-first-token info. For a prompt with 20k input tokens and 300 output tokens, it took 6 seconds to finish.
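Since the response object didn't surface those numbers at the time, you can measure them yourself: count_tokens for the sizes, streaming for a rough time-to-first-token. A minimal sketch; the prompt is a placeholder.

```python
# Hedged sketch: measuring token counts and latency around a Gemini call.
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

prompt = "..."  # your ~20k-token prompt goes here
in_tokens = model.count_tokens(prompt).total_tokens

start = time.perf_counter()
first_token_at = None
chunks = []
for chunk in model.generate_content(prompt, stream=True):
    if first_token_at is None:
        first_token_at = time.perf_counter() - start  # rough time to first token
    chunks.append(chunk.text)
total = time.perf_counter() - start

out_tokens = model.count_tokens("".join(chunks)).total_tokens
print(f"in={in_tokens} out={out_tokens} ttft={first_token_at:.2f}s total={total:.2f}s")
```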


 

bnew



1/3
Nobody is talking about this right now, but Google dropped a CRAZY model-graph visualization tool to help you understand your models better.

Check it out, link

2/3
LINK: Model Explorer: Graph visualization for large model development

GitHub: API Guide
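Basic usage, per the Model Explorer docs at launch (the model path is a placeholder):

```python
# Minimal sketch from the documented API: serve an interactive graph view locally.
import model_explorer

model_explorer.visualize("path/to/model.tflite")  # opens the visualizer in a browser
```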

3/3
Want more cool ML tips and tricks?

Follow me and hit that notification bell


 

bnew

1/1
Updated Gemini 1.5 Pro report: the MATH benchmark for the specialized version is now at 91.1%; the SOTA three years ago was 6.9%. Overall, a lot of progress from February to May across all benchmarks.











1/7
A mathematics-specialized version of Gemini 1.5 Pro achieves some extremely impressive scores in the updated technical report.

2/7
From the report: 'Currently the math-specialized model is only being explored for Google internal research use cases; we hope to bring these stronger math capabilities into our deployed models soon.'

3/7
New benchmarks, including Flash.

4/7
Google is doing something very interesting by building specialized versions of its frontier models for math, healthcare, and education (so far). The benchmarks on all of these are pretty impressive, and it seems to be beyond what can be done with traditional fine-tuning alone. twitter.com/jeffdean/statu…

5/7
1.5 Pro is now stronger than 1.0 Ultra.

6/7
GPT-4o only got to enjoy the crown for 4 days.


7/7
They put Av_Human at the top of the chart there visually to make people feel better. The average human is now in third place.



 