Bard gets its biggest upgrade yet with Gemini {Google A.I / LLM}

bnew


Bard gets its biggest upgrade yet with Gemini


Dec 06, 2023

You can now try Gemini Pro in Bard for new ways to collaborate with AI. Gemini Ultra will come to Bard early next year in a new experience called Bard Advanced.

Sissie Hsiao

Vice President and General Manager, Google Assistant and Bard



Today we announced Gemini, our most capable model with sophisticated multimodal reasoning capabilities. Designed for flexibility, Gemini is optimized for three different sizes — Ultra, Pro and Nano — so it can run on everything from data centers to mobile devices.


Now, Gemini is coming to Bard in Bard’s biggest upgrade yet. Gemini is rolling out to Bard in two phases: Starting today, Bard will use a specifically tuned version of Gemini Pro in English for more advanced reasoning, planning, understanding and more. And early next year, we’ll introduce Bard Advanced, which gives you first access to our most advanced models and capabilities — starting with Gemini Ultra.


Try Gemini Pro in Bard

Before bringing it to the public, we ran Gemini Pro through a number of industry-standard benchmarks. In six out of eight benchmarks, Gemini Pro outperformed GPT-3.5, including in MMLU (Massive Multitask Language Understanding), one of the key leading standards for measuring large AI models, and GSM8K, which measures grade school math reasoning.

On top of that, we’ve specifically tuned Gemini Pro in Bard to be far more capable at things like understanding, summarizing, reasoning, coding and planning. And we’re seeing great results: In blind evaluations with our third-party raters, Bard is now the most preferred free chatbot compared to leading alternatives.

We also teamed up with YouTuber and educator Mark Rober to put Bard with Gemini Pro to the ultimate test: crafting the most accurate paper airplane. Watch how Bard helped take the creative process to new heights.

Mark Rober takes Bard with Gemini Pro for a test flight.


You can try out Bard with Gemini Pro today for text-based prompts, with support for other modalities coming soon. It will be available in English in more than 170 countries and territories to start, and come to more languages and places, like Europe, in the near future.


Look out for Gemini Ultra in an advanced version of Bard early next year

Gemini Ultra is our largest and most capable model, designed for highly complex tasks and built to quickly understand and act on different types of information — including text, images, audio, video and code.

One of the first ways you’ll be able to try Gemini Ultra is through Bard Advanced, a new, cutting-edge AI experience in Bard that gives you access to our best models and capabilities. We’re currently completing extensive safety checks and will launch a trusted tester program soon before opening Bard Advanced up to more people early next year.

This aligns with the bold and responsible approach we’ve taken since Bard launched. We’ve built safety into Bard based on our AI Principles, including adding contextual help, like Bard’s “Google it” button to more easily double-check its answers. And as we continue to fine-tune Bard, your feedback will help us improve.

With Gemini, we’re one step closer to our vision of making Bard the best AI collaborator in the world. We’re excited to keep bringing the latest advancements into Bard, and to see how you use it to create, learn and explore. Try Bard with Gemini Pro today.
 

bnew

State-of-the-art performance

We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.

Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.

A chart showing Gemini Ultra’s performance on common text benchmarks, compared to GPT-4 (API numbers calculated where reported numbers were missing).


Gemini surpasses state-of-the-art performance on a range of benchmarks including text and coding.

Gemini Ultra also achieves a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains requiring deliberate reasoning.

With the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models, without assistance from optical character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini’s native multimodality and indicate early signs of Gemini's more complex reasoning abilities.

See more details in our Gemini technical report.

A chart showing Gemini Ultra’s performance on multimodal benchmarks compared to GPT-4V, with previous SOTA models listed in places where capabilities are not supported in GPT-4V.


Gemini surpasses state-of-the-art performance on a range of multimodal benchmarks.


Next-generation capabilities

Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning.

We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state of the art in nearly every domain.

Learn more about Gemini’s capabilities and see how it works.



Sophisticated reasoning

Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. This makes it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data.

Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.



Gemini unlocks new scientific insights



Understanding text, images, audio and more

Gemini 1.0 was trained to recognize and understand text, images, audio and more at the same time, so it better understands nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects like math and physics.



Gemini explains reasoning in math and physics



Advanced coding

Our first version of Gemini can understand, explain and generate high-quality code in the world’s most popular programming languages, like Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world.

Gemini Ultra excels in several coding benchmarks, including HumanEval, an important industry-standard for evaluating performance on coding tasks, and Natural2Code, our internal held-out dataset, which uses author-generated sources instead of web-based information.
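For a sense of how a benchmark like HumanEval is scored: each task gives the model a function signature plus a docstring, and the generated completion passes or fails based on hidden unit tests. Below is a minimal sketch of that pass/fail check; the task, completion and tests are invented for illustration, not taken from the actual dataset.

[CODE=python]
# Sketch of a HumanEval-style check: the model sees a signature + docstring,
# returns a function body, and the completion is graded by running unit tests.
# The task, completion and tests below are invented placeholders.

PROMPT = '''def running_max(nums):
    """Return a list where element i is the maximum of nums[:i+1]."""
'''

# Pretend this string came back from the model.
COMPLETION = '''    result, best = [], float("-inf")
    for n in nums:
        best = max(best, n)
        result.append(best)
    return result
'''

def check(candidate):
    # Hidden unit tests the candidate must pass.
    assert candidate([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
    assert candidate([]) == []

namespace = {}
exec(PROMPT + COMPLETION, namespace)  # define the completed function
check(namespace["running_max"])       # raises AssertionError on failure
print("passed")
[/CODE]

Metrics like pass@1 are then simply the fraction of tasks whose completions pass their tests.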

Gemini can also be used as the engine for more advanced coding systems. Two years ago we presented AlphaCode, the first AI code generation system to reach a competitive level of performance in programming competitions.

Using a specialized version of Gemini, we created a more advanced code generation system, AlphaCode 2, which excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science.



Gemini excels at coding and competitive programming

When evaluated on the same platform as the original AlphaCode, AlphaCode 2 shows massive improvements, solving nearly twice as many problems, and we estimate that it performs better than 85% of competition participants — up from nearly 50% for AlphaCode. When programmers collaborate with AlphaCode 2 by defining certain properties for the code samples to follow, it performs even better.

We’re excited for programmers to increasingly use highly capable AI models as collaborative tools that can help them reason about the problems, propose code designs and assist with implementation — so they can release apps and design better services, faster.

See more details in our AlphaCode 2 technical report.
 

IIVI
Insane:


Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities, the company said in a blog post Wednesday. It can supposedly understand nuance and reasoning in complex subjects.

 

bnew


Google’s Demo For ChatGPT Rival Criticized By Some Employees

Google’s newly unveiled AI technology doesn’t work quite as well as some people assumed, but the company says more updates are coming in the new year.

Sundar Pichai during the Google I/O Developers Conference in Mountain View, California, in May. Photographer: David Paul Morris/Bloomberg

By Shirin Ghaffary and Davey Alba
December 7, 2023 at 5:04 PM EST

Google stunned viewers this week with a video demo for its new ChatGPT rival. In one case, however, the technology doesn’t work quite as well as people assumed. But first…

Three things to know:

• EU representatives remain divided on AI rules after nearly 24 hours of debate

• A popular provider of AI-powered drive-thru systems relies heavily on humans to review orders

• Air Space Intelligence, an AI startup for aerospace, was valued at about $300 million in new financing

Google’s duck problem

When Google unveiled Gemini, its long-awaited answer to ChatGPT, perhaps the most jaw-dropping use case involved a duck. In a pre-recorded video demo shared on Wednesday, a disembodied hand is shown drawing the animal. The AI system appears to analyze it in real-time as it’s drawn and responds with a human-sounding voice in conversation with the user.


Google CEO Sundar Pichai promoted the video on X, writing, “Best way to understand Gemini’s underlying amazing capabilities is to see them in action, take a look ⬇️.” Others on X said the demo was “mind-blowing” and “unreal.”

But the technology doesn’t quite work as well as people assumed, as many were quick to point out. Right now, Gemini doesn't say its responses out loud, and you probably can’t expect its responses to be as polished as they appear in the video. Some Google employees are also calling out those discrepancies internally.

One Googler told Bloomberg that, in their view, the video paints an unrealistic picture of how easy it is to coax impressive results out of Gemini. Another staffer said they weren’t too surprised by the demo because they’re used to some level of marketing hype in how the company publicly positions its products. (Of course, all companies do.) "I think most employees who've played with any LLM technology know to take all of this with a grain of salt,” said the employee, referring to the acronym for large language models, which power AI chatbots. These people asked not to be identified for fear of professional repercussions.

“Our Hands on with Gemini demo video shows real outputs from Gemini. We created the demo by capturing footage in order to test Gemini’s capabilities on a wide range of challenges,” Google said in a statement. “Then we prompted Gemini using still image frames from the footage, and prompting via text.”

To its credit, Google did disclose that what’s shown in the video is not exactly how Gemini works in practice. “For the purposes of this demo, latency has been reduced and Gemini’s outputs have been shortened for brevity,” a description of the demo uploaded to YouTube reads. In other words, the video shows a shorter version of Gemini’s original responses and the AI system took longer to come up with them. Google told Bloomberg that individual words in Gemini’s responses were not changed and the voiceover captured excerpts from actual text prompting of Gemini.

Eli Collins, vice-president of product at Google DeepMind, told Bloomberg the duck-drawing demo was still a research-level capability and not in Google’s actual products, at least for now.

Gemini, released Wednesday, is the result of Google working throughout this year to catch up to OpenAI’s ChatGPT and regain its position as the undisputed leader in the AI industry. But the duck demo highlights the gap between the promise of Google’s AI technology and what users can experience right now.

Google said Gemini is its largest, most capable and flexible AI model to date, replacing PaLM 2, released in May. The company said Gemini exceeds leading AI models in 30 out of 32 benchmarks testing for reasoning, math, language and other metrics. It specifically beats GPT-4, one of OpenAI’s most recent AI models, in seven out of eight of those benchmarks, according to Google, although a few of those margins are slim. Gemini is also multimodal, which means it can understand video, images and code, setting it apart from GPT-4, which only accepts images and text as input.

“It’s a new era for us,” Collins said in an interview after the event. “We’re breaking ground from a research perspective. This is V1. It’s just the beginning.”

Google is releasing Gemini in a tiered rollout. Gemini Ultra, the most capable version and the one that the company says outperforms GPT-4 in most tests, won’t be released until early next year. Other features, like those demoed in the duck video, remain in development.

Internally, some Googlers have been discussing whether showing the video without a prominent disclosure could be misleading to the public. In an internal company-wide forum, one Googler shared a meme implying the duck video was deceptively edited. Another meme showed a cartoon of Homer Simpson proudly standing upright in his underwear, with the caption: “Gemini demo prompts.” It was contrasted with a less flattering picture of Homer in the same position from behind, with his loose skin bunched up. The caption: “the real prompts.”

Another Googler said in a comment, “I guess the video creators valued the ‘storytelling’ aspect more.”



ChatGPT vs. Gemini: Hands on

For now, users can play around with the medium-tier version of Gemini in Google’s free chatbot, Bard. The company said this iteration outperformed the comparable version of OpenAI’s GPT model (GPT 3.5) in six out of eight industry benchmark tests.

In our own limited testing with the new-and-revamped Bard, we found it mostly to be on par with or better than ChatGPT 3.5, and in some ways better than the old Bard. However, it’s still unreliable on some tasks.

Out of seven SAT math and reasoning questions we prompted Bard with, it correctly answered four, incorrectly answered two and said it didn’t know the answer for one. It also answered one out of three reading comprehension questions correctly. When we tested GPT 3.5, it yielded similar results, but it was able to answer one question that stumped Bard.

Bard, like all large language models, still hallucinates or provides incorrect information at times. When we asked Bard, for example, to name what AI model it runs on, it incorrectly told me PaLM 2, the previous version it used.

On some planning-oriented tasks, Bard’s capabilities did seem like a clear improvement over the previous iteration of the product and compared to ChatGPT. When asked to plan a girls’ trip to Miami, for example, Bard gave me a useful day-by-day breakdown separated into morning, afternoon and evening itineraries. For the first day, it started with a “delicious Cuban breakfast” at a local restaurant, a boat tour of Biscayne Bay and a night out in South Beach. When I gave the same prompt to ChatGPT 3.5, the answers were longer and less specific.

To test out Bard’s creativity, we asked it to write a poem about the recent boardroom chaos at OpenAI. It came up with some brooding lines, including: “OpenAI, in turmoil's grip/Saw visions shattered, alliances split.” GPT 3.5’s poem didn’t quite capture the mood as well because it only has access to online information through early 2022. Those paying for ChatGPT 4, however, can get real-time information, and its poetry was more on topic: “Sam Altman, a name, in headlines cast/A leader in question, a future vast.”

In our interview, DeepMind’s Collins said Bard is “one of the best free chatbots” in the world now with the Gemini upgrade. Based on our limited testing, he may be right.

Got a question about AI? Email me, Shirin Ghaffary, and I'll try to answer yours in a future edition of this newsletter.
 

bnew

Digging deeper into Gemini's MMLU beat: Gemini doesn't really beat GPT-4 on this key benchmark.

The Gemini MMLU win comes specifically under the CoT@32 setting. GPT-4 still beats Gemini on the standard 5-shot evaluation: 86.4% vs. 83.7%.

5-shot is the standard way to evaluate this benchmark. You prepend 5 examples in the prompt.
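To make "5-shot" concrete, here's a rough sketch of how such a prompt gets assembled; the example question and formatting details are invented placeholders, not real MMLU items.

[CODE=python]
# Rough sketch of building a 5-shot MMLU-style prompt: five worked examples are
# prepended, then the test question is appended with the answer left blank and
# the model's predicted letter is compared to the key. Questions are invented.

FEW_SHOT = [
    ("Which planet is closest to the Sun?",
     ["Venus", "Mercury", "Earth", "Mars"], "B"),
    # ...four more (question, choices, answer) tuples in the real setup
]

def format_item(question, choices, answer=""):
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)

def build_prompt(test_question, test_choices):
    shots = [format_item(q, c, a) for q, c, a in FEW_SHOT]
    shots.append(format_item(test_question, test_choices))  # answer left blank
    return "\n\n".join(shots)

print(build_prompt("What is the derivative of x**2?", ["x", "2x", "x**2", "2"]))
[/CODE]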

Google has introduced a different methodology around CoT@32 to claim it's better than GPT-4, and CoT@32 only comes out ahead when you add in "uncertainty routing."

Need to dig into this more, but it seems like a method that tunes a consensus cutoff to decide when to use the majority vote across samples versus falling back to the greedy, maximum-likelihood answer.
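If that reading is right, the selection step over 32 sampled chain-of-thought answers would look roughly like the sketch below; the threshold value and names are placeholders, not Google's actual implementation.

[CODE=python]
# Sketch of uncertainty-routed selection as described for CoT@32: use the
# majority answer across 32 chain-of-thought samples only if its share of the
# votes clears a consensus threshold (tuned on a validation set); otherwise
# fall back to the single greedy, max-likelihood answer. Names are placeholders.

from collections import Counter

def uncertainty_routed_answer(sampled_answers, greedy_answer, threshold=0.6):
    counts = Counter(sampled_answers)                 # votes per distinct answer
    top_answer, top_votes = counts.most_common(1)[0]
    consensus = top_votes / len(sampled_answers)
    # High consensus -> trust the majority; low consensus -> greedy fallback.
    return top_answer if consensus >= threshold else greedy_answer

# Example: 32 samples that mostly agree on "B", while the greedy answer is "C".
samples = ["B"] * 24 + ["C"] * 5 + ["A"] * 3
print(uncertainty_routed_answer(samples, greedy_answer="C"))  # -> "B"
[/CODE]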

People don't use LLMs this way in the real world, so I suspect GPT-4 is still better than Gemini.

FWIW, MMLU is a very important benchmark in LLM performance.

This is exactly why it's important for the scientific community to push for an API endpoint or model weights rather than a blog post, where benchmarks can be engineered to showcase your favorite LLM.

TLDR: We can't really say too much about Gemini Ultra until they actually release it.
 


bnew


Really happy to see the interest around our “Hands-on with Gemini” video. In our developer blog yesterday, we broke down how Gemini was used to create it. https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html

We gave Gemini sequences of different modalities — image and text in this case — and had it respond by predicting what might come next. Devs can try similar things when access to Pro opens on 12/13 🚀. The knitting demo used Ultra⚡

All the user prompts and outputs in the video are real, shortened for brevity. The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.

When you’re building an app, you can get similar results (there’s always some variability with LLMs) by prompting Gemini with an instruction that allows the user to "configure" the behavior of the model, like inputting “you are an expert in science …” before a user can engage in the same kind of back and forth dialogue. Here’s a clip of what this looks like in AI Studio with Gemini Pro. We’ve come a long way since Flamingo 🦩 & PALI, looking forward to seeing what people build with it!
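As a rough illustration of that pattern (not the code behind the demo itself), an interleaved image-and-text prompt with a behavior-setting instruction via the google-generativeai Python SDK could look like the sketch below; the API key, image path, model name and prompt wording are placeholders.

[CODE=python]
# Sketch of interleaved image + text prompting in the style described above,
# using the google-generativeai SDK (pip install google-generativeai pillow).
# The API key, image file, model name and prompt wording are placeholders.

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-pro-vision")   # multimodal Gemini model
frame = Image.open("duck_drawing_frame.png")         # a still frame, as in the demo

# The first text part "configures" the model's behavior; then an image and a
# follow-up question are interleaved, and the model predicts what comes next.
response = model.generate_content([
    "You are an expert in science. Describe what you see and reason step by step.",
    frame,
    "What is being drawn here, and what material would help it float?",
])
print(response.text)
[/CODE]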
 