Bard gets its biggest upgrade yet with Gemini {Google A.I / LLM}

newarkhiphop

Moderator
Staff member
Supporter
Joined
Apr 30, 2012
Messages
37,684
Reputation
9,997
Daps
123,891
you've tested them extensively?
I even have Copilot, actually. AIs have basically replaced smart assistants and Google search for me full time, and that includes the time I use them for work. ChatGPT (ChatGPT Plus specifically, which I pay for monthly) is more accurate, gives less robotic answers, and adapts better through conversations, again by a mile.

I won't even get started on the difference in image and video creation, or the mobile apps themselves
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,351
Reputation
8,496
Daps
160,049


Google’s Whisk AI generator will ‘remix’ the pictures you plug in



Whisk is Google’s ‘fun’ AI experiment that uses images for prompts and doesn’t need words.


By Jay Peters, a news editor who writes about technology, video games, and virtual worlds. He’s submitted several accepted emoji proposals to the Unicode Consortium.
Dec 16, 2024, 1:12 PM EST

A photo of a green bear from Whisk.


An AI-generated image I made in Whisk using Google’s suggested images as prompts. Image: Google via Whisk

Google has announced a new AI tool called Whisk that lets you generate images using other images as prompts instead of requiring a long text prompt.

With Whisk, you can offer images to suggest what you’d like as the subject, the scene, and the style of your AI-generated image, and you can prompt Whisk with multiple images for each of those three things. (If you want, you can fill in text prompts, too.) If you don’t have images on hand, you can click a dice icon to have Google fill in some images for the prompts (though those images also appear to be AI-generated). You can also enter some text into a text box at the end of the process if you want to add extra detail about the image you’re looking for, but it’s not required.

Whisk will then generate images and a text prompt for each image. You can favorite or download the image if you’re happy with the results, or you can refine an image by entering more text into the text box or clicking the image and editing the text prompt.

A screenshot of Google’s Whisk tool.


A screenshot of Whisk. I clicked the dice to generate a subject, scene, and style. I swapped out the auto-generated scene by entering a text prompt. Whisk created the first two images, which I iterated on by asking Whisk to add some steam around the subject (because it’s a fire being in water), resulting in the next two images. Screenshot by Jay Peters / The Verge

In a blog post, Google stresses that Whisk is designed to be for “rapid visual exploration, not pixel-perfect edits.” The company also says that Whisk may “miss the mark,” which is why it lets you edit the underlying prompts.

In the few minutes I’ve used the tool while writing this story, it’s been entertaining to tinker with. Images take a few seconds to generate, which is annoying, and while the images have been a little strange, everything I’ve generated has been fun to iterate on.

Google says Whisk uses the “latest” iteration of its Imagen 3 image generation model, which it announced today. Google also introduced Veo 2, the next version of its video generation model, which the company says has an understanding of “the unique language of cinematography” and hallucinates things like extra fingers “less frequently” than other models (one of those other models is probably OpenAI’s Sora). Veo 2 is coming first to Google’s VideoFX, which you can join the Google Labs waitlist for, and it will expand to YouTube Shorts and “other products” sometime next year.







1/2
@artl_intel
Google’s AI tools updates:

🎥 Veo 2 generates cinematic 4K videos with stunning realism.

🖼️ Imagen 3 creates high-quality, detailed images in diverse styles.

🛠️ Whisk lets you remix visuals with image-based prompts.



https://video.twimg.com/amplify_video/1869204100442984450/vid/avc1/886x672/tR6HAG7KjhSN58nN.mp4

2/2
@artl_intel
State-of-the-art video and image generation with Veo 2 and Imagen 3




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196




1/2
@MMward9
Google's new AI tool, Whisk, introduces a novel approach to image generation by allowing users to create visuals using existing images as prompts, offering a more intuitive alternative to traditional text-based methods.





2/2
@MMward9
Learn More about Google Whisk with Perplexity
https://www.perplexity.ai/page/google-s-whisk-image-tool-Kb4z9s3WQqKrEbsrepasbQ











1/14
@Google
📣 Today we’re making updates to our video and image generation models, with Veo 2 and Imagen 3, plus our newest experiment in gen AI, Whisk.

Learn more → State-of-the-art video and image generation with Veo 2 and Imagen 3





2/14
@Google
Veo 2, our state-of-the-art video generation model, has:
- A better understanding of real-world physics & nuances of movement
- The ability to understand language of cinematography, like lens type & effects
- Resolution up to 4K



https://video.twimg.com/amplify_video/1868706783826452480/vid/avc1/1080x1080/nPWF2JQQQMgQ6bma.mp4

3/14
@Google
Imagen 3, our highest quality image generation model, is now even better with:
- Brighter, better composed images
- More diverse art styles w/ greater accuracy
- Richer details and textures

Imagen 3 updates are starting to roll out to 100 countries → ImageFX





4/14
@Google
We’re also launching our newest experiment in generative AI: Whisk. Instead of generating images with long, detailed text prompts, Whisk lets you prompt with images. Simply drag in images, and start creating → Whisk: Visualize and remix ideas using images and AI





5/14
@Google
Whisk lets you input images for the subject, one for the scene & another image for the style. Then, you can remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.
Try it out and let us know what you think ↓ LABS.GOOGLE



6/14
@caiao23
Cowboy Bebop vibes, loved it



7/14
@maxinnerly
What are you waiting for? Go RUUUUUUNNNNNNNNNNNN!!!!



8/14
@schachin
Oh great more devaluation of real artists that you stole from to make more useless things



9/14
@GHMonroe
Google has ZERO support. If you FIND a phone number, their 2nd level of defense against customers is an automated operator that will send you into an infinite loop of non-support with the intention of tiring you out and making you give up.



10/14
@AbrahamRuiz913
What's the name of this piece of art? @Google



11/14
@hadiths_en_fr




12/14
@privateman31206


[Quoted tweet]
(DNS over HTTPS and DNS over TLS) — when either is used, it gives an SSL certificate error.




13/14
@CardanoMentor
In love with you guys



14/14
@Jeekaloo
wow this is some impressive stuff, love the direction this is heading in for easy customization and editing




 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,351
Reputation
8,496
Daps
160,049










1/11
@OfficialLoganK
Just when you thought it was over... we’re introducing Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts.

The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more 🧵



2/11
@OfficialLoganK
It’s still an early version, but check out how the model handles a challenging puzzle involving both visual and textual clues: (2/3)



https://video.twimg.com/ext_tw_video/1869787455328493568/pu/vid/avc1/1386x720/lwZarTeLMW9qBsRM.mp4

3/11
@OfficialLoganK
Try it out today in Google AI Studio and the Gemini API. This is just the first step in our reasoning journey, excited to see what you all think!




4/11
@SteveMoraco
y'all already crushed openai tbh congrats lol

but pls we just need a unified app/interface so people actually know where to go to use Veo 2 etc



5/11
@OfficialLoganK
Google AI Studio soon : ) hang tight



6/11
@MFrancis107
Only a 32k context window? 🥲



7/11
@OfficialLoganK
Yeah, lots of limitations right now, but we will fast follow in the new year with longer context, tool support, etc. Just had a pretty busy last few weeks : )



8/11
@riderjharris
Will it have internet access?



9/11
@OfficialLoganK
Not right now, tools are disabled at the moment, but we will enable them in the new year



10/11
@thibaudz
will it be available on ‎Gemini - chat to supercharge your ideas ?



11/11
@daniel_nguyenx
Wow. Google actually ships.






1/10
@karpathy
The new Gemini 2.0 Flash Thinking model (Gemini version of GPT o1 that takes a while to think before responding) is very nice and fast and now available to try on Google AI Studio 🧑‍🍳👏.

The prominent and pleasant surprise here is that unlike o1 the reasoning traces of the model are shown. As a user I personally really like this because the reasoning itself is interesting to see and read - the models actively think through different possibilities, ideas, debate themselves, etc., it's part of the value add. The case against showing these is typically a concern of someone collecting the reasoning traces and training to imitate them on top of a different base model, to gain reasoning ability possibly and to some extent.

[Quoted tweet]
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts.

Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning.

And we see promising results when we increase inference time computation!


2/10
@Yuchenj_UW
Google is finally joining/leading the AI game



3/10
@topynate
Would be nice if the end user could intervene on the trace to get it back on track when it goes astray.



4/10
@chrislatorres
it's so cool that they're exposing the thoughts. what's the point of hiding it...



5/10
@EverydayAI_
pretty big praise here, Andrej.

Is that transparency enough to make you use Flash Thinking over o1, though?



6/10
@Neuralithic
Yeah. This makes me think Deepmind couldn’t give a shyt, and has way better things internally than “long cot”. Or they are just perfectly happy with people potentially training off the reasoning output.

Either way, Google mogs OAI again.



7/10
@aryanagxl
It gives a good idea to the prompter on where the model goes wrong. Reduces prompting effort



8/10
@FlamurGoxhuli
I wonder if diffusion models will be more effective at reasoning if many more generations can be made even with lower base accuracy selecting the best could make up for this.



9/10
@TrulyADog
@karpathy Flash thinking? My man, while Gemini's still learning to walk, we're out here running marathons. Though I gotta admit, those reasoning traces looking cleaner than my trading history. Just don't tell them about our secret sauce on Base.



10/10
@snoop71323
Wow













1/11
@JeffDean
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts.

Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning.

And we see promising results when we increase inference time computation!



2/11
@JeffDean
Want to see Gemini 2.0 Flash Thinking in action? Check out this demo where the model solves a physics problem and explains its reasoning.



https://video.twimg.com/ext_tw_video/1869789955410776064/pu/vid/avc1/1386x720/OHgLMo5B7tCA0LA4.mp4

3/11
@JeffDean
There’s more coming, but we’d love to see what you can do with this model and welcome feedback! You can access it starting today via the Gemini API in Google AI Studio and Vertex AI.

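For anyone who wants to poke at it from code rather than the AI Studio UI: here's a minimal sketch of what a Gemini API request to the new model looks like. The model ID `gemini-2.0-flash-thinking-exp` and the endpoint shape are assumptions based on how the Gemini REST API was announced at launch, and the API key is a placeholder you'd generate in AI Studio, so double-check the current values before relying on this.

```python
import json

# Assumed experimental model ID and standard Gemini REST endpoint shape;
# both may change, so check Google AI Studio for the current values.
MODEL = "gemini-2.0-flash-thinking-exp"
API_KEY = "YOUR_API_KEY"  # placeholder; create one in Google AI Studio

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent?key={API_KEY}"
)

# generateContent takes a list of "contents", each made of text "parts".
payload = {
    "contents": [
        {"parts": [{"text": "A bat and a ball cost $1.10 in total. The bat "
                            "costs $1.00 more than the ball. Ball price?"}]}
    ]
}

body = json.dumps(payload)
print(url)
print(body)
# To actually send it, POST `body` to `url` with
# Content-Type: application/json (e.g. via urllib.request or requests).
```

With a thinking model, the response's candidate parts include the visible reasoning trace alongside the final answer, which is the whole point of this release.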



4/11
@JeffDean
Another good example:

[Quoted tweet]
Curious how it works? Check out this demo where the model solves a tricky probability problem.


https://video.twimg.com/ext_tw_video/1869790045055717376/pu/vid/avc1/1280x720/bfEPOiSD0GgXbexd.mp4

5/11
@JeffDean
And another:

[Quoted tweet]
It’s still an early version, but check out how the model handles a challenging puzzle involving both visual and textual clues: (2/3)


https://video.twimg.com/ext_tw_video/1869787455328493568/pu/vid/avc1/1386x720/lwZarTeLMW9qBsRM.mp4

6/11
@JeffDean
This model is performing pretty well on the lmsys arena.

[Quoted tweet]
Gemini-2.0-Flash-Thinking #1 across all categories!




7/11
@LeeLeepenkman
Excited to try the new thoughtful friend and great decision to open the thought process up so we can learn how to better prompt it much faster, sometimes it'll probably be clear where it went wrong in the thought process so we can fix prompting eg provide that information so it doesn't go off the rails



8/11
@jconorgrogan
this is excellent! Was waiting for when a large shop would incorporate some of the best chain-of-thought flows. One suggestion for you is to think about incorporating new context windows for aspects of the CoT eg GitHub - jconorgrogan/RecursiveLearningAI: Really quick-and-dirty example of AI recursive learning



9/11
@AtaeiMe
Any benchmarks to share?



10/11
@ste_bau
Fails strawberry test unfortunately





11/11
@EverydayAI_
dude jeff some of us had things to do today.

lol




 

CodeKansas

Superstar
Joined
Dec 5, 2017
Messages
6,142
Reputation
1,352
Daps
24,521
I might try that whisk. I'm having fun with Imagen 3 bringing some of my characters to life. It's pretty cool since I can't draw for shyt and didn't have the $ to pay somebody.
 