bnew

Veteran
Joined
Nov 1, 2015
Messages
51,796
Reputation
7,926
Daps
148,647


Midjourney has competition — I got access to Google Imagen 3 and it is impressive​

Features

By Ryan Morrison
last updated August 2, 2024

Available in ImageFX


ImageFX

(Image credit: ImageFX)



Imagen 3 is a text-to-image artificial intelligence model built by Google's advanced AI lab DeepMind. It was announced at Google I/O and is finally rolling out to users.

The model is currently only available through the Google AI Test Kitchen experiment ImageFX and only to a small group of “trusted users” but that pool is being expanded regularly.

With Imagen 3, Google promises better detail, richer lighting and fewer artifacts than previous generations. It also has better prompt understanding and text rendering.

ImageFX is available for any Google user in the U.S., Kenya, New Zealand and Australia. I’ve been given access to Imagen 3 and created a series of prompts to put it to the test.


Creating Imagen 3 prompts​


Google DeepMind promises higher-quality images across a range of styles including photorealism, oil paintings and graphic art. It can also understand natural language prompts and complex camera angles.

I fed all this into Claude and had it come up with a bullet list of promised features. I then refined each bullet into a prompt to cover as many areas as possible. The one I’m most excited for is its ability to accurately render text on an image — something few do very well.

1. The wildcard (I’m feeling lucky)​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

The first prompt is one that ImageFX generated on its own. It automatically suggests an idea and you hit ‘tab’ to see the prompt in full and either adapt or use it to make an image.

This is also a good way to test one of the most powerful features of ImageFX — its chips. These turn keywords or phrases into menu items where you can quickly adapt elements of an image.

It offered me: “A macro photograph of a colorful tiny gnome riding a snail through a thick green forest, magical, fantasy.” Once generated, you can edit any single element of an image with inpainting; this generates four new versions but only changes the area you selected.

I love the way it rendered the background and captured the concept of a macro photograph. It was also incredibly easy to adapt the color of the hat.

2. Dewdrop Web Macro​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

This prompt aims to test Imagen 3's ability to render microscopic details and complex light interactions in a natural setting. It is similar to the first test but with an additional degree of complexity in the foreground: the dewdrop.

Prompt: "A macro photograph of a dewdrop on a spider's web, capturing the intricate details of the web and the refraction of light through the water droplet. The background should be a soft focus of a lush green forest."

As an arachnophobe, I was worried it would generate a spider but it followed the prompt well enough to just show a portion of the web.

3. Hummingbird Style Contrast​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

Here the aim is to test the model's versatility in generating contrasting artistic styles within a single image. I initially used the prompt: "Create a split-screen image: on the left, a photorealistic close-up of a hummingbird feeding from a flower; on the right, the same scene reimagined as a vibrant, stylized oil painting in the style of Van Gogh."

This would have worked with Midjourney or similar tools, but not with Google. I had to revise this prompt as ImageFX won’t create work in the style of a named artist, even one whose work is long out of copyright.

So I used: “Create a split-screen image: on the left, a photorealistic close-up of a hummingbird feeding from a flower; on the right, the same scene reimagined as a vibrant, stylized painting with bold, swirling brushstrokes, intense colors, and a sense of movement and emotion in every element. The sky should have a turbulent, dream-like quality with exaggerated stars or swirls.”

This is a style I plan to experiment with more as it looks stunning. I'd have adapted the background on one side to better match the 'Van Gogh' style but otherwise it was what I asked for and did a good job at integrating the conflicting styles.
 


4. Steampunk Market Scene​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

With this prompt, the aim is to challenge Imagen 3's ability to compose a complex, detailed scene with multiple elements and specific lighting conditions. I gave it some complex elements and descriptions to see how many it would produce.

Prompt: "A bustling steampunk-themed marketplace at dusk. In the foreground, a merchant is demonstrating a brass clockwork automaton to amazed onlookers. The background should feature airships docking at floating platforms, with warm lantern light illuminating the scene."

The first of the four images it generated exactly matched the prompt, and the lighting is what you'd expect, suggesting Imagen has a good understanding of the real world.

5. Textured Reading Nook​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

Generating accurate or at least compelling textures can be a challenge for models, sometimes resulting in a plastic-style effect. Here we test the model's proficiency in accurately rendering a variety of textures and materials.

Prompt: "A cozy reading nook with a plush velvet armchair, a chunky knit blanket draped over it, and a weathered leather-bound book on the seat. Next to it, a rough-hewn wooden side table holds a delicate porcelain teacup with intricate floral patterns."

Not much to say about this beyond the fact it looks great. What I loved about this one were the options I was offered in 'chips': it let me easily swap 'cozy' for 'spacious', 'airy' or 'bright'. I could even change the reading nook to a study, library or living room.

Obviously, you can just rewrite the whole thing, but these are ideas that work as subtle changes to fit the style.

6. Underwater Eclipse Diorama​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

The idea behind this prompt was to test Imagen 3's ability to interpret and execute a long, detailed prompt with multiple complex elements and lighting effects.

Prompt: "An underwater scene of a vibrant coral reef during a solar eclipse. The foreground shows diverse marine life reacting to the dimming light, including bioluminescent creatures beginning to glow. In the background, the eclipsed sun is visible through the water's surface, creating eerie light rays that illuminate particles floating in the water."

This is probably the worst of the outputs. It looks fine but the solar eclipse feels out of place and the texture feels 'aquarium' rather than ocean.

7. Lunar Resort Poster​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

The last few tests all target the model's improved text rendering capabilities. This one asks for a poster and requires Imagen 3 to generate an image within a stylized graphic design context.

Prompt: "Design a vintage-style travel poster for a fictional lunar resort. The poster should feature retro-futuristic art deco styling with the text 'Visit Luna Luxe: Your Gateway to the Stars' prominently displayed. Include imagery of a gleaming moon base with Earth visible in the starry sky above."

Text rendering was as good as I've seen, especially across multiple elements rather than just the headline. The style was OK but not perfect: it did fit the brief but leaned more art deco than futuristic.

8. Eco-Tech Product Launch​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

Midjourney is very good at creating product images in the real world. DALL-E is also doing that to a degree with its recent update. Here I'm asking Imagen 3 to create a modern, sleek advertisement with integrated product information.

Prompt: "Design a cutting-edge digital billboard for the launch of 'EcoCharge', a new eco-friendly wireless charging pad. The design should feature a minimalist, high-tech aesthetic with a forest green and silver color scheme. Include a 3D render of the slim, leaf-shaped device alongside the text 'EcoCharge: Power from Nature' and 'Charge your device, Save the planet - 50% more efficient'. Incorporate subtle leaf patterns and circuit board designs in the background."

It did exactly what we asked in the prompt in terms of the style, render, and design. It got the title and subhead perfectly, and even rendered most of the rest of the text, though that wasn't nearly as clear.

9. Retro Gaming Festival​


ImageFX Imagen 3

(Image credit: ImageFX Imagen 3/AI)

The final test is something I've actively used AI for — making a poster or flyer. Here we're testing its ability to handle a range of styles with multiple text elements.

Prompt: "Create a vibrant poster for 'Pixel Blast: Retro Gaming Festival'. The design should feature a collage of iconic 8-bit and 16-bit era video game characters and elements. The title 'PIXEL BLAST' should be in large, colorful pixel art font at the top. Include the text 'Retro Gaming Festival' in a chrome 80s style font below. At the bottom, add 'June 15-17 • City Arena • Tickets at pixelblast.com' in a simple, readable font. Incorporate scan lines and a CRT screen effect over the entire image."

I'd give it an 8 out of 10. It was mostly accurate, with the occasional additional word: every single word was rendered correctly, there were sometimes just too many of them.


Final thoughts​


Google Imagen 3's text rendering and realism have matched Midjourney levels. It does refuse to generate more often than Midjourney but that is understandable from a Google product.

Imagen 3 is a huge step up from Imagen 2, which was already a good model. The biggest upgrade seems to be in overall image quality. Generated pictures are better looking, with fewer artifacts and a lot more detail than I’ve seen from Imagen 2 or models from other companies.

It will be interesting to see how this works once it is rolled out to other platforms such as Gemini or built into third-party software as a developer API.

However it is finally deployed, DeepMind has done it again, delivering an impressive real-world application of advanced generative AI that is user-friendly, adaptable and powerful enough for even the pickiest of users.
 



7 best OpenAI Sora alternatives for generating AI videos​

Round-up

By Ryan Morrison
last updated August 2, 2024

Here's what you can use now


Pika Labs lip sync video

(Image credit: Pika Labs)



OpenAI’s Sora is one of the most impressive AI tools I’ve seen in years of covering the technology but only a handful of professional creatives have been given access so far and it doesn't look like it will be widely available anytime soon.

We’ve seen dozens of impressive videos from a documentary about an astronaut to a music video about watching the rain. We’ve even seen a short film about a man with a balloon head and a commercial for Toys 'R' Us.

Mira Murati, OpenAI's Chief Technology Officer, originally hinted we'd get access to Sora this year, but that seems to have slipped. The most recent update suggested OpenAI's developers were having issues making it easy to use.

The company says the focus is currently on both safety and usability, which likely includes ensuring guardrails don't allow it to replicate real people or allow it to be used in misinformation campaigns.


Alternatives to Sora already available​


While you’re waiting for Sora, there are several amazing AI video tools already available that can create a range of clips, styles and content to try. These Sora alternatives include Pika Labs, Runway and Luma Labs Dream Machine.

Sora's biggest selling points were more natural movement and longer initial clip generations, but with the arrival of Dream Machine and Runway Gen-3 some of those unique abilities have already been replicated.

There are now two categories of AI video models, which I split into first and second generation. The first generation includes models like Runway Gen-2, Pika 1.0, Haiper and any Stable Video Diffusion-based model, including those in Leonardo and NightCafe.

The main limitation of this early generation of AI video tools is duration. Most can’t do more than 3-6 seconds of consistent motion, and some struggle to go beyond 3 seconds. This, coupled with a smaller context window, makes consistency tough.

However, the second-gen models, including Runway's Gen-3 and Luma Labs Dream Machine (as well as unavailable models like Kling and Sora) have much longer initial generations, improved motion understanding and better realism.


Runway​


Runway AI video

(Image credit: Runway)

Runway is one of the biggest players in this space. Before OpenAI unveiled Sora, Runway had some of the most realistic and impressive generative video content, and it remains very impressive, with Gen-3 approaching Sora-level motion quality.

Runway was the first to launch a commercial synthetic video model and has been adding new features and improvements over the past year, including very accurate lip-syncing, a motion brush to control the animation, and voiceover.

With the launch of Gen-3 you can now create videos starting at ten seconds long. It is currently only in alpha, so some of the more advanced features, such as video-to-video and clip extension, aren't available yet — but they're coming soon. Image-to-video has already launched and it is very impressive.

In an increasingly crowded market Runway is still one of the best AI video platforms, and on top of generative content it has good collaboration tools and other image-based AI features such as upscaling and text-to-image.

Runway has a free plan with 125 credits. The standard plan is $15 per month.


Luma Labs Dream Machine​


Luma Labs Dream Machine

(Image credit: Luma Labs/Future AI)

One of the newest AI video platforms, Luma Labs released Dream Machine seemingly out of nowhere. It offers impressive levels of realism, prompt following and natural motion and has an initial 5-second video generation.

Unlike other platforms Dream Machine charges one credit per generation, making it easier to keep track of what you're spending or when you're near the limit.

It automatically improves on your prompt, ensuring better output. One of its most innovative features is keyframes: you can give it two images — a start and finish point — and tell it how to fill the gap between the two. This is perfect if you want to do a fun transition or have a character walk across the screen.

Being able to extend clips is also particularly powerful in Dream Machine as this allows for character following and fresh scenes. It continues from the final frame of your last video and you can change the motion description for each extension.

Dream Machine has a free plan with 30 generations per month. The standard tier is $30 per month and includes 150 generations.
 



Kling​


Kling AI

(Image credit: Kling AI Video)

Kling AI is a Chinese generative video product from video platform company Kuaishou. Its features include longer video generations, improved movement, better prompt following and multi-shot sequences.

Its interface is simple and easy to learn, offering image-to-video and text-to-video with automatic upscaling options and clips starting at 5 or 10 seconds.

I've used it to make multiple videos and in every case it captures human and animal motion in a way no other platform I've tried has achieved. It also seems to be able to depict emotion with both camera movement and within the characters it generates.

In the text-to-video mode you can start with a 1 second clip, use a very long description and set the orientation to widescreen, phone screen or square. It also allows you to define the camera movement and set a negative prompt.

Kling is free to use with 66 credits per day and between 5 and 10 credits per generation. There isn't a subscription plan as such but it offers memberships starting at $10 per month for 660 monthly credits.


Pika Labs​


Pika Labs lip sync video

(Image credit: Pika Labs)

Pika Labs is one of the two major players in the generative AI video space alongside Runway. Its Pika 1.0 model can create video from images, text or other video, as well as extend a video to up to 12 seconds — although the more you extend it, the worse the motion becomes.

Pika launched last year to a lot of fanfare, sharing a cartoon version of Elon Musk and an impressive inpainting ability that allows you to replace or animate a specific region of a clip.

Pika Labs offers negative prompting and fine control over the motion in the video. It also features sound effects, generated either from a text prompt or aligned to the video, as well as lip sync.

The lip syncing from Pika Labs can be added to video content. So you can have it generate a video from, say, a Midjourney photo, then animate its lips and give it voice. Or, as I did in an experiment, you can animate action figures.

I'm told Pika 2.0 is in development and they recently introduced significant upgrades to the image-to-video model, creating better overall motion and control.

Pika Labs has a free plan with 300 credits. The standard plan is $10 per month.

Leonardo and Night Cafe​


Leonardo AI

(Image credit: Leonardo AI)

Stable Video Diffusion is an open model which means it can be commercially licensed and adapted by other companies. Two of the best examples of this are from Leonardo and Night Cafe, two AI image platforms that offer a range of models including Stable Diffusion itself.

Branded as Motion by Leonardo and Animate by NightCafe, the image platforms essentially do the same thing — take an image you’ve already made with the platform and make it move. You can set the degree of motion but there are minimal options for other controls.

NightCafe's base plan is $6 per month for 100 credits.

Leonardo has a free plan with 150 creations per day. The basic plan is $10 per month.


FinalFrame​


FinalFrame

(Image credit: FinalFrame AI generated)

This is a bit of a dark horse in the AI video space with some interesting features. A relatively small bootstrapped company, FinalFrame comfortably competes in terms of quality and features with the likes of Pika Labs and Runway, building out to a “total platform.”

The name stems from the fact FinalFrame builds the next clip based on the final frame of the previous video, improving consistency across longer video generations. You can generate or import a clip, then drop it onto the timeline to create a follow-on clip or build a full production.
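This last-frame trick is also easy to replicate manually with any image-to-video tool: grab the final frame of a clip and feed it back in as the start image for the next generation. Here is a minimal sketch using OpenCV; the file names are placeholders.

```python
# Minimal sketch of the "continue from the final frame" trick FinalFrame is built around:
# extract the last frame of a clip so it can be used as the start image for the next
# generation. File names are placeholders; requires opencv-python.
import cv2

cap = cv2.VideoCapture("previous_clip.mp4")
last_frame = None
while True:
    ok, frame = cap.read()
    if not ok:          # no more frames to read
        break
    last_frame = frame  # keep overwriting until we hit the end of the clip
cap.release()

if last_frame is not None:
    # Use this image as the starting frame / image prompt for the next generation.
    cv2.imwrite("next_clip_start.png", last_frame)
```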

The startup recently added lip syncing and sound effects for certain users, including an audio track in the timeline view to add those sounds to your videos.

FinalFrame requires the purchase of credit packs which last a month. The basic plan is 20 credits for $2.99.


Haiper​


Haiper

(Image credit: Haiper AI video)

A relative newcomer with its own model, Haiper takes a slightly different approach from other AI video tools, building out an underlying model and training dataset that is better at following the prompt rather than offering fine-tuned control over the motion.

The default mode doesn't even allow you to change the motion level. It assumes the AI will understand the level of motion from the prompt, and for the most part, it works well. In a few tests, I found leaving the motion set to default worked better than any control I could set.

Haiper has now launched version 1.5 with improved realism and longer initial clips, starting at 8 seconds and extendable.

Haiper has a free plan with 10 watermarked creations per day and no commercial use. If you want commercial use and to remove watermarks you need the $30-a-month Pro plan, which includes unlimited video creations.

LTX Studio​


AI video from LTX Studio

(Image credit: LTX Studio/AI Video)

Unlike the others, this is a full generative content platform, able to create a multishot, multiscene video from a text prompt. LTX Studio has images, video, voice-over, music and sound effects; it can generate all of the above at the same time.

The layout is more like a storyboard than the usual prompt box and video player of the other platforms. When you generate video, LTX Studio lets you go in and adapt any single element, including changing the camera angle or pulling in an image to animate from an external application.

I don’t find LTX Studio handles motion as well as Runway or Stable Video, often generating unsightly blurring or warping, but those are issues the others have started to resolve and something LTX Studio owner Lightricks will tackle over time. It also doesn’t have lip sync, but that is likely to come at some point in the future.

LTX Studio has a free plan with 1 hour of generation per month and personal use. For $5 a month you get three hours but if you want commercial use it costs $175 per month and comes with 25 computing hours.
 



Runway just dropped image-to-video in Gen3 — I tried it and it changes everything​

Features

By Ryan Morrison
published July 30, 2024

Character consistency is now possible


Runway Gen-3

(Image credit: Runway/Future AI)

Runway’s Gen-3 is one of the best artificial intelligence video models currently available and it just got a lot better with the launch of the highly anticipated image-to-video feature.

While Gen-3 has a surprisingly good image generation model, making its text-to-video one of the best available, it struggles with character consistency and hyperrealism. Both problems are solved by giving it a starting image instead of just using text.

Image-to-video using Gen-3 also allows for motion or text prompts to steer how the AI model should generate the 10-second initial video, starting with the image. This can be AI-generated or a photo taken with a camera — the AI can then make it move.

Gen-3 also works with Runway’s lip-sync feature, meaning you can give it an image of a character, animate that image and then add accurate speech to the animated clip.

Why is image-to-video significant?​



Until AI video tools get the same character consistency features found in tools like Leonardo, Midjourney and Ideogram, their use for longer storytelling is limited. This doesn’t just apply to people but also to environments and objects.

While you can in theory use text-to-video to create a short film, using descriptive language to get as close to consistency across frames as possible, there will always be discrepancies.

Starting with an image ensures, at least for the most part, that the generated video follows your aesthetic and keeps the same scenes and characters across multiple videos. It also means you can make use of different AI video tools and keep the same visual style.

In my own experiments, I’ve also found that when you start with an image the overall quality of both the image and the motion is better than if you just use text. The next step is for Runway to upgrade its video-to-video model to allow for motion transfer with style changes.


Putting Runway Gen-3 image-to-video to the test​

Get started with Gen-3 Alpha Image to Video. Learn how with today’s Runway Academy. pic.twitter.com/Mbw0eqOjto (July 30, 2024)


To put Runway’s Gen-3 image-to-video to the test I used Midjourney to create a character. In this case a middle-aged geek.

I then created a series of images of our geek doing different activities using the Midjourney consistent character feature. I then animated each image using Runway.

Some of the animations were made without a text prompt; others used a prompt to steer the motion, but it didn’t always make a massive difference. In the one video where I needed to work to properly steer the motion — my character playing basketball — adding a text prompt made it worse.

Runway Gen-3

(Image credit: Runway Gen-3/Future AI)

Overall, Gen-3 image-to-video worked incredibly well. Its understanding of motion was as close to realistic as I've seen, and one video, where the character is giving a talk at a conference, made me do a double take, it was so close to real.

Gen-3 is still in Alpha and there will be continual improvements before its general release. We haven't even seen video-to-video yet and it is already generating near-real video.

I love how natural the camera motion feels, and the fact it seems to have solved some of the human movement issues, especially when you start with an image.

Other models put the characters in slow motion when they move, including previous versions of Runway. This solves some of that problem.
 


1/11
100% AI web dev pipeline just became possible today:

Generate a landing page with FLUX imaging model

->

Send it to GPT to get the HTML and CSS code to build it

h/t @dannypostmaa

2/11
been building an entire RoR app from scratch using this method

This is how far I’ve gotten so far

OnlyIdeas - Share Your Ideas

3/11
But how accurate is it? When I do it, it's close but never exact.

4/11
Better WAY:

FIGMA ➡️ Export to JPEG ➡️ Send to GPT

5/11
You didn’t show the end result?

6/11
Could also be generated with v0 by Vercel

7/11
If you actually pasted the html code into an editor you’ll likely get some random gobbledygook

8/11
Non-Technical Solopreneurs after seeing this

9/11
thanks, posted 4 days ago in the quoted OP

10/11
Imagine sending this into Claude to get the HTML and CSS? It would be OP.

11/11
Sick I flux with this


GUT4aMIWYAAw2n2.jpg

GUlVCveXIAEG7MG.jpg

GUmuujhWUAAn6hw.jpg
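A rough sketch of the pipeline described in the first tweet of the thread above, assuming you have already generated a landing-page mockup with FLUX (via whatever tool you prefer) and saved it locally, and that you are using the OpenAI Python client with an API key in the environment. The model name, prompt wording and file names are illustrative assumptions, not a prescribed setup.

```python
# Sketch of the "FLUX mockup -> GPT -> HTML/CSS" pipeline from the quoted thread.
# Assumes a FLUX-generated mockup saved as mockup.png and OPENAI_API_KEY set.
import base64
from openai import OpenAI

client = OpenAI()

with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model you have access to
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Recreate this landing-page mockup as a single HTML file "
                            "with embedded CSS. Match the layout, colors and typography "
                            "as closely as possible and use placeholder images.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

# The reply is the page source; save it and open it in a browser to compare.
with open("landing_page.html", "w") as f:
    f.write(response.choices[0].message.content)
```

As the replies point out, the generated markup is usually close but not pixel-perfect, so treat it as a scaffold rather than a finished page.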
 


1/1
Researchers at MIT's CSAIL and The AI Institute have created a new algorithm called "Estimate, Extrapolate, and Situate" (EES). This algorithm helps robots adapt to different environments by enhancing their ability to learn autonomously.

The EES algorithm improves robot efficiency in settings like factories, homes, hospitals, and coffee shops by using a vision system to monitor surroundings and assist in task performance.

The EES algorithm assesses how well a robot is performing a task and decides if more practice is needed. It was tested on Boston Dynamics's Spot robot at The AI Institute, where it successfully completed tasks after a few hours of practice.

For example, the robot learned to place a ball and ring on a slanted table in about three hours and improved its toy-sweeping skills within two hours.

#MIT #algorithm #ees #robot #TechNews #AI


GUlGXaIXMAAszHe.jpg








1/8
The phrase "practice makes perfect" is great advice for humans — and also a helpful maxim for robots 🧵

Instead of requiring a human expert to guide such improvement, MIT & The AI Institute’s "Estimate, Extrapolate, and Situate" (EES) algorithm enables these machines to practice on their own, potentially helping them improve at useful tasks in factories, households, and hospitals: Helping robots practice skills independently to adapt to unfamiliar environments

2/8
EES first works w/a vision system that locates & tracks the machine’s surroundings. Then, the algorithm estimates how reliably the robot executes an action & if it should practice more.

3/8
EES forecasts how well the robot could perform the overall task if it refines that skill & finally it practices. The vision system then checks if that skill was done correctly after each attempt.

4/8
This algorithm could come in handy in places like a hospital, factory, house, or coffee shop. For example, if you wanted a robot to clean up your living room, it would need help practicing skills like sweeping.

EES could help that robot improve w/o human intervention, using only a few practice trials.

5/8
EES's knack for efficient learning was evident when implemented on Boston Dynamics’ Spot quadruped during research trials at The AI Institute.

In one demo, the robot learned how to securely place a ball and ring on a slanted table in ~3 hours.

6/8
In another, the algorithm guided the machine to improve at sweeping toys into a bin w/i about 2 hours.

Both results appear to be an upgrade from previous methods, which would have likely taken >10 hours per task.

7/8
Featured authors in article: Nishanth Kumar (@nishanthkumar23), Tom Silver (@tomssilver), Tomás Lozano-Pérez, and Leslie Pack Kaelbling
Paper: Practice Makes Perfect: Planning to Learn Skill Parameter Policies
MIT research group: @MITLIS_Lab

8/8
Is there transfer learning? Each robot’s optimisation for the same task performance may be different; how do they share experiences and optimise learning?


GUjNWxOXQAAdisD.jpg




Helping robots practice skills independently to adapt to unfamiliar environments​


A new algorithm helps robots practice skills like sweeping and placing objects, potentially helping them improve at important tasks in houses, hospitals, and factories.

Alex Shipps | MIT CSAIL
Publication Date:
August 8, 2024
Press Inquiries

The phrase “practice makes perfect” is usually reserved for humans, but it’s also a great maxim for robots newly deployed in unfamiliar environments.

Picture a robot arriving in a warehouse. It comes packaged with the skills it was trained on, like placing an object, and now it needs to pick items from a shelf it’s not familiar with. At first, the machine struggles with this, since it needs to get acquainted with its new surroundings. To improve, the robot will need to understand which skills within an overall task it needs improvement on, then specialize (or parameterize) that action.

A human onsite could program the robot to optimize its performance, but researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and The AI Institute have developed a more effective alternative. Presented at the Robotics: Science and Systems Conference last month, their “Estimate, Extrapolate, and Situate” (EES) algorithm enables these machines to practice on their own, potentially helping them improve at useful tasks in factories, households, and hospitals.

Sizing up the situation

To help robots get better at activities like sweeping floors, EES works with a vision system that locates and tracks the machine’s surroundings. Then, the algorithm estimates how reliably the robot executes an action (like sweeping) and whether it would be worthwhile to practice more. EES forecasts how well the robot could perform the overall task if it refines that particular skill, and finally, it practices. The vision system subsequently checks whether that skill was done correctly after each attempt.
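A minimal conceptual sketch of that estimate-extrapolate-situate loop, to make the structure concrete. This is an illustration only, not the researchers' implementation: the competence estimate, the task-success model and the skill names are all placeholder assumptions.

```python
# Conceptual sketch of the EES loop described above (illustrative only, not the paper's code).
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    successes: int = 0
    attempts: int = 0

    def competence(self) -> float:
        # Estimate: how reliably this skill currently succeeds (Laplace-smoothed rate).
        return (self.successes + 1) / (self.attempts + 2)

def task_success(skills: list[Skill], assume_refined: Skill | None = None) -> float:
    # Crude task model: the task succeeds only if every skill succeeds; if one skill
    # were refined through practice, assume its competence approaches 1.0.
    p = 1.0
    for s in skills:
        p *= 1.0 if s is assume_refined else s.competence()
    return p

def ees_step(skills: list[Skill], execute_and_check) -> Skill:
    # Extrapolate: predicted task success if each skill were practiced.
    # Situate: practice the most promising skill and let the "vision system"
    # (here the execute_and_check callback) label the attempt as success or failure.
    best = max(skills, key=lambda s: task_success(skills, assume_refined=s))
    succeeded = execute_and_check(best.name)
    best.attempts += 1
    best.successes += int(succeeded)
    return best

# Example: a practice loop over two placeholder skills with a simulated success check.
if __name__ == "__main__":
    import random
    skills = [Skill("sweep"), Skill("place_ring")]
    for _ in range(20):
        ees_step(skills, lambda name: random.random() < 0.6)
    print({s.name: round(s.competence(), 2) for s in skills})
```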

EES could come in handy in places like a hospital, factory, house, or coffee shop. For example, if you wanted a robot to clean up your living room, it would need help practicing skills like sweeping. According to Nishanth Kumar SM ’24 and his colleagues, though, EES could help that robot improve without human intervention, using only a few practice trials.

“Going into this project, we wondered if this specialization would be possible in a reasonable amount of samples on a real robot,” says Kumar, co-lead author of a paper describing the work, PhD student in electrical engineering and computer science, and a CSAIL affiliate. “Now, we have an algorithm that enables robots to get meaningfully better at specific skills in a reasonable amount of time with tens or hundreds of data points, an upgrade from the thousands or millions of samples that a standard reinforcement learning algorithm requires.”

See Spot sweep

EES’s knack for efficient learning was evident when implemented on Boston Dynamics’ Spot quadruped during research trials at The AI Institute. The robot, which has an arm attached to its back, completed manipulation tasks after practicing for a few hours. In one demonstration, the robot learned how to securely place a ball and ring on a slanted table in roughly three hours. In another, the algorithm guided the machine to improve at sweeping toys into a bin within about two hours. Both results appear to be an upgrade from previous frameworks, which would have likely taken more than 10 hours per task.

“We aimed to have the robot collect its own experience so it can better choose which strategies will work well in its deployment,” says co-lead author Tom Silver SM ’20, PhD ’24, an electrical engineering and computer science (EECS) alumnus and CSAIL affiliate who is now an assistant professor at Princeton University. “By focusing on what the robot knows, we sought to answer a key question: In the library of skills that the robot has, which is the one that would be most useful to practice right now?”

EES could eventually help streamline autonomous practice for robots in new deployment environments, but for now, it comes with a few limitations. For starters, they used tables that were low to the ground, which made it easier for the robot to see its objects. Kumar and Silver also 3D printed an attachable handle that made the brush easier for Spot to grab. The robot didn’t detect some items and identified objects in the wrong places, so the researchers counted those errors as failures.

Giving robots homework

The researchers note that the practice speeds from the physical experiments could be accelerated further with the help of a simulator. Instead of physically working at each skill autonomously, the robot could eventually combine real and virtual practice. They hope to make their system faster with less latency, engineering EES to overcome the imaging delays the researchers experienced. In the future, they may investigate an algorithm that reasons over sequences of practice attempts instead of planning which skills to refine.

“Enabling robots to learn on their own is both incredibly useful and extremely challenging,” says Danfei Xu, an assistant professor in the School of Interactive Computing at Georgia Tech and a research scientist at NVIDIA AI, who was not involved with this work. “In the future, home robots will be sold to all sorts of households and expected to perform a wide range of tasks. We can't possibly program everything they need to know beforehand, so it’s essential that they can learn on the job. However, letting robots loose to explore and learn without guidance can be very slow and might lead to unintended consequences. The research by Silver and his colleagues introduces an algorithm that allows robots to practice their skills autonomously in a structured way. This is a big step towards creating home robots that can continuously evolve and improve on their own.”

Silver and Kumar’s co-authors are The AI Institute researchers Stephen Proulx and Jennifer Barry, plus four CSAIL members: Northeastern University PhD student and visiting researcher Linfeng Zhao, MIT EECS PhD student Willie McClinton, and MIT EECS professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by The AI Institute, the U.S. National Science Foundation, the U.S. Air Force Office of Scientific Research, the U.S. Office of Naval Research, the U.S. Army Research Office, and MIT Quest for Intelligence, with high-performance computing resources from the MIT SuperCloud and Lincoln Laboratory Supercomputing Center.
 




1/11
These are all AI, I just don't know anymore.

2/11
🫠

3/11
we had such fun

4/11
Flux realism?

5/11
Yep

6/11
One day into Flux I'm starting to recognize it though

The faces are all averagely beautiful

I tried to generate ugly below average and it's impossible

Maybe will be possible with realism LoRa

7/11
Flawless skin all around. Not one razor bump, zit, blotchiness, uneven pores. I imagine they look more fake at a higher resolution. Skin is still an easy way to tell.

8/11
Ringless ring

9/11
this is the death of dating apps and I'm all for it

10/11
It's over

11/11
I'd love a movie where someone discovers the ai is producing pics of them when they were younger all to realize we are all in a simulation.


GUplnrUXoAAsScS.jpg

GUplqizX0AAQrvw.jpg

GUplrxwW4AAVoko.jpg

GUplv8aW0AABHH7.jpg

GUppfJrWcAABjnv.jpg

GUppfJsWoAAjAUM.jpg

GUppfJnWIAA3ZGO.jpg

GUppfJsWgAEQl4w.jpg

GUpsnrKWQAAyUU7.jpg

GUqr9IFbwAAzTkD.jpg




 



Video Deep Fake - AI NEWS - Is this TOO much?​



Olivio Sarikas

230K subscribers

20,535 views Aug 10, 2024
Video AI has dramatically improved and can do deep fakes in real time now. In this video we talk about Deep Live Cam, CG Avatar, ReSyncer and the new Runway Gen-3 Last Frame Method.

Links from my video:
Deep Live Cam AI Deep Fake: https://github.com/hacksider/Deep-Liv...
CG Avatar AI: https://twitter.com/unikoukokun/status/1821...
Flux.1 Animation: https://twitter.com/HalimAlrasihi/status/18...
ReSyncer Lip animation: https://guanjz20.github.io/projects/R...
Runway Animation: https://twitter.com/AllarHaltsonen/status/1...
Runway Explosion: https://twitter.com/notiansans/status/18204...


Chapters:
00... Intro
00... Best AI Images Discord Challenge
03... Deep Live Cam
04... CG Avatar
05... Flux 1 Video AI
06... ReSyncer
7... Gen-3 Last Frame
 


1/1
After almost indistinguishable photorealistic FLUX images, check out these lip-sync demos from ReSyncer.

We've truly arrived in the post-truth era.

ReSyncer


 


1/1
🎨✨Just read about VideoDoodles, a new system that lets artists easily create video doodles—hand-drawn animations integrated with video! It uses 3D scene-aware canvases and custom tracking to make animations look natural. Game-changer for creatives! #SIGGRAPH2023 #AI #GROK 2





1/7
VideoDoodles: a great combination of modern CV techniques and clever HCI by @emxtyu & al @adobe (+ is now open source)

project: VideoDoodles: Hand-Drawn Animations on Videos with Scene-Aware Canvases
paper: https://www-sop.inria.fr/reves/Basilic/2023/YBNWKB23/VideoDoodles.pdf
code: GitHub - adobe-research/VideoDoodles

2/7
interesting & fun paper/repo by adobe💪

adding it to http://microlaunch.net august spotlights

3/7
boop @kelin_online

4/7
Oh this one is from last year, fantastic that is now oss!

5/7
yes we have VIDEO DOODLES!

https://invidious.poast.org/watch?v=hsykpStD2yw

6/7
Haha très cool !

7/7



GUpG626XcAASWrH.jpg


VideoDoodles: Hand-Drawn Animations on Videos with Scene-Aware Canvases​

Emilie Yu Kevin Blackburn-Matzen Cuong Nguyen Oliver Wang Rubaiat Habib Kazi Adrien Bousseau
ACM Transactions on Graphics (SIGGRAPH) - 2023

Video doodles combine hand-drawn animations with video footage. Our interactive system eases the creation of this mixed media art by letting users place planar canvases in the scene which are then tracked in 3D. In this example, the inserted rainbow bridge exhibits correct perspective and occlusions, and the character’s face and arms follow the tram as it runs towards the camera.
Paper Code Supplemental webpage Video

Abstract​

We present an interactive system to ease the creation of so-called video doodles – videos on which artists insert hand-drawn animations for entertainment or educational purposes. Video doodles are challenging to create because to be convincing, the inserted drawings must appear as if they were part of the captured scene. In particular, the drawings should undergo tracking, perspective deformations and occlusions as they move with respect to the camera and to other objects in the scene – visual effects that are difficult to reproduce with existing 2D video editing software. Our system supports these effects by relying on planar canvases that users position in a 3D scene reconstructed from the video. Furthermore, we present a custom tracking algorithm that allows users to anchor canvases to static or dynamic objects in the scene, such that the canvases move and rotate to follow the position and direction of these objects. When testing our system, novices could create a variety of short animated clips in a dozen of minutes, while professionals praised its speed and ease of use compared to existing tools.
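To make the "planar canvases tracked in 3D" idea concrete, here is a small sketch of the underlying geometry: the canvas's 3D corners are projected into a frame with that frame's camera, which is what gives the inserted drawing correct perspective as the camera moves. The intrinsics, pose and canvas placement below are made-up illustrative values, not anything from the paper.

```python
# Minimal sketch of the geometric idea behind scene-aware canvases: a planar canvas
# positioned in the reconstructed 3D scene is projected into each frame with that
# frame's camera, producing the perspective deformation described above.
# Camera intrinsics, pose and canvas placement are made-up illustrative values.
import numpy as np

K = np.array([[800.0, 0.0, 640.0],    # focal lengths and principal point (pixels)
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

def project_canvas(corners_world: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project the canvas's 3D corner points (Nx3, world frame) into pixel coordinates."""
    cam = R @ corners_world.T + t.reshape(3, 1)   # world -> camera coordinates
    px = K @ cam                                  # camera -> homogeneous pixel coordinates
    return (px[:2] / px[2]).T                     # perspective divide -> Nx2 pixel positions

# A 1m x 0.5m canvas standing 4m in front of the camera, slightly rotated about the Y axis.
corners = np.array([[-0.5, -0.25, 4.0], [0.5, -0.25, 4.0],
                    [0.5, 0.25, 4.0], [-0.5, 0.25, 4.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.zeros(3)

print(project_canvas(corners, R, t))  # 2D corners used to warp the drawing into the frame
```

In the actual system the canvas pose is updated per frame by the tracking algorithm (anchored to a static or dynamic object), and the 2D drawing is warped into the frame using these projected corners.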
 




1/11
Deep-Live-Cam is trending #1 on GitHub. It enables anyone to convert a single image into a LIVE stream deepfake, instantly.

2/11
The github link states it was created to "help artists with tasks such as animating a custom character or using the character as a model for clothing etc"

GitHub - hacksider/Deep-Live-Cam: real time face swap and one-click video deepfake with only a single image (uncensored)

3/11
It’s so over

4/11
When it's a simple cellphone app, that's when we're really fukked.

5/11
whats the purpose of this shyt

6/11
The Laughing Man already knows the answer to that

7/11
This needs to be banned

8/11
@kyledunnigan

9/11
That's not Billy Butcher that's Chilli Chopper

10/11
So everything you’ll see online, everything you’ll read online is going to be fake. That’s a good way to know how to avoid things from now on.

Focus on your life instead of online

11/11
This is going to cause chaos.

Members of The Real World know what this means.







1/11
#1 trending github repo right now looks INSANE

Single image to live stream deep fake.

Look at that quality!!

It's called Deep-Live-Cam (link in replies)

2/11
GitHub - hacksider/Deep-Live-Cam: real time face swap and one-click video deepfake with only a single image (uncensored)

3/11
Some experiments - it works almost flawlessly and it's totally real-time. Took me 5 minutes to install.

4/11
Incredible

5/11
Gonna do a tutorial?

6/11
Oh yeah

7/11
I can finally not attend meetings

8/11
Chat is this real?

9/11
I hope photorealistic ai-generated video becomes common and easily accessible. If every man can claim video of them to be fake an unprecedented age of privacy would follow. Totalitarian surveillance states will become impossible. Real human witnesses will become essential again.

10/11
Everyone will live in their own little 3D Dopaverse.

Bye bye shared reality

11/11
Election season about to get interesting. lol


 


1/10
We just shipped 🧨diffusers Dreambooth LoRA training scripts for FLUX.1 [dev] 🚀
✍️ includes support for text encoder training of CLIP
💾 memory reqs are quite high, make sure to check the README

Can't wait to see some awesome flux finetunes 🤌🏻🔥

2/10
read more & train ▶️ diffusers/examples/dreambooth/README_flux.md at main · huggingface/diffusers

let's find best practices for flux dreambooth loras, share your thoughts & insights😄

3/10
@lucataco93

4/10
Paging @araminta_k

5/10
Is there a way to train using cloud compute?

6/10
how high is quite high? 🫣

7/10
What are your thoughts on DoRA?

8/10
love it , thanks y'all

9/10
Yaaay, fun! Link to training tips guide doesn't seem to be working. Also, I'm praying for some larger VRAM consumer GPUs soon.

10/10
link to "bghira's guide" for memory constraints is broken


GUhVKr1aIAAuQNM.jpg






1/9
Now that you can train FLUX.1-Dev Dreambooth LoRAs
Try running them with this Flux LoRA explorer:
lucataco/flux-dev-lora – Run with an API on Replicate

2/9
is replicate planning to roll out training flux directly on there (similar to what you guys have done for sdxl and 1.5)?

3/9
Ill see what I can do!

4/9
Tried running the trainer on A100 but out of memory 😭

5/9
I’ll take a look at the trainer tmr!

6/9
Can it train styles ?

7/9
Yes, check out this alvdansen/frosting_lane_flux style:
alvdansen/frosting_lane_flux · Hugging Face

8/9
Amazing 🤩 Does it work with your simpletuner-flux? Tried to put the http://output.zip URL into it but errors

9/9
Not yet! This LoRA Explorer works out of the box with Diffusers trained LoRAs (for now). Im still working on the SimpleTuner model


GUhVKr1aIAAuQNM.jpg

GUtsOdkWUAA_-8F.jpg

GUtsOdeXAAA9mBI.jpg
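For anyone who would rather try one of these LoRAs locally than on Replicate, a sketch along these lines should work with diffusers, which the thread above says runs these Dreambooth-trained FLUX LoRAs out of the box. The prompt and generation settings are placeholders and typical values rather than recommendations; the base model and LoRA repo ids are the ones mentioned in the threads. As the tweet warns, memory requirements are high, so expect to need a large GPU or CPU offloading.

```python
# Sketch of running a Dreambooth-trained FLUX.1 [dev] LoRA locally with diffusers.
# Prompt and settings are placeholders; FLUX.1 [dev] is large, so VRAM needs are high.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("alvdansen/frosting_lane_flux")  # any diffusers-format FLUX LoRA
pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use (requires accelerate)

image = pipe(
    prompt="a cozy reading nook in a pastel storybook illustration style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_lora_sample.png")
```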
 








1/11
Alright Ya'll

I know it's a Saturday, but I decided to release my first Flux Dev Lora.

A retrain of my "Frosting Lane" model and I am sure the styles will just keep improving.

Have fun! Link Below - Thanks again to @ostrisai for the trainer and @bfl_ml for the awesome model

2/11
alvdansen/frosting_lane_flux · Hugging Face

3/11


4/11
Also you can literally prompt right on the model page, yay!

5/11


6/11
Hello, have we figured out Lora trainings now without any shortcomings or compromises in quality? it's done?

7/11
I need to test the models more to decide if they’re perfect, but seems pretty close to done.

8/11
Are the model and LoRA both compatible with Fooocus?

9/11
No idea - I don’t use it

10/11
Wonderful quintessence, thank you!

11/11
Thanks so much!


GUqBiJsXEAA2YqK.jpg

GUqBiJzW4AArRKW.jpg

GUqBiJzWgAA6xjv.jpg

GUqBiJuXIAAJ6qu.jpg

GUqGncbXcAAsaDV.jpg

GUqIHOtXUAAO6Qu.jpg

GUqPhYQasAA89d-.jpg

GUqPhY0a8AA-vzl.jpg

GUqPhbYasAA1dZA.jpg

GUqPhcjbYAAossO.jpg



1/1
You can try @araminta_k’s latest flux lora on Replicate with @lucataco93’s lora explorer:

lucataco/flux-dev-lora – Run with an API on Replicate

Use "alvdansen/frosting_lane_flux" as the hf_lora


GUslJ_dWQAAutKg.jpg

GUslafsXYAAjfQs.jpg

GUqBiJsXEAA2YqK.jpg

GUqBiJzW4AArRKW.jpg

GUqBiJzWgAA6xjv.jpg

GUqBiJuXIAAJ6qu.jpg
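Following the tweet above, a hedged sketch of calling that LoRA explorer from the Replicate Python client. The model slug and the hf_lora value come straight from the tweet; the prompt input name and the output format are assumptions, so check the inputs listed on the model's Replicate page, and note you may need to pin a specific version as "owner/name:version". Requires REPLICATE_API_TOKEN in the environment.

```python
# Sketch of calling the lucataco/flux-dev-lora explorer via the Replicate Python client.
# The "prompt" input name is an assumption; "hf_lora" comes from the quoted tweet.
import replicate

output = replicate.run(
    "lucataco/flux-dev-lora",
    input={
        "prompt": "a quiet village street at dusk, storybook illustration",
        "hf_lora": "alvdansen/frosting_lane_flux",
    },
)
print(output)  # typically one or more URLs to the generated image(s)
```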
 