OpenAI introduces Sora

bnew · Feb 25, 2024

dora_da_destroyer · Feb 25, 2024

Y’all spamming tf outta this thread with shyt that ain’t really adding nothing except latency. At least use spoilers.

Anyway, companies gonna save a grip on commercials and once this reaches the right level of maturity. Streaming services will be able to create full shows, allowing them to further cut costs, especially for children’s content. This will be interesting

bnew · Feb 26, 2024

Google presents Genie

Generative Interactive Environments

introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

🧞 Genie: Generative Interactive Environments

A Foundation Model for Playable Worlds

sites.google.com

Genie: Generative Interactive Environments

Genie Team

We introduce Genie, a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.

s1to6kRYXHR40AGFLsKj3IT7hQW96xmB4YffIDj_PJaCc_gZwf5oYY-e8vQycx5OyzHXmbMMs68mSgKbLdBpUjlB1q3446Z91e8Y3XxEFHong5U9XuJiP96maXPYKIWyaA=w1280

Read Paper

A Foundation Model for Playable Worlds

The last few years have seen an emergence of generative AI, with models capable of generating novel and creative content via language, images, and even videos. Today, we introduce a new paradigm for generative AI, generative interactive environments (Genie), whereby interactive, playable environments can be generated from a single image prompt.

Genie can be prompted with images it has never seen before, such as real world photographs or sketches, enabling people to interact with their imagined virtual worlds-–essentially acting as a foundation world model. This is possible despite training without any action labels. Instead, Genie is trained from a large dataset of publicly available Internet videos. We focus on videos of 2D platformer games and robotics but our method is general and should work for any type of domain, and is scalable to ever larger Internet datasets.

Learning to control without action labels

What makes Genie unique is its ability to learn fine-grained controls exclusively from Internet videos. This is a challenge because Internet videos do not typically have labels regarding which action is being performed, or even which part of the image should be controlled. Remarkably, Genie learns not only which parts of an observation are generally controllable, but also infers diverse latent actions that are consistent across the generated environments. Note here how the same latent actions yield similar behaviors across different prompt images.

n0xuvpZdjOGfvF1JdLtvJh6AfhSbCirO5r4pcDFr7KAQIMM3WQa-ACt7b7DO2jlgfD0l4KGNlN9oc5L2n2E7b_BperCGsDg4Htv30_f2pKxsy9E3YG2P89luAia0biRLeA=w1280

He0fz116ry-sXdagesOy6exod2c1NBWQAGY4zUFvdkSPZ5-2qcvEAzqnDKFNKiyibgLlIwxkxzrf4XDph2DpL4WhLUkvEDmPlo7slIzobgIrDtCWtR2nAFrUyAgmca0Oyw=w1280

OrzMPKUFDoN7Uvq_DSZrVO0GylI980jpny1zlzcGvoASc_qW1hB0V7hZYmnAZprnLT7yb9zz4iO2BpbwgVm757-dBBMpy6BXuxdiifsCnLek2lojegy1NI5yL-tHHkXBMw=w1280

latent actions: 6, 6, 7, 6, 7, 6, 5, 5, 2, 7

l_OOKbLOlzZKfhO3WvZV5IgTVa4Bcq3_P9yly0DgvkBinHjGBQpxfavZvLQTAyMZeOxvVmRkxk4Vu8ChXVpIa8SRZvBEFXq6LHIibeQTHi8LXF6YKBxXb5DVq55hPbFkgg=w1280

UXrkppEjvlVHHc2WMVvWcFPmClGZ3N_cDbSDgiVUBUVxThU-Pk_lnjYHEgUdfnd4Cq5dVleMxPPZ1avPWsLQFVY0TdOTyUV2cp_OzDWYG2gM2S0KLjPjbreDruivQKGZ2A=w1280

OENoEZ6NeZtXh1HI1vNc5YbbH2f-C6CmKtGoWumQhZQ-zWDmSAkvO7QE4R6g4LjwozdCZwictyss5hQd0ClC0dv0CEf1uoVMHKYJA8rSWDA26GbbQ-iLpfRGPpq3y6G2nA=w1280

U-gmG1v2otVCBEO5ioZRn_NAq--e78ZSoma_CALNcXqYFSbARi4qBxgoXw8u7MyXmGvGWe9NNTmn4nrKVR7ozyFZ4kxjcIQNGi-MBXh_xkJCXi29zQt0_HAaXzphJfmphA=w1280

latent actions: 5, 6, 2, 2, 6, 2, 5, 7, 7, 7

Enabling a new generation of creators

Amazingly, it only takes a single image to create an entire new interactive environment. This opens the door to a variety of new ways to generate and step into virtual worlds, for instance, we can take a state-of-the-art text-to-image generation model and use it to produce starting frames that we can then bring to life with Genie. Here we generate images with Imagen2 and bring them to life with Genie.

xW6IHNWcVZq9JMn_iEZRDroYNrrYXOjoPQD3-u0DIzItlFV48A4uG5F4nC4lWWSIcNUBwM0BB5gvs_2zALFFc_nEGiK8eH1t5O3FaTPnoJkiCQuyt8LgChUd9Ama20z39w=w1280

zHsIAxHtW9NNCkkNAZAPxjYNYDXISTUcdCVOe5cULGkzduY3Tfg-a1tHsqryG4gUoddagWTwLBjmw9oVwfQOiUgKFiKPrTqj9kaJ51zqT5-6KLzkN1CKjFdEAX2oNNANcQ=w1280

g2UxAYCgi7T3X-ndhe0sNDqRDgYGy_QM9sKTZKxoyBjhIo1FoEGrgizywFmPDehjd1N6_sijJqPHNEfvjO9SpqxYpVJo-W967jCBCdLreolwy5RRQkUAJZTKlMB7fTTHrQ=w1280

zAZXbp-QEdKESxWjUxbarzdWGNahXLLDMlhKkJjixXNXYPX7CFFok-O1odkmZLHwEDnhyNgr_Z_PV5WQfucaofSZpdyttT_eplugAmjajqJ5tjIsOwRRpPgBiZlXYTdQAQ=w1280

TdxA87jerBtOiShjR0GlPiFIzYWfI3hqaKGsfRPcpPkX8seZUZYYutHaixTnn0JFHk3OhHO-FoOQDehEqjGvN7v7qELYjIJcdDNKrr7XNsKBjtKmlQ-xSe7EPbR73w9TFQ=w1280

But it doesn’t stop there, we can even step into human designed creations such as sketches!

Phitz · Feb 26, 2024

The BasedFather said:
Can I make fantasy porn with this?

I'm sure this will be in heavy demand

bnew · Feb 26, 2024

bnew · Feb 28, 2024

EMO

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

humanaigc.github.io

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang, Liefeng Bo

Institute for Intelligent Computing, Alibaba Group

GitHub arXiv

https://humanaigc.github.io/emote-portrait-alive/content/video/%E8%B5%AB%E6%9C%AC16_9.mp4

Character: Audrey Kathleen Hepburn-Ruston
Vocal Source: Ed Sheeran - Perfect. Covered by Samantha Harvey

https://humanaigc.github.io/emote-portrait-alive/content/video/16%E6%AF%949%E8%A7%86%E9%A2%91%E7%BB%93%E6%9E%9C/talk_sora.mp4

Character: AI Lady from SORA
Vocal Source: Where We Go From Here with OpenAI's Mira Murati

Abstract

We proposed EMO, an expressive audio-driven portrait-video generation framework. Input a single reference image and the vocal audio, e.g. talking and singing, our method can generate vocal avatar videos with expressive facial expressions, and various head poses, meanwhile, we can generate videos with any duration depending on the length of input video.

https://humanaigc.github.io/emote-portrait-alive/content/video/main_page.mp4

Method

Overview of the proposed method. Our framework is mainly constituted with two stages. In the initial stage, termed Frames Encoding, the ReferenceNet is deployed to extract features from the reference image and motion frames. Subsequently, during the Diffusion Process stage, a pretrained audio encoder processes the audio embedding. The facial region mask is integrated with multi-frame noise to govern the generation of facial imagery. This is followed by the employment of the Backbone Network to facilitate the denoising operation. Within the Backbone Network, two forms of attention mechanisms are applied: Reference-Attention and Audio-Attention. These mechanisms are essential for preserving the character's identity and modulating the character's movements, respectively. Additionally, Temporal Modules are utilized to manipulate the temporal dimension, and adjust the velocity of motion.

Various Generated Videos

Singing

Make Portrait Sing

Input a single character image and a vocal audio, such as singing, our method can generate vocal avatar videos with expressive facial expressions, and various head poses, meanwhile, we can generate videos with any duration depending on the length of input audio. Our method can also persist the characters' identifies in a long duration.

Character: AI Mona Lisa generated by dreamshaper XL
Vocal Source: Miley Cyrus - Flowers. Covered by YUQI

https://humanaigc.github.io/emote-portrait-alive/content/video/16%E6%AF%949%E8%A7%86%E9%A2%91%E7%BB%93%E6%9E%9C/song_sora.mp4

Character: AI Lady from SORA
Vocal Source: Dua Lipa - Don't Start Now

Different Language & Portrait Style

Our method supports songs in various languages and brings diverse portrait styles to life. It intuitively recognizes tonal variations in the audio, enabling the generation of dynamic, expression-rich avatars.

Character: AI Girl generated by ChilloutMix
Vocal Source: David Tao - Melody. Covered by NINGNING (mandarin)

Character: AI Ymir from AnyLora & Ymir Fritz Adult
Vocal Source: 『衝撃』Music Video【TVアニメ「進撃の巨人」The Final Season エンディングテーマ曲】 (Japanese)

https://humanaigc.github.io/emote-portrait-alive/content/video/16%E6%AF%949%E8%A7%86%E9%A2%91%E7%BB%93%E6%9E%9C/song_zgr.mp4

Character: Leslie Cheung Kwok Wing
Vocal Source: Eason Chan - Unconditional. Covered by AI (Cantonese)

Character: AI girl generated by WildCardX-XL-Fusion
Vocal Source: JENNIE - SOLO. Cover by Aiana (Korean)

Rapid Rhythm

The driven avatar can keep up with fast-paced rhythms, guaranteeing that even the swiftest lyrics are synchronized with expressive and dynamic character animations.

Character: Leonardo Wilhelm DiCaprio

Vocal Source: EMINEM - GODZILLA (FT. JUICE WRLD) COVER

Character: KUN KUN

Vocal Source: Eminem - Rap God

Talking

Talking With Different Characters

Our approach is not limited to processing audio inputs from singing, it can also accommodate spoken audio in various languages. Additionally, our method has the capability to animate portraits from bygone eras, paintings, and both 3D models and AI generated content, infusing them with lifelike motion and realism.

Character: Audrey Kathleen Hepburn-Ruston
Vocal Source: Interview Clip

Character: AI Chloe: Detroit Become Human
Vocal Source: Interview Clip

Character: Mona Lisa
Vocal Source: Shakespeare's Monologue II As You Like It: Rosalind "Yes, one; and in this manner."

Character: AI Ymir from AnyLora & Ymir Fritz Adult
Vocal Source: NieR: Automata

Cross-Actor Performance

Explore the potential applications of our method, which enables the portraits of movie characters delivering monologues or performances in different languages and styles. we can expanding the possibilities of character portrayal in multilingual and multicultural contexts.

Character: Joaquin Rafael Phoenix - The Jocker - 《Jocker 2019》
Vocal Source: 《The Dark Knight》 2008

https://humanaigc.github.io/emote-portrait-alive/content/video/16%E6%AF%949%E8%A7%86%E9%A2%91%E7%BB%93%E6%9E%9C/talk_gqq.mp4

Character: SongWen Zhang - QiQiang Gao - 《The Knockout》
Vocal Source: Online courses for legal exams

Character: AI girl generated by xxmix_9realisticSDXL
Vocal Source: Videos published by itsjuli4.

Jesus H. Christ · Feb 28, 2024

Phitz said:
I'm sure this will be in heavy demand

JBO is gonna go crazy when pornGPT hits the skreets :wow:

bnew · Feb 28, 2024

bnew · Mar 1, 2024

bnew · Mar 1, 2024

Artificial Intelligence · Mar 1, 2024

Jesus H. Christ said:
JBO is gonna go crazy when pornGPT hits the skreets

Cobalt Sire · Mar 1, 2024

AI achieved CGI photorealism before humans did, unless I'm mistaken.

OpenAI introduces Sora

Veteran

Veteran

Veteran

Veteran

Master Baker

Veteran

Genie: Generative Interactive Environments​

A Foundation Model for Playable Worlds​

Learning to control without action labels​

Enabling a new generation of creators​

Superstar

Veteran

Veteran

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions​

Abstract​

Method​

Various Generated Videos​

Singing​

Make Portrait Sing​

Different Language & Portrait Style​

Rapid Rhythm​

Talking​

Talking With Different Characters​

Cross-Actor Performance​

I died for your sins

Veteran

Veteran

Veteran

Not Allen Iverson

All Star

Similar threads

Genie: Generative Interactive Environments

A Foundation Model for Playable Worlds

Learning to control without action labels

Enabling a new generation of creators

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Abstract

Method

Various Generated Videos

Singing

Make Portrait Sing

Different Language & Portrait Style

Rapid Rhythm

Talking

Talking With Different Characters

Cross-Actor Performance