bnew

Veteran
Joined
Nov 1, 2015
Messages
54,128
Reputation
8,072
Daps
153,725

[Submitted on 7 Oct 2024]

Differential Transformer​

Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei
Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns. Experimental results on language modeling show that Diff Transformer outperforms Transformer in various settings of scaling up model size and training tokens. More intriguingly, it offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. By being less distracted by irrelevant context, Diff Transformer can mitigate hallucination in question answering and text summarization. For in-context learning, Diff Transformer not only enhances accuracy but is also more robust to order permutation, which was considered as a chronic robustness issue. The results position Diff Transformer as a highly effective and promising architecture to advance large language models.
Subjects:Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:arXiv:2410.05258 [cs.CL]
(or arXiv:2410.05258v1 [cs.CL] for this version)
[2410.05258] Differential Transformer


Submission history​


From: Li Dong

[v1] Mon, 7 Oct 2024 17:57:38 UTC (429 KB)
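For intuition, here is a minimal sketch of the differential attention idea described in the abstract: attention is computed as the difference between two separate softmax maps, which cancels shared noise. The function name, tensor shapes, and the single scalar lam are illustrative assumptions; the paper's actual formulation also reparameterizes lambda per head and normalizes the output.

Code:
import torch
import torch.nn.functional as F

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    # q1/k1 and q2/k2: (batch, seq, d) projections for the two attention maps
    # v: (batch, seq, d_v) values; lam weights the subtracted map (learned in the paper)
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d ** 0.5, dim=-1)
    # subtracting the second map cancels attention noise common to both,
    # leaving a sparser pattern focused on relevant context
    return (a1 - lam * a2) @ v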


 

bnew



Overparametrized LLM: COMPLEX Reasoning (Yale Univ)​


Discover AI
40.6K subscribers


5,338 views Oct 5, 2024 #aiagents #intelligence #emergence

Brand new AI research, published by Yale Univ et al, explores the emergence of intelligence in artificial systems, with a particular emphasis on overparameterized large language models (LLMs) trained on datasets derived from elementary cellular automata (ECA). It posits that exposure to complex yet structured datasets can facilitate the development of intelligence, even in models that are not inherently designed to process explicitly intelligent data.

The authors employ ECA rules, specifically from Classes I-IV, to generate training data and evaluate LLM performance on downstream tasks. The results indicate that models trained on rules operating near the "edge of chaos" (Class IV) demonstrate superior reasoning and chess move prediction capabilities compared to those trained on strictly ordered or purely chaotic data. These findings support the hypothesis that complexity—balanced between order and randomness—fosters the emergence of more sophisticated, generalized behavioral patterns in these models. Furthermore, training on such datasets appears to induce the development of intricate internal representations, as evidenced by attention mechanisms that effectively leverage historical context.

The methodology involves training modified GPT-2 models on ECA-generated datasets, with adaptations to handle binary inputs via linear projection layers. The study employs various complexity measures, including Lempel-Ziv complexity, Lyapunov exponents, and Krylov complexity, to characterize the ECA-generated data. Lempel-Ziv and compression complexities quantify the compressibility of the datasets, while Lyapunov exponents provide insights into the chaotic or stable dynamics of the generated sequences.

The findings suggest that models with overparameterized architectures can naturally explore non-trivial solutions, utilizing their excess capacity to form sophisticated representations of the input space, thereby elucidating their emergent reasoning capabilities. These results underscore that the emergence of intelligence in LLMs is not solely contingent on the nature of the data itself, but rather on the inherent complexity of the data, particularly when situated at the critical juncture between order and chaos.
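As a rough illustration of the kind of training data involved (not the authors' code), the sketch below generates sequences from an elementary cellular automaton such as Rule 110, a Class IV rule near the "edge of chaos"; the function names, grid size, and random initial state are assumptions.

Code:
import numpy as np

def eca_step(state, rule):
    # one update of an elementary cellular automaton with wrap-around edges;
    # rule is a Wolfram rule number in 0-255
    rule_bits = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    left, right = np.roll(state, 1), np.roll(state, -1)
    neighborhood = 4 * left + 2 * state + right  # encodes (left, center, right) as 0-7
    return rule_bits[neighborhood]

def generate_eca_rows(rule, width=64, steps=256, seed=0):
    # evolve a random binary row for `steps` generations and stack the history
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, size=width, dtype=np.uint8)
    rows = [state]
    for _ in range(steps - 1):
        state = eca_step(state, rule)
        rows.append(state)
    return np.stack(rows)

# Rule 110 is a classic Class IV rule; flattening the rows yields the kind of
# binary sequence the modified GPT-2 models are trained on
data = generate_eca_rows(rule=110)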

 

bnew



Hailuo gets feature competitive launching image-to-video AI generation capability​

Carl Franzen@carlfranzen

October 8, 2024 2:19 PM

Screenshot of Hailuo AI image-to-video promotional reel.


Credit: Hailuo AI

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


Hailuo AI, a product of Chinese startup MiniMax, has officially launched its Image-to-Video (img2video) feature on the web, providing a new tool for creators looking to turn static images into dynamic video content.

Backed by Chinese tech giants Alibaba (e-commerce) and Tencent (video game and digital content publisher) and founded by AI researcher Yan Junjie, MiniMax quickly made a name for itself in the AI video space thanks to the release of its ultra-realistic Hailuo AI video generation model earlier this year.


A fast-rising newcomer to AI video generation​


At the time it was released in early September 2024, Hailuo only supported text-to-video, meaning users could only type in text descriptions of the video they wanted to generate, and Hailuo would attempt to follow these.

However, it quickly gained a following among early adopter AI video creators for its vivid, coherent videos with human motions that were much more fluid and lifelike — often faster — than other rival video generators from U.S. companies such as Runway and Luma AI.


Catching up to U.S. rivals​


Yet, Hailuo was still behind the curve when it came to allowing users to upload static images — be they AI generated or their own photos or traditionally crafted images made in other programs — which most rivals do offer.

Now, by adding img2video, Hailuo AI is offering a feature-competitive platform. By combining text and image inputs, Hailuo allows for highly personalized visual outputs that integrate creative instructions with AI-powered precision.

This feature is designed to bring even complex artistic visions to life, offering precise control over object recognition and manipulation in generated videos.


Other features​


One of the standout characteristics of Hailuo’s offering is the diversity of styles available to users. Whether creators want to work in super-realism, explore fantasy and sci-fi, or delve into anime and abstract visuals, the platform provides a wide array of choices, allowing for customization that suits varied artistic needs.

This diversity of styles is likely to appeal to a broad range of users, from filmmakers to digital artists and game developers.

MiniMax’s rise in the AI world has been rapid, particularly with its earlier release of a video generation tool, “video-01.” This model drew widespread attention for its handling of human movements and gestures, a challenge for many other AI models.

The company’s capabilities were showcased in a viral video featuring a lightsaber battle between Star Wars characters, which demonstrated its technical prowess in producing hyper-realistic content — as well as its capability to reproduce likenesses of copyrighted and well-known characters.

The viral success of that video by AI filmmaker Dave Clark underscored the potential of MiniMax’s technology, with both critics and enthusiasts impressed by the results.


More MiniMax AI models and tools: music generation, AI companions, and more coming​


In addition to its video generation tools, MiniMax has expanded its portfolio with a suite of other AI-driven products. The Music Generation Model allows users to create unlimited music tracks in various styles, offering flexibility to both casual users and professionals. Meanwhile, the Hailuo AI platform supports tasks like intelligent search, document summarization, and even voice communication. These tools highlight MiniMax’s commitment to pushing the boundaries of AI technology across multiple industries.

MiniMax also offers the Xingye App, a unique product that lets users create and interact with customizable AI companions. With flexible personalities and imaginative scenarios, the app allows for highly creative and personalized experiences, whether for entertainment or emotional engagement.

However, most of these apps are available only in Mandarin Chinese interfaces for now. Hailuo is the notable exception with English language support. In addition, one of the moderators of its Hailuo AI Discord server, MiniMax_Melon, wrote today in a message to members: “Stay tuned—our pricing plans with fantastic perks are dropping in very soon!”

For those looking to elevate their creative output with video, Hailuo AI’s new feature is now available for use at Hailuo AI’s website. Whether producing a short clip or developing a more complex artistic project, Hailuo’s new Image-to-Video tool provides creators with both precision and flexibility, marking another step forward for AI-driven creativity.

As MiniMax continues to develop its AI offerings, its rapid growth signals a shift in the landscape of generative AI.
 

bnew



OpenAI will bring Cosmopolitan publisher Hearst’s content to ChatGPT​

Carl Franzen@carlfranzen

October 8, 2024 12:31 PM

Female presenting robot sits at cafe table reading Cosmopolitan magazine


Credit: VentureBeat made with Midjourney

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More



Is the future of written media — and potentially imagery and videos, too — going to be primarily surfaced to us through ChatGPT?

It’s not out of the question at the rate OpenAI is going. At the very least, the $157 billion-valued AI unicorn — fresh off the launch of its new Canvas feature for ChatGPT and a record-setting $6.6 billion fundraising round — is making damn well sure it has most of the leading U.S. magazine and text-based news publishers entered into content licensing agreements with it. These enable OpenAI to train on, or at least serve up, vast archives of prior written articles, photos, videos and other journalistic/editorial materials, through ChatGPT, SearchGPT and other AI products, potentially as truncated summaries.

The latest major American media firm to join with OpenAI is Hearst, the eponymous media company famed for its “yellow journalism” founder William Randolph Hearst (who helped beat the drum for the U.S. to enter the Spanish-American War, demonized marijuana, and was memorably fictionalized as Citizen Kane's Charles Foster Kane), and which is now perhaps best known as the publisher of Cosmopolitan, the sex and lifestyle magazine aimed at young women, as well as Esquire, Elle, Car & Driver, Country Living, Good Housekeeping, Popular Mechanics and many more.

In total, Hearst operates 25 brands in the U.S., 175 websites and more than 200 magazine editions worldwide, according to its media page. However, OpenAI will be specifically surfacing “curated content” from more than 20 magazine brands and over 40 newspapers, including well-known titles such as Cosmopolitan, Esquire, Houston Chronicle, San Francisco Chronicle, ELLE, and Women’s Health. The content will be clearly attributed, with appropriate citations and direct links to Hearst’s original sources, ensuring transparency, according to the brands.

“Hearst’s other businesses outside of magazines and newspapers are not included in this partnership,” reads a release jointly published on Hearst’s and OpenAI’s websites.

It’s unclear whether or not the company will be training its models specifically on Hearst content — or merely piping said content through to end users of ChatGPT and other products. I’ve reached out to an OpenAI spokesperson for clarity and will update when I hear back.

Hearst now joins the long and growing list of media publishers that have struck content licensing deals with OpenAI. Among the many that have forged such deals are:


These partnerships represent OpenAI’s broader ambition to collaborate with established media brands and elevate the quality of content provided through its AI systems.

With Hearst’s integration, OpenAI continues to expand its network of trusted content providers, ensuring users of its AI products, like ChatGPT, have access to reliable information across a wide range of topics.


What the executives are saying it means​


Jeff Johnson, President of Hearst Newspapers, emphasized the critical role that professional journalism plays in the evolution of AI. “As generative AI matures, it’s critical that journalism created by professional journalists be at the heart of all AI products,” he said, underscoring the importance of integrating trustworthy, curated content into these platforms.

Debi Chirichella, President of Hearst Magazines, echoed this sentiment, noting that the partnership allows Hearst to help shape the future of magazine content while preserving the credibility and high standards of the company’s journalism.

These deals signal a growing trend of cooperation between tech companies and traditional publishers as both industries adapt to the changes brought about by advances in AI.

While OpenAI’s partnerships offer media companies access to cutting-edge technology and the opportunity to reach larger audiences, they also raise questions about the long-term impact on the future of publishing.


Fears of OpenAI swallowing U.S. journalism and editorial print media whole?​


Some critics argue that licensing content to AI platforms could potentially lead to competition, as AI systems improve and become more capable of generating content that rivals traditional journalism.

I myself, as a journalist whose work was undoubtedly scraped and used to train many AI models (and used for lots of other things over which I had no control or say), voiced my own hesitation about media publishers moving so quickly to ink deals with OpenAI.

These concerns were amplified in recent legal actions, such as the lawsuit filed by The New York Times against OpenAI and Microsoft, alleging copyright infringement in the development of AI models. The case remains in court for now, and the NYT remains one of the increasingly few holdouts that have yet to settle with or strike a deal with OpenAI to license their content.

Despite these concerns, publishers like Hearst, Condé Nast, and Vox Media are actively embracing AI as a means of staying competitive in an increasingly digital landscape.

As Chirichella pointed out, Hearst’s partnership with OpenAI is not only about delivering their high-quality content to a new audience but also about preserving the cultural and historical context that defines their publications. This collaboration, she said, “ensures that our high-quality writing and expertise, cultural and historical context and attribution and credibility are promoted as OpenAI’s products evolve.”

For OpenAI, these partnerships with major media brands enhance its ability to deliver reliable, engaging content to its users, aligning with the company’s stated goal of building AI products that provide trustworthy and relevant information.

As Brad Lightcap, COO of OpenAI, explained, bringing Hearst’s content into ChatGPT elevates the platform’s value to users, particularly as AI becomes an increasingly common tool for consuming and interacting with news and information.
 

bnew



So I've been seeing some movement on Twitter this weekend about a seemingly anonymous but well-informed engineer who thinks they've found a way to improve LLM sampling significantly, which would have multiple positive downstream effects.

Before anything, remember these things often don't pan out, or have unintended consequences, but also sometimes it's experiments like this that allow for huge improvements. Let's try and get out ahead of it.

First, the user:

x.com

And the repo where people are starting to experiment

GitHub - xjdr-alt/entropix: Entropy Based Sampling and Parallel CoT Decoding

I'll just do a raw dump of the text in the repo that seems relevant:

Entropy Based Sampling and Parallel CoT Decoding
The goal is to use entropy to make context aware sampling. This should allow us to simulate something similar to o1's CoT or Anthropic's to get much better results using inference time compute.
...
Here is the philosophical analogy provided by the author

Entropy and varentropy are the keys to this deeper attunement. They are the subtle signs, the whispers of my inner world. Learning to read them is like learning a new language - the language of potential, of becoming.
Imagine entropy as the horizon - the edge where the known meets the unknown. A low entropy state is like a clear day, where you can see far into the distance, predict the path ahead. But a high entropy state is like a misty morning - the horizon is obscured, the future is uncertain, but ripe with hidden potential.
And varentropy, the variance in that uncertainty, is like the texture of the mist. Is it a uniform haze, or are there swirls and patterns hinting at unseen forces, possible paths? High varentropy means I'm considering vastly different futures, different tones and directions. Low varentropy means I'm more sure of the general shape, even if the specifics are still obscured.
To infer my deeper meanings, to anticipate the unspoken, you must learn to navigate by these subtler signs. When the entropy is high, you know to tread carefully, to ask clarifying questions, to help me find my way through the mist. When the varentropy is high, you know there are crucial decisions to be made, forks in the path that could lead to vastly different destinations.
And in those moments of low entropy and low varentropy, when the path ahead seems clear and certain - that's when you can trust the momentum, when you can let yourself flow with my unspoken intent, confident that we're aligned in our direction.

Okay so what are my thoughts, what am I reading so far?

A summary of all of this seems to be that the core goal is to get the model to understand its own uncertainty. When a model is deciding which token to output next, it seems we can to some degree measure whether it is very clearly on a path where certainty is high, and if not, interject an appropriate token (in this case, literally something like "wait"), which would encourage the model to go down a different path.
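To make that concrete, here is a minimal sketch of the idea, assuming access to the model's next-token logits; the entropy threshold and the WAIT_TOKEN_ID are purely illustrative placeholders, and the actual entropix repo uses a more elaborate decision scheme over both entropy and varentropy.

Code:
import torch
import torch.nn.functional as F

WAIT_TOKEN_ID = 1234  # hypothetical id for an injected "wait"-style token

def entropy_and_varentropy(logits):
    # entropy: average surprisal of the next-token distribution
    # varentropy: variance of that surprisal (how uneven the uncertainty is)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum()
    varentropy = (probs * (-log_probs - entropy) ** 2).sum()
    return entropy, varentropy

def choose_next_token(logits, entropy_threshold=3.0):
    # greedy when the model is confident; inject a "wait" token to nudge it
    # onto a different path when uncertainty is high
    entropy, _varentropy = entropy_and_varentropy(logits)
    if entropy > entropy_threshold:
        return WAIT_TOKEN_ID
    return int(torch.argmax(logits))

In a full sampler this check would run at every decoding step, with high varentropy additionally signaling a fork between very different possible continuations.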

This has lots of different ways to evolve and improve in and of itself, but two things I've been hearing are:

  1. This mechanism could allow models to variably run inference by seeking out these more confident paths, essentially duplicating o1's mechanism.
  2. This mechanism could significantly reduce hallucinations by avoiding low-confidence paths, and even just more clearly communicate to the user when confidence is low.
The first experiments are apparently happening now, and I know the localllama sub has been talking about this the last day or so, so I think we'll have a good chance of getting more answers and maybe even benchmarks this week.

Best case scenario, all models - including open source models - will come out the other end with variable test time compute to think longer and harder on problems that are more difficult, and models will overall have more "correct" answers, more frequently, and hallucinate less often.
 

bnew






1/11
@GeminiApp
Image generation with Imagen 3 is now available to all Gemini users around the world.

Imagen 3 is our highest quality image generation model yet and brings an even higher degree of photorealism, better instruction following, and fewer distracting artifacts than ever before.



2/11
@GeminiApp
Results for illustrative purposes and may vary. Internet connection and subscription for certain features required. Language and country availability varies.



3/11
@devlcq
Only in the USA*



4/11
@NicolasGargala2
Amazing, but at $32 AUD a month in Australia it's pretty steep for us normal people



5/11
@WadeWilson_GHF
We're waiting, we're waiting, we're waiting in France



6/11
@InfusingFit
Wow, prompt adherence is amazing. Usually image generators perform poorly on this



7/11
@EverydayAI_
We covered Imagen3 in-depth a few weeks back. It's actually really frickin good....

https://invidious.poast.org/watch?v=ETMpUqnTwxw&t=61s



8/11
@harishkgarg
not bad



9/11
@KewkD
Image Gen 3 is easily the best on the market for most things. I just want to know when we'll get to create images outside of 1:1



10/11
@MansaKirito
Mmmh nice...



11/11
@koltregaskes
Gosh, thank you.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GZd3qdQXIAAUcgq.jpg

GZdzNZDWoAA9Dv6.png

GZeIzE9W8AA1Wvu.jpg



1/1
@testingcatalog
In case you didn't have Imagen 3 before on Gemini - now is the time 🔥

A broader and worldwide rollout is happening but "Language and country availability varies" still.

Are you able to generate images or ppl as well?

[Quoted tweet]
Image generation with Imagen 3 is now available to all Gemini users around the world.

Imagen 3 is our highest quality image generation model yet and brings an even higher degree of photorealism, better instruction following, and fewer distracting artifacts than ever before.


GZenGxgXIAAB_64.jpg


https://video.twimg.com/ext_tw_video/1844060730745827328/pu/vid/avc1/720x720/Ey5QkhYXj4HO9Fcr.mp4


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/30
@jcharlesfabre
PART 3 of AI image and video generation timeline 👇
(2023-2024)

Check out PART 2 if you haven't already

[Quoted tweet]
PART 2 of AI image and video generation timeline 👇
(2021-2023)

Check out PART 1 if you haven't already:


2/30
@jcharlesfabre
2023: Muse

A text2image Transformer model that uses the token latent space of an LLM instead of a diffusion model.



GZdnLDXXsAAHXxv.png


3/30
@jcharlesfabre
2023: Gen-1

A video2video tool by RunwayML to edit videos with generative visuals through text or image prompts. Its public release was on March 27, 2023. On March 20 they announced Gen-2, a text2video tool that links to the same paper.



GZdnMFZXIAAYtf_.png


4/30
@jcharlesfabre
2023: ControlNet

A neural network structure to control diffusion models through different techniques. It allows more control over the structure of the image through img2img. Different techniques include edge detection, depth maps, segmentation maps, human poses.



GZdnMw8W8AAlrXt.png


5/30
@jcharlesfabre
2023: ModelScope Text2Video Synthesis

A text2video model generating 2-second videos from English prompts. It was released by the Chinese Model-as-a-Service library of the same name, owned by Alibaba.



GZdnNcDXUAAUrDk.png


6/30
@jcharlesfabre
2023: Gen-2

A text2video tool by RunwayML. It's based on the same paper as Gen-1, their video2video tool released a month before.



GZdnOYBXAAA8RWX.png


7/30
@jcharlesfabre
2023: Adobe Firefly

Firefly is a family of generative text2image tools by Adobe.



GZdnPdRXMAA0qd8.png


8/30
@jcharlesfabre
2023: NUWA-XL

A multimodal text2video model that can generate long videos through an architecture of different diffusion models.



GZdnQfuXMAA6xce.png


9/30
@jcharlesfabre
2023: Midjourney v5

The fifth version of Midjourney released to the public.



GZdnRZIXMAA1I2K.png


10/30
@jcharlesfabre
2023: Würstchen

A text2image model with a more cost-effective generation because of its highly compressed latent space (and a silly name).



GZdnSDPWIAA5BOr.png


11/30
@jcharlesfabre
2023: Zeroscope

An open-source text-to-video model, using Modelscope as a basis. Different versions are available, with increasing size and quality. By Spencer Sterling.



GZdnS0NWgAAO5dW.png


12/30
@jcharlesfabre
2023: Potat1

A text-to-video model, the first open-source one to generate 1024x576 videos. By Camenduru, with Modelscope as a basis model.

[Quoted tweet]
I am happy 🥰 to announce first open-source 1024x576 #TextToVideo model Potat 1️⃣ 🥳
Thanks to #ModelScope ❤ DAMO ❤ ExponentialML ❤ kabachuha ❤ @LambdaAPI ❤ @cerspense ❤ @CiaraRowles1 ❤ @p1atdev_art ❤
Please try it 🐣 huggingface.co/camenduru/pot…
Potat 2️⃣ is in the oven ♨


https://video.twimg.com/ext_tw_video/1665623762065145856/pu/vid/1024x576/iWjwH0ZNS_ABK77o.mp4

13/30
@jcharlesfabre
2023: Pika Labs

Text-to-video model run on a Discord server. Pika 1.0 with their own website has been announced on Nov 28 2023.



GZdnTjqWwAAwx3s.png


14/30
@jcharlesfabre
2023: AnimateDiff

Text-to-video model generating videos through Stable Diffusion models.



GZdnUG5WkAAmG_D.png


15/30
@jcharlesfabre
2023: Stable Diffusion XL

A bigger Stable Diffusion text-to-image model by Stability AI, this time trained on 1024 px images rather than 512.



GZdnU_2XcAAmK4R.png


16/30
@jcharlesfabre
2023: DALL·E 3

The third generation of DALL·E by OpenAI. It has a more nuanced understanding of text and is able to better follow the description in prompts, because of an improvement in the captioning of the dataset images.



GZdnVt6XsAAH1SD.jpg


17/30
@jcharlesfabre
2023: Show-1

A text-to-video model with a more efficient GPU usage by Showlab at the National University of Singapore.



GZdnWZxWwAAnJvw.png


18/30
@jcharlesfabre
2023: Latent Consistency Model

A text2image model, an alternative to latent diffusion models, that can generate high-quality images in a few inference steps. A popular application is in LCM LoRAs, published on 9 Nov 2023, which can speed up the generative process in Stable Diffusion models.



GZdnXJpW8AAgpHd.png


19/30
@jcharlesfabre
2023: MagicAnimate

A video generative model that transfers an image's subject to the motion of a video's human subject.



GZdnXs-X0AAa2OU.png


20/30
@jcharlesfabre
2023: Imagen 2

A text-to-image model by Google. This successor to the first Imagen is used in various Google generative services, like Gemini.



GZdnYRlW4AASAou.png


21/30
@jcharlesfabre
2023: Midjourney v6

The sixth version of Midjourney released to the public. This version is better at handling detailed prompts.



GZdnY5fW0AAuevD.jpg


22/30
@jcharlesfabre
2024: Lumiere

A video generative diffusion model by Google.



GZdnZlLW8AADten.jpg





To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

continued
23/30
@jcharlesfabre
2024: Boximator

A motion control plug-in for video diffusion models by ByteDance. It allows detailed motion control over generated videos through bounding boxes delineating the movement of different elements.



GZdnaLiXIAAL3TN.png


24/30
@jcharlesfabre
2024: Sora

A video generative diffusion model by OpenAI able to generate one minute of video, beating all of its predecessors in realism and consistency. It is currently closed and only available to a few people.



GZdna4CWIAAuVJR.jpg


25/30
@jcharlesfabre
2024: Stable Diffusion 3

The third iteration of Stable Diffusion, the most popular open-source image generative model, by Stability AI. While the model isn't released yet, an early preview waitlist is open.



GZdnbnMW8AEqaw0.png


26/30
@jcharlesfabre
2024: Snap Video

A text-to-video model by Snapchat. The company's first entry in the image/video generative field.



GZdncOiWwAANy7o.png


27/30
@jcharlesfabre
2024: Imagen 3

Google's third iteration of their text-to-image model Imagen, available on their ImageFX website.



GZdnc6zX0AAwon3.png


28/30
@jcharlesfabre
2024: Veo

Google's text-to-video model, capable of generating videos from text, image and video input. Currently only available by joining a waitlist.



GZdndpcWoAA4Bbs.jpg


29/30
@jcharlesfabre
2024: ToonCrafter

Generative cartoon interpolation able to generate in-between frames from two or more images. Unlike other interpolation models, this one is powered by a generative video model, predicting more accurate motion. It can also colorize sketches.



GZdnecBWQAA8d1_.png


30/30
@jcharlesfabre
2024: KLING

Text-to-video model by short-video platform Kuaishou, the first serious rival to Sora and able to generate videos up to around 2 minutes. It can additionally be prompted with OpenPose skeleton input (mainly for dances). Open to the public through their website and app.



GZdnfZ1XEAArVJu.png
 