Y'all don't understand: Sora is going to be a GAME CHANGER for fan fiction.

bnew

Extensive list of generative tools curated by Eyal Gruss

 

bnew

Introducing Runway Gen-4 | Runway




Description
Introducing Runway Gen-4: Our next-generation series of state-of-the-art AI models for media generation and world consistency.

A new generation of consistent and controllable media is here.

With Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel, and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame, allowing you to regenerate those elements from multiple perspectives and positions within your scenes.

Get started at runwayml.com
 

bnew

GPT-4o mini

Script to calculate the cost of generating a video on Kling, based on 4/19/2025 pricing info.


Python:
import readline  # tab-completion for the unit prompt (POSIX; on Windows this typically requires the pyreadline3 package)

def calculate_movie_cost(movie_length_seconds, credits_per_5_seconds, credits_per_month, additional_credit_cost):
    # Calculate total credits needed for the movie
    total_credits = (movie_length_seconds / 5) * credits_per_5_seconds
    
    # Calculate additional credits needed if total credits exceed monthly credits
    additional_credits_needed = max(0, total_credits - credits_per_month)
    
    # Calculate total cost in dollars (additional_credit_cost is the price per 100 extra credits)
    total_cost = (additional_credits_needed / 100) * additional_credit_cost
    
    return total_credits, total_cost

def convert_to_seconds(length, unit):
    if unit == 'hours':
        return length * 3600
    elif unit == 'minutes':
        return length * 60
    elif unit == 'seconds':
        return length
    else:
        raise ValueError("Invalid time unit. Please use 'hours', 'minutes', or 'seconds'.")

def main():
    # Constants for each plan with explanations
    plans = {
        "Basic": {
            "credits_per_month": 166,
            "additional_credit_cost": 0,  # Free plan does not allow additional credits
            "explanation": "Best for casual users who want to explore Kling AI without committing to a paid plan."
        },
        "Standard": {
            "credits_per_month": 660,
            "additional_credit_cost": 1.06,
            "explanation": "Ideal for content creators and digital artists who need better AI-generated visuals with faster processing."
        },
        "Pro": {
            "credits_per_month": 3000,
            "additional_credit_cost": 0.87,
            "explanation": "Perfect for professional designers and video creators who need faster processing and more credits."
        },
        "Premier": {
            "credits_per_month": 8000,
            "additional_credit_cost": 0.81,
            "explanation": "Designed for high-level professionals and studios needing large amounts of high-quality AI content with exclusive perks."
        }
    }

    credits_per_5_seconds = 100  # Credits for every 5 seconds

    # User input for movie length
    print("Enter the length of the movie:")
    length = float(input("Length (e.g., 2.5 for 2 hours 30 minutes, 120 for 120 minutes): "))

    # Define possible units and set up readline for autocomplete
    units = ['hours', 'minutes', 'seconds']
    readline.set_startup_hook(lambda: readline.insert_text('hours'))  # Default suggestion
    readline.parse_and_bind("tab: complete")
    def complete_unit(text, state):
        matches = [unit for unit in units if unit.startswith(text)]
        return matches[state] if state < len(matches) else None  # return None once matches are exhausted
    readline.set_completer(complete_unit)

    unit = input("Unit (hours/minutes/seconds): ").strip().lower()
    readline.set_startup_hook()  # Clear the startup hook

    # Convert the input length to seconds
    try:
        movie_length_seconds = convert_to_seconds(length, unit)
    except ValueError as e:
        print(e)
        return

    # Calculate and display results for each plan
    for plan_name, plan_details in plans.items():
        print(f"\n--- {plan_name} Plan ---")
        print(f"Explanation: {plan_details['explanation']}")
        total_credits, total_cost = calculate_movie_cost(
            movie_length_seconds,
            credits_per_5_seconds,
            plan_details["credits_per_month"],
            plan_details["additional_credit_cost"]
        )
        
        print(f"Movie Length: {length} {unit}")
        print(f"Total Credits Required: {total_credits:.2f} credits")
        print(f"Total Cost: ${total_cost:.2f}")

if __name__ == "__main__":
    main()
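# Worked example using the constants above (illustrative numbers, not a quote from Kling):
# a 10-minute video is 600 seconds, i.e. 600 / 5 * 100 = 12,000 credits.
# On the Pro plan (3,000 monthly credits) that leaves 9,000 extra credits,
# costing (9,000 / 100) * $0.87 = $78.30, assuming extra credits are billed per 100
# as in calculate_movie_cost().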
 

bnew

New layer addition to Transformers radically improves long-term video generation



Posted on Tue Apr 8 15:30:23 2025 UTC


Fascinating work from a team at Berkeley, Nvidia and Stanford.

They added Test-Time Training (TTT) layers to a pre-trained transformer. The hidden state of a TTT layer can itself be a neural network.

The result? Much more coherent long-term video generation. Results aren't conclusive, since they limited themselves to one-minute videos, but the approach can potentially be extended to longer videos.

Maybe the beginning of AI shows?

Link to repo:
One-Minute Video Generation with Test-Time Training



Abstract


Transformers today still struggle to generate one-minute videos because self-attention layers are inefficient for long context. Alternatives such as Mamba layers struggle with complex multi-scene stories because their hidden states are less expressive. We experiment with Test-Time Training (TTT) layers, whose hidden states themselves can be neural networks, therefore more expressive. Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos from text storyboards. For proof of concept, we curate a dataset based on Tom and Jerry cartoons. Compared to baselines such as Mamba 2, Gated DeltaNet, and sliding-window attention layers, TTT layers generate much more coherent videos that tell complex stories, leading by 34 Elo points in a human evaluation of 100 videos per method. Although promising, results still contain artifacts, likely due to the limited capability of the pre-trained 5B model. The efficiency of our implementation can also be improved. We have only experimented with one-minute videos due to resource constraints, but the approach can be extended to longer videos and more complex stories.

Paper

Code

Adding TTT Layers to a Pre-Trained Transformer


Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos with strong temporal consistency and motion smoothness.









1/12
@hyperbolic_labs
We’re proud to have supported the team behind One-Minute Video Generation with Test-Time Training with compute infrastructure.

Incredible to see our platform enabling breakthroughs in long-form video generation. Congrats to the authors!

@danielkoceja @GashonHussein @Jerry_XU_Jiarui @__yuezhao__ @jankautz @guestrin @tatsu_hashimoto @sanmikoyejo @YejinChoinka @xiaolonw @karansdalal

[Quoted tweet]
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training.

We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency.

Every video below is produced directly by the model in a single shot, without editing, stitching, or post-processing. Every story is newly created.

Demos: test-time-training.github.io…
Paper: test-time-training.github.io…


GolAoksW8AA1YCm.png

GolApZAWIAA8wJC.jpg


https://video.twimg.com/ext_tw_video/1909310443530944513/pu/vid/avc1/720x480/S8MsN5qN0o9f_Lnx.mp4

2/12
@hyperbolic_labs
Read the full paper: https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf



3/12
@Quangduycbq
so cool i will make meaningful video🥰🥰🥰



4/12
@hyperbolic_labs
love it



5/12
@ChetaOfAllTrade
Incredible, Hyperbolic built to see developers actually reach their potential and not getting stucked by computing resource.

Congrats to the team



6/12
@hyperbolic_labs
🥂



7/12
@ericspo29
So now I can make my own cartoons, this is awesome!



8/12
@hyperbolic_labs
Pretty wild tech



9/12
@Just_marhk
Great 👍👏



10/12
@hyperbolic_labs
💯💯



11/12
@Bruhbears985
That's so great 🤘🏻



12/12
@hyperbolic_labs
amazing what AI can do now




















1/22
@karansdalal
Today, we're releasing a new paper – One-Minute Video Generation with Test-Time Training.

We add TTT layers to a pre-trained Transformer and fine-tune it to generate one-minute Tom and Jerry cartoons with strong temporal consistency.

Every video below is produced directly by the model in a single shot, without editing, stitching, or post-processing. Every story is newly created.

Demos: One-Minute Video Generation with Test-Time Training
Paper: http://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf



https://video.twimg.com/ext_tw_video/1909310443530944513/pu/vid/avc1/720x480/S8MsN5qN0o9f_Lnx.mp4

2/22
@karansdalal
Test-time training (TTT) layers are RNN layers where the hidden state is a machine learning model and the update rule is a step of gradient descent. See this thread for previous work.

[Quoted tweet]
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models.

We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses context through actual gradient descent on input tokens. We call our method “Test-Time-Training layers.”

TTT layers directly replace attention, and unlock linear complexity architectures with expressive memory, allowing us to train LLMs with millions (someday billions) of tokens in context.

Our instantiations, TTT-Linear and TTT-MLP, both match or beat the strongest Transformers and Mamba. Arxiv: arxiv.org/abs/2407.04620


GR-cpVpawAABD38.png
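To make the idea in the tweets above concrete, here's a minimal, illustrative sketch of a TTT-Linear-style recurrence (my simplification, not the authors' released code): the hidden state is the weight matrix of a small inner linear model, and every incoming token triggers one gradient-descent step on a simple self-supervised reconstruction loss. The projections, loss, and learning rate here are assumptions for illustration only.

Python:
import numpy as np

def ttt_linear_scan(tokens, dim, lr=0.1):
    """Illustrative TTT-Linear-style scan (a sketch, not the paper's implementation).

    The hidden state is itself a tiny model: the weight matrix W of a linear map.
    For each token we take one gradient step on a reconstruction loss, then emit
    an output using the updated W.
    """
    W = np.zeros((dim, dim))            # hidden state = weights of the inner model
    outputs = []
    for x in tokens:                     # x: (dim,) token vector (projections omitted)
        pred = W @ x                     # inner model's prediction for this token
        grad = np.outer(pred - x, x)     # gradient of 0.5 * ||W x - x||^2 w.r.t. W
        W = W - lr * grad                # one step of test-time gradient descent
        outputs.append(W @ x)            # output computed with the updated hidden state
    return np.stack(outputs)

# toy usage: a sequence of 8 random 16-dimensional "tokens"
out = ttt_linear_scan(np.random.randn(8, 16), dim=16)

Because the scan touches each token once, the cost grows linearly with sequence length, unlike self-attention's quadratic cost.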


3/22
@karansdalal
Our approach simply adds TTT layers to a pre-trained Diffusion Transformer and fine-tunes it on long videos with text annotations. To keep costs manageable, we limit self-attention to local segments and let TTT (linear complexity) operate globally.



Gn88CAKbwAMyW2Q.jpg
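A rough way to picture that layout (my reading of the tweet, not the released code): self-attention only mixes tokens inside fixed-length local segments, and a single linear-complexity TTT scan carries information across segments. The attn_fn and ttt_fn below are hypothetical placeholders for the real layers.

Python:
import numpy as np

def local_attention_then_global_ttt(tokens, segment_len, attn_fn, ttt_fn):
    """Sketch of the hybrid layout: local attention per segment, one global TTT pass.

    tokens: (seq_len, dim) array; attn_fn and ttt_fn are placeholder callables
    standing in for the actual attention block and TTT layer.
    """
    # 1) quadratic-cost self-attention, but only within each local segment
    segments = [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]
    locally_mixed = np.concatenate([attn_fn(seg) for seg in segments], axis=0)
    # 2) one global, linear-complexity TTT scan over the full sequence
    return ttt_fn(locally_mixed)

# toy usage with identity placeholders
x = np.random.randn(12, 16)
y = local_attention_then_global_ttt(x, segment_len=4, attn_fn=lambda s: s, ttt_fn=lambda s: s)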


4/22
@karansdalal
We create an "On-Chip Tensor Parallel" algorithm to implement an efficient TTT-MLP kernel. Specifically, we shard the weights of the "hidden state model" across Streaming Multiprocessors, and use the DSMEM feature of Hopper GPUs to implement AllReduce among SMs.

This avoids costly transfers between global memory (HBM) and shared memory (SMEM), while still fitting the large hidden state into the small amount of fast SMEM.

More details in the paper. Kernel code: GitHub - test-time-training/ttt-tk



Gn88LCwbwAMkVY_.jpg


5/22
@karansdalal
Grateful for wonderful collaborators. This work will be presented at CVPR 2025.

@danielkoceja @GashonHussein @Jerry_XU_Jiarui @__yuezhao__ @jankautz @guestrin @tatsu_hashimoto @sanmikoyejo @YejinChoinka @xiaolonw



Gn89FL3bwAM8v5j.jpg


6/22
@karansdalal
+ our wonderful collaborators without Twitter – Shihao Han, Ka Chun Cheung, Youjin Song, and Yu Sun.



7/22
@menhguin
what the fukk (complimentary)

ok for like a solid 30 seconds I thought this was the Test-Time Training used for the ARC AGI MIT submission and I was rly confused



8/22
@karansdalal
Same thing, different application! Best characterization would be "End to End" vs "Non E2E" test-time training.

Test-Time Training Project Website



9/22
@ruslanjabari
damn and this is only ~50 hours of training runs



10/22
@karansdalal
With a 5B model 🫣



11/22
@reborn_agi
This is incredible work — generating coherent, one-minute-long animated stories with zero post-processing is a huge leap in video generation. The TTT approach looks super promising for maintaining temporal consistency. Huge respect to you and the team.



12/22
@karansdalal
Thank you



13/22
@willdepue
very cool work karan! do you have any baselines of what it looks like without test time training?



14/22
@karansdalal
Thank you Will, sorry to miss this! Here's the World Trade Center video with the local attention baseline.* We have some examples comparing TTT to other RNNs on the project page.

* Disclaimer: this model has fewer parameters than the one with added TTT layers.



https://video.twimg.com/ext_tw_video/1909798570049650689/pu/vid/avc1/720x480/0agZ6XihQUKUJ9iC.mp4

15/22
@TheGrizztronic
Pretty cool. TTT should get more love. Hope this helps!



16/22
@karansdalal
🙏



17/22
@jc_stack
Really interested in your pre-training approaches. Have you seen much impact on compute/memory overhead with the TTT layers? Thinking about startup resource constraints here.



18/22
@karansdalal
TTT layers have linear complexity, so long context inference is far better than self-attention. But we still have some way to go on kernel optimization when compared to other modern RNN layers.

Figure 6 from our paper:



Gn9WhFzbwAE84O3.jpg


19/22
@john7rho
Amazing work Karan



20/22
@karansdalal
Thank you John!



21/22
@jam3scampbell
🤔

[Quoted tweet]
in b4 ttt is the new q*


22/22
@nearcyan
hmmmm




 