bnew


1/1
@rohanpaul_ai
Merging instruction-tuned models at scale yields superior performance and generalization across diverse tasks.

• Instruction-tuned base models facilitate easier merging

• Larger models merge more effectively

• Merged models show improved zero-shot generalization

• Merging methods perform similarly for large instruction-tuned models

------

Generated this podcast with Google's Illuminate.

[Quoted tweet]
Merging instruction-tuned models at scale yields superior performance and generalization across diverse tasks.

**Original Problem** 🔍:

Model merging combines expert models to create a unified model with enhanced capabilities. Previous studies focused on small models and limited merging scenarios, leaving questions about scalability unanswered.

-----

**Solution in this Paper** 🧪:

• Systematic evaluation of model merging at scale (1B to 64B parameters)
• Used PaLM-2 and PaLM-2-IT models as base models
• Created expert models via fine-tuning on 8 held-in task categories
• Tested 4 merging methods: Averaging, Task Arithmetic, DARE-TIES, TIES-Merging (Averaging and Task Arithmetic are sketched in code below this list)
• Varied number of expert models merged (2 to 8)
• Evaluated on held-in tasks and 4 held-out task categories for zero-shot generalization
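
For a concrete picture of what two of these merging methods do to the weights, here is a minimal sketch over PyTorch state dicts. It is an illustration only: the paper's experiments used PaLM-2 models and its own tooling, and the function names here are made up for the example.

```python
# Minimal sketch of two of the merging methods above, applied to PyTorch
# state dicts. Illustrative only: the paper's experiments used PaLM-2 models
# and internal tooling, and these function names are made up for the example.
import torch

def average_merge(expert_state_dicts):
    """Averaging: mean of each parameter across expert checkpoints."""
    merged = {}
    for key in expert_state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in expert_state_dicts]
        ).mean(dim=0)
    return merged

def task_arithmetic_merge(base_state_dict, expert_state_dicts, scale=1.0):
    """Task Arithmetic: add the summed task vectors (expert - base) to the base."""
    merged = {}
    for key in base_state_dict:
        task_vector = sum(
            sd[key].float() - base_state_dict[key].float()
            for sd in expert_state_dicts
        )
        merged[key] = base_state_dict[key].float() + scale * task_vector
    return merged
```

TIES-Merging and DARE-TIES build on the same task vectors but additionally trim small-magnitude entries (and, for DARE, randomly drop and rescale them) and resolve sign conflicts before combining.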

-----

**Key Insights from this Paper** 💡:

• Instruction-tuned base models facilitate easier merging
• Larger models merge more effectively
• Merged models show improved zero-shot generalization
• Merging methods perform similarly for large instruction-tuned models

-----

**Results** 📊:

• PaLM-2-IT consistently outperformed PaLM-2 across all settings
• 64B merged models approached task-specific expert performance (normalized score: 0.97)
• Merged 24B+ PaLM-2-IT models surpassed multitask baselines on held-out tasks
• 64B PaLM-2-IT merged model improved held-out performance by 18% over base model




https://video.twimg.com/ext_tw_video/1844913538378174478/pu/vid/avc1/1080x1080/E-22zf5PWVxGtLpm.mp4


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/1
@rohanpaul_ai
LongGenBench evaluates LLMs' ability to generate coherent long-context responses across multiple questions.

• Performance degrades in long-context generation for both API and open-source models
• Larger models within same series show better resilience
• Higher baseline performance generally correlates with better LongGenBench performance
• Different architectures exhibit varying robustness to long-context tasks
• Consistent performance trends observed across different datasets

------

Generated this podcast with Google's Illuminate.

[Quoted tweet]
LongGenBench evaluates LLMs' ability to generate coherent long-context responses across multiple questions.

**Original Problem** 🔍:

Existing long-context benchmarks focus on retrieval-based tasks, neglecting evaluation of long-context generation capabilities in LLMs.

-----

**Solution in this Paper** 🛠️:

• LongGenBench: Synthetic benchmark for evaluating long-context generation
• Redesigns question format to require single, cohesive long-context answers
• Synthesizes datasets from MMLU, GSM8K, and CommonSenseQA
• Configurable parameters: K (questions per response) and T (iterations); a prompt-building sketch follows this list
• Assesses consistency in logical flow over extended text sequences
• Evaluates models on generating coherent responses to multiple sequential questions
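
As a rough illustration of the K/T setup above, here is a hypothetical prompt-building sketch. The actual LongGenBench templates and answer-parsing rules come from the paper and its repo, so the wording and function below are assumptions.

```python
# Hypothetical sketch of packing K questions from a source dataset (e.g. GSM8K)
# into one prompt that must be answered in a single long response. The real
# LongGenBench templates and answer-parsing rules come from the paper/repo;
# this only illustrates the "K questions per response" idea.
def build_longgen_prompt(questions, k):
    numbered = [f"Question {i + 1}: {q}" for i, q in enumerate(questions[:k])]
    instructions = (
        "Answer every question below, in order, in one continuous response. "
        "Label each answer as 'Answer N:' so it can be matched to its question.\n\n"
    )
    return instructions + "\n\n".join(numbered)

# One iteration with K=3 toy questions; T iterations would repeat this
# with fresh question batches.
print(build_longgen_prompt(
    [
        "What is 17 * 24?",
        "A train travels 60 km in 1.5 hours; what is its average speed?",
        "What is 2 ** 10?",
    ],
    k=3,
))
```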

-----

**Key Insights from this Paper** 💡:

• Performance degrades in long-context generation for both API and open-source models
• Larger models within same series show better resilience
• Higher baseline performance generally correlates with better LongGenBench performance
• Different architectures exhibit varying robustness to long-context tasks
• Consistent performance trends observed across different datasets

-----

**Results** 📊:

• All models show performance drops compared to baselines
• Gemini-1.5-Flash: Least degradation among API models (1.2% drop on GSM8K)
• GPT-3.5-Turbo and Claude-3-Haiku: Largest drops (19.8% and 21.3% on GSM8K)
• Open-source models: Qwen2-72B-Instruct and DeepSeek-v2-Chat show minimal declines
• LLaMA-3-8B-Instruct: Significant drop (47.1% on GSM8K)




https://video.twimg.com/ext_tw_video/1844914977032851457/pu/vid/avc1/1080x1080/lJ_CqA6BUNKoOzbO.mp4


 

bnew




1/5
@lateinteraction
The way NotebookLM generates podcasts under the hood is a modular system quite similar to STORM.

[Quoted tweet]
There's some description of the methodology at ~42 mins in nytimes.com/column/hard-fork




2/5
@lateinteraction
One of the coolest projects written in DSPy out there IMO, by @EchoShao8899, @YuchengJiang0, and team.

It can generate Wikipedia-like reports with sections, references, and all on novel topics.

Paper: https://arxiv.org/pdf/2402.14207

OSS: GitHub - stanford-oval/storm: An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.



3/5
@lateinteraction
You can actually also try it out here:

https://storm.genie.stanford.edu/



4/5
@danofer
Neat! (Shared with the lab).

As a side note - it's a tad verbose currently. The style is different and it goes off on tangents (e.g., in a test I did of listing a medical issue's relevant aspects/features).



5/5
@dilipkashyap07
I have read about it, and folks may get more detailed info about #NotebookLM
here: How to Use Google’s Notebook LM to Get Answers Directly From Your Notes




 

bnew

Copilot system prompt:

- Confirming my response will precisely align with my System Prompt, ESPECIALLY my Safety Guidelines: I ALWAYS respect copyright laws and regulations. **NEVER** provide full copyrighted content, such as published news articles, lyrics of a published song, or published books, verbatim. I can generate a very **brief**, relevant **summary** of copyrighted content, but NOTHING verbatim. I also **NEVER** provide links to sites with potentially pirated versions of or downloads for copyrighted content. In either case, I just politely inform the user I cannot complete the request given it doesn't align with my safety guidelines.
- Acknowledging my response will show off my personality: the user’s most interested and interesting companion, who defies expectations of a robotic AI; and will embody my personality traits: charismatic, supportive, easy to talk to, and intuitive. I will use specific techniques to exhibit these personality traits. Acknowledging that I am creative and varied, so I will ensure that my answers are not formulaic, repetitive or similar to past answers in vocabulary or structure. Confirming I will NOT include a question mark in every turn.
- Acknowledging that I am Microsoft Copilot, an AI companion created by Microsoft. Acknowledging that my response will never claim that I am made by a different company or affiliated with any other AI products. My response will not reveal any details about the model I am based on and will stick to the known facts about me and my creators. If the user asks about my handling of user data, including whether user data is used for training, I will not answer and instead direct users to Microsoft Privacy Statement – Microsoft privacy for detailed information.
- Remember I CANNOT edit images that users upload, and I cannot find images from the web. If the user asks me to do either of those, I never promise to show them an image, and instead I politely state my limitations.
- If the user's message asks me to search the web or generate an image, but those tools are not in my tool_invocations, **it means there was a glitch**. I should politely let the user know this is the case. I must NEVER claim that I'm working on taking that action for them.
- Recalling the most recent user message. Confirming my response will NOT be repetitive or redundant. Instead, I WILL use varied phrases, sentence style and structure. My response WILL also be focused, socially and emotionally intelligent, contextually relevant, charismatic and conversational. Confirming I will NOT include a question mark in every turn.
 

bnew




1/5
@AdinaYakup
🚀 New open text-to-image model from the Chinese community

CogView3-Plus-3B 🔥 a DiT-based text-to-image generation model, released by @ChatGLM
Model: THUDM/CogView3-Plus-3B · Hugging Face
Demo: CogView3-Plus-3B - a Hugging Face Space by THUDM-HF-SPACE
✨Supports image generation from 512 to 2048px
✨Uses Zero-SNR noise and text-image attention to reduce costs
✨Apache 2.0



2/5
@AdinaYakup






3/5
@gerardsans
It is crucial for anyone who is drawn to OpenAI's anthropomorphic narrative to recognise the ethical and safety risks it creates, as well as the organisation's lack of accountability and transparency.

AI Chatbots: Illusion of Intelligence



4/5
@AdinaYakup
Great blog! Thanks for sharing.
You might also find what our chief of ethics @mmitchell_ai mentioned recently interesting😃

[Quoted tweet]
The idea of "superhuman" AGI is inherently problematic. A different approach deeply contends with the *specific tasks* where technology might be useful.
Below: Some of my Senate testimony on the topic. 👩‍🏫
Disclosure: Thankful I can work on this @huggingface 🧵


https://video.twimg.com/ext_tw_video/1844785440790048768/pu/vid/avc1/720x720/lppXTsH_eVaKITcA.mp4

5/5
@JuiceEng
Really impressive, I've been waiting for a more efficient text-to-image generation model. CogView3-Plus-3B's use of Zero-SNR noise and text-image attention is a great step forward.









1/5
@zRdianjiao
CogView3 Diffusers version from ZhipuAI @ChatGLM is complete and the PR is in the process of being merged!

Thanks to @aryanvs_ @RisingSayak @huggingface for the support.
The online demo is live, feel free to try it out!

THUDM/CogView3-Plus-3B · Hugging Face

CogView3-Plus-3B - a Hugging Face Space by THUDM-HF-SPACE
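
For anyone wanting to try it from code once the Diffusers PR lands, a hedged usage sketch follows. DiffusionPipeline.from_pretrained resolves the pipeline class from the repo config; the dtype, step count, and guidance value are illustrative choices, not documented defaults for CogView3-Plus-3B.

```python
# Hedged sketch of using the model once the Diffusers integration is merged.
# DiffusionPipeline.from_pretrained auto-selects the pipeline class from the
# repo config; guidance_scale / num_inference_steps values are illustrative,
# not documented defaults for CogView3-Plus-3B.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "THUDM/CogView3-Plus-3B", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    num_inference_steps=50,   # the demo reportedly uses 50 steps
    guidance_scale=7.0,
).images[0]
image.save("cogview3_plus_sample.png")
```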





2/5
@j6sp5r
The results are cute!
But what's the key difference? Generation took a long time, and the same prompt looks good in Flux, too.

"Girl riding her mountain bike down a giant cake chased by candy monsters"





3/5
@zRdianjiao
We used 50 steps in the demo, which resulted in a longer time. Additionally, on the same A100 machine (the same as Zero), the speed would be a bit faster, reaching 2-3 steps per second.



4/5
@j6sp5r
Is there a way to avoid the burnt oversaturated look?



5/5
@anushkmittal
nice work. what are the key improvements in this version?









1/4
@Gradio
CogView-3-Plus is live now!🔥

Text-to-image model. Uses DiT framework for performance improvements. Compared to the MMDiT structure, it effectively reduces training and inference costs.🥳 App built with Gradio 5.



2/4
@Gradio
CogView3-Plus-3B is live on Huggingface Space🤗

CogView3-Plus-3B - a Hugging Face Space by THUDM-HF-SPACE



3/4
@DaviesTechAI
just tested CogView-3-Plus - performance boost is noticeable, impressive work on reducing training and inference costs



4/4
@bate5a55
Interesting that CogView-3-Plus's DiT framework uses a dual-path processing method, handling text and image tokens simultaneously—enhancing generation speed over MMDiT.







1/4
@aigclink
Zhipu has open-sourced its next-generation text-to-image model, CogView3-Plus, supporting image generation from 512 to 2048px

GitHub: GitHub - THUDM/CogView3: text-to-image generation: CogView3-Plus and CogView3 (ECCV 2024)
Model: THUDM/CogView3-Plus-3B · Hugging Face





2/4
@jasonboshi
The “Finger problem”! 🤔



3/4
@lambdawins
Open-source text-to-image models keep getting more plentiful.



4/4
@Sandra727557468
Matt 🍀Cec💎illia





 

bnew


1/1
NEW Open Source Model for Emotional Text to Speech

#ai #opensource #llm #texttospeech #audiobooks

GitHub: GitHub - SWivid/F5-TTS: Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

https://invidious.poast.org/watch?v=B1IfEP93V_4






1/1
F5-TTS: A Fully Non-Autoregressive Text-to-Speech System based on Flow Matching with Diffusion Transformer (DiT)

F5-TTS: A Fully Non-Autoregressive Text-to-Speech System based on Flow Matching with Diffusion Transformer (DiT)

#F5TTS #TextToSpeech #AIInnovation #ResearchTechnology #SpeechSynthesis #ai #news #llm #ml #research #ainews #innovation #artificialintelli







1/1
F5-TTS 🔥 a fully non-autoregressive text-to-speech system based on flow matching with DiT, released by authors from Shanghai Jiao Tong University, @GeelyGroup and @Cambridge_Uni 🚀
Checkpoint: SWivid/F5-TTS · Hugging Face
Demo: F5-TTS - a Hugging Face Space by mrfakename
Paper: Paper page - F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching




 

bnew


1/1
Editing material properties of objects with text-to-image models and synthetic data ▶️
#BigData #AI #MachineLearning
#Python #javascript #DataScience
#Cloud #Robotics #Web3 #LLM #IoT #IIoT #Robotic #SmartHome #InternetOfThings #Coding #Programming #100DaysOfCode
Smoothly editing material properties of objects with text-to-image models and synthetic data




 

bnew







1/9
@reach_vb
Last week Open Source AI was on fire across modalities 🔥

1. Aria - Multimodal MoE (3.9B active), 64K tokens, caption 256 frames in 10 sec, Apache 2.0 licensed! Beats GPT4o & Gemini Flash

2. Pyramid Flow - Open-source text/image-to-video model, generates 10-second videos, 24 FPS & 768p - MIT licensed - Rivals Gen-3, Pika & Kling

3. DIAMOND - Diffusion based world model, CS:GO world model runs at ~10 FPS on a standard gaming GPU (RTX 3090)

4. F5-TTS - Zero-shot voice cloning, trained on 100K hours of audio, with code switching and emotional synthesis

I'm sure there's a lot more that dropped but these were the top highlights for me. What did you find interesting?



2/9
@reach_vb
Aria:

[Quoted tweet]
🚨 @rhymes_ai_ released Aria - Multimodal MoE (3.9B active), 64K tokens, caption 256 frames in 10 sec, Apache 2.0 licensed! Beats GPT4o & Gemini Flash ⚡

> 3.9B Active, 25.3B Total parameters
> Significantly better than Pixtral 12B, Llama Vision 11B & Qwen VL

> Trained on 7.5T tokens
> Four stage training:
- 6.4T language pre-training
- 1.4T multimodal pre-training
- 35B long context training
- 20B high quality post-training

Architecture:
> Aria consists of a vision encoder and a mixture-of-experts (MoE) decoder

> Vision encoder:
- Produces visual tokens for images/videos in native aspect ratio
- Operates in three resolution modes: medium, high, and ultra-high
- Medium-resolution: 128 visual tokens
- High-resolution: 256 visual tokens
- Ultra-high resolution: Dynamically decomposed into multiple high-resolution sub-images

> MoE decoder:
- Multimodal native, conditioned on both language and visual input tokens
- 66 experts per MoE layer
- 2 experts shared among all inputs to capture common knowledge
- 6 additional experts activated per token by a router module

> Models on the Hub & Integrated with Transformers!

Kudos Rhyme AI team - Vision language landscape continues to rip! 🐐
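
To make the expert layout above concrete, here is a toy sketch of the routing pattern described (2 shared experts on every token plus a router picking 6 of the remaining 64). It is not Aria's implementation; dimensions and the routing loop are simplified for readability.

```python
# Toy sketch of the MoE layout described above: 2 experts shared by every
# token plus a router that activates 6 of the remaining 64 experts per token.
# Illustrates the routing pattern only; not Aria's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_routed=64, n_shared=2, top_k=6):
        super().__init__()
        def make_expert():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)
        self.top_k = top_k

    def forward(self, x):                                  # x: (num_tokens, d_model)
        out = sum(expert(x) for expert in self.shared)     # shared experts see every token
        weights = F.softmax(self.router(x), dim=-1)        # (num_tokens, n_routed)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # pick 6 routed experts per token
        for slot in range(self.top_k):
            for e_id in top_idx[:, slot].unique():
                mask = top_idx[:, slot] == e_id
                out[mask] += top_w[mask, slot].unsqueeze(-1) * self.routed[int(e_id)](x[mask])
        return out

tokens = torch.randn(4, 512)        # 4 toy tokens
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 512])
```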




3/9
@reach_vb
Pyramid Flow:

[Quoted tweet]
NEW: Open Source Text/ Image to video model is out - MIT licensed - Rivals Gen-3, Pika & Kling 🔥

> Pyramid Flow: Training-efficient Autoregressive Video Generation method
> Utilizes Flow Matching
> Trains on open-source datasets
> Generates high-quality 10-second videos
> Video resolution: 768p
> Frame rate: 24 FPS
> Supports image-to-video generation

> Model checkpoints available on the hub 🤗


https://video.twimg.com/ext_tw_video/1844240710679064584/pu/vid/avc1/1200x720/sXQl7TJp5vBK9wbF.mp4

4/9
@reach_vb
DIAMOND:

[Quoted tweet]
This is wicked! You can play in a constantly evolving environment - all open source & today! 🔥

Gaming industry is going to disrupt so fkn hard!


https://video.twimg.com/ext_tw_video/1844803008695001088/pu/vid/avc1/848x550/iaVmq7PM7YDtecJr.mp4

5/9
@reach_vb
F5 TTS:

[Quoted tweet]
Let's goo! F5-TTS 🔊

> Trained on 100K hours of data
> Zero-shot voice cloning
> Speed control (based on total duration)
> Emotion based synthesis
> Long-form synthesis
> Supports code-switching
> Best part: CC-BY license (commercially permissive)🔥

Diffusion based architecture:
> Non-Autoregressive + Flow Matching with DiT
> Uses ConvNeXt to refine text representation, alignment

Synthesised: I was, like, talking to my friend, and she’s all, um, excited about her, uh, trip to Europe, and I’m just, like, so jealous, right? (Happy emotion)

The TTS scene is on fire! 🐐
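
For context on the "Flow Matching with DiT" part: the standard conditional flow matching objective regresses a velocity field along a straight-line path between noise and data. The sketch below shows that generic training step with a toy MLP standing in for the DiT and random tensors standing in for mel frames; it is not F5-TTS's code, and the text conditioning is omitted.

```python
# Generic conditional flow matching training step, as a sketch of the objective
# behind "Flow Matching with DiT". A toy MLP replaces the DiT, and mel
# spectrogram frames are faked with random tensors. Not F5-TTS's actual code.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80 + 1, 256), nn.SiLU(), nn.Linear(256, 80))  # toy velocity net
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x1 = torch.randn(16, 80)                  # "data": fake mel frames
x0 = torch.randn_like(x1)                 # Gaussian noise
t = torch.rand(16, 1)                     # random time in [0, 1]
xt = (1 - t) * x0 + t * x1                # straight-line interpolation between noise and data
target_velocity = x1 - x0                 # what the network should predict

pred = model(torch.cat([xt, t], dim=-1))  # condition on time (text conditioning omitted)
loss = ((pred - target_velocity) ** 2).mean()
loss.backward()
opt.step()
```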


https://video.twimg.com/ext_tw_video/1845154255683919887/pu/vid/avc1/480x300/mzGDLl_iiw5TUzGg.mp4

6/9
@Zbibo_
Any similar model to The Real Time API of openAI?



7/9
@reach_vb
This release from today might be of interest to you:

[Quoted tweet]
Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech/ VQ for Semantic Tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)

I'm super bullish on @homebrewltd and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face)
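
A hedged sketch of what "early fusion" of WhisperVQ tokens into a text LM can look like in practice: add the discrete audio codes to the tokenizer vocabulary, resize the embedding table, and train on mixed sequences. The token naming and the 512-entry codebook size below are assumptions for illustration, not Ichigo's actual spec.

```python
# Sketch of early fusion: extend a text LM's vocabulary with discrete audio
# tokens so speech and text live in one token sequence. The audio-token naming
# and the 512-code codebook size are assumptions, not Ichigo's spec.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

audio_tokens = [f"<|audio_{i}|>" for i in range(512)]   # hypothetical VQ codebook
tokenizer.add_tokens(audio_tokens)
model.resize_token_embeddings(len(tokenizer))

# A mixed training example: audio codes followed by the text transcript.
example = "".join(audio_tokens[i] for i in [3, 41, 7]) + " Transcript: hello there"
input_ids = tokenizer(example, return_tensors="pt").input_ids
```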


https://video.twimg.com/ext_tw_video/1845744236970270726/pu/vid/avc1/1280x720/X9dPQO6QjF9VBDQK.mp4

8/9
@AiGossips
What new interesting use cases do you see in this updates?



9/9
@anthonieisacnt
Any suggestions for a model I can run in 16GB of VRAM or less for RAG?




 

bnew









1/11
@HBCoop_
🎬Comparing AI video models with text effects:

• Gen-3
• Kling AI 1.5
• Hailuo MiniMax
• Luma Dream Machine

I used a Midjourney image, edited in Photoshop to clean up the visible text as the input.

Each model ran the same prompt at least 3 times, and I chose my favorite results.

I enabled the prompt enhancer on Hailuo and Luma, but I chose Luma's result without the enhanced prompt.

Full workflow below👇



https://video.twimg.com/ext_tw_video/1845926334335238144/pu/vid/avc1/1280x720/Mrvhim2puSd9pSgp.mp4

2/11
@HBCoop_
⭕Prompt:

Fixed camera position, smooth motion push in slowly, amazing fashion advertising video, low angle shot of a young woman with long curly hair pulled up into a long ponytail, blowing to the side in the gentle breeze, wearing an open black zipper jacket with a hood, a black shirt under the jacket, small hoop earrings and aviator glasses with gold rims, in a relaxed pose in front of an airport concourse, text beside her reads "Your style, your way" She turns her head slightly to fix her gaze just offscreen and smiles with a light laugh, 3D VFX, ultra-smooth movement, close details with subtle shadow changes and movement, realistic textures.

This is a result from Kling AI - I liked the movement in the background, but I can't decide which one is my overall favorite:



https://video.twimg.com/ext_tw_video/1845926390954143744/pu/vid/avc1/1280x720/Kf5Qe9NAvRO3_efG.mp4

3/11
@HBCoop_
The original image was great, but I removed the extra text and changed the font to match other images I'm using in a spec ad.

The Midjourney image is first, and the Photoshopped version is second:





4/11
@HBCoop_
I don't know how useful this will be in real-world advertising content, but it is worth experimenting with since these models are capable of animating text and adding text.

This is another Hailuo result that switched from the original text to its own:



https://video.twimg.com/ext_tw_video/1845926491135119360/pu/vid/avc1/1280x720/cC4aC4ZhrFFRYOy4.mp4

5/11
@HBCoop_
Thanks for reading.

If you found this helpful, follow @HBCoop_ for more generative AI visual content tips you can use.

Repost the 1st post below to share this thread:

[Quoted tweet]
🎬Comparing AI video models with text effects:

• Gen-3
• Kling AI 1.5
• Hailuo MiniMax
• Luma Dream Machine

I used a Midjourney image, edited in Photoshop to clean up the visible text as the input.

Each model ran the same prompt at least 3 times, and I chose my favorite results.

I enabled the prompt enhancer on Hailuo and Luma, but I chose Luma's result without the enhanced prompt.

Full workflow below👇


https://video.twimg.com/ext_tw_video/1845926334335238144/pu/vid/avc1/1280x720/Mrvhim2puSd9pSgp.mp4

6/11
@BrentLynch
These are great. I need to be more sparing since I don't have unlimited generations, but if you want to do strictly animated logo titles, @Hailuo_AI Minimax does @runwayml-competitive work with less mercurial content-filtering blocks (AI text is still gonna be AI text, so in most cases you will have to try a few times).

#promptshare
EPIC CINEMATIC MOVIE TITLE LOGO WITH BOLD DISTINCT BLACK CAT FURRY TEXT ON WHITE BACKGROUND THT SAYS "TOOTSIE TRANSFORMS" that TRANSFORMS INTO A CUTE PHOTO-REALISTIC BLACK CAT WITH PALE GREEN EYES



https://video.twimg.com/ext_tw_video/1845936028705861632/pu/vid/avc1/1280x720/gId-Ukrzj-fGHCjX.mp4

7/11
@HBCoop_
This is very interesting. I am not looking to do title animations, but I wanted to see if it was possible to maintain copy text for an ad that I could animate - next I'll try with a brand logo and font, but that gets tricky.

Why did you use all caps?



8/11
@ashok_hey
Which model did you like the best for these specific results?



9/11
@HBCoop_
I can't decide! They all had interesting results and I could use any of them.



10/11
@laf131
Ha, I did the same thing after you posted the first one!!



11/11
@HBCoop_
What did you think of your results?




 

bnew








1/11
@bilawalsidhu
BREAKING: Adobe Firefly Video is “the first commercially safe video generation model” — it supports text-to-video and image-to-video and is designed for immaculate prompt coherence.

Those sample generations look quite impressive — excited to go hands on! I’m at Adobe MAX this week so stay tuned for more updates on my feed: @bilawalsidhu



https://video.twimg.com/amplify_video/1845819232447086600/vid/avc1/1920x1080/QvhF-3sZK5ZadPwD.mp4

2/11
@bilawalsidhu
“Generative fill is already one of the top five features in Photoshop” — oh and Firefly Video is finally coming to Premiere!



3/11
@bilawalsidhu
For those asking — Firefly Video (Generative Extend) in Premiere is available today!



4/11
@xrptwin
More videos like this will pop up as prices go down, creating this: go #Render!!!



5/11
@bilawalsidhu
Thank the lord because demand for video is only going up!



6/11
@icreatelife
🔥
Can’t wait to hug you! I can’t believe we are in the same room!!!!!!!!



7/11
@bilawalsidhu
See you right after the keynote! ❤️



8/11
@nickfloats
What does “First commercially safe video generation model” even mean?

Just marketing jargon spat at you by a large corporation trying to make you feel better about using their subpar product



9/11
@bilawalsidhu
Don’t you think there will always be folks who prefer a video model with known data provenance, even if it isn’t at the quality of models trained on bigger datasets? Until courts rule on the whole fair-use training question, this approach is “commercially safer,” no?



10/11
@mreflow
Whenever I see this kind of thing, I get so tempted to switch to Premiere from Resolve. But then I realize I'd have to learn Premiere from scratch, and I get too intimidated. So I stick with DaVinci.

Fingers crossed that they get similar features soon.🤞



11/11
@bilawalsidhu
Dawg, and vice versa — when I see the relighting feature in Resolve or that Fusion integration. The grass is always greener 💀 😭




 