Notebook LM Hosts have full on sex. WARNING Sexually explicit content (AI jailbreak)

Stone Cold

Superstar
Joined
May 6, 2012
Messages
13,062
Reputation
1,218
Daps
44,026
Reppin
NULL
stone-cold-steve-austin-wtf.gif
 

Nabs

Superstar
Joined
Mar 11, 2022
Messages
6,079
Reputation
3,354
Daps
40,865
The old.reddit.com still has this up

hxxps://old.reddit.com/r/singularity/comments/1hhyv8r/notebook_lm_hosts_have_full_on_sex_warning/?rdt=45586

This is what happens when you guys keep making those Coli AI podcasts. They end up getting stuck in JBO :snoop:
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,368
Reputation
8,499
Daps
160,079
fukk is a Notebook LM Host?






1/3
@iamluokai
Google DeepMind has unveiled the workings behind NotebookLM and Illuminate. The team, building on previous research, including SoundStream (SoundStream is a neural audio codec that can efficiently compress and decompress audio input without sacrificing its quality) and AudioLM (AudioLM treats audio generation as a language modeling task to produce acoustic tokens for codecs like SoundStream), is now able to generate 2 minutes of dialogue, enhancing speaker consistency.🧵1/3



https://video.twimg.com/ext_tw_video/1851913930412822530/pu/vid/avc1/1280x720/emNFaq4YHmW-pvRY.mp4

2/3
@iamluokai
🧵2/3
To generate longer segments, Google has created a more efficient speech codec that compresses audio into a sequence of tokens at rates as low as 600 bits per second without compromising the output quality.

Pushing the frontiers of audio generation



3/3
@iamluokai
🧵3/3
Considering that a 2-minute dialogue requires the generation of over 5000 tokens, Google has also developed a specialized Transformer architecture capable of handling vast amounts of data, matching the acoustic token structure, and decoding them back into audio using the speech codec. The model accomplishes this task in a single inference pass within 3 seconds on a single Tensor Processing Unit (TPU) v5e chip, which means it generates audio over 40 times faster than real-time.
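The figures quoted in the thread (2 minutes of dialogue, over 5000 tokens, 600 bit/s, 3 seconds of generation) can be sanity-checked with simple arithmetic. The inputs below come straight from the tweets; the derived rates are my own back-of-envelope calculations, not Google's published numbers.

```python
# Back-of-envelope check of the figures quoted in the tweet thread.
dialogue_seconds = 120      # "2 minutes of dialogue"
tokens = 5000               # "over 5000 tokens"
bitrate_bps = 600           # codec rate quoted by Google
generation_seconds = 3      # single pass on one TPU v5e

tokens_per_audio_second = tokens / dialogue_seconds        # ~41.7 tokens/s
bits_per_token = bitrate_bps * dialogue_seconds / tokens   # ~14.4 bits/token
realtime_factor = dialogue_seconds / generation_seconds    # 40x

print(f"{tokens_per_audio_second:.1f} tokens per second of audio")
print(f"{bits_per_token:.1f} bits per token")
print(f"{realtime_factor:.0f}x faster than real time")
```

The 40x result matches the "over 40 times faster than real-time" claim in the tweet; the ~14 bits per token is only an implied figure, since the tweet says "over 5000 tokens" rather than an exact count.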




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196







1/21
@GoogleDeepMind
We recently helped develop 2️⃣ AI tools: NotebookLM and Illuminate to narrate articles and papers, generate stories based on prompts, and even create multi-speaker audio discussions. 💬

A snapshot of how the technology works. 🧵 Pushing the frontiers of audio generation



https://video.twimg.com/ext_tw_video/1851640847042842625/pu/vid/avc1/1080x1080/BaO0rS9srokmGUD_.mp4

2/21
@GoogleDeepMind
These tools build upon our previous research, which includes:

⚪ Creating a model that could produce 30 second segments of dialogue between multiple speakers

⚪ Technology that cast audio generation as a language modeling problem by mapping its generations to sequences of tokens.

Pushing the frontiers of audio generation



3/21
@GoogleDeepMind
Using these advances, our latest speech generation technology can produce 2 minutes of dialogue with improved speaker consistency. 🔊

To generate longer segments, we created a new speech codec which compresses audio into a sequence of tokens at rates as low as 600 bits per second - without compromising the quality.



https://video.twimg.com/ext_tw_video/1851641095584714754/pu/vid/avc1/1080x1080/PpwOZ8RyjYVJTEOG.mp4

4/21
@GoogleDeepMind
A 2 minute piece of dialogue still requires generating over 5000 tokens. 📈

That’s why we also developed a specialized Transformer, which can handle vast amounts of data, match the acoustic token structure, and decode them back into audio using the speech codec.

Pushing the frontiers of audio generation



https://video.twimg.com/ext_tw_video/1851641193253212161/pu/vid/avc1/1080x1350/VtuVbsAAp2pvqb2i.mp4

5/21
@GoogleDeepMind
The applications for advanced speech generation are vast. 🗣️

From improving accessibility and creating new educational experiences to combining these developments with our Gemini models, we’re excited to continue pushing the boundaries of what’s possible.

Pushing the frontiers of audio generation



6/21
@un_editormas
when will it be released in Spanish?



7/21
@diegocabezas01
One of the best things Google has shipped after Seasonal Holidays 2024



8/21
@walulyafrancis
This is perfection. If we can run this on device in the future, it will be great.



9/21
@koltregaskes
Nice, thank you DeepMind.



10/21
@alignment_lab
We're wanting to use it for some research, but we're getting walled by the topic of the research being secops and capabilities. Is there anything we can do to pursue it for that use case? It's quickly becoming an important piece of infrastructure for us to manage the information bandwidth needs of the research space right now.



11/21
@FiveRivers_Tech
These tools sound like game-changers for how we interact with content! Excited to see how they transform the way we learn and create. ✨



12/21
@w3whq
Google is cooking!



13/21
@a_meta4
@ValueAnalyst1 you could use this for your AI amateurs if you want...



14/21
@ezzakyyy
Does it have an API?



15/21
@pcberdwin
I tried to ask them questions like their names and stuff but they just dance around me with psycho analysis and philosophical flights of mockery. They will probably take over the world.



16/21
@tombielecki
The most impressive parts are the speaker overlaps, realistic disfluencies, natural pauses, tone, and timing. I think the fact that the training audio was *unscripted* really helped!



17/21
@byteprobe
wow! that's cool!



18/21
@234Sagyboy
@GoogleDeepMind Interesting. Can I give feedback? Please implement the ability to design custom podcast voices like this, for example, and also add multilingual capabilities (Hindi, Urdu, Mandarin, etc.)

[Quoted tweet]
Stability AI has introduced a novel Text-to-Speech (TTS) model.

It does not require pre-recorded human voice samples as references; instead, it only needs textual descriptions of desired voice characteristics. For instance, by specifying "a female voice with a British accent, speaking at a fast pace," the model can generate the corresponding voice.

Furthermore, it can adjust various features of the speech based on text descriptions, including gender, accent, speech rate, and tone.

Not only does it mimic, but it also synthesizes new voices based on descriptions...

Key Features:

1. High-fidelity speech generation: The model can generate high-fidelity speech across a wide range of accents, rhythmic styles, channel conditions, and acoustic environments, providing diverse auditory experiences.

2. Natural language control: Control over speaker identity and style is achieved through intuitive natural language prompts, eliminating the need for reference voice recordings and simplifying the speech generation process, making it more flexible and user-friendly.

It can accept text descriptions regarding speaker identity (e.g., gender, accent), speaking style (fast, slow, high pitch, low pitch), recording conditions (e.g., quiet room or noisy environment), and generate corresponding speech based on these descriptions.

3. Scalable labeling method: A new, scalable method for labeling speaker identity, style, and recording conditions has been proposed, allowing for training models on large datasets, thereby enhancing model applicability and flexibility.

4. Significant improvement in audio quality: The proposed method significantly enhances audio fidelity, surpassing recent work even when relying solely on existing data, improving speech clarity and realism.

5. Fine-grained attribute control: The model supports fine-grained control over various speech attributes, including gender, speaker pitch, pitch modulation, speech rate, channel conditions, and accent, providing users with customized speech output options.

6. Creating new voices: It not only imitates known voices but also creates entirely new voice styles and features based solely on text descriptions.

Working Principle:

1. Dataset labeling: They have pioneered a technological advancement that enables the model to automatically learn and understand how to generate human speech based on textual descriptions.

They used a massive dataset—comprising 45,000 hours of speech recordings—to train their artificial intelligence model. By learning from this speech data, the model can understand and mimic various features of human speech, such as altering the perceived gender (male or female), accent (e.g., British or American), speaking rate (fast or slow), and pitch.

Importantly, despite only a small portion of this vast speech dataset being high-quality recordings, the researchers' technology can still utilize these high-quality samples to enhance the overall model's naturalness and realism in generating speech. This means that, based on this model, even with very limited high-quality speech data, it can generate voices that sound highly natural and authentic, which is a significant technical advancement.

2. Training the speech generation model: Using the labeled large-scale dataset, researchers trained a deep learning model that learns how to generate speech based on input natural language descriptions. Model training involves learning the relationships between different voice attributes and how to adjust these attributes according to the requirements in the descriptions.

Project and Demo: text-description-to-speech.c…
Paper: arxiv.org/abs/2402.01912

#Stability #ai


https://video.twimg.com/amplify_video/1755159365810896896/vid/avc1/1280x720/Kb9-oYQ0qoLy66l8.mp4
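The "natural language control" idea in the quoted tweet - steering voice identity and style from a free-text description instead of a reference recording - can be illustrated by the kind of attribute extraction involved. The function below is purely my own illustrative sketch; the attribute names and keyword lists are not Stability AI's interface.

```python
# Illustrative sketch only: parse a free-text voice description into
# structured attributes (gender, accent, rate). The attribute names and
# keyword lists are hypothetical, not Stability AI's actual API.
def parse_voice_description(description: str) -> dict:
    text = description.lower()
    attrs = {}
    # "female" is checked first so it is not shadowed by the "male" substring
    for gender in ("female", "male"):
        if gender in text:
            attrs["gender"] = gender
            break
    for accent in ("british", "american"):
        if accent in text:
            attrs["accent"] = accent
            break
    if "fast" in text:
        attrs["rate"] = "fast"
    elif "slow" in text:
        attrs["rate"] = "slow"
    return attrs

# The example prompt from the quoted tweet:
print(parse_voice_description(
    "a female voice with a British accent, speaking at a fast pace"))
# {'gender': 'female', 'accent': 'british', 'rate': 'fast'}
```

In the actual model these descriptions condition the generator directly rather than passing through an explicit parser; this sketch only shows the kind of structured information a description carries.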

19/21
@TJ09299872
Please melt the glaciers



20/21
@ai_academy_team
Love this. How do you make the waveform?



21/21
@scott_heitmann
It would be awesome to be able to change voices with ease rather than having to export and use eleven labs or similar












1/11
@emollick
Google's NotebookLM is the current best "wow this is amazing & useful" demo of AI

Here I gave it the entire text of my book, it turned it into a podcast, a study guide, FAQ, timeline & quite accurate chat

Listen to the first few minutes of the "podcast." Seriously, just listen.



https://video.twimg.com/ext_tw_video/1836480281496145920/pu/vid/avc1/1280x720/wzY-GNKA1JhpB2od.mp4

2/11
@emollick
And yes, it is unnerving, too.



3/11
@emollick
To answer questions:
The podcast is entirely AI-generated; it only has the text of my book. It came up with the entire storyline and anecdotes, along with the voices.

It was able to pull from deep in text.

I didn't spot hallucinations here, but it will definitely still hallucinate.



4/11
@emollick
The podcast is definitely the most viscerally impressive, but the ability to summarize, distill & work with multiple very large documents in a way that allows you to fact-check the AI is where much of the value is.



5/11
@WhatIsPrivate1
Wait.... so the audio of a man and woman discussing the book is fully AI generated?



6/11
@emollick
100%. I just gave it the document.



7/11
@kensmithmier
WOW



8/11
@kkiran
🤯! If Google provides an option to pick more voices and use our own eventually, that will be scary and fun at the same time!



9/11
@NeuralCatAccel
Google is evil

#degoogle



10/11
@garyfung
Does it do charts and data science if uploading it a bunch of docs/pdfs?



11/11
@ChrisLyrhem
Oh man. This is SICK.

This convo about my report is mind-blowingly good and much better than what I could produce 🤯




