1/1
I am pleased to share that our work on SynthID text watermarking is published by @Nature today.
Read the Nature paper at: Scalable watermarking for identifying large language model outputs - Nature
Read more about the work at: SynthID: Tools for watermarking and detecting LLM-generated Text | Responsible Generative AI Toolkit | Google AI for Developers
[Quoted tweet]
Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit.
Available freely to developers and businesses, it will help them identify their AI-generated content.
Find out more → goo.gle/40apGQh
To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
1/20
@GoogleDeepMind
Today, we’re open-sourcing our SynthID text watermarking tool through an updated Responsible Generative AI Toolkit.
Available freely to developers and businesses, it will help them identify their AI-generated content.
Find out more → SynthID
https://video-ft.twimg.com/ext_tw_v...376/pu/vid/avc1/1280x720/G5K0TaljbmDqO-lP.mp4
2/20
@GoogleDeepMind
Here’s how SynthID watermarks AI-generated content across modalities. ↓
https://video-ft.twimg.com/ext_tw_video/1792521399359180800/pu/vid/avc1/720x720/fT7NUZR4FiMQ2iwO.mp4
3/20
@GoogleDeepMind
By open-sourcing the code, more people will be able to use the tool to watermark and determine whether text outputs have come from their own LLMs - making it easier to build AI responsibly.
We explain more about this tech in @Nature. ↓ Scalable watermarking for identifying large language model outputs - Nature
4/20
@AidfulAI
Detecting AI-written text is tough without watermarks.
Open-sourcing SynthID-Text enables others to embed watermarks in their model outputs.
This means there will be two types of models:
Models which watermark their outputs and the ones that won't.
5/20
@mkieffer1107
awesome!!! was just looking into this yesterday hoping it was open source
6/20
@dom_beaini
1. Can we break down the image generation by down-sampling and up-sampling?
2. Invisible to the human eye, but if we plug them back into another gen-AI, would it remove the watermark? For example adding noise to the image, then feeding it back into another watermark-free diffusion model? Asking another LLM to make random modification to a given text?
3. Without regulatory enforcement of these watermarks, I suspect most models won't have them.
7/20
@DesFrontierTech
How does SynthID text’s generative watermarking handle variability across different content domains, and what measures are taken to ensure the watermark’s detectability remains consistent when faced with novel or out-of-distribution input contexts?
8/20
@cloudseedingtec
ok i have a random question that no one has answered.. did yall put that (i call it the poison pill) into youtube videos.. cuz like well not to self incriminate but it seems like yall did something<3
9/20
@entergnomer
Would a different sampler bypass this?
10/20
@BensenHsu
The study focuses on developing a method called SynthID-Text to watermark text generated by large language models (LLMs). Watermarking can help identify synthetic text and limit accidental or deliberate misuse of LLMs.
The researchers evaluate SynthID-Text across multiple LLMs and find that it provides improved detectability over comparable methods, while maintaining standard benchmarks and human side-by-side ratings that indicate no change in LLM capabilities. They also conduct a live experiment with the Gemini production system, which shows that the difference in response quality and utility, as judged by humans, is negligible between watermarked and unwatermarked responses.
full paper: Scalable watermarking for identifying large language model outputs
11/20
@shawnchauhan1
Awesome! Really appreciate it.
12/20
@HungamaHeadline
Google's open-sourcing of SynthID is a major step forward in ensuring accountability and trust in AI-generated content. By providing a reliable way to identify AI-generated media, SynthID empowers users to make informed decisions. This is a crucial development as AI continues to shape our world.
13/20
@thegenioo
Irrelevant somehow to the OP
But this simple animation also shows how LLMs basically work, using probability to output words, like predicting the next word. It's not the entire process, but a very simple illustration for someone who has no clue how AI works.
14/20
@MinhQua52508258
Alphastarter
15/20
@benrayfield
very suspicious to announce opensourcing something without saying what license or where to download it
16/20
@benrayfield
"Where is SynthID available? This technology is available to Vertex AI customers using our text-to-image models, Imagen 3 and Imagen 2, which create high-quality images in a wide variety of artistic styles". Prove its opensource. Wheres one of those guys one could fork from?
17/20
@benrayfield
Why dont you call it a steganography tool? Isnt watermarking a kind of steganography if you do it well enuf? You're hiding any arbitrary data by rewriting words to have a similar meaning, and paying for that in extra length to store the data.
18/20
@234Sagyboy
@GoogleDeepMind @Google Awesome now that we have verification in place meaning better identification of content generated by AI Is it possible that we can please have Google Soundstorm and AudioLm released Thanks
19/20
@explorewithmom
Google DeepMind's SynthID is a game-changer for identifying AI-generated content. I've been exploring AI watermarking for my own work and I'm excited to see SynthID open-sourced and freely available to developers and businesses.
20/20
@AdalaceV2
Oh ok so you're actively polluting the output of the software I am paying for. Sounds like I won't be paying for it anymore.
1/4
@MushtaqBilalPhD
Google has open-sourced a watermarking tool, SynthID, to identify AI-generated content.
Teachers can relax now because soon students won't be able to use AI to cheat on their assignments.
https://video-ft.twimg.com/ext_tw_v...305/pu/vid/avc1/1352x720/i6YazQbRYIH6iBnX.mp4
2/4
@MushtaqBilalPhD
Here's the full paper by Google DeepMind:
Scalable watermarking for identifying large language model outputs - Nature
3/4
@healthheronav
I've developed my own ways to detect AI-generated content, but I'm skeptical about tools like SynthID. What's to stop AI from evolving to evade watermarks?
4/4
@fcordobaot
It only works if the content was generated by Gemini after they created the watermark. So unless all the big ones use the standard watermark, it would be complicated to really achieve it!
1/3
@kanpuriyanawab
Google DeepMind open-sourced SynthID today.
Here are 3 things you need to know:
What is SynthID??
SynthID has been developed for watermarking and identifying AI-generated content. This includes text, images, audio, and video.
Significance:
> This tool comes when distinguishing between AI and human-created content is becoming increasingly important due to misinformation, plagiarism, and copyright violations.
How it works?
> For text, SynthID modifies the probability scores of tokens during the generation process so that these modifications act as a watermark.
> This watermark can then be detected through a specific scoring system that assesses the likelihood that the text was generated by a watermarked large language model (LLM).
In my opinion,
The move to open-source SynthID allows anyone to implement this technology in their own AI models to watermark and later identify AI-generated text.
Moreover, this can be seen as a step towards fostering responsible AI development by allowing widespread implementation of watermarking technology.
2/3
@Yaaaaaashhh
SynthID is really cool!!!!
3/3
@kanpuriyanawab
and necessary
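For anyone who wants to see the "How it works?" part of the thread above in code, here is a rough sketch of the general idea: nudge token probability scores with a secret key during generation, then score a text by how often it picked the nudged tokens. This is a generic keyed logit-bias toy, not SynthID's actual algorithm (the Nature paper describes its own sampling scheme). The vocabulary, key, bias value, uniform "model", and all function names are made up for illustration.

```python
import hashlib
import math
import random

# Everything below is hypothetical: a toy vocabulary, a made-up key, and a
# uniform "language model". This is NOT the SynthID implementation.
VOCAB = [f"tok{i}" for i in range(50)]
KEY = "demo-key"   # assumed secret watermarking key
BIAS = 4.0         # assumed logit boost for favoured tokens


def favoured(prev_token: str, token: str, key: str = KEY) -> bool:
    """Keyed pseudorandom bit: about half the vocabulary is favoured per context."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0


def sample_next(prev_token: str, rng: random.Random, watermark: bool) -> str:
    """Sample one token; with watermarking on, favoured tokens get a logit boost."""
    logits = [0.0] * len(VOCAB)  # stand-in for real model logits
    if watermark:
        logits = [
            logit + BIAS if favoured(prev_token, tok) else logit
            for logit, tok in zip(logits, VOCAB)
        ]
    weights = [math.exp(logit) for logit in logits]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n: int, seed: int, watermark: bool) -> list[str]:
    """Generate n tokens, each conditioned on the previous token only."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(n):
        tokens.append(sample_next(tokens[-1], rng, watermark))
    return tokens[1:]


def detect_score(tokens: list[str]) -> float:
    """Fraction of tokens that were favoured given the token before them."""
    previous = ["<s>"] + tokens[:-1]
    hits = sum(favoured(p, t) for p, t in zip(previous, tokens))
    return hits / len(tokens)


if __name__ == "__main__":
    print("watermarked:", detect_score(generate(400, seed=0, watermark=True)))
    print("plain:      ", detect_score(generate(400, seed=0, watermark=False)))
```

In this toy, the watermarked score should land close to 1 and the plain score near 0.5. Detection needs only the key, not the model. It also shows why the questions raised earlier in the thread matter: if another model paraphrases or resamples the text, the score will tend to drift back toward 0.5.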