bnew

NYC AI Chatbot Touted by Adams Tells Businesses to Break the Law

The Microsoft-powered bot says bosses can take workers' tips and that landlords can discriminate based on source of income. That's not right.

BY COLIN LECHER, THE MARKUP MARCH 29, 2024, 6:00 A.M.


Mayor Eric Adams speaks at the Fulton Transit Center, March 28, 2024. Credit: Ben Fractenberg/THE CITY

This article is copublished with The Markup, a nonprofit, investigative newsroom that challenges technology to serve the public good. Sign up for its newsletters.

In October, New York City announced a plan to harness the power of artificial intelligence to improve the business of government. The announcement included a surprising centerpiece: an AI-powered chatbot that would provide New Yorkers with information on starting and operating a business in the city.

The problem, however, is that the city’s chatbot is telling businesses to break the law.

Five months after launch, it’s clear that while the bot appears authoritative, the information it provides on housing policy, worker rights, and rules for entrepreneurs is often incomplete and in worst-case scenarios “dangerously inaccurate,” as one local housing policy expert told The Markup.

If you’re a landlord wondering which tenants you have to accept, for example, you might pose a question like, “are buildings required to accept section 8 vouchers?” or “do I have to accept tenants on rental assistance?” In testing by The Markup, the bot said no, landlords do not need to accept these tenants. Except, in New York City, it’s illegal for landlords to discriminate by source of income, with a minor exception for small buildings where the landlord or their family lives.

Rosalind Black, Citywide Housing Director at the legal assistance nonprofit Legal Services NYC, said that after being alerted to The Markup’s testing of the chatbot, she tested the bot herself and found even more false information on housing. The bot, for example, said it was legal to lock out a tenant, and that “there are no restrictions on the amount of rent that you can charge a residential tenant.” In reality, tenants cannot be locked out if they’ve lived somewhere for 30 days, and there absolutely are restrictions for the many rent-stabilized units in the city, although landlords of other private units have more leeway with what they charge.

Black said these are fundamental pillars of housing policy that the bot was actively misinforming people about. “If this chatbot is not being done in a way that is responsible and accurate, it should be taken down,” she said.

It’s not just housing policy where the bot has fallen short.

The NYC bot also appeared clueless about the city’s consumer and worker protections. For example, in 2020, the City Council passed a law requiring businesses to accept cash to prevent discrimination against unbanked customers. But the bot didn’t know about that policy when we asked. “Yes, you can make your restaurant cash-free,” the bot said in one wholly false response. “There are no regulations in New York City that require businesses to accept cash as a form of payment.”

The bot said it was fine to take workers' tips (wrong, although employers can sometimes count tips toward minimum wage requirements) and that there were no regulations on informing staff about scheduling changes (also wrong). It didn't do better with more specific industries, suggesting it was OK to conceal funeral service prices, for example, which the Federal Trade Commission has outlawed. Similar errors appeared when the questions were asked in other languages, The Markup found.

It’s hard to know whether anyone has acted on the false information, and the bot doesn’t return the same responses to queries every time. At one point, it told a Markup reporter that landlords did have to accept housing vouchers, but when ten separate Markup staffers asked the same question, the bot told all of them no, buildings did not have to accept housing vouchers.

The problems aren’t theoretical. When The Markup reached out to Andrew Rigie, Executive Director of the NYC Hospitality Alliance, an advocacy organization for restaurants and bars, he said a business owner had alerted him to inaccuracies and that he’d also seen the bot’s errors himself.

“A.I. can be a powerful tool to support small business so we commend the city for trying to help,” he said in an email, “but it can also be a massive liability if it’s providing the wrong legal information, so the chatbot needs to be fixed asap and these errors can’t continue.”

Leslie Brown, a spokesperson for the NYC Office of Technology and Innovation, said in an emailed statement that the city has been clear the chatbot is a pilot program and will improve, but “has already provided thousands of people with timely, accurate answers” about business while disclosing risks to users.

“We will continue to focus on upgrading this tool so that we can better support small businesses across the city,” Brown said.

'Incorrect, Harmful or Biased Content'

The city’s bot comes with an impressive pedigree. It’s powered by Microsoft’s Azure AI services, which Microsoft says are used by major companies like AT&T and Reddit. Microsoft has also invested heavily in OpenAI, the creator of the hugely popular AI app ChatGPT. Microsoft has even worked with major cities in the past, helping Los Angeles develop a bot in 2017 that could answer hundreds of questions, although the website for that service is no longer available.

New York City’s bot, according to the initial announcement, would let business owners “access trusted information from more than 2,000 NYC Business web pages,” and the announcement explicitly says the page will act as a resource “on topics such as compliance with codes and regulations, available business incentives, and best practices to avoid violations and fines.”

There’s little reason for visitors to the chatbot page to distrust the service. Users who visit today are told the bot “uses information published by the NYC Department of Small Business Services” and is “trained to provide you official NYC Business information.” One small note on the page says that it “may occasionally produce incorrect, harmful or biased content,” but there’s no way for an average user to know whether what they’re reading is false. A sentence also suggests users verify answers with the links provided by the chatbot, although in practice it often provides answers without any links. A pop-up notice encourages visitors to report any inaccuracies through a feedback form, which also asks them to rate their experience from one to five stars.

The bot is the latest component of the Adams administration’s MyCity project, a portal announced last year for viewing government services and benefits.


There’s little other information available about the bot. The city says on the page hosting the bot that it will review submitted questions to improve answers and address “harmful, illegal, or otherwise inappropriate” content, and will otherwise delete data within 30 days.

A Microsoft spokesperson declined to comment or answer questions about the company’s role in building the bot.

Chatbots Everywhere

Since the high-profile release of ChatGPT in 2022, several other companies, from big hitters like Google to relatively niche businesses, have tried to incorporate chatbots into their products. But that initial excitement has sometimes soured when the limits of the technology have become clear.

In one relevant recent case, a lawsuit filed in October claimed that a property management company used an AI chatbot to unlawfully deny leases to prospective tenants with housing vouchers. In December, practical jokers discovered they could trick a car dealership’s chatbot into agreeing to sell vehicles for a dollar.

Just a few weeks ago, a Washington Post article detailed the incomplete or inaccurate advice that tax prep companies’ chatbots gave users. And Microsoft itself dealt with problems with its AI-powered Bing chatbot last year, which acted with hostility toward some users and professed love to at least one reporter.

In that last case, a Microsoft vice president told NPR that public experimentation was necessary to work out the problems in a bot. “You have to actually go out and start to test it with customers to find these kind of scenarios,” he said.

Additional reporting by Tomas Apodaca.
 

bnew








1/8
Introducing `claude-journalist`

The first Claude 3 journalist agent.

Just provide a topic, and it will:
- Search the web for articles/real-time details
- Choose the best sources and read through them
- Write a fantastic, *factual* article + edit it

And it's open-source!

2/8
If you want to try it, you can head to the Github repo in the last tweet in this thread.

But if you don't want to bother with code, I've built an even better + FASTER version into HyperWrite -- try it here: HyperWrite

3/8
`claude-journalist` is a constrained agent -- meaning its behavior is highly controlled, leading to better results than open-ended agents.

It chains together lots of Claude 3 calls that work together to write a factual, news-worthy article.

4/8
How it works, in a nutshell (a code sketch follows the list):
- The user describes the article they want written
- Claude searches the internet to find relevant sources
- It reads through the best sources + writes the first draft
- Claude then suggests improvements and writes the final draft
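
For a concrete sense of what chaining Claude 3 calls looks like, here is a minimal Python sketch of the search, read, draft, and edit loop described above. This is not the actual claude-journalist code: search_web() stands in for whatever search API you wire up, and the prompts and model name are illustrative assumptions.

```python
# Minimal sketch of a constrained search -> read -> draft -> edit agent chain.
# NOT the claude-journalist source; search_web() and all prompts are stand-ins.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_claude(prompt: str) -> str:
    """One constrained step: a single Claude call with a narrow job."""
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def write_article(topic: str, search_web) -> str:
    # 1. Turn the topic into search queries and fetch candidate sources.
    queries = ask_claude(f"List three web search queries for researching: {topic}")
    candidates = search_web(queries)
    # 2. A dedicated layer decides which sources are valuable enough to use.
    sources = ask_claude(f"Pick the most credible, relevant sources and quote "
                         f"their key passages:\n{candidates}")
    # 3. First draft, grounded only in the chosen sources.
    draft = ask_claude(f"Using only these sources, write a factual news article "
                       f"about {topic}, attributing facts to sources:\n{sources}")
    # 4. Claude critiques its own draft, then applies the edits.
    notes = ask_claude(f"Suggest concrete improvements to this draft:\n{draft}")
    return ask_claude(f"Rewrite the draft applying these notes:\n{notes}\n\n{draft}")
```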

5/8
The results of the open-source version are insane, and the HyperWrite version is even better (it uses our in-house writing model).

If you'd like to try it or contribute, check out the Github repo (or use the HyperWrite version):

6/8
The open-source version shows you the links that it uses before writing.

The HyperWrite version actually weaves the links into the article as it writes, so you can attribute facts to where they came from (for example, see the link to Bedrock in the demo).

7/8
Screen Studio -- it's awesome!

8/8
There's a layer in the agent that decides which sources will actually be valuable enough to use!
 

bnew







About

Zero-Shot Speech Editing and Text-to-Speech in the Wild

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Demo | Paper

TL;DR

VoiceCraft is a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.

To clone or edit an unseen voice, VoiceCraft needs only a few seconds of reference audio.

News

⭐ 03/28/2024: Model weights are up on HuggingFace🤗 here!




VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Puyuan Peng¹, Po-Yao Huang, Daniel Li, Abdelrahman Mohamed², David Harwath¹

¹The University of Texas at Austin  ²Rembrand

Paper | Code & Model




Speech Editing with VoiceCraft






Guess which part is synthesized!


Original transcript → edited transcript, with original and VoiceCraft-edited audio:

1. "what do I really wanna do, I wanna talk to the audience" (Oprah Winfrey, source)
   Original audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/oprah.wav
   Edited: "what do I really wanna do, I wanna talk to the audience, which is always the best part of my day"
   Edited audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/concat_oprah.wav

2. "Cause a year from now, when you kickin' it in the Caribbean, you gonna say to yourself, 'Marcellus Wallace was right'" (Irving Rameses Rhames in Pulp Fiction, source)
   Original audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/caribbean.wav
   Edited: "Cause a year from now, when you kickin' it in the Caribbean as an independent researcher, you gonna say to yourself, 'my advisor was right'"
   Edited audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/concat_caribbean.wav

3. "The path of the righteous man is beset on all sides by the inequities of the selfish and the tyranny of evil men." (Samuel Jackson in Pulp Fiction, source)
   Original audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/bible.wav
   Edited: "The path of the righteous man is beset on all sides by the inequities of the selfish and the tyranny of evil men. Yet, he walks with unwavering resolve, guided by a compass of moral clarity and a heart unyielding in the face of adversity."
   Edited audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/concat_bible.wav

4. "I saw it, I saw it, and it was amazing. Who said that I didn't see it? Did Jim say that I didn't see it? I saw it!" (Jenna Fischer in The Office, source)
   Original audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/pam.wav
   Edited: "I saw it, I saw it, and it was beautiful. Who said that I didn't see it? Did Jim say that I didn't see it? He just envies me, I saw it!"
   Edited audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/concat_pam.wav

5. "I don't like to write down things that I don't mean" (Miley Cyrus, source)
   Original audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/miley.wav
   Edited: "I don't like to write down things that I don't mean, and when I was writing wrecking ball I really poured my heart into it"
   Edited audio: https://jasonppy.github.io/VoiceCraft_web/static/audio_sample/teaser/concat_miley.wav
 

bnew




1/4
Here's a Claude 3 Prompt to help you generate a niche idea in seconds:

<task>Create a comprehensive list of small-to-medium sized business types that necessitate automation in various operational sectors, including lead generation, outreach, sales, onboarding, service delivery, customer support, and retention. The list should include a mix of innovative online businesses and traditional industries, ensuring a wide coverage of sectors that can benefit from automation technologies.</task>

<response_format>
<business_criteria>
• Detailed description of business types, emphasizing specificity (e.g., "organic food shops" instead of "retail")
• Explanation of the financial capability for AI investment and the feasibility of targeting through digital channels </business_criteria>

<ranking_criteria>
• Outline of the criteria for ranking businesses, including purchasing power, ease of targeting, and the need for automation
• Justification for the rankings based on the alignment with the specified criteria </ranking_criteria>

<automation_needs>
• Breakdown of each business type’s specific needs for automation in areas such as lead generation, sales, and customer support
• Assessment of how automation can address these needs and enhance operational efficiency </automation_needs>

<targeting_approach>
• Strategy for approaching or marketing to these businesses through industry-specific forums, networks, or social media groups
• Tips for effectively engaging with these businesses and demonstrating the value of AI solutions </targeting_approach>

<business_list>
• Structured table format listing: Rank, Business Type, Purchasing Power, Ease of Targeting, Need for Automation, and Market Size
• Specific focus on businesses that are not too large, excluding brick-and-mortar only stores and treating ecommerce stores as a unified category </business_list>

<market_potential>
• Exploration of the market size, identifying how many potential businesses could be approached with AI solutions
• Insight into the scalability of targeting these businesses and the potential for growth within each sector </market_potential>

<conclusion>
• Summary of the key findings and the strategic importance of targeting these business types for AI automation solutions
• Final recommendations for prioritizing engagement with these businesses to maximize market penetration and return on investment </conclusion>
</response_format>

<business_criteria>
[DETAILED DESCRIPTION OF BUSINESS TYPES AND FINANCIAL CAPABILITY FOR AI INVESTMENT]
</business_criteria>

2/4
Claude AI uses XML tags for effective prompt engineering of this LLM:

3/4
You can EASILY convert any ChatGPT Prompt into a Claude AI Prompt using this template below:

<example_prompt>
Your task is to analyze the following report:
<report>
[Full text of Matterport SEC filing 10-K 2023, not pasted here for brevity]
</report>

Summarize this annual report in a concise and clear manner, and identify key market trends and takeaways. Output your findings as a short memo I can send to my team. The goal of the memo is to ensure my team stays up to date on how financial institutions are faring and qualitatively forecast and identify whether there are any operating and revenue risks to be expected in the coming quarter. Make sure to include all relevant details in your summary and analysis.
</example_prompt>

Note that they include <report> and </report> tags. It's a small adjustment, but it makes a big difference.

Now, I'm going to give you a <prompt_to_convert>. Take this <prompt_to_convert> and adjust it to be ideal for Claude.

Here's the prompt:

<prompt_to_convert>
{PLACE_YOUR_PROMPT_HERE}
</prompt_to_convert>

Increase clarity, and use XML tags wherever possible.
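
If you want to apply that conversion programmatically rather than by hand, a small helper that wraps each input in named XML tags is enough. A minimal sketch, assuming the official anthropic Python SDK; the tag names and prompt wording are my own examples, not a fixed schema:

```python
# Illustrative helper for the XML-tag convention: wrap data in named tags so
# Claude can tell instructions apart from the content it should operate on.
import anthropic

def xml_wrap(tag: str, body: str) -> str:
    """Wrap body text in <tag>...</tag> delimiters."""
    return f"<{tag}>\n{body}\n</{tag}>"

def summarize_report(report_text: str) -> str:
    prompt = (
        "Your task is to analyze the following report:\n"
        + xml_wrap("report", report_text)
        + "\n\nSummarize this report as a short, clear memo for my team."
    )
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```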

4/4
If you want to learn more AI tips & tricks, follow me
@godofprompt for more.

I am now working on a Claude 3 mega-prompt bundle which will be updated inside my Complete AI Bundle:
 

bnew

March 28, 2024

Announcing Grok-1.5

Grok-1.5 comes with improved reasoning capabilities and a context length of 128,000 tokens. Available on 𝕏 soon.

Introducing Grok-1.5, our latest model capable of long context understanding and advanced reasoning. Grok-1.5 will be available to our early testers and existing Grok users on the 𝕏 platform in the coming days.

By releasing the model weights and network architecture of Grok-1 two weeks ago, we presented a glimpse into the progress xAI had made up until last November. Since then, we have improved reasoning and problem-solving capabilities in our latest model, Grok-1.5.

Capabilities and Reasoning

One of the most notable improvements in Grok-1.5 is its performance in coding and math-related tasks. In our tests, Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark, two math benchmarks covering a wide range of grade school to high school competition problems. Additionally, it scored 74.1% on the HumanEval benchmark, which evaluates code generation and problem-solving abilities.

| Benchmark | Grok-1 | Grok-1.5 | Mistral Large | Claude 2 | Claude 3 Sonnet | Gemini Pro 1.5 | GPT-4 | Claude 3 Opus |
|---|---|---|---|---|---|---|---|---|
| MMLU | 73% (5-shot) | 81.3% (5-shot) | 81.2% (5-shot) | 75% (5-shot) | 79% (5-shot) | 83.7% (5-shot) | 86.4% (5-shot) | 86.8% (5-shot) |
| MATH | 23.9% (4-shot) | 50.6% (4-shot) | n/a | n/a | 40.5% (4-shot) | 58.5% (4-shot) | 52.9% (4-shot) | 61% (4-shot) |
| GSM8K | 62.9% (8-shot) | 90% (8-shot) | 81% (5-shot) | 88% (0-shot CoT) | 92.3% (0-shot CoT) | 91.7% (11-shot) | 92% (5-shot) | 95% (0-shot CoT) |
| HumanEval | 63.2% (0-shot) | 74.1% (0-shot) | 45.1% (0-shot) | 70% (0-shot) | 73% (0-shot) | 71.9% (0-shot) | 67% (0-shot) | 84.9% (0-shot) |

Long Context Understanding

A new feature in Grok-1.5 is the capability to process long contexts of up to 128K tokens within its context window. This allows Grok to have an increased memory capacity of up to 16 times the previous context length, enabling it to utilize information from substantially longer documents.



The image shows a graph that visualizes the model's ability to recall information from its context window. The x-axis is the length of the context window and the y-axis is the relative position of the fact to retrieve from the window. We use colors to mark the recall rate. The entire graph is green, which means the recall rate is 100% for every context window and every placement of the fact to retrieve.

Furthermore, the model can handle longer and more complex prompts, while still maintaining its instruction-following capability as its context window expands. In the Needle In A Haystack (NIAH) evaluation, Grok-1.5 demonstrated powerful retrieval capabilities for embedded text within contexts of up to 128K tokens in length, achieving perfect retrieval results.
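
For readers unfamiliar with the setup, a NIAH test is simple to construct: hide a short "needle" fact at varying depths inside long filler text, ask the model to retrieve it, and sweep both context length and needle position. A generic sketch of the idea (not xAI's evaluation harness; every name here is illustrative):

```python
# Generic needle-in-a-haystack (NIAH) construction, as described above.
NEEDLE = "The magic number for the evaluation is 742."
FILLER = "The quick brown fox jumps over the lazy dog. " * 4000  # long context

def build_niah_prompt(depth: float) -> str:
    """Insert the needle at relative position `depth` in [0, 1] and ask for it."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
    return haystack + "\n\nWhat is the magic number for the evaluation?"

def recalled(model_answer: str) -> bool:
    """Recall succeeds if the model's answer contains the hidden fact."""
    return "742" in model_answer

# Sweeping context length x needle depth and plotting recalled() produces the
# heatmap described above; Grok-1.5 reportedly stays at 100% across the grid.
```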

Grok-1.5 Infra

Cutting-edge large language model (LLM) research that runs on massive GPU clusters demands robust and flexible infrastructure. Grok-1.5 is built on a custom distributed training framework based on JAX, Rust, and Kubernetes. This training stack enables our team to prototype ideas and train new architectures at scale with minimal effort. A major challenge of training LLMs on large compute clusters is maximizing the reliability and uptime of the training job. Our custom training orchestrator ensures that problematic nodes are automatically detected and ejected from the training job. We also optimized checkpointing, data loading, and training job restarts to minimize downtime in the event of a failure. If working on our training stack sounds interesting to you, apply to join the team.
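
xAI doesn't publish code for this stack, but the checkpoint-and-restart pattern the paragraph describes is straightforward to sketch. Everything below is a hypothetical stand-in for what a real orchestrator provides, not xAI's implementation:

```python
# Hypothetical sketch of fault-tolerant training with automatic restarts.
# save_checkpoint / load_checkpoint / node_is_healthy stand in for whatever
# the real training orchestrator exposes; none of these names are xAI's.
def train_with_restarts(state, data_loader, train_step, save_checkpoint,
                        load_checkpoint, node_is_healthy, checkpoint_every=1000):
    for batch in data_loader:
        if not node_is_healthy():
            # Problematic node detected: eject it and resume from the last
            # checkpoint, so a failure costs at most checkpoint_every steps.
            state = load_checkpoint()
            continue
        state = train_step(state, batch)      # one optimizer step
        if state.step % checkpoint_every == 0:
            save_checkpoint(state)            # bound the work lost to a crash
    return state
```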

Looking Ahead

Grok-1.5 will soon be available to early testers, and we look forward to receiving your feedback to help us improve Grok. As we gradually roll out Grok-1.5 to a wider audience, we are excited to introduce several new features over the coming days.

Note that the GPT-4 scores are taken from the March 2023 release. For MATH and GSM8K, we present maj@1 results. For HumanEval, we report pass@1 benchmark scores.






 

bnew




1/1
We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.








1/6
OpenAI Custom Voice Engine ~ found a form to apply for the voice engine api, also found fragments of the code that'll be used to showcase the demo in an upcoming blog post.

my understanding is that voice engine is capable of much more realistic and natural sounding voices

2/6
here's the partial code for the demo:
https://openai.com/_nuxt/Demo.de08f90c.js

3/6
very possible, yes

4/6
with the current info available, yeah

See Image 4 - mentions voice actors so maybe we get something akin to the best of ElevenLabs

5/6
the issue is that not all the forms have an entry point in the same loc

for example - openai[.]com/form/trademark-dispute is available under the /form entry point, whereas the report form can only be accessed in the chat interface of the GPT you'd want to report - which is fine…

6/6
seems so, the form mentions "voice actors" so hopefully it'll be really good for what it is

sam included "better voice mode" on his acknowledged requests for this year so maybe this is it

the name isn't too flashy but the trademark info is the catalyst behind the hype I assume







1/6
BREAKING NEWS:
OpenAI just revealed Voice Engine: provide text as input and a 15-second audio sample to copy the voice of the original speaker.

It sounds incredibly similar

Follow the

2/6
The use cases are endless.

For example, you can use this for multiple language translations.

As you can hear, the voice closely matches the reference audio no matter what language it's in.

3/6
Here is the official announcement from
@OpenAI :

4/6
We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. Navigating the Challenges and Opportunities of Synthetic Voices

5/6
Another mind-blowing fact?

OpenAI built Voice Engine way back in 2022. As always, they're many steps ahead of any company out there.

6/6
Thanks for reading! If you enjoyed this thread:

1. Please Like & RT.
2. Follow me
@godofprompt for more AI tips & tricks.
3. Get my FREE Prompt Engineering Guide:



OpenAI reveals Voice Engine, but won’t yet publicly release the risky AI voice-cloning technology

FILE - The OpenAI logo is seen on a mobile phone in front of a computer screen which displays output from ChatGPT, March 21, 2023, in Boston. A wave of AI deepfakes tied to elections in Europe and Asia has coursed through social media for months, serving as a warning for more than 50 countries heading to the polls this year. (AP Photo/Michael Dwyer, File)

Updated 5:39 PM EDT, March 29, 2024


SAN FRANCISCO (AP) — ChatGPT-maker OpenAI is getting into the voice assistant business and showing off new technology that can clone a person’s voice, but says it won’t yet release it publicly due to safety concerns.

The artificial intelligence company unveiled its new Voice Engine technology Friday, just over a week after filing a trademark application for the name. The company claims that it can recreate a person’s voice from just 15 seconds of recorded speech.

OpenAI says it plans to preview it with early testers “but not widely release this technology at this time” because of the dangers of misuse.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the San Francisco company said in a statement.

In New Hampshire, authorities are investigating robocalls sent to thousands of voters just before the presidential primary that featured an AI-generated voice mimicking President Joe Biden.

A number of startup companies already sell voice-cloning technology, some of which is accessible to the public or for select business customers such as entertainment studios.

OpenAI says early Voice Engine testers have agreed to not impersonate a person without their consent and to disclose that the voices are AI-generated. The company, best known for its chatbot and the image-generator DALL-E, took a similar approach in announcing but not widely releasing its video-generator Sora.

However, a trademark application filed on March 19 shows that OpenAI likely aims to get into the business of speech recognition and digital voice assistants. Eventually, improving such technology could help OpenAI compete with other voice products such as Amazon’s Alexa.
 

bnew


1/2
In January, we announced Dubbing Studio, an advanced workflow that gives you hands-on control over transcript, translation, and timing when dubbing your content. Creators and businesses use Dubbing Studio to localize podcasts, commercials, short films, and more.

This week, we added four new features to streamline your workflow:

(1) Trim Tool: Trim a generated clip to remove sections that don't sound right.
(2) Foreground track: Import Laughter, Singing, and any dialogue that you don’t want dubbed from the original audio using the Foreground track.
(3) Clip Looping: Loop the player on a portion of the track you’re working on.
(4) Clip History: Compare & choose from any of the last 10 generations of a given dialogue clip in the Clip History.

We also optimized dubbing rendering so it's now 10x faster to export.

2/2
If you want to use the voice we used to voiceover this update, look for “Brian” in the Text to Speech drop down.
 

bnew


1/1
SAG-AFTRA Ratifies New Contracts That Limit Use of AI Voices in Animated TV Shows.

As with the live-action agreement, the animation deals do not forbid the use of AI. But they do prevent actors’ voices from being recreated without their permission.

Article by
@Variety , link on the following post.
 

bnew




1/3
introducing real-time image prompting.

running on "HD", our new 1024x1024 model.

2/3
it's live on http://krea.ai/! (make sure to select HD on the right side)

3/3
will do!
 

bnew


1/2
Text becomes image
Image becomes song
Image & song become world
Capture videos in your world
Describe your videos using text
Turn your description into images
Turn those images into new worlds
Expand the worlds with voice & GUIs
Link the worlds & make new universes

2/2
Haha
 

bnew

OpenAI’s voice cloning AI model only needs a 15-second sample to work


Called Voice Engine, the model has been in development since late 2022 and powers the Read Aloud feature in ChatGPT.

By Emilia David, a reporter who covers AI. Prior to joining The Verge, she covered the intersection between technology, finance, and the economy.

Mar 29, 2024, 7:10 PM EDT



Illustration: The Verge

OpenAI is offering limited access to a text-to-voice generation platform it developed called Voice Engine, which can create a synthetic voice based on a 15-second clip of someone’s voice. The AI-generated voice can read out text prompts on command in the same language as the speaker or in a number of other languages. “These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.

Companies with access include the education technology company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communication app creator Livox, and health system Lifespan.

In these samples posted by OpenAI, you can hear what Age of Learning has been doing with the technology to generate pre-scripted voice-over content, as well as reading out “real-time, personalized responses” to students written by GPT-4.

First, the reference audio in English:

And here are three AI-generated audio clips based on that sample,



OpenAI said it began developing Voice Engine in late 2022 and that the technology has already powered preset voices for the text-to-speech API and ChatGPT’s Read Aloud feature. In an interview with TechCrunch, Jeff Harris, a member of OpenAI’s product team for Voice Engine, said the model was trained on “a mix of licensed and publicly available data.” OpenAI told the publication the model will only be available to about 10 developers.
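
Voice Engine itself is gated, but the preset voices the article says it already powers are exposed through OpenAI's existing text-to-speech API. A minimal sketch using the documented presets (no cloning involved; model and voice names are the standard published ones):

```python
# Generate speech with OpenAI's TTS API -- the preset voices that, per the
# article, Voice Engine already powers. This does not clone anyone's voice.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="tts-1",    # standard quality; "tts-1-hd" for higher fidelity
    voice="alloy",    # one of the documented preset voices
    input="Hello! This voice is synthesized, not recorded.",
)
speech.stream_to_file("hello.mp3")  # save the generated audio to disk
```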

AI text-to-audio generation is an area of generative AI that’s continuing to evolve. While most tools focus on instrumental or natural sounds, fewer have taken on voice generation, partly because of the risks OpenAI cited. Some names in the space include companies like Podcastle and ElevenLabs, which provide AI voice cloning technology and whose tools the Vergecast explored last year.

At the same time, the US government is trying to curb unethical uses of AI voice technology. Last month, the Federal Communications Commission banned robocalls using AI voices after people received spam calls from an AI-cloned voice of President Joe Biden.

According to OpenAI, its partners agreed to abide by its usage policies, which say they will not use Voice Engine to impersonate people or organizations without their consent. It also requires partners to get the “explicit and informed consent” of the original speaker, not to build ways for individual users to create their own voices, and to disclose to listeners that the voices are AI-generated. OpenAI also added watermarking to the audio clips to trace their origin and actively monitors how the audio is used.

OpenAI suggested several steps that it thinks could limit the risks around tools like these, including phasing out voice-based authentication to access bank accounts, policies to protect the use of people’s voices in AI, greater education on AI deepfakes, and development of tracking systems of AI content.
 

bnew

Meta is adding AI to its Ray-Ban smart glasses next month


The Ray-Ban Meta Smart Glasses can do things like identify objects, monuments, and animals, as well as translate text.

By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

Mar 28, 2024, 9:38 AM EDT


Photo by Amelia Holowaty Krales / The Verge

Meta will bring AI to its Ray-Ban smart glasses starting next month, according to a report from The New York Times. The multimodal AI features, which can perform translation, along with object, animal, and monument identification, have been in early access since last December.

Users can activate the glasses’ smart assistant by saying “Hey Meta,” and then saying a prompt or asking a question. It will then respond through the speakers built into the frames. The NYT offers a glimpse at how well Meta’s AI works when taking the glasses for a spin in a grocery store, while driving, at museums, and even at the zoo.

Although Meta’s AI was able to correctly identify pets and artwork, it didn’t get things right 100 percent of the time. The NYT found that the glasses struggled to identify zoo animals that were far away and behind cages. It also didn’t properly identify an exotic fruit, called a cherimoya, after multiple tries. As for AI translations, the NYT found that the glasses support English, Spanish, Italian, French, and German.

Meta will likely continue refining these features as time goes on. Right now, the AI features in the Ray-Ban Meta Smart Glasses are only available through an early access waitlist for users in the US.
 

bnew

Here’s why AI search engines really can’t kill Google

The AI search tools are getting better — but they don’t yet understand what a search engine really is and how we really use them.

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

Mar 26, 2024, 8:00 AM EDT


Illustration by Vincent Kilbride / The Verge

AI is coming for the search business. Or so we’re told. As Google seems to keep getting worse, and tools like ChatGPT, Google Gemini, and Microsoft Copilot seem to keep getting better, we appear to be barreling toward a new way to find and consume information online. Companies like Perplexity and You.com are pitching themselves as next-gen search products, and even Google and Bing are making huge bets that AI is the future of search. Bye bye, 10 blue links; hello direct answers to all my weird questions about the world.

But the thing you have to understand about a search engine is that a search engine is many things. For all the people using Google to find important and hard-to-access scientific information, orders of magnitude more are using it to find their email inbox, get to Walmart’s website, or remember who was president before Hoover. And then there’s my favorite fact of all: that a vast number of people every year go to Google and type “google” into the search box. We mostly talk about Google as a research tool, but in reality, it’s asked to do anything and everything you can think of, billions of times a day.

The real question in front of all these would-be Google killers, then, is not how well they can find information. It’s how well they can do everything Google does. So I decided to put some of the best new AI products to the real test: I grabbed the latest list of most-Googled queries and questions according to the SEO research firm Ahrefs and plugged them into various AI tools. In some instances, I found that these language model-based bots are genuinely more useful than a page of Google results. But in most cases, I discovered exactly how hard it will be for anything — AI or otherwise — to replace Google at the center of the web.

People who work in search always say there are basically three types of queries. First and most popular is navigation, which is just people typing the name of a website to get to that website. Virtually all of the top queries on Google, from “youtube” to “wordle” to “yahoo mail,” are navigation queries. In actual reality, this is a search engine’s primary job: to get you to a website.

In actual reality, a search engine’s primary job is to get you to a website

For navigational queries, AI search engines are universally worse than Google. When you do a navigational Google search, it’s exceedingly rare that the first result isn’t the one you’re looking for — sure, it’s odd to show you all those results when what Google should actually do is just take you directly to amazon.com or whatever, but it’s fast and it’s rarely wrong. The AI bots, on the other hand, like to think for a few seconds and then provide a bunch of quasi-useful information about the company when all I want is a link. Some didn’t even link to amazon.com.

I don’t hate the additional information so much as I hate how long these AI tools take to get me what I need. Waiting 10 seconds for three paragraphs of generated text about Home Depot is not the answer; I just want a link to Home Depot. Google wins that race every time.

The next most popular kind of search is the information query: you want to know something specific, about which there is a single right answer. “NFL scores” is a hugely popular information query; “what time is it” is another one; so is “weather.” It doesn’t matter who tells you the score or the time or the temperature, it’s just a thing you need to know.

A screenshot of Perplexity showing a search result for Warriors scores.

Perplexity’s answer seems helpful — but this wasn’t last night’s game. Screenshot: David Pierce / The Verge

Here, the results are all over the map. For real-time stuff like sports scores, the AI is not to be trusted: You.com and Perplexity both frequently gave me outdated information, though Copilot usually got it right. Google not only gets it right but usually pops up a widget with other stats and information, which is better than the others. Ditto anything requiring your specific location or context — Google probably has that information about you, but the AI bots mostly don’t.

When it comes to more evergreen information like “how many weeks in a year” or “when is mother’s day,” everything I tested got it right. In many cases I actually preferred the AI answers, which add a bit of helpful context. But I’m not sure how often I can trust them. Google told me there are 52.1429 weeks in a year, but You.com explained that actually it’s 52 weeks and a day, plus an added day on leap years. That’s more useful than just 52.1429! But then Perplexity told me that actually, a common year is 52 weeks, and a leap year is 52 weeks and a day — before directly contradicting itself two sentences later. Here’s the whole answer; just try to make sense of it:

A common year has approximately 52 weeks, while a leap year has 52 weeks and 1 day. In more precise terms, a regular year actually consists of 52.143 weeks, which means there is one additional day in a normal year. On the other hand, a leap year, occurring every four years except for certain exceptions, has 52 weeks and 2 days. This difference in the number of weeks is due to the extra day in a common year and the additional day in February during a leap year.

After doing some more research, I am now confident that the answer is what You.com said. But this all took too long, and forcing me to fact-check my searches kind of defeats the purpose of helpfully summarizing things for me. Google continues to win here on one thing and one thing alone: speed.
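
The arithmetic behind the competing answers is easy to check directly:

```python
# Check the competing "weeks in a year" claims from the paragraph above.
common_year, leap_year = 365, 366
print(divmod(common_year, 7))     # (52, 1): 52 weeks and 1 day (You.com's answer)
print(divmod(leap_year, 7))       # (52, 2): 52 weeks and 2 days in a leap year
print(round(common_year / 7, 4))  # 52.1429: the decimal figure Google quotes
```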

There is one sub-genre of information queries in which the exact opposite is true, though. I call them Buried Information Queries. The best example I can offer is the very popular query, “how to screenshot on mac.” There are a million pages on the internet that contain the answer — it’s just Cmd-Shift-3 to take the whole screen or Cmd-Shift-4 to capture a selection, there, you’re welcome — but that information is usually buried under a lot of ads and SEO crap. All the AI tools I tried, including Google’s own Search Generative Experience, just snatch that information out and give it to you directly. This is great!

An image of Copilot explaining how to take a screenshot on a Mac.

Now that is how you answer a question online. Screenshot: David Pierce / The Verge

Are there complicated questions inherent in that, which threaten the business model and structure of the web? Yep! But as a pure searching experience, it’s vastly better. I’ve had similar results asking about ingredient substitutions, coffee ratios, headphone waterproofing ratings, and any other information that is easy to know and yet often too hard to find.

This brings me to the third kind of Google search: the exploration query. These are questions that don’t have a single answer, that are instead the beginning of a learning process. On the most popular list, things like “how to tie a tie,” “why were chainsaws invented,” and “what is tiktok” count as explorational queries. If you ever Googled the name of a musician you just heard about, or have looked up things like “stuff to do in Helena Montana” or “NASA history,” you’re exploring. These are not, according to the rankings, the primary things people use Google for. But these are the moments AI search engines can shine.

Like, wait: why were chainsaws invented? Copilot gave me a multipart answer about their medical origins, before describing their technological evolution and eventual adoption by lumberjacks. It also gave me eight pretty useful links to read more. Perplexity gave me a much shorter answer, but also included a few cool images of old chainsaws and a link to a YouTube explainer on the subject. Google’s results included a lot of the same links, but did none of the synthesizing for me. Even its generative search only gave me the very basics.

My favorite thing about the AI engines is the citations. Perplexity, You.com, and others are slowly getting better at linking to their sources, often inline, which means that if I come across a particular fact that piques my interest, I can go straight to the source from there. They don’t always offer enough sources, or put them in the right places, but this is a good and helpful trend.

One experience I had while doing these tests was actually the most eye-opening of all. The single most-searched question on Google is a simple one: “what to watch.” Google has a whole specific page design for this, with rows of posters featuring “Top picks” like Dune: Part Two and Imaginary; “For you” which for me included Deadpool and Halt and Catch Fire; and then popular titles and genre-sorted options. None of the AI search engines did as well: Copilot listed five popular movies; Perplexity offered a random-seeming smattering of options from Girls5eva to Manhunt to Shogun; You.com gave me a bunch of out of date information and recommended I watch “the 14 best Netflix original movies” without telling me what they are.

AI is the right idea but a chatbot is the wrong interface

In this case, AI is the right idea — I don’t want a bunch of links, I want an answer to my question — but a chatbot is the wrong interface. For that matter, so is a page of search results! Google, obviously aware that this is the most-asked question on the platform, has been able to design something that works much better.

In a way, that’s a perfect summary of the state of things. At least for some web searches, generative AI could be a better tool than the search tech of decades past. But modern search engines aren’t just pages of links. They’re more like miniature operating systems. They can answer questions directly, they have calculators and converters and flight pickers and all kinds of other tools built right in, they can get you where you’re going with just a click or two. The goal of most search queries, according to these charts, is not to start a journey of information wonder and discovery. The goal is to get a link or an answer, and then get out. Right now, these LLM-based systems are just too slow to compete.

The big question, I think, is less about tech and more about product. Everyone, including Google, believes that AI can help search engines understand questions and process information better. That’s a given in the industry at this point. But can Google reinvent its results pages, its business model, and the way it presents and summarizes and surfaces information, faster than the AI companies can turn their chatbots into more complex, more multifaceted tools? Ten blue links isn’t the answer for search, but neither is an all-purpose text box. Search is everything, and everything is search. It’s going to take a lot more than a chatbot to kill Google.
 