The Finals uses AI text-to-speech because it can produce lines 'in just a matter of hours rather than months', baffles actual voice actors

bnew

Veteran
Joined
Nov 1, 2015
Messages
57,341
Reputation
8,496
Daps
160,024

The Finals uses AI text-to-speech because it can produce lines 'in just a matter of hours rather than months', baffles actual voice actors

By Harvey Randall
published about 18 hours ago

"Why the f**k do AI voice people act like hiring voice actors is some kind of arcane ritual."
A character from The Finals poses in front of a wall with THE FINALS written on it.
(Image credit: Embark Studios)


A podcast by Embark Studios—creators of the upcoming FPS The Finals—has hinted that the game will be using AI voice lines for the foreseeable future. The explanation, however, has left some voice actors feeling confused, at best.

Carl Strandberg and Andreas Almström, audio designers for The Finals, were asked: "who did the voiceovers? They sound really authentic" (spoilers: that's not an opinion everyone shares). They responded: "We use AI with a few exceptions, so all the contestant voices like the barks and voiceover commentators are AI text-to-speech." Miscellaneous voiceover stuff—grunting, pain noises, vaulting over objects—is otherwise done in-house.

"The reason that we went this route is that AI text-to-speech is finally extremely powerful. It gets us far enough in terms of quality, and allows us to be extremely reactive to new ideas … if a game designer comes up with a new idea for a game mode, we can have a voiceover representing that in just a matter of hours, instead of months."

That explanation, however, hasn't really jibed with the experience of voice actors who actually work in games. One such actor is Gianni Matragrano, who you might recognise as Gabriel from Ultrakill, though he's provided work for a wide variety of games including Genshin Impact, Gloomwood, and Evil West.

Matragrano wrote in a Twitter thread: "We are constantly banging out rush order sessions for like, within a day or two … When you need more, you can book another session. We actually make it very easy." He goes on to reveal that he had his doubts when playing the beta, but was waiting for confirmation: "I had my suspicions but I didn't want to say anything in case I was wrong, or maybe it was at least just placeholder. But now at a big Open Beta with [150,000] concurrent players, this is definitely just their vision."



The above video is an example posted by Matragrano himself, and… yeah, it's not that great. I'm hearing too much uncanny valley to buy into that "extremely powerful" tech Strandberg and Almström were boasting about. They did add the caveat: "If it sounds a bit off, it still blends kind of well with the fantasy of the virtual gameshow, aesthetically." It's up to you whether these voice lines immerse you.

Zane Schacht, another voice actor, wrote: "Why the f**k do AI voice people act like hiring voice actors is some kind of arcane ritual … I've knocked out entire games worth of audio in a two hour session. It ain't deep."

Meanwhile Pax Helgesen, who's both a senior sound designer and a voice actor himself, commented: "I’d like to again encourage devs to reconsider the use of voice in their games as simply an “asset” in the pipeline of agile development." He does go on to say that, yes, AI can serve an important role in the development of a game, but "An actor that could use the tools of their craft and experiences to collaborate and make something greater than what the devs imagined."

I'm inclined to agree here. In a sense, acting and sound design are two very different disciplines. It's similar to how 'AI Artists' get shot down in the public square when sharing the results of their prompts, since those with a better eye can see the lack of composition and intention a mile away.

You can ask an algorithm to produce something, true, but art involves dozens of purposeful choices that a machine can't, at the moment, replicate. Acting is similar. Part of me wonders if Strandberg and Almström just don't know enough about VA to understand how their ElevenLabs-generated lines are jarring to players who don't care about development turnaround times.

What makes this all the more bizarre is that there are interesting, thoughtful uses of this tech in games already. A little while back, it was revealed that Cyberpunk 2077's Polish dub used AI to provide new lines for the game's expansion pack, Phantom Liberty, after the voice actor for a certain character died. CD Projekt did its due diligence. It hired a voice actor to provide the new lines (to be altered with Respeecher), it obtained the consent of the actor's surviving family members, and it did so to preserve the original, non-AI performance.

When it comes to The Finals, I'm struggling to see the creative intent. Sure, AI might be able to provide quicker turnarounds—even if they aren't as slow as the devs are making them out to be—but the result is devoid of personality. A multiplayer shoot-'em-up doesn't have to provide a deep narrative, sure, but you're listening to these barks for hours on end. I feel that stilted, awkward delivery will get annoying, fast.

I reached out to Embark Studios for comment and was told via email that the studio uses a mix of "recorded voice audio and audio generated via TTS [text to speech] tools in our games, depending on the context," citing conversations between characters as one case where it's important to get real people talking to each other. "TTS allows us to have tailored [voice acting] where we otherwise wouldn't, e.g. due to speed of implementation."

"In the instances we use TTS in The Finals, it's always based on real voices." A point to make, here, is that most AI voice programs are based on real voices, in the same way that AI art is based on real art—that's how the tech works. "In the Open Beta, it is based on a mix of professional voice actors and temp voices from Embark employees. Making games without actors isn’t an end goal for Embark and TTS technology has introduced new ways for us to work together."

Embark Studios did not comment on the 'hours vs. months' question, though the implication does seem to hew close to what the aforementioned interview puts forward: TTS is part of The Finals' vision. The game will likely use a mixture of voicework and AI even once it's out of beta, unless public opinion sways Embark Studios otherwise.
 

winb83

52 Years Young
Supporter
Joined
May 28, 2012
Messages
45,810
Reputation
3,811
Daps
69,356
Reppin
Michigan
Sounds ok. Maybe they didn't have the budget for real VA work.
 