We might need another thread for this update:
Audio purporting to be Pikesville High School principal Eric Eiswert was almost certainly faked using artificial intelligence.
www.thebaltimorebanner.com
Baltimore County principal’s racist comments faked by AI, experts say
Experts in detecting audio and video fakes say there is overwhelming evidence that the recording of a Baltimore County principal making racist and antisemitic comments is AI-generated.
The two experts — the director of a university media forensics lab and the CEO of an artificial intelligence detection firm that has worked with companies like Visa and Microsoft — say the audio has the hallmarks of a fake.
The audio circulated on social media in January, purporting to be of Pikesville High School principal Eric Eiswert making derogatory comments about students and staff. In the clip, the speaker refers to “ungrateful Black kids who can’t test their way out of a paper bag.” Outrage swirled and prompted a Baltimore County Public Schools investigation that has dragged on for nearly two months with no news on its outcome.
Eiswert has always maintained the audio was fake. Known as deepfakes, audio and video created using artificial intelligence have been used to spread misinformation about public figures like President Biden, but the use of the technology to harm the reputation of a local figure, like a school principal, is unusual.
Audio has ‘hallmarks’ of AI
Siwei Lyu, director of a media forensics lab at the University at Buffalo, said the audio is not particularly sophisticated. Lyu has developed technologies at the State University of New York for spotting audio and images created using artificial intelligence.
This audio is “not a challenging case for the algorithms. I believe someone just made this using an AI voice generator,” Lyu said, adding that he doesn’t believe the person who made it put a lot of effort into the task. Online voice generator tools, like one from Eleven Labs, are available to anyone and advertise their ability to instantly create audio that’s indistinguishable from human speech.
There is, however, clear evidence the audio was manipulated, Lyu said.
“There is some signs of editing, like putting different pieces together,” he said. “This has the sound features of AI generation. The tone is a little flat.”
AI-generated voices tend to have unusually clean background sounds, or a lack of consistent breathing sounds or pauses, Lyu said.
In recent months, universities and companies have been using artificial intelligence to create methods of detecting deepfakes in ways the human ear can’t. Their methods have been getting better with time. Lyu has created the DeepFake-o-meter platform, for example.
Lyu, who has researched digital media forensics, computer vision and machine learning, said his team put the audio through several recent deepfake audio detection methods — three that their lab created, and two that others created. In four cases, the audio was deemed AI-generated with 99% surety, and in the other case with 74% certainty.
The less-certain detection method, Lyu said, identifies “vocoder artifacts,” or evidence of a step that converts a synthesized voice into audio. He said it’s a less reliable way to detect deepfakes than others.
Reality Defender has created its own methods, which Ben Colman, the CEO and co-founder, said could be done with 99% accuracy. The company has worked with governments and companies, including Visa, Microsoft and NBC, to detect images, text or audio deepfakes.
Colman’s team also deemed the Eiswert audio almost certainly AI-generated.
“Our platform not only found it was likely manipulated, but our team looked into it and found it has the hallmarks of AI-generated audio that was then recorded from speaker to another device, likely to mask its generative nature,” said Colman in a statement.
“Separately from the results on it being AI, the audio clearly has several moments where there’s an absence of sound,” Colman said. “There are clear, sudden and incredibly short stops between bits of dialogue that indicate the absence of sound, which itself indicates some level of file manipulation.”
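For readers curious what "an absence of sound" can look like in practice, here is a minimal Python sketch that scans decoded audio samples for short runs of near-total silence. It is not Reality Defender's or Lyu's method, which the article does not describe in detail. The function name `find_silent_gaps`, the amplitude threshold and the minimum gap length are all hypothetical choices made for illustration, and the input is assumed to be mono samples already scaled to the range -1.0 to 1.0.

```python
from typing import List, Sequence, Tuple


def find_silent_gaps(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 1e-4,  # hypothetical: amplitude at or below this counts as silence
    min_ms: float = 5.0,      # hypothetical: ignore silent runs shorter than this
) -> List[Tuple[float, float]]:
    """Return (start_seconds, duration_seconds) for each run of near-silent samples."""
    min_len = max(1, int(sample_rate * min_ms / 1000))
    gaps: List[Tuple[float, float]] = []
    run_start = None  # index where the current silent run began, if any

    for i, sample in enumerate(samples):
        if abs(sample) <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The silent run just ended; keep it only if it is long enough.
            if i - run_start >= min_len:
                gaps.append((run_start / sample_rate, (i - run_start) / sample_rate))
            run_start = None

    # Handle a silent run that extends to the end of the clip.
    if run_start is not None and len(samples) - run_start >= min_len:
        gaps.append((run_start / sample_rate, (len(samples) - run_start) / sample_rate))

    return gaps


if __name__ == "__main__":
    # Synthetic clip at 1 kHz: 100 ms of signal, a 20 ms dropout, 100 ms of signal.
    clip = [0.5] * 100 + [0.0] * 20 + [0.5] * 100
    print(find_silent_gaps(clip, sample_rate=1000))
```

A real recording made in a room almost never drops to exact silence, because background noise keeps the signal slightly above zero, which is why abrupt dropouts like these can hint at splicing. A scan like this only flags candidates for a closer look and says nothing on its own about whether audio is AI-generated.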