Artificial Intelligence Could Finally Let Us Talk with Animals
AI is poised to revolutionize our understanding of animal communication
www.scientificamerican.com
Artificial Intelligence Could Finally Let Us Talk with Animals
AI is poised to revolutionize our understanding of animal communication- By Lois Parshley on October 1, 2023
Scientific American October 2023 Issue
The Project Cetacean Translation Initiative (CETI) is using machine learning to try to understand the vocalizations of sperm whales. Credit: Franco Banfi/Minden Pictures
Underneath the thick forest canopy on a remote island in the South Pacific, a New Caledonian Crow peers from its perch, dark eyes glittering. The bird carefully removes a branch, strips off unwanted leaves with its bill and fashions a hook from the wood. The crow is a perfectionist: if it makes an error, it will scrap the whole thing and start over. When it's satisfied, the bird pokes the finished utensil into a crevice in the tree and fishes out a wriggling grub.
The New Caledonian Crow is one of the only birds known to manufacture tools, a skill once thought to be unique to humans. Christian Rutz, a behavioral ecologist at the University of St Andrews in Scotland, has spent much of his career studying the crow's capabilities. The remarkable ingenuity Rutz observed changed his understanding of what birds can do. He started wondering if there might be other overlooked animal capacities. The crows live in complex social groups and may pass toolmaking techniques on to their offspring. Experiments have also shown that different crow groups around the island have distinct vocalizations. Rutz wanted to know whether these dialects could help explain cultural differences in toolmaking among the groups.
New technology powered by artificial intelligence is poised to provide exactly these kinds of insights. Whether animals communicate with one another in terms we might be able to understand is a question of enduring fascination. Although people in many Indigenous cultures have long believed that animals can intentionally communicate, Western scientists traditionally have shied away from research that blurs the lines between humans and other animals for fear of being accused of anthropomorphism. But with recent breakthroughs in AI, “people realize that we are on the brink of fairly major advances in regard to understanding animals' communicative behavior,” Rutz says.
Beyond creating chatbots that woo people and producing art that wins fine-arts competitions, machine learning may soon make it possible to decipher things like crow calls, says Aza Raskin, one of the founders of the nonprofit Earth Species Project. Its team of artificial-intelligence scientists, biologists and conservation experts is collecting a wide range of data from a variety of species and building machine-learning models to analyze them. Other groups such as the Project Cetacean Translation Initiative (CETI) are focusing on trying to understand a particular species, in this case the sperm whale.
Decoding animal vocalizations could aid conservation and welfare efforts. It could also have a startling impact on us. Raskin compares the coming revolution to the invention of the telescope. “We looked out at the universe and discovered that Earth was not the center,” he says. The power of AI to reshape our understanding of animals, he thinks, will have a similar effect. “These tools are going to change the way that we see ourselves in relation to everything.”
When Shane Gero got off his research vessel in Dominica after a recent day of fieldwork, he was excited. The sperm whales that he studies have complex social groups, and on this day one familiar young male had returned to his family, providing Gero and his colleagues with an opportunity to record the group's vocalizations as they reunited.
For nearly 20 years Gero, a scientist in residence at Carleton University in Ottawa, kept detailed records of two clans of sperm whales in the turquoise waters of the Caribbean, capturing their clicking vocalizations and what the animals were doing when they made them. He found that the whales seemed to use specific patterns of sound, called codas, to identify one another. They learn these codas much the way toddlers learn words and names, by repeating sounds the adults around them make.
Having decoded a few of these codas manually, Gero and his colleagues began to wonder whether they could use AI to speed up the translation. As a proof of concept, the team fed some of Gero's recordings to a neural network, an algorithm that learns skills by analyzing data. It was able to correctly identify a small subset of individual whales from the codas 99 percent of the time. Next the team set an ambitious new goal: listen to large swathes of the ocean in the hopes of training a computer to learn to speak whale. Project CETI, for which Gero serves as lead biologist, plans to deploy an underwater microphone attached to a buoy to record the vocalizations of Dominica's resident whales around the clock.
As sensors have gotten cheaper and technologies such as hydrophones, biologgers and drones have improved, the amount of animal data has exploded. There's suddenly far too much for biologists to sift through efficiently by hand. AI thrives on vast quantities of information, though. Large language models such as ChatGPT must ingest massive amounts of text to learn how to respond to prompts: ChatGPT-3 was trained on around 45 terabytes of text data, a good chunk of the entire Library of Congress. Early models required humans to classify much of those data with labels. In other words, people had to teach the machines what was important. But the next generation of models learned how to “self-supervise,” automatically learning what's essential and independently creating an algorithm of how to predict what words come next in a sequence.
In 2017 two research groups discovered a way to translate between human languages without the need for a Rosetta stone. The discovery hinged on turning the semantic relations between words into geometric ones. Machine-learning models are now able to translate between unknown human languages by aligning their shapes—using the frequency with which words such as “mother” and “daughter” appear near each other, for example, to accurately predict what comes next. “There's this hidden underlying structure that seems to unite us all,” Raskin says. “The door has been opened to using machine learning to decode languages that we don't already know how to decode.”
The field hit another milestone in 2020, when natural-language processing began to be able to “treat everything as a language,” Raskin explains. Take, for example, DALL-E 2, one of the AI systems that can generate realistic images based on verbal descriptions. It maps the shapes that represent text to the shapes that represent images with remarkable accuracy—exactly the kind of “multimodal” analysis the translation of animal communication will probably require.
Many animals use different modes of communication simultaneously, just as humans use body language and gestures while talking. Any actions made immediately before, during, or after uttering sounds could provide important context for understanding what an animal is trying to convey. Traditionally, researchers have cataloged these behaviors in a list known as an ethogram. With the right training, machine-learning models could help parse these behaviors and perhaps discover novel patterns in the data. Scientists writing in the journal Nature Communications last year, for example, reported that a model found previously unrecognized differences in Zebra Finch songs that females pay attention to when choosing mates. Females prefer partners that sing like the birds the females grew up with.
You can already use one kind of AI-powered analysis with Merlin, a free app from the Cornell Lab of Ornithology that identifies bird species. To identify a bird by sound, Merlin takes a user's recording and converts it into a spectrogram—a visualization of the volume, pitch and length of the bird's call. The model is trained on Cornell's audio library, against which it compares the user's recording to predict the species identification. It then compares this guess to eBird, Cornell's global database of observations, to make sure it's a species that one would expect to find in the user's location. Merlin can identify calls from more than 1,000 bird species with remarkable accuracy.
But the world is loud, and singling out the tune of one bird or whale from the cacophony is difficult. The challenge of isolating and recognizing individual speakers, known as the cocktail party problem, has long plagued efforts to process animal vocalizations. In 2021 the Earth Species Project built a neural network that can separate overlapping animal sounds into individual tracks and filter background noise, such as car honks—and it released the open-source code for free. It works by creating a visual representation of the sound, which the neural network uses to determine which pixel is produced by which speaker. In addition, the Earth Species Project recently developed a so-called foundational model that can automatically detect and classify patterns in datasets.