Underage Workers Are Training AI

bnew · Nov 16, 2023

https://www.wired.co.uk/article/artificial-intelligence-data-labeling-children

Companies that provide Big Tech with AI data-labeling services are inadvertently hiring young teens to work on their platforms, often exposing them to traumatic content.

Like most kids his age, 15-year-old Hassan spent a lot of time online. Before the pandemic, he liked playing football with local kids in his hometown of Burewala in the Punjab region of Pakistan. But Covid lockdowns made him something of a recluse, attached to his mobile phone. “I just got out of my room when I had to eat something,” says Hassan, now 18, who asked to be identified under a pseudonym because he was afraid of legal action. But unlike most teenagers, he wasn’t scrolling TikTok or gaming. From his childhood bedroom, the high schooler was working in the global artificial intelligence supply chain, uploading and labeling data to train algorithms for some of the world’s largest AI companies.

The raw data used to train machine-learning algorithms is first labeled by humans, and human verification is also needed to evaluate their accuracy. This data-labeling ranges from the simple—identifying images of street lamps, say, or comparing similar ecommerce products—to the deeply complex, such as content moderation, where workers classify harmful content within data scraped from all corners of the internet. These tasks are often outsourced to gig workers, via online crowdsourcing platforms such as Toloka, which was where Hassan started his career.

A friend put him on to the site, which promised work anytime, from anywhere. He found that an hour’s labor would earn him around $1 to $2, he says, more than the national minimum wage, which was about $0.26 at the time. His mother is a homemaker, and his dad is a mechanical laborer. “You can say I belong to a poor family,” he says. When the pandemic hit, he needed work more than ever. Confined to his home, online and restless, he did some digging, and found that Toloka was just the tip of the iceberg.

“AI is presented as a magical box that can do everything,” says Saiph Savage, director of Northeastern University’s Civic AI Lab. “People just simply don’t know that there are human workers behind the scenes.”

At least some of those human workers are children. Platforms require that workers be over 18, but Hassan simply entered a relative’s details and used a corresponding payment method to bypass the checks—and he wasn’t alone in doing so. WIRED spoke to three other workers in Pakistan and Kenya who said they had also joined platforms as minors, and found evidence that the practice is widespread.

“When I was still in secondary school, so many teens discussed online jobs and how they joined using their parents' ID,” says one worker who joined Appen at 16 in Kenya, who asked to remain anonymous. After school, he and his friends would log on to complete annotation tasks late into the night, often for eight hours or more.

Appen declined to give an attributable comment.

“If we suspect a user has violated the User Agreement, Toloka will perform an identity check and request a photo ID and a photo of the user holding the ID,” Geo Dzhikaev, head of Toloka operations, says.

Driven by the rush into AI, the global data-labeling and data-collection industry is expected to grow to more than $17.1 billion by 2030, according to Grand View Research, a market research and consulting company. Crowdsourcing platforms such as Toloka, Appen, Clickworker, Teemwork.AI, and OneForma connect millions of remote gig workers in the global south to tech companies located in Silicon Valley. Platforms post micro-tasks from their tech clients, which have included Amazon, Microsoft Azure, Salesforce, Google, Nvidia, Boeing, and Adobe. Many platforms also partner with Microsoft’s own data services platform, the Universal Human Relevance System (UHRS).

These workers are predominantly based in East Africa, Venezuela, Pakistan, India, and the Philippines—though there are even workers in refugee camps, who label, evaluate, and generate data. Workers are paid per task, with remuneration ranging from a cent to a few dollars—although the upper end is considered something of a rare gem, workers say. “The nature of the work often feels like digital servitude—but it's a necessity for earning a livelihood,” says Hassan, who also now works for Clickworker and Appen.

Sometimes, workers are asked to upload audio, images, and videos, which contribute to the data sets used to train AI. Workers typically don’t know exactly how their submissions will be processed, but these can be pretty personal: On Clickworker’s worker jobs tab, one task states: “Show us you baby/child! Help to teach AI by taking 5 photos of your baby/child!” for €2 ($2.15). The next says: “Let your minor (aged 13-17) take part in an interesting selfie project!”

Some tasks involve content moderation—helping AI distinguish between innocent content and that which contains violence, hate speech, or adult imagery. Hassan shared screen recordings of tasks available the day he spoke with WIRED. One UHRS task asked him to identify “fukk,” “c**t,” “dikk,” and “bytch” from a body of text. For Toloka, he was shown pages upon pages of partially naked bodies, including sexualized images, lingerie ads, an exposed sculpture, and even a nude body from a Renaissance-style painting. The task? Decipher the adult from the benign, to help the algorithm distinguish between salacious and permissible torsos.

Hassan recalls moderating content while under 18 on UHRS that, he says, continues to weigh on his mental health. He says the content was explicit: accounts of rape incidents, lifted from articles quoting court records; hate speech from social media posts; descriptions of murders from articles; sexualized images of minors; naked images of adult women; adult videos of women and girls from YouTube and TikTok.

Many of the remote workers in Pakistan are underage, Hassan says. He conducted a survey of 96 respondents on a Telegram group chat with almost 10,000 UHRS workers, on behalf of WIRED. About a fifth said they were under 18.

Awais, 20, from Lahore, who spoke on condition that his first name not be published, began working for UHRS via Clickworker at 16, after he promised his girlfriend a birthday trip to the turquoise lakes and snow-capped mountains of Pakistan’s northern region. His parents couldn’t help him with the money, so he turned to data work, joining using a friend’s ID card. “It was easy,” he says.

He worked on the site daily, primarily completing Microsoft’s “Generic Scenario Testing Extension” task. This involved testing homepage and search engine accuracy. In other words, did selecting “car deals” on the MSN homepage bring up photos of cars? Did searching “cat” on Bing show feline images? He was earning $1 to $3 each day, but he found the work both monotonous and infuriating. At times he found himself working 10 hours for $1, because he had to do unpaid training to access certain tasks. Even when he passed the training, there might be no task to complete; or if he breached the time limit, they would suspend his account, he says. Then seemingly out of nowhere, he got banned from performing his most lucrative task—something workers say happens regularly. Bans can occur for a host of reasons, such as giving incorrect answers, answering too fast, or giving answers that deviate from the average pattern of other workers. He’d earned $70 in total. It was almost enough to take his high school sweetheart on the trip, so Awais logged off for good.

Clickworker did not respond to requests for comment. Microsoft declined to comment.

“In some instances, once a user finishes the training, the quota of responses has already been met for that project and the task is no longer available,” Dzhikaev said. “However, should other similar tasks become available, they will be able to participate without further training.”

Researchers say they’ve found evidence of underage workers in the AI industry elsewhere in the world. Julian Posada, assistant professor of American Studies at Yale University, who studies human labor and data production in the AI industry, says that he’s met workers in Venezuela who joined platforms as minors.

Bypassing age checks can be pretty simple. The most lenient platforms, like Clickworker and Toloka, simply ask workers to state they are over 18; the most secure, such as Remotasks, employ face recognition technology to match workers to their photo ID. But even that is fallible, says Posada, citing one worker who says he simply held the phone to his grandmother’s face to pass the checks. The sharing of a single account within family units is another way minors access the work, says Posada. He found that in some Venezuelan homes, when parents cook or run errands, children log on to complete tasks. He says that one family of six he met, with children as young as 13, all claimed to share one account. They ran their home like a factory, Posada says, so that two family members were at the computers working on data labeling at any given point. “Their backs would hurt because they have been sitting for so long. So they would take a break, and then the kids would fill in,” he says.

The physical distances between the workers training AI and the tech giants at the other end of the supply chain (“the deterritorialization of the internet,” Posada calls it) create a situation where whole workforces are essentially invisible, governed by a different set of rules, or by none.

The lack of worker oversight can even prevent clients from knowing whether workers are keeping their income. One Clickworker user in India, who requested anonymity to avoid being banned from the site, told WIRED he “employs” 17 UHRS workers in one office, providing them with a computer, mobile, and internet in exchange for half their income. His workers are aged between 18 and 20, but because Clickworker does not require age certification, he knows of teenagers using the platform.

In the more shadowy corners of the crowdsourcing industry, the use of child workers is overt.

Captcha (Completely Automated Public Turing test to tell Computers and Humans Apart) solving services, where crowdsourcing platforms pay humans to solve captchas, are a less understood part of the AI ecosystem. Captchas are designed to distinguish a bot from a human, the most notable example being Google’s reCaptcha, which asks users to identify objects in images to enter a website. The exact purpose of services that pay people to solve them remains a mystery to academics, says Posada. “But what I can confirm is that many companies, including Google's reCaptcha, use these services to train AI models,” he says. “Thus, these workers indirectly contribute to AI advancements.”

Google did not respond to a request for comment in time for publication.

There are at least 152 active services, mostly based in China, with more than half a million people working in the underground reCaptcha market, according to a 2019 study by researchers from Zhejiang University in Hangzhou.

“Stable job for everyone. Everywhere,” one service, Kolotibablo, states on its website. The company has a promotional website dedicated to showcasing its worker testimonials, which includes images of young children from across the world. In one, a smiling Indonesian boy shows his 11th birthday cake to the camera. “I am very happy to be able to increase my savings for the future,” writes another child, no older than 7 or 8. A 14-year-old girl in a long Hello Kitty dress shares a photo of her workstation: a laptop on a pink, Barbie-themed desk.

Not every worker WIRED interviewed felt frustrated with the platforms. At 17, most of Younis Hamdeen’s friends were waiting tables. But the Pakistani teen opted to join UHRS via Appen instead, using the platform for three or four hours a day, alongside high school, earning up to $100 a month. Comparing products listed on Amazon was the most profitable task he encountered. “I love working for this platform,” Hamdeen, now 18, says, because he is paid in US dollars—which is rare in Pakistan—and so benefits from favorable exchange rates.

But the fact that the pay for this work is incredibly low compared to the wages of in-house employees of the tech companies, and that the benefits of the work flow one way, from the global south to the global north, leads to uncomfortable parallels. “We do have to consider the type of colonialism that is being promoted with this type of work,” says the Civic AI Lab’s Savage.

Hassan was recently accepted into a bachelor’s program in medical lab technology. The apps remain his sole income; he works an 8 am to 6 pm shift, followed by another from 2 am to 6 am. However, his earnings have fallen to just $100 per month, as workers’ demand for tasks has outstripped the supply, with more workers joining since the pandemic.

He laments that UHRS tasks can pay as little as 1 cent. Even on higher-paid jobs, such as occasional social media tasks on Appen, the amount of time he needs to spend doing unpaid research means he needs to work five or six hours to complete an hour of real-time work, all to earn $2, he says.

“It’s digital slavery,” says Hassan.