“Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,827
Reputation
9,328
Daps
169,807

“Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed


Meta's alleged torrenting and seeding of pirated books complicates copyright case.

Ashley Belanger – Feb 6, 2025 4:26 PM




Credit: Devonyu | iStock / Getty Images Plus

Newly unsealed emails allegedly provide the "most damning evidence" yet against Meta in a copyright case raised by book authors alleging that Meta illegally trained its AI models on pirated books.

Last month, Meta admitted to torrenting a controversial large dataset known as LibGen, which includes tens of millions of pirated books. But details around the torrenting were murky until yesterday, when Meta's unredacted emails were made public for the first time. The new evidence showed that Meta torrented "at least 81.7 terabytes of data across multiple shadow libraries through the site Anna’s Archive, including at least 35.7 terabytes of data from Z-Library and LibGen," the authors' court filing said. And "Meta also previously torrented 80.6 terabytes of data from LibGen."

"The magnitude of Meta’s unlawful torrenting scheme is astonishing," the authors' filing alleged, insisting that "vastly smaller acts of data piracy—just .008 percent of the amount of copyrighted works Meta pirated—have resulted in Judges referring the conduct to the US Attorneys’ office for criminal investigation."

Seeding expands authors’ distribution theory


Book authors had been pressing Meta for more information on the torrenting because of the obvious copyright concern over Meta seeding, and thus seemingly distributing, the pirated books in the dispute.

But Meta resisted those discovery attempts after an order denied authors' request to review Meta's torrenting and seeding data. That didn't stop authors from gathering evidence anyway, including a key document in which at least one staffer first appears to joke uncomfortably about the possible legal risks, then grows more serious about raising his concerns.

"Torrenting from a corporate laptop doesn’t feel right," Nikolay Bashlykov, a Meta research engineer, wrote in an April 2023 message, adding a smiley emoji. In the same message, he expressed "concern about using Meta IP addresses 'to load through torrents pirate content.'"

By September 2023, Bashlykov had seemingly dropped the emojis, consulting the legal team directly and emphasizing in an email that "using torrents would entail ‘seeding’ the files—i.e., sharing the content outside, this could be legally not OK."

Emails discussing torrenting prove that Meta knew it was "illegal," authors alleged. And Bashlykov's warnings seemingly fell on deaf ears: authors alleged that evidence showed Meta instead chose to hide its torrenting as best it could while downloading and seeding terabytes of data from multiple shadow libraries as recently as April 2024.

Meta allegedly concealed seeding


Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as being in "stealth mode." Meta also allegedly modified settings "so that the smallest amount of seeding possible could occur," a Meta executive in charge of project management, Michael Clark, said in a deposition.

Now that new information has come to light, authors claim that Meta staff involved in the decision to torrent LibGen must be deposed again because the new facts allegedly "contradict prior deposition testimony."

Mark Zuckerberg, for example, claimed to have no involvement in decisions to use LibGen to train AI models. But unredacted messages show the "decision to use LibGen occurred" after "a prior escalation to MZ," authors alleged.

Meta did not immediately respond to Ars' request for comment and has maintained throughout the litigation that AI training on LibGen was "fair use."

However, Meta has previously addressed its torrenting in a motion to dismiss filed last month, telling the court that "plaintiffs do not plead a single instance in which any part of any book was, in fact, downloaded by a third party from Meta via torrent, much less that Plaintiffs’ books were somehow distributed by Meta."

While Meta may be confident in its legal strategy despite the new torrenting wrinkle, the social media company has seemingly complicated its case by allowing authors to expand the distribution theory that's key to winning a direct copyright infringement claim beyond just claiming that Meta's AI outputs unlawfully distributed their works.

As limited discovery on Meta's seeding now proceeds, Meta is not fighting the seeding aspect of the direct copyright infringement claim at this time, telling the court that it plans to "set... the record straight and debunk... this meritless allegation on summary judgment."
 

bnew

1/6
@flexghost@mastodon.social
Meta illegally downloaded 80+ terabytes of books from LibGen, Anna’s Archive, and Z-Library to train their AI.

In 2010, Aaron Swartz downloaded just 70GB of JSTOR articles, 0.0875% of what Meta took. He faced $1M in fines and 35 years in jail. The pressure from the US government caused Aaron to take his own life in 2013.

If corporations are people, then what fate does Meta deserve?

(VIDEO: Standing ovation for the Aaron Swartz statue unveiling at the Internet Archive)

https://files.mastodon.social/media...912/001/007/157/original/9802934a993e623d.mp4

2/6
@CiaoBruno@newsie.social
@flexghost



3/6
@whorfin
@CiaoBruno @flexghost Corporate Death Penalty: seize and sell all assets to benefit victims of wrongdoing, all IP to public domain

4/6
@bufalo1973@tuiter.rocks
@flexghost If it was "fair", close to $1.5B and 40 000 years (distributed by shareholder investment).
Edit: add to that the inflation.

5/6
@flexghost
@bufalo1973

6/6
@peteorrall
@flexghost #Meta deserves the worst.


To post posts in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


 

bnew


1/11
@BauerKahan
Authors have a right to control and profit off of their own intellectual property. My bill, AB 412, requires developers to be transparent about their use of copyrighted materials. “Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed



2/11
@VWGAMEDEV
This is Great, thankyou 👏!



3/11
@JoePhix815
DO SOMETHING THAT PEOPLE GIVE A fukk ABOUT!! NOT THIS!



4/11
@HumanPwrdPics
Thank you @BauerKahan for AB 412! The creative industries contribute billions to the CA economy. Those who work in these industries are being severely impacted by predatory data gathering that is occurring. Creators who often barely earn enough to scrape by need transparency!



5/11
@Zsibo1
Finally! ❤️❤️



6/11
@Moonlogic_Games
We appreciate you. Keep fighting the good fight!



7/11
@neilturkewitz
🙏🏽 so much for this. Transparency doesn’t guarantee justice, but it’s definitely a predicate for getting there. There’s a reason AI companies don’t want to disclose what works they used to train their models. It exposes the exploitation! Instead, they want to hide behind slogans.



8/11
@Kerberrage
Thank you !



9/11
@AdeptAdaptor
Thank you for taking action! For years these companies have not only stolen copyrighted works of art and literature, but have also invaded people's privacy and harvested their data to target and influence them through social and search algorithms. This is a great first step!



10/11
@BobbyMiller
Thank you!



11/11
@StevenStahlberg
damn straight








1/2
@jason_kint
Oomph. Same playbook, over and over. capture as much data however possible to accelerate growth. zuckerberg wants to win sooooo badly. 1/2





2/2
@jason_kint
here are the exhibits supporting the crime fraud doctrine claim which the Court ordered Meta to submit by 5pm on Friday for in camera review. 2/2 https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.417.1.pdf






 

bnew



1/3
Techmeme @techmeme.com

Kadrey v. Meta: unsealed emails show Meta allegedly torrented 81.7TB+ of data across multiple shadow libraries through the site Anna's Archive for AI training (Ashley Belanger/Ars Technica)


2/3
abunchofmalarkey.bsky.social @abunchofmalarkey.bsky.social

Could you please explain what this means?

3/3
abunchofmalarkey.bsky.social

Thank you. So basically feeding other people’s intellectual property into Meta’s AI to teach it, on an industrial scale?




1/4
jaypeg89.bsky.social

Remember when Capitol Records sued a woman for downloading 24 songs on Kazaa and won a $1.92 million settlement? How much is 80TB worth…

2/4
yesthatkarim @yesthatkarim.bsky.social

yeah but the person who was ordered to pay $80,000 per song in statutory (read: NOT ACTUAL) damages is a minority female (Native American mother of 4).

Whereas Zuck is a billionaire white male who has obsequiously shoved his nose so far up Trump's ass he can see Trump's gallbladder. Soooo...

3/4
yesthatkarim @yesthatkarim.bsky.social

Trump (and by extension, Zuck) own the judiciary and the legislature. Meta's legal team is approximately the same size as the 101st Airborne Division. Meta also just paid $25 million to Trump to settle the lawsuit Trump brought against them for banning him from Facebook.

4/4
yesthatkarim @yesthatkarim.bsky.social

Facebook banned Trump after January 6 for fomenting insurrection. Trump called it "impermissible censorship," and now Facebook is bowing and scraping and begging Trump's forgiveness. It's sickening.






1/1
Jack Kennedy

I have this funny feeling that Meta will not be pursued by prosecutors with the same murderous aggression faced by, say, Aaron Swartz






1/2
Amba Azaad

"Meta also allegedly modified settings 'so that the smallest amount of seeding possible could occur'"
Meta is a leecher, literally.



2/2
Aditya M @almostinfamous.bsky.social

Why is this so unsurprising?

 