bnew
Veteran

Is A.I. the Death of I.P.?


Generative A.I. is the latest in a long line of innovations to put pressure on our already dysfunctional copyright system.

January 15, 2024


A paper collage of a match lighting a physical copyright symbol.

All new creations derive from existing creations. The no man’s land between acceptable borrowing and penalizable theft is where most copyright wars are waged. Illustration by Ben Denzer

Intellectual property accounts for some or all of the wealth of at least half of the world’s fifty richest people, and it has been estimated to account for fifty-two per cent of the value of U.S. merchandise exports. I.P. is the new oil. Nations sitting on a lot of it are making money selling it to nations that have relatively little. It’s therefore in a country’s interest to protect the intellectual property of its businesses.

But every right is also a prohibition. My right of ownership of some piece of intellectual property bars everyone else from using that property without my consent. I.P. rights have an economic value but a social cost. Is that cost too high?

I.P. ownership comes in several legal varieties: copyrights, patents, design rights, publicity rights, and trademarks. And it’s everywhere you look. United Parcel Service has a trademark on the shade of brown it paints its delivery trucks. If you paint your delivery trucks the same color, UPS can get a court to make you repaint them. Coca-Cola owns the design rights to the Coke bottle: same deal. Some models of the Apple Watch were taken off the market this past Christmas after the United States International Trade Commission determined that Apple had violated the patent rights of a medical-device firm called Masimo. (A court subsequently paused the ban.)




In 2021, the N.C.A.A. began allowing college athletes to market their name, image, and likeness (N.I.L., the three elements of the right of publicity). Caitlin Clark, the University of Iowa women’s-basketball star, has an N.I.L. valued at around eight hundred thousand dollars a year. If you think there might conceivably be a gender gap here: LeBron James’s son Bronny, who played his first collegiate game on December 10th and scored four points in a losing effort, has an N.I.L. currently valued at $5.9 million.

Bob Dylan, Neil Young, and Stevie Nicks are among a number of artists who have recently sold the rights to some or all of their songs. Virtually every song that Bruce Springsteen has ever written is now owned by Sony, which is reported to have paid five hundred and fifty million dollars for the catalogue. Because the copyright clock does not start ticking until the demise of the creator, Sony could own those rights until past the end of the century. The longer the Boss lives, the richer Sony gets.

David Bellos and Alexandre Montagu use the story of Sony’s big Springsteen buy to lead off their lively, opinionated, and ultra-timely book, “Who Owns This Sentence? A History of Copyrights and Wrongs” (Norton), because it epitomizes the trend that led them to write it. The rights to a vast amount of created material—music, movies, books, art, games, computer software, scholarly articles, just about any cultural product people will pay to consume—are increasingly owned by a small number of large corporations and are not due to expire for a long time.

So what? There is little danger that Sony will keep Bruce Springsteen’s songs locked up. On the contrary, it is likely that, from now until 2100 or so, it will be impossible to escape the sound of Springsteen’s voice, because Sony needs to find lots of ways to recoup its investment. Sony enjoys no benefit from sitting on its property, and the music costs it almost nothing to disseminate. The company just needs someone to deposit the checks.

Sony will collect many of those checks from people like you and me. Our contribution will come out of things like the subscription and downloading fees we pay our music-streaming services. Considering the amount of music those services give us access to, a lifetime of Springsteen is costing us pennies. But there are some six hundred and sixteen million subscribers to music-streaming services out there—the number has more than doubled in the past four years, which is why all these catalogue sales are happening now—so the math looks good for Sony.
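To make that arithmetic concrete, here is a back-of-the-envelope sketch. The purchase price is the reported figure; the per-stream payout and the rights owner’s share are illustrative assumptions, not numbers from the article:

```python
# Back-of-the-envelope math on the catalogue purchase. The purchase price
# is the reported figure; the per-stream payout and the owner's share are
# illustrative guesses, not numbers from the article.

CATALOGUE_PRICE = 550_000_000   # reported Sony price for the Springsteen catalogue
PER_STREAM_PAYOUT = 0.004       # assumed average payout per stream, in dollars
OWNER_SHARE = 0.5               # assumed fraction of the payout the rights owner keeps

def years_to_recoup(annual_streams: float) -> float:
    """Years until cumulative rights income covers the purchase price."""
    annual_income = annual_streams * PER_STREAM_PAYOUT * OWNER_SHARE
    return CATALOGUE_PRICE / annual_income

# At ten billion streams a year under these assumptions, Sony breaks even
# in under thirty years -- with decades of copyright still left to run.
print(f"{years_to_recoup(10_000_000_000):.1f} years")  # 27.5 years
```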

There are other lucrative revenue streams. Car manufacturers have been trying to buy a license to use “Born to Run” in their commercials almost since the song was released, in 1975. Unless Springsteen, who has so far largely avoided endorsements, attached conditions to the sale, which seems unlikely given the dollars on the table, their day has probably arrived.

Bellos, a comparative-literature professor at Princeton, and Montagu, an intellectual-property lawyer, find this kind of rent-seeking objectionable. They complain that corporate copyright owners “strut the world stage as the new barons of the twenty-first century,” and they call copyright “the biggest money machine the world has seen.” They point out that, at a time when corporate ownership of copyrights has boomed, the income of authors, apart from a few superstars, has been falling. They think that I.P. law is not a set of rules protecting individual rights so much as a regulatory instrument for business.


But what Bellos and Montagu are ultimately distressed about isn’t that businesses like Sony are sucking in large sums for the right to play music they didn’t create, or that you and I have to pay to listen to it. We always had to pay to listen to it. The problem, as they see it, is that corporate control of cultural capital robs the commons.

In an important sense, when Bruce Springsteen releases a song or Jorie Graham publishes a poem, it belongs to all the world. Musical compositions, poems, works of art, books, TikTok videos—every type of cultural product is a public good. Our species draws upon them for pleasure, for edification, for inspiration and motivation, and sometimes for a cheesy simulacrum of such things. Because of the digital revolution, more of these goods are available to more people at less cost than ever. And we can do almost anything we like with them. We can listen to the songs or read the poems as often as we want, and they can excite us to create songs and poems of our own. What we cannot do, for a finite period of time, is put copies of those things on the market.

That period is set by Congress, under a power enumerated in Article I of the Constitution: “To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” The first federal copyright act, passed in 1790, set the term of copyright at fourteen years from the date when a work was submitted for registration, renewable for another fourteen years.

You no longer have to register a work to hold its copyright. And the duration of that copyright has been extended several times. Since 1978, it has been seventy years from the death of the creator. For “corporate authors”—that is, companies that pay employees to make stuff (known as “work for hire”)—it is now ninety-five years from the date of publication or a hundred and twenty years from the date of creation, whichever is shorter. Mickey Mouse, who was first “published” in 1928, entered the public domain at the beginning of this year—but only in his 1928 form. Updated Mickeys are still protected. In short, by the time a work created today enters the public domain, most of us will be dead. Many of us will be very dead.
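Stripped of its many special cases, the term arithmetic described above is simple enough to sketch. A minimal illustration of just the two post-1978 rules (terms run through the end of the calendar year, so a work enters the public domain the following January 1):

```python
# The two post-1978 U.S. copyright terms described above, as arithmetic.
# Real copyright duration has many more special cases; terms run through
# the end of the expiration year.

def individual_term_ends(death_year: int) -> int:
    """Works by a named author: life of the author plus seventy years."""
    return death_year + 70

def work_for_hire_term_ends(publication_year: int, creation_year: int) -> int:
    """Corporate 'works for hire': ninety-five years from publication or a
    hundred and twenty from creation, whichever expires first."""
    return min(publication_year + 95, creation_year + 120)

# Mickey Mouse, created and published in 1928:
print(work_for_hire_term_ends(1928, 1928) + 1)  # 2024 -> public domain this year
```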
 

For you (probably) and me (definitely), the rights to our creations are not worth much money to anyone but ourselves. But, if you are the guy who wrote “Born to Run,” it is prudent to assign your rights to an entity that can pay you while you are alive some considerable portion of what your songs will be worth long after you are not. Bellos and Montagu argue that copyright law, originally enacted in Britain in the eighteenth century to protect publishers (and, to some extent, writers) from pirates, has evolved into a protection for corporate colossi with global reach. The law today treats companies as “authors,” and classifies things like the source code of software as “literary works,” giving software a much longer period of protection than it would have if it were classified only as an invention and eligible for a patent (now good for twenty years, with some exceptions).

Bellos and Montagu agree with many critics of contemporary copyright law that the current term of copyright is absurd. Often, we are locking away indefinitely stuff whose rights are owned by someone—an heir, an estate, some company that bought them along with other assets in a package—but no one knows who. For fear of a lawsuit, that material remains in a vault. A lot of video footage falls into this category, as do countless books that are out of print and music that can no longer be purchased in any format (much of Motown, for instance). There is no “use it or lose it” provision in copyright law.

Rights-owning heirs can be quite controlling, too. Martin Luther King, Jr.,’s family, along with EMI Music Publishing, owns the rights to film and audio recordings of the “I Have a Dream” speech. In 1996, the King family sued CBS for using portions of the speech without permission—even though it was CBS that made the film for which King’s heirs were charging a licensing fee. “It has to do with the principle that if you make a dollar, I should make a dime” is how King’s son Dexter explained the thinking. An initial verdict for CBS was overturned on appeal, and the Kings settled for a cash payment (which evidently took the form of a contribution to the King Center for Nonviolent Social Change and thus was tax deductible). CBS can afford the litigation. The average person cannot.

Corporations themselves can squeeze you shamelessly. Bellos and Montagu tell the story of a documentary filmmaker who shot a scene in which a group of workers were sitting around playing a board game with a television set on in the background. The TV happened to be showing “The Simpsons,” and the filmmaker applied for permission to use the four seconds of the “Simpsons” episode that was visible in the shot. The studio wanted ten thousand dollars.

A particularly notorious “background” lawsuit was the “Dancing Baby” case. At issue was a twenty-nine-second YouTube video a mother had taken of her thirteen-month-old bouncing up and down to a Prince song, which is indistinctly audible for approximately twenty seconds. In 2007, Prince’s label alleged copyright infringement and forced YouTube to take down the video. The case ended up in court. The baby’s mother, Stephanie Lenz, prevailed in a lawsuit, but the litigation took a decade. That’s why an author who wants to reproduce a photograph in a book would, if the photograph includes a painting in the background, even a fragment, be well advised to get permission not just from the photograph’s rights holder but from the painting’s.

What makes this ridiculous is that most of the photographs you see in books are on the Web, where they can be viewed by billions of people for nothing. But authors have to pay a fee, often hundreds of dollars for a single image, to reproduce them in a work that will be read by, with luck, ten or twenty thousand people. The major rent seeker here is Getty Images, which, after buying up most of its rivals, now controls more than four hundred and seventy-seven million “assets”—stock images, editorial photography, video, and music—and is worth five billion dollars. If you want to reprint a news photograph, chances are that Getty controls the rights.

Most litigation over copyright, like Lenz’s suit, involves a term that has eluded precise judicial definition: fair use. Fair use is where the commons enters the picture. When Ezra Pound said “Make It New,” he meant that putting old expressions to new uses is how civilizations evolve. The higher the firewall protecting the old expressions, the less dynamic the culture has a chance to be.

As Bellos and Montagu repeatedly point out, all new creations derive from existing creations. In our head when we write a poem or make a movie are all the poems we have read or movies we have seen. Philosophers build on the work of prior philosophers; historians rely on other historians. The same principle applies to TikTok videos. The same principle applies, really, to life. Living is a group effort.

The no man’s land between acceptable borrowing and penalizable theft is therefore where most copyright wars are waged. One thing that makes borrowing legal is a finding that the use of the original material is “transformative,” but that term does not appear in any statute. It’s a judge-made standard and plainly subjective. Fair-use litigation can make your head spin, not just because the claims of infringement often seem far-fetched—where is the damage to the rights holder, exactly?—but because the outcomes are unpredictable. And unpredictability is bad for business.

The publisher of “The Wind Done Gone,” a 2001 retelling, by Alice Randall, of Margaret Mitchell’s “Gone with the Wind” from the perspective of a Black character, was sued for infringement by the owner of the Mitchell estate. The parties reached a settlement when Randall’s publisher, Houghton Mifflin, agreed to make a contribution to Morehouse College (a peculiar outcome, as though the estate of the author of “Gone with the Wind” were somehow the party that stood for improving the life chances of Black Americans). Then there’s the case of Demetrious Polychron, a Tolkien fan who was recently barred from distributing his sequel to “The Lord of the Rings,” titled “The Fellowship of the King.” Polychron had approached the Tolkien estate for permission and had been turned down, whereupon he self-published his book anyway, as the estate learned when it turned up for sale on Amazon.

In Randall’s case, Houghton Mifflin argued that the new novel represented a transformative use of Mitchell’s material because it told the story from a new perspective. It was plainly not written in the spirit of the original. In Polychron’s, the sequel was purposely faithful to the original. He called it “picture-perfect,” and it was clearly intended to be read as though Tolkien had written it himself. Polychron also brought his troubles on himself by first suing the Tolkien estate and Amazon for stealing from his book for the Amazon series “The Lord of the Rings: The Rings of Power.” The suit was deemed “frivolous and unreasonably filed,” and it invited the successful countersuit.

Pop art, from Andy Warhol to Jeff Koons, is a lively arena for fair-use litigation, since the art deals explicitly with appropriated images. Very little is obviously “transformed.” Last spring, in Andy Warhol Foundation v. Goldsmith, the Supreme Court ruled that the foundation could not license the use of a Warhol work—featuring Prince, as it happens—that was silk-screened from a photograph by Lynn Goldsmith, a professional photographer.

The Court’s opinion, by Justice Sonia Sotomayor, largely restricted itself to the question of who had the right to license the image for use as a magazine illustration. It did not address the potentially explosive art-market question of whether Warhol’s Prince silk screens themselves (there are fourteen, plus two pencil drawings) are covered by fair use. Following his “Campbell’s Soup Cans” exhibition, in 1962, much of Warhol’s art reproduced images and designs made by other people. Are those works “transformative” because they’re Warhols? If I did the same thing, could I claim fair use?

The real circus act in copyright law, currently, is pop music. Pop is a highly formulaic art, and some amount of copying is pretty much inevitable. Most twelve-bar blues music is based on the same three chords. Much of jazz is built from the chord progression known as “rhythm changes.” Folk has a certain sound; rock has a certain sound; country has a certain sound. These sounds are created from a vocal and instrumental palette specific to each genre, and each genre has its own themes, tropes, imagery.

This is because although originality has high value in the fine arts, imitation—or, more precisely, imitation with a difference—has high value in entertainment media. People like the music they already like. Movies, too. If the first “Die Hard” is a hit, there is a sequel—in fact, four sequels. It’s the “Send more Chuck Berry” syndrome, the theory behind Pandora. Listeners want songs that sound like songs they enjoy, and a hit song spawns soundalikes seeking to cash in on what people are buying.

The insane part of all this is that I can record a cover—that is, a copy—of “Born to Run” without any permission at all. The legal requirement is only that I notify the rights holder and pay a royalty set by statute, which is currently about thirty-seven cents per sale for a three-minute song. Unsurprisingly, a huge portion of the pop repertoire therefore is covers. There are at least fifty covers of “Born to Run,” including one by the London Symphony Orchestra. There are more than fifteen hundred Bob Dylan covers. There were six versions of “Try a Little Tenderness” before Otis Redding made his immortal 1966 recording with Booker T. & the M.G.s, a rendition without which the lives of many of us would be poorer.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,619
Reputation
8,224
Daps
157,090
But if I write a song that simply shares a few musical elements with “Born to Run”—“substantial similarity” is the legal standard—I could be in trouble. The similarity does not have to be deliberate. George Harrison was found liable for “subconscious” infringement when he used chords from the Chiffons’ hit “He’s So Fine,” from 1963, in his 1970 song “My Sweet Lord,” and had to pay five hundred and eighty-seven thousand dollars. Harrison knew that “this combination of sounds would work,” the judge wrote, because it had already worked. Yes, that seems to be the way the music business operates.

To be found liable for subconscious infringement, you do at least have to have heard the song you’re accused of stealing from. In 1983, a jury found that the Bee Gees had borrowed illegally from a song by Ronald Selle called “Let It End” when they wrote “How Deep Is Your Love,” but the verdict was thrown out on appeal because the plaintiff had not established that the Bee Gees could have heard his song, which he had distributed as a demo. The initial finding of “substantial similarity” was purely coincidental.

In 2015, a jury decided that Robin Thicke and Pharrell Williams had copied Marvin Gaye’s “Got to Give It Up” in their hit “Blurred Lines.” Although the question of whether there were specific musical elements in common was contested, the jury evidently thought that they had a similar “feel.” Thicke and Williams had to pay the Gaye family $5.3 million plus fifty per cent of future revenues.

The finding shocked a lot of people in the legal and music worlds, and a backlash against the “Blurred Lines” verdict seems to have made it a little harder for music infringement claims to stick. The group Spirit had a plausible case that Led Zeppelin had borrowed the arpeggiated chords that open “Stairway to Heaven” from Spirit’s “Taurus”: the chords are not completely identical but they do sound a lot alike, and Led Zeppelin used to open for Spirit. Still, in 2016, a California jury sided with Led Zeppelin, in a verdict that survived appeal.

And, last spring, the singer-songwriter Ed Sheeran was found not liable for copying another Gaye song, “Let’s Get It On.” During the trial, Sheeran brought his guitar with him to the witness stand and demonstrated to the jury that the four-chord progression in his song was common in pop music. Sheeran is a charming fellow, and the jury was duly swayed. “I am unbelievably frustrated that baseless claims like this are allowed to go to court at all,” he said after the trial. But the legal uncertainty is an incentive to sue, since settlement dollars can be significant. (If you lose, though, the Copyright Act gives the court the discretion to make you pay the defendant’s attorney fees.)

The uncertainty exists because juries differ, but also because the goalposts move. The different results in the “Blurred Lines” and the “Stairway to Heaven” lawsuits had partly to do with something called the “inverse ratio” rule, a judge-made rule invented to establish the degree of similarity required for legal liability. Inverse ratio dictates that the more access the defendant had to the original work, the lower the bar for establishing substantial similarity. Which makes little sense. The court—the Ninth Circuit, where many entertainment-industry cases end up—applied the rule in the former case and then turned around and declared it void in the latter.

Judicial competence is also an issue. There is a special court for patent and trademark claims, which sits in Washington, D.C. But judges assigned to copyright cases generally know little about the fields in which fair-use concerns arise. This is why the matter of what’s “transformative” is such a judicial gray area. In a rather heated dissent in the Warhol case, Elena Kagan complained that Justice Sotomayor and the rest of the majority had no understanding of art. To know why a Warhol silk screen counts as transformative, or to give musical definition to a song’s “feel,” you need a kind of expertise that most judges—most people—don’t have.

Competence is also likely to be a factor in cases arising on the next frontier in I.P., artificial intelligence. Bellos and Montagu end their book with the intriguing suggestion that A.I. may be the technology that brings the whole legal structure of copyright down.

From a historical perspective, generative A.I. is just the latest in a line of innovations that have put pressure on copyright law. These include photography, which was not declared copyrightable until the second half of the nineteenth century; radio, which triggered a war between the American Society of Composers, Authors, and Publishers (ASCAP), which licenses performance rights for music, and the broadcast companies over whether on-air play of a song requires payment of a royalty (ASCAP won); and photocopying. Is a Xerox copy of an article or a book illegal under the terms of copyright law? How about a six-line poem? It is, after all, a copy, even if it was not made with a printing press.

The Internet spawned all kinds of methods for accessing copyrighted material and circumventing copyright claims. Napster, launched in 1999, is the landmark example. Its peer-to-peer file-sharing system was determined to be piracy, but Napster still revolutionized the music industry by moving it into the streaming business. Performance revenue aside, music income now comes primarily not from CD sales but from licensing deals. Spotify is a direct descendant of the Napster case.

On the other hand, in Authors Guild v. Google, decided in 2015, courts upheld the legality of Google Books, even though it is a Web site that was created by scanning tens of millions of books without permission from the copyright holders. That case didn’t even go to trial. Google won in summary judgment under the principle of fair use, and an appeals court held that Google Books’ copying had a “highly convincing transformative purpose” and did not constitute copyright infringement. The outcome portends trouble for parties with copyright cases against companies that use A.I.

Still, no one knows how courts will apply the current statutory authority—the Copyright Act of 1976 and subsequent amendments—to generative A.I., a technology whose capacities were barely contemplated in 1976. Apps like ChatGPT are large language models (L.L.M.s), meaning that they have “learned” by being “trained” on enormous amounts of digital information. What the models are “learning” are not even sentences but “tokens,” which are often pieces of words. When functioning properly, a model predicts, based on a statistical calculation, what token comes next.

This has been mocked as simply an advanced form of autofill. But, when I write a sentence, I, too, am trying to guess the best next word. It just doesn’t feel especially “auto.” One big difference is that, since I fancy myself a writer, I am trying to avoid, wherever possible, the statistically most common solution.
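For anyone who wants the “advanced autofill” intuition made concrete, here is a toy sketch. Real models tokenize text into sub-word pieces and predict with a trained neural network over billions of parameters; this little bigram counter shows only the shape of the idea:

```python
# A toy version of next-token prediction: given the current token, emit
# the statistically most common successor seen in the training text.
# Real L.L.M.s use sub-word tokens and a neural network, not raw counts.

from collections import Counter, defaultdict

# "Training" text: the copyright clause quoted earlier (public domain).
corpus = (
    "to promote the progress of science and useful arts by securing "
    "for limited times to authors and inventors the exclusive right "
    "to their respective writings and discoveries"
).split()

successors: defaultdict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(token: str) -> str:
    """Return the token most often seen after `token` in the training text."""
    if token not in successors:
        return "<unknown>"
    return successors[token].most_common(1)[0][0]

print(predict_next("limited"))  # -> "times"
```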
 

It is thought that a significant percentage of the token sequences that the L.L.M.s have trained on come from the Web sites of news organizations, whose material is copyrighted. The models are also believed to train on text in so-called shadow libraries, like Library Genesis and Z-Library, which include millions of pages of copyrighted material. A key legal question is whether the training process has involved copying this text and, if so, whether any or all of this process is protected by fair use.

I.P. experts completely disagree about what the answer should be. There are multiple legal challenges under way, which will probably result in cases argued in different venues producing inconsistent results. Ideally, this is an area where Congress, under its Article I power, would decide on the rules, but Congress these days is not exactly a well-oiled legislative machine.

Courts have already ruled that search engines, like Google and Bing, which scour enormous amounts of copyrighted material on the Web, are protected by fair use, because the thumbnail images and text snippets they display when you conduct a search qualify as “transformative.” Are generative-A.I. systems so different from search software in this respect?

The comedian and memoirist Sarah Silverman and two other writers have sued the tech companies Meta and OpenAI for copyright infringement. (Most of the suit was dismissed by a federal judge last November.) John Grisham and Jodi Picoult are part of a separate writers’ lawsuit, and there are others. It’s not obvious what sort of relief writers can ask for. Silverman’s memoir is protected against piracy by copyright. Someone else can’t print and sell a substantially similar work. But, in an L.L.M., her text is a drop in an ocean of digital data. There is no reason to think that well-known, best-selling writers such as Grisham and Picoult are somehow losing more to L.L.M.s than an equally prolific author of self-published guides to home repair is. Since A.I. technologies feed on the entire online universe of words and images, everyone, even if their creative activities are limited to taking selfies or posting tuna-casserole recipes, could sue. To an L.L.M., it’s tokens all the way down.

But the lawsuits keep on coming. Last winter, Getty Images sued Stability AI for what it called “brazen theft and freeriding” on a “staggering scale.” And, in December, the Times sued OpenAI and Microsoft, claiming that those companies are liable for “billions of dollars in statutory and actual damages” for their use of the Times’ archives.

The Times claims, for example, that Bing, Microsoft’s search engine, which uses OpenAI’s ChatGPT, provided results that substantially copied verbatim from the paper’s Wirecutter content, which makes money when readers use its links to sites where they can purchase recommended goods. (In effect, Bing visited the Wirecutter pages and then got the ChatGPT engine to paraphrase them closely.) The links were not included in Bing’s version, and so the Times lost money.

Some of these legal challenges can be met by licensing agreements, which is how music companies responded to the Napster episode. The Associated Press has agreed to license the use of its reporting to ChatGPT, and additional licensing deals have been consummated or are in the works. Other kinds of guardrails around the use of A.I. in the workplace can be erected through collective bargaining, as happened this fall after the Writers Guild of America, which represents more than eleven thousand screenwriters, and the Screen Actors Guild went on strike. Might similar guardrails be used to protect—oh, I don’t know—writers for weekly magazines?

Another question is whether works created by A.I. are themselves copyrightable. Last August, a federal court ruled that machine-made works are not copyrightable—in the court’s words, that “human authorship is a bedrock requirement of copyright.” But that conclusion is likely to be tested soon. After all, a camera is a machine. Why is it that, if I bring my Leica to a back-yard fireworks display, my photograph is eligible for copyright protection, but if I prompt Dall-E 3, an OpenAI service, to make me a photograph of fireworks, the image it produces might not be?

People loved the A.I.-generated version of Johnny Cash singing a Taylor Swift song, which was posted online last year by a person in Texas named Dustin Ballard. But who owns it? Could Taylor Swift sue? Probably not, since it’s a cover. Does the Cash estate have an ownership claim? Not necessarily, since you can’t copyright a style or a voice. Dustin Ballard? He neither composed nor performed the song. No one? Does it belong to all the world?

Some people may say that A.I. is robbing the commons. But A.I. is only doing what I do when I write a poem. It is reviewing all the poems it has encountered and using them to make something new. A.I. just “remembers” far more poems than I can, and it makes new poems a lot faster than I ever could. I don’t need permission to read those older poems. Why should ChatGPT? Are we penalizing a chatbot for doing what all human beings do just because it does so more efficiently? If the results are banal, so are most poems. God knows mine are.

Whatever happens, the existential threats of A.I. will not be addressed by copyright law. What we’re looking at right now is a struggle over money. Licensing agreements, copyright protections, employment contracts—it’s all going to result in a fantastically complex regulatory regime in which the legal fiction of information “ownership” gives some parties a bigger piece of the action than other parties. Life in an A.I. world will be very good for lawyers. Unless, of course, they are replaced with machines. ♦

Published in the print edition of the January 22, 2024, issue.
 


Garbage AI on Google News

JOSEPH COX

JAN 18, 2024 AT 9:33 AM

404 Media reviewed multiple examples of AI rip-offs making their way into Google News. Google said it doesn't focus on how an article was produced—by an AI or human—opening the way for more AI-generated articles.



Google News is boosting sites that rip-off other outlets by using AI to rapidly churn out content, 404 Media has found. Google told 404 Media that although it tries to address spam on Google News, the company ultimately does not focus on whether a news article was written by an AI or a human, opening the way for more AI-generated content making its way onto Google News.

The presence of AI-generated content on Google News signals two things: first, the black-box nature of Google News, where entry into its rankings is an opaque but apparently gameable system; second, that Google may not be ready to moderate its News service in the age of consumer-access AI, where essentially anyone can churn out a mass of content with little to no regard for its quality or originality.

“I want to read the original stories written by journalists who actually researched them and spoke to primary sources. Any news junkie would,” Brian Penny, a ghostwriter who first flagged some of the seemingly AI-generated articles to 404 Media, said.


One example was a news site called Worldtimetodays.com, which is littered with full-page and other ads. On Wednesday it published an article about Star Wars fandom. The article was very similar to one published a day earlier on the website Distractify, with even the same author photo. One major difference, though, was that Worldtimetodays.com wrote “Let’s be honest, war of stars fans,” rather than Star Wars fans. Another article is a clear rip-off of a piece from Heavy.com, with Worldtimetodays.com not even bothering to replace the Heavy.com watermarked artwork. Gary Graves, the listed author on Worldtimetodays.com, has published more than 40 articles in a 24-hour period.

Both of these rip-off articles appear in Google News search results. The first appears when searching for “Star Wars theory” and setting the results to the past 24 hours. The second appears when searching for the subject of the article with a similar 24-hour setting.

LEFT: THE DISTRACTIFY ARTICLE. RIGHT: THE ARTICLE ON WORLDTIMETODAYS.COM.

Aaron Nobel, editor-in-chief of Heavy.com, told 404 Media in an email that “I was not aware of this particular ripoff or this particular website. But over the years we've encountered many other sites that rip and republish content at scale.” Neither Distractify nor Worldtimetodays.com responded to a request for comment.

There are a few different ways to use Google News. One is to simply open the main Google News homepage, where Google surfaces what it thinks are the most important stories of the day. Another is to search for a particular outlet, where you’ll then see recent stories from just that site. A third is to search by “topic,” such as “artificial intelligence,” “Taylor Swift,” or whatever it is you’re interested in. Appearing in topic searches is especially important for outlets looking to garner more attention for their writing on particular beats. 404 Media, at the time of writing, does not appear in topic searches (except, funnily enough, when people write about 404 Media, as in this Fast Company article about us and other worker-owned media outlets). As in, if you searched “CivitAI,” an artificial intelligence company we’ve investigated extensively, our investigations would not appear in Google News; only people aggregating our work or producing their own would.


In another example of AI-generated rip-off content, Penny sent screenshots of search results for news related to the AI tool “midjourney.” At one point, those included articles from sites such as “WatchdogWire” and “Examiner.com.” These articles appear to use the same images, very similar or identical headlines, and pockets of similar text.

The Examiner.com domain was once used by a legitimate news service and went through various owners and iterations. The site adopted its current branding in around 2022, according to archived versions of the site on the Wayback Machine. With that in mind, it’s worth remembering that some of these sites that more recently pivoted to AI-generated content may have been accepted into Google News long ago, even before the advent of consumer-level AI.

A SERIES OF GOOGLE NEWS SCREENSHOTS PROVIDED BY PENNY.

Looking at WatchdogWire and Examiner.com more broadly, both sites regularly publish content with the same art and identical or very similar headlines in quick succession every day. Ahmed Baig, one of the listed authors on WatchdogWire, has published more than 500 articles in the past 30 days, according to his author page. Baig did not respond to a request for comment sent over LinkedIn asking whether he was taking work from other outlets and using AI to reword them. Baig lists himself as the editor-in-chief of WatchdogWire, as well as the head of SEO for a company called Sproutica. A contact email for Examiner.com uses the Sproutica domain.
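None of the companies involved have described their detection tooling, but the kind of overlap 404 Media observed is mechanically easy to measure. A generic near-duplicate check, offered only as an illustration (not Google’s or 404 Media’s actual method), compares the five-word “shingles” two articles share:

```python
# A generic near-duplicate check of the kind one could use to flag the
# rip-offs described above. Illustration only; not a description of
# Google's or 404 Media's actual tooling.

def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Every k-word window in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str) -> float:
    """Shared fraction of shingles: 1.0 means word-for-word identical."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

# A lightly reworded copy keeps most five-word windows intact, so its
# score stays far above what two independently written stories produce.
original = "star wars fans have a new theory about the sequel trilogy"
reworded = "war of stars fans have a new theory about the sequel trilogy"
print(f"{jaccard(original, reworded):.2f}")  # 0.50
```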

Someone who replied to a request for comment to that address, and who signed off as “Nabeel,” confirmed Examiner.com is using AI to copy other peoples’ articles. “Sometimes it doesn’t perform well by answering out of context text, therefore, my writer proofread the content,” they wrote. “It's an experiment for now which isn't responding as expected in terms of Google Search. Despite publishing 400+ stories it attracted less than 1000 visits.”

The articles on WatchdogWire and Examiner.com are almost always very similar to those published on Watcher.Guru, another news site which also has a popular Twitter account with 2.1 million followers and which regularly goes viral on the platform. When asked if Watcher.Guru has any connection to WatchdogWire or Examiner.com, a person in control of the Watcher.Guru Twitter account told 404 Media in a direct message that “we are not affiliated with these sites. These sites are using AI to steal our content and featured images.”

In another case, Penny sent a screenshot of a Google News result that showed articles from CBC and another outlet called “PiPa News.” The PiPa News piece appears to be a rewrite of the CBC one, with a very similar headline and body of text. PiPa News did not respond to an emailed request for comment. Kerry Kelly, from CBC’s public affairs department, said in an email that “We are aware of an increase in outlets and individuals using CBC News articles without proper licensing or attribution, and are working to curb this trend through media monitoring, takedown requests for individual sites, and connecting with social media platforms when appropriate.”




A SCREENSHOT OF WATCHER.GURU'S WEBSITE ON THURSDAY.

A SCREENSHOT OF EXAMINER.COM'S WEBSITE ON THURSDAY.

A Google spokesperson said the company focuses on the quality of the content, and not how it was created. Their statement read: “Our focus when ranking content is on the quality of the content, rather than how it was produced. Automatically-generated content produced primarily for ranking purposes is considered spam, and we take action as appropriate under our policies.” Google reiterated that websites are automatically considered for Google News, and that it can take time for the system to identify new websites. The company added that its Google News ranking systems aim to reward original content that demonstrates things such as expertise and trustworthiness.

With that in mind, after 404 Media approached Google for comment, Penny found that the WatchdogWire and Examiner.com results had apparently been removed from search results for the “midjourney” query and another for “stable diffusion.” Google did not respond when asked multiple times to confirm if it took any action.

404 Media remains outside of news topics results for the beats we cover.









 


Anthropic hits back at music publishers in AI copyright lawsuit, accusing them of ‘volitional conduct’

Bryson Masse @Bryson_M

January 18, 2024 3:56 PM

A robot lawyer holds a document in front of a counsel bench in a courtroom in a graphic novel style image.

Credit: VentureBeat made with OpenAI DALL-E 3 via ChatGPT Plus

In a new court filing on Wednesday, Anthropic, a major generative AI startup, laid out its case for why accusations of copyright infringement from a group of music publishers and content owners are invalid.

In fall 2023, music publishers including Concord, Universal, and ABKCO filed a lawsuit against Anthropic accusing it of copyright infringement over its chatbot Claude (now supplanted by Claude 2).

The complaint, filed in federal court in Tennessee (home of Nashville, one of America’s “Music Cities,” and of many labels and musicians), alleges that Anthropic’s business profits from “unlawfully” scraping song lyrics from the internet to train its AI models, which then reproduce the copyrighted lyrics for users in the form of chatbot responses.

Responding to a motion for preliminary injunction — a measure that, if granted by the court, would force Anthropic to stop making its Claude AI model available — Anthropic laid out familiar arguments that have emerged in numerous other copyright disputes involving AI training data.

Gen AI companies like OpenAI and Anthropic rely heavily on scraping massive amounts of publicly available data, including copyrighted works, to train their models, but they maintain that this use constitutes fair use under the law. The question of whether such data scraping infringes copyright is widely expected to reach the Supreme Court.



Song lyrics only a ‘minuscule fraction’ of training data

In its response, Anthropic argues its “use of Plaintiffs’ lyrics to train Claude is a transformative use” that adds “a further purpose or different character” to the original works.

To support this, the filing directly quotes Anthropic research director Jared Kaplan, stating the purpose is to “create a dataset to teach a neural network how human language works.”

Anthropic contends its conduct “has no ‘substantially adverse impact’ on a legitimate market for Plaintiffs’ copyrighted works,” noting that song lyrics make up “a minuscule fraction” of its training data and that licensing at the required scale is unworkable.

Like OpenAI, Anthropic claims that licensing the vast troves of text needed to properly train neural networks like Claude is technically and financially infeasible: training demands trillions of snippets across genres, a licensing scale that may be unachievable for any party.

Perhaps the filing’s most novel argument claims the plaintiffs themselves, not Anthropic, engaged in the “volitional conduct” required for direct infringement liability regarding outputs.

“Volitional conduct” in copyright law refers to the idea that a person accused of infringement must be shown to have control over the infringing outputs. In this case, Anthropic is essentially saying that the publisher plaintiffs themselves caused its AI model Claude to produce the infringing content, and are thus in control of and responsible for the infringement they report, as opposed to Anthropic or its Claude product, which responds automatically to user inputs.

The filing points to evidence the outputs were generated through the plaintiffs’ own “attacks” on Claude designed to elicit lyrics.



Irreparable harm?

On top of contesting copyright liability, Anthropic maintains the plaintiffs cannot prove irreparable harm.

Citing a lack of evidence that song licensing revenues have decreased since Claude launched or that qualitative harms are “certain and immediate,” Anthropic pointed out that the publishers themselves believe monetary damages could make them whole, contradicting their own claims of “irreparable harm” (as, by definition, accepting monetary damages would indicate the harms do have a price that could be quantified and paid).

Anthropic asserts the “extraordinary relief” of an injunction against it and its AI models is unjustified given the plaintiffs’ weak showing of irreparable harm. It also argued that any output of lyrics by Claude was an unintentional “bug” that has now been fixed through new technological guardrails.

Specifically, Anthropic claims it has implemented additional safeguards in Claude to prevent any further display of the plaintiffs’ copyrighted song lyrics. Because the alleged infringing conduct cannot reasonably occur again, the model maker says the plaintiffs’ request for relief preventing Claude from outputting lyrics is moot.
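Anthropic has not disclosed how those safeguards work. As a conceptual illustration only, a guardrail of this general kind can be sketched as a filter that checks candidate output against a list of protected lyrics before returning it:

```python
import re

# A crude illustration of an output guardrail: refuse any response that
# quotes a protected lyric verbatim. Anthropic has not disclosed how its
# actual safeguards work; the lyric below is a made-up stand-in for
# entries in a rights holder's database.

PROTECTED_LYRICS = [
    "dancing in the neon rain tonight",  # hypothetical lyric, for illustration
]

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())

def guarded_reply(model_output: str) -> str:
    """Suppress any candidate response containing a protected lyric."""
    out = normalize(model_output)
    if any(normalize(lyric) in out for lyric in PROTECTED_LYRICS):
        return "Sorry, I can't reproduce those song lyrics."
    return model_output

print(guarded_reply("Sure! The chorus goes: Dancing in the neon rain, tonight!"))
# -> "Sorry, I can't reproduce those song lyrics."
```

A production system would need fuzzier matching than this verbatim check, since models can paraphrase; the sketch only shows where such a filter sits in the pipeline.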

It contends the music publishers’ request is overbroad, seeking to restrain use not just of the 500 representative works in the case, but millions of others that the publishers further claim to control.

The AI startup also challenged the Tennessee venue, claiming the lawsuit was filed in the wrong jurisdiction. Anthropic maintained that it has no relevant business connections to Tennessee, noting that its headquarters and principal operations are based in California.

Further, Anthropic stated that none of the allegedly infringing conduct cited in the suit, such as training its AI technology or providing user responses, took place within Tennessee’s borders.

The filing also pointed out that users of Anthropic’s products agreed that any disputes would be litigated in California courts.



Copyright fight far from over

The copyright battle in the burgeoning generative AI industry continues to intensify.

More artists have joined lawsuits against image generators like Midjourney and OpenAI’s DALL-E, citing reconstructions produced by diffusion models as evidence of infringement.

The New York Times recently filed a copyright infringement lawsuit against OpenAI and Microsoft, alleging that their use of scraped Times’ content to train models for ChatGPT and other AI systems violated its copyrights. The suit calls for billions in damages and demands that any models or data trained on Times content be destroyed.

Amid these debates, a nonprofit group called “Fairly Trained” launched this week advocating for a “licensed model” certification for data used to train AI models — supported by Concord and Universal Music Group, among others.

Platforms have also stepped in, with Anthropic, Google, and OpenAI, as well as content companies like Shutterstock and Adobe, pledging legal defenses for enterprise users of AI-generated content.

Creators remain undaunted, though, fighting bids to dismiss claims such as Sarah Silverman’s against OpenAI. Judges will have to weigh technological progress against statutory rights in nuanced disputes.

Regulators, meanwhile, are listening to worries over the scope of data mining. Lawsuits and congressional hearings may decide whether fair use shelters the appropriation of proprietary work, frustrating some parties while enabling others. Negotiated licensing seems inevitable as generative AI matures.

What comes next remains unclear, but this week’s filing suggests generative AI companies are coalescing around a core set of fair use and harm-based defenses, forcing courts to weigh technological progress against rights owners’ control.

As VentureBeat reported previously, no copyright plaintiffs so far have won a preliminary injunction in these types of AI disputes. Anthropic’s arguments aim to ensure this precedent will persist, at least through this stage in one of many ongoing legal battles. The endgame remains to be seen.
 