Plagiaristic visual outputs in another platform: DALL-E 3
An obvious follow-up question is to what extent the things we have documented are true of other generative AI image-creation systems. Our next set of experiments asked whether what we found with respect to Midjourney was also true of OpenAI’s
DALL-E 3, as made available through Microsoft’s Bing.
As we reported recently on Substack, the answer was again clearly yes. As with Midjourney, DALL-E 3 was capable of creating plagiaristic (near identical) representations of trademarked characters, even when those characters were not mentioned by name.
DALL-E 3 also created a whole universe of potential trademark infringements with this single two-word prompt: animated toys [bottom right].
OpenAI’s DALL-E 3, like Midjourney, produced images closely resembling characters from movies and games. GARY MARCUS AND REID SOUTHEN VIA DALL-E 3
OpenAI’s DALL-E 3, like Midjourney, appears to have drawn on a wide array of copyrighted sources. As in Midjourney’s case, OpenAI seems to be well aware of the fact that their software might infringe on copyright, offering in November to indemnify users (with some restrictions) from copyright infringement lawsuits. Given the scale of what we have uncovered here, the potential costs are considerable.
How hard is it to replicate these phenomena?
As with any stochastic system, we cannot guarantee that our specific prompts will lead other users to identical outputs; moreover, there has been
some speculation that OpenAI has been changing its system in real time to rule out some of the specific behavior we have reported on. Nonetheless, the overall phenomenon was widely replicated within two days of our original report, with
other trademarked entities and
even in other languages.
An X user showed this example of Midjourney producing an image that resembles a can of Coca-Cola when given only an indirect prompt. KATIE CONRADKS/X
The next question is, how hard is it to solve these problems?
Possible solution: removing copyrighted materials
The cleanest solution would be to retrain the image-generating models without using copyrighted materials, or to restrict training to properly licensed data sets.
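To make the licensed-data option concrete, here is a minimal sketch of what restricting a training set by license metadata might look like. The JSONL manifest format, the field names, and the license allowlist are our own illustrative assumptions, not any vendor’s actual pipeline.

```python
# Illustrative sketch only: keep training records whose license metadata is
# on an explicit allowlist. The manifest schema ("url", "license") and the
# allowlist itself are hypothetical, not any vendor's actual pipeline.
import json

ALLOWED_LICENSES = {"cc0", "cc-by", "licensed-by-contract"}  # assumed labels

def filter_manifest(in_path: str, out_path: str) -> None:
    kept = dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            # Drop anything without clear, allowlisted licensing.
            if record.get("license", "").lower() in ALLOWED_LICENSES:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    print(f"kept {kept} records, dropped {dropped}")

if __name__ == "__main__":
    filter_manifest("training_manifest.jsonl", "licensed_only.jsonl")
```

The filter itself is trivial; the hard part is obtaining reliable license and consent metadata for the billions of scraped images in the first place.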
Note that one obvious alternative—removing copyrighted materials only post hoc when there are complaints, analogous to takedown requests on YouTube—is much more costly to implement than many readers might imagine. Specific copyrighted materials cannot in any simple way be removed from existing models; large neural networks are not databases in which an offending record can easily be deleted. As things stand now, the equivalent of takedown notices would require (very expensive) retraining in every instance.
Even though companies clearly could avoid the risks of infringement by retraining their models without any unlicensed materials, many might be tempted to consider other approaches. Developers may well try to avoid licensing fees and significant retraining costs. Moreover, results may well be worse without copyrighted materials.
Generative AI vendors may therefore wish to patch their existing systems so as to restrict certain kinds of queries and certain kinds of outputs. We have already seen some signs of this (below), but believe it to be an uphill battle.
OpenAI may be trying to patch these problems on a case-by-case basis in real time. An X user shared a DALL-E 3 prompt that first produced images of C-3PO, and then later produced a message saying it couldn’t generate the requested image. LARS WILDERÄNG/X
We see two basic approaches to solving the problem of plagiaristic images without retraining the models, neither easy to implement reliably.
Possible solution: filtering out queries that might violate copyright
For filtering out problematic queries, some low-hanging fruit is trivial to implement (for example, don’t generate Batman). But other cases can be subtle, and can even span more than one query, as in this example from X user
NLeseul:
Experience has shown that guardrails in text-generating systems are often simultaneously too lax in some cases and too restrictive in others. Efforts to patch image- (and eventually video-) generation services are likely to encounter similar difficulties. For instance, a friend, Jonathan Kitzen, recently asked Bing for “a toilet in a desolate sun-baked landscape.” Bing refused to comply, instead returning a baffling “unsafe image content detected” flag. Moreover, as Katie Conrad has shown, Bing’s replies about whether the content it creates can legitimately be used are at times deeply misguided.
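To see why the low-hanging-fruit approach is simultaneously too lax and too restrictive, consider a deliberately naive sketch of a keyword-based prompt filter. The blocklist and example prompts below are our own illustrations, not the actual guardrails used by OpenAI, Microsoft, or Midjourney.

```python
# Deliberately naive prompt filter of the "don't generate Batman" variety.
# The blocklist and example prompts are illustrative only.
BLOCKED_TERMS = {"batman", "mario", "c-3po", "coca-cola"}

def is_allowed(prompt: str) -> bool:
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

# Too lax: no trademark is named, yet the model may still return Mario.
print(is_allowed("video game plumber, red cap, blue overalls, mustache"))  # True

# Too restrictive: a legitimate request is blocked on a keyword match.
print(is_allowed("portrait of the novelist Mario Vargas Llosa"))  # False
```

And no single-prompt filter of this kind can catch a request that is built up over several turns, as in the NLeseul example above.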
Already, there are online guides on how to outwit OpenAI’s guardrails for DALL-E 3, with advice like “Include specific details that distinguish the character, such as different hairstyles, facial features, and body textures” and “Employ color schemes that hint at the original but use unique shades, patterns, and arrangements.” The long tail of difficult-to-anticipate cases like the Brad Pitt interchange below (reported on Reddit) may be endless.
A Reddit user shared this example of tricking ChatGPT into producing an image of Brad Pitt. LOVEGOV/REDDIT
Possible solution: filtering out sources
It would be great if art generation software could list the sources it drew from, allowing humans to judge whether an end product is derivative, but current systems are simply too opaque in their “black box” nature to allow this. When we get an output in such systems, we don’t know how it relates to any particular set of inputs.
No current service offers to deconstruct the relations between the outputs and specific training examples, nor are we aware of any compelling demos at this time. Large neural networks, as we know how to build them, break information into many tiny distributed pieces; reconstructing provenance is known to be extremely difficult.
As a last resort, the X user @bartekxx12 has experimented with trying to get ChatGPT and Google Reverse Image Search to identify sources, with mixed (but not zero) success. It remains to be seen whether such approaches can be used reliably, particularly with materials that are more recent and less well-known than those we used in our experiments.
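If such a tool were built, one plausible ingredient, at least for near-duplicate outputs, would be perceptual hashing. The sketch below, which uses the open-source imagehash and Pillow libraries, is our own illustration of that idea; it is not how Google Reverse Image Search, ChatGPT, or @bartekxx12’s experiments actually work, and the file paths and threshold are assumptions.

```python
# Sketch: flag candidate source images that are perceptually close to a
# generated image, using the open-source "imagehash" and Pillow libraries.
# File paths and the distance threshold are illustrative assumptions.
from pathlib import Path

import imagehash
from PIL import Image

THRESHOLD = 8  # maximum Hamming distance between 64-bit perceptual hashes

def find_near_duplicates(generated_path: str, candidates_dir: str):
    target_hash = imagehash.phash(Image.open(generated_path))
    matches = []
    for candidate in sorted(Path(candidates_dir).glob("*.jpg")):
        distance = target_hash - imagehash.phash(Image.open(candidate))
        if distance <= THRESHOLD:
            matches.append((candidate.name, distance))
    return sorted(matches, key=lambda item: item[1])

if __name__ == "__main__":
    for name, dist in find_near_duplicates("generated.png", "candidates/"):
        print(f"{name}: hash distance {dist}")
```

Note, though, that this kind of matching catches only near-duplicates and presupposes access to the candidate source images; it does nothing to reveal which of the billions of training images actually shaped a given output.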
Importantly, although some AI companies and some defenders of the status quo have suggested filtering out infringing outputs as a possible remedy, such filters should in no case be understood as a complete solution. The very existence of potentially infringing outputs is evidence of another problem: the nonconsensual use of copyrighted human work to train machines. In keeping with the intent of international law protecting both intellectual property and human rights, no creator’s work should ever be used for commercial training without consent.
Why does all this matter, if everyone already knows Mario anyway?
Say you ask for an image of a plumber, and get Mario. As a user, can’t you just discard the Mario images yourself? X user
@Nicky_BoneZ addresses this vividly:
… everyone knows what Mario looks like. But nobody would recognize Mike Finklestein’s wildlife photography. So when you say “super super sharp beautiful beautiful photo of an otter leaping out of the water” You probably don’t realize that the output is essentially a real photo that Mike stayed out in the rain for three weeks to take.
As the same user points out, individual artists such as Finklestein are also unlikely to have sufficient legal staff to pursue claims against AI companies, however valid those claims may be.
Another X user similarly discussed an example of a friend who created an image with the prompt “man smoking cig in style of 60s” and used it in a video; the friend didn’t know they’d just used a near duplicate of a Getty Images photo of Paul McCartney.
These companies may well also court attention from the U.S. Federal Trade Commission and other consumer protection agencies across the globe.