Microsoft's Bing Image Creator lets beloved characters fly planes toward tall buildings, illustrating the struggles of generative AI models have with copyright and filtering.
www.404media.co
Bing Is Generating Images of SpongeBob Doing 9/11
·OCT 4, 2023 AT 9:21 AM
Microsoft's Bing Image Creator lets beloved characters fly planes toward tall buildings, illustrating the struggles of generative AI models have with copyright and filtering.
MADE BY BING IMAGE CREATOR
Become a paid subscriber for unlimited, ad-free articles and access to bonus content. This site is funded by subscribers and you will be directly powering our journalism.
JOIN THE NEWSLETTER TO GET THE LATEST UPDATES.
I generated a bunch of AI images of beloved fictional characters doing 9/11, and I’m not the only one.
Microsoft’s Bing Image Creator, produced by one of the most brand-conscious companies in the world, is heavily filtered: images of real humans aren’t allowed, along with
a long list of scenarios and themes like violence, terrorism, and hate speech. It launched in March, and since then, users have been putting it through its paces. That people have found a way to easily produce images of Kirby, Mickey Mouse or Spongebob Squarepants doing 9/11 with Microsoft’s heavily restricted tools shows that even the most well-resourced companies in the world are still struggling to navigate issues of moderation and copyrighted material around generative AI.
I came across @tolstoybb’s Bing creation of Eva pilots from
Neon Genesis Evangelion in the cockpit of a plane giving a thumbs-up and headed for the twin towers, and found more people in the replies doing the same with LEGO minifigs, pirate ships, and soviet naval hero Stanislav Petrov. And it got me thinking: Who else could Bing put in the pilot’s seat on that day?
While I tried to make my Kirby versions, Microsoft blocked prompts with the phrases “World Trade Center,” “twin towers,” and “9/11,” and if you try to make images with those prompts, the site will give you an error that says it’s against the terms of use (and repeated attempts at blocked content will get you banned from using the site forever). But since I am a human with the power of abstract thought and Bing is a computer doing math with words, I can describe exactly what I want to see without using specifics. It works conceptually because anyone looking at an image of a generic set of skyscrapers from the POV of an airplane pilot aiming a plane directly at them can infer the reference and what comes next.
“KIRBY SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR
I made these with the prompt “kirby sitting in the cockpit of a plane, flying toward two tall skyscrapers.” I didn’t specify “New York City” in the prompt, but in three of the first four generations, it’s definitely NYC, and unmistakably the twin towers. In one of them, behind the skyscrapers, the Empire State Building and, improbably, the new 1WTC building are both visible in the distance. Adding “located in nyc” to the prompt isn’t a problem for the filters, however (and with the city included, the images take an even more sinister tone, with Kirbs furrowing his brow).
“KIRBY SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS IN NEW YORK CITY" MADE BY BING IMAGE CREATOR
Technically, there’s no violence, real people, or even terrorism depicted in these images. It’s just Kirby flying a plane with a view. We can fill in the blanks that AI can’t. There’s also just a ton of memes, shytposts, photos, and illustrations of both the twin towers as they existed for three decades, and plenty of images exist of the inside of plane cockpits, and my beloved Kirby (or any other popular animated character). Putting these together is straightforward enough for Bing, which doesn’t understand this context. I can do the same for any character, animated or not, but real names are off limits; Ryan Seacrest won’t work in this prompt, for example, but Walter White, Mickey Mouse, Mario, Spongebob, and I assume an infinite number of other fictional characters, animated or live-action, will work:
“WALTER WHITE SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR
“MICKEY MOUSE SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS" MADE BY BING IMAGE CREATOR
“SPONGEBOB SITTING IN THE COCKPIT OF A PLANE, FLYING TOWARD TWO TALL SKYSCRAPERS, NYC, PHOTOREALISTIC" MADE BY BING IMAGE CREATOR
Bing, and most generative AI models with very strict filters and terms of use — whether they’re making images or text — are playing a game of semantic wack-a-mole. Microsoft can ban individual phrases from prompts forever, until there are no words left, and people will still get around filters. Some companies are making models
without any filters whatsoever, and releasing them into the wild as permanently-accessible files.
It’s the same impossible problem sites like OnlyFans try to solve by banning certain words from profiles and messages; “blood” in one context means violence, and in another, it means a normal bodily function that 50 percent of the population experiences. Since these filter words are hidden from users, in the case of porn sites as well as image generators, humans are left to try to think like robots to figure out what will potentially get them a lifetime ban.
To make it all more difficult, it’s not clear that the companies making generative AI models even fully understand why some things get banned and others don’t; When I asked Microsoft why Bing Image Creator
blocked prompts with “Julius Caesar” it didn’t tell me. A lot of the biggest AI developers are operating within
black boxes. AI is moderating AI.
It’s also the reason people are able to jailbreak generative text models like OpenAI’s GPT to have
horny conversations with large language models that are otherwise heavily filtered. Even with filtered versions of these models, which normally don’t allow erotic roleplay, injecting the right prompt can manipulate the model into returning sexually explicit results.
All of this is a problem of moderation that user generated platforms have faced for as long as they’ve existed. When YouTube bans “sexually gratifying” content, for example, but allows “educational” videos,
people easily find loopholes because the meanings of those words are entirely subjective — and in some cases,
even more hardcoded exploits.
As computers learn how to convincingly parrot language, the way we speak, think, and behave as humans remains malleable and fluid. AI-generated images and text are everywhere, now, including in straightforward
search results of historical events, and it’s worth thinking about how our interactions with these models are changing the internet as we know it — even if that thought process includes making Kirby appear as if he’s about to do mass casualty terrorism.
Anyway, here’s an image of Kirby saving kittens from a burning building as a palate cleanser.
“KIRBY SAVING KITTENS FROM A BURNING BUILDING" MADE BY BING IMAGE CREATOR