From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us

bnew · Nov 25, 2023

From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us

AI image synthesis is getting more capable at executing ideas, and it's not slowing down.

arstechnica.com

From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us

AI image synthesis is getting more capable at executing ideas, and it's not slowing down.

BENJ EDWARDS - 11/16/2023, 7:20 AM

An composite of three DALL-E 3 AI art generations: an oil painting of Hercules fighting a shark, an photo of the queen of the universe, and a marketing photo of Marshmallow Menace cereal.

Enlarge / A composite of three DALL-E 3 AI art generations: an oil painting of Hercules fighting a shark, a photo of the queen of the universe, and a marketing photo of "Marshmallow Menace" cereal.

DALL-E 3 / Benj Edwards

412WITH

In October, OpenAI launched its newest AI image generator—DALL-E 3—into wide release for ChatGPT subscribers. DALL-E can pull off media generation tasks that would have seemed absurd just two years ago—and although it can inspire delight with its unexpectedly detailed creations, it also brings trepidation for some. Science fiction forecast tech like this long ago, but seeing machines upend the creative order feels different when it's actually happening before our eyes.

Using AI to improve itself

We first covered DALL-E 3 upon its announcement from OpenAI in late September, and since then, we've used it quite a bit. For those just tuning in, DALL-E 3 is an AI model (a neural network) that uses a technique called latent diffusion to pull images it "recognizes" out of noise, progressively, based on written prompts provided by a user—or in this case, by ChatGPT. It works using the same underlying technique as other prominent image synthesis models like Stable Diffusion and Midjourney.

You type in a description of what you want to see, and DALL-E 3 creates it.

ChatGPT and DALL-E 3 currently work hand-in-hand, making AI art generation into an interactive and conversational experience. You tell ChatGPT (through the GPT-4 large language model) what you'd like it to generate, and it writes ideal prompts for you and submits them to the DALL-E backend. DALL-E returns the images (usually two at a time), and you see them appear through the ChatGPT interface, whether through the web or via the ChatGPT app.

Previous Slide Next Slide

[*]

An AI-generated image of a fictional "Beet Bros." arcade game, created by DALL-E 3.
DALL-E 3 / Benj Edwards

[*]

An AI-generated image of Abraham Lincoln holding a sign that is intended to say "Ars Technica," created by DALL-E 3.
DALL-E 3 / Benj Edwards

[*]

An AI-generated image of autumn leaves, created by DALL-E 3.
DALL-E 3 / Benj Edwards

[*]

An AI-generated image of a pixelated Christmas scene created by DALL-E 3.
DALL-E 3 / Benj Edwards

[*]

An AI-generated image of a neon shop sign created by DALL-E 3.
DALL-E 3 / Benj Edwards

[*]

An AI-generated image of a plate of pickles, created by DALL-E 3.
DALL-E 3 / Benj Edwards

[*]

An AI-generated illustration of a promotional image for "The Cave BBS," created by DALL-E 3.
DALL-E 3 / Benj Edwards

Many times, ChatGPT will vary the artistic medium of the outputs, so you might see the same subject depicted in a range of styles—such as photo, illustration, render, oil painting, or vector art. You can also change the aspect ratio of the generated image from the square default to "wide" (16:9) or "tall" (9:16).

OpenAI has not revealed the dataset used to train DALL-E 3, but if previous models are any indication, it's likely that OpenAI used hundreds of millions of images found online and licensed from Shutterstock libraries. To learn visual concepts, the AI training process typically associates words from descriptions of images found online (through captions, alt tags, and metadata) with the images themselves. Then it encodes that association in a multidimensional vector form. However, those scraped captions—written by humans—aren't always detailed or accurate, which leads to some faulty associations that reduce an AI model's ability to follow a written prompt.

To get around that problem, OpenAI decided to use AI to improve itself. As detailed in the DALL-E 3 research paper, the team at OpenAI trained this new model to surpass its predecessor by using synthetic (AI-written) image captions generated by GPT-4V, the visual version of GPT-4. With GPT-4V writing the captions, the team generated far more accurate and detailed descriptions for the DALL-E model to learn from during the training process. That made a world of difference in terms of DALL-E's prompt fidelity—accurately rendering what is in the written prompt. (It does hands pretty well, too.)

What the older DALL-E 2 generated when we prompted our old standby, "a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting." This was considered groundbreaking, state-of-the art AI image synthesis in April 2022.

DALL-E 2 / Benj Edwards
What the newer DALL-E 3 generated in October 2023 when we prompted our old standby, "a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting."

DALL-E 3 / Benj Edwards

Previous Slide Next Slide

bnew · Nov 25, 2023

In addition, DALL-E 3 is very good at rendering accurate text compared to DALL-E 2 and some other image synthesis models. This is another effect of the highly detailed captioning created by GPT-4V: "When building our captioner, we paid special attention to ensuring that it was able to include prominent words found in images in the captions it generated," writes the DALL-E 3 team in its paper. "As a result, DALL-E 3 can generate text when prompted."

An example of DALL-E 3 rendering text: A muscular barbarian with weapons stands confidently beside a CRT television set displaying the text 'Ars Technica'. The scene is cinematic with 8K resolution and dramatic studio lighting.

Enlarge / An example of DALL-E 3 rendering text: "A muscular barbarian with weapons stands confidently beside a CRT television set displaying the text 'Ars Technica'. The scene is cinematic with 8K resolution and dramatic studio lighting."

DALL-E 3 / Benj Edwards

DALL-E's text rendering ability isn't perfect—some words have extra or missing characters, and others seem garbled at times. The team speculates that this is due to the token encoder they used. Tokens are fragments of words (and sometimes whole words) used to represent words in machine learning models such as GPT-4 and the prompt interpreter for DALL-E 3. The reliance on tokens sometimes creates a type of blindness for certain words or spellings when chunks of words get lumped into a single token together.

For example, the word "dog" is represented to DALL-E 3 as a single token and not three characters (D-O-G), as you might expect. "When the model encounters text in a prompt, it actually sees tokens that represent whole words and must map those to letters in an image," the team writes. "In future work, we would like to explore conditioning on character-level language models to help improve this behavior."

What does it mean for artists?

By now, you've seen some of what DALL-E 3 can do. It vastly exceeds the state of the art from a year ago, and it dwarfs DALL-E 1 from 2021. It's a technical triumph. But it's still very difficult to write about AI image generators without an asterisk. The technology is highly divisive. For some people, the tech represents an exciting development in creative expression, but for others, it symbolizes insensitivity and corporate greed.

“Taste is the new skill”—AI art as an accessibility tool

As wholesome as handmade art sounds, not everyone can physically create art due to mental or physical limitations. Over the past year, we've heard from several people with disabilities who enjoy using image synthesis to express themselves in ways they could not otherwise. You can find their stories through searches on Reddit or social media.

AI art and jobs

Even if AI art makes human expression effortless, to borrow from the fire analogy, it can potentially destroy just as easily as it can create. "The Industrial Revolution ended a lot of jobs. It also created a lot of new ones," says Silver. "That’s the nature of progress. You adapt."

A potential boon for the public domain

In the United States, purely AI-generated art cannot currently be copyrighted and exists in the public domain. It's not cut and dried, though, because the US Copyright Office has supported the idea of allowing copyright protection for AI-generated artwork that has been appreciably altered by humans or incorporated into a larger work.

bnew · Nov 25, 2023

In the future, everyone may be a creative director—or “CEO”

It's easy to be negative about new things, but perhaps there is another path. Instead of AI making human artists extinct, an artist could use AI-driven capabilities to build new complexities into their works. For example, creative humans may someday command armies of creative AI agents to execute their visions, much like Andy Warhol relied upon underlings at The Factory to execute his famous artworks. Instead of being replaced, artists might become more potent.

From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us

Veteran

From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us​

AI image synthesis is getting more capable at executing ideas, and it's not slowing down.​

FURTHER READING​

Using AI to improve itself​

Veteran

What does it mean for artists?​

FURTHER READING​

Veteran

“Taste is the new skill”—AI art as an accessibility tool​

FURTHER READING​

AI art and jobs​

FURTHER READING​

A potential boon for the public domain​

Veteran

FURTHER READING​

In the future, everyone may be a creative director—or “CEO”​

FURTHER READING​

FURTHER READING​

Superstar

Veteran

:mjlit: A Fat Black Nasty MFer :mjlit:

Superstar

Chasing The Dragon

:mjlit: A Fat Black Nasty MFer :mjlit:

:mjlit: A Fat Black Nasty MFer :mjlit:

Moderator

:mjlit: A Fat Black Nasty MFer :mjlit:

Veteran

I don’t argue with niqqas on the Internet anymore

Similar threads

From toy to tool: DALL-E 3 is a wake-up call for visual artists—and the rest of us

AI image synthesis is getting more capable at executing ideas, and it's not slowing down.

FURTHER READING

Using AI to improve itself

What does it mean for artists?

FURTHER READING

“Taste is the new skill”—AI art as an accessibility tool

FURTHER READING

AI art and jobs

FURTHER READING

A potential boon for the public domain

FURTHER READING

In the future, everyone may be a creative director—or “CEO”

FURTHER READING

FURTHER READING