A Brief History of A.I. Generated Images

In recent years, artificial intelligence (A.I.) has made incredible strides in its ability to create images from nothing. This has profound implications for marketing professionals, who may soon be able to use A.I. to create realistic images of products or services that don't yet exist.

This technology is still in its early stages, but it shows great promise for the future of marketing. With the ability to create realistic images, marketers will be able to test consumer reactions to new products or services before they are made. This could save companies millions of dollars in development costs and help them bring new products to market faster.

The creative potential of A.I. is endless, and it is sure to revolutionise the marketing industry in the years to come. But it is not without its pitfalls, as fake news and disinformation are already a problem. As A.I. gets better at creating images, it will become easier for unscrupulous individuals to create bogus images that could mislead people. It will be necessary for users of A.I. image generation to be aware of this possibility and to use the technology responsibly.

The recent history of A.I. image generation

Almost everyone on the planet sees manipulated images every day. Photoshopped versions of celebrities and models adorn magazine covers and billboards to such an extent that we have come to accept them as part of the marketing industry. Until recently, this work was done by talented graphic designers and digital artists.

Then, a few years ago, Prisma burst onto the scene. Prisma is an app that turns your photos into works of art in the style of Van Gogh, Picasso, or Munch, and it does so in seconds using cloud-based artificial intelligence. More than a filter, but not quite a whole new creation, Prisma used A.I. to transform existing images.

Did Obama really just say that?

Around the same time as Prisma launched, deepfake videos entered the scene. Deepfakes are synthetic media in which a person in an existing image or video is replaced with someone else's likeness. These technologies combine and superimpose existing images and videos onto source images or videos using artificial intelligence algorithms. The result is a fake video that shows something that never actually happened.

On the one hand, deepfake technology can create realistic images and videos for fun and entertainment. On the other hand, it can create fake photos and videos designed to deceive and manipulate people. Suddenly we were able to watch footage of Barack Obama giving a speech that he had never given, and it looked authentic.

How Snapchat and Insta use A.I.

In the years since we first saw the fake Obama speech, investment in A.I. for visual processing has boomed, and consumer applications have popped up everywhere. For example, Snapchat uses AI-powered lenses with small machine learning models to detect a face, differentiate the structure and features within it, and then create a 3D model of the face. This allows for the application of filters to the face in real-time.

It's not just the big tech firms that are getting involved. MyHeritage.com, the popular genealogy website, launched a colourisation tool to recolour old family photos using A.I. and deep learning.

The giant leap from manipulation to creation

One similarity between all the aforementioned A.I. image processing technologies is that they require source material in order to provide an output. Prisma and MyHeritage's colourisation tool both need your original photo to give you the new version. Likewise, deepfakes require source footage to create a new version.

The next stage of development for A.I. image creation would require images to be created from nothing. No starting reference, no visual source material to edit. Just the creation of a video or image based on nothing more than a description. Like an artist in a studio tasked with painting a unicorn running through a field of wheat, A.I. would need to demonstrate a capability that resembles a seemingly unique human trait: imagination.

Enter DALL-E

DALL-E is an artificial intelligence program, revealed by OpenAI on January 5, 2021, that creates images from textual descriptions. It uses a 12-billion-parameter version of the GPT-3 transformer model to interpret natural language inputs and generate corresponding images. For example, when provided with the text "an astronaut riding a wave on a surfboard," it generates a corresponding picture.

Unlike the tools that we had seen before, DALL-E can interpret a written instruction and then generate an image from it. The more specific the description, the better the outputs. The researchers even found that adding the phrase "professional high quality" before "illustration" and "emoji" sometimes improves the quality and consistency of the results.
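To make that prompt tip concrete, here is a minimal Python sketch. It is an illustration only: the helper name `add_quality_phrase` and the example prompts are made up for this post, and the code does nothing more than build the prompt text. It does not call DALL-E or any other image model.

```python
# Hypothetical helper for illustration: it only edits the prompt text
# and never talks to an image model.
QUALITY_PHRASE = "professional high quality"
STYLE_WORDS = ("illustration", "emoji")


def add_quality_phrase(prompt: str) -> str:
    """Insert the quality phrase before the first style word, if there is one."""
    words = prompt.split()
    for i, word in enumerate(words):
        if word.strip(".,").lower() in STYLE_WORDS:
            # Leave the prompt alone if the phrase is already in place.
            if " ".join(words[max(0, i - 3):i]).lower() == QUALITY_PHRASE:
                return prompt
            return " ".join(words[:i] + [QUALITY_PHRASE] + words[i:])
    # No style word found, so there is nothing to change.
    return prompt


print(add_quality_phrase("emoji of a fox wearing a scarf"))
```

Prompts that mention neither "illustration" nor "emoji" come back unchanged, which matches the narrow scope of the tip: the researchers only reported the effect for those two kinds of output, and only "sometimes".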

An italian town made of pasta, tomatoes, basil and parmesan #dalle2 #dalle pic.twitter.com/iEaIaGwIz5

— Dalle2 Pics (@Dalle2Pics) May 23, 2022

DALL-E was trained on a data set of roughly 250 million image and caption pairs; the 12 billion figure refers to the model's parameters, not to its training images. During training, the system learns how the words in a caption relate to what appears in the picture, so that it can later generate a picture from a caption alone. DALL-E builds on OpenAI's GPT-3, a general-purpose natural language generator that can be used for anything from writing news articles to drafting marketing copy.

2022: Let the games begin

In April 2022, OpenAI announced DALL-E 2, an update to their DALL-E software, which can now create photorealistic images from textual descriptions. The software is still in development, but pre-selected beta users already have access. There have been some reports of serious errors made by the software, but researchers are confident that these will be fixed in future updates.

Whereas DALL-E created images from nothing, DALL-E 2 can also edit existing images, with incredible results. OpenAI says it "can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account."

Dall-E 2 from OpenAI is a mind-blowing experience. Here is some fun we had with the team.

"A marmot with a pearl earring" by Johannes Vermeer pic.twitter.com/gOiIU7frsJ

— Jean-Charles Samuelian-Werve ʕ•ᴥ•ʔ (@jcsamuelian) May 20, 2022

Last week, Google announced their alternative to OpenAI's DALL-E called Imagen. Google Brain, the A.I. research team at Google, says Imagen is "a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding." The results are incredible.

Use cases for DALL-E and Imagen

Generating images from nothing opens up a world of possibilities, particularly when the outputs can be anything from a simple sketch to photorealistic scenes of fantasy animals. Here are some use cases for DALL-E that I asked OpenAI's GPT-3 to come up with:

1. Generating images of things that do not exist in the real world, such as imaginary creatures or futuristic products.

2. Creating artworks or visualisations based on textual descriptions, such as instructions for how to compose a machine learning model.

3. Generating realistic images for training and testing data sets in machine learning models.

4. Automatically creating 3D models of objects from 2D images.

I know that I would love to use DALL-E to create images in blog posts and online content to replace all the terrible stock imagery that has flooded the internet.

How to access DALL-E and Imagen?

The short answer is that you can't. Yet.

OpenAI has a waitlist for accessing DALL-E, and they are not processing applications quickly. A tiny handful of users have access to it. That is very much intentional, as DALL-E and Imagen can cause a lot of confusion or moral panic (see Deepfake) if misused.

Google has said that they are not providing access to Imagen at this stage as it has too many biases and prejudices. The research team said it showed "an overall bias towards generating images of people with lighter skin tones and … portraying different professions to align with Western gender stereotypes."

So, don't expect A.I. generated images to appear everywhere just yet.

Can these A.I. image generators fool human beings?

Absolutely, yes. DALL-E did just that when Matt Bell, an A.I. technologist, shared photos from his recent diving holiday alongside underwater photos generated by DALL-E 2. He shared the photos online, saying, "there were 22 real photos, four synthetic ones, and one final image that revealed the experiment. The synthetic ones all came after the real ones." He then asked people who viewed the gallery to complete a short survey.

Did anybody spot the fake images in the collection? Yes, but not many. Here's what he wrote on his blog:

"An incredible 83% of the people (19 of 23) who answered the survey at the end missed the fact that there was something different about the DALL-E images. This worked despite the fact that the telltale DALL-E watermark was in the lower right corner (to comply with OpenAI's access rules) and the fact that the images were a different aspect ratio and more grainy. My friend group is also a relatively sophisticated audience that is well aware of the existence of DALL-E and deepfakes."
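As a quick sanity check on the figure in that quote, the arithmetic can be reproduced in a couple of lines of Python. The counts (19 of 23) come straight from the quote; the function name `share_fooled` is just an illustrative choice.

```python
def share_fooled(missed: int, respondents: int) -> float:
    """Return the fraction of survey respondents who missed the synthetic images."""
    return missed / respondents


# 19 of the 23 respondents did not notice anything different.
share = share_fooled(19, 23)
print(f"{share:.1%}")  # 82.6%, which rounds to the 83% quoted above
```

It is worth keeping the sample size in mind: 23 respondents from one friend group is a fun experiment, not a study.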

What next for A.I. image generation?

OpenAI is leading the way in providing access to the tools. They will be the first to market with an API that developers can plug in to, much like they can with GPT-3.

A.I. research groups are encouraging both Google and OpenAI to release the code and training data for these new models to be properly scrutinised and, hopefully, improved. Ideally, we'll have image generation tools that aren't inherently sexist or racist. Obviously.

Precisely what this means for visual artists and graphic designers, and for how businesses use A.I. in their sales and marketing, is a topic for another day. Much like GPT-3 won't remove the need for copywriters, DALL-E and Imagen won't remove the need for artists and designers. A.I. image generation tools will become exactly that – tools. When you need a quick concept for a product, a texture for a surface in a video game, or an image to replace the stock photo of people shaking hands on your blog post, A.I. image generation software will give you something good enough to get you going.

But it will still take the human touch to turn it from something good to great.

A.I. Image generation tools you can try today

Here are a few A.I. image generation tools you can play with today.

Dream by WOMBO App (iOS and Android)

Create beautiful artwork using the power of A.I.! Enter a prompt, pick an art style and watch WOMBO Dream turn your idea into an AI-powered painting in seconds. Search Apple's App Store or Google Play Store to download it today.

DALL-E Mini – HuggingFace

DALL-E Mini is an A.I. model that generates images from any prompt you give. It isn't close to the quality of DALL-E or Imagen, but it's fun to play with.
