Are you looking for A.I. Text to Image tools that you can use straight away? Click here!

AI image generators that turn text into images are becoming increasingly popular. One of the best known is OpenAI's DALL-E, a neural network that generates images from textual descriptions. The images produced by DALL-E are often surreal, but they can also be realistic. The possibilities are endless with this technology, and it is only going to get better.

In this article, I'll share some of the AI image generation models that you can expect to hear a lot more about in the months and years ahead.

OpenAI’s DALL-E

In January 2021, OpenAI announced that it had developed a text-to-image model called DALL-E. DALL-E, a portmanteau of Pixar's WALL-E and Salvador Dalí, is an artificial intelligence model that has been trained to generate images from text descriptions. It is based on OpenAI's GPT-3 model and was trained on a dataset of text-image pairs. DALL-E is capable of creating anthropomorphised versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.


Avocado Armchair DALL-E

Created from the text prompt: An armchair in the shape of an avocado.


DALL-E was never released publicly; the announcement accompanied a technical research paper rather than a product launch. However, DALL-E was the first tool to show the potential of AI text-to-image generation.

OpenAI’s DALL-E 2

Fast-forward to spring 2022. OpenAI announced an upgrade to DALL-E called, unsurprisingly, DALL-E 2. As upgrades go, this one was huge. In comparison to DALL-E, DALL-E 2 generates "more realistic and accurate images with 4x greater resolution." In little over a year, OpenAI had greatly improved their model.

DALL-E 2 could produce far more photorealistic images than the earlier model and make precise edits to existing images. Unlike DALL-E, which was never publicly accessible, DALL-E 2 launched with a public preview waitlist, and demand was incredibly high. Within the first two months of opening, 2.5 million users had access to DALL-E 2 (myself included).

UPDATE: On July 20th 2022, OpenAI announced pricing for DALL-E 2 beta users.

In this first phase of the beta, users can buy additional DALL·E credits in 115-credit increments (460 images) for $15 on top of their free monthly credits. One credit is applied each time a prompt is entered and a user hits “generate” or “variations.”

Number of images is approximate. DALL·E generates four images for every natural language prompt. DALL·E’s Edit and Variations features generate three images.
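To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python of what a credit pack works out to per image. It uses only the figures quoted above ($15 for 115 credits, four images per prompt, three per edit or variation); the function and variable names are my own, and actual costs will depend on how you use your credits.

```python
# Figures quoted in OpenAI's July 2022 pricing announcement.
PACK_PRICE_USD = 15.00
CREDITS_PER_PACK = 115

# Images returned per credit, as stated in the announcement's footnote.
IMAGES_PER_CREDIT = {"generate": 4, "edit_or_variations": 3}


def cost_breakdown(mode: str) -> dict:
    """Return images per pack and approximate cost per image for one mode."""
    images = CREDITS_PER_PACK * IMAGES_PER_CREDIT[mode]
    return {
        "images_per_pack": images,
        "usd_per_credit": round(PACK_PRICE_USD / CREDITS_PER_PACK, 4),
        "usd_per_image": round(PACK_PRICE_USD / images, 4),
    }


for mode in IMAGES_PER_CREDIT:
    print(mode, cost_breakdown(mode))
```

In other words, a pack used purely for new prompts yields 460 images at roughly 3 cents each, while a pack used purely for edits or variations yields 345 images at a little over 4 cents each.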

UPDATE: On September 28th 2022, OpenAI announced that the waitlist for DALL-E is no longer in place.

Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.

Craiyon (Formerly DALL-E Mini)

While every man and his dog waited for access to DALL-E 2, eager users were able to get a taste of DALL-E by using DALL-E Mini on Hugging Face. DALL-E Mini is an open-source text-to-image generator that anyone can use.

DALL-E Mini gained a lot of attention on social media, so much so that OpenAI asked its creators to change the name because of the confusion it was causing. Having rebranded under the name Craiyon, DALL-E Mini (now Mega) is rapidly improving in terms of outputs. As it is an open-source model, you are free to download it for yourself, or you can try it over at https://www.craiyon.com

UPDATE: Craiyon did a brand collaboration with Smile, a new horror movie, which went viral on Twitter.

Google’s Imagen

Shortly after DALL-E 2 was announced, Google Research announced their own text-to-image generator called Imagen. Like DALL-E 2, Imagen is able to produce outstanding photorealistic images.

Researchers at Google created a new benchmark, DrawBench, to assess the quality of AI-generated images. In the announcement of Imagen, they said: "With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment."

Imagen is currently only available to a select group of researchers. Google has no plans to release the model at this time as they continue to weigh up the societal impact of such powerful technology.

Imagen Video

On 6th October 2022, Google’s Imagen team announced that they had created a text-conditioned video diffusion model that generates HD video at 1280×768 resolution and 24 frames per second.

Google’s Parti

Google Parti is a model that allows for the generation of high-quality, photorealistic images. Additionally, the model supports the synthesis of complex compositions and scenes that reflect real-world knowledge.

Scaling from 350M to 20B parameters

Created from the prompt: A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!


Unlike Google’s Imagen, which is a diffusion model, Parti is an autoregressive model that treats text-to-image generation as a sequence-to-sequence problem. Both have demonstrated impressive capabilities and state-of-the-art performance on research benchmarks, and Google describes them as complementary approaches. Exploring the two side by side opens up exciting possibilities for the creation of realistic, never-before-seen scenes and images.

Midjourney

Midjourney is an AI art generator. It is currently in closed beta, but it is already showing great promise. The results are truly breathtaking.

Midjourney bills itself as an independent research lab that is constantly exploring new mediums of thought. Beta access requires an invite, which limits the number of users that can experience the tool but allows Midjourney to scale the product sensibly.

Subscription plans start at $10 per month.

Stable Diffusion

On August 22nd 2022, the team at Stability.ai announced the public release of Stable Diffusion, following a release to researchers that was announced earlier in the month.

In the earlier announcement, Stability.ai said:

Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs. You can see some of the amazing output that has been created by this model without pre or post-processing on this page.

The model itself builds upon the work of the team at CompVis and Runway in their widely used latent diffusion model combined with insights from the conditional diffusion models by our lead generative AI developer Katherine Crowson, Dall-E 2 by Open AI, Imagen by Google Brain and many others. We are delighted that AI media generation is a cooperative field and hope it can continue this way to bring the gift of creativity to all.

Here are some of the test creations that I produced during the Stable Diffusion beta phase.

UPDATE: On 22nd August 2022, Stability.ai announced the public release of Stable Diffusion.

It didn’t take long for app developers in a range of industries to integrate Stable Diffusion into their products. As an open-source model, Stable Diffusion is free to use, and it is also available via API on DreamStudio.

DALL-E vs Midjourney vs Stable Diffusion

You might be asking which A.I. image generation model is best: DALL-E, Midjourney or Stable Diffusion?

To see a side-by-side comparison of the models, check out this Twitter thread from Fabian Stelzer, where he runs the same prompt through each of the three models and shares the best results from each.

How can I start generating A.I. Images?

If you want to start creating A.I. images today without the hassle of coding your own product or joining a Discord server, these are the best places to get started:

  1. OpenAI’s DALL-E

  2. DreamStudio

  3. Craiyon

  4. RIKU.AI

  5. Canva

Conclusion

We are very much at the beginning of the A.I. image generation story. These tools are not commercialised in any meaningful sense but it won't be long before they start opening up API access and developers can integrate them into their tools, much like we have seen with AI copywriting tools developed using GPT-3.

There is a wider conversation to be had about the ethics, biases and potential societal impact of AI-generated images, but that can wait for another day. For now, I'm excited to see where this leads us and what impact it has on the digital marketing industry.
