Text-to-image artificial intelligence (AI) generators are becoming increasingly popular. One of the best known is OpenAI's DALL-E, a neural network that generates images from text descriptions and is constantly improving. The images DALL-E produces are often surreal, but they can also be realistic. The possibilities of this technology are endless, and it is only going to get better.
In this article, I'll share some of the AI image generation models that you can expect to hear a lot more about in the months and years ahead.
OpenAI’s DALL-E
In January 2021, OpenAI announced that it had developed a text-to-image model called DALL-E. DALL-E, a portmanteau of Pixar's WALL-E and Salvador Dalí, is an artificial intelligence model trained to generate images from text descriptions. It is based on OpenAI's GPT-3 model and was trained on a dataset of text-image pairs. DALL·E can create anthropomorphised versions of animals and objects, combine unrelated concepts in plausible ways, render text and apply transformations to existing images.
DALL-E was never released publicly; the announcement accompanied a technical research paper rather than a new product. Even so, DALL-E was one of the first models to show the potential of AI text-to-image generation.
OpenAI’s DALL-E 2
Fast-forward to spring 2022. OpenAI announced an upgrade to DALL-E called, unsurprisingly, DALL-E 2. As upgrades go, this one was huge. In comparison to DALL-E, DALL-E 2 generates "more realistic and accurate images with 4x greater resolution." In just one year, OpenAI had greatly improved their model.
DALL-E 2 could produce far more photorealistic images than the earlier model, and it could also make precise edits to existing images. Unlike DALL-E, which was never publicly accessible, DALL-E 2 launched with a public preview waitlist, and demand was incredibly high. Within the first two months of opening, 2.5 million users had access to DALL-E 2 (myself included).
UPDATE: On July 20th 2022, OpenAI announced pricing for DALL·E 2 beta users:
In this first phase of the beta, users can buy additional DALL·E credits in 115-credit increments (460 images) for $15 on top of their free monthly credits. One credit is applied each time a prompt is entered and a user hits “generate” or “variations.”
Number of images is approximate. DALL·E generates four images for every natural language prompt. DALL·E’s Edit and Variations features generate three images.
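To make the quoted pricing concrete, here is a minimal Python sketch that estimates how many images a credit bundle buys and what each image costs. It assumes only the figures quoted above (115 credits for $15, four images per "generate" click, three per Edit or Variations request, one credit each); the function names are hypothetical helpers for illustration, not part of any OpenAI API.

```python
# Minimal sketch of the DALL·E beta credit arithmetic quoted above.
# Assumptions (from OpenAI's July 2022 pricing announcement as quoted):
#   - a bundle is 115 credits for $15
#   - one credit per "generate" click, which returns 4 images
#   - one credit per Edit or Variations request, which returns 3 images
# Function names are hypothetical, for illustration only.

BUNDLE_CREDITS = 115
BUNDLE_PRICE_USD = 15.00
IMAGES_PER_PROMPT = 4
IMAGES_PER_EDIT_OR_VARIATION = 3


def images_for_credits(prompt_credits: int, edit_or_variation_credits: int = 0) -> int:
    """Total images returned for the given number of credits of each kind."""
    return (IMAGES_PER_PROMPT * prompt_credits
            + IMAGES_PER_EDIT_OR_VARIATION * edit_or_variation_credits)


def cost_per_image(prompt_credits: int, edit_or_variation_credits: int = 0) -> float:
    """Approximate USD cost per image when paying the bundle rate per credit."""
    total_credits = prompt_credits + edit_or_variation_credits
    if total_credits <= 0:
        raise ValueError("need at least one credit")
    price = BUNDLE_PRICE_USD * total_credits / BUNDLE_CREDITS
    return price / images_for_credits(prompt_credits, edit_or_variation_credits)


if __name__ == "__main__":
    print(images_for_credits(115))        # 460: every credit spent on new prompts
    print(images_for_credits(0, 115))     # 345: every credit spent on Edit/Variations
    print(round(cost_per_image(115), 4))  # 0.0326 USD per image
```

This is why OpenAI calls the 460 figure approximate: spent entirely on new prompts, a bundle yields 460 images at roughly 3.3 cents each, but spent entirely on Edit or Variations it yields 345 images at roughly 4.3 cents each.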
UPDATE: On September 28th 2022, OpenAI announced that the waitlist for DALL·E is no longer in place:
Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.
A Football Ram No. 1
Text Prompt: A professional, high-quality comic book style drawing of a ram wearing a white soccer t-shirt and black shorts next to a soccer ball.
A Football Ram No. 2
Text Prompt: A professional, high-quality comic book style drawing of a ram wearing a white soccer t-shirt and black shorts next to a soccer ball.
A Football Ram No. 3
Text Prompt: A professional, high-quality comic book style drawing of a ram wearing a white soccer t-shirt and black shorts next to a soccer ball.
Football Crowd No. 1
Text Prompt: A wide-angle scene of a crowd of soccer fans wildly celebrating a goal inside a stadium. The crowd are wearing black and white. Drawn in a comic book style.
Football Crowd No. 2
Text Prompt: A wide-angle scene of a crowd of soccer fans wildly celebrating a goal inside a stadium. The crowd are wearing black and white. Drawn in a comic book style.
Football Crowd No. 3
Text Prompt: A wide-angle scene of a crowd of soccer fans wildly celebrating a goal inside a stadium. The crowd are wearing black and white. Painted in watercolour with ink pen detail.
Craiyon (Formerly DALL-E Mini)
While every man and his dog waited for access to DALL-E 2, eager users could get a taste of what DALL-E could do by using DALL-E Mini on Hugging Face. DALL-E Mini is an open-source text-to-image generator that was made freely available to the public.
He was a sk8er
Via @imroisan #craiyon #craiyoncreations #AIart #aiartcommunity pic.twitter.com/H6jwOBVJkl
— craiyon (@craiyonAI) August 11, 2022
DALL-E Mini gained so much attention on social media that OpenAI asked its creators to change the name because of the confusion it was causing. Now rebranded as Craiyon, DALL-E Mini (now Mega) is improving rapidly in terms of output quality. Because the model is open source, users are free to download it and run it themselves, or they can use it online at https://www.craiyon.com
UPDATE: Craiyon ran a brand collaboration with Smile, a new horror movie, and the campaign went viral on Twitter.
It's time to spread the smile. Generate yours at https://t.co/7ZzuonOBT1. #SmileMovie pic.twitter.com/rYwiDQ8ztj
— Smile Movie (@SmileMovie) September 26, 2022
Google’s Imagen
Shortly after DALL-E 2 was announced, Google Research announced their own text-to-image generator called Imagen. Like DALL-E 2, Imagen is able to produce outstanding photorealistic images.
Google researchers also created DrawBench, a new benchmark for assessing the quality of AI-generated images. In the Imagen announcement, they said: "With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment."
Imagen is currently only available to a select group of researchers. Google has no plans to release the model at this time as they continue to weigh up the societal impact of such powerful technology.
Imagen Video
On 6th October 2022, Google’s Imagen team announced that they had created a text-conditioned video diffusion model that generates 1280x768 24fps HD videos.
Very happy to release #ImagenVideo today! Amazing work with an amazing team! https://t.co/Cdv8hKCGGk
High fidelity text to video with diffusion models: "Flying through an intense battle between pirate ships in a stormy ocean." https://t.co/0uxNTIoiFY pic.twitter.com/M3lAQPJG1K
— Tim Salimans (@TimSalimans) October 5, 2022
Google’s Parti
Google Parti is a model that allows for the generation of high-quality, photorealistic images. Additionally, the model supports the synthesis of complex compositions and scenes that reflect real-world knowledge.
Scaling from 350M to 20B parameters
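Parti takes a different route from diffusion models such as Google's Imagen, which have demonstrated impressive capabilities and state-of-the-art performance on research benchmarks. Parti is an autoregressive model: it treats text-to-image generation as a sequence-to-sequence problem, producing an image as a sequence of image tokens. Google describes the two approaches as complementary, and combining these two powerful families of models opens up exciting possibilities for creating realistic, never-before-seen scenes and images.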
Midjourney
Midjourney is an AI art generator. It is currently in closed beta, but it is already showing great promise, and the results are truly breathtaking.
Cotton candy factory inferno #midjourney pic.twitter.com/Mn36sY9hTh
— (@killedgier) June 7, 2022
Midjourney bills itself as an independent research lab that is constantly exploring new mediums of thought. Beta access requires an invite, which limits the number of users who can try the tool but allows Midjourney to scale the product sensibly.
Misery World #midjourney #disneyworld pic.twitter.com/dcS0gRj1hS
— douggy (@douggypledger) June 18, 2022
Subscription plans start at $10 per month.
Stable Diffusion
In August 2022, the team at Stability.ai announced the release of Stable Diffusion to researchers, following the announcement of the private beta earlier in the month.
In the earlier announcement, Stability.ai said:
Stable Diffusion is a text-to-image model that will empower billions of people to create stunning art within seconds. It is a breakthrough in speed and quality meaning that it can run on consumer GPUs. You can see some of the amazing output that has been created by this model without pre or post-processing on this page.
The model itself builds upon the work of the team at CompVis and Runway in their widely used latent diffusion model combined with insights from the conditional diffusion models by our lead generative AI developer Katherine Crowson, Dall-E 2 by Open AI, Imagen by Google Brain and many others. We are delighted that AI media generation is a cooperative field and hope it can continue this way to bring the gift of creativity to all.
Here are some of the test creations that I produced during the Stable Diffusion beta phase.
UPDATE: On 22nd August 2022, Stability.ai announced the public release of Stable Diffusion.
It didn't take long for app developers in a range of industries to integrate Stable Diffusion into their products. Because it is open source, Stable Diffusion is free to use, and it is also available via an API through DreamStudio.
DALL-E vs Midjourney vs Stable Diffusion
You might be wondering which A.I. image generation model is best: DALL-E, Midjourney or Stable Diffusion.
To see a side-by-side comparison of the models, check out this Twitter thread from Fabian Stelzer, in which he runs the same prompts through each of the three models and shares the best results from each.
DALL-E 2 vs Midjourney vs StableDiffusion mega thread: photography, illustration, painters, abstract
these image synths are like instruments - it's amazing we'll get so many of them, each with a unique "sound"
rules: same prompt, 1:1 aspect ratio, no living artists pic.twitter.com/47syy7uPJJ
— fabians.eth (@fabianstelzer) August 20, 2022
How can I start generating A.I. Images?
If you want to start creating A.I. images today without the hassle of coding your own product or joining a Discord server, these are the best places to get started:
Conclusion
We are very much at the beginning of the A.I. image generation story. Commercialisation of these tools has only just begun, and it won't be long before more of them open up API access so that developers can integrate them into their own tools, much as we have seen with AI copywriting tools built on GPT-3.
There is a wider conversation to be had about the ethics, biases and potential societal impact of AI-generated images, but that can wait for another day. For now, I'm excited to see where this leads us and what impact it has on the digital marketing industry.