International Marketing Group
  • 24th April, 2021

On January 5, 2021, OpenAI announced DALL-E, an artificial intelligence program that generates images from textual descriptions. It interprets natural language inputs and generates images using a 12-billion-parameter variant of the GPT-3 Transformer model. It can produce representations of both realistic and non-realistic objects.

What is it?

DALL-E was developed and announced alongside CLIP, a separate model used to interpret and rank its output. CLIP curates the images that DALL-E produces, presenting the highest-quality ones for each prompt. OpenAI has not made the source code for the DALL-E model publicly available.

What is its architecture?

The DALL-E model is a multimodal implementation of GPT-3 with 12 billion parameters that "swaps text for pixels" and was trained on text-image pairs from the Internet. It generates output from a description and cue using zero-shot learning, meaning it needs no additional training for a new prompt.
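
One way to picture "swapping text for pixels" is as a single token sequence: the caption tokens come first, and the model extends the sequence with image tokens one at a time. The sketch below is a toy illustration only. The vocabulary size, the grid size, the `toy_next_token` stand-in for the Transformer, and every function name are assumptions made for this example, not OpenAI's implementation.

```python
import random

IMAGE_VOCAB = 50      # hypothetical number of distinct image tokens
IMAGE_TOKENS = 16     # hypothetical 4x4 grid of image tokens


def toy_next_token(sequence, rng):
    """Stand-in for the Transformer: returns a random image token.

    A real model would compute a probability distribution over image
    tokens conditioned on everything in `sequence` so far.
    """
    return rng.randrange(IMAGE_VOCAB)


def generate_image_tokens(text_tokens, seed=0):
    """Extend a tokenised caption with image tokens, one at a time."""
    rng = random.Random(seed)
    sequence = list(text_tokens)          # the prompt: caption tokens only
    for _ in range(IMAGE_TOKENS):
        sequence.append(toy_next_token(sequence, rng))
    return sequence[len(text_tokens):]    # keep only the image part


caption = [3, 41, 7]                      # pretend-tokenised caption
image = generate_image_tokens(caption)

# Reshape the flat token list into rows of the (hypothetical) 4x4 grid.
grid = [image[i:i + 4] for i in range(0, IMAGE_TOKENS, 4)]
```

In a real system a separate decoder would turn the grid of image tokens back into pixels; that step is omitted here.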

How does it work?

In response to a prompt, DALL-E produces a large number of images. CLIP, an OpenAI model developed alongside DALL-E, is used to interpret and rank this output. CLIP is an image recognition system, but unlike most classifier models, it was trained on images and captions scraped from the Internet rather than on curated databases of labelled images. Instead of learning from a single label per image, CLIP associates images with whole captions. It was trained to predict which caption best fits a given image, which allows it to recognise objects in images outside its training set.
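
The ranking step can be sketched as a similarity search: embed the caption and each candidate image into the same vector space, then keep the images whose vectors lie closest to the caption's. The example below is a minimal sketch under that assumption. The three-dimensional vectors are made-up stand-ins for real embeddings, and `rank_images` is a hypothetical helper, not part of any OpenAI library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(caption_embedding, image_embeddings, top_k=2):
    """Return the names of the top_k images most similar to the caption."""
    scored = [
        (cosine_similarity(caption_embedding, emb), name)
        for name, emb in image_embeddings.items()
    ]
    scored.sort(reverse=True)             # highest similarity first
    return [name for _, name in scored[:top_k]]


# Made-up embeddings; a real system would get these from trained encoders.
caption_embedding = [0.9, 0.1, 0.0]
candidates = {
    "image_a": [0.8, 0.2, 0.1],
    "image_b": [0.0, 1.0, 0.0],
    "image_c": [0.0, 0.0, 1.0],
}

best = rank_images(caption_embedding, candidates)
print(best)  # ['image_a', 'image_b']
```

The same scoring works in the other direction: holding one image fixed and comparing it against several candidate captions gives the "which caption fits best" prediction described above.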

DALL-E is a neural network that can convert text into a fitting image for a wide variety of natural language concepts. GPT-3, a major advance in NLP, had already been deployed and shown to be effective, and DALL-E was built using a similar method.

Pros & Cons

DALL-E can draw polygonal shapes that may or may not exist in real life. It does not consistently create the correct item, and certain shapes have a lower success rate than others. Counting is currently a major weakness: DALL-E can draw several items, but it cannot reliably count past two. Although DALL-E gives you some control over the attributes and number of items, the success rate depends on how you word the caption. When a caption uses nouns with more than one meaning, such as "glasses," "chips," or "cups," it will sometimes draw both interpretations, depending on the plural form used.

