Google’s Imagen AI Model Makes Advances in Text-to-Image

Google has released a research paper on a new text-to-image generator called Imagen, which combines the power of large transformer language models for text with the capabilities of diffusion models in high-fidelity image generation. “Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis,” the company said. Simultaneously, Google is introducing DrawBench, a benchmark for text-to-image models it says was used to compare Imagen with other recent technologies including VQGAN+CLIP, latent diffusion models, and OpenAI’s DALL-E 2. Continue reading Google’s Imagen AI Model Makes Advances in Text-to-Image

OpenAI Unveils AI-Powered DALL-E Text-to-Image Generator

OpenAI unveiled DALL-E, which generates images from text using two multimodel AI systems that leverage computer vision and NLP. The name is a reference to surrealist artist Salvador Dali and Pixar’s animated robot WALL-E. DALL-E relies on a 12-billion parameter version of GPT-3. OpenAI demonstrated that DALL-E can manipulate and rearrange objects in generated imagery and also create images from scratch based on text prompts. It has stated that it plans to “analyze how models like DALL·E relate to societal issues.” Continue reading OpenAI Unveils AI-Powered DALL-E Text-to-Image Generator