February 9, 2024
Apple has released MGIE, an open-source AI model that edits, modifies, and optimizes images using natural-language instructions. MGIE, short for MLLM-Guided Image Editing, was developed in conjunction with the University of California, Santa Barbara, and is Apple's first AI model. The multimodal system understands both text and image input, and can crop, resize, flip, and add filters based on text instructions. Apple says its instruction set is easier than those of other AI editing programs, and simpler and faster to learn than a traditional program such as Apple's own Final Cut Pro.
MGIE was first revealed in an academic paper published in September, but the researchers released a revised version ahead of a presentation at the International Conference on Learning Representations (ICLR), a prestigious gathering focused on AI and deep learning that runs May 7-11 in Austria.
SiliconANGLE explains that while there are many language-based AI editing programs available, most require detailed descriptions of the changes they are asked to make in order to execute an edit reliably.
“In practice, however, users often enter only brief instructions, which limits the usefulness of AI-powered image editing tools,” SiliconANGLE explains, noting that the “MGIE system aims to address that limitation” and “can reliably edit an image even if the user describes the changes to be made in only a few words.”
MGIE accomplishes this by combining “standard image editing AI with a large language model,” per SiliconANGLE.
MGIE uses MLLMs in image editing in two ways, explains VentureBeat. First, it uses them to expand on user instructions, taking them from general to specific: “For example, given the input ‘make the sky more blue,’ MGIE can produce the instruction ‘increase the saturation of the sky region by 20 percent.’” Second, the expanded, visual-aware instruction guides the editing model itself. The result is that MGIE improves results on automatic metrics and human evaluation while keeping inference fast and accurate.
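The two-stage flow described above can be sketched in a few lines. This is a toy illustration only, with invented function names: the lookup table stands in for the multimodal LLM, which in the real system generates the expanded instruction conditioned on the image itself, and the "editor" here simply reports which instruction it would apply.

```python
def expand_instruction(brief: str) -> str:
    """Stand-in for the MLLM step: turn a terse edit request into an
    explicit, visual-aware instruction. A lookup table here; the real
    model generates this from the instruction plus the input image."""
    expansions = {
        "make the sky more blue":
            "increase the saturation of the sky region by 20 percent",
    }
    # Fall back to the original wording if no expansion is available.
    return expansions.get(brief, brief)


def edit_image(image, brief: str) -> str:
    """Two-stage flow: expand the instruction, then hand the explicit
    version to the editing model. A real pipeline would pass `explicit`
    (and latent guidance) to a diffusion-based editor; here we just
    return the instruction that would be applied."""
    explicit = expand_instruction(brief)
    return explicit


print(edit_image(None, "make the sky more blue"))
# prints the expanded, explicit instruction rather than the terse input
```

The point of the design is that the downstream editor never sees the ambiguous three-word request; it always receives an explicit description of what to change and by how much.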
“Instead of brief but ambiguous guidance, MGIE derives explicit visual-aware intention and leads to reasonable image editing,” the paper states, adding that the developers believe “the MLLM-guided framework can contribute to future vision-and-language research.”
Apple made the open-source MGIE “available through GitHub for download, but it also released a web demo on Hugging Face Spaces,” according to The Verge, which notes that “the company did not say what its plans for the model are beyond research.”