Microsoft Unveils AI Model That Comprehends Image Content

Microsoft researchers have unveiled Kosmos-1, a new AI model the company says analyzes images for content, performs visual text recognition, solves visual puzzles and passes visual IQ tests. It also understands natural language instructions. The new model is what’s known as multimodal AI, which means it handles different types of input, from text to audio and video. Mixing media is seen as a key step in building artificial general intelligence (AGI) that can perform tasks in a manner approximating human performance. Examples from the Kosmos-1 research paper show the model analyzing images and answering questions about them.

It can also read text from an image, caption pictures and score 22 to 26 percent accuracy on a visual IQ test. “Being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world,” the researchers write in an academic paper titled “Language Is Not All You Need: Aligning Perception with Language Models.”

“Microsoft says its Kosmos-1 MLLM [multimodal large language model] can perceive general modalities, follow instructions (zero-shot learning), and learn in context (few-shot learning),” ZDNet reports. The paper says the goal is “to align perception with LLMs [large language models], so that the models are able to see and talk.”

“Some AI experts point to multimodal AI as a potential path toward general artificial intelligence, a hypothetical technology that will ostensibly be able to replace humans at any intellectual task (and any intellectual job),” Ars Technica reports, explaining that while “AGI is the stated goal of OpenAI,” Kosmos-1 appears to be a Microsoft project alone, developed without OpenAI’s participation.

Kosmos-1’s performance on a test called Raven’s Progressive Matrices is of particular interest. The test assesses visual IQ by presenting a sequence of shapes and asking the test taker to complete the sequence. To test Kosmos-1, researchers fed it completed tests, one at a time, and asked if the answers were right. Kosmos-1 answered correctly 22 percent of the time, sometimes scoring as high as 26 percent.

“By no means a slam dunk,” Ars Technica notes, explaining that while methodological errors could have affected the results, “Kosmos-1 beat random chance (17 percent) on the Raven IQ test.”
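
To put the 17 percent figure in context, here is a minimal Python sketch of how such a test could be scored. It assumes six candidate answers per puzzle, which is consistent with a random-chance rate of about 17 percent (1 in 6) but is not stated in the reporting above. The function names and the scoring scheme are hypothetical illustrations, not Microsoft's actual evaluation code.

```python
import random

# Assumption: six candidate answers per puzzle, so chance is 1/6 (about 17 percent).
NUM_CANDIDATES = 6


def pick_answer(scores):
    """Return the index of the completed puzzle the model rates as most likely correct."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions, answers):
    """Fraction of puzzles where the picked candidate matches the true answer."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def random_baseline(num_candidates=NUM_CANDIDATES):
    """Expected accuracy of guessing uniformly at random."""
    return 1 / num_candidates


def simulate_random_guesser(num_puzzles, seed=0):
    """Score a stand-in 'model' that assigns random scores to every candidate."""
    rng = random.Random(seed)
    answers = [rng.randrange(NUM_CANDIDATES) for _ in range(num_puzzles)]
    predictions = [
        pick_answer([rng.random() for _ in range(NUM_CANDIDATES)])
        for _ in range(num_puzzles)
    ]
    return accuracy(predictions, answers)


if __name__ == "__main__":
    print(f"Random-chance baseline: {random_baseline():.0%}")
    print(f"Simulated random guesser: {simulate_random_guesser(10_000):.1%}")
```

Under these assumptions, a reported score of 22 to 26 percent sits only a few points above what guessing alone would produce, which is why the result is described as no slam dunk.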
