Alibaba’s Qwen VLo Generative AI Shows Images in Progress

Chinese e-commerce giant Alibaba has released a new multimodal model called Qwen VLo that can understand and generate images. Available for free in preview through Qwen Chat, it generates pictures from image or text prompts and accepts text in multiple languages, including Chinese and English. It can also edit images, change backgrounds and switch styles, handling multiple edits in sequence. An upgrade over January’s Qwen 2.5-VL release, Qwen VLo uses progressive generation, which lets users watch an image take shape as it is created, and Alibaba says it’s particularly good at making inline adjustments to fine-tune images.

Alibaba is mounting what Bloomberg calls an “aggressive push” into artificial intelligence with offerings centered on its foundation model Qwen. “In February, CEO Eddie Wu went so far as to say the company’s ‘primary objective’ is now artificial general intelligence, a goal in the industry to build AI systems with human-level intellectual capabilities,” Bloomberg notes.

En route to that goal, Alibaba seems willing to keep customers happy with a variety of specialty models, having released more than 100 open-source variants since 2023, including Qwen Audio, Qwen2.5‑Coder and Qwen2.5‑Math, in addition to Qwen Chat.

“With the new Qwen multimodal model, it’s aiming to compete with a flurry of new visual interfaces in the market, including from OpenAI,” Bloomberg writes. “It also faces aggressive domestic competition from the likes of DeepSeek.”

Qwen VLo “not only ‘understands’ the world but also generates high-quality recreations based on that understanding, truly bridging the gap between perception and creation,” according to a Qwen blog post on GitHub. As a result, Alibaba says, it offers superior semantic persistence and consistency, an area in which AI has often struggled, misinterpreting objects or failing to retain details through iterations.

Users can provide open-ended, natural language instructions like “change this painting to a Van Gogh style,” “make this photo look like it’s from the 19th century,” or “add a sunny sky to this image,” and can also specify alterations to individual details.

Additionally, Qwen VLo introduces “a progressive top-to-bottom, left-to-right generation process” that users can observe in real time, making adjustments along the way, the blog post explains.

Gadgets 360 said it tested Qwen VLo and “found its image generation capability to be on par with Google’s Imagen 2,” noting that “the instruction following and image output quality is slightly lower than Imagen-3 and OpenAI’s GPT-4o-powered image generation feature” but is faster than each of those, “and it has a higher rate limit,” the maximum number of requests a user can make within a specific time period.
