Alibaba’s Qwen3-Omni AI Ingests Text, Images, Audio, Video

Alibaba Cloud’s newest AI model, Qwen3-Omni-30B-A3B, has debuted with a splash. The Chinese company is touting it as “the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model.” While Qwen3-Omni can accept prompts of text, image, audio and video, it only outputs text and audio. Alibaba Cloud has released the three versions of Qwen3-Omni so users can select based on their needs, choosing between general multimodal capabilities, deep reasoning or specialized audio understanding. Alibaba has also developed an AI chip called T-Head that performs comparably to Nvidia’s H20. Continue reading Alibaba’s Qwen3-Omni AI Ingests Text, Images, Audio, Video

Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

Alibaba Cloud has released Qwen2.5-Omni-7B, a new AI model the company claims is efficient enough to run on edge devices like mobile phones and laptops. Boasting a relatively light 7-billion parameter footprint, Qwen2.5-Omni-7B understands text, images, audio and video and generates real-time responses in text and natural speech. Alibaba says its combination of compact size and multimodal capabilities is “unique,” offering “the perfect foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications.” One example would be using a phone’s camera to help a vision impaired-person navigate their environment. Continue reading Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile