Multimodal AI Archives

Samsung Unveils Galaxy S25 FE Smartphone and S11 Tablets

By Paula Parisi
September 9, 2025

Following the reveal of its new foldable phones, Samsung hosted a second Galaxy Unpacked summer event to unveil its Galaxy S25 FE smartphone and Galaxy Tab S11 and Tab S11 Ultra tablets. Equipped with Samsung’s latest One UI 8 user interface, the Galaxy S25 FE features multimodal Galaxy AI capabilities including Generative Edit and Instant Slow-mo and a 12MP front-facing camera powered by the ProVisual Engine for enhanced selfies. The Galaxy Tab S11 Ultra is billed as “the thinnest Galaxy Tab to date.” Both new tablets also boast multimodal AI capabilities with Galaxy AI. Continue reading Samsung Unveils Galaxy S25 FE Smartphone and S11 Tablets

Grok 4 Offered Free in xAI Move on ChatGPT-5 Market Share

By Paula Parisi
August 13, 2025

Elon Musk’s xAI has made Grok 4 available on its free tiers as it seeks to take advantage of initial user dissatisfaction with OpenAI’s new GPT-5. The company has positioned Grok as freewheeling and uncensored, a contrast to GPT-5, which has been criticized on Reddit and other social platforms as a “corporate beige zombie” with too many guardrails. After its February debut, Grok 3 was reined-in with checks including removal of its native image generator in March. Grok 4 was released in July with integrated image and video features as well as a “Spicy” mode for creating risqué content. Continue reading Grok 4 Offered Free in xAI Move on ChatGPT-5 Market Share

Grok Imagine from xAI Offers Video Generation, ‘Spicy’ Mode

By Paula Parisi
August 8, 2025

Grok Imagine is xAI’s new video and image generator, which is currently available via the X social platform, the Grok mobile app, and Grok web interface. Imagine replaces AI image generator Aurora, which was retired in May following a string of offensive posts that led to media criticism and user concerns. Despite the backlash, Elon Musk’s xAI seems determined to have Imagine push conventional limits, with a “spicy” mode that outputs imagery including adult content. Its text-to-image capabilities work with text or voice prompts, while the video tool relies on image prompts to make short clips using images from a user’s gallery or generated by Grok. Continue reading Grok Imagine from xAI Offers Video Generation, ‘Spicy’ Mode

Baidu Delivers AI Updates to Search, Open-Sources Ernie 4.5

By Paula Parisi
July 7, 2025

China’s Baidu has added AI and voice features to its Ernie search engine and as of June 30 officially made its generative Ernie large language model open source in a direct challenge to OpenAI and Anthropic as well as local Chinese rival DeepSeek. “This isn’t just a China story. Every time a major lab open-sources a powerful model, it raises the bar for the entire industry,” University of Southern California Associate Professor of Computer Science Sean Ren told CNBC. Baidu is also giving the Ernie mobile app more chatbot-like functionality, enabling it to help with drawing, writing and travel tasks. Continue reading Baidu Delivers AI Updates to Search, Open-Sources Ernie 4.5

Xiaomi AI Glasses Tout Electrochromic Lenses and a Camera

By Paula Parisi
June 30, 2025

Xiaomi AI Glasses are the Beijing-based company’s first smart glasses to include a camera. Equipped with a 12MP Sony IMX681 point-of-view camera that captures 2K video at 30 fps, the new glasses also support real-time live-streaming and first-person video calls. Notable features include a 263mAh battery that Xiaomi says delivers 8.6 hours of mixed use and an electrochromic shaded lenses option that allows the wearer to control the tint. The clear-lens model starts at $280, while those with grayscale electrochromic shaded lenses start at $380. Colored electrochromic lens models begin at $420. Continue reading Xiaomi AI Glasses Tout Electrochromic Lenses and a Camera

Google Gemini Robotics On-Device Controls Robots Locally

By Paula Parisi
June 26, 2025

Google DeepMind has released a new vision-language-action (VLA) model, Gemini Robotics On-Device, that can operate robots locally, controlling their movements without requiring an Internet connection or the cloud. Google says the software provides “general-purpose dexterity and fast task adaptation,” building on the March release of the first Gemini Robotics VLA model, which brought “Gemini 2.0’s multimodal reasoning and real-world understanding into the physical world.” Since the model operates independent of a data network, it’s useful for latency sensitive applications as well as low or no connectivity environments. Google is also releasing a Gemini Robotics SDK for developers. Continue reading Google Gemini Robotics On-Device Controls Robots Locally

WPP Media Launches Industry’s First Large Marketing Model

By Paula Parisi
June 9, 2025

Two weeks after its global rebranding of GroupM to WPP Media, the bespoke London-based marketing behemoth is launching Open Intelligence, an “AI identity solution” that WPP says will better target viewers with privacy-conscious solutions that more effectively message on behalf of its clients. Built around what WPP calls “the industry’s first Large Marketing Model,” Open Intelligence is “trained on the world’s largest and most diverse set of audience, behavioral, and event data,” culled from WPP’s decentralized partnership network. “Our model learns continuously from trillions of signals across more than 350 partners in over 75 markets,” the company claims. Continue reading WPP Media Launches Industry’s First Large Marketing Model

Alibaba Touts Advance in Open-Source AI with Qwen3 Series

By Paula Parisi
April 30, 2025

China’s Alibaba Group has released a Qwen3 LLM series said to be at the leading edge of open-source models, nearly achieving the performance of proprietary models from AI competitors OpenAI and Google. Alibaba says Qwen3 offers improvements in reasoning, tool use, instruction following and multilingual abilities. The Qwen3 series features eight new models — two that are mixture-of-experts and six built on dense neural networks. Their sizes range from 600 million to 235 billion parameters. The size and scope of the Alibaba slate maintains China’s accelerated AI pace in the wake of DeepSeek’s game-changing debut. Continue reading Alibaba Touts Advance in Open-Source AI with Qwen3 Series

Cohere’s Multimodal Embed Model Organizes Enterprise Data

By Paula Parisi
April 17, 2025

As enterprises rely more heavily on AI integration to compile research and summarize things like meetings and email threads, the need for contextual search has become increasingly important. AI startup Cohere has released Embed 4 to make the task easier. Embed 4 is a multimodal embedding model that transforms text, images and mixed data (like PDFs, slides or tables) into numerical representations (or “embeddings”) for tasks including semantic search, retrieval-augmented generation (RAG) and classification. Supporting over 100 languages, Embed 4 has an extremely large context window of up to 128,000 tokens. Continue reading Cohere’s Multimodal Embed Model Organizes Enterprise Data

OpenAI’s Affordable GPT-4.1 Models Place Focus on Coding

By Paula Parisi
April 16, 2025

OpenAI has launched a new series of multimodal models dubbed GPT-4.1 that represent what the company says is a leap in small model performance, including longer context windows and improvements in coding and instruction following. Geared to developers and available exclusively via API (not through ChatGPT), the 4.1 series comes in three variations: in addition to the flagship GPT‑4.1, GPT‑4.1 mini and GPT‑4.1 nano, OpenAI’s first nano model. Unlike Web-connected models (which have “retrieval-augmented generation,” or RAG) and can access up-to-date information, they are static knowledge models. Continue reading OpenAI’s Affordable GPT-4.1 Models Place Focus on Coding

Meta Unveils Multimodal Llama 4 Models, Previews Behemoth

By Paula Parisi
April 8, 2025

Meta Platforms has released its first Llama 4 models, a multimodal trio that ranges from the foundational Behemoth to tiny Scout, with Maverick in between. With 16 experts and only 17B active parameters (the number used per task), Llama Scout is “more powerful than all previous generation Llama models, while fitting in a single Nvidia H100 GPU,” according to Meta. Maverick, with 17B active parameters and 128 experts, is touted as beating GPT-4o and Gemini 2.0 Flash across various benchmarks, “while achieving comparable results to the new DeepSeek v3 on reasoning and coding with less than half the active parameters.” Continue reading Meta Unveils Multimodal Llama 4 Models, Previews Behemoth

Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

By Paula Parisi
March 28, 2025

Alibaba Cloud has released Qwen2.5-Omni-7B, a new AI model the company claims is efficient enough to run on edge devices like mobile phones and laptops. Boasting a relatively light 7-billion parameter footprint, Qwen2.5-Omni-7B understands text, images, audio and video and generates real-time responses in text and natural speech. Alibaba says its combination of compact size and multimodal capabilities is “unique,” offering “the perfect foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications.” One example would be using a phone’s camera to help a vision impaired-person navigate their environment. Continue reading Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

Canvas and Live Video Add Productivity Features to Gemini AI

By Paula Parisi
March 25, 2025

Google has added a Canvas feature to its Gemini AI chatbot that provides users with a real-time collaborative space where writing and coding projects can be refined and other ideas iterated and shared. “Canvas is designed for seamless collaboration with Gemini,” according to Gemini Product Director Dave Citron, who notes that Canvas makes it “an even more effective collaborator” in helping bring ideas to life. The move marks a trend whereby AI companies are trying to turn chatbot platforms into turnkey productivity suites. Google is launching a limited release of Gemini Live Video in addition to bringing its Audio Overview feature of NotebookLM to Gemini. Continue reading Canvas and Live Video Add Productivity Features to Gemini AI

Baidu Releases New LLMs that Undercut Competition’s Price

By Paula Parisi
March 18, 2025

Baidu has launched two new AI systems, the native multimodal foundation model Ernie 4.5 and deep-thinking reasoning model Ernie X1. The latter supports features like generative imaging, advanced search and webpage content comprehension. Baidu is touting Ernie X1 as of comparable performance to another Chinese model, DeepSeek-R1, but says it is half the price. Both Baidu models are available to the public, including individual users, through the Ernie website. Baidu, the dominant search engine in China, says its new models mark a milestone in both reasoning and multimodal AI, “offering advanced capabilities at a more accessible price point.” Continue reading Baidu Releases New LLMs that Undercut Competition’s Price

YouTube Shorts Updates Dream Screen with Google Veo 2 AI

By Paula Parisi
February 19, 2025

YouTube Shorts has upgraded its Dream Screen AI background generator to incorporate Google DeepMind’s latest video model, Veo 2, which will also generate standalone video clips that users can post to Shorts. “Need a specific scene but don’t have the right footage? Want to turn your imagination into reality and tell a unique story? Simply use a text prompt to generate a video clip that fits perfectly into your narrative, or create a whole new world,” coaxes YouTube, which seems to be trying out “Dream Screen” branding as an umbrella for its genAI efforts. Continue reading YouTube Shorts Updates Dream Screen with Google Veo 2 AI