Reasoning Model Competes with Advanced AI at a Lower Cost

Model training continues to hit new lows in terms of cost, a phenomenon known as the commoditization of AI that has rocked Wall Street. An AI reasoning model created for under $50 in cloud compute credits is reportedly performing comparably to established reasoning models such as OpenAI o1 and DeepSeek-R1 on tests of math and coding aptitude. Called s1-32B, it was created by researchers at Stanford and the University of Washington by customizing Alibaba’s Qwen2.5-32B-Instruct, feeding it 1,000 prompts with responses sourced from Google’s new Gemini 2.0 Flash Thinking Experimental reasoning model.

Google Adds Gemini Flash Thinking to Search, Maps and More

Google has initiated a flurry of AI activity following the recent collection of Chinese AI releases. The Alphabet company has launched an experimental version of a new flagship AI model, Gemini 2.0 Pro. Its premier model for coding and complex questions is now available in Google AI Studio, Vertex AI and the Gemini Advanced app. The company has also made its general-purpose “workhorse” model, Gemini 2.0 Flash, available in general release via the Gemini API in AI Studio and Vertex. This follows last week’s announcement that Gemini 2.0 Flash is powering the Gemini app for desktop and mobile.

Anthropic Will Award Cash for Jailbreaking AI Defense System

Anthropic has created a method to defend AI models against “jailbreaks” — unauthorized workarounds to get an AI model to do things it was trained not to do, like providing instructions for building chemical weapons. Called Constitutional Classifiers, the system was 95 percent effective in identifying and preventing jailbreaks of Anthropic’s Claude 3.5 Sonnet in a test environment. In an effort to drum up real-world red-teaming, the company offered cash prizes of up to $15,000 to anyone who could jailbreak its Sonnet AI model. After some 3,000 hours of attempts by 185 participants, none claimed an award. Now the company is offering additional incentives.

Alibaba Plans to Take On AI Competitors with Qwen2.5-Max

An internecine AI battle has erupted between Alibaba and DeepSeek. Days after DeepSeek dominated several news cycles with its affordable DeepSeek-R1 reasoning model and the multimodal Janus-Pro-7B, Alibaba released its latest LLM, Qwen2.5-Max, available via API from Alibaba Cloud. As with DeepSeek, Alibaba is looking beyond its domestic borders, but the fact that a public-facing AI battle is heating up between Chinese companies indicates the People’s Republic isn’t going to quietly cede the AI race to the U.S. Alibaba claims Qwen2.5-Max outperforms models from DeepSeek, Meta and OpenAI.

Codename Goose: Block Unveils Open-Source AI Agent Builder

Jack Dorsey’s financial tech and media firm Block (formerly Square) has released a platform for building AI agents: Codename Goose. Previously available in beta, Goose is primarily designed to build agents for coding and software development, but Block built in many basic features that could be applied to general purpose pursuits. Because it is open source and offered under Apache License 2.0, the hope is that developers will apply it to varied use cases. A leading feature of Codename Goose is its flexibility. It can integrate a wide range of large language models, letting developers use it with their preferred model.

DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro

Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product — the multimodal Janus-Pro-7B with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus Pro is positioned to be a frontrunner among next-generation unified multimodal models.

CES: Nvidia’s Cosmos Models Teach AI About Physical World

Nvidia Cosmos, a platform of generative world foundation models (WFMs) and related tools to advance the development of physical AI systems like autonomous vehicles and robots, was introduced at CES 2025. Cosmos WFMs are designed to provide developers a way to generate massive amounts of photo-real, physics-based synthetic data to train and evaluate their existing models. The goal is to reduce costs by streamlining real-world testing with a ready data pipeline. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos integrates Nvidia Omniverse, a physics simulation tool used for entertainment world-building.

CES: Is the ChatGPT Moment for Robotics Around the Corner?

CES has regularly featured robots over the years, but we’ve never really seen anything pivotal. CES 2025 marked a change in this area. “The ChatGPT moment for robotics is just around the corner,” said Nvidia CEO Jensen Huang in his keynote, and we couldn’t agree more. And while attention was focused on LLMs, the field of industrial robotics has been unleashed like never before. According to World Robotics 2024, the International Federation of Robotics’ recent report, 4.3 million units were deployed in factories worldwide as of Q3 2024, a number that’s increasing at a clip of half a million units per year. That is double the figure from seven years ago, and the trend is accelerating.

OpenAI Previews Two New Reasoning Models: o3 and o3-Mini

OpenAI has unveiled a new frontier model, OpenAI o3, which it claims can “reason” through challenges involving math, science and computer programming. Available to safety and research testers, it is expected to be available to individuals and businesses this year. OpenAI o3 is said to be over 20 percent more efficient at common programming tasks than its predecessor OpenAI o1 and beat a company scientist on a programming test. Model o3 is part of a broader effort to create AI systems that can reason through complex problems. In late December Google debuted a similar platform, the experimental Gemini 2.0 Flash Thinking Mode.

CES: Utilizing Real-Time AI to Measure Representation in Ads

Brands Mastercard and MGM Resorts International, the Ad Council and advertising technology company XR Extreme Reach (XR) gathered for a CES panel discussion on how real-time AI metrics can help increase representation in ads, thus boosting marketing ROI and audience trust. It was moderated by The Female Quotient Chief Executive Shelley Zalis, whose company collaborated with XR to unveil, in October, the Representation Index (RX) to measure inclusivity in global advertising. XR’s SVP of Enterprise Solutions Kristin Wnuk was also there to describe her company’s work in the space.

CES: Show Features a Surprisingly Small Number of AI Agents

In the never-ending smorgasbord of AI hype, “agents” represent practical and worthwhile potential. AI agents are autonomous AI programs that can understand some context and take action in that context. Agents can autonomously perform a task that involves mapping a goal to its context and parameters (even if they’re not explicitly laid out), process data across multiple formats and ontologies to understand the goal and work through the task, call multiple functions across multiple apps, and take some action to achieve the goal. Unfortunately, however, while many are talking about AI agents, few are promoting actual products at CES.

CES: BMW iDrive Turns the Car Windshield into an AR Display

BMW has revealed an upcoming release of its iDrive operating system that essentially turns the entire windshield into a 3D heads-up display. The “close-to-production” version of BMW Panoramic Vision showcased at CES 2025 integrates augmented reality to layer navigational directions and driver assistance tips onto the windshield. It also does away with the conventional dashboard “gauge cluster,” projecting digital equivalents onto the windshield that can be customized. The setup is powered by the new BMW Operating System X and will be introduced in all new BMW models from the end of 2025.

Meta’s Llama 3.3 Delivers More Processing for Less Compute

Meta Platforms has packed more artificial intelligence into a smaller package with Llama 3.3, which the company released last week. The open-source large language model (LLM) “improves core performance at a significantly lower cost, making it even more accessible to the entire open-source community,” Meta VP of Generative AI Ahmad Al-Dahle wrote on X. The 70 billion parameter text-only Llama 3.3 is said to perform on par with the 405 billion parameter model that was part of Meta’s Llama 3.1 release in July, with less computing power required, significantly lowering its operational costs.

Qwen with Questions: Alibaba Previews New Reasoning Model

Alibaba Cloud has released the latest entry in its growing Qwen family of large language models. The new Qwen with Questions (QwQ) is an open-source competitor to OpenAI’s o1 reasoning model. As with competing large reasoning models (LRMs), QwQ can correct its own mistakes, relying on extra compute cycles during inference to assess its responses, making it well suited for reasoning tasks like math and coding. Described as an “experimental research model,” this preview version of QwQ has 32 billion parameters and a 32,000-token context, leading to speculation that a more powerful iteration is in the offing.

Couchbase Capella AI Helps Deploy Agents, Models, Services

Couchbase, the publicly traded data platform for developers, has launched Capella AI Services with the aim of simplifying the process of developing and deploying agentic AI apps for enterprise clients. Capella AI joins the company’s flagship Couchbase Capella cloud data platform. AI offerings include model hosting, automated vectorization, unstructured data preprocessing and AI agent catalog services. Couchbase’s goal is to “allow organizations to prototype, build, test and deploy AI agents” while giving developers control over data across the development lifecycle, including secure data mitigation for large language models running outside the organization.