LLM Archives

Anthropic Introduces a New Claude Hybrid Reasoning Model

By Paula Parisi
February 26, 2025

Anthropic has released a new frontier model, Claude 3.7 Sonnet, described as the industry’s first “hybrid AI reasoning model.” The new Claude is different in that it can either respond to questions in real time or “think” about a problem for a prolonged period of time, basically as long as a user would like. Users can choose between “near-instant responses or extended, step-by-step thinking that is made visible to the user” by selecting the appropriate “reasoning” capability for Claude, Anthropic says. Along with the new model, Anthropic is also debuting a command line tool for agentic coding, Claude Code.

YouTube Shorts Updates Dream Screen with Google Veo 2 AI

By Paula Parisi
February 19, 2025

YouTube Shorts has upgraded its Dream Screen AI background generator to incorporate Google DeepMind’s latest video model, Veo 2, which will also generate standalone video clips that users can post to Shorts. “Need a specific scene but don’t have the right footage? Want to turn your imagination into reality and tell a unique story? Simply use a text prompt to generate a video clip that fits perfectly into your narrative, or create a whole new world,” coaxes YouTube, which seems to be trying out “Dream Screen” branding as an umbrella for its genAI efforts.

xAI Launches Grok 3 as Standalone and for X Premium+ Subs

By Paula Parisi
February 19, 2025

Elon Musk’s xAI has released its latest AI model, Grok 3, which the company is describing as the “smartest AI on Earth.” It includes reasoning capabilities and a new web analysis tool called DeepSearch that returns results “within seconds” and can refine specific sources, according to xAI. Grok 3 was trained with 200,000 Nvidia GPUs, resulting in improved response times and processing power. Future capabilities will include Voice Mode for conversational interaction and audio-to-text conversion. Access to Grok 3 is limited to X Premium+ subscribers or via a SuperGrok plan, which does not include X social features.

Gemini Recalls Previous Chats to Provide Helpful Responses

By Rob Scott
February 18, 2025

Google announced last week that its Gemini AI chatbot now offers the ability to provide responses based on earlier conversations. It can also summarize a previous chat and recall information the user has shared in other threads. “Whether you’re asking a question about something you’ve already discussed, or asking Gemini to summarize a previous conversation, Gemini now uses information from relevant chats to craft a response,” according to Google. The new feature is rolling out via Google’s $20-per-month One AI Premium Plan to start and will be available to Google Workspace Business and Enterprise customers in the coming weeks.

Reasoning Model Competes with Advanced AI at a Lower Cost

By Paula Parisi
February 10, 2025

Model training continues to hit new lows in terms of cost, a phenomenon known as the commoditization of AI that has rocked Wall Street. An AI reasoning model created for under $50 in cloud compute credits is reportedly performing comparably to established reasoning models such as OpenAI o1 and DeepSeek-R1 on tests of math and coding aptitude. Called s1-32B, it was created by researchers at Stanford and the University of Washington by customizing Alibaba’s Qwen2.5-32B-Instruct, feeding it 1,000 prompts with responses sourced from Google’s new Gemini 2.0 Flash Thinking Experimental reasoning model.

Google Adds Gemini Flash Thinking to Search, Maps and More

By Paula Parisi
February 7, 2025

Google has initiated a flurry of AI activity following the recent collection of Chinese AI releases. The Alphabet company has launched an experimental version of a new flagship AI model, Gemini 2.0 Pro. Its premier model for coding and complex questions is now available in Google AI Studio, Vertex AI and the Gemini Advanced app. The company has also made its general-purpose “workhorse” model, Gemini 2.0 Flash, available in general release via the Gemini API in AI Studio and Vertex. This follows last week’s announcement that Gemini 2.0 Flash is powering the Gemini app for desktop and mobile.

Anthropic Will Award Cash for Jailbreaking AI Defense System

By Paula Parisi
February 6, 2025

Anthropic has created a method to defend AI models against “jailbreaks,” unauthorized workarounds that get an AI model to do things it was trained not to do, like providing instructions for building chemical weapons. Called Constitutional Classifiers, the system was 95 percent effective in identifying and preventing jailbreaks of Anthropic’s Claude 3.5 Sonnet in a test environment. In an effort to drum up real-world red-teaming, the company offered cash prizes of up to $15,000 to anyone who could jailbreak its Sonnet AI model. After some 3,000 hours of attempts by 185 participants, none claimed an award. Now the company is offering additional incentives.

Alibaba Plans to Take On AI Competitors with Qwen2.5-Max

By Paula Parisi
February 3, 2025

An internecine AI battle has erupted between Alibaba and DeepSeek. Days after DeepSeek dominated several news cycles with its affordable DeepSeek-R1 reasoning model and the multimodal Janus-Pro-7B, Alibaba released its latest LLM, Qwen 2.5-Max, available via API from Alibaba Cloud. As with DeepSeek, Alibaba is looking beyond its domestic borders, but the fact that a public-facing AI battle is heating up between Chinese companies indicates the People’s Republic isn’t going to quietly cede the AI race to the U.S. Alibaba claims Qwen 2.5-Max outperforms models from DeepSeek, Meta and OpenAI.

Codename Goose: Block Unveils Open-Source AI Agent Builder

By Paula Parisi
January 30, 2025

Jack Dorsey’s financial tech and media firm Block (formerly Square) has released a platform for building AI agents: Codename Goose. Previously available in beta, Goose is primarily designed to build agents for coding and software development, but Block built in many basic features that could be applied to general purpose pursuits. Because it is open source and offered under Apache License 2.0, the hope is that developers will apply it to varied use cases. A leading feature of Codename Goose is its flexibility. It can integrate a wide range of large language models, letting developers use it with their preferred model.

DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro

By Paula Parisi
January 30, 2025

Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product: the multimodal Janus-Pro-7B, with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus Pro is positioned to be a frontrunner among next-generation unified multimodal models.

CES: Nvidia’s Cosmos Models Teach AI About Physical World

By Paula Parisi
January 14, 2025

Nvidia Cosmos, a platform of generative world foundation models (WFMs) and related tools to advance the development of physical AI systems like autonomous vehicles and robots, was introduced at CES 2025. Cosmos WFMs are designed to provide developers a way to generate massive amounts of photo-real, physics-based synthetic data to train and evaluate their existing models. The goal is to reduce costs by streamlining real-world testing with a ready data pipeline. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos integrates Nvidia Omniverse, a physics simulation tool used for entertainment world-building.

CES: Is the ChatGPT Moment for Robotics Around the Corner?

By Yves Bergquist
January 14, 2025

CES has regularly featured robots over the years, but we’ve never really seen anything pivotal. CES 2025 marked a change in this area. “The ChatGPT moment for robotics is just around the corner,” said Nvidia CEO Jensen Huang in his keynote, and we couldn’t agree more. And while attention was focused on LLMs, the field of industrial robotics has been unleashed like never before. According to World Robotics 2024, the International Federation of Robotics’ recent report, 4.3 million units were deployed in factories worldwide as of Q3 2024, a number that’s increasing at a clip of half a million units per year. That is double the figure from seven years ago, and the trend is accelerating.

OpenAI Previews Two New Reasoning Models: o3 and o3-Mini

By Paula Parisi
January 10, 2025

OpenAI has unveiled a new frontier model, OpenAI o3, which it claims can “reason” through challenges involving math, science and computer programming. Available to safety and research testers, it is expected to be available to individuals and businesses this year. OpenAI o3 is said to be over 20 percent more efficient at common programming tasks than its predecessor OpenAI o1 and beat a company scientist on a programming test. Model o3 is part of a broader effort to create AI systems that can reason through complex problems. In late December Google debuted a similar platform, the experimental Gemini 2.0 Flash Thinking Mode.

CES: Utilizing Real-Time AI to Measure Representation in Ads

By Debra Kaufman
January 9, 2025

Brands Mastercard and MGM Resorts International, the Ad Council and advertising technology company XR Extreme Reach (XR) gathered for a CES panel discussion on how real-time AI metrics can help increase representation in ads, thus boosting marketing ROI and audience trust. It was moderated by The Female Quotient Chief Executive Shelley Zalis, whose company collaborated with XR to unveil, in October, the Representation Index (RX) to measure inclusivity in global advertising. XR’s SVP of Enterprise Solutions Kristin Wnuk was also there to describe her company’s work in the space.

CES: Show Features a Surprisingly Small Number of AI Agents

By Yves Bergquist
January 9, 2025

In the never-ending smorgasbord of AI hype, “agents” represent practical and worthwhile potential. AI agents are autonomous AI programs that can understand some context and take action in that context. Agents can autonomously perform a task that involves mapping a goal to its context and parameters (even if they’re not explicitly laid out), process data across multiple formats and ontologies to understand the goal and work through the task, call multiple functions across multiple apps, and take some action to achieve the goal. Unfortunately, however, while many are talking about AI agents, few are promoting actual products at CES.