Anthropic and OpenAI Report Findings of Joint AI Safety Tests

OpenAI and Anthropic — rivals in the AI space that closely guard their proprietary systems — joined forces for a misalignment evaluation, safety testing each other’s models to identify when and how they fall short of human values. Among the findings: reasoning models, including Anthropic’s Claude Opus 4 and Sonnet 4 and OpenAI’s o3 and o4-mini, resisted jailbreaks, while conversational models like GPT-4.1 were susceptible to prompts or techniques intended to bypass safety protocols. Although the results arrive as users complain that chatbots have become overly sycophantic, the tests were “primarily interested in understanding model propensities for harmful action,” per OpenAI.

Alibaba Is Rolling Out Its ‘Most Agentic Code Model to Date’

Alibaba’s Qwen team has launched Qwen3-Coder, which it calls its “most agentic code model to date.” While it will be made available in multiple sizes, the most powerful variant — Qwen3-Coder-480B-A35B-Instruct — is being released first. The 480-billion-parameter mixture-of-experts model has 35 billion active parameters and natively supports a context length of 256,000 tokens, extendable to 1 million tokens with extrapolation methods, for “exceptional performance in both coding and agentic tasks,” explains the group. It claims the quasi-open-source model offers agentic coding, agentic browser use, and agentic tool use comparable to Anthropic’s proprietary Claude Sonnet 4.
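The model’s name encodes its architecture: 480B total parameters, of which only 35B are active per token. A rough sketch of why that matters for per-token compute (the figures are from the announcement; the FLOPs rule of thumb is a common approximation, not an Alibaba number):

```python
def moe_active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of weights a mixture-of-experts model uses per forward pass."""
    return active_params_b / total_params_b

# Qwen3-Coder-480B-A35B: 480B total parameters, 35B active per token.
frac = moe_active_fraction(480, 35)

# A forward pass costs roughly 2 FLOPs per *active* parameter, so per-token
# compute tracks the 35B active set, not the full 480B of stored weights.
flops_per_token = 2 * 35e9
```

This is the basic appeal of sparse MoE: storage scales with total parameters, but inference cost scales with the much smaller active set.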

Alibaba’s Qwen VLo Generative AI Shows Images in Progress

Chinese e-commerce giant Alibaba has released a new multimodal model called Qwen VLo that can understand and generate images. Available for free in preview through Qwen Chat, it can use image or text prompts to generate pictures, and accepts text in multiple languages, including Chinese and English. It can also edit images, change backgrounds, and switch styles, handling multiple edits in sequence. An upgrade over January’s Qwen 2.5-VL release, Qwen VLo uses progressive generation, allowing users to see the image creation in progress, and Alibaba says it’s particularly good at making inline adjustments to fine-tune images.

New Reasoning Model Improves Smarts of OpenAI Operator

OpenAI has upgraded Operator, its autonomous web-browsing agent, from the prior GPT-4o multimodal engine to the new o3 reasoning model. The update is rolling out globally in research preview this month to subscribers of OpenAI’s $200-per-month ChatGPT Pro. Operator is powered by OpenAI’s “computer-using agent” (CUA), a model trained to interact with graphical interfaces and use the web to perform tasks for people. “Using its own browser, it can look at a webpage, and interact with it much like a human would by typing, clicking, scrolling and more,” OpenAI explains.
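OpenAI has not published CUA’s internals, but the description above amounts to an observe-decide-act loop. A purely hypothetical sketch of that loop, with a stand-in for the model call (all names and heuristics here are illustrative, not OpenAI’s API):

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str           # "click", "type", or "scroll"
    target: str = ""    # e.g. a CSS selector or screen region
    text: str = ""      # text to type, if any

def decide(screen_description: str) -> Action:
    """Stand-in for the model call: map what the agent 'sees' to one action.
    A real computer-using agent would send a screenshot to the model and
    parse its reply into a structured action like this."""
    if "search box" in screen_description:
        return Action(kind="type", target="input#q", text="weather today")
    if "results" in screen_description:
        return Action(kind="click", target="a.result")
    return Action(kind="scroll")

# One observe-decide step; an executor would then perform the action
# in the browser and feed back a fresh screenshot.
step = decide("page with a search box")
```

The interesting engineering lives in the two halves this sketch stubs out: grounding a screenshot into something the model can reason about, and translating its reply into safe browser actions.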

OpenAI’s Affordable GPT-4.1 Models Place Focus on Coding

OpenAI has launched a new series of multimodal models dubbed GPT-4.1 that represent what the company says is a leap in small-model performance, including longer context windows and improvements in coding and instruction following. Geared to developers and available exclusively via API (not through ChatGPT), the 4.1 series comes in three variations: the flagship GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano, OpenAI’s first nano model. Unlike web-connected models, which use retrieval-augmented generation (RAG) to access up-to-date information, these are static-knowledge models.
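Since the series is API-only, access means constructing a Chat Completions-style request. A minimal sketch that only builds the request body, with no network call (the model name is from the announcement; the message format is the widely documented Chat Completions shape):

```python
def build_chat_request(model: str, user_prompt: str, system_prompt: str = "") -> dict:
    """Assemble a Chat Completions-style request body as a plain dict."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages}

req = build_chat_request("gpt-4.1-mini",
                         "Refactor this loop into a list comprehension.")
```

In practice this dict would be sent via an SDK or an HTTPS POST with an API key; the point is that everything, including model selection across the three sizes, happens through this one request shape.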

Deep Cogito Is Out of Stealth with Hybrid Reasoning Models

San Francisco-based AI startup Deep Cogito has released five AI models in preview, making them available under an open-source license agreement. The models come in sizes 3B, 8B, 14B, 32B and 70B, with plans to release 109B, 400B and 671B versions in the weeks and months ahead. As for the current models, “each outperforms the best available open models of the same size, including counterparts from Meta, DeepSeek and Alibaba, across most standard benchmarks,” Deep Cogito claims, noting that the 70B model in particular “outperforms the newly released Llama 4 109B MoE model.”

Meta Unveils Multimodal Llama 4 Models, Previews Behemoth

Meta Platforms has unveiled its first Llama 4 models, a multimodal trio that ranges from the foundational Behemoth, still in preview, to the small Scout, with Maverick in between. With 16 experts and only 17B active parameters (the number used per task), Llama 4 Scout is “more powerful than all previous generation Llama models, while fitting in a single Nvidia H100 GPU,” according to Meta. Maverick, with 17B active parameters and 128 experts, is touted as beating GPT-4o and Gemini 2.0 Flash across various benchmarks, “while achieving comparable results to the new DeepSeek v3 on reasoning and coding with less than half the active parameters.”
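The expert counts above describe a routed mixture-of-experts layer: each token is sent to only a few of the 16 (Scout) or 128 (Maverick) experts, which is how both models keep 17B active parameters. A toy top-k router, illustrative only and not Meta’s implementation:

```python
import math

def top_k_router(logits: list[float], k: int = 1) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

# Scout-style: 16 experts, one router score per expert for this token.
scores = [0.1, 2.0] + [0.0] * 14
choice = top_k_router(scores, k=1)  # expert 1 wins with weight 1.0
```

More experts (Maverick’s 128 vs. Scout’s 16) means more total capacity to route into, without raising the per-token cost set by the active-parameter budget.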

Non-Profit Sentient Launches New ‘Open Deep Search’ Model

Sentient, a year-old non-profit backed by Peter Thiel’s Founders Fund, has released Open Deep Search (ODS), an open-source framework that leverages existing LLMs to enhance search and reasoning capabilities. Essentially a system of custom plugins and tools, ODS works with DeepSeek’s open-source R1 model as well as proprietary systems like OpenAI’s GPT-4o and Anthropic’s Claude to deliver advanced search functionality. That modular design is in fact ODS’s main innovation, its creators say, claiming it beats Perplexity and OpenAI’s GPT-4o Search Preview on benchmarks for accuracy and transparency.
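The modular design described above, one set of search tools shared by interchangeable LLM backends, resembles a simple plugin registry. A hypothetical sketch (the names and structure are illustrative, not ODS’s actual API):

```python
from typing import Callable

# Registry of search tools that any LLM backend can call.
TOOLS: dict[str, Callable[[str], str]] = {}

def register_tool(name: str):
    """Decorator that adds a tool function to the shared registry."""
    def wrap(fn: Callable[[str], str]):
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("web_search")
def web_search(query: str) -> str:
    return f"results for: {query}"  # stand-in for a real search call

def answer(llm: Callable[[str], str], question: str) -> str:
    """Run the shared search tool, then hand the context to whichever LLM."""
    context = TOOLS["web_search"](question)
    return llm(f"Question: {question}\nContext: {context}")

# Any callable works as a backend; here a trivial one that echoes the context.
echo_llm = lambda prompt: prompt.splitlines()[-1]
out = answer(echo_llm, "capital of France?")
```

The design choice this illustrates: by keeping the tool layer independent of the model, the same search pipeline can sit in front of R1, GPT-4o, Claude, or anything else with a text-in, text-out interface.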

Runway Gen-4 Tackles AI’s Elusive Video Scene Consistency

Runway has introduced a new video generation model, launching a next phase of competition that could transform film production. Notably, its Gen-4 system improves the consistency of characters, locations and objects across multiple scenes, an elusive prospect for most AI video generators. The New York-based startup calls its new development “a step towards Universal Generative Models that understand the world.” The key, Runway says, is to provide a single reference image of the character, item or environment as part of the model’s project material. Runway Gen-4 can generate 5- and 10-second clips at 720p resolution.

OpenAI Delivers Native GPT-4o Image Generator to ChatGPT

OpenAI has activated the multimodal image generation capabilities of GPT-4o, making it available to ChatGPT users on the Plus, Pro, Team and Free tiers. It replaces DALL-E 3 as the default image generator for the popular chatbot. GPT-4o’s accuracy with text, understanding of symbols, and precision with prompts, combined with multimodal capabilities that allow the model to take cues from visual material, have transformed its image capabilities from largely unpredictable to “consistent and context-aware,” resulting in “a practical tool with precision and power,” claims OpenAI.

OpenAI Pushes Conversational Agents with Three New Models

OpenAI has debuted three new models for transcription and voice generation — gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts. The text-to-speech and speech-to-text AI models are designed to help developers create AI agents with highly customizable voices. OpenAI claims these models will power natural and responsive voice agents, moving AI out of the text-based communications stage and into intuitive spoken conversations. The suite outperforms existing solutions in accuracy and reliability, OpenAI says, especially with “accents, noisy environments, and varying speech speeds,” making them well-suited for customer call centers and meeting notes.

OpenAI’s GPT-4.5 Model Sees Patterns and Thinks Creatively

OpenAI is releasing a research preview of what it calls its “largest and best” chat model to date, GPT‑4.5, which scales unsupervised learning in pre-training and post-training. As a result, the new chat model can recognize patterns, draw connections, and generate creative insights without having to draw on time- and energy-consuming “reasoning.” GPT‑4.5 is currently available to ChatGPT Pro subscribers ($200 per month) and developers subscribing to OpenAI’s API tier. ChatGPT Plus and ChatGPT Team customers are expected to gain access this week.

xAI Launches Grok 3 as Standalone and for X Premium+ Subs

Elon Musk’s xAI has released its latest AI model Grok 3, which the company is describing as the “smartest AI on Earth.” It includes reasoning capabilities and a new web analysis tool called DeepSearch that returns results “within seconds” and can refine specific sources, according to xAI. Grok 3 was trained with 200,000 Nvidia GPUs, resulting in improved response times and processing power. Future capabilities will include Voice Mode for conversational interaction and audio-to-text conversion. Access to Grok 3 is limited to X Premium+ subscribers or via a SuperGrok plan (that does not include X social features).

Gemini Recalls Previous Chats to Provide Helpful Responses

Google announced last week that its Gemini AI chatbot now offers the ability to provide responses based on earlier conversations. It can also summarize a previous chat and recall information the user has shared in other threads. “Whether you’re asking a question about something you’ve already discussed, or asking Gemini to summarize a previous conversation, Gemini now uses information from relevant chats to craft a response,” according to Google. The new feature is rolling out via Google’s $20-per-month One AI Premium Plan to start and will be available to Google Workspace Business and Enterprise customers in the coming weeks.
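At its simplest, pulling “information from relevant chats” is a retrieval step over stored conversations. Google has not described Gemini’s mechanism; a toy keyword-overlap version, purely for illustration:

```python
def recall(past_chats: list[str], query: str, top_n: int = 1) -> list[str]:
    """Rank stored chats by word overlap with the query, highest first."""
    q = set(query.lower().split())
    scored = sorted(past_chats,
                    key=lambda chat: len(q & set(chat.lower().split())),
                    reverse=True)
    return scored[:top_n]

chats = ["we discussed a trip to kyoto in spring",
         "notes on my sourdough starter schedule"]
hit = recall(chats, "what did i say about kyoto")
```

A production system would use semantic embeddings rather than word overlap, but the flow is the same: retrieve the most relevant prior threads, then include them as context when crafting the response.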

Sam Altman Reveals Plans to Simplify OpenAI’s Product Line

OpenAI has decided to simplify its product offerings. A month after announcing the in-development o3 as its next frontier model, the company has canceled it as a standalone release, explaining that it would be integrated into the upcoming GPT-5 instead. “A top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks,” OpenAI co-founder and CEO Sam Altman wrote in a social media post this week. Expected to ship later this year, the GPT-5 models will incorporate voice, canvas, search, deep research and more, OpenAI says.