Benchmark Archives

DeepSeek Debuts Its V3.2 Reasoning Model in Two Versions

By Paula Parisi
December 3, 2025

DeepSeek-V3.2 is now in release, integrating thinking directly into tool-use for the first time and improving on its predecessor, DeepSeek-V3.2 Experimental. The model supports tool-use in both thinking and non-thinking modes. China-based DeepSeek began disrupting the U.S. AI market in January with the debut of free foundation models that rival those from Google and OpenAI. The company released internal benchmark scores indicating its new model can compete with OpenAI’s GPT-5 on reasoning benchmarks and agentic tasks. A variation, DeepSeek-V3.2-Speciale, has been released for specialized math and is said to perform comparably to Google’s Gemini 3 Pro.

Sony Debuts Benchmark for Measuring Computer Vision Bias

By Paula Parisi
November 11, 2025

Sony AI has introduced the Fair Human-Centric Image Benchmark (FHIBE, pronounced “Fee-bee”), a new global benchmark for fairness evaluation in computer vision models. FHIBE addresses the industry challenge of identifying biased and ethically compromised training data for AI, aiming to trigger “industry-wide improvements for responsible and ethical protocols throughout the entire life span of data, from sourcing and management to utilization, including fair compensation for participants and clear consent mechanisms,” Sony AI says. The FHIBE dataset is publicly available now, following publication in the science journal Nature.

DeepSeek-V3.1 Offered with Improvements in Speed, Context

By Paula Parisi
August 21, 2025

This week, DeepSeek-V3.1 dropped on Hugging Face. Media outlets immediately began citing benchmark scores that rival proprietary systems from OpenAI and Anthropic for a system that is available via a permissive license, facilitating wide access. The 685-billion-parameter Mixture-of-Experts (MoE) model has 37 billion active parameters and is designed for efficiency. It builds on DeepSeek-pioneered processes like multi-head latent attention (MLA) and multi-token prediction (MTP) to optimize inference, enabling high-performance computing on both enterprise servers loaded with H100 GPUs and consumer hardware like a Mac Studio or comparably powered PC.

Anthropic Seeks to Raise $5 Billion, Debuts Claude Opus 4.1

By Paula Parisi
August 11, 2025

Anthropic has released Claude Opus 4.1, an upgrade to Opus 4 that reportedly improves on agentic tasks, computer coding and reasoning. Pricing has not increased from what customers were paying for Opus 4, and the company promises “substantially larger improvements to our models in the coming weeks.” The move comes as Anthropic nears a new funding round targeting $3 to $5 billion, which could place a valuation of up to $170 billion on the startup. Recurring revenue hit $5 billion as of late July and could increase to $9 billion by the end of the year. Claude Opus 4.1 was released two days before OpenAI unleashed GPT-5, and performs comparably in coding benchmarks.

Manus AI Takes an Agentic Approach with Its Video Generator

By Paula Parisi
June 9, 2025

China’s Manus AI has unveiled a text-to-video generator it says can transform “prompts into complete stories: structured, sequenced, and ready to watch. With a single prompt, Manus plans each scene, crafts the visuals, and animates your vision,” the company announced last week. Manus generated buzz in March for its agentic approach to AI, and now it is putting that autonomous technology to work on generative AI, promising story generation within minutes. Last month, the firm that developed Manus, Butterfly Effect, reportedly secured $75 million in funding led by U.S.-based Benchmark for a nearly $500 million valuation.

New Reasoning Model Improves Smarts of OpenAI Operator

By Paula Parisi
May 28, 2025

OpenAI has upgraded its autonomous web browsing agent Operator to the new reasoning model OpenAI o3 from the prior GPT-4o multimodal LLM engine. The update is being released globally in research preview this month for those who subscribe to OpenAI’s ChatGPT Pro for $200 per month. Operator is powered by OpenAI’s “computer-using agent” (CUA), a model trained to interact with graphical interfaces that uses the Web to perform tasks for people. “Using its own browser, it can look at a webpage, and interact with it much like a human would by typing, clicking, scrolling and more,” OpenAI explains.

Researchers Debut Preview of DeepCoder Reasoning Model

By Paula Parisi
April 15, 2025

A new open-source code reasoning model called DeepCoder-14B-Preview has hit the market. Built atop DeepSeek-R1 and Qwen2.5 using reinforcement learning (RL), it aims to provide more flexibility by combining high-performance code generation with reasoning capabilities for real-world applications. Its performance is said to be comparable to OpenAI’s o3-mini, “but with a smaller footprint,” say its developers, the research-driven AI companies Together AI and Agentica. “We democratize the recipe for training a small model into a strong competitive coder,” explains Together AI.

Non-Profit Sentient Launches New ‘Open Deep Search’ Model

By Paula Parisi
April 8, 2025

Sentient, a year-old non-profit backed by Peter Thiel’s Founders Fund, has released Open Deep Search (ODS), an open-source framework that leverages existing LLMs to enhance search and reasoning capabilities. Essentially a system of custom plugins and tools, ODS works with DeepSeek’s open-source R1 model as well as proprietary systems like OpenAI’s GPT-4o and Anthropic’s Claude to deliver advanced search functionality. That modular aspect is in fact ODS’s main innovation, its creators say, claiming it beats Perplexity and OpenAI’s GPT-4o Search Preview on benchmarks for accuracy and transparency.

Cerebras Is Moving into Mainstream with New AI Data Centers

By Paula Parisi
March 17, 2025

Cerebras Systems was founded 10 years ago on the belief that there would be a shortage of processors powerful enough to drive enterprise AI computing at scale. Its solution, the Cerebras Wafer-Scale Engine, is integrated into Cerebras’ CS-3 systems, which will power six new data centers launching this year that the company says will make it “the world’s number one provider of high-speed inference and the largest domestic high speed inference cloud.” Cerebras notes the new facilities will collectively serve over 40 million Llama 70B tokens per second to clients that now include Hugging Face and financial intelligence firm AlphaSense.

Startup Claims AI Agent Manus Is an Autonomy Breakthrough

By Paula Parisi
March 11, 2025

Butterfly Effect is the latest Chinese AI firm to get global attention, having drummed up interest in Manus, positioned as a “general agent” that can scour online resources to produce reports. Companies like OpenAI and Google are competing in this space, called deep research. Butterfly Effect says Manus has surpassed OpenAI Deep Research on the GAIA benchmark, and the world is listening. The Manus Discord server swelled to more than 138,000 members in recent weeks, and “invite codes” to gain access at this “invitation-only” phase are allegedly going for thousands of dollars on Chinese sales app Xianyu.

Perplexity Deep Research Productivity Tool Offers a Free Tier

By Paula Parisi
February 19, 2025

“Deep research” is emerging as a model trend, with Perplexity’s Deep Research launching less than three weeks after OpenAI unveiled its own ChatGPT deep research agent, which followed Google’s similar Gemini feature. As its name implies, deep research is a productivity tool, designed to save time by having an AI agent scour materials and compile data and analysis. Perplexity’s Deep Research “performs dozens of searches, reads hundreds of sources, and reasons through the material to autonomously deliver a comprehensive report,” across topics ranging “from finance and marketing to product research,” the company says.

OpenAI Previews Two New Reasoning Models: o3 and o3-Mini

By Paula Parisi
January 10, 2025

OpenAI has unveiled a new frontier model, OpenAI o3, which it claims can “reason” through challenges involving math, science and computer programming. Available to safety and research testers, it is expected to be available to individuals and businesses this year. OpenAI o3 is said to be over 20 percent more efficient at common programming tasks than its predecessor OpenAI o1 and beat a company scientist on a programming test. Model o3 is part of a broader effort to create AI systems that can reason through complex problems. In late December Google debuted a similar platform, the experimental Gemini 2.0 Flash Thinking Mode.

Nvidia’s Impressive AI Model Could Compete with Top Brands

By Paula Parisi
October 21, 2024

Nvidia has debuted a new AI model, Llama-3.1-Nemotron-70B-Instruct, that it claims is outperforming competitors GPT-4o from OpenAI and Anthropic’s Claude 3.5 Sonnet. The impressive showing has prompted speculation of an AI shakeup and a significant shift in Nvidia’s AI strategy, which has thus far been focused primarily on chipmaking. The model was quietly released on Hugging Face, and Nvidia says as of October 1 it ranked first on three top automatic alignment benchmarks, “edging out strong frontier models” and vaulting Nvidia to the forefront of the LLM field in areas like comprehension, context and generation.

AI Boom Continues to Drive Strong Nvidia Revenue and Profit

By Paula Parisi
August 30, 2024

Nvidia has had another impressive quarter. Record revenue of $30 billion in Q2 was up 122 percent from a year ago, while data center revenue of $26.3 billion marked a 154 percent increase from the same period in 2023. The performance was seen by many as an assurance of AI’s staying power, although others raised concern that if the AI companies buying chips do not start generating profits soon, the sugar high of the two-year AI boom could precede a crash. Nvidia took the occasion to tout its next-generation Blackwell chips, reassuring investors that a mid-production “tweak” would not delay release.

FCC Announces Updated Benchmark for Broadband Speeds

By ETCentric Staff
March 18, 2024

The Federal Communications Commission has updated its definition of what constitutes high-speed broadband, increasing it fourfold to download speeds of 100 megabits per second and upload speeds of 20 megabits per second from the 2015 benchmarks of 25/3 Mbps. The change is based on speeds available from Internet service providers, consumer usage patterns and federal and state programs, the FCC says. In a report assessing whether advanced telecommunications capability is being deployed “in a reasonable and timely fashion” across the U.S., the FCC concludes it is not, and that gaps in deployment are not closing rapidly enough.