By Paula Parisi, November 11, 2025
Sony AI has introduced the Fair Human-Centric Image Benchmark (FHIBE, pronounced “Fee-bee”), a new global benchmark for fairness evaluation in computer vision models. FHIBE addresses the industry challenge of identifying biased and ethically compromised training data for AI, aiming to trigger “industry-wide improvements for responsible and ethical protocols throughout the entire life span of data — from sourcing and management to utilization — including fair compensation for participants and clear consent mechanisms,” Sony AI says. The FHIBE dataset is publicly available now, following publication in the science journal Nature. Continue reading Sony Debuts Benchmark for Measuring Computer Vision Bias
By Paula Parisi, August 21, 2025
This week, DeepSeek-V3.1 dropped on Hugging Face. Media outlets immediately began citing benchmark scores that rival those of proprietary systems from OpenAI and Anthropic, which is notable for a model released under a permissive license that allows wide access. The 685-billion-parameter Mixture-of-Experts (MoE) model activates 37 billion parameters per token and is designed for efficiency. It builds on techniques used in earlier DeepSeek models, such as multi-head latent attention (MLA) and multi-token prediction (MTP), to optimize inference, allowing it to run on enterprise servers loaded with H100 GPUs as well as on consumer hardware like a Mac Studio or a comparably powered PC. Continue reading DeepSeek-V3.1 Offered with Improvements in Speed, Context
By Paula Parisi, August 11, 2025
Anthropic has released Claude Opus 4.1, an upgrade to Opus 4 that reportedly improves on agentic tasks, computer coding and reasoning. Pricing is unchanged from what customers were paying for Opus 4, and the company promises "substantially larger improvements to our models in the coming weeks." The move comes as Anthropic nears a new funding round targeting $3 to $5 billion, which could value the startup at up to $170 billion. Recurring revenue reached $5 billion as of late July and could rise to $9 billion by the end of the year. Claude Opus 4.1 was released two days before OpenAI unleashed GPT-5, and it performs comparably to GPT-5 on coding benchmarks. Continue reading Anthropic Seeks to Raise $5 Billion, Debuts Claude Opus 4.1
By Paula Parisi, June 9, 2025
China's Manus AI has unveiled a text-to-video generator that it says can transform "prompts into complete stories — structured, sequenced, and ready to watch. With a single prompt, Manus plans each scene, crafts the visuals, and animates your vision," the company announced last week. Manus generated buzz in March for its agentic approach to AI, and it is now applying that autonomous technology to video generation, promising complete stories within minutes. Last month, Butterfly Effect, the firm that developed Manus, reportedly secured $75 million in funding led by U.S.-based Benchmark at a valuation of nearly $500 million. Continue reading Manus AI Takes an Agentic Approach with Its Video Generator
By Paula Parisi, May 28, 2025
OpenAI has upgraded the engine behind Operator, its autonomous web-browsing agent, from the GPT-4o multimodal LLM to its newer reasoning model, OpenAI o3. The update is rolling out globally in research preview this month to subscribers of ChatGPT Pro, which costs $200 per month. Operator is powered by OpenAI's "computer-using agent" (CUA), a model trained to interact with graphical interfaces and use the Web to perform tasks for people. "Using its own browser, it can look at a webpage, and interact with it much like a human would by typing, clicking, scrolling and more," OpenAI explains. Continue reading New Reasoning Model Improves Smarts of OpenAI Operator
By Paula Parisi, April 15, 2025
A new open-source code reasoning model called DeepCoder-14B-Preview has hit the market. Built on DeepSeek-R1 and Qwen2.5 and trained with reinforcement learning (RL), it aims to provide more flexibility by combining high-performance code generation with reasoning capabilities for real-world applications. Its performance is said to be comparable to OpenAI's o3-mini, "but with a smaller footprint," according to its developers, the research-driven AI companies Together AI and Agentica. "We democratize the recipe for training a small model into a strong competitive coder," explains Together AI. Continue reading Researchers Debut Preview of DeepCoder Reasoning Model
By Paula Parisi, April 8, 2025
Sentient, a year-old non-profit backed by Peter Thiel's Founders Fund, has released Open Deep Search (ODS), an open-source framework that uses existing LLMs to enhance search and reasoning capabilities. Essentially a system of custom plugins and tools, ODS works with DeepSeek's open-source R1 model as well as proprietary systems like OpenAI's GPT-4o and Anthropic's Claude to deliver advanced search functionality. That modularity is ODS's main innovation, its creators say, and they claim it beats Perplexity and OpenAI's GPT-4o Search Preview on benchmarks for accuracy and transparency. Continue reading Non-Profit Sentient Launches New 'Open Deep Search' Model
By Paula Parisi, March 17, 2025
Cerebras Systems was founded 10 years ago on the belief that there would be a shortage of processors powerful enough to drive enterprise AI computing at scale. Its answer, the Cerebras Wafer-Scale Engine, is integrated into the company's CS-3 systems, which will power six new data centers launching this year. The company says these will make it "the world's number one provider of high-speed inference and the largest domestic high speed inference cloud." Cerebras notes that the new facilities will collectively serve over 40 million Llama 70B tokens per second to clients that now include Hugging Face and financial intelligence firm AlphaSense. Continue reading Cerebras Is Moving into Mainstream with New AI Data Centers
By Paula Parisi, March 11, 2025
Butterfly Effect is the latest Chinese AI firm to get global attention, having drummed up interest in Manus, positioned as a "general agent" that can scour online resources to produce reports. Companies like OpenAI and Google are competing in this space, known as deep research. Butterfly Effect says Manus has surpassed OpenAI Deep Research on the GAIA benchmark, and the world is listening. The Manus Discord server swelled to more than 138,000 members in recent weeks, and "invite codes" for access during this "invitation-only" phase are allegedly selling for thousands of dollars on the Chinese resale app Xianyu. Continue reading Startup Claims AI Agent Manus Is an Autonomy Breakthrough
By Paula Parisi, February 19, 2025
"Deep research" is emerging as an AI product trend: Perplexity's Deep Research launched less than three weeks after OpenAI unveiled its own ChatGPT deep research agent, which in turn followed a similar Gemini feature from Google. As its name implies, deep research is a productivity tool, designed to save time by having an AI agent scour materials and compile data and analysis. Perplexity's Deep Research "performs dozens of searches, reads hundreds of sources, and reasons through the material to autonomously deliver a comprehensive report," across topics ranging "from finance and marketing to product research," the company says. Continue reading Perplexity Deep Research Productivity Tool Offers a Free Tier
By Paula Parisi, January 10, 2025
OpenAI has unveiled a new frontier model, OpenAI o3, which it claims can "reason" through challenges involving math, science and computer programming. Currently open to safety and research testers, it is expected to reach individuals and businesses this year. OpenAI o3 is said to be over 20 percent more efficient at common programming tasks than its predecessor, OpenAI o1, and to have beaten a company scientist on a programming test. Model o3 is part of a broader effort to create AI systems that can reason through complex problems. In late December, Google debuted a similar platform, the experimental Gemini 2.0 Flash Thinking Mode. Continue reading OpenAI Previews Two New Reasoning Models: o3 and o3-Mini
By Paula Parisi, October 21, 2024
Nvidia has debuted a new AI model, Llama-3.1-Nemotron-70B-Instruct, that it claims outperforms competitors OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet. The impressive showing has prompted speculation about an AI shakeup and a significant shift in Nvidia's AI strategy, which has thus far focused primarily on chipmaking. The model was quietly released on Hugging Face, and Nvidia says that as of October 1 it ranked first on three top automatic alignment benchmarks, "edging out strong frontier models" and vaulting Nvidia to the forefront of the LLM field in areas like comprehension, context and generation. Continue reading Nvidia's Impressive AI Model Could Compete with Top Brands
By Paula Parisi, August 30, 2024
Nvidia has had another impressive quarter. Record revenue of $30 billion in Q2 was up 122 percent from a year ago, while data center revenue of $26.3 billion marked a 154 percent increase from the same period in 2023. The performance was seen by many as an assurance of AI’s staying power, although others raised concern that if the AI companies buying chips do not start generating profits soon, the sugar high of the two-year AI boom could precede a crash. Nvidia took the occasion to tout its next-generation Blackwell chips, reassuring investors that a mid-production “tweak” would not delay release. Continue reading AI Boom Continues to Drive Strong Nvidia Revenue and Profit
By ETCentric Staff, March 18, 2024
The Federal Communications Commission has updated its definition of high-speed broadband, raising it to download speeds of 100 megabits per second and upload speeds of 20 megabits per second, up from the 2015 benchmark of 25/3 Mbps. The download threshold alone is four times higher. The change is based on speeds available from Internet service providers, consumer usage patterns and federal and state programs, the FCC says. In a report assessing whether advanced telecommunications capability is being deployed "in a reasonable and timely fashion" across the U.S., the FCC concludes that it is not, and that gaps in deployment are not closing rapidly enough. Continue reading FCC Announces Updated Benchmark for Broadband Speeds
By ETCentric Staff, March 13, 2024
The Apple WebKit team introduced the first version of the Speedometer benchmark in 2014. Since then, it has become an industry-wide tool for gauging browser optimization and performance, even as some stakeholders complained that, having been developed within the Apple ecosystem, it could not help but exhibit systemic biases favoring Safari. So Microsoft, Google and Mozilla joined Apple to create Speedometer 3.0, "a new governance benchmark" that aims for neutrality across the architectures used by Google Chrome, Microsoft Edge and Mozilla's Firefox. Continue reading Apple, Google, Microsoft, Mozilla Team on Speedometer 3.0