By Paula Parisi, September 29, 2025
Google has launched its Gemini-powered AI search tool Search Live in the United States. The mobile integration for iOS and Android can look at the world through your phone’s camera and respond to questions conversationally, in real time, while also offering helpful Web links for a deeper dive. “Just open the Google app and tap the new Live icon under the search bar,” Google explains. Camera sharing will be activated by default and the app also accepts video input. If you’re already pointing your camera with Google Lens, you can select the Live option at the bottom of the screen. Continue reading Google Launches Conversational ‘Search Live’ for U.S. Mobile
By Paula Parisi, August 15, 2025
Google DeepMind has unveiled Genie 3, a world-building model that uses text and image prompts to generate 3D environments in real time. Still in research preview, Genie 3 can output “several minutes” of video that can be navigated in real time at 24fps and a resolution of 720p. Because it remembers the rules of the world it creates, Genie 3 allows agents to predict how the environment evolves and how actions affect it. Google says world models are “a key steppingstone” to artificial general intelligence, or AGI, since they can train AI agents in “an unlimited curriculum of rich simulation.” Continue reading Genie 3 World Model Produces Minutes of Video in Real Time
By Paula Parisi, July 24, 2025
Startup Decart AI is showcasing MirageLSD, a “world transformation model” that can change the look of a camera feed, recorded video or game in real time. Built on the company’s Live-Stream Diffusion (LSD) model, Mirage debuted last week as a demo on the company website with iOS and Android apps scheduled for release this week. Mirage makes it possible to manipulate video continuously, in real time with zero latency. The technology has created buzz as a potential disruptor in the live-streaming space, and it looks like it could be an impactful special effects tool as well. Continue reading Decart AI’s Mirage Transforms Live-Stream Video in Real Time
By Paula Parisi, March 27, 2025
OpenAI has activated the multimodal image generation capabilities of GPT-4o, making it available to ChatGPT users on the Plus, Pro, Team and Free tiers. It replaces DALL-E 3 as the default image generator for the popular chatbot. GPT-4o’s accuracy with text, understanding of symbols and precision with prompts, combined with multimodal capabilities that allow the model to take cues from visual material, have transformed its image output from largely unpredictable to “consistent and context-aware,” resulting in “a practical tool with precision and power,” claims OpenAI. Continue reading OpenAI Delivers Native GPT-4o Image Generator to ChatGPT
By Paula Parisi, March 25, 2025
Google has added a Canvas feature to its Gemini AI chatbot that provides users with a real-time collaborative space where they can refine writing and coding projects and iterate on and share other ideas. “Canvas is designed for seamless collaboration with Gemini,” according to Gemini Product Director Dave Citron, who notes that Canvas makes it “an even more effective collaborator” in helping bring ideas to life. The move marks a trend whereby AI companies are trying to turn chatbot platforms into turnkey productivity suites. Google is launching a limited release of Gemini Live Video in addition to bringing its Audio Overview feature of NotebookLM to Gemini. Continue reading Canvas and Live Video Add Productivity Features to Gemini AI
By Paula Parisi, March 25, 2025
Anthropic’s Claude can now search the Internet in real time, allowing it to provide timely and relevant responses that are also more accurate than what the chatbot previously offered, according to the company. Claude incorporates direct citations for its Web-retrieved material, so users can fact-check its sources. “Instead of finding search results yourself, Claude processes and delivers relevant sources in a conversational format,” Anthropic explains. While this is not exactly groundbreaking — ChatGPT, Grok 3, Copilot, Perplexity and Gemini all have real-time Web retrieval and most include citations — Claude takes a slightly different approach. Continue reading Real-Time Web Access Informs Claude 3.7 Sonnet Responses
By Paula Parisi, February 4, 2025
ChatGPT has a new “deep research” agent that OpenAI says uses reasoning to synthesize large amounts of online information and complete multi-step research tasks. “It accomplishes in tens of minutes what would take a human many hours,” OpenAI suggests, claiming it will “synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst.” Powered by a version of the upcoming OpenAI o3 model optimized for web browsing and data analysis, the company says the deep research agent will typically take 5 to 30 minutes to complete its work. The agent is described as an ideal research tool for areas such as finance, science and engineering. Continue reading ChatGPT ‘Deep Research’ Agent Can Create Detailed Reports
By George Gerba, January 10, 2025
During CES this week, Sony demonstrated a proof-of-concept experience based on the popular HBO post-apocalyptic drama “The Last of Us.” We were dropped into a six-person pod of newly enlisted defenders and assigned to a hardened veteran who was convinced that, with our assistance, a serious surge of zombie assaults could be overcome. Armed with LED-enabled shotgun-like devices and tracked flashlights to help discover the concealed attackers, we followed our combat leader’s sharp, direct commands as she guided us through the terrors of the attack. Continue reading CES: Sony Introduces Interactive Experience – ‘The Last of Us’
By Douglas Chan, January 8, 2025
Nvidia founder and CEO Jensen Huang kicked off CES 2025 with a keynote that was filled with new product announcements and visionary demonstrations of how the company plans to advance the field of AI. The first product that Huang unveiled was the GeForce RTX 50 series of consumer graphics processing units (GPUs). The series is also called RTX Blackwell because it is based on Nvidia’s latest Blackwell microarchitecture design for next generation data center and gaming applications. To showcase RTX Blackwell’s prowess, Huang played an impressively photorealistic video sequence of rich imagery under contrasting light ranges — all rendered in real time. Continue reading CES: Nvidia Unveils New GeForce RTX 50, AI Video Rendering
By Paula Parisi, December 4, 2024
Artificial voice startup Hume AI has had a busy Q4, introducing Voice Control, a no-code artificial speech interface that gives users control over 10 voice dimensions ranging from “assertiveness” to “buoyancy” and “nasality.” The company also debuted an interface that “creates emotionally intelligent voice interactions” with Anthropic’s foundation model Claude that has prompted one observer to ponder the possibility that keyboards will become a thing of the past when it comes to controlling computers. Both advances expand on Hume’s work with its own foundation model, Empathic Voice Interface 2 (EVI 2), which adds emotional timbre to AI voices. Continue reading Hume AI Introduces Voice Control and Claude Interoperability
By Paula Parisi, November 5, 2024
D-ID has launched two new types of AI-powered avatars: Premium+ and Express. The company’s video-to-video avatar tools aim to provide personal look-alikes that can sub for their creators in uses ranging from instructional videos to business presentations, offloading on-camera duties in areas including sales, marketing and customer support. “Premium+ Avatars can generate hyper-realistic digital humans that are indistinguishable from real people and will serve as the foundation for fully interactive digital agents revolutionizing how brands communicate,” the company says, while Express Avatars can rapidly generate serviceable avatars “from just one minute of source footage.” Continue reading D-ID’s New Business-Use Avatars Can Converse in Real Time
By Paula Parisi, June 25, 2024
OpenAI has acquired Rockset, a database firm that provides real-time analytics, indexing and search capabilities. Rockset will help OpenAI enable its customers to better leverage their own data as they build and utilize intelligent applications. Rockset technology will be integrated into the retrieval infrastructure across OpenAI products, with members of Rockset’s San Mateo, California-based team joining the staff of OpenAI, which is headquartered in San Francisco. This is the second major purchase for OpenAI, following last year’s acquisition of New York-based AI design studio Global Illumination. Financial terms of the deal were not disclosed. Continue reading OpenAI to Expand Data Indexing, Analysis with Rockset Tech
By Paula Parisi, June 24, 2024
Snap Inc. teased a new on-device AI model capable of real-time filter creation in-app using Snapchat. At last week’s Augmented World Expo in Long Beach, California, Snap co-founder and CTO Bobby Murphy explained that the model, which runs on smartphones, can re-render frames on the fly guided by text prompts. Snap’s unnamed prototype model “can instantly bring your imagination to life in AR,” Snap says, explaining “this early prototype makes it possible to type in an idea for a transformation and generate vivid AR experiences in real time.” Continue reading Snapchat Previews Instant AR Filters, GenAI Developer Tools
By Paula Parisi, December 5, 2023
The research division of Meta AI has developed Seamless Communication, a suite of artificial intelligence models that generate what the company says is natural and authentic communication across languages, facilitating what amounts to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges capabilities from a trio of models — SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2 — into a single system that can translate between almost 100 spoken and written languages, preserving idioms, emotion and the speaker’s vocal style, Meta says. Continue reading Meta AI Seamless Translator Converts Nearly 100 Languages
By Paul Bennun, December 4, 2023
Stability AI, developer of Stable Diffusion (one of the leading visual content generators, alongside Midjourney and DALL-E), has introduced SDXL Turbo — a new AI model that demonstrates more of the latent possibilities of the common diffusion generation approach: images that update in real time as the user’s prompt updates. This feature was always a possibility even with previous diffusion models, given that text and images are comprehended differently across linear time, but the increased efficiency of generation algorithms and the steady accretion of GPUs and TPUs in developers’ data centers make the experience more magical. Continue reading Stability AI Intros Real-Time Text-to-Image Generation Model