Artificial intelligence startup Odyssey, which turns two this year, has unveiled an interactive streaming AI video model. Available on the web in research preview, the model generates a new video frame every 40 milliseconds, producing a stream that viewers can navigate, much like interacting with a 3D-rendered video game, using a keyboard, game controller or smartphone. Odyssey describes the current experience as similar to “exploring a glitchy dream” and says that while “utility is limited for now,” the important shift is that “improvements won’t be driven by hand-built game engines, but rather by models and data.”
“We believe this shift will rapidly unlock lifelike visuals, deeper interactivity, richer physics, and entirely new experiences that just aren’t possible within traditional film and gaming,” Odyssey details in a blog post that includes a link for experimenting with the model.
In its current state, the model’s images are indeed visually crude and the onscreen navigation tools awkward to work with, but the technology shows promise.
TechCrunch describes the environments generated as “blurry and distorted, and unstable in the sense that the layouts don’t always remain the same,” but reports “the company’s promising to rapidly improve upon the model, which can currently stream video at up to 30 frames per second from clusters of Nvidia H100 GPUs at the cost of $1 to $2 per ‘user-hour.’”
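For context, those figures imply a tiny per-frame cost. The quick calculation below uses only the numbers TechCrunch quotes (30 frames per second, $1 to $2 per user-hour); the arithmetic itself is illustrative:

```python
# Per-frame cost implied by the quoted figures: 30 fps and $1-$2 per
# user-hour are TechCrunch's numbers; the arithmetic is illustrative.
fps = 30
frames_per_user_hour = fps * 60 * 60  # 108,000 frames in one user-hour
for dollars_per_hour in (1.00, 2.00):
    per_frame = dollars_per_hour / frames_per_user_hour
    print(f"${dollars_per_hour:.2f}/user-hour -> ${per_frame:.7f} per frame")
```

At those rates, each generated frame costs on the order of a thousandth of a cent.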
TechCrunch explains that Odyssey’s approach differs from what AI labs typically use for world modeling, writing that “it designed a 360-degree, backpack-mounted camera system to capture real-world landscapes, which Odyssey thinks can serve as a basis for higher-quality models than models trained solely on publicly available data.”
“Unlike traditional video models, which generate an entire clip at once, Odyssey’s world model updates the video frame by frame, constantly responding to your choices,” reports The Decoder, noting that Odyssey’s long-term goal “is to simulate visuals and actions so realistically that they’re indistinguishable from real life.”
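The Decoder’s frame-by-frame description maps onto a simple action-conditioned autoregressive loop. Below is a minimal sketch of that pattern in Python; it illustrates the idea rather than Odyssey’s actual code, and every name in it (`model`, `next_frame`, `read_input`, and so on) is a hypothetical stand-in:

```python
import time

FRAME_INTERVAL_S = 0.04  # one new frame every 40 ms, per Odyssey's figures

def interactive_stream(model, controller, display):
    """Minimal action-conditioned generation loop.

    Each frame is generated from the running state plus the viewer's
    latest input, instead of sampling a whole clip up front. `model`,
    `controller` and `display` are hypothetical stand-ins, not
    Odyssey's actual API.
    """
    state = model.initial_state()
    while True:
        action = controller.read_input()                 # keyboard / gamepad / touch
        frame, state = model.next_frame(state, action)   # autoregressive step
        display.show(frame)
        time.sleep(FRAME_INTERVAL_S)                     # pace output at ~25 fps
```

The practical consequence of this structure is that the next frame cannot be computed until the viewer’s latest input arrives, which is why per-frame latency (the 40 milliseconds cited above) matters more here than the clip lengths traditional video models are judged on.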
The result is about two and a half minutes of video generated at a time, as opposed to the 20 to 30 seconds more generalist models are capable of, The Decoder says, quoting Oliver Cameron, who co-founded the Palo Alto-based company with Jeff Hawke in late 2023.
The Verge reports that Pixar co-founder Ed Catmull is among the investors that have funded the company to the tune of $27 million.
Other companies pursuing world models include Google DeepMind, Fei-Fei Li’s World Labs, Microsoft, and Decart. “They believe that world models could one day be used to create interactive media, such as games and movies, and run realistic simulations like training environments for robots,” according to TechCrunch.