Google Gemini Robotics On-Device Controls Robots Locally

Google DeepMind has released a new vision-language-action (VLA) model, Gemini Robotics On-Device, that can operate robots locally, controlling their movements without requiring an Internet connection or the cloud. Google says the software provides “general-purpose dexterity and fast task adaptation,” building on the March release of the first Gemini Robotics VLA model, which brought “Gemini 2.0’s multimodal reasoning and real-world understanding into the physical world.” Since the model operates independently of a data network, it’s useful for latency-sensitive applications as well as low- or no-connectivity environments. Google is also releasing a Gemini Robotics SDK for developers.

“In benchmarks, Google claims the model performs at a level close to the cloud-based Gemini Robotics model,” reports TechCrunch, adding that “the company says it outperforms other on-device models in general benchmarks, though it didn’t name those models.”

A Google demo embedded in a blog post shows a variety of locally run robots performing tasks that include unzipping bags and folding clothes.

Gemini Robotics On-Device is the first VLA model Google is making available to developers for fine-tuning. “While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications,” Google says, noting that the new model “quickly adapts to new tasks, with as few as 50 to 100 demonstrations — indicating how well this on-device model can generalize its foundational knowledge to new tasks.”

Google explains the On-Device model was initially trained for ALOHA industrial robots from Trossen AI, but that the company “later adapted it to work on a bi-arm Franka FR3 robot and the Apollo humanoid robot by Apptronik,” according to TechCrunch.

Google DeepMind Head of Robotics Carolina Parada says the local approach to AI robotics “could make robots more reliable in challenging situations,” according to Ars Technica, which notes that robotics present a unique challenge for AI “because, not only does the robot exist in the physical world, but it also changes its environment. Whether you’re having it move blocks around or tie your shoes, it’s hard to predict every eventuality a robot might encounter.”

While the traditional approach of training robot actions with reinforcement learning “was very slow,” generative AI improved things by allowing “for much greater generalization,” Ars Technica notes.

Google DeepMind emphasizes it is “developing all Gemini Robotics models in alignment with our AI Principles and applying a holistic safety approach spanning semantic and physical safety.”
