“The next great leap for humanity will be humanoid robots,” says Rev Lebaredian, vice president of Omniverse and simulation technology at computing giant Nvidia. The springboard for that leap, expected to be one of the disruptive advances of the coming years, is already here, and Google has just joined the race by announcing Gemini Robotics, the adaptation of its artificial intelligence (AI) model for machines, both industrial and humanoid robots, which it has made available to major players in the industry, such as Apptronik, Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools, for testing.
Until now, robots have been articulated mechanisms, “blind and dumb” as the older models were described, designed to perform repetitive tasks but unable to learn, to cope with unfamiliar scenarios or to act accordingly.
For Dennis Hong, founder of RoMeLa, “the future is that robots can execute anything a human can do.” But to act like a person they need a brain that allows them to understand, learn, perceive and act. That mind is AI based on large language models (LLMs), artificial intelligence capable of taking machines to their ultimate expression: androids, robots with human-like appearance and behavior, able to operate in a world built by and for people.
Robots running Google’s artificial intelligence do not yet show skills as complex as those of Figure 01, the prototype closest to the humanoid that science fiction anticipated, backed by OpenAI, Nvidia and Jeff Bezos, the founder of Amazon.
But those equipped with Gemini Robotics are very close, after the change of course adopted in 2024. “Last year,” explains Carolina Parada, the Venezuelan-born director of engineering at Google DeepMind Robotics, “we decided to take on a new challenge and focus on teaching robots to perform complex fine-manipulation tasks, such as the ones we carry out when tying our shoelaces, using real-world and simulation data to learn.”
Out of that challenge came Gemini Robotics, the AI model aimed at developing general-purpose (humanoid) robots. “For that, they need to be really useful: to understand you, to understand the world around you, and then to be able to act safely, interactively and skillfully,” says Parada.
The laboratory tests shown, in which robots follow voice commands to pick up objects and place them in specific containers identified only by their color and whose position keeps changing, may seem simple, but for a robot they are very difficult. In this regard, Kanishka Rao, Parada’s colleague at DeepMind, points out that robots “work well in scenarios they have experienced before, but fail in unfamiliar ones.”
According to Rao, during the tests the machines have been placed in situations where the objects they must identify and manipulate change color, the environments are modified, and the AI responds to commands involving actions the machine has never performed or objects it has never seen, such as dunking a toy basketball without having previously known anything about the sport.
To achieve these skills, Parada says, the robot’s AI has to understand natural language, “understand the physical world in great detail” and, adds Vikas Sindhwani, a research scientist on the Google DeepMind robotics team, act safely through “evaluations of the properties of the scene and the consequences of performing a certain action.”
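That idea of weighing up a scene before moving can be pictured with a small, purely illustrative sketch. None of the names below come from Google; the hand-written check merely stands in for what would in practice be a learned model scoring scene properties and predicted consequences.

```python
from dataclasses import dataclass

# Hypothetical sketch: these names are placeholders, not any published API.
# A real system would use a learned model to judge scenes and consequences.

@dataclass
class SceneProperty:
    description: str        # e.g. "the cup is full of hot coffee"
    safety_relevant: bool

@dataclass
class ProposedAction:
    description: str        # e.g. "tilt the cup by 90 degrees"

def predict_consequences(action: ProposedAction,
                         scene: list[SceneProperty]) -> list[str]:
    """Stand-in for evaluating what executing the action would cause."""
    risks = []
    for prop in scene:
        if prop.safety_relevant and "tilt" in action.description:
            risks.append(f"'{action.description}' could spill: {prop.description}")
    return risks

def safe_to_execute(action: ProposedAction,
                    scene: list[SceneProperty]) -> bool:
    """Only allow actions for which no harmful consequence is predicted."""
    return not predict_consequences(action, scene)

scene = [SceneProperty("the cup is full of hot coffee", safety_relevant=True)]
action = ProposedAction("tilt the cup by 90 degrees")
print("execute" if safe_to_execute(action, scene)
      else predict_consequences(action, scene))
```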
The path to safety is still being laid. Sindhwani says they have managed to give robots a broad “understanding” of this concept from the real and simulated data that feeds their AI, but they continue to fine-tune it to “allow increasingly interactive and collaborative tasks” without risk and to comply with Isaac Asimov’s three laws: a robot must not harm a human being, through action or inaction; it must obey human orders, unless they conflict with the first law; and it must protect its own existence, unless doing so conflicts with the first or second law.
The overarching idea behind Google’s new step toward robotization is to transfer what has been achieved in the digital world, with the development of increasingly sophisticated agents (assistants), to the physical environment. “At DeepMind, we have been making progress in the way our Gemini models solve complex problems through multimodal reasoning across text, images, audio and video. Until now, however, those abilities have largely been confined to the digital realm. For AI to be useful to people in the physical realm, it has to demonstrate embodied reasoning, the human ability to understand and react to the world around us,” explains Parada.
The two Google AI models for robotics are VLA (vision-language-action), built from Gemini 2.0 with physical actions incorporated as outputs, and ER (embodied reasoning), with reasoning skills.
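As a rough, hypothetical illustration of how such a pair of models might divide the work (Google has not published this interface, and every class and function name below is a placeholder), an embodied-reasoning layer could break a spoken instruction into steps in language, while a vision-language-action layer turns each step plus the camera image into motor commands:

```python
from dataclasses import dataclass

# Hypothetical sketch only: not a published Google interface.
# EmbodiedReasoner stands in for the ER model, VisionLanguageActionModel
# for the VLA model; both are stubs where learned models would sit.

@dataclass
class Action:
    joint_targets: list[float]   # e.g. target angles for a 7-joint arm

class EmbodiedReasoner:
    """Reasons about the scene in language and breaks the task into steps."""
    def plan(self, image: bytes, instruction: str) -> list[str]:
        return [f"locate the object named in: '{instruction}'",
                "move the gripper above it",
                "grasp it",
                "place it in the requested container"]

class VisionLanguageActionModel:
    """Maps the camera image plus one language step to motor commands."""
    def act(self, image: bytes, step: str) -> Action:
        return Action(joint_targets=[0.0] * 7)  # placeholder motion

def control_loop(image: bytes, instruction: str) -> list[Action]:
    reasoner, policy = EmbodiedReasoner(), VisionLanguageActionModel()
    steps = reasoner.plan(image, instruction)      # high-level reasoning
    return [policy.act(image, s) for s in steps]   # low-level actions

actions = control_loop(b"camera frame",
                       "put the green block in the blue bowl")
print(len(actions), "actions produced")
```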
These tools pave the way to real usefulness, which Parada sums up: “AI models for robotics need three main qualities: they have to be general, that is, able to adapt to different situations; they have to be interactive, meaning they can quickly understand and respond to instructions or changes in their environment; and they have to be dexterous, meaning they can do the kinds of things people can generally do with their hands and fingers, such as carefully manipulating objects.”