Google DeepMind Unveils RT-2, Enhancing Robotics Transformer System's Ability to Execute Novel Tasks
Google DeepMind has introduced RT-2, the successor to its Robotics Transformer, improving robots' ability to learn and carry out new tasks.

Google DeepMind has announced Robotics Transformer 2 (RT-2), the successor to its original Robotics Transformer system, RT-1. The new system continues the work of the Everyday Robot initiative, which teaches robots skills such as handling objects and opening drawers.
RT-1, launched last year, was trained on a dataset of 130,000 demonstrations and taught Everyday Robot systems a range of simple tasks. According to the robotics team, it achieved a 97% success rate across more than 700 tasks.
As Vincent Vanhoucke, Distinguished Scientist and Head of Robotics at Google DeepMind, explains in a recent blog post, RT-2 goes a step further: it lets robots take lessons learned from limited datasets and apply them across a wider variety of scenarios.
According to Google, RT-2 not only shows better understanding and generalization but can also interpret and act on commands it has never seen before. The system goes beyond its original robot training data and displays basic reasoning, such as drawing inferences about object categories and following high-level descriptions. One notable result is that RT-2 can choose an appropriate tool for an entirely new task based on the contextual knowledge it already has.
Vanhoucke illustrates this with the example of throwing away trash. With earlier models, a user had to train the robot to recognize and categorize trash, and then separately teach it how to pick the trash up and dispose of it. That step-by-step approach does not scale well for systems expected to handle a wide range of tasks.
RT-2, by contrast, draws on knowledge from a large corpus of web data, so it already has a concept of what trash is and can identify it without explicit instruction, Vanhoucke explains. It also understands how to throw trash away, even though it was never specifically trained on that action. Most strikingly, RT-2 grasps the abstract nature of trash: from its vision-language training data, it infers that a used bag of chips or a banana peel counts as trash, and then acts accordingly.
The DeepMind team reports that RT-2 nearly doubles its predecessor's success rate on new tasks, raising it from 32% to 62%. Platforms like AppMaster, a no-code tool for building backend, web, and mobile applications, can help streamline the development workflow around projects of this kind, and advances like RT-2 may in turn encourage new applications of robotics across a range of sectors.


