Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer

📄 arXiv: 2510.03342v3

Authors: Gemini Robotics Team, Abbas Abdolmaleki, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Ashwin Balakrishna, Nathan Batchelor, Alex Bewley, Jeff Bingham, Michael Bloesch, Konstantinos Bousmalis, Philemon Brakel, Anthony Brohan, Thomas Buschmann, Arunkumar Byravan, Serkan Cabi, Ken Caluwaerts, Federico Casarini, Christine Chan, Oscar Chang, London Chappellet-Volpini, Jose Enrique Chen, Xi Chen, Hao-Tien Lewis Chiang, Krzysztof Choromanski, Adrian Collister, David B. D'Ambrosio, Sudeep Dasari, Todor Davchev, Meet Kirankumar Dave, Coline Devin, Norman Di Palo, Tianli Ding, Carl Doersch, Adil Dostmohamed, Yilun Du, Debidatta Dwibedi, Sathish Thoppay Egambaram, Michael Elabd, Tom Erez, Xiaolin Fang, Claudio Fantacci, Cody Fong, Erik Frey, Chuyuan Fu, Ruiqi Gao, Marissa Giustina, Keerthana Gopalakrishnan, Laura Graesser, Oliver Groth, Agrim Gupta, Roland Hafner, Steven Hansen, Leonard Hasenclever, Sam Haves, Nicolas Heess, Brandon Hernaez, Alex Hofer, Jasmine Hsu, Lu Huang, Sandy H. Huang, Atil Iscen, Mithun George Jacob, Deepali Jain, Sally Jesmonth, Abhishek Jindal, Ryan Julian, Dmitry Kalashnikov, M. Emre Karagozler, Stefani Karp, Matija Kecman, J. Chase Kew, Donnie Kim, Frank Kim, Junkyung Kim, Thomas Kipf, Sean Kirmani, Ksenia Konyushkova, Li Yang Ku, Yuheng Kuang, Thomas Lampe, Antoine Laurens, Tuan Anh Le, Isabel Leal, Alex X. Lee, Tsang-Wei Edward Lee, Guy Lever, Jacky Liang, Li-Heng Lin, Fangchen Liu, Shangbang Long, Caden Lu, Sharath Maddineni, Anirudha Majumdar, Kevis-Kokitsi Maninis, Andrew Marmon, Sergio Martinez, Assaf Hurwitz Michaely, Niko Milonopoulos, Joss Moore, Robert Moreno, Michael Neunert, Francesco Nori, Joy Ortiz, Kenneth Oslund, Carolina Parada, Emilio Parisotto, Amaris Paryag, Acorn Pooley, Thomas Power, Alessio Quaglino, Haroon Qureshi, Rajkumar Vasudeva Raju, Helen Ran, Dushyant Rao, Kanishka Rao, Isaac Reid, David Rendleman, Krista Reymann, Miguel Rivas, Francesco Romano, Yulia Rubanova, Peter Pastor Sampedro, Pannag R Sanketi, Dhruv Shah, Mohit Sharma, Kathryn Shea, Mohit Shridhar, Charles Shu, Vikas Sindhwani, Sumeet Singh, Radu Soricut, Rachel Sterneck, Ian Storz, Razvan Surdulescu, Jie Tan, Jonathan Tompson, Saran Tunyasuvunakool, Jake Varley, Grace Vesom, Giulia Vezzani, Maria Bauza Villalonga, Oriol Vinyals, René Wagner, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Chengda Wu, Markus Wulfmeier, Fei Xia, Ted Xiao, Annie Xie, Jinyu Xie, Peng Xu, Sichun Xu, Ying Xu, Zhuo Xu, Jimmy Yan, Sherry Yang, Skye Yang, Yuxiang Yang, Hiu Hong Yu, Wenhao Yu, Wentao Yuan, Yuan Yuan, Jingwei Zhang, Tingnan Zhang, Zhiyuan Zhang, Allan Zhou, Guangyao Zhou, Yuxiang Zhou

Category: cs.RO

Published: 2025-10-02 (updated: 2025-11-28)


💡 One-Sentence Summary

Gemini Robotics 1.5: advancing the frontier of generalist robots through embodied reasoning, thinking, and motion transfer

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: generalist robots, embodied reasoning, vision-language-action models, motion transfer, multi-level reasoning

📋 Key Points

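  1. General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general, dexterous control.
  2. Gemini Robotics 1.5 uses a novel architecture and a Motion Transfer mechanism to learn from heterogeneous, multi-embodiment robot data, which makes the VLA model more general.
  3. Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process, letting the robot "think before acting" so that it can better decompose and execute complex tasks.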

📝 Abstract (Translation)

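This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. It brings together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enables it to learn from heterogeneous, multi-embodiment robot data and makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This lets the robot "think before acting", notably improves its ability to decompose and execute complex, multi-step tasks, and makes its behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state of the art for embodied reasoning, that is, the reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models is a step towards an era of physical agents, in which robots perceive, think, and then act in order to solve complex multi-step tasks.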

🔬 Method Details

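Problem definition: Existing general-purpose robots lack sufficient understanding of the physical world, advanced reasoning, and general, dexterous control, so they struggle with complex multi-step tasks. Existing methods generalize poorly when trained on heterogeneous, multi-embodiment robot data, and they lack an effective internal reasoning mechanism, which limits task decomposition and execution.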

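Core idea: Gemini Robotics 1.5 improves generality and task execution by introducing a Motion Transfer mechanism and a multi-level internal reasoning process. Motion Transfer lets the model learn from heterogeneous data, while internal reasoning lets the robot think and plan before it acts.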

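Technical framework: Gemini Robotics 1.5 is a multi-embodiment Vision-Language-Action (VLA) model. Functionally, it combines four capabilities: visual perception, which takes in visual information from the robot's sensors; language understanding, which parses the user's instruction; action generation, which produces the action sequence the robot executes; and internal reasoning, which decomposes and plans the task before actions are generated. Gemini Robotics-ER 1.5, in turn, focuses on embodied reasoning, including visual and spatial understanding, task planning, and progress estimation.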
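To make the "think before acting" flow concrete, here is a minimal sketch of a loop that interleaves a natural-language thought with each action. It is not the Gemini Robotics implementation, whose internals this summary does not describe: `ThinkThenActLoop`, `Step`, `toy_decompose`, and `toy_policy` are hypothetical names, and the planner and policy are trivial string functions standing in for learned models.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    subgoal: str   # one piece of the decomposed task
    thought: str   # natural-language reasoning recorded before acting
    action: str    # command issued after the thought


@dataclass
class ThinkThenActLoop:
    """Toy think-before-acting loop; every component here is hypothetical."""
    decompose: Callable[[str], List[str]]   # instruction -> ordered sub-goals
    policy: Callable[[str, str], str]       # (observation, thought) -> action
    trace: List[Step] = field(default_factory=list)

    def run(self, instruction: str, observe: Callable[[], str]) -> List[Step]:
        for subgoal in self.decompose(instruction):
            observation = observe()  # fresh observation before each step
            thought = f"I see {observation}; next I should {subgoal}."
            action = self.policy(observation, thought)
            self.trace.append(Step(subgoal, thought, action))
        return self.trace


def toy_decompose(instruction: str) -> List[str]:
    # Stand-in for task planning: split the instruction on " then ".
    return [part.strip() for part in instruction.split(" then ")]


def toy_policy(observation: str, thought: str) -> str:
    # Stand-in for action generation: echo the planned sub-goal as a command.
    planned = thought.split("next I should ", 1)[1].rstrip(".")
    return f"EXECUTE({planned})"


if __name__ == "__main__":
    loop = ThinkThenActLoop(toy_decompose, toy_policy)
    for step in loop.run("pick up the cup then place it on the shelf",
                         observe=lambda: "a table with a cup"):
        print(step.thought, "->", step.action)
```

Keeping the thought alongside each action in `trace` mirrors the interpretability point in the abstract: a user can read why each action was taken.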

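Key innovations: The key innovations in Gemini Robotics 1.5 are its Motion Transfer (MT) mechanism and its multi-level internal reasoning process. MT lets the model learn from data collected on different types of robots, which improves generalization. Multi-level internal reasoning lets the robot think and plan before executing, so it can better decompose and carry out complex tasks.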

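Key design: The specific technical details of Gemini Robotics 1.5 (loss functions, network structure, and so on) are not covered in the material summarized here and are treated as unknown. As speculation only, the Motion Transfer mechanism may draw on techniques such as domain adaptation or meta-learning, and the internal reasoning process may draw on reinforcement learning or planning algorithms.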

📊 Experimental Highlights

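Gemini Robotics 1.5 is reported to improve markedly on multiple embodied reasoning and task-execution benchmarks. Specific figures and comparison baselines are not available in this summary. The abstract does state that Gemini Robotics-ER 1.5 establishes a new state of the art for embodied reasoning, which points to strong results in visual and spatial understanding, task planning, and progress estimation.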

🎯 Application Scenarios

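This work could apply to settings that call for general-purpose robots, such as household services, industrial automation, and medical assistance. Better perception, reasoning, and control help a robot understand user instructions and carry out complex tasks, which can raise productivity and service quality. In the longer term, the technology may help robots spread across more industries.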

📄 Abstract (Original)

General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. We are bringing together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enables it to learn from heterogeneous, multi-embodiment robot data and makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This enables the robot to "think before acting" and notably improves its ability to decompose and execute complex, multi-step tasks, and also makes the robot's behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state-of-the-art for embodied reasoning, i.e., for reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents, enabling robots to perceive, think and then act so they can solve complex multi-step tasks.