Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives

Authors: Chengyang He, Xu Liu, Gadiel Sznaier Camps, Guillaume Sartoretti, Mac Schwager

Category: cs.RO

Published: 2025-05-09

🔗 Code/Project: PROJECT_PAGE

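💡 One-Sentence Summary

Proposes the Action Lookup Table (ALT) policy as a lightweight alternative to diffusion policies for robot manipulation.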

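🎯 Matched Areas: Pillar 1: Robot Control; Pillar 2: RL Algorithms and Architecture (RL & Architecture)

Keywords: diffusion policy, action lookup table, robot manipulation, sparse data, contrastive learning, runtime monitoring, efficient inference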

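📋 Key Points

Diffusion policies perform well on high-dimensional robot manipulation tasks, but the reason for this performance remains unclear, particularly in the sparse-data regime, where there is not enough data density for the model to learn action generalization.
The proposed Action Lookup Table (ALT) policy uses a contrastive image encoder as a hash function to explicitly index the training action sequence associated with the closest training image, offering a lightweight alternative.
Experiments show that on relatively small datasets ALT matches the performance of a diffusion model while requiring only 0.0034 of the inference time and 0.0085 of the memory footprint, allowing much faster closed-loop inference.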

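📝 Abstract (Translated Summary)

Diffusion policies perform well on intricate, high-dimensional robot manipulation tasks even when trained from a small number of demonstrations. However, the reason for this performance remains unclear. This paper offers a hypothesis: diffusion policies essentially memorize an action lookup table, and this is beneficial in the sparse-data regime. The authors support this claim with empirical evidence and propose a simple alternative, the Action Lookup Table (ALT), which matches the performance of a diffusion model on small datasets while using far less inference time and memory, making it suitable for resource-constrained robots.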

🔬 Method Details

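Problem definition: The paper asks why diffusion policies perform well on high-dimensional robot manipulation tasks, especially in the sparse-data regime where there is too little data to learn action generalization. Even when conditioned on wildly out-of-distribution (OOD) images of cats and dogs, the Diffusion Policy still outputs an action sequence from the training data, which motivates an explicit runtime monitor.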

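Core idea: The Action Lookup Table (ALT) policy uses a contrastive image encoder as a hash function to explicitly index the training action sequence that corresponds to the closest training image. This recalls stored actions efficiently and offers reactivity without the need for action generalization.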

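Technical framework: The ALT policy consists of an image encoder, an action lookup module, and a runtime monitoring module. The image encoder maps the input image into a latent space, the lookup module retrieves the training action sequence whose image is closest in that latent space, and the monitoring module decides whether the input image falls outside the training distribution.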
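The lookup and monitoring steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the encoder has already produced latent vectors, uses Euclidean distance, and applies a single fixed threshold. The class name `ActionLookupTable`, the method `query`, and the parameter `ood_threshold` are hypothetical names chosen for this example.

```python
import numpy as np


class ActionLookupTable:
    """Nearest-neighbor action recall over precomputed training latents (illustrative)."""

    def __init__(self, train_latents, train_actions, ood_threshold):
        # train_latents: (N, D) encoder outputs for the N training images
        # train_actions: (N, H, A) action sequence stored with each training image
        # ood_threshold: assumed fixed distance cutoff for the OOD flag
        self.latents = np.asarray(train_latents, dtype=np.float64)
        self.actions = np.asarray(train_actions, dtype=np.float64)
        self.ood_threshold = float(ood_threshold)

    def query(self, latent):
        # Euclidean distance from the query latent to every training latent
        # (the choice of metric is an assumption of this sketch)
        diffs = self.latents - np.asarray(latent, dtype=np.float64)
        dists = np.linalg.norm(diffs, axis=1)
        idx = int(np.argmin(dists))
        # Flag the input as OOD when even the nearest training latent is too far
        is_ood = bool(dists[idx] > self.ood_threshold)
        return self.actions[idx], is_ood, float(dists[idx])


# Toy data: three training latents, each paired with a 2-step, 2-D action sequence
train_latents = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
train_actions = np.arange(12, dtype=float).reshape(3, 2, 2)
table = ActionLookupTable(train_latents, train_actions, ood_threshold=0.5)

actions_near, ood_near, dist_near = table.query([0.9, 0.1])  # close to the second latent
actions_far, ood_far, dist_far = table.query([5.0, 5.0])     # far from every latent
```

In this toy setup the first query recalls the action sequence stored with the second training latent and is not flagged, while the second query is flagged as OOD. A brute-force scan is used here for clarity; how the table is actually stored and searched is not stated in this summary.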

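Key innovation: ALT performs the action lookup explicitly, in contrast to the Diffusion Policy, which learns the same computation implicitly. This design lets ALT match diffusion-model performance in the sparse-data regime at much lower cost.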

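Key design: The image encoder is trained with a contrastive loss so that similar images lie close together in the latent space. In addition, ALT provides a simple OOD flag: when the distance in latent space between the runtime image and the training images exceeds a set threshold, the policy raises a warning, which serves as a simple but effective runtime monitor.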
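To make the contrastive objective concrete, the sketch below implements one common formulation, the margin-based pairwise contrastive loss, in NumPy. This summary does not say which contrastive loss the authors use, so the formula, the function name `pairwise_contrastive_loss`, and the `margin` parameter are all assumptions made for illustration.

```python
import numpy as np


def pairwise_contrastive_loss(z_a, z_b, is_similar, margin=1.0):
    """Margin-based contrastive loss over pairs of latent vectors (illustrative).

    z_a, z_b:    (B, D) latents for the two images in each pair
    is_similar:  (B,) 1 if the pair should be close, 0 if it should be apart
    margin:      assumed minimum distance for dissimilar pairs
    """
    z_a = np.asarray(z_a, dtype=np.float64)
    z_b = np.asarray(z_b, dtype=np.float64)
    y = np.asarray(is_similar, dtype=np.float64)
    d = np.linalg.norm(z_a - z_b, axis=1)
    pull = y * d ** 2                                    # similar pairs: penalize distance
    push = (1.0 - y) * np.maximum(0.0, margin - d) ** 2  # dissimilar pairs: penalize closeness
    return float(np.mean(pull + push))


origin = np.array([[0.0, 0.0]])
far = np.array([[3.0, 4.0]])    # distance 5 from the origin
near = np.array([[0.0, 0.5]])   # distance 0.5 from the origin

loss_similar_far = pairwise_contrastive_loss(origin, far, [1])      # penalized: should be close
loss_dissimilar_far = pairwise_contrastive_loss(origin, far, [0])   # no penalty: already apart
loss_dissimilar_near = pairwise_contrastive_loss(origin, near, [0]) # penalized: too close
```

Minimizing a loss of this kind pulls similar images together and pushes dissimilar ones at least `margin` apart, which is the property the nearest-neighbor lookup and the distance threshold both rely on.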

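📊 Experimental Highlights

On relatively small datasets, ALT matches the performance of a diffusion model while requiring only 0.0034 of the inference time and 0.0085 of the memory footprint, which allows much faster closed-loop inference. This points to ALT's potential on resource-constrained robots.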
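The two figures are fractions of the diffusion model's cost, not absolute values. A quick calculation converts them into the equivalent speedup and memory-reduction factors; the variable names are illustrative, and the factors hold only relative to the baseline used in the paper's experiments.

```python
# Fractions reported in the abstract, relative to the diffusion model
time_fraction = 0.0034
memory_fraction = 0.0085

# Invert the fractions to get "how many times smaller"
speedup = 1.0 / time_fraction             # about 294x less inference time
memory_reduction = 1.0 / memory_fraction  # about 118x less memory

print(f"speedup: {speedup:.0f}x, memory reduction: {memory_reduction:.0f}x")
```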

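🎯 Application Scenarios

Potential application areas include industrial robots, service robots, and other automated systems that need fast decision-making. ALT's low inference time and memory footprint make it a fit for resource-constrained platforms, and it may help bring capable robots to more complex tasks.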

📄 Abstract (Original)

Diffusion policies have demonstrated remarkable dexterity and robustness in intricate, high-dimensional robot manipulation tasks, while training from a small number of demonstrations. However, the reason for this performance remains a mystery. In this paper, we offer a surprising hypothesis: diffusion policies essentially memorize an action lookup table -- and this is beneficial. We posit that, at runtime, diffusion policies find the closest training image to the test image in a latent space, and recall the associated training action sequence, offering reactivity without the need for action generalization. This is effective in the sparse data regime, where there is not enough data density for the model to learn action generalization. We support this claim with systematic empirical evidence. Even when conditioned on wildly out of distribution (OOD) images of cats and dogs, the Diffusion Policy still outputs an action sequence from the training data. With this insight, we propose a simple policy, the Action Lookup Table (ALT), as a lightweight alternative to the Diffusion Policy. Our ALT policy uses a contrastive image encoder as a hash function to index the closest corresponding training action sequence, explicitly performing the computation that the Diffusion Policy implicitly learns. We show empirically that for relatively small datasets, ALT matches the performance of a diffusion model, while requiring only 0.0034 of the inference time and 0.0085 of the memory footprint, allowing for much faster closed-loop inference with resource constrained robots. We also train our ALT policy to give an explicit OOD flag when the distance between the runtime image is too far in the latent space from the training images, giving a simple but effective runtime monitor. More information can be found at: https://stanfordmsl.github.io/alt/.
