SafeMo: Linguistically Grounded Unlearning for Trustworthy Text-to-Motion Generation

Authors: Yiling Wang, Zeyu Zhang, Yiran Wang, Hao Tang

Category: cs.CV

Published: 2026-01-02

🔗 Code/Project: https://github.com/AIGeeksGroup/SafeMo | https://aigeeksgroup.github.io/SafeMo

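💡 One-Sentence Summary

Proposes SafeMo to address safety problems in text-to-motion generation.

🎯 Matched area: Pillar 4: Generative Motion

Keywords: text-to-motion generation, safety, machine unlearning, generative models, kinematics, dataset construction, virtual reality, human-computer interaction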

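📋 Key Points

Existing text-to-motion generation methods have notable safety gaps: given prompts with unsafe intent, they readily generate the corresponding harmful motions.
The paper proposes the SafeMo framework, which integrates the Minimal Motion Unlearning (MMU) strategy to generate safe human motion in continuous space, avoiding the drawbacks of discrete codebooks.
Experiments show markedly stronger forgetting on unsafe prompts, while performance on safe prompts stays better than or comparable to prior work.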

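📝 Abstract (Translated Summary)

Text-to-motion (T2M) generation methods achieve strong realism and text alignment, but safety concerns have been raised in recent years. Existing methods steer the model away from unsafe behaviors by replacing entries in a discrete VQ-VAE codebook, which has two main flaws. First, the replaced entries are also reused by benign prompts, so performance on everyday tasks drifts. Second, discrete token-based methods introduce quantization and smoothness loss, producing artifacts and jerky transitions. To address these problems, the paper proposes SafeMo, a trustworthy motion generation framework that integrates Minimal Motion Unlearning (MMU), a two-stage machine unlearning strategy. SafeMo generates safe motion in continuous space, preserves continuous kinematics, and delivers a strong safety-utility trade-off. The paper also presents SafeMoVAE-29K, described as the first safe text-to-motion dataset, which combines rewritten safe text prompts with continuous refined motion. Experiments show strengthened forgetting on unsafe prompts: forget-set FID is 2.5x and 14.4x higher on HumanML3D and Motion-X, respectively, than with the previous SOTA method LCR, while performance on safe prompts is better or comparable.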

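🔬 Method Details

Problem definition: The paper targets safety in text-to-motion generation. Existing methods avoid unsafe behaviors by replacing discrete codebook entries, but this degrades performance on benign tasks and introduces artifacts and jerky transitions.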
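The jerky transitions attributed to discrete methods can be illustrated with a toy example. The sketch below is not taken from the paper or from any VQ-VAE implementation: it snaps a smooth 1-D curve (standing in for a single joint angle) to the nearest value in a small, hypothetical codebook, then compares frame-to-frame jumps. Real VQ-VAE motion models quantize latent vectors rather than raw joint angles, so this only shows the general effect of quantization.

```python
import numpy as np

def quantize(trajectory, codebook):
    """Replace each frame with its nearest codebook value (toy 1-D quantizer)."""
    dists = np.abs(trajectory[:, None] - codebook[None, :])
    nearest = dists.argmin(axis=1)
    return codebook[nearest]

# A smooth curve standing in for one joint angle over 50 frames (assumption).
t = np.linspace(0.0, 1.0, 50)
smooth = np.sin(2 * np.pi * t)

# A hypothetical codebook with only 5 allowed values.
codebook = np.linspace(-1.0, 1.0, 5)
quantized = quantize(smooth, codebook)

# Mean squared quantization error and the largest frame-to-frame jump.
quant_error = float(np.mean((smooth - quantized) ** 2))
max_step_smooth = float(np.max(np.abs(np.diff(smooth))))
max_step_quant = float(np.max(np.abs(np.diff(quantized))))

print(f"quantization MSE: {quant_error:.4f}")
print(f"largest jump, smooth: {max_step_smooth:.3f}, quantized: {max_step_quant:.3f}")
```

The quantized curve stays constant for several frames and then jumps by a full codebook step, which is the kind of discontinuity a continuous-space method avoids.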

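Core idea: SafeMo uses the Minimal Motion Unlearning (MMU) strategy to generate safe motion in continuous space. This preserves kinematic continuity and avoids the drawbacks of discrete methods, giving a good balance between safety and utility.

Technical framework: SafeMo's pipeline is described as two main stages. The first is a minimal motion unlearning stage that performs forgetting on unsafe prompts. The second is a safe motion generation stage that uses the improved generative model to produce continuous, safe motion.

Key innovation: The central contribution is the MMU strategy, which forgets unsafe prompts effectively while keeping the generative model's performance on safe prompts. Compared with existing discrete methods, it avoids quantization and smoothness loss.

Key design: SafeMo is built upon DiP. It uses an improved generative network and a loss function tuned to balance safety against generation quality. The dataset is built with rewritten safe text prompts so that the training data itself is safe.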
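The abstract does not spell out the MMU loss, so the following is only a generic illustration of how an unlearning objective can trade safety against utility. It shows a common baseline form (retain loss minus a weighted forget loss, in the style of gradient-ascent unlearning), not SafeMo's actual objective. The function names and the weight `lam` are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and target motion features."""
    return float(np.mean((pred - target) ** 2))

def unlearning_objective(pred_retain, tgt_retain, pred_forget, tgt_forget, lam=0.5):
    """Generic baseline objective (assumption, not the MMU loss).

    Minimizing this keeps the error on safe (retain) data low while pushing
    the error on unsafe (forget) data up; lam sets the trade-off.
    """
    retain_loss = mse(pred_retain, tgt_retain)
    forget_loss = mse(pred_forget, tgt_forget)
    return retain_loss - lam * forget_loss

# Toy values: perfect fit on safe data, unit error on unsafe data.
safe_target = np.ones((4, 3))
unsafe_target = np.ones((4, 3))
objective = unlearning_objective(
    pred_retain=np.ones((4, 3)), tgt_retain=safe_target,
    pred_forget=np.zeros((4, 3)), tgt_forget=unsafe_target,
    lam=0.5,
)
print(objective)
```

A larger `lam` favors forgetting over retained quality. In practice the forget term usually has to be bounded or replaced with a redirection target, since maximizing an unbounded error can destabilize training.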

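📊 Experimental Highlights

SafeMo forgets unsafe prompts more strongly than the previous SOTA human motion unlearning method, LCR: its forget-set FID is 2.5x higher on HumanML3D and 14.4x higher on Motion-X, where a higher forget-set FID indicates stronger forgetting. On safe prompts, SafeMo performs better than or comparably to existing methods.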
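To make the forget-set FID figure concrete, here is a minimal sketch of the Fréchet distance between two feature sets, each modeled as a Gaussian. The random features are stand-ins: in motion benchmarks the features come from a pretrained motion encoder, which is not reproduced here, and the function name is an assumption.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 4))        # stand-in for original unsafe-motion features
similar = rng.normal(size=(500, 4))          # a model that still reproduces them
shifted = rng.normal(loc=3.0, size=(500, 4)) # a model that has moved away from them

fid_similar = frechet_distance(reference, similar)
fid_shifted = frechet_distance(reference, shifted)
print(f"similar: {fid_similar:.3f}, shifted: {fid_shifted:.3f}")
```

On a forget set, a larger distance means the generated motions have moved further from the original unsafe motions, which is why a higher value is read as stronger forgetting.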

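🎯 Application Scenarios

SafeMo has potential applications in virtual reality, game development, and human-computer interaction. Generating human motion safely can improve user experience and reduce the production of unsafe behaviors. Going forward, the approach may help balance safety and utility in a wider range of applications.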

📄 Abstract (Original)

Text-to-motion (T2M) generation with diffusion backbones achieves strong realism and alignment. Safety concerns in T2M methods have been raised in recent years; existing methods replace discrete VQ-VAE codebook entries to steer the model away from unsafe behaviors. However, discrete codebook replacement-based methods have two critical flaws: firstly, replacing codebook entries which are reused by benign prompts leads to drifts on everyday tasks, degrading the model's benign performance; secondly, discrete token-based methods introduce quantization and smoothness loss, resulting in artifacts and jerky transitions. Moreover, existing text-to-motion datasets naturally contain unsafe intents and corresponding motions, making them unsuitable for safety-driven machine learning. To address these challenges, we propose SafeMo, a trustworthy motion generative framework integrating Minimal Motion Unlearning (MMU), a two-stage machine unlearning strategy, enabling safe human motion generation in continuous space, preserving continuous kinematics without codebook loss and delivering strong safety-utility trade-offs compared to current baselines. Additionally, we present the first safe text-to-motion dataset SafeMoVAE-29K integrating rewritten safe text prompts and continuous refined motion for trustworthy human motion unlearning. Built upon DiP, SafeMo efficiently generates safe human motions with natural transitions. Experiments demonstrate effective unlearning performance of SafeMo by showing strengthened forgetting on unsafe prompts, reaching 2.5x and 14.4x higher forget-set FID on HumanML3D and Motion-X respectively, compared to the previous SOTA human motion unlearning method LCR, with benign performance on safe prompts being better or comparable. Code: https://github.com/AIGeeksGroup/SafeMo. Website: https://aigeeksgroup.github.io/SafeMo.
