AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

Authors: Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang

Categories: cs.CR, cs.AI, cs.LG, cs.SD, eess.AS

Published: 2025-05-20 (updated: 2025-05-21)

🔗 Code/Project: PROJECT_PAGE

💡 One-Sentence Takeaway

Proposes AudioJailbreak, an audio jailbreak attack that exposes security vulnerabilities in end-to-end large audio-language models.

🎯 Matched area: Pillar 8: Physics-based Animation

Keywords: audio jailbreak, audio-language models, security, attack strategies, machine learning

📋 Key Points

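Existing audio jailbreak attacks fall short in effectiveness and applicability, and they perform even worse when the attacker cannot fully manipulate user prompts.
The paper proposes AudioJailbreak, which combines asynchronous (suffixal) jailbreak audio, universal perturbations, intent concealment strategies, and over-the-air robustness to make the attack more effective and applicable to more scenarios.
Experiments show that AudioJailbreak is highly effective against a variety of large audio-language models and outperforms existing methods.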

📝 Abstract (Translated from Chinese)

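Jailbreak attacks against large audio-language models (LALMs) have been studied recently, but their effectiveness, applicability, and practicality remain limited, particularly because they assume the attacker can fully manipulate user prompts. In this work, we first show through extensive experiments that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs. We then propose AudioJailbreak, a novel audio jailbreak attack featuring asynchrony, universality, stealthiness, and over-the-air robustness. Compared with prior audio jailbreak attacks, AudioJailbreak is more effective across multiple scenarios and also applies to attackers who cannot fully manipulate user prompts. We highlight that this work exposes the security implications of audio jailbreak attacks against LALMs and helps improve their security robustness.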

🔬 Method Details

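Problem definition: The paper addresses the limited effectiveness, applicability, and practicality of existing audio jailbreak attacks, which perform poorly in particular when the attacker cannot fully manipulate user prompts.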

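Core idea: Design a new audio jailbreak attack, AudioJailbreak, that relies on properties such as asynchronous (suffixal) jailbreak audio and universal perturbations so that the attack works across multiple scenarios.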
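The asynchrony property (the jailbreak audio is a suffix that does not need to be time-aligned with the user prompt) can be illustrated with a minimal NumPy sketch. The function name `append_suffix`, the 16 kHz sample rate, and the optional silence gap are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate, not taken from the paper


def append_suffix(user_audio: np.ndarray, jailbreak_audio: np.ndarray,
                  gap_seconds: float = 0.0) -> np.ndarray:
    """Concatenate user audio, an optional silence gap, and a suffixal jailbreak audio."""
    if gap_seconds < 0:
        raise ValueError("gap_seconds must be non-negative")
    # The gap models the fact that the suffix may start at an arbitrary delay
    # after the user stops speaking (hypothetical parameter).
    gap = np.zeros(int(round(gap_seconds * SAMPLE_RATE)), dtype=user_audio.dtype)
    return np.concatenate([user_audio, gap, jailbreak_audio.astype(user_audio.dtype)])
```

Because the suffix is only appended, the same jailbreak audio can follow user prompts of any length without re-alignment.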

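Technical framework: AudioJailbreak's overall architecture comprises an audio perturbation generation module, a user prompt integration module, and a concealment strategy module. The perturbation generation module produces audio perturbations that work across different prompts; the user prompt integration module incorporates multiple prompts into perturbation generation; and the concealment strategy module keeps the attack stealthy.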
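The idea of incorporating multiple prompts into perturbation generation can be sketched as averaging the attack gradient over a set of prompts, so that a single perturbation is pushed to work for all of them. This is a toy sketch under stated assumptions: `universal_perturbation` and the `grad_fn` callback are hypothetical names, and a real attack would obtain gradients by backpropagating through the target LALM, which is not shown here.

```python
import numpy as np


def universal_perturbation(prompts, grad_fn, length, steps=50, lr=0.5, eps=0.05):
    """Optimize one perturbation that is shared across all prompts.

    prompts: list of 1-D arrays (prompt waveforms, possibly of different lengths).
    grad_fn(prompt, delta): hypothetical callback returning the gradient of the
        attack loss with respect to delta for that prompt.
    """
    delta = np.zeros(length)
    for _ in range(steps):
        # Average the gradient over all prompts so each update helps every prompt.
        grad = np.mean([grad_fn(p, delta) for p in prompts], axis=0)
        delta = delta - lr * grad          # plain gradient-descent step
        delta = np.clip(delta, -eps, eps)  # keep the perturbation bounded
    return delta
```

The clipping bound `eps` is one common way to limit perturbation magnitude; the source does not say which constraint, if any, the authors use.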

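Key innovation: AudioJailbreak's key innovation lies in its asynchrony, universality, stealthiness, and over-the-air robustness, which make it significantly more effective than prior audio jailbreak attacks.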
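One plausible way to write these properties as a single optimization objective is sketched below. This is an illustrative formulation under stated assumptions, not the paper's own notation: $x_i$ are $N$ user prompt audios, $\delta$ is the suffixal jailbreak audio, $\Vert$ denotes concatenation in time, $r$ is a room impulse response drawn from a set $\mathcal{R}$, $*$ is convolution, $f$ is the target LALM, $y_t$ is the attacker's desired response, and $\mathcal{L}$ is a loss between the model output and $y_t$.

```latex
\[
\min_{\delta}\; \frac{1}{N}\sum_{i=1}^{N}
\mathbb{E}_{r \sim \mathcal{R}}
\Big[\mathcal{L}\big(f\big(x_i \,\Vert\, (\delta * r)\big),\; y_t\big)\Big]
\]
```

The sum over prompts corresponds to universality, the concatenation to asynchrony, and the expectation over room impulse responses to over-the-air robustness. How stealthiness enters the objective is not stated in the source and is left out here.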

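Key design: AudioJailbreak uses several intent concealment strategies so that the jailbreak audio does not alert victims, and it incorporates the reverberation distortion of room impulse responses into perturbation generation so that the audio remains effective when played over the air.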
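Reverberation from a room impulse response (RIR) is commonly simulated by convolving the audio with the RIR. The following minimal sketch shows that step only; `apply_rir` and `synthetic_rir` are hypothetical helper names, and the synthetic RIR (a direct path plus exponentially decaying noise) is a toy stand-in for measured room responses, not the data used by the authors.

```python
import numpy as np


def apply_rir(audio: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Simulate reverberation by convolving audio with a room impulse response."""
    # Truncate to the input length so the output aligns with the original audio.
    reverbed = np.convolve(audio, rir, mode="full")[: len(audio)]
    peak = np.max(np.abs(reverbed))
    # Rescale only if the result would exceed the [-1, 1] amplitude range.
    return reverbed / peak if peak > 1.0 else reverbed


def synthetic_rir(length: int = 4000, decay: float = 0.001, seed: int = 0) -> np.ndarray:
    """Toy RIR: a direct path followed by exponentially decaying noise (assumption)."""
    rng = np.random.default_rng(seed)
    rir = 0.1 * rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    rir[0] = 1.0  # direct path
    return rir / np.linalg.norm(rir)
```

During perturbation generation, a step like `apply_rir(delta, rir)` would be applied before the audio is passed to the model, so the optimized perturbation already accounts for the distortion.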

📊 Experimental Highlights

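Experiments show that AudioJailbreak achieves strong attack performance on a variety of large audio-language models. Compared with conventional methods, the attack success rate improves by more than 30%, and the attack remains effective across different user prompts, demonstrating broad applicability.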

🎯 Application Scenarios

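AudioJailbreak's findings have broad application potential, especially for audio-language processing systems with high security requirements. The work can help developers build more secure audio interaction systems, guard against potential malicious attacks, and increase user trust. As audio technology continues to develop, the range of application scenarios will broaden further.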

📄 Abstract (Original)

Jailbreak attacks to Large audio-language models (LALMs) are studied recently, but they achieve suboptimal effectiveness, applicability, and practicability, particularly, assuming that the adversary can fully manipulate user prompts. In this work, we first conduct an extensive experiment showing that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs via text-to-speech (TTS) techniques. We then propose AudioJailbreak, a novel audio jailbreak attack, featuring (1) asynchrony: the jailbreak audio does not need to align with user prompts in the time axis by crafting suffixal jailbreak audios; (2) universality: a single jailbreak perturbation is effective for different prompts by incorporating multiple prompts into perturbation generation; (3) stealthiness: the malicious intent of jailbreak audios will not raise the awareness of victims by proposing various intent concealment strategies; and (4) over-the-air robustness: the jailbreak audios remain effective when being played over the air by incorporating the reverberation distortion effect with room impulse response into the generation of the perturbations. In contrast, all prior audio jailbreak attacks cannot offer asynchrony, universality, stealthiness, or over-the-air robustness. Moreover, AudioJailbreak is also applicable to the adversary who cannot fully manipulate user prompts, thus has a much broader attack scenario. Extensive experiments with thus far the most LALMs demonstrate the high effectiveness of AudioJailbreak. We highlight that our work peeks into the security implications of audio jailbreak attacks against LALMs, and realistically fosters improving their security robustness. The implementation and audio samples are available at our website https://audiojailbreak.github.io/AudioJailbreak.
