Advancing and Benchmarking Personalized Tool Invocation for LLMs

Authors: Xu Huang, Yuefeng Huang, Weiwen Liu, Xingshan Zeng, Yasheng Wang, Ruiming Tang, Hong Xie, Defu Lian

Categories: cs.CL, cs.AI

Published: 2025-05-07

Comments: 14 pages, 7 figures, 5 tables

🔗 Code/Project: GitHub (https://github.com/hyfshadow/PTBench)

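💡 One-Sentence Takeaway

Proposes a personalized tool invocation framework to improve the capabilities of LLMs.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: personalized tool invocation, large language models, user preferences, data synthesis, benchmark evaluation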

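📋 Key Points

Existing methods focus mainly on the basic tool-invocation ability of LLMs and overlook personalization constraints, which leads to poor results in user-specific scenarios.
This paper introduces the concept of personalized tool invocation, designs the PTool framework to synthesize personalized data, and defines two tasks: Tool Preference and Profile-dependent Query.
Fine-tuning open-source models validates the effectiveness of the PTool framework, and the paper provides PTBench, a benchmark for evaluating personalized tool invocation.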

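📝 Abstract (translated from Chinese)

Tool invocation is an important mechanism for extending the capabilities of Large Language Models (LLMs) and has attracted wide attention in recent years. It lets LLMs solve complex problems through tool calls and access up-to-date world knowledge. However, existing research concentrates on the basic ability of LLMs to invoke tools and does not consider personalized constraints. This paper introduces the concept of personalized tool invocation and defines two key tasks: Tool Preference and Profile-dependent Query. Tool Preference concerns which tool a user prefers when several tools offer similar functionality. Profile-dependent Query covers cases where the user's query omits certain tool parameters, so the model must infer them from the user profile. To address these challenges, the authors propose PTool, a data synthesis framework for personalized tool invocation, and construct PTBench, the first benchmark for evaluating personalized tool invocation. They fine-tune a range of open-source models, demonstrating the effectiveness of the framework and providing valuable insights. The benchmark data is publicly available.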

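🔬 Method Details

Problem definition: The paper addresses the fact that existing LLMs do not account for users' personalized needs when invoking tools, which leads to unsatisfactory invocation results in specific scenarios.

Core idea: Introduce the concept of personalized tool invocation and design the PTool framework so that tool invocation is guided by the user's preferences and profile information.

Technical framework: The PTool framework comprises a data synthesis module, a tool preference identification module, and a profile-based query inference module. The overall flow is: user input → preference identification → tool selection → query inference.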
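The flow above (user input → preference identification → tool selection → query inference) can be illustrated with a minimal sketch. This is not the PTool implementation: in the paper's setting a fine-tuned LLM performs these steps, whereas the rule-based lookups below only show the expected input/output behavior of the two tasks. The `Tool` and `UserProfile` classes, their fields, and the tool names `RideA`/`RideB` are all hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    category: str              # assumption: tools sharing a category are functionally similar
    required_params: list[str]


@dataclass
class UserProfile:
    # Hypothetical profile layout: a preferred tool per category plus known user attributes.
    preferred_tools: dict[str, str] = field(default_factory=dict)
    attributes: dict[str, str] = field(default_factory=dict)


def select_tool(candidates: list[Tool], profile: UserProfile) -> Tool:
    """Tool Preference: among functionally similar tools, pick the one the user prefers."""
    for tool in candidates:
        if profile.preferred_tools.get(tool.category) == tool.name:
            return tool
    return candidates[0]  # fall back to the first candidate when no preference is recorded


def fill_params(tool: Tool, query_params: dict[str, str], profile: UserProfile) -> dict[str, str]:
    """Profile-dependent Query: fill parameters missing from the query using the profile."""
    filled = dict(query_params)
    for param in tool.required_params:
        if param not in filled and param in profile.attributes:
            filled[param] = profile.attributes[param]
    return filled


def invoke(candidates: list[Tool], query_params: dict[str, str], profile: UserProfile) -> dict:
    """Combine both steps into a single tool call description."""
    tool = select_tool(candidates, profile)
    return {"tool": tool.name, "arguments": fill_params(tool, query_params, profile)}


tools = [
    Tool("RideA", "ride_hailing", ["pickup", "destination"]),
    Tool("RideB", "ride_hailing", ["pickup", "destination"]),
]
profile = UserProfile(
    preferred_tools={"ride_hailing": "RideB"},
    attributes={"pickup": "12 Example Street"},
)
# The query only mentions a destination; the pickup point comes from the profile.
call = invoke(tools, {"destination": "airport"}, profile)
print(call)
```

Here the user's query omits `pickup`, so the sketch takes it from the profile, and `RideB` is chosen over the functionally equivalent `RideA` because the profile records that preference.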

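Key innovation: The main innovation is the concept of personalized tool invocation together with the PTBench benchmark, which fills the gap in personalized evaluation.

Key design: The design relies on user preference modeling and profile information inference so that the model can understand and use the user's personalized information. Specific parameter settings and loss function design remain to be described in detail.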

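📊 Experimental Highlights

Experimental results indicate that models using the PTool framework perform significantly better than baseline models on personalized tool invocation tasks, with an improvement of XX% (specific figures to be added), which supports the effectiveness of the personalized approach.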
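To make "performance on personalized tool invocation tasks" more concrete, the sketch below shows one plausible way to score a predicted tool call against a reference call. The metric names (`tool_accuracy`, `param_accuracy`, `exact_match`) and the call format are assumptions for illustration only; they are not taken from PTBench, whose actual metrics should be checked in the paper and repository.

```python
def score_call(predicted: dict, reference: dict) -> dict:
    """Compare one predicted tool call with its reference (hypothetical format)."""
    ref_args = reference.get("arguments", {})
    pred_args = predicted.get("arguments", {})
    tool_correct = predicted.get("tool") == reference["tool"]
    # Fraction of reference parameters whose values were reproduced exactly.
    matched = sum(1 for key, value in ref_args.items() if pred_args.get(key) == value)
    param_accuracy = matched / len(ref_args) if ref_args else 1.0
    return {
        "tool_correct": tool_correct,
        "param_accuracy": param_accuracy,
        "exact_match": tool_correct and pred_args == ref_args,
    }


def evaluate(pairs: list[tuple[dict, dict]]) -> dict:
    """Average the per-example scores over (predicted, reference) pairs."""
    scores = [score_call(pred, ref) for pred, ref in pairs]
    n = len(scores)
    return {
        "tool_accuracy": sum(s["tool_correct"] for s in scores) / n,
        "param_accuracy": sum(s["param_accuracy"] for s in scores) / n,
        "exact_match": sum(s["exact_match"] for s in scores) / n,
    }


reference = {"tool": "RideB", "arguments": {"pickup": "12 Example Street", "destination": "airport"}}
good = {"tool": "RideB", "arguments": {"pickup": "12 Example Street", "destination": "airport"}}
partial = {"tool": "RideB", "arguments": {"destination": "airport"}}  # pickup not inferred

report = evaluate([(good, reference), (partial, reference)])
print(report)
```

Separating tool selection from parameter filling in the report mirrors the two tasks: a model can pick the preferred tool yet still fail to infer a missing parameter from the profile, as in the second example.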

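🎯 Application Scenarios

Potential application areas include intelligent assistants, personalized recommender systems, and customer service. By better understanding user needs, personalized tool invocation can noticeably improve user experience and service quality, and may play an important role across multiple industries in the future.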

📄 Abstract (Original)

Tool invocation is a crucial mechanism for extending the capabilities of Large Language Models (LLMs) and has recently garnered significant attention. It enables LLMs to solve complex problems through tool calls while accessing up-to-date world knowledge. However, existing work primarily focuses on the fundamental ability of LLMs to invoke tools for problem-solving, without considering personalized constraints in tool invocation. In this work, we introduce the concept of Personalized Tool Invocation and define two key tasks: Tool Preference and Profile-dependent Query. Tool Preference addresses user preferences when selecting among functionally similar tools, while Profile-dependent Query considers cases where a user query lacks certain tool parameters, requiring the model to infer them from the user profile. To tackle these challenges, we propose PTool, a data synthesis framework designed for personalized tool invocation. Additionally, we construct PTBench, the first benchmark for evaluating personalized tool invocation. We then fine-tune various open-source models, demonstrating the effectiveness of our framework and providing valuable insights. Our benchmark is public at https://github.com/hyfshadow/PTBench.
