Comparative Analysis of Large Language Models for the Machine-Assisted Resolution of User Intentions
Authors: Justus Flerlage, Alexander Acker, Odej Kao
Categories: cs.SE, cs.AI, cs.CL, cs.HC
Published: 2025-08-29 (updated: 2025-11-11)
Comments: Accepted at First International Workshop on Human-AI Collaborative Systems (HAIC), published in CEUR-WS.org Vol-4072 (2025). URN: urn:nbn:de:0074-4072-x
💡 One-Sentence Takeaway
Compares open-source and open-access LLMs against GPT-4 for resolving user intentions into executable workflows.
🎯 Matched Domain: Pillar 9: Embodied Foundation Models
Keywords: large language models, user intention resolution, open-source models, natural language processing, privacy preservation, local deployment, intelligent assistants
📋 Key Points
- Existing cloud-based proprietary models are limited in terms of privacy and autonomy and cannot satisfy users' needs for local deployment.
- The study evaluates open-source and open-access LLMs for resolving user intentions locally, emphasizing their importance for future intent-based operating systems.
- In a comparison with GPT-4-based systems, the study finds that open-source LLMs are viable and promising for generating user workflows, and provides empirical support for this claim.
📝 Abstract (Summary)
Large Language Models (LLMs) have become transformative tools for natural language understanding and user intention resolution, supporting tasks such as translation and summarization and, increasingly, the orchestration of complex workflows. This development marks a shift from conventional graphical user interfaces toward intuitive, language-first interaction paradigms. However, existing implementations often rely on cloud-based proprietary models, which introduces limitations in privacy, autonomy, and scalability. To make language-first interaction robust, this study evaluates the capabilities of several open-source and open-access models for user intention resolution and compares them against OpenAI's GPT-4-based system, providing empirical insights into open-source LLMs as foundational components of future intent-driven operating systems.
🔬 Method Details
Problem definition: The study addresses the privacy and autonomy shortcomings of existing cloud-based proprietary models and explores the potential of locally deployed open-source LLMs for user intention resolution.
Core idea: The paper evaluates open-source and open-access LLMs for resolving user intentions locally, highlighting their importance for future intent-based operating systems. In this way, users can interact with LLMs through natural language without relying on cloud services.
Technical framework: The overall architecture comprises several modules: user input parsing, intention recognition, workflow generation, and execution. A user states an intention in natural language; the system parses it, generates a corresponding workflow, and orchestrates actions across multiple applications (see the sketch after this section).
Key innovation: The central contribution is the evaluation of open-source LLMs against a proprietary baseline (GPT-4), providing an empirical performance comparison and underscoring the privacy and autonomy advantages of open models.
Key design: The experiments rely on specific parameter settings to optimize the models' performance on user intention resolution; the exact setup is not disclosed in this summary and requires consulting the full paper.
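Below is a minimal sketch of the intent-to-workflow pipeline outlined above. It assumes a locally running, OpenAI-compatible chat-completion endpoint serving an open model; the endpoint URL, model name, and the JSON workflow schema are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: resolve a natural-language intention into a workflow via a
# locally served open LLM. URL, model name, and workflow schema are assumed.
import json
import requests

LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # hypothetical local server
MODEL_NAME = "open-model-placeholder"                          # any locally served open LLM

SYSTEM_PROMPT = (
    "Translate the user's intention into a JSON workflow of the form "
    '{"intent": str, "steps": [{"application": str, "action": str, "arguments": dict}]}. '
    "Return only JSON."
)

def resolve_intention(utterance: str) -> dict:
    """Send the utterance to the local model and parse the proposed workflow."""
    response = requests.post(
        LOCAL_ENDPOINT,
        json={
            "model": MODEL_NAME,
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": utterance},
            ],
            "temperature": 0.2,
        },
        timeout=120,
    )
    response.raise_for_status()
    content = response.json()["choices"][0]["message"]["content"]
    return json.loads(content)  # dict with "intent" and ordered "steps"

def execute_workflow(workflow: dict) -> None:
    """Placeholder executor: a real system would dispatch each step to the target application."""
    for step in workflow["steps"]:
        print(f'{step["application"]}: {step["action"]}({step.get("arguments", {})})')

if __name__ == "__main__":
    wf = resolve_intention("Book a table for two tomorrow evening and add it to my calendar")
    execute_workflow(wf)
```

Constraining the model to a fixed JSON schema keeps the executor model-agnostic, so the same pipeline can be pointed at different locally served models for comparison.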
📊 Experimental Highlights
The experiments indicate that open-source LLMs can generate user workflows effectively; compared with OpenAI's GPT-4-based system, open models offer clear advantages in privacy and autonomy. Specific performance figures are not reported in this summary, but the study provides empirical support for the practical viability of open models in real applications.
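As an illustration of how such a comparison could be scored, the sketch below rates each model's generated workflow against a reference workflow by step overlap. The metric, model names, and reference workflow are assumptions made for illustration; they are not the paper's evaluation protocol.

```python
# Illustrative comparison harness (not the paper's protocol): score candidate
# workflows against a reference workflow by the fraction of reference steps covered.
from typing import Dict, List

def step_signature(step: Dict) -> str:
    """Reduce a workflow step to a comparable (application, action) signature."""
    return f'{step.get("application", "")}::{step.get("action", "")}'

def workflow_overlap(candidate: List[Dict], reference: List[Dict]) -> float:
    """Fraction of reference steps that also appear in the candidate workflow."""
    if not reference:
        return 0.0
    covered = {step_signature(s) for s in candidate}
    hits = sum(1 for s in reference if step_signature(s) in covered)
    return hits / len(reference)

# Example with hypothetical open-model outputs compared against a GPT-4-style reference.
reference_steps = [
    {"application": "browser", "action": "search_restaurant"},
    {"application": "calendar", "action": "create_event"},
]
candidates = {
    "open-model-a": [
        {"application": "browser", "action": "search_restaurant"},
        {"application": "calendar", "action": "create_event"},
    ],
    "open-model-b": [
        {"application": "maps", "action": "find_nearby"},
    ],
}
for name, steps in candidates.items():
    print(f"{name}: overlap = {workflow_overlap(steps, reference_steps):.2f}")
```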
🎯 Application Scenarios
Potential application areas include intelligent assistants, automated workflow management, and personalized user experiences. With locally deployed open-source LLMs, users can interact in natural language within a more secure and autonomous environment, improving both productivity and privacy. In the longer term, this technology could accelerate the adoption of intelligent devices and further developments in smart homes.
📄 Abstract (Original)
Large Language Models (LLMs) have emerged as transformative tools for natural language understanding and user intent resolution, enabling tasks such as translation, summarization, and, increasingly, the orchestration of complex workflows. This development signifies a paradigm shift from conventional, GUI-driven user interfaces toward intuitive, language-first interaction paradigms. Rather than manually navigating applications, users can articulate their objectives in natural language, enabling LLMs to orchestrate actions across multiple applications in a dynamic and contextual manner. However, extant implementations frequently rely on cloud-based proprietary models, which introduce limitations in terms of privacy, autonomy, and scalability. For language-first interaction to become a truly robust and trusted interface paradigm, local deployment is not merely a convenience; it is an imperative. This limitation underscores the importance of evaluating the feasibility of locally deployable, open-source, and open-access LLMs as foundational components for future intent-based operating systems. In this study, we examine the capabilities of several open-source and open-access models in facilitating user intention resolution through machine assistance. A comparative analysis is conducted against OpenAI's proprietary GPT-4-based systems to assess performance in generating workflows for various user intentions. The present study offers empirical insights into the practical viability, performance trade-offs, and potential of open LLMs as autonomous, locally operable components in next-generation operating systems. The results of this study inform the broader discussion on the decentralization and democratization of AI infrastructure and point toward a future where user-device interaction becomes more seamless, adaptive, and privacy-conscious through locally embedded intelligence.