Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Authors: Joel Strickland, Arjun Vijeta, Chris Moores, Oliwia Bodek, Bogdan Nenchev, Thomas Whitehead, Charles Phillips, Karl Tassenberg, Gareth Conduit, Ben Pellegrini

Categories: cs.AI, cs.LG, cs.MA

Published: 2026-03-06

💡 One-Sentence Takeaway

Proposes Schema-Gated Agentic AI to resolve the tension between determinism and flexibility in scientific workflows.

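🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: Agentic AI, scientific workflows, large language models, Schema-Gated, execution determinism, conversational flexibility, human-in-the-loop, automated research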

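📋 Key Points

Existing approaches struggle to guarantee both execution determinism and conversational flexibility when LLMs drive scientific workflows, which limits their adoption.
Proposes Schema-Gated Agentic AI: the schema acts as the execution boundary and conversational authority is separated from execution authority, so that determinism and flexibility are decoupled.
Reviews 20 systems using multi-model LLM scoring, reveals an empirical Pareto front, and argues that a schema-gated architecture is positioned to decouple the trade-off.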

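📝 Abstract (Translated)

Large language models (LLMs) can translate a researcher's natural-language goal into executable computation, but scientific workflows require determinism, provenance, and governance, which are hard to guarantee when an LLM decides what runs. Through semi-structured interviews with 18 experts across 10 industrial R&D stakeholders, we identify two competing requirements: deterministic, constrained execution, and conversational flexibility without workflow rigidity. The interviews also surface boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so nothing runs unless the complete action, including cross-step dependencies, validates against a machine-checkable specification. We operationalize the two requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol (15 independent sessions across 3 LLM families), yielding substantial-to-near-perfect inter-model agreement (Krippendorff's α = 0.80 for ED and α = 0.98 for CF). This suggests that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front (no reviewed system achieves both high flexibility and high determinism), but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, which separates conversational authority from execution authority, is positioned to decouple this trade-off, and we distill three operational principles to guide adoption: clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating.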
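For readers unfamiliar with the agreement statistic quoted above, Krippendorff's α is defined as 1 − D_o/D_e, where D_o is the observed disagreement within each rated item and D_e is the disagreement expected by chance. The sketch below is an illustration only: it assumes interval-scaled scores with no missing ratings, which the abstract does not specify, and the function name and sample data are hypothetical.

```python
def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha for complete data with a squared-difference metric.

    `ratings` is a list of units (e.g. systems); each unit is the list of
    scores that the raters (e.g. LLM sessions) assigned to it.
    Assumption: interval metric; the paper may use a different one.
    """
    if any(len(unit) < 2 for unit in ratings):
        raise ValueError("every unit needs at least two ratings")

    values = [v for unit in ratings for v in unit]
    n = len(values)

    # Observed disagreement: squared differences between ratings of the
    # same unit, weighted by 1 / (number of ratings in the unit - 1).
    d_o = sum(
        sum((a - b) ** 2 for a in unit for b in unit) / (len(unit) - 1)
        for unit in ratings
    ) / n

    # Expected disagreement: squared differences between all pairs of
    # ratings, regardless of which unit they belong to.
    d_e = sum((a - b) ** 2 for a in values for b in values) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")

    return 1 - d_o / d_e


# Hypothetical scores: three raters agree perfectly on three systems.
perfect = [[1, 1, 1], [3, 3, 3], [5, 5, 5]]
alpha = krippendorff_alpha_interval(perfect)
```

Values near 1 indicate that raters disagree far less than chance would predict; values at or below 0 indicate agreement no better than chance.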

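🔬 Method Details

Problem definition: Existing LLM-based scientific workflow systems struggle to satisfy two key requirements at once. The first is execution determinism, so that results are reproducible. The second is conversational flexibility, so that users can interact in natural language and adjust the process. Existing approaches tend to favor one at the expense of the other, which limits practical use.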

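Core idea: The paper introduces a schema-gated mechanism in which the schema is a mandatory boundary on workflow execution. Only actions that conform to a predefined schema are allowed to run, which is what provides execution determinism. Because conversational authority is separated from execution authority, users can still interact with the system flexibly, while whatever is finally executed remains constrained by the schema.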
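The gating idea can be illustrated with a minimal sketch. This is not the authors' implementation: the `Tool` class, the type-map schema format, and the `fit_model` example are all hypothetical, chosen only to show an action being rejected before anything runs.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical schema format: parameter name -> required Python type.
Schema = dict[str, type]


@dataclass
class Tool:
    name: str
    schema: Schema
    run: Callable[..., Any]


def validate(action: dict[str, Any], schema: Schema) -> list[str]:
    """Return a list of violations; an empty list means the action conforms."""
    errors = []
    for key, expected in schema.items():
        if key not in action:
            errors.append(f"missing parameter: {key}")
        elif not isinstance(action[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    for key in action:
        if key not in schema:
            errors.append(f"unexpected parameter: {key}")
    return errors


def gated_execute(tool: Tool, action: dict[str, Any]) -> Any:
    """Run the tool only if the proposed action validates against its schema."""
    errors = validate(action, tool.schema)
    if errors:
        raise ValueError(f"rejected by schema gate: {errors}")
    return tool.run(**action)


# The conversational layer may propose anything; only conforming actions run.
fit_model = Tool(
    name="fit_model",
    schema={"dataset": str, "max_depth": int},
    run=lambda dataset, max_depth: f"fit on {dataset} with depth {max_depth}",
)

result = gated_execute(fit_model, {"dataset": "alloys.csv", "max_depth": 3})
```

The design point is that the LLM never calls `run` directly: it can only propose an `action` dictionary, and the gate decides whether that proposal executes.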

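Technical framework: The overall architecture comprises four main modules: 1) a natural language understanding module, which converts the user's natural-language instructions into structured intent; 2) a schema management module, which maintains the predefined schemas used to check whether an action is valid; 3) a planning and orchestration module, which generates an executable workflow plan from the user's intent and the schemas; 4) an execution engine, which runs the workflow plan and ensures that every action satisfies the schema constraints.

Key innovation: The central technical contribution is the schema-gated mechanism itself, which makes the schema a mandatory boundary on workflow execution and thereby provides execution determinism. Compared with existing approaches, it balances execution determinism and conversational flexibility better, which makes it a closer fit for settings with high reliability requirements, such as scientific workflows.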

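Key design: The paper proposes three operational principles: 1) clarification-before-execution: before running any action, the system clarifies with the user to make sure the intent is captured accurately; 2) constrained plan-act orchestration: the system generates an executable workflow plan from the user's intent and the schemas, and then executes strictly according to that plan; 3) tool-to-workflow-level gating: the system validates every action, from individual tool calls up to the composed workflow, to ensure it satisfies the schema constraints.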
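The three principles above can be sketched together in a few lines. This is an illustrative sketch under stated assumptions, not the system described in the paper: the `Step` class, the `TOOLS` registry, and the `consumes`/`produces` fields are hypothetical stand-ins for a machine-checkable specification.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    tool: str
    params: dict
    needs: list = field(default_factory=list)  # names of upstream steps


# Hypothetical tool registry: required parameters and declared input/output kinds.
TOOLS = {
    "load_data": {"required": {"path"}, "produces": "table"},
    "train": {"required": {"target"}, "consumes": "table", "produces": "model"},
}


def check_plan(plan):
    """Validate the whole plan; return (violations, clarifying_questions)."""
    violations, questions = [], []
    seen = {}  # step name -> kind of output it produces
    for step in plan:
        spec = TOOLS.get(step.tool)
        if spec is None:
            violations.append(f"{step.name}: unknown tool {step.tool!r}")
            continue
        # Clarification-before-execution: missing inputs become questions.
        for param in sorted(spec["required"] - set(step.params)):
            questions.append(f"{step.name}: what value should {param!r} take?")
        # Workflow-level gating: check cross-step dependencies, not just the step.
        for dep in step.needs:
            if dep not in seen:
                violations.append(f"{step.name}: depends on {dep!r}, which does not run earlier")
            elif seen[dep] != spec.get("consumes"):
                violations.append(f"{step.name}: cannot consume output of {dep!r}")
        seen[step.name] = spec["produces"]
    return violations, questions


def run_plan(plan, executor):
    """Constrained plan-act: execute the approved plan in order, or not at all."""
    violations, questions = check_plan(plan)
    if violations or questions:
        return {"status": "blocked", "violations": violations, "questions": questions}
    return {"status": "done", "results": [executor(step) for step in plan]}


# A plan with a missing parameter is blocked and turned into a question.
incomplete = [Step("load", "load_data", {}), Step("fit", "train", {"target": "y"}, needs=["load"])]
outcome = run_plan(incomplete, executor=lambda step: step.name)
```

Here `outcome` reports a blocked plan with one clarifying question about `path`, and no step has been executed. Validating the plan as a whole, rather than each call in isolation, is what distinguishes workflow-level gating from tool-level gating.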

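📊 Experimental Highlights

The paper reviews 20 systems using multi-model LLM scoring. No reviewed system achieves both high flexibility and high determinism, which highlights the limits of existing approaches. The authors argue that a schema-gated architecture is positioned to decouple this trade-off, which points to a direction for future work.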
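To make the notion of a Pareto front over the two axes concrete, the following sketch finds the systems that no other system beats on both ED and CF. The system names and scores are hypothetical and for illustration only; they are not values from the paper.

```python
def pareto_front(scores):
    """Return names of systems not dominated on both axes (higher is better)."""
    front = []
    for name, (ed, cf) in scores.items():
        # A system is dominated if another is at least as good on both
        # axes and strictly better on at least one.
        dominated = any(
            o_ed >= ed and o_cf >= cf and (o_ed > ed or o_cf > cf)
            for other, (o_ed, o_cf) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (ED, CF) scores for illustration only; not values from the paper.
example_scores = {
    "workflow_engine": (5, 1),
    "hybrid_planner": (3, 3),
    "free_form_agent": (1, 5),
    "weak_baseline": (2, 2),
}

front = pareto_front(example_scores)
```

In this toy data the front contains the three systems that trade one axis for the other, while `weak_baseline` drops out because `hybrid_planner` beats it on both. A system scoring high on both axes would dominate the rest; the paper reports that none of the 20 reviewed systems does.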

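🎯 Application Scenarios

The approach is relevant to scientific domains that demand high reliability and reproducibility, such as drug discovery, materials science, and chemical engineering. With Schema-Gated Agentic AI, researchers can interact with the system in natural language while the schema gate constrains what actually executes, which supports correct experimental procedures and trustworthy results and can speed up research.

📄 Abstract (Original)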

Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements--deterministic, constrained execution and conversational flexibility without workflow rigidity--together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action--including cross-step dependencies--validates against a machine-checkable specification. We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol--15 independent sessions across 3 LLM families--yielding substantial-to-near-perfect inter-model agreement (Krippendorff α=0.80 for ED and α=0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front--no reviewed system achieves both high flexibility and high determinism--but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill 3 operational principles--clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating--to guide adoption.
