BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

📄 arXiv: 2312.16893v2

Authors: Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang

Categories: cs.CL, cs.AI

Published: 2023-12-28 (Updated: 2025-03-11)

Comments: Accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)


💡 One-Sentence Takeaway

Proposes BBScore, a reference-free metric grounded in Brownian bridge theory, for assessing text coherence.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: text coherence, Brownian bridge theory, reference-free metric, natural language processing, machine learning

📋 Key Points

  1. Existing methods often rely on static embeddings or local context when assessing the coherence of long texts, which limits their ability to capture overall coherence.
  2. This paper proposes BBScore, a metric grounded in Brownian bridge theory that captures the sequential cohesion among sentences and how a text conveys its theme, offering a new way to evaluate coherence.
  3. Experiments show that BBScore performs on par with state-of-the-art techniques and effectively distinguishes human-written text from machine-generated text.

📝 Abstract (Translated)

Assessing text coherence is a vital aspect of evaluating the quality of written content. Recent advances in neural coherence modeling have shown effectiveness in capturing entity coreference and discourse relations, thereby improving coherence evaluation. However, many existing methods rely heavily on static embeddings or local context, limiting their ability to measure the overall coherence of long texts. This paper introduces BBScore, a reference-free metric grounded in Brownian bridge theory for assessing text coherence. The study shows that, when combined with a simple classification component, the metric achieves performance comparable to state-of-the-art techniques on standard artificial discrimination tasks. In addition, in downstream tasks the metric effectively distinguishes human-written documents from text generated by large language models within a specific domain, demonstrating its broad applicability.

🔬 Method Details

Problem definition: Existing text coherence assessment methods fall short on long texts, in particular in their limited ability to measure overall (global) coherence; this paper aims to address that gap.

Core idea: Propose the BBScore metric, grounded in Brownian bridge theory, which emphasizes the sequential order and cohesion of sentences and aims to capture a text's theme and purpose more comprehensively (see the formula sketch below).
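
For intuition, the Brownian bridge fact this idea rests on can be stated compactly. This is the standard marginal law of a bridge pinned at its endpoints, not a formula quoted from the paper: intermediate latents are Gaussian around the straight line between the first and last sentence, with uncertainty largest mid-document.

```latex
% Brownian bridge pinned at z_0 (t = 0) and z_T (t = T):
%   z_t ~ N(\mu_t, \sigma_t^2 I),  0 < t < T
\mu_t = \Bigl(1 - \tfrac{t}{T}\Bigr) z_0 + \tfrac{t}{T}\, z_T,
\qquad
\sigma_t^2 = \sigma^2 \, \frac{t\,(T - t)}{T}
```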

Technical framework: The overall BBScore pipeline consists of text input, a Brownian bridge model computation, and a classification component, enabling reference-free assessment of text coherence (a code sketch follows).
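
Below is a minimal sketch of how a bridge-based coherence score could be computed from sentence-level latent vectors. It assumes the latents already come from an encoder trained so that coherent documents trace bridge-like trajectories; the function name `bb_score` and the fixed `sigma` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def bb_score(latents: np.ndarray, sigma: float = 1.0) -> float:
    """Average Gaussian log-likelihood of intermediate sentence latents under
    a Brownian bridge pinned at the first and last sentence (illustrative).

    latents: (T+1, d) array of sentence latent vectors, in document order.
    sigma:   diffusion scale of the bridge (in practice estimated from data).
    """
    T = len(latents) - 1
    z0, zT = latents[0], latents[-1]
    d = latents.shape[1]
    log_liks = []
    for t in range(1, T):
        mean = (1 - t / T) * z0 + (t / T) * zT      # bridge mean at step t
        var = sigma ** 2 * t * (T - t) / T          # bridge variance at step t
        diff = latents[t] - mean
        log_liks.append(-0.5 * (d * np.log(2 * np.pi * var) + diff @ diff / var))
    return float(np.mean(log_liks))

# Trajectories that hug the bridge get higher scores, which the summary above
# associates with more coherent documents.
```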

Key innovation: BBScore's novelty lies in its Brownian bridge based design, which lets it assess both local and global text coherence, in contrast to traditional methods built on static embeddings.

Key design: In the implementation, BBScore is paired with a simple classifier trained with a dedicated loss function to optimize coherence assessment, ensuring applicability across different text types (an illustrative pairing is sketched below).
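
As a concrete illustration of pairing the score with a lightweight classifier, the sketch below trains a logistic regression on per-document coherence scores to separate intact documents from sentence-shuffled ones. The toy score values and the choice of scikit-learn's LogisticRegression are assumptions for illustration, not the paper's exact classifier or loss.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for per-document coherence scores (real values would come
# from a bridge-likelihood score such as the bb_score sketch above).
scores_original = [-0.4, -0.6, -0.5, -0.7, -0.3]   # intact documents
scores_shuffled = [-2.1, -3.0, -2.6, -1.9, -2.4]   # sentence-shuffled documents

X = np.array(scores_original + scores_shuffled).reshape(-1, 1)
y = np.array([1] * len(scores_original) + [0] * len(scores_shuffled))

clf = LogisticRegression().fit(X, y)
print(clf.predict(np.array([[-0.5], [-2.5]])))     # -> [1 0] on this toy data
```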

📊 Experimental Highlights

Experiments show that BBScore performs comparably to state-of-the-art techniques on standard artificial discrimination tasks and effectively distinguishes human-written text from text generated by large language models, demonstrating its effectiveness and reliability for text coherence assessment.

🎯 Application Scenarios

Potential applications of BBScore include evaluating text generation quality, automatic summarization, and coherence checking for dialogue systems. Its reference-free nature makes it practical across a range of text processing tasks, helping to improve the performance and user experience of natural language processing systems.

📄 Abstract (Original)

Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts. In this paper, we posit that coherent texts inherently manifest a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To explore this abstract relationship, we introduce the "BBScore," a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our findings showcase that when synergized with a simple additional classification component, this metric attains a performance level comparable to state-of-the-art techniques on standard artificial discrimination tasks. We also establish in downstream tasks that this metric effectively differentiates between human-written documents and text generated by large language models under a specific domain. Furthermore, we illustrate the efficacy of this approach in detecting written styles attributed to diverse large language models, underscoring its potential for generalizability. In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training. This flexibility allows for its application in various downstream tasks.