TerraIncognita: A Dynamic Benchmark for Species Discovery Using Frontier Models

Authors: Shivani Chiranjeevi, Hossein Zaremehrjerdi, Zi K. Deng, Talukder Z. Jubery, Ari Grele, Arti Singh, Asheesh K Singh, Soumik Sarkar, Nirav Merchant, Harold F. Greeney, Baskar Ganapathysubramanian, Chinmay Hegde

Categories: cs.CV, cs.LG

Published: 2025-05-29

🔗 Code/Project: PROJECT_PAGE

💡 One-Sentence Summary

Introduces TerraIncognita, a dynamic benchmark that addresses the challenge of insect species discovery.

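🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: insect species discovery, dynamic benchmark, multimodal models, biodiversity, open-world scenarios, hierarchical classification, model evaluation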

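📋 Key Points

Current methods for insect species discovery are manual, slow, and constrained by taxonomic expertise, which delays conservation action.
TerraIncognita is a dynamic benchmark that combines images of known and rare insects to evaluate multimodal models on species identification.
Top-performing models achieve over 90% F1 at the Order level on known species but fall below 2% at the Species level, reflecting a sharp difficulty gradient from coarse to fine taxonomic prediction.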

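📝 Abstract (Summary)

The rapid global loss of biodiversity, particularly the decline of insects, is an urgent ecological crisis. Current methods for insect species discovery are manual, slow, and constrained by taxonomic expertise, which hinders timely conservation action. This paper introduces TerraIncognita, a dynamic benchmark designed to evaluate state-of-the-art multimodal models on identifying unknown insect species. The benchmark dataset combines expertly annotated images of known insects with images of rare species, mimicking the open-world discovery scenarios that ecologists face. It assesses models on hierarchical taxonomic classification, on detecting samples of novel species, and on generating explanations consistent with expert knowledge. Top-performing models do well at the Order level on known species, but their scores drop sharply at the Species level, which exposes the difficulty of moving from coarse to fine classification.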

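🔬 Method Details

Problem definition: The paper targets the identification problem in insect species discovery. Existing methods depend on manual work and specialist expertise, so they are too slow to respond to the biodiversity crisis in time.

Core idea: TerraIncognita combines expertly annotated images of known species with images of rare species to mimic open-world discovery, and uses this mix to evaluate the identification ability of multimodal models.

Technical framework: The overall pipeline has four main modules: data collection, image annotation, model training, and evaluation. The dataset covers a wide range of insect images, and models are trained and evaluated with hierarchical classification.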
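To make the hierarchical evaluation concrete, the following is a minimal sketch of one way to score a single prediction: find the finest taxonomic rank at which a predicted lineage still agrees with the ground truth. The function name `deepest_correct_rank`, the dictionary layout, and the case-insensitive string comparison are assumptions made for illustration and are not taken from the paper.

```python
# Taxonomic ranks from coarse to fine, following the Order -> Family ->
# Genus -> Species hierarchy named in the abstract.
RANKS = ("order", "family", "genus", "species")


def deepest_correct_rank(truth, pred):
    """Return the finest rank up to which `pred` matches `truth`, or None.

    Both arguments are dicts keyed by rank name (hypothetical layout).
    A missing or mismatched rank stops the walk down the hierarchy.
    """
    deepest = None
    for rank in RANKS:
        t = truth.get(rank)
        p = pred.get(rank)
        if t is None or p is None or t.lower() != p.lower():
            break
        deepest = rank
    return deepest


truth = {
    "order": "Lepidoptera",
    "family": "Nymphalidae",
    "genus": "Danaus",
    "species": "Danaus plexippus",
}
pred = {
    "order": "Lepidoptera",
    "family": "Nymphalidae",
    "genus": "Vanessa",
    "species": "Vanessa cardui",
}

print(deepest_correct_rank(truth, pred))  # family
```

Walking the ranks in order and stopping at the first mismatch means a prediction cannot receive credit at the genus level if its family is already wrong, which matches the coarse-to-fine reading of the hierarchy.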

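Key innovation: The main contribution is a dataset and evaluation protocol that are updated dynamically. Regular (quarterly) expansions of both known and novel species provide a platform for longitudinal benchmarking. Compared with existing approaches, TerraIncognita puts more weight on simulating open-world scenarios and on model adaptability.

Key design: Model training uses a multimodal learning strategy, with an adaptive loss function designed to handle OOD samples, and explanations are generated from expert knowledge so that model outputs stay interpretable. Specific parameter settings and network architecture details are described in the paper.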
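As an illustration of how OOD handling could be scored, the sketch below measures how often a model abstains on novel species versus known ones. The abstention token `"unknown"`, the record layout, and the two metric names are hypothetical choices for this example, not the benchmark's actual protocol.

```python
# Hypothetical token a model emits when it declines to name a species.
ABSTAIN = "unknown"


def abstention_metrics(records):
    """Compute abstention rates from (is_novel, predicted_species) pairs.

    novel_abstain_rate: share of novel-species images where the model abstained
    (higher is better). known_false_abstain_rate: share of known-species
    images where the model abstained unnecessarily (lower is better).
    """
    novel = [pred for is_novel, pred in records if is_novel]
    known = [pred for is_novel, pred in records if not is_novel]

    def rate(preds):
        if not preds:
            return float("nan")
        return sum(pred == ABSTAIN for pred in preds) / len(preds)

    return {
        "novel_abstain_rate": rate(novel),
        "known_false_abstain_rate": rate(known),
    }


records = [
    (True, "unknown"),            # novel species, correctly abstained
    (True, "Danaus plexippus"),   # novel species, wrongly named
    (False, "Apis mellifera"),    # known species, named
    (False, "unknown"),           # known species, needless abstention
]
metrics = abstention_metrics(records)
print(metrics)
```

Reporting both rates together matters because a model that abstains on every image would score perfectly on novel species while being useless on known ones.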

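📊 Experimental Highlights

Top-performing models achieve over 90% F1 at the Order level on known species, but fall below 2% at the Species level, which underlines how steeply difficulty rises from coarse to fine taxonomic prediction (Order → Family → Genus → Species). This result gives future work a concrete baseline and direction.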
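The per-level scores above can be pictured with a small sketch that computes an F1 score separately at each taxonomic rank. The abstract does not state which F1 variant is used, so macro-averaging over the classes present in the ground truth is an assumption here, as are the helper names `macro_f1` and `f1_by_rank` and the toy data.

```python
from collections import Counter

RANKS = ("order", "family", "genus", "species")


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels that occur in y_true (assumed variant)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = []
    for label in set(y_true):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def f1_by_rank(truths, preds):
    """Score each rank independently from lists of lineage dicts."""
    return {
        rank: macro_f1([t[rank] for t in truths], [p[rank] for p in preds])
        for rank in RANKS
    }


truths = [
    {"order": "Lepidoptera", "family": "Nymphalidae",
     "genus": "Danaus", "species": "Danaus plexippus"},
    {"order": "Hymenoptera", "family": "Apidae",
     "genus": "Apis", "species": "Apis mellifera"},
]
preds = [
    {"order": "Lepidoptera", "family": "Nymphalidae",
     "genus": "Danaus", "species": "Danaus gilippus"},  # wrong species only
    {"order": "Hymenoptera", "family": "Apidae",
     "genus": "Apis", "species": "Apis mellifera"},
]

scores = f1_by_rank(truths, preds)
print(scores)
```

In this toy case the Order, Family, and Genus scores are all 1.0 while the Species score is 0.5, a miniature version of the coarse-to-fine drop the benchmark reports.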

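🎯 Application Scenarios

Potential applications include ecological conservation, species monitoring, and biodiversity research. By making insect species identification more efficient, TerraIncognita can give ecologists more timely data, support conservation measures, and so contribute to the sustainability of ecosystems.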

📄 Abstract (Original)

The rapid global loss of biodiversity, particularly among insects, represents an urgent ecological crisis. Current methods for insect species discovery are manual, slow, and severely constrained by taxonomic expertise, hindering timely conservation actions. We introduce TerraIncognita, a dynamic benchmark designed to evaluate state-of-the-art multimodal models for the challenging problem of identifying unknown, potentially undescribed insect species from image data. Our benchmark dataset combines a mix of expertly annotated images of insect species likely known to frontier AI models, and images of rare and poorly known species, for which few/no publicly available images exist. These images were collected from underexplored biodiversity hotspots, realistically mimicking open-world discovery scenarios faced by ecologists. The benchmark assesses models' proficiency in hierarchical taxonomic classification, their capability to detect and abstain from out-of-distribution (OOD) samples representing novel species, and their ability to generate explanations aligned with expert taxonomic knowledge. Notably, top-performing models achieve over 90% F1 at the Order level on known species, but drop below 2% at the Species level, highlighting the sharp difficulty gradient from coarse to fine taxonomic prediction (Order → Family → Genus → Species). TerraIncognita will be updated regularly, and by committing to quarterly dataset expansions (of both known and novel species), will provide an evolving platform for longitudinal benchmarking of frontier AI methods. All TerraIncognita data, results, and future updates are available at https://baskargroup.github.io/TerraIncognita/.
