Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping

Authors: Saurabh Kaushik, Lalit Maurya, Beth Tellman

Category: cs.CV

Published: 2026-01-05

Note: Accepted at CV4EO Workshop @ WACV 2026

🔗 Code/Project: GitHub (https://github.com/Sk-2103/Prithvi-CAFE)

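💡 One-Sentence Takeaway

Proposes Prithvi-CAFE to address the poor capture of local detail in flood inundation mapping.

🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: flood mapping, geo-foundation models, convolutional attention module, multi-scale fusion, deep learning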

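📋 Key Points

Existing Geo-Foundation Models (GFMs) fail to capture local detail in flood mapping and therefore underperform.
Prithvi-CAFE combines the pretrained Prithvi encoder with Convolutional Attention Modules (CAM) to strengthen local feature capture.
On both Sen1Flood11 and FloodPlanet, Prithvi-CAFE outperforms the baseline U-Net and other GFMs, with the largest gain on the Sen1Flood11 hold-out test site (IoU 81.37 versus 70.57 for U-Net).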

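📝 Abstract (Summary)

Geo-Foundation Models (GFMs) perform well across many downstream applications, but in flood mapping on the Sen1Flood11 dataset they fail to outperform the baseline U-Net, which exposes their limited ability to capture critical local detail. To address this, the paper proposes the Prithvi-Complementary Adaptive Fusion Encoder (CAFE), which combines the pretrained Prithvi GFM encoder with a parallel CNN residual branch enhanced by Convolutional Attention Modules (CAM). Prithvi-CAFE uses adapters for fast, efficient fine-tuning and performs multi-scale, multi-level feature fusion, capturing critical local detail while preserving long-range dependencies. It achieves state-of-the-art results on two flood mapping datasets, Sen1Flood11 and FloodPlanet.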

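🔬 Method Details

Problem definition: In flood mapping, existing GFMs capture local detail poorly, so they fail to outperform the conventional U-Net.

Core idea: Prithvi-CAFE pairs the pretrained Prithvi GFM encoder with a parallel CNN residual branch and uses Convolutional Attention Modules to strengthen local feature capture, improving overall performance.

Technical framework: The architecture consists of the pretrained Prithvi encoder and a parallel CNN residual branch enhanced by Convolutional Attention Modules. Features from the two branches are fused at multiple scales and multiple levels.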
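The two-branch idea above can be illustrated with a small, dependency-free sketch. This digest does not specify the internals of CAM or the fusion rule, so everything here is an assumption for illustration only: `channel_attention` is a simplified squeeze-style gate (sigmoid of each channel's global average), and `fuse` is a fixed weighted sum controlled by a hypothetical `alpha`. It is not the authors' implementation.

```python
import math

def channel_attention(feature_map):
    """Scale each channel by a sigmoid gate computed from its global average.

    feature_map: list of channels, each a 2D list (H x W) of floats.
    Assumed, simplified gate; not the paper's CAM.
    """
    gated = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gate = 1.0 / (1.0 + math.exp(-mean))  # sigmoid of the channel mean
        gated.append([[v * gate for v in row] for row in channel])
    return gated

def fuse(global_features, local_features, alpha=0.5):
    """Blend two same-shaped feature maps: alpha * global + (1 - alpha) * local."""
    return [
        [[alpha * g + (1.0 - alpha) * l for g, l in zip(g_row, l_row)]
         for g_row, l_row in zip(g_ch, l_ch)]
        for g_ch, l_ch in zip(global_features, local_features)
    ]

# Toy inputs: 2 channels on a 2 x 2 grid (stand-ins for one fusion level)
encoder_out = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]  # transformer branch
cnn_out = [[[0.0, 2.0], [2.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]      # CNN branch

fused = fuse(encoder_out, channel_attention(cnn_out))
```

In a multi-scale, multi-level setup, a step like this would be repeated once per pair of matching feature maps, with learned rather than fixed weights.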

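Key innovation: The adaptive fusion mechanism integrates features from multi-channel, multi-modal inputs and improves the capture of local detail. Unlike approaches that rely on a GFM encoder alone, it combines transformer and CNN features at several scales and levels.

Key design: Adapters in Prithvi enable fast fine-tuning, and the loss function and network structure are reportedly tuned for efficient performance across datasets.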
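To make the adapter idea concrete, here is a minimal sketch. The digest says only that adapters are used, so it assumes the common residual bottleneck form (down-project, nonlinearity, up-project, add back to the input). The weights, dimensions, and function names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

def adapter(token, w_down, w_up):
    """Residual bottleneck adapter: token + W_up * relu(W_down * token)."""
    hidden = relu(matvec(w_down, token))
    update = matvec(w_up, hidden)
    return [t + u for t, u in zip(token, update)]

token = [1.0, -2.0, 0.5, 3.0]            # one 4-dim token from a frozen encoder layer
w_down = [[0.5, 0.0, 0.0, 0.5],          # 4 -> 2 down-projection (trainable)
          [0.0, 1.0, 0.0, 0.0]]
w_up_zero = [[0.0, 0.0] for _ in range(4)]                     # 2 -> 4, zero-initialised
w_up = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]        # 2 -> 4 after some training

print(adapter(token, w_down, w_up_zero))  # identical to the input token
print(adapter(token, w_down, w_up))
```

With a zero-initialised up-projection the adapter starts as an identity map, so fine-tuning begins from the pretrained encoder's behaviour and only the small projection matrices need to be trained.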

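📊 Experimental Highlights

On Sen1Flood11 test data, Prithvi-CAFE reaches an IoU of 83.41, ahead of the original Prithvi (82.50) and other major GFMs (TerraMind 82.90, DOFA 81.54, SpectralGPT 81.02). On the hold-out test site, Prithvi-CAFE reaches an IoU of 81.37, well above the baseline U-Net (70.57) and the original Prithvi (72.42). On FloodPlanet, Prithvi-CAFE reaches an IoU of 64.70, ahead of U-Net (60.14), TerraMind (62.33), DOFA (59.15), and Prithvi 2.0 (61.91).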
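For readers unfamiliar with the metric, the sketch below shows one way to compute IoU for the flood class on a pair of label masks. It is an illustration, not the paper's evaluation code: the function name, the use of label 1 for water, and the `ignore_label=-1` convention for pixels without valid labels are assumptions. The scores reported above appear to be IoU values multiplied by 100.

```python
def flood_iou(pred, target, ignore_label=-1):
    """IoU for the flood class (label 1) over two same-shaped 2D masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_label:
                continue  # skip pixels without a valid label
            intersection += int(p == 1 and t == 1)
            union += int(p == 1 or t == 1)
    # Undefined when neither mask contains any flood pixel
    return intersection / union if union else float("nan")

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [-1, 1, 1]]

iou = flood_iou(pred, target)
print(f"IoU = {100 * iou:.2f}")  # IoU = 50.00
```

In the toy example, two pixels are flood in both masks and four valid pixels are flood in at least one, giving 2 / 4.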

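🎯 Application Scenarios

Potential applications include natural disaster monitoring, environmental protection, and urban planning. More accurate flood maps give decision-makers more reliable data for planning responses and reducing flood losses. The design could also extend to other geographic information system (GIS) tasks.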

📄 Abstract (Original)

Geo-Foundation Models (GFMs) have proven effective in diverse downstream applications, including semantic segmentation, classification, and regression tasks. However, for flood mapping with the Sen1Flood11 dataset as a downstream task, GFMs struggle to outperform the baseline U-Net, highlighting the models' limitations in capturing critical local nuances. To address this, we present the Prithvi-Complementary Adaptive Fusion Encoder (CAFE), which integrates the pretrained Prithvi GFM encoder with a parallel CNN residual branch enhanced by Convolutional Attention Modules (CAM). Prithvi-CAFE enables fast and efficient fine-tuning through adapters in Prithvi and performs multi-scale, multi-level fusion with CNN features, capturing critical local details while preserving long-range dependencies. We achieve state-of-the-art results on two comprehensive flood mapping datasets: Sen1Flood11 and FloodPlanet. On Sen1Flood11 test data, Prithvi-CAFE (IoU 83.41) outperforms the original Prithvi (IoU 82.50) and other major GFMs (TerraMind 82.90, DOFA 81.54, SpectralGPT 81.02). The improvement is even more pronounced on the hold-out test site, where Prithvi-CAFE achieves an IoU of 81.37 compared to the baseline U-Net (70.57) and the original Prithvi (72.42). On FloodPlanet, Prithvi-CAFE also surpasses the baseline U-Net and other GFMs, achieving an IoU of 64.70 compared to U-Net (60.14), TerraMind (62.33), DOFA (59.15), and Prithvi 2.0 (61.91). Our simple yet effective Prithvi-CAFE demonstrates strong potential for improving segmentation tasks where multi-channel and multi-modal data provide complementary information and local details are critical. The code is released at https://github.com/Sk-2103/Prithvi-CAFE.
