| 1 | Measuring Epistemic Humility in Multimodal Large Language Models | HumbleBench: a new benchmark for evaluating the epistemic humility of multimodal large language models | large language model multimodal | ✅ | |
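| 2 | SQAP-VLA: A Synergistic Quantization-Aware Pruning Framework for High-Performance Vision-Language-Action Models | Proposes the SQAP-VLA framework, which combines quantization and pruning synergistically to accelerate inference of high-performance vision-language-action models. | vision-language-action VLA | | |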
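| 3 | Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization | Proposes MatCha, a multimodal benchmark for materials characterization that evaluates how well MLLMs understand materials-science images. | large language model multimodal chain-of-thought | ✅ | |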
| 4 | Visual Grounding from Event Cameras | Proposes Talk2Event, the first large-scale benchmark for language-driven object grounding with event cameras. | multimodal visual grounding | | |
| 5 | Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis | Kling-Avatar: cascaded long-duration avatar animation synthesis driven by multimodal instructions. | large language model multimodal | | |
| 6 | Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis | Proposes MMOral, a multimodal benchmark and instruction dataset for panoramic X-ray analysis, and builds the OralGPT model on top of it. | multimodal instruction following | ✅ | |
| 7 | PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection | PeftCD: remote sensing change detection using vision foundation models adapted with parameter-efficient fine-tuning. | foundation model | ✅ | |
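| 8 | Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training | Proposes a U-Net with modality-agnostic input channels for brain lesion segmentation in multimodal MRI that can handle sequences not seen during training. | multimodal | ✅ | |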
| 9 | VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models: Methods and Results | VQualA 2025 Challenge: evaluating and improving the ability of large multimodal models to compare visual quality. | multimodal | | |
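| 10 | DATE: Dynamic Absolute Time Enhancement for Long Video Understanding | Proposes the DATE framework, which uses dynamic absolute time enhancement to improve the temporal reasoning of MLLMs in long-video understanding. | large language model multimodal | | |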
| 11 | Video Understanding by Design: How Datasets Shape Architectures and Insights | Examines video understanding from the dataset perspective, showing how datasets shape model architectures and research insights. | foundation model multimodal | | |
| 12 | DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception | Proposes DGFusion, which uses depth information to guide sensor fusion and improve the robustness of semantic perception. | multimodal | ✅ | |
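| 13 | Fine-Grained Customized Fashion Design with Image-into-Prompt benchmark and dataset from LMM | Proposes an LMM-based image-into-prompt framework for fine-grained customized fashion design that addresses the ambiguity of text-only descriptions. | multimodal | ✅ | |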