| 1 |
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness |
Proposes C$^3$B, a comic-based cross-cultural benchmark for evaluating the cultural awareness of multimodal large language models |
large language model multimodal |
|
|
| 2 |
Planning with Unified Multimodal Models |
Uni-Plan: a planning framework built on unified multimodal models that improves performance on long-horizon decision-making tasks |
large language model multimodal |
|
|
| 3 |
DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice |
DentVLM: a multimodal vision-language model for comprehensive dental diagnosis and enhanced clinical practice |
multimodal |
|
|
| 4 |
Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning |
Proposes an LLM-LMM framework that decouples reasoning from perception to improve the reliability of visual reasoning |
large language model multimodal chain-of-thought |
|
|
| 5 |
Learning Regional Monsoon Patterns with a Multimodal Attention U-Net |
Proposes a multimodal attention U-Net for high-resolution regional monsoon rainfall prediction in India |
multimodal |
|
|
| 6 |
TATTOO: Training-free AesTheTic-aware Outfit recOmmendation |
Proposes TATTOO, a training-free, aesthetic-aware outfit recommendation method |
large language model multimodal chain-of-thought |
|
|
| 7 |
GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval |
GRAPE: supervises query rewriting through ranking to improve retrieval effectiveness |
large language model multimodal |
✅ |
|
| 8 |
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning |
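Proposes a diagnostic framework for fire-themed cultural imagery, revealing biases in vision-language models' cultural understanding |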
multimodal |
|
|
| 9 |
SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction |
SynDoc: a hybrid discriminative-generative framework for enhancing synthetic domain-adaptive document key information extraction |
multimodal |
|
|
| 10 |
Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection |
Proposes a self-reflection-based self-consistency method to reduce hallucinations in vision-language models |
instruction following |
|
|
| 11 |
Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models |
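Proposes CADC, a capability-attribution data curation framework that improves the efficiency of instruction tuning for vision-language models |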
multimodal |
|
|