| 1 |
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model |
Proposes ExpVG to systematically study visual grounding in multimodal large language models |
large language model multimodal visual grounding |
|
|
| 2 |
MDD-Net: Multimodal Depression Detection through Mutual Transformer |
Proposes MDD-Net to address multimodal depression detection |
multimodal |
✅ |
|
| 3 |
Prompt-Guided Relational Reasoning for Social Behavior Understanding with Vision Foundation Models |
Proposes ProGraD to address social behavior understanding in group activity detection |
foundation model |
|
|
| 4 |
CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning |
Proposes CATP to address image token redundancy in multimodal in-context learning |
multimodal |
|
|
| 5 |
MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization |
Proposes the MIMIC framework to address the interpretability of vision-language models |
multimodal |
|
|
| 6 |
Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models |
Proposes RSFIQA to address insufficient region sensitivity in no-reference image quality assessment |
large language model |
|
|
| 7 |
Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity |
Proposes a four-axis evaluation framework and the BSD strategy to improve MLLM jailbreak effectiveness |
large language model multimodal |
|
|
| 8 |
Learning User Preferences for Image Generation Model |
Proposes a multimodal large language model-based method for learning user preferences to improve image generation quality |
large language model multimodal |
|
|
| 9 |
TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning |
Proposes TBAC-UniImage to address the deep integration of multimodal understanding and generation |
large language model multimodal |
|
|
| 10 |
The Escalator Problem: Identifying Implicit Motion Blindness in AI for Accessibility |
Identifies the implicit motion blindness problem to improve the reliability of assistive technology |
large language model multimodal |
|
|
| 11 |
Re:Verse -- Can Your VLM Read a Manga? |
Proposes a new evaluation framework to address the shortcomings of vision-language models in understanding manga narratives |
multimodal |
|
|
| 12 |
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling |
Proposes the MAViS framework to address the multiple challenges of long-video generation |
multimodal |
|
|
| 13 |
Towards Scalable Training for Handwritten Mathematical Expression Recognition |
Proposes TexTeller to address data scarcity in handwritten mathematical expression recognition |
foundation model |
|
|