| 1 |
From Learning to Unlearning: Biomedical Security Protection in Multimodal Large Language Models |
Proposes MLLMU-Med to address safety issues in biomedical multimodal large language models |
large language model multimodal |
|
|
| 2 |
Beyond the Visible: Benchmarking Occlusion Perception in Multimodal Large Language Models |
Proposes O-Bench to address occlusion perception in multimodal large language models |
large language model multimodal |
|
|
| 3 |
UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal Retrieval |
Proposes UniFGVC to address few-shot fine-grained visual classification |
large language model multimodal chain-of-thought |
|
|
| 4 |
Revealing Temporal Label Noise in Multimodal Hateful Video Classification |
Proposes a fine-grained label noise analysis to improve the accuracy of multimodal hateful video classification |
multimodal TAMP |
✅ |
|
| 5 |
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging |
Proposes FinMMR to improve multimodal capability in financial numerical reasoning |
large language model multimodal |
|
|
| 6 |
AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization |
Proposes the AD-FM framework to address adaptation problems in multimodal anomaly detection |
large language model multimodal |
|
|
| 7 |
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability |
Proposes an evaluation framework for input scrutiny ability to address faulty-input recognition in multimodal models |
large language model multimodal |
✅ |
|
| 8 |
TotalRegistrator: Towards a Lightweight Foundation Model for CT Image Registration |
Proposes TotalRegistrator to address multi-organ registration of CT images |
foundation model |
✅ |
|
| 9 |
Benchmarking Foundation Models for Mitotic Figure Classification |
Proposes self-supervised learning methods to improve mitotic figure classification performance |
foundation model |
|
|
| 10 |
VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones |
Proposes VisionTS++ to address cross-modal transfer of vision models to time series forecasting |
foundation model |
✅ |
|
| 11 |
Intention Enhanced Diffusion Model for Multimodal Pedestrian Trajectory Prediction |
Proposes an intention-enhanced diffusion model to address multimodal pedestrian trajectory prediction |
multimodal |
|
|
| 12 |
Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification |
Proposes MMCAF-Net to address the misdiagnosis of small lesions |
multimodal |
✅ |
|
| 13 |
SVC 2025: the First Multimodal Deception Detection Challenge |
Proposes the SVC 2025 challenge to address cross-domain generalization in multimodal deception detection |
multimodal |
|
|
| 14 |
X-SAM: From Segment Anything to Any Segmentation |
Proposes X-SAM to address the limitations of existing image segmentation models |
large language model multimodal |
✅ |
|
| 15 |
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization |
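Proposes a semantic propagation method based on cross-modal salient anchors to address weakly-supervised dense audio-visual event localization |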
multimodal TAMP |
|
|
| 16 |
Static and Plugged: Make Embodied Evaluation Simple |
Proposes StaticEmbodiedBench to address the limitations of existing evaluation methods |
vision-language-action VLA |
|
|
| 17 |
Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder |
Proposes MLLMSeg to address the trade-off between performance and cost in referring expression segmentation |
large language model multimodal |
✅ |
|
| 18 |
EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts |
Proposes the EncQA benchmark to improve visual reasoning for chart understanding |
multimodal |
|
|
| 19 |
Face-voice Association in Multilingual Environments (FAME) 2026 Challenge Evaluation Plan |
Proposes the FAME challenge to address face-voice association in multilingual environments |
multimodal |
|
|
| 20 |
Analyzing and Mitigating Object Hallucination: A Training Bias Perspective |
Proposes Obliviate to address object hallucination in large vision-language models |
multimodal |
|
|
| 21 |
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation |
Proposes TGS-Agent to address object understanding in audio-visual segmentation |
multimodal |
✅ |
|
| 22 |
Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting |
Proposes continual learning methods for vision-language models to address forgetting |
multimodal |
✅ |
|
| 23 |
ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations |
Proposes ToxicTAGS to address the annotation and detection of toxic meme content |
multimodal |
|
|