| 1 |
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios |
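Proposes an image-oriented self-adaptive dataset construction method to address the challenges of real-world multimodal safety scenarios |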
large language model multimodal |
|
|
| 2 |
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model |
UniPic 2.0: builds a Kontext model via online reinforcement learning for unified multimodal image generation and editing |
multimodal instruction following |
|
|
| 3 |
TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection |
Proposes TRUST-VL, an explainable multimodal news assistant for detecting general multimodal misinformation. |
multimodal |
|
|
| 4 |
SliceSemOcc: Vertical Slice Based Multimodal 3D Semantic Occupancy Representation |
Proposes SliceSemOcc to address insufficient height information in 3D semantic occupancy prediction |
multimodal |
|
|
| 5 |
Promptception: How Sensitive Are Large Multimodal Models to Prompts? |
The Promptception framework reveals how sensitive large multimodal models are to prompts and proposes optimization principles. |
multimodal |
|
|
| 6 |
Multimodal Feature Fusion Network with Text Difference Enhancement for Remote Sensing Change Detection |
Proposes MMChange, a multimodal remote sensing change detection network that fuses image features with text difference enhancement. |
multimodal |
✅ |
|
| 7 |
A Generative Foundation Model for Chest Radiography |
ChexGen: a generative foundation model for chest radiographs that improves the performance and fairness of medical AI |
foundation model |
|
|
| 8 |
Efficient Odd-One-Out Anomaly Detection |
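Proposes an efficient DINO-based odd-one-out anomaly detection model that significantly reduces parameter count and training time while maintaining performance. |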
large language model multimodal |
✅ |
|
| 9 |
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning |
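Proposes ANTS, which uses MLLM understanding and reasoning to adaptively shape the negative textual space and improve OOD detection performance. |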
large language model multimodal |
|
|
| 10 |
VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision |
Proposes VisioFirm to address low annotation efficiency in computer vision |
foundation model |
✅ |
|
| 11 |
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation |
SPECS: a specificity-enhanced CLIP-Score for evaluating long image captions |
large language model |
✅ |
|
| 12 |
Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems |
Reveals a blind spot of vision language models across writing systems: vulnerability to text that is visible yet unreadable |
multimodal |
|
|