| 1 |
AnatomiX, an Anatomy-Aware Grounded Multimodal Large Language Model for Chest X-Ray Interpretation |
AnatomiX: an anatomy-aware multimodal large language model for chest X-ray interpretation |
large language model, multimodal |
✅ |
|
| 2 |
Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs |
Proposes TGIF, a text-guided layer fusion method that mitigates hallucination in multimodal LLMs |
large language model, multimodal, visual grounding |
|
|
| 3 |
Understanding Multi-Agent Reasoning with Large Language Models for Cartoon VQA |
Proposes a multi-agent LLM framework that addresses the visual abstraction and narrative reasoning challenges of cartoon VQA |
large language model, multimodal |
|
|
| 4 |
PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding |
PrismVAU: a prompt-refined inference system for multimodal video anomaly understanding |
large language model, multimodal |
|
|
| 5 |
A Versatile Multimodal Agent for Multimedia Content Generation |
Proposes a multimodal agent that automates complex multimedia content generation tasks and improves content-creation efficiency. |
multimodal |
|
|
| 6 |
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision |
UniCorn: improves the generation capability of unified multimodal models through self-generated supervision |
multimodal |
|
|
| 7 |
TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors |
Proposes TA-Prompting, which uses temporal anchors to strengthen the temporal understanding of VideoLLMs in dense video captioning. |
large language model |
|
|
| 8 |
Unveiling and Bridging the Functional Perception Gap in MLLMs: Atomic Visual Alignment and Hierarchical Evaluation via PET-Bench |
PET-Bench reveals a gap in how MLLMs perceive functional imaging and proposes the AVA method to improve diagnostic accuracy. |
large language model, multimodal, chain-of-thought |
✅ |
|
| 9 |
ClearAIR: A Human-Visual-Perception-Inspired All-in-One Image Restoration |
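ClearAIR: an all-in-one image restoration framework inspired by human visual perception that effectively addresses the over-smoothing and artifact problems of existing methods. |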
large language model, multimodal |
|
|
| 10 |
DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation |
Proposes DiffAgent, which accelerates diffusion models through LLM-driven end-to-end code generation. |
large language model |
|
|
| 11 |
Omni2Sound: Towards Unified Video-Text-to-Audio Generation |
Omni2Sound: proposes a unified video-text-to-audio generation model that addresses multimodal alignment and task competition |
multimodal |
✅ |
|