| 15 |
A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation |
CheXOne: a reasoning-enabled vision-language foundation model for chest X-ray interpretation |
reinforcement learning foundation model visual grounding |
|
|
| 16 |
LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation |
LinguDistill: recovers linguistic ability in vision-language models through selective cross-modal distillation |
distillation multimodal visual grounding |
|
|
| 17 |
STAR: Mitigating Cascading Errors in Spatial Reasoning via Turn-point Alignment and Segment-level DPO |
STAR: mitigates cascading errors in spatial reasoning through turn-point alignment and segment-level DPO |
DPO direct preference optimization large language model |
|
|
| 18 |
Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding |
Proposes a query-conditioned evidential keyframe sampling method that improves MLLM performance on long-form video understanding |
reinforcement learning large language model multimodal |
|
|
| 19 |
KG-CMI: Knowledge graph enhanced cross-Mamba interaction for medical visual question answering |
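Proposes the KG-CMI framework, which uses a knowledge graph to enhance cross-modal interaction and improve medical visual question answering performance. |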
Mamba multimodal |
|
|
| 20 |
MATHENA: Mamba-based Architectural Tooth Hierarchical Estimator and Holistic Evaluation Network for Anatomy |
MATHENA: a Mamba-based network for hierarchical estimation and holistic evaluation of tooth anatomy |
Mamba SSM state space model |
|
|
| 21 |
Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar |
Mine-JEPA: in-domain self-supervised learning for mine-like object classification in side-scan sonar |
JEPA foundation model |
|
|
| 22 |
AceTone: Bridging Words and Colors for Conditional Image Grading |
AceTone: proposes a multimodal conditional image color grading method that bridges the gap between text and color. |
reinforcement learning VQ-VAE multimodal |
|
|
| 23 |
MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning |
MAESIL: a masked autoencoder for enhanced self-supervised medical image learning |
masked autoencoder VQ-VAE |
|
|
| 24 |
VADMamba++: Efficient Video Anomaly Detection via Hybrid Modeling in Grayscale Space |
VADMamba++: efficient video anomaly detection based on hybrid modeling in grayscale space |
Mamba optical flow |
|
|
| 25 |
TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning |
Proposes TTA-Vid, a generalized test-time adaptation method for video reasoning |
reinforcement learning multimodal |
|
|
| 26 |
Learnability-Guided Diffusion for Dataset Distillation |
Proposes a learnability-guided diffusion method to address redundancy in dataset distillation |
distillation |
✅ |
|
| 27 |
FreqPhys: Repurposing Implicit Physiological Frequency Prior for Robust Remote Photoplethysmography |
FreqPhys: uses physiological frequency priors to make remote photoplethysmography signal extraction more robust |
representation learning PULSE |
|
|
| 28 |
A Benchmark of State-Space Models vs. Transformers and BiLSTM-based Models for Historical Newspaper OCR |
Proposes a state-space-model-based OCR architecture that balances efficiency and accuracy in historical newspaper recognition. |
Mamba SSM |
|
|