| 21 |
Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA |
Proposes Farsighted-LAM and SSM-VLA to strengthen the spatial and dynamic awareness of latent action models in VLA systems |
SSM vision-language-action VLA |
|
|
| 22 |
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization |
Proposes the MISP-DPO framework to address negative-sample selection in multimodal preference optimization |
DPO direct preference optimization multimodal |
|
|
| 23 |
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance |
Proposes IMG, which calibrates diffusion models through implicit multimodal guidance to improve image-text alignment accuracy. |
DPO large language model multimodal |
✅ |
|
| 24 |
Generalized Contrastive Learning for Universal Multimodal Retrieval |
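Proposes Generalized Contrastive Learning (GCL) to address generalization across combined modalities in universal multimodal retrieval. |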
contrastive learning multimodal |
|
|
| 25 |
ProbMed: A Probabilistic Framework for Medical Multimodal Binding |
ProbMED: proposes a probabilistic multimodal fusion framework that improves joint diagnosis from medical images and text |
contrastive learning multimodal |
|
|
| 26 |
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation |
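Proposes a knowledge-distillation-based post-training pipeline that improves the performance of small language models on edge devices. |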
distillation large language model |
|
|
| 27 |
PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection |
Proposes the PRPO algorithm, which uses paragraph-level policy optimization to improve large vision-language models on deepfake detection. |
reinforcement learning large language model multimodal |
|
|
| 28 |
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation |
Proposes a self-supervised anatomical consistency learning framework for vision-grounded medical report generation. |
contrastive learning foundation model visual grounding |
|
|
| 29 |
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models |
Reveals the dual nature of reasoning in vision-language models and proposes VAPO to optimize visual perception |
reinforcement learning large language model multimodal |
✅ |
|
| 30 |
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs |
Proposes Text Preference Optimization (TPO), which aligns text-to-image diffusion models without annotated preference data. |
reinforcement learning RLHF DPO |
✅ |
|
| 31 |
Dolphin v1.0 Technical Report |
Dolphin v1.0: the first large-scale multimodal ultrasound foundation model, handling multiple clinical tasks in a unified way. |
reinforcement learning foundation model multimodal |
|
|
| 32 |
Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation |
Proposes GSDD, a dataset distillation method based on sparse Gaussian representations, improving efficiency and performance. |
distillation splatting |
✅ |
|
| 33 |
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents |
Proposes Ferret-UI Lite, a compact end-to-end GUI agent for cross-platform interaction. |
reinforcement learning chain-of-thought |
|
|
| 34 |
FLOWER: A Flow-Matching Solver for Inverse Problems |
Proposes FLOWER, a flow-matching-based solver for inverse problems |
flow matching |
|
|
| 35 |
Generalized Fine-Grained Category Discovery with Multi-Granularity Conceptual Experts |
Proposes MGCE, a multi-granularity conceptual experts network, to address generalized fine-grained category discovery |
representation learning contrastive learning |
✅ |
|