| 15 |
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools |
Video-STAR: tool-augmented reinforcement learning for open-vocabulary action recognition |
reinforcement learning open-vocabulary open vocabulary |
|
|
| 16 |
FOLK: Fast Open-Vocabulary 3D Instance Segmentation via Label-guided Knowledge Distillation |
Proposes FOLK, which achieves fast open-vocabulary 3D instance segmentation through label-guided knowledge distillation |
distillation open-vocabulary open vocabulary |
|
|
| 17 |
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization |
MM-HELIX: improves multimodal long-chain reflective reasoning through a holistic platform and adaptive hybrid policy optimization |
reinforcement learning large language model multimodal |
|
|
| 18 |
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning |
Proposes the MATRIX framework, which achieves robust tool-use reasoning through multimodal agent tuning |
preference learning multimodal |
✅ |
|
| 19 |
CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving |
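Proposes CVD-STORM, which uses a spatial-temporal reconstruction diffusion model to generate long multi-view videos for autonomous driving, with 4D reconstruction capability |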
world model depth estimation gaussian splatting |
✅ |
|
| 20 |
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding |
Proposes MARC, a video token compression method based on memory-augmented reinforcement learning, for efficient video understanding |
reinforcement learning distillation large language model |
|
|
| 21 |
Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation |
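Memoir: proposes an imagination-guided experience retrieval method that improves memory-persistent vision-and-language navigation performance |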
world model VLN language conditioned |
✅ |
|
| 22 |
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning |
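Proposes a visual attention mechanism based on return-guided contrastive learning that improves the sample efficiency of reinforcement learning |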
reinforcement learning contrastive learning |
|
|
| 23 |
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models |
SpatialLadder: improves spatial reasoning in vision-language models through progressive training |
reinforcement learning multimodal |
|
|
| 24 |
VideoVerse: How Far is Your T2V Generator from a World Model? |
VideoVerse: builds a more comprehensive evaluation benchmark for text-to-video generation models, measuring how far they are from world models |
world model |
|
|
| 25 |
SimCast: Enhancing Precipitation Nowcasting with Short-to-Long Term Knowledge Distillation |
SimCast: enhances precipitation nowcasting with short-to-long term knowledge distillation |
distillation |
|
|
| 26 |
FlowLensing: Simulating Gravitational Lensing with Flow Matching |
FlowLensing: uses flow matching to accelerate gravitational lensing simulation, supporting dark matter research |
flow matching |
|
|
| 27 |
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation |
LinVideo: a post-training framework that achieves O(n)-complexity attention for efficient video generation |
linear attention spatiotemporal |
|
|