| 16 |
LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving |
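Proposes LaST-VLA, which uses latent spatio-temporal reasoning to address the semantic decoupling problem of vision-language-action models in autonomous driving. |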
reinforcement learning world model vision-language-action |
|
|
| 17 |
From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents |
Proposes MM-Mem, which distills a pyramidal multimodal memory via a semantic information bottleneck to address long-horizon video agent tasks. |
distillation large language model multimodal |
✅ |
|
| 18 |
Sketch2Colab: Sketch-Conditioned Multi-Human Animation via Controllable Flow Distillation |
Sketch2Colab: sketch-driven multi-human animation generation via controllable flow distillation. |
distillation physically plausible human motion |
|
|
| 19 |
Generative Visual Chain-of-Thought for Image Editing |
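Proposes the Generative Visual Chain-of-Thought (GVCoT) framework to address fine-grained spatial instruction understanding in complex scenes for image editing. |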
reinforcement learning chain-of-thought |
|
|
| 20 |
LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation |
LiftAvatar: expression-controlled 3D Gaussian avatar animation via kinematic-space completion. |
distillation 3D gaussian splatting gaussian splatting |
|
|
| 21 |
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories |
WorldStereo: bridging camera-guided video generation and scene reconstruction via 3D geometric memories. |
world model scene reconstruction |
|
|
| 22 |
Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration |
Proposes DATPRL-IR to address multi-domain all-in-one image restoration and improve generalization. |
representation learning large language model multimodal |
✅ |
|
| 23 |
Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference |
Proposes Preference Score Distillation (PSD), which uses 2D reward models to align text-to-3D generation with human preference. |
distillation classifier-free guidance |
|
|
| 24 |
Towards Principled Dataset Distillation: A Spectral Distribution Perspective |
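Proposes class-aware spectral distribution matching (CSDM) to address the performance degradation of dataset distillation on long-tailed datasets. |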
distillation |
|
|
| 25 |
Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning |
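Proposes Cross-modal Identity Mapping (CIM), which uses reinforcement learning to minimize information loss in modality conversion and improve image captioning quality. |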
reinforcement learning |
|
|
| 26 |
MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention |
MixerCSeg: an efficient mixer architecture for crack segmentation via decoupled Mamba attention. |
Mamba |
✅ |
|
| 27 |
CoopDiff: A Diffusion-Guided Approach for Cooperation under Corruptions |
CoopDiff: a diffusion-based cooperative perception framework that improves robustness under multiple degradation conditions. |
teacher-student scene understanding |
|
|