| 1 |
HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation |
HY-Motion 1.0: scales Flow Matching models to the billion-parameter level for text-driven 3D human motion generation. |
reinforcement learning flow matching text-to-motion |
|
|
| 2 |
PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis |
PathFound: an agentic multimodal model for pathological diagnosis that actively seeks evidence. |
reinforcement learning representation learning foundation model |
|
|
| 3 |
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation |
Proposes an improved on-policy distillation method that enables real-time multimodal interactive video diffusion. |
distillation multimodal |
|
|
| 4 |
ProGuard: Towards Proactive Multimodal Safeguard |
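Proposes ProGuard, a proactive multimodal safeguard method for identifying and describing out-of-distribution (OOD) safety risks in generative models. |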
reinforcement learning multimodal |
|
|
| 5 |
ThinkGen: Generalized Thinking for Visual Generation |
ThinkGen: proposes a visual generation framework based on generalized thinking that improves adaptability across scenarios. |
reinforcement learning large language model multimodal |
✅ |
|
| 6 |
GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation |
Proposes GaussianDWM, a driving world model based on 3D Gaussian representations, for unified scene understanding and multi-modal generation. |
world model scene understanding |
✅ |
|
| 7 |
CME-CAD: Heterogeneous Collaborative Multi-Expert Reinforcement Learning for CAD Code Generation |
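Proposes CME-CAD, a heterogeneous collaborative multi-expert reinforcement learning framework for high-precision, editable CAD code generation. |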
reinforcement learning chain-of-thought |
|
|
| 8 |
GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection |
GVSynergy-Det: synergistic Gaussian-voxel representations for multi-view 3D object detection. |
representation learning gaussian splatting splatting |
|
|
| 9 |
Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment |
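Proposes the ArtQuant framework, which addresses the cognitive gap in artistic image aesthetics assessment through hierarchical description learning. |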
contrastive learning multimodal |
|
|
| 10 |
Visual Language Hypothesis |
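Proposes the Visual Language Hypothesis, analyzing visual representation learning from structural and topological perspectives. |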
representation learning multimodal |
|
|
| 11 |
SoulX-LiveTalk Technical Report |
Proposes the SoulX-LiveTalk framework for high-fidelity, real-time audio-driven digital human generation. |
distillation spatiotemporal |
|
|