| 17 |
OmniMotion: Multimodal Motion Generation with Continuous Masked Autoregression |
OmniMotion: proposes a continuous masked autoregressive Transformer for multimodal whole-body human motion generation. |
linear attention text-to-motion motion generation |
|
|
| 18 |
WeCKD: Weakly-supervised Chained Distillation Network for Efficient Multimodal Medical Imaging |
Proposes WeCKD, a weakly supervised chained distillation network for efficient multimodal medical image analysis. |
teacher-student distillation multimodal |
|
|
| 19 |
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering |
Proposes the Wiki-PRF framework to address multimodal query quality and retrieval-result relevance in knowledge-based VQA. |
reinforcement learning multimodal |
✅ |
|
| 20 |
Capturing Context-Aware Route Choice Semantics for Trajectory Representation Learning |
Proposes the CORE framework, which integrates context-aware route choice semantics to improve trajectory representation learning. |
representation learning spatiotemporal large language model |
✅ |
|
| 21 |
Directional Reasoning Injection for Fine-Tuning MLLMs |
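Proposes DRIFT, which injects directional reasoning knowledge in gradient space to fine-tune multimodal large language models efficiently. |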
reinforcement learning large language model multimodal |
|
|
| 22 |
Composition-Grounded Instruction Synthesis for Visual Reasoning |
Proposes the COGS framework to improve the reasoning ability of multimodal large language models. |
reinforcement learning large language model multimodal |
|
|
| 23 |
Spatial Preference Rewarding for MLLMs Spatial Understanding |
Proposes Spatial Preference Rewarding (SPR) to improve the fine-grained spatial understanding of MLLMs. |
direct preference optimization large language model multimodal |
✅ |
|
| 24 |
RealDPO: Real or Not Real, that is the Preference |
RealDPO: uses preference learning on real data to improve the motion realism of video generation models. |
preference learning DPO direct preference optimization |
|
|
| 25 |
Terra: Explorable Native 3D World Model with Point Latents |
Terra: an explorable native 3D world model based on point latents. |
flow matching world model |
|
|
| 26 |
DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights |
Proposes the DRBD-Mamba model for robust and efficient brain tumor segmentation, with analytical insights. |
Mamba state space model |
|
|
| 27 |
Generalized Dynamics Generation towards Scannable Physical World Model |
GDGen: a potential-energy-based general dynamics generation framework for scannable physical world modeling. |
world model |
|
|
| 28 |
Vision Mamba for Permeability Prediction of Porous Media |
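Proposes a Vision Mamba-based model for predicting the permeability of porous media, improving computational efficiency and memory utilization. |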
Mamba |
|
|
| 29 |
Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning |
Proposes Identity-GRPO, which uses reinforcement learning to optimize identity preservation in multi-human video generation. |
reinforcement learning |
|
|
| 30 |
Multi-modal video data-pipelines for machine learning with minimal human supervision |
Proposes a machine learning approach based on weakly supervised multimodal video data pipelines. |
MAE depth estimation |
|
|
| 31 |
Decorrelation Speeds Up Vision Transformers |
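Proposes DBP-MAE to accelerate ViT pre-training, reducing computational cost and carbon emissions while improving downstream task performance. |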
masked autoencoder MAE |
|
|