| 10 |
BLIP3o-NEXT: Next Frontier of Native Image Generation |
BLIP3o-NEXT: a new frontier in native image generation, unifying text-to-image generation and image editing |
reinforcement learning foundation model multimodal |
|
|
| 11 |
UniMedVL: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis |
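Proposes UniMedVL, which unifies medical multimodal understanding and generation to improve performance in medical diagnostic applications. |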
curriculum learning multimodal |
✅ |
|
| 12 |
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation |
Proposes CoMe, which compresses large language models through layer concatenation, preserving performance under substantial pruning. |
distillation large language model |
✅ |
|
| 13 |
VM-BeautyNet: A Synergistic Ensemble of Vision Transformer and Mamba for Facial Beauty Prediction |
VM-BeautyNet: a facial beauty prediction model that combines Vision Transformer and Mamba |
Mamba MAE spatial relationship |
|
|
| 14 |
Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding |
Proposes Cortical-SSM, which uses a deep state space model to decode EEG and ECoG motor imagery signals |
SSM state space model |
|
|
| 15 |
StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales |
StretchySnake: flexible SSM training unlocks action recognition across spatio-temporal scales |
SSM state space model |
|
|
| 16 |
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset |
Proposes the Ditto framework, which uses the high-quality synthetic dataset Editto-1M to significantly improve instruction-driven video editing. |
curriculum learning instruction following |
|
|
| 17 |
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning |
Proposes EARL, an evidence-prioritized adaptive framework that addresses information dilution in long-video reasoning for video LLMs. |
reinforcement learning large language model |
|
|