| 1 |
EgoPoseFormer v2: Accurate Egocentric Human Motion Estimation for AR/VR |
EgoPoseFormer v2: accurate egocentric human motion estimation for AR/VR |
teacher-student distillation egocentric |
|
|
| 2 |
PROSPECT: Unified Streaming Vision-Language Navigation via Semantic-Spatial Fusion and Latent Predictive Representation |
PROSPECT: unified streaming vision-language navigation via semantic-spatial fusion and latent predictive representations |
predictive model representation learning vision-language-action |
|
|
| 3 |
Scaling Dense Event-Stream Pretraining from Visual Foundation Models |
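Proposes an event-stream pretraining method built on visual foundation models that addresses semantic collapse in event representations. |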
distillation foundation model |
|
|
| 4 |
Cross-Modal Mapping and Dual-Branch Reconstruction for 2D-3D Multimodal Industrial Anomaly Detection |
Proposes CMDR-IAD to address multimodal industrial anomaly detection. |
teacher-student multimodal |
✅ |
|
| 5 |
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning |
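Proposes the AVAR framework, which addresses attention allocation during the cold-start stage of multimodal large models and significantly improves reasoning performance. |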
reward shaping multimodal |
✅ |
|
| 6 |
CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing |
CoRe-BT: a multimodal radiology-pathology-text benchmark dataset for robust brain tumor typing |
representation learning multimodal |
|
|
| 7 |
Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning |
Proposes a dual-criterion curation framework based on gaze stability and pupil novelty for efficient egocentric learning. |
imitation learning egocentric |
|
|
| 8 |
Discriminative Perception via Anchored Description for Reasoning Segmentation |
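Proposes DPAD, which achieves discriminative perception through anchored descriptions and improves reasoning segmentation performance. |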
reinforcement learning large language model multimodal |
✅ |
|
| 9 |
Separators in Enhancing Autoregressive Pretraining for Vision Mamba |
Proposes STAR, which uses separators to enhance autoregressive pretraining for Vision Mamba and improves long-sequence processing. |
Mamba state space model |
|
|
| 10 |
TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning |
TaxonRL: interpretable fine-grained visual reasoning via reinforcement learning with intermediate rewards |
reinforcement learning |
|
|
| 11 |
DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers |
DiverseDiT: improves image synthesis quality through diverse representation learning in diffusion Transformers. |
representation learning |
|
|
| 12 |
UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization |
UniRain: proposes a unified image deraining framework built on RAG-based dataset distillation and multi-objective reweighted optimization. |
distillation |
|
|
| 13 |
Vector-Quantized Soft Label Compression for Dataset Distillation |
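Proposes a soft-label compression method based on a vector-quantized autoencoder to accelerate dataset distillation and reduce storage overhead. |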
distillation |
|
|