cs.CV(2025-05-06)
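📊 23 papers in total | 🔗 6 with code
🎯 Area-of-Interest Navigation
Pillar 3: Spatial Perception & Semantics (Perception & Semantics) (7 🔗3)
Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (7 🔗2)
Pillar 2: RL Algorithms & Architecture (RL & Architecture) (5)
Pillar 7: Motion Retargeting (Motion Retargeting) (2 🔗1)
Pillar 4: Generative Motion (Generative Motion) (1)
Pillar 6: Video Extraction & Matching (Video Extraction) (1)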
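🔬 Pillar 3: Spatial Perception & Semantics (Perception & Semantics) (7 papers)
🔬 Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (7 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | Proposes a unified multimodal chain-of-thought reward model to improve accuracy on vision tasks | multimodal chain-of-thought | | |
| 9 | PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | Proposes PhysLLM to address noise sensitivity in remote physiological signal measurement | large language model | | |
| 10 | UPMAD-Net: A Brain Tumor Segmentation Network with Uncertainty Guidance and Adaptive Multimodal Feature Fusion | Proposes UPMAD-Net to handle uncertainty and multimodal feature fusion in brain tumor segmentation | multimodal | ✅ | |
| 11 | Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Proposes a capabilities-encoding approach to efficiently evaluate the performance of remote sensing foundation models | foundation model | ✅ | |
| 12 | Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges | Surveys deep learning for sports video event detection, covering its tasks, datasets, methods, and open challenges | multimodal | | |
| 13 | Multi-Agent System for Comprehensive Soccer Understanding | Proposes a comprehensive multi-agent framework to overcome the limitations of current soccer understanding | multimodal | | |
| 14 | SD-VSum: A Method and Dataset for Script-Driven Video Summarization | Proposes SD-VSum, a method and dataset for script-driven video summarization | multimodal | | |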
🔬 Pillar 2: RL Algorithms & Architecture (RL & Architecture) (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | STG: Spatiotemporal Graph Neural Network with Fusion and Spatiotemporal Decoupling Learning for Prognostic Prediction of Colorectal Cancer Liver Metastasis | Proposes the STG framework for prognostic prediction of colorectal cancer liver metastasis | contrastive learning, spatiotemporal, multimodal | | |
| 16 | Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | Proposes new methods to quantify and reduce the modality gap in image-text representation learning | representation learning, multimodal | | |
| 17 | MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models | Proposes MambaStyle to make GAN inversion and editing more efficient | Mamba | | |
| 18 | Real-Time Person Image Synthesis Using a Flow Matching Model | Proposes a flow-matching-based method for real-time person image synthesis, targeting generation speed | flow matching | | |
| 19 | seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models | Proposes seq-JEPA to improve representational flexibility in self-supervised learning | world model | | |
🔬 Pillar 7: Motion Retargeting (Motion Retargeting) (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Fixed-Length Dense Fingerprint Representation | Proposes a fixed-length dense fingerprint representation to address fingerprint matching challenges | spatial relationship | ✅ | |
| 21 | Blending 3D Geometry and Machine Learning for Multi-View Stereopsis | Proposes GC MVSNet++ to enforce geometric consistency in multi-view stereo | geometric consistency | | |
🔬 Pillar 4: Generative Motion (Generative Motion) (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data | Proposes StableMotion for cleaning up motion capture data | motion generation | | |
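🔬 Pillar 6: Video Extraction & Matching (Video Extraction) (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | GUAVA: Generalizable Upper Body 3D Gaussian Avatar | Proposes the GUAVA framework for reconstructing 3D upper-body human avatars from a single image | SMPL-X | | |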