cs.CV (2025-08-19)
📊 31 papers in total | 🔗 8 with code
🎯 Navigation by area of interest
Pillar 2: RL Algorithms & Architecture (10 🔗5)
Pillar 9: Embodied Foundation Models (7)
Pillar 3: Spatial Perception & Semantics (6 🔗2)
Pillar 7: Motion Retargeting (2)
Pillar 6: Video Extraction & Matching (2 🔗1)
Pillar 8: Physics-based Animation (2)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
🔬 Pillar 2: RL Algorithms & Architecture (10 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting | Proposes the GALA framework for open-vocabulary 3D scene understanding | contrastive learning, 3D gaussian splatting, 3DGS | | |
| 2 | Distilled-3DGS:Distilled 3D Gaussian Splatting | Proposes distilled 3D Gaussian Splatting to reduce the storage cost of high-fidelity rendering | distillation, 3D gaussian splatting, 3DGS | ✅ | |
| 3 | PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis | Proposes PhysGM to improve both the efficiency and the accuracy of physics-grounded 4D synthesis | DPO, direct preference optimization, distillation | ✅ | |
| 4 | Pixels to Play: A Foundation Model for 3D Gameplay | Proposes Pixels2Play-0.1 to generate agent behavior in 3D games | behavior cloning, foundation model | | |
| 5 | Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference | Proposes structured prompting and multi-agent knowledge distillation for traffic video understanding | distillation, scene understanding, chain-of-thought | | |
| 6 | Diversity-enhanced Collaborative Mamba for Semi-supervised Medical Image Segmentation | Proposes Diversity-enhanced Collaborative Mamba for semi-supervised medical image segmentation | Mamba, SSM, state space model | | |
| 7 | Towards Efficient Vision State Space Models via Token Merging | Proposes MaMe to improve the computational efficiency of vision SSMs | SSM, state space model | | |
| 8 | LENS: Learning to Segment Anything with Unified Reinforced Reasoning | Proposes the LENS framework to address insufficient reasoning in text-prompted image segmentation | reinforcement learning, chain-of-thought | ✅ | |
| 9 | Backdooring Self-Supervised Contrastive Learning by Noisy Alignment | Proposes Noisy Alignment, a backdoor attack on self-supervised contrastive learning | contrastive learning | ✅ | |
| 10 | Multi-view Clustering via Bi-level Decoupling and Consistency Learning | Proposes a bi-level decoupling and consistency learning framework to improve multi-view clustering | representation learning, contrastive learning | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Spatial Perception & Semantics (6 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Online 3D Gaussian Splatting Modeling with Novel View Selection | Proposes an online 3D Gaussian Splatting modeling method to address incomplete scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 19 | LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos | Proposes LongSplat for novel view synthesis from casual, unposed long videos | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 20 | ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving | Introduces the ROVR dataset to address the limited diversity of depth estimation data | depth estimation, monocular depth, scene understanding | | |
| 21 | EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors | Proposes EAvatar to capture facial expressions in high-fidelity head avatar reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 22 | MR6D: Benchmarking 6D Pose Estimation for Mobile Robots | Introduces the MR6D dataset for benchmarking 6D pose estimation on mobile robots | 6D pose estimation | ✅ | |
| 23 | MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow | Proposes MF-LPR$^2$ to restore and recognize low-quality license plate images | optical flow | | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation | Proposes RotBench to evaluate how well multimodal LLMs identify image rotation | spatial relationship, large language model, multimodal | | |
| 25 | Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency | Proposes a geometry-guided distribution calibration method to correct sample bias | geometric consistency, foundation model | | |
🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | RynnEC: Bringing MLLMs into Embodied World | Proposes RynnEC to bring multimodal LLMs to embodied cognition | egocentric, large language model, foundation model | ✅ | |
| 27 | Self-Supervised Sparse Sensor Fusion for Long Range Perception | Proposes self-supervised sparse sensor fusion for long-range perception | sparse sensors | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | UNICON: UNIfied CONtinual Learning for Medical Foundational Models | Proposes the UNICON framework for continual learning in medical foundation models | UniCon, foundation model | | |
| 29 | FAMNet: Integrating 2D and 3D Features for Micro-expression Recognition via Multi-task Learning and Hierarchical Attention | Proposes FAMNet to address feature extraction challenges in micro-expression recognition | spatiotemporal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization | Proposes VisionLaw to infer objects' intrinsic dynamics from visual observations | physically plausible | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 31 | EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis | Proposes EDTalk++ to fully disentangle facial features for controllable talking-head synthesis | manipulation | | |