cs.CV (2026-04-08)
📊 37 papers in total | 🔗 14 with code
🎯 Navigation by interest area
Pillar 2: RL & Architecture (10 🔗4)
Pillar 3: Perception & Semantics (9 🔗3)
Pillar 9: Embodied Foundation Models (9 🔗4)
Pillar 8: Physics-based Animation (4 🔗2)
Pillar 4: Generative Motion (2)
Pillar 1: Robot Control (1)
Pillar 6: Video Extraction (1)
Pillar 7: Motion Retargeting (1 🔗1)
🔬 Pillar 2: RL & Architecture (10 papers)
🔬 Pillar 3: Perception & Semantics (9 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | AnchorSplat: Feed-Forward 3D Gaussian Splatting With 3D Geometric Priors | AnchorSplat: a feed-forward Gaussian splatting method based on 3D geometric priors for scene-level reconstruction. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 12 | DOC-GS: Dual-Domain Observation and Calibration for Reliable Sparse-View Gaussian Splatting | Proposes the DOC-GS framework, which improves Gaussian splatting reconstruction quality under sparse views through dual-domain observation and calibration. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 13 | LiftFormer: Lifting and Frame Theory Based Monocular Depth Estimation Using Depth and Edge Oriented Subspace Representation | Proposes LiftFormer, based on lifting theory and frame theory, for monocular depth estimation, improving depth prediction accuracy in edge regions. | depth estimation monocular depth metric depth | | |
| 14 | VGGT-SLAM++ | VGGT-SLAM++: an accurate, efficient, and scalable visual SLAM system that incorporates VGGT geometric information. | visual odometry visual SLAM elevation map | | |
| 15 | From Blobs to Spokes: High-Fidelity Surface Reconstruction via Oriented Gaussians | Proposes a surface reconstruction method based on oriented Gaussians, addressing the difficulty of extracting surfaces from 3DGS. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 16 | 4D Vessel Reconstruction for Benchtop Thrombectomy Analysis | Proposes a vessel reconstruction method based on 4D Gaussian splatting for in vitro thrombectomy analysis. | gaussian splatting splatting | ✅ | |
| 17 | Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training | Mem3R: streaming 3D reconstruction via test-time training and hybrid memory, improving long-sequence consistency. | depth estimation | ✅ | |
| 18 | Synthetic Dataset Generation for Partially Observed Indoor Objects | Proposes a Unity-based virtual scanning framework for generating synthetic datasets of partially observed indoor objects. | scene reconstruction | | |
| 19 | LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video | LiveStre4m: a feed-forward method for generating novel views in real time from unposed multi-view video. | scene reconstruction | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 8: Physics-based Animation (4 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation Data | Proposes LIANet, a coordinate-based spatiotemporal neural representation of Earth observation data. | spatiotemporal foundation model | ✅ | |
| 30 | EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling | EventFace: event-based face recognition via structure-driven spatiotemporal modeling. | spatiotemporal | | |
| 31 | Fast Spatial Memory with Elastic Test-Time Training | Proposes a fast spatial memory based on elastic test-time training for long-sequence 4D reconstruction. | spatiotemporal | | |
| 32 | Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer | Proposes the OG-ReG Transformer, which mimics human visual cognition to improve video action understanding. | spatiotemporal | ✅ | |
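🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 33 | MoRight: Motion Control Done Right | MoRight: a disentangled motion control framework for controllable, causally consistent video generation. | physically plausible | | |
| 34 | Not all tokens contribute equally to diffusion learning | DARE: improves semantic guidance in diffusion models through distribution-aware rectification and spatial ensembling, optimizing text-to-video generation. | classifier-free guidance | | |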
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 35 | PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing | PhyEdit: real-world object manipulation via physically constrained image editing. | manipulation world model world models | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 36 | Improving Local Feature Matching by Entropy-inspired Scale Adaptability and Flow-endowed Local Consistency | Proposes entropy-guided scale adaptability and flow-based local consistency to improve local feature matching performance. | feature matching | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 37 | CWRNN-INVR: A Coupled WarpRNN based Implicit Neural Video Representation | Proposes CWRNN-INVR, an implicit neural video representation based on a coupled WarpRNN, improving video reconstruction quality. | motion representation | ✅ | |