cs.CV (2025-11-20)

📊 40 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Perception & SLAM (21, 🔗 4) · Pillar 1: Robot Control (10, 🔗 2) · Pillar 2: RL & Architecture (7, 🔗 3) · Pillar 7: Motion Retargeting (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 3: Perception & SLAM (21 papers)

# | Title | One-line takeaway | Tags | 🔗
1 | Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments | Rad-GS: radar-vision fusion 3D Gaussian splatting SLAM for outdoor environments | SLAM, 3D gaussian splatting, gaussian splatting
2 | Building temporally coherent 3D maps with VGGT for memory-efficient Semantic SLAM | Proposes a VGGT-based method for building temporally coherent 3D maps, enabling memory-efficient semantic SLAM | SLAM, scene understanding, navigation
3 | CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis | CuriGS: curriculum-guided Gaussian splatting for sparse-view synthesis | 3D gaussian splatting, 3DGS, gaussian splatting
4 | LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM | LEGO-SLAM: a real-time open-vocabulary SLAM system based on language-embedded Gaussian optimization | SLAM, 3D gaussian splatting, 3DGS
5 | CRISTAL: Real-time Camera Registration in Static LiDAR Scans using Neural Rendering | CRISTAL: real-time camera registration in static LiDAR scans using neural rendering | SLAM, point cloud, localization
6 | Optimizing 3D Gaussian Splattering for Mobile GPUs | Texture3dgs: a 3D Gaussian splatting algorithm optimized for mobile GPUs, improving sorting efficiency and overall performance | 3D gaussian splatting, 3DGS, gaussian splatting
7 | Investigating Optical Flow Computation: From Local Methods to a Multiresolution Horn-Schunck Implementation with Bilinear Interpolation | A study of optical flow computation, from local methods to a multiresolution Horn-Schunck implementation with bilinear interpolation (see the Horn-Schunck sketch after this table) | optical flow
8 | BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization | BoxingVI: a multi-modal benchmark dataset for boxing action recognition and localization | localization
9 | LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs | LLaVA$^3$: represents 3D scenes like a Cubist painter to improve VLMs' 3D scene understanding | scene understanding
10 | CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation | CylinderDepth: cylindrical spatial attention for multi-view-consistent self-supervised surround depth estimation | depth estimation
11 | Real-Time 3D Object Detection with Inference-Aligned Learning | Proposes the SR3D framework, achieving real-time 3D object detection on indoor point clouds via inference-aligned learning | scene understanding, point cloud, navigation
12 | Clustered Error Correction with Grouped 4D Gaussian Splatting | Proposes grouped 4D Gaussian splatting with clustered error correction to improve dynamic scene reconstruction quality | gaussian splatting
13 | End-to-End Motion Capture from Rigid Body Markers with Geodesic Loss | Proposes end-to-end human motion capture from rigid-body markers with a geodesic loss | pose estimation, SMPL
14 | Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling | Proposes Upsample Anything, a training-free, general-purpose baseline for feature upsampling | depth estimation, gaussian splatting
15 | Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation | Mesh RAG: a retrieval-augmentation framework for autoregressive mesh generation that improves quality and speed | point cloud
16 | SAM 3: Segment Anything with Concepts | SAM 3: a general-purpose image and video segmentation model driven by concept prompts | localization
17 | Late-decoupled 3D Hierarchical Semantic Segmentation with Semantic Prototype Discrimination based Bi-branch Supervision | Proposes a late-decoupled 3D hierarchical semantic segmentation framework with semantic-prototype-based bi-branch supervision, addressing cross-level conflicts and class imbalance | point cloud
18 | YOWO: You Only Walk Once to Jointly Map An Indoor Scene and Register Ceiling-mounted Cameras | Proposes YOWO: a single walkthrough jointly maps an indoor scene and registers ceiling-mounted cameras | localization
19 | NaTex: Seamless Texture Generation as Latent Color Diffusion | NaTex: a seamless texture generation framework based on latent color diffusion that predicts texture colors directly in 3D space | point cloud
20 | PairHuman: A High-Fidelity Photographic Dataset for Customized Dual-Person Generation | Introduces the PairHuman dataset for high-fidelity customized dual-person portrait generation, together with the DHumanDiff baseline model | localization
21 | Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click | Proposes Click2Graph, which generates panoptic video scene graphs interactively from a single click | scene understanding
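
Entry 7 above revisits the classic Horn-Schunck method. As background (not the paper's multiresolution implementation with bilinear interpolation), the sketch below shows the standard single-scale Jacobi-style Horn-Schunck update; the function name `horn_schunck` and the regularization weight `alpha` are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, alpha=1.0, n_iters=100):
    """Single-scale Horn-Schunck optical flow (illustrative sketch only)."""
    I1, I2 = I1.astype(np.float64), I2.astype(np.float64)
    # Simple finite-difference estimates of the spatial and temporal gradients.
    kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) * 0.25
    ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) * 0.25
    Ix = convolve(I1, kx) + convolve(I2, kx)
    Iy = convolve(I1, ky) + convolve(I2, ky)
    It = convolve(I2 - I1, np.full((2, 2), 0.25))
    # Kernel for the local average of the flow field used in the update.
    avg = np.array([[1/12, 1/6, 1/12],
                    [1/6,  0.0, 1/6],
                    [1/12, 1/6, 1/12]])
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iters):
        u_bar, v_bar = convolve(u, avg), convolve(v, avg)
        # Jacobi iteration derived from the Horn-Schunck Euler-Lagrange equations.
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```

A multiresolution variant, as in the paper's title, would run this update on a coarse-to-fine image pyramid and upsample the flow between levels (e.g. with bilinear interpolation).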

🔬 Pillar 1: Robot Control (10 papers)

# | Title | One-line takeaway | Tags | 🔗
22 | Lite Any Stereo: Efficient Zero-Shot Stereo Matching | Proposes Lite Any Stereo for efficient zero-shot stereo matching depth estimation (the standard disparity-to-depth relation is sketched after this table) | sim-to-real, depth estimation, stereo depth
23 | Physics-Informed Machine Learning for Efficient Sim-to-Real Data Augmentation in Micro-Object Pose Estimation | Proposes a physics-informed GAN for efficient sim-to-real data augmentation in micro-object pose estimation | sim-to-real, pose estimation
24 | Mem-MLP: Real-Time 3D Human Motion Generation from Sparse Inputs | Mem-MLP: real-time 3D human motion generation from sparse inputs | running, motion generation
25 | SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation | SceneDesigner: a two-stage training approach based on a CNOCS map and reinforcement learning for image generation with precise 9-DoF pose control over multiple objects | manipulation, reinforcement learning
26 | BOP-ASK: Object-Interaction Reasoning for Vision-Language Models | BOP-ASK: a dataset and benchmark for object-interaction reasoning in vision-language models | grasp, pose estimation, localization
27 | EvoVLA: Self-Evolving Vision-Language-Action Model | EvoVLA: a self-evolving vision-language-action model that addresses stage hallucination in long-horizon robotic manipulation | manipulation, sim-to-real, contrastive learning
28 | Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion | Proposes sequence-level adversarial clothing generation that makes human-detection evasion more robust in real-world scenes | walking, manipulation
29 | VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference | VLA-Pruner: temporal-aware dual-level visual token pruning for efficient vision-language-action inference | manipulation
30 | When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models | VLA-Fool: a study of multimodal adversarial attacks on embodied vision-language-action models | manipulation
31 | Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight | Mantis: a versatile vision-language-action model with disentangled visual foresight | manipulation
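
Entry 22 above is tagged stereo depth; as general background (independent of the paper's method), rectified stereo converts a matched disparity d into metric depth via Z = f·B/d, with focal length f in pixels and baseline B in meters. A minimal sketch, assuming a precomputed disparity map; the function name `disparity_to_depth` is an illustrative choice.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a rectified-stereo disparity map (pixels) to depth (meters).

    Standard pinhole relation Z = f * B / d; pixels with (near-)zero
    disparity are left at 0 depth instead of dividing by zero.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```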

🔬 Pillar 2: RL & Architecture (7 papers)

# | Title | One-line takeaway | Tags | 🔗
32 | EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering | EOGS++: Earth-observation Gaussian splatting with internal camera refinement and direct panchromatic rendering | MAE, 3D gaussian splatting, gaussian splatting
33 | POMA-3D: The Point Map Way to 3D Scene Understanding | POMA-3D: a self-supervised, point-map-based 3D scene understanding model that improves performance on multiple downstream tasks | representation learning, scene understanding, localization
34 | Simba: Towards High-Fidelity and Geometrically-Consistent Point Cloud Completion via Transformation Diffusion | Simba: high-fidelity, geometrically consistent point cloud completion via transformation diffusion | Mamba, point cloud
35 | LiSTAR: Ray-Centric World Models for 4D LiDAR Sequences in Autonomous Driving | LiSTAR: ray-centric world models for generating 4D LiDAR sequences in autonomous driving | world model, point cloud
36 | TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding | TimeViper: a hybrid Mamba-Transformer vision-language model for efficient long-video understanding | Mamba
37 | Supervised Contrastive Learning for Few-Shot AI-Generated Image Detection and Attribution | Proposes a supervised-contrastive-learning framework for few-shot AI-generated image detection and attribution (the standard SupCon loss is sketched after this table) | contrastive learning
38 | VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning | Proposes VideoSeg-R1, the first reinforcement-learning-based framework for reasoning video object segmentation, improving generalization in complex scenes | reinforcement learning
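
Entry 37 above builds on supervised contrastive learning. The sketch below is the standard supervised contrastive (SupCon) loss for one view per sample, not the paper's specific framework; the inputs `features` (L2-normalized below) and integer `labels`, and the default temperature, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive (SupCon) loss: pull same-label embeddings together."""
    features = F.normalize(features, dim=1)          # (N, D) unit-norm embeddings
    sim = features @ features.T / temperature        # pairwise similarity logits
    n = features.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=features.device)
    # Positives: samples sharing a label with the anchor, excluding the anchor itself.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & off_diag
    # Log-softmax over all other samples (self-similarity masked out).
    sim = sim.masked_fill(~off_diag, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over each anchor's positives (anchors with none are skipped).
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```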

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line takeaway | Tags | 🔗
39 | Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions | Proposes a motion-transfer-enhanced StyleGAN for generating diverse macaque facial expressions | motion transfer

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-line takeaway | Tags | 🔗
40 | Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions | Introduces the MIDA benchmark for assessing whether multimodal LLMs can detect deception in multi-party social interactions | social interaction

⬅️ Back to cs.CV home · 🏠 Back to main page