cs.CV (2025-11-25)

📊 47 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (21 🔗4) · Pillar 2: RL Algorithms & Architecture (13 🔗1) · Pillar 1: Robot Control (10 🔗2) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction & Matching (1 🔗1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (21 papers)

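# | Title | One-sentence summary | Tags | 🔗
1 | $Δ$-NeRF: Incremental Refinement of Neural Radiance Fields through Residual Control and Knowledge Transfer | Proposes $Δ$-NeRF, which incrementally refines neural radiance fields through residual control and knowledge transfer, suited to sequential data such as satellite imagery. | NeRF neural radiance novel view synthesis
2 | DeLightMono: Enhancing Self-Supervised Monocular Depth Estimation in Endoscopy by Decoupling Uneven Illumination | DeLightMono: improves self-supervised monocular depth estimation in endoscopy by decoupling uneven illumination. | depth estimation monocular depth navigation
3 | Redefining Radar Segmentation: Simultaneous Static-Moving Segmentation and Ego-Motion Estimation using Radar Point Clouds | Proposes a method for simultaneous static-moving segmentation and ego-motion estimation from radar point clouds. | point cloud ego-motion
4 | 3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding | Proposes 3D-aware multi-task learning based on cross-view correlations for dense scene understanding. | depth estimation scene understanding geometric consistency
5 | Material-informed Gaussian Splatting for 3D World Reconstruction in a Digital Twin | Proposes a material-informed 3D Gaussian splatting method for 3D world reconstruction in a digital twin. | 3D gaussian splatting gaussian splatting point cloud
6 | VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction | VGGT4D: mines motion cues in visual geometry Transformers for 4D scene reconstruction. | scene reconstruction pose estimation VGGT
7 | ACIT: Attention-Guided Cross-Modal Interaction Transformer for Pedestrian Crossing Intention Prediction | Proposes ACIT, which uses an attention mechanism and a cross-modal interaction Transformer to improve the accuracy of pedestrian crossing intention prediction. | optical flow interaction transformer
8 | MODEST: Multi-Optics Depth-of-Field Stereo Dataset | MODEST: a multi-optics depth-of-field stereo dataset that bridges the gap between real optics and synthetic data. | depth estimation stereo depth novel view synthesis
9 | Conceptual Evaluation of Deep Visual Stereo Odometry for the MARWIN Radiation Monitoring Robot in Accelerator Tunnels | Explores the use of deep visual stereo odometry on a radiation monitoring robot in accelerator tunnels. | optical flow ego-motion localization
10 | FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds | FLaTEC: proposes a frequency-disentangled latent triplane representation for efficient compression of LiDAR point clouds. | point cloud
11 | ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding | ReDirector: generates video retakes of any length using rotary camera encoding. | localization geometric consistency
12 | VGGTFace: Topologically Consistent Facial Geometry Reconstruction in the Wild | VGGTFace: uses a 3D foundation model for topologically consistent facial geometry reconstruction. | point cloud VGGT
13 | AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend | AMB3R: achieves accurate metric-scale 3D reconstruction with a compact voxel backend. | visual odometry SLAM
14 | STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction | STAvatar: proposes a monocular 3D head avatar reconstruction method with soft binding and temporal density control. | 3D gaussian splatting gaussian splatting
15 | Estimating Fog Parameters from a Sequence of Stereo Images | Proposes a method for dynamically estimating fog parameters from a sequence of stereo images, applicable to visual SLAM and odometry systems. | SLAM
16 | Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos | Proposes the Mistake Attribution (MATT) task for fine-grained understanding of human mistakes in egocentric videos. | localization
17 | Zoo3D: Zero-Shot 3D Object Detection at Scene Level | Zoo3D: proposes a scene-level zero-shot 3D object detection framework that reaches SOTA performance without training. | point cloud
18 | Explainable Visual Anomaly Detection via Concept Bottleneck Models | Proposes CONVAD, an explainable visual anomaly detection method based on concept bottleneck models. | localization
19 | Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention | Proposes a vision-guided attention (VGA) mechanism to mitigate hallucinations in multimodal large language models. | localization
20 | Foundry: Distilling 3D Foundation Models for the Edge | Foundry: distills 3D foundation models for edge devices, achieving efficient compression while preserving generality. | point cloud
21 | Multi-Context Fusion Transformer for Pedestrian Crossing Intention Prediction in Urban Environments | Proposes a multi-context fusion Transformer (MFT) for pedestrian intention prediction in urban environments. | localization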

🔬 Pillar 2: RL Algorithms & Architecture (13 papers)

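# | Title | One-sentence summary | Tags | 🔗
22 | DAPointMamba: Domain Adaptive Point Mamba for Point Cloud Completion | DAPointMamba: a domain-adaptive Point Mamba model for point cloud completion. | Mamba SSM state space model
23 | MFM-point: Multi-scale Flow Matching for Point Cloud Generation | MFM-Point: a multi-scale flow matching method for point cloud generation that improves generation quality and scalability. | flow matching point cloud
24 | AD-R1: Closed-Loop Reinforcement Learning for End-to-End Autonomous Driving with Impartial World Models | AD-R1: closed-loop reinforcement learning for end-to-end autonomous driving based on impartial world models. | reinforcement learning world model
25 | BRIC: Bridging Kinematic Plans and Physical Control at Test Time | BRIC: a test-time adaptation framework that bridges kinematic plans and physical control. | reinforcement learning motion generation human-scene interaction
26 | ChessMamba: Structure-Aware Interleaving of State Spaces for Change Detection in Remote Sensing Images | ChessMamba: a structure-aware state-space interleaving method for change detection in remote sensing images. | Mamba localization
27 | Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning | Flash-DMD: achieves high-fidelity, fast image generation through efficient distillation and joint reinforcement learning. | reinforcement learning flow matching
28 | WPT: World-to-Policy Transfer via Online World Model Distillation | Proposes WPT, which performs world-to-policy transfer via online world model distillation and improves planning performance. | imitation learning world model
29 | MambaEye: A Size-Agnostic Visual Encoder with Causal Sequential Processing | MambaEye: a size-agnostic visual encoder based on causal sequential processing. | Mamba state space model
30 | MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts | MajutsuCity: proposes a language-driven, aesthetic-adaptive city generation framework with controllable 3D assets and layouts. | world model dreamer height map
31 | DRL-Guided Neural Batch Sampling for Semi-Supervised Pixel-Level Anomaly Detection | Proposes a DRL-guided neural batch sampling method for semi-supervised pixel-level anomaly detection. | reinforcement learning deep reinforcement learning localization
32 | Hybrid Convolution and Frequency State Space Network for Image Compression | Proposes HCFSSNet, a hybrid convolution and frequency state space network for image compression. | Mamba SSM state space model
33 | One Patch is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues | SMARC: reconstructs and classifies surface materials from only 10% of the image area. | masked autoencoder MAE
34 | Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection? | Proposes a deep-reinforcement-learning-based adaptive tracking and detection algorithm for video object recognition in mobile edge networks. | reinforcement learning deep reinforcement learning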

🔬 Pillar 1: Robot Control (10 papers)

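# | Title | One-sentence summary | Tags | 🔗
35 | GS-Checker: Tampering Localization for 3D Gaussian Splatting | GS-Checker: proposes a tampering localization method for 3D Gaussian splatting to safeguard 3D content. | manipulation 3D gaussian splatting 3DGS
36 | GigaWorld-0: World Models as Data Engine to Empower Embodied AI | GigaWorld-0: builds world models as a data engine to empower embodied AI. | motion planning world model 3D gaussian splatting
37 | From Passive Perception to Active Memory: A Weakly Supervised Image Manipulation Localization Framework Driven by Coarse-Grained Annotations | Proposes the BoxPromptIML framework, which achieves precise image manipulation localization from low-cost coarse annotations. | manipulation localization
38 | Map-World: Masked Action planning and Path-Integral World Model for Autonomous Driving | Proposes MAP-World, which combines masked action planning with a path-integral world model for multimodal motion planning in autonomous driving. | motion planning reinforcement learning world model
39 | Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI | Wanderland: a geometrically grounded simulation framework for open-world embodied AI. | sim-to-real policy learning 3DGS
40 | DinoLizer: Learning from the Best for Generative Inpainting Localization | DinoLizer: uses DINOv2 to learn to localize regions altered by generative inpainting. | manipulation localization
41 | Thinking in 360°: Humanoid Visual Search in the Wild | Proposes the H* Bench benchmark to study the visual search ability of embodied agents in 360° panoramic images. | humanoid
42 | Boosting Reasoning in Large Multimodal Models via Activation Replay | Proposes Activation Replay, which improves the reasoning of large multimodal models by replaying activations, with no additional training. | manipulation reinforcement learning
43 | Evaluating the Performance of Deep Learning Models in Whole-body Dynamic 3D Posture Prediction During Load-reaching Activities | Proposes a Transformer-based deep learning model for predicting whole-body dynamic 3D posture during load-reaching activities. | gait
44 | V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs | V-Attack: achieves controllable adversarial attacks on LVLMs by manipulating disentangled Value features. | manipulation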

🔬 Pillar 4: Generative Motion (1 paper)

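# | Title | One-sentence summary | Tags | 🔗
45 | Learning to Generate Human-Human-Object Interactions from Textual Descriptions | Proposes an HHOI generation framework that generates human-human-object interaction scenes from textual descriptions, and builds an accompanying dataset. | motion generation human-object interaction HOI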

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-sentence summary | Tags | 🔗
46 | Motion Marionette: Rethinking Rigid Motion Transfer via Prior Guidance | Proposes Motion Marionette to address the rigid motion transfer problem. | motion transfer

🔬 Pillar 6: Video Extraction & Matching (1 paper)

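# | Title | One-sentence summary | Tags | 🔗
47 | SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery | Proposes the SKEL-CF framework for recovering biomechanical skeletons and surface meshes from images, improving the realism of human motion analysis. | SMPL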
