cs.CV (2025-11-24)

📊 38 papers in total | 🔗 7 with code

🎯 Navigation by interest area

Pillar 3: Perception & SLAM (21, 🔗5) · Pillar 2: RL & Architecture (5) · Pillar 1: Robot Control (5, 🔗2) · Pillar 5: Interaction & Reaction (3) · Pillar 4: Generative Motion (2) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Perception & SLAM (21 papers)

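# | Title | One-sentence summary | Tags | 🔗
1 | DensifyBeforehand: LiDAR-assisted Content-aware Densification for Efficient and Quality 3D Gaussian Splatting | Proposes a LiDAR-assisted, content-aware densification method that improves the efficiency and quality of 3D Gaussian splatting | depth estimation monocular depth 3D gaussian splatting
2 | Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction | Proposes Neural Texture Splatting (NTS), which improves 3D Gaussian splatting on view synthesis, geometry, and dynamic reconstruction tasks | 3D gaussian splatting 3DGS gaussian splatting
3 | IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes | IDSplat: instance-decomposed 3D Gaussian splatting reconstruction for autonomous driving scenes | 3D gaussian splatting gaussian splatting
4 | NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting | Proposes a neural-visibility-based occlusion culling method for 3D Gaussian splatting that improves rendering efficiency in complex scenes | 3D gaussian splatting gaussian splatting
5 | MapRF: Weakly Supervised Online HD Map Construction via NeRF-Guided Self-Training | MapRF: weakly supervised online HD map construction via NeRF-guided self-training | NeRF neural radiance
6 | Sphinx: Efficiently Serving Novel View Synthesis using Regression-Guided Selective Refinement | Sphinx: an efficient novel view synthesis framework based on regression-guided selective refinement | novel view synthesis navigation
7 | MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes | MetroGS: efficient and stable reconstruction of geometrically accurate, high-fidelity large-scale scenes | 3D gaussian splatting gaussian splatting scene reconstruction
8 | Graph-based 3D Human Pose Estimation using WiFi Signals | Proposes GraphPose-Fi, which uses WiFi signals and graph neural networks for 3D human pose estimation | pose estimation
9 | Robust Long-term Test-Time Adaptation for 3D Human Pose Estimation through Motion Discretization | Proposes a robust long-term test-time adaptation method for 3D human pose estimation based on motion discretization | pose estimation
10 | Proxy-Free Gaussian Splats Deformation with Splat-Based Surface Estimation | Proposes a proxy-free Gaussian splat deformation method to address insufficient capture of surface information | NeRF point cloud
11 | The Determinant Ratio Matrix Approach to Solving 3D Matching and 2D Orthographic Projection Alignment Tasks | Proposes a determinant ratio matrix (DRaM) approach to solving the EnP and OnP problems | pose estimation 3D pose estimation
12 | Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering | Prune-Then-Plan: stable frontier exploration in embodied question answering through step-level calibration | navigation
13 | A Storage-Efficient Feature for 3D Concrete Defect Segmentation to Replace Normal Vector | Proposes a relative-angle-based feature for 3D concrete defect segmentation that improves storage efficiency | point cloud
14 | IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants | IndEgo: a multimodal dataset of collaborative tasks for egocentric industrial assistants | point cloud
15 | From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation | Proposes a retrieval-augmented fashion captioning and hashtag generation framework that improves attribute fidelity and domain generalization | localization
16 | DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection | DiffSeg30k: a multi-turn diffusion editing benchmark dataset for fine-grained AIGC detection | localization
17 | Granular Computing-driven SAM: From Coarse-to-Fine Guidance for Prompt-Free Segmentation | Proposes Grc-SAM, a granular-computing-based approach that improves prompt-free image segmentation accuracy from coarse to fine | localization
18 | Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models | Proposes Perceptual Taxonomy for evaluating and guiding hierarchical scene reasoning in vision-language models | scene understanding
19 | TPG-INR: Target Prior-Guided Implicit 3D CT Reconstruction for Enhanced Sparse-view Imaging | Proposes TPG-INR to address insufficient 3D CT reconstruction accuracy under ultra-sparse views | NeRF
20 | PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion | PartDiffuser: part-wise 3D mesh generation via discrete diffusion | point cloud
21 | Understanding Task Transfer in Vision-Language Models | Proposes the Perfection Gap Factor to systematically study task transfer in vision-language models | depth estimation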

🔬 Pillar 2: RL & Architecture (5 papers)

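# | Title | One-sentence summary | Tags | 🔗
22 | FineXtrol: Controllable Motion Generation via Fine-Grained Text | FineXtrol: controllable motion generation through fine-grained text control | contrastive learning text-driven motion motion generation
23 | ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay | ReEXplore: improves MLLM performance in embodied exploration using contextualized retrospective experience replay | reinforcement learning imitation learning navigation
24 | VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning | Proposes VideoChat-M1, which achieves collaborative policy planning for video understanding via multi-agent reinforcement learning | reinforcement learning
25 | Fewer Tokens, Greater Scaling: Self-Adaptive Visual Bases for Efficient and Expansive Representation Learning | Proposes self-adaptive visual bases that reduce the number of visual tokens and improve the efficiency and scalability of visual representation learning | representation learning
26 | MagicWorld: Interactive Geometry-driven Video World Exploration | MagicWorld: a geometry-guided interactive video world exploration model that improves scene stability and continuity | world model point cloud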

🔬 Pillar 1: Robot Control (5 papers)

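# | Title | One-sentence summary | Tags | 🔗
27 | AuViRe: Audio-visual Speech Representation Reconstruction for Deepfake Temporal Localization | Proposes AuViRe, which temporally localizes deepfake video content via audio-visual speech representation reconstruction | manipulation localization
28 | UMCL: Unimodal-generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection | Proposes the UMCL framework, which uses unimodal-generated multimodal contrastive learning to tackle cross-compression-rate deepfake detection | manipulation contrastive learning
29 | GuideFlow: Constraint-Guided Flow Matching for Planning in End-to-End Autonomous Driving | GuideFlow: a constraint-guided flow matching method for end-to-end autonomous driving planning | manipulation flow matching
30 | LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space | LAA3D: a 3D perception benchmark dataset for low-altitude aircraft, with a monocular 3D detection baseline | sim-to-real pose estimation localization
31 | VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection | VDC-Agent: evolves video detailed captioning models through agentic self-reflection, without human annotation or large teacher models | running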

🔬 Pillar 5: Interaction & Reaction (3 papers)

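# | Title | One-sentence summary | Tags | 🔗
32 | Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache | Proposes an adaptive diversity cache module that mitigates long-tail bias in HOI detection without additional training | human-object interaction HOI
33 | SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis | SyncMV4D: synchronized multi-view joint diffusion that generates hand-object interaction video and 4D motion | HOI
34 | Peregrine: One-Shot Fine-Tuning for FHE Inference of General Deep CNNs | Peregrine: a one-shot fine-tuning method for FHE inference of general deep CNNs | OMOMO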

🔬 Pillar 4: Generative Motion (2 papers)

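# | Title | One-sentence summary | Tags | 🔗
35 | ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment | Proposes ReAlign, which achieves high-quality text-to-motion generation through step-aware reward-guided alignment | text-to-motion motion generation
36 | Rethinking Garment Conditioning in Diffusion-based Virtual Try-On | Proposes Re-CatVTON, an efficient single-UNet diffusion model for high-performance virtual try-on | classifier-free guidance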

🔬 Pillar 7: Motion Retargeting (1 paper)

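# | Title | One-sentence summary | Tags | 🔗
37 | Unsupervised Multi-View Visual Anomaly Detection via Progressive Homography-Guided Alignment | Proposes ViewSense-AD, which achieves unsupervised multi-view anomaly detection via homography-guided alignment | geometric consistency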

🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-sentence summary | Tags | 🔗
38 | MonoMSK: Monocular 3D Musculoskeletal Dynamics Estimation | MonoMSK: physics-based estimation of 3D human musculoskeletal dynamics from monocular video | SMPL
