cs.CV (2025-11-22)

📊 15 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (8 🔗2) · Pillar 1: Robot Control (3 🔗1) · Pillar 2: RL Algorithms and Architecture (3 🔗1) · Pillar 5: Interaction and Reaction (1 🔗1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (8 papers)

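# | Title | One-line summary | Tags | 🔗
1 | Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization | Proposes Frequency-Adaptive Sharpness Regularization (FASR) to improve the generalization of 3D Gaussian Splatting in few-shot view synthesis. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion | Proposes a zero-shot novel view synthesis method based on video diffusion models, addressing scene reconstruction from sparse views. | 3D gaussian splatting, gaussian splatting, novel view synthesis
3 | ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization | Proposes the ARIAL framework, which takes an agentic approach to precise answer localization and extraction in document VQA. | localization
4 | Plan-X: Instruct Video Generation via Semantic Planning | Plan-X guides video generation through semantic planning, significantly reducing visual hallucinations and improving instruction alignment. | scene understanding, human-object interaction
5 | Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training | Muskie: multi-view masked image modeling for 3D vision pre-training. | pose estimation
6 | AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens | AdaPerceiver: proposes the first Transformer architecture that is adaptive in depth, width, and tokens. | depth estimation
7 | Spotlight: Identifying and Localizing Video Generation Errors Using VLMs | Spotlight: uses vision-language models to identify and localize video generation errors. | localization
8 | VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection | VK-Det: visual-knowledge-guided prototype learning for open-vocabulary aerial object detection. | localization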

🔬 Pillar 1: Robot Control (3 papers)

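# | Title | One-line summary | Tags | 🔗
9 | SFHand: A Streaming Framework for Language-guided 3D Hand Forecasting and Embodied Manipulation | SFHand: a streaming framework for language-guided 3D hand forecasting and embodied manipulation. | manipulation
10 | InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity | InfiniBench: proposes an unbounded benchmark for visual spatial reasoning with customizable scene complexity. | trajectory optimization
11 | ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models | ActDistill: an action-guided self-distillation framework for efficient VLA models. | manipulation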

🔬 Pillar 2: RL Algorithms and Architecture (3 papers)

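# | Title | One-line summary | Tags | 🔗
12 | CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking | CADTrack: proposes contextual aggregation based on deformable alignment for robust RGBT tracking. | Mamba, state space model, localization
13 | MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection | MambaTAD: a long-range temporal action detection method that incorporates state-space models. | Mamba
14 | PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning | Proposes PA-FAS, which uses path-augmented reinforcement learning to improve the generalization and interpretability of multimodal face anti-spoofing. | reinforcement learning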

🔬 Pillar 5: Interaction and Reaction (1 paper)

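# | Title | One-line summary | Tags | 🔗
15 | Multi-speaker Attention Alignment for Multimodal Social Interaction | Proposes a multi-speaker attention alignment method to improve MLLM understanding of multimodal social interaction. | social interaction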
