cs.CV (2025-11-26)

📊 28 papers in total | 🔗 3 with code

🎯 Navigation by area of interest

Pillar 3: Spatial Perception (Perception & SLAM) (10 🔗2)
Pillar 2: RL Algorithms & Architecture (8 🔗1)
Pillar 1: Robot Control (5)
Pillar 4: Generative Motion (3)
Pillar 5: Interaction & Reaction (1)
Pillar 7: Motion Retargeting (1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (10 papers)

# | Title | Key point | Tags | 🔗
1 | Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs | Proposes a multimodal on-device learning method for monocular depth estimation on ultra-low-power MCUs. | depth estimation, monocular depth
2 | Unlocking Zero-shot Potential of Semi-dense Image Matching via Gaussian Splatting | MatchGS: uses Gaussian Splatting to unlock the zero-shot potential of semi-dense image matching. | 3D gaussian splatting, 3DGS, gaussian splatting
3 | HTTM: Head-wise Temporal Token Merging for Faster VGGT | Proposes head-wise temporal token merging (HTTM) to accelerate VGGT for fast 3D scene reconstruction. | scene reconstruction, VGGT
4 | PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images | PathReasoning: a multimodal reasoning agent for query-based ROI navigation on whole-slide images. | navigation
5 | SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding | SurgMLLMBench: a multimodal large language model benchmark dataset for surgical scene understanding. | scene understanding
6 | Endo-G²T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes | Endo-G²T: a geometry-guided, temporally aware, time-embedded 4D Gaussian Splatting method for endoscopic scenes. | monocular depth, gaussian splatting
7 | Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding | Proposes NDTokenizer3D, a multi-scale Normal Distributions Transform (NDT) tokenizer for general 3D vision-language understanding. | scene understanding, point cloud
8 | FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain | FaithFusion: a 3DGS-diffusion fusion framework driven by pixel-wise information gain for controllable driving-scene reconstruction and generation. | 3DGS, scene reconstruction
9 | AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views | Proposes AmodalGen3D for 3D object reconstruction from sparse, unposed views. | scene reconstruction
10 | MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training | MoGAN: improves motion quality in video diffusion models via few-step motion adversarial post-training. | optical flow

🔬 Pillar 2: RL Algorithms & Architecture (8 papers)

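# | Title | Key point | Tags | 🔗
11 | Scaling Foundation Models for Radar Scene Understanding | Proposes RadarFM, a radar foundation model that learns scene understanding through structured spatial-language supervision. | contrastive learning, scene understanding, localization
12 | Multimodal Robust Prompt Distillation for 3D Point Cloud Models | Proposes a multimodal robust prompt distillation framework that makes 3D point cloud models more robust to adversarial attacks. | teacher-student, point cloud
13 | PathMamba: A Hybrid Mamba-Transformer for Topologically Coherent Road Segmentation in Satellite Imagery | Proposes PathMamba for topologically coherent road segmentation in satellite imagery. | Mamba, state space model
14 | BotaCLIP: Contrastive Learning for Botany-Aware Representation of Earth Observation Data | BotaCLIP: learns botany-aware representations of Earth observation data via contrastive learning. | representation learning, contrastive learning
15 | CLRecogEye : Curriculum Learning towards exploiting convolution features for Dynamic Iris Recognition | Proposes CLRecogEye, which combines convolutional features with curriculum learning to make dynamic iris recognition more robust. | curriculum learning
16 | FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation | FlowerDance: an efficient, refined 3D dance generation method built on MeanFlow. | Mamba, motion generation
17 | Seeing without Pixels: Perception from Camera Trajectories | Perceives video content from camera trajectories alone, using CamFormer, a proposed contrastive learning framework. | contrastive learning, pose estimation
18 | E-M3RF: An Equivariant Multimodal 3D Re-assembly Framework | Proposes E-M3RF, an equivariant framework for multimodal 3D re-assembly that improves geometric reconstruction accuracy. | flow matching, point cloud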

🔬 Pillar 1: Robot Control (5 papers)

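# | Title | Key point | Tags | 🔗
19 | PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation | Proposes PFF-Net, which achieves robust point cloud normal estimation through multi-scale patch feature fitting. | running, point cloud
20 | T3-Tracer: A Tri-level Temporal-Aware Framework for Audio Forgery Detection and Localization | Proposes T3-Tracer for audio forgery detection and localization, analyzing temporal cues at three levels: frame, segment, and whole audio. | manipulation, localization
21 | Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning | Proposes Merge-and-Bound, which tackles catastrophic forgetting in class-incremental learning through operations in weight space. | manipulation
22 | When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models | Proposes UPA-RFAS, a universal, transferable patch attack on VLA models. | manipulation, sim-to-real
23 | PAT3D: Physics-Augmented Text-to-3D Scene Generation | PAT3D: the first physics-augmented text-to-3D scene generation framework, producing realistic, interactive scenes. | manipulation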

🔬 Pillar 4: Generative Motion (3 papers)

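# | Title | Key point | Tags | 🔗
24 | Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy | Harmony: unifies audio and video generation through cross-task synergy. | classifier-free guidance
25 | From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings | Proposes a VLA pre-training method for industrial settings based on latent-action primitive segmentation. | motion token
26 | MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization | MUSE: a unified framework for generating and editing emotions in images via test-time optimization. | motion synthesis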

🔬 Pillar 5: Interaction & Reaction (1 paper)

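# | Title | Key point | Tags | 🔗
27 | Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI | Proposes a federated Vision Transformer learning framework based on lightweight homomorphic encryption to protect patient privacy in medical AI. | OMOMO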

🔬 Pillar 7: Motion Retargeting (1 paper)

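# | Title | Key point | Tags | 🔗
28 | Pygmalion Effect in Vision: Image-to-Clay Translation for Reflective Geometry Reconstruction | Proposes an image-to-clay translation approach, framed as a "Pygmalion effect", for reconstructing the geometry of reflective objects. | geometric consistency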
