cs.CV(2025-10-24)

📊 23 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (2) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction and Matching (Video Extraction) (2 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 papers)

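| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation | NoisyGRPO: incentivizes multimodal CoT reasoning via noise injection and Bayesian estimation. | reinforcement learning, large language model, multimodal | |
| 2 | Foundation Models in Dermatopathology: Skin Tissue Classification | Uses dermatopathology foundation models for skin tissue classification to improve diagnostic efficiency. | representation learning, foundation model | |
| 3 | DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning | DAP-MAE: a domain-adaptive point cloud masked autoencoder that improves cross-domain learning. | masked autoencoder, MAE | |
| 4 | FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning | Proposes FineRS, a reinforcement-learning approach to fine-grained reasoning and segmentation of small objects in high-resolution images with MLLMs. | reinforcement learning, large language model | |
| 5 | PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis | PhysWorld: builds interactive world models of deformable objects from real videos via physics-aware demonstration synthesis. | world model, physically plausible | |
| 6 | WorldGrow: Generating Infinite 3D World | WorldGrow: proposes an infinite 3D world generation framework that addresses scene-level generation. | world model, implicit representation, foundation model | |
| 7 | A Dynamic Knowledge Distillation Method Based on the Gompertz Curve | Proposes Gompertz-CNN, which uses the Gompertz curve to dynamically adjust knowledge distillation and improve student model performance. | teacher-student, distillation | |
| 8 | Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation | Proposes Blockwise Flow Matching to improve the generation efficiency and quality of flow matching models. | flow matching | |
| 9 | WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition | WaveSeg: improves segmentation precision using a high-frequency prior and Mamba-driven spectrum decomposition. | Mamba | |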

🔬 Pillar 9: Embodied Foundation Models (7 papers)

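| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments | Proposes PhysVLM-AVR to address visual reasoning in dynamic environments. | large language model, multimodal, chain-of-thought | |
| 11 | KBE-DME: Dynamic Multimodal Evaluation via Knowledge Enhanced Benchmark Evolution | Proposes KBE, which enables dynamic evaluation of multimodal large models through knowledge-enhanced benchmark evolution. | large language model, multimodal | |
| 12 | Head Pursuit: Probing Attention Specialization in Multimodal Transformers | Proposes a signal-processing-based attention head analysis method for understanding and editing multimodal Transformer models. | multimodal | |
| 13 | MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection | MoniTor: uses instruction-driven large language models for online video anomaly detection. | large language model | |
| 14 | Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study | Proposes SIG, a structured spatial intelligence grid, to improve the spatial reasoning of multimodal large models in autonomous driving scenarios. | foundation model, multimodal | |
| 15 | VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT | VLM-SlideEval: evaluates VLMs on structured comprehension and perturbation sensitivity in PPT slides. | multimodal | |
| 16 | Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts | Controllable-LPMoE: improves object segmentation via a mixture-of-experts network with dynamic local priors. | foundation model | |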

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (2 papers)

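| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields | OpenHype: proposes open-vocabulary neural radiance fields based on hyperbolic embeddings to model the hierarchical structure of scenes. | neural radiance field, implicit representation, scene understanding | |
| 18 | ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models | ZING-3D: uses vision-language models to build zero-shot, incremental 3D scene graphs. | open-vocabulary, open vocabulary, spatial relationship | |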

🔬 Pillar 1: Robot Control (2 papers)

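| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Towards Physically Executable 3D Gaussian for Embodied Navigation | Proposes SAGE-3D, which strengthens the semantics and physical executability of 3D Gaussian representations for embodied navigation. | sim-to-real, 3D gaussian splatting, 3DGS | |
| 20 | ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents | ArtiLatent: generates realistic articulated 3D objects via a structured latent space. | manipulation, physically plausible, geometric consistency | |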

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

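| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding | Gaze-VLM: strengthens the egocentric understanding of VLMs through gaze regularization. | egocentric | |
| 22 | Towards Fine-Grained Human Motion Video Captioning | Proposes a motion-augmented caption model (M-ACM) for generating fine-grained descriptions of human motion videos. | human mesh recovery | |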

🔬 Pillar 4: Generative Motion (1 paper)

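| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations | Proposes a golden classifier-free guidance path based on foresight fixed point iterations, improving the quality and efficiency of text-to-image generation. | classifier-free guidance | |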
