cs.CV (2025-10-24)

📊 23 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 🔗2) | Pillar 9: Embodied Foundation Models (7 🔗3) | Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (2) | Pillar 1: Robot Control (2) | Pillar 6: Video Extraction and Matching (Video Extraction) (2 🔗1) | Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation	NoisyGRPO: incentivizes multimodal CoT reasoning via noise injection and Bayesian estimation	reinforcement learning large language model multimodal	✅
2	Foundation Models in Dermatopathology: Skin Tissue Classification	Classifies skin tissue with dermatopathology foundation models to improve diagnostic efficiency	representation learning foundation model
3	DAP-MAE: Domain-Adaptive Point Cloud Masked Autoencoder for Effective Cross-Domain Learning	DAP-MAE: a domain-adaptive point cloud masked autoencoder that improves cross-domain learning	masked autoencoder MAE
4	FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning	Proposes FineRS, a reinforcement-learning approach to fine-grained reasoning about and segmentation of small objects in high-resolution images with MLLMs.	reinforcement learning large language model
5	PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis	PhysWorld: builds interactive world models of deformable objects from real videos via physics-aware demonstration synthesis	world model physically plausible
6	WorldGrow: Generating Infinite 3D World	WorldGrow: proposes an infinite 3D world generation framework to tackle scene-level generation	world model implicit representation foundation model
7	A Dynamic Knowledge Distillation Method Based on the Gompertz Curve	Proposes Gompertz-CNN, which uses the Gompertz curve to adjust knowledge distillation dynamically and improve student model performance.	teacher-student distillation
8	Blockwise Flow Matching: Improving Flow Matching Models For Efficient High-Quality Generation	Proposes Blockwise Flow Matching to improve the generation efficiency and quality of flow matching models.	flow matching	✅
9	WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition	WaveSeg: enhances segmentation precision with a high-frequency prior and Mamba-driven spectrum decomposition	Mamba

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
10	PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments	Proposes PhysVLM-AVR to address visual reasoning in dynamic environments	large language model multimodal chain-of-thought
11	KBE-DME: Dynamic Multimodal Evaluation via Knowledge Enhanced Benchmark Evolution	Proposes KBE, which evaluates multimodal large models dynamically through knowledge-enhanced benchmark evolution	large language model multimodal
12	Head Pursuit: Probing Attention Specialization in Multimodal Transformers	Proposes a signal-processing-based method for analyzing attention heads, used to understand and edit multimodal Transformer models.	multimodal
13	MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection	MoniTor: uses instruction-driven large language models for online video anomaly detection.	large language model	✅
14	Towards Physics-informed Spatial Intelligence with Human Priors: An Autonomous Driving Pilot Study	Proposes SIG, a structured spatial intelligence grid, to improve the spatial reasoning of multimodal large models in autonomous driving scenes.	foundation model multimodal
15	VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT	VLM-SlideEval: evaluates VLMs on structured comprehension and perturbation sensitivity in PPT	multimodal	✅
16	Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts	Controllable-LPMoE: improves object segmentation with dynamic local priors from a mixture-of-experts network	foundation model	✅

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
17	OpenHype: Hyperbolic Embeddings for Hierarchical Open-Vocabulary Radiance Fields	OpenHype: proposes open-vocabulary neural radiance fields based on hyperbolic embeddings to model the hierarchical structure of scenes.	neural radiance field implicit representation scene understanding
18	ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models	ZING-3D: builds zero-shot incremental 3D scene graphs using vision-language models	open-vocabulary open vocabulary spatial relationship

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	Towards Physically Executable 3D Gaussian for Embodied Navigation	Proposes SAGE-3D, which enhances the semantics and physical executability of 3D Gaussian representations for embodied navigation.	sim-to-real 3D gaussian splatting 3DGS
20	ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents	ArtiLatent: generates realistic articulated 3D objects via structured latents	manipulation physically plausible geometric consistency

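🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
21	Gaze-VLM: Bridging Gaze and VLMs through Attention Regularization for Egocentric Understanding	Gaze-VLM: strengthens the egocentric understanding of VLMs through gaze regularization	egocentric	✅
22	Towards Fine-Grained Human Motion Video Captioning	Proposes a motion-augmented caption model (M-ACM) for generating fine-grained descriptions of human motion videos.	human mesh recovery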

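🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations	Proposes a golden classifier-free guidance path based on foresight fixed point iterations, improving the quality and efficiency of text-to-image generation	classifier-free guidance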
