cs.CV (2025-10-25)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (6 🔗1)
Pillar 3: Perception & Semantics (5 🔗2)
Pillar 6: Video Extraction (2)
Pillar 9: Embodied Foundation Models (2)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
1	Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis	Proposes a cross-enhanced multimodal fusion framework that combines eye-tracking and facial features for Alzheimer's disease diagnosis.	representation learning multimodal
2	GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping	GRPO-Guard: mitigates implicit over-optimization in flow matching via regulated clipping.	reinforcement learning PPO flow matching
3	CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning	Proposes CityRiSE, which uses reinforcement learning to improve how vision-language models reason about urban socio-economic status.	reinforcement learning reward design
4	LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction	LOC: a general language-guided framework for open-set 3D occupancy prediction.	contrastive learning distillation scene understanding
5	Beyond Augmentation: Leveraging Inter-Instance Relation in Self-Supervised Representation Learning	Proposes a graph neural network-based self-supervised learning method that exploits inter-instance relations to improve representation quality.	representation learning	✅
6	LongCat-Video Technical Report	LongCat-Video: an efficient long-video generation model built on a diffusion Transformer.	RLHF world model

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
7	EndoSfM3D: Learning to 3D Reconstruct Any Endoscopic Surgery Scene using Self-supervised Foundation Model	EndoSfM3D: learns 3D reconstruction of endoscopic surgery scenes using a self-supervised foundation model.	depth estimation monocular depth Depth Anything	✅
8	I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions	I2-NeRF: a physically grounded neural radiance field that improves 3D reconstruction under media degradation.	NeRF neural radiance field
9	CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding	CogStereo: neural stereo matching with an implicit spatial cognition embedding that improves zero-shot generalization.	monocular depth scene understanding scene flow
10	DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum	DynamicTree: interactive animation of real trees via a sparse voxel spectrum.	3DGS gaussian splatting splatting
11	STG-Avatar: Animatable Human Avatars via Spacetime Gaussian	Proposes STG-Avatar, which reconstructs high-fidelity animatable human avatars through spacetime Gaussian optimization.	3DGS optical flow	✅

🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
12	Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents	WAGIBench: a benchmark for egocentric multimodal goal inference for assistive wearable agents.	egocentric multimodal
13	egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks	egoEMOTION: a dataset for emotion and personality recognition that combines egocentric vision with physiological signals.	egocentric egocentric vision

🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
14	Mitigating Coordinate Prediction Bias from Positional Encoding Failures	Targets coordinate prediction bias in MLLMs and proposes Vision-PE Shuffle Guidance to improve localization accuracy.	large language model multimodal
15	HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models	Proposes HARMONY, which uses hidden-layer activations and model outputs to improve uncertainty estimation for vision-language models.	multimodal

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
16	MOGRAS: Human Motion with Grasping in 3D Scenes	MOGRAS: a large-scale dataset of human motion with grasping interactions in 3D scenes, plus benchmark methods.	physically plausible human-scene interaction

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
17	GRAID: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation	GRAID: enhances the spatial reasoning of vision-language models through high-fidelity data generation.	spatial relationship
