cs.CV (2025-08-11)

📊 35 papers total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (13 🔗1) · Pillar 2: RL & Architecture (9 🔗6) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (3 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model	Proposes ExpVG to systematically study visual grounding in multimodal large language models	large language model multimodal visual grounding
2	MDD-Net: Multimodal Depression Detection through Mutual Transformer	Proposes MDD-Net for multimodal depression detection	multimodal	✅
3	Prompt-Guided Relational Reasoning for Social Behavior Understanding with Vision Foundation Models	Proposes ProGraD for social behavior understanding in group activity detection	foundation model
4	CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning	Proposes CATP to address image token redundancy in multimodal in-context learning	multimodal
5	MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization	Proposes the MIMIC framework to address the interpretability of vision-language models	multimodal
6	Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models	Proposes RSFIQA to address insufficient region sensitivity in no-reference image quality assessment	large language model
7	Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity	Proposes a four-axis evaluation framework and the BSD strategy to improve MLLM jailbreak effectiveness	large language model multimodal
8	Learning User Preferences for Image Generation Model	Proposes a user preference learning method based on multimodal large language models to improve image generation quality	large language model multimodal
9	TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning	Proposes TBAC-UniImage to deeply integrate multimodal understanding and generation	large language model multimodal
10	The Escalator Problem: Identifying Implicit Motion Blindness in AI for Accessibility	Identifies the implicit motion blindness problem to improve the reliability of assistive technology	large language model multimodal
11	Re:Verse -- Can Your VLM Read a Manga?	Proposes a new evaluation framework to address the shortcomings of vision-language models in manga narrative understanding	multimodal
12	MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling	Proposes the MAViS framework to address the multiple challenges of long video generation	multimodal
13	Towards Scalable Training for Handwritten Mathematical Expression Recognition	Proposes TexTeller to address data scarcity in handwritten mathematical expression recognition	foundation model

🔬 Pillar 2: RL & Architecture (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting	Proposes FantasyStyle to address inconsistency and content leakage in 3D style transfer	distillation 3D gaussian splatting 3DGS	✅
15	Selective Contrastive Learning for Weakly Supervised Affordance Grounding	Proposes selective contrastive learning for weakly supervised affordance grounding	contrastive learning distillation affordance
16	Reinforcement Learning for Large Model: A Survey	Surveys recent advances and challenges in visual reinforcement learning	reinforcement learning RLHF vision-language-action	✅
17	MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision	Proposes MedReasoner for precise ROI localization in medical imaging	reinforcement learning large language model multimodal
18	TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation	Proposes TRIDE to address depth estimation under the influence of weather	MAE depth estimation monocular depth	✅
19	Neural Tangent Knowledge Distillation for Optical Convolutional Networks	Proposes neural tangent knowledge distillation to address the accuracy of optical convolutional networks	distillation
20	KARMA: Efficient Structural Defect Segmentation via Kolmogorov-Arnold Representation Learning	Proposes KARMA for semantic segmentation of structural defects in infrastructure	representation learning	✅
21	Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images	Proposes the Deep Space Weather Model for long-range solar flare prediction	state space model representation learning masked autoencoder	✅
22	ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness	Proposes ME-TST+ to address temporal modeling and task association in micro-expression analysis	Mamba state space model	✅

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
23	ReferSplat: Referring Segmentation in 3D Gaussian Splatting	Proposes ReferSplat for object segmentation in 3D scenes	3D gaussian splatting gaussian splatting splatting	✅
24	Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction	Proposes a Gaussian splatting method guided by multi-view normals and distances for surface reconstruction	metric depth 3D gaussian splatting 3DGS	✅
25	SAGOnline: Segment Any Gaussians Online	Proposes SAGOnline for efficient 3D segmentation	3D gaussian splatting 3DGS gaussian splatting
26	Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction	Proposes Mem4D to address the memory demand dilemma in dynamic scene reconstruction	scene reconstruction
27	GRASPTrack: Geometry-Reasoned Association via Segmentation and Projection for Multi-Object Tracking	Proposes GRASPTrack for multi-object tracking in monocular video	depth estimation monocular depth
28	Matrix-3D: Omnidirectional Explorable 3D World Generation	Proposes Matrix-3D for omnidirectional explorable 3D world generation	scene reconstruction

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
29	ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction	Proposes ReconDreamer-RL to narrow the gap between simulation and reality	sim2real reinforcement learning imitation learning
30	AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning	Proposes AR-VRM to address data scarcity in visual robot manipulation	manipulation
31	VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models	Proposes VISOR for steering output redirection through visual input	manipulation multimodal

🔬 Pillar 4: Generative Motion (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
32	PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation	Proposes PP-Motion for evaluating human motion generation	motion generation
33	Learning an Implicit Physics Model for Image-based Fluid Simulation	Proposes an implicit physics model for image-based fluid simulation	physically plausible
34	Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model	Proposes Being-M0.5 to address controllability in human motion generation	motion generation	✅

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
35	Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model	Proposes Spatial-ORMLLM for spatial relation understanding in the operating room	spatial relationship large language model multimodal
