cs.CV (2025-08-11)

📊 35 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (13 🔗1) · Pillar 2: RL & Architecture (9 🔗6) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (3 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

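# | Title | Key Point | Tags | 🔗
1 | ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model | Proposes ExpVG to systematically study visual grounding in multimodal large language models | large language model, multimodal, visual grounding
2 | MDD-Net: Multimodal Depression Detection through Mutual Transformer | Proposes MDD-Net to address multimodal depression detection | multimodal
3 | Prompt-Guided Relational Reasoning for Social Behavior Understanding with Vision Foundation Models | Proposes ProGraD to address social behavior understanding in group activity detection | foundation model
4 | CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning | Proposes CATP to address image token redundancy in multimodal in-context learning | multimodal
5 | MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization | Proposes the MIMIC framework to address the interpretability of vision-language models | multimodal
6 | Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models | Proposes RSFIQA to address insufficient region sensitivity in no-reference image quality assessment | large language model
7 | Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity | Proposes a four-axis evaluation framework and the BSD strategy to improve MLLM jailbreak effectiveness | large language model, multimodal
8 | Learning User Preferences for Image Generation Model | Proposes a user preference learning method based on multimodal large language models to improve image generation quality | large language model, multimodal
9 | TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning | Proposes TBAC-UniImage to address the deep integration of multimodal understanding and generation | large language model, multimodal
10 | The Escalator Problem: Identifying Implicit Motion Blindness in AI for Accessibility | Introduces the problem of implicit motion blindness to improve the reliability of assistive technology | large language model, multimodal
11 | Re:Verse -- Can Your VLM Read a Manga? | Proposes a new evaluation framework to address the shortcomings of vision-language models in manga narrative understanding | multimodal
12 | MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling | Proposes the MAViS framework to address the multiple challenges of long video generation | multimodal
13 | Towards Scalable Training for Handwritten Mathematical Expression Recognition | Proposes TexTeller to address data scarcity in handwritten mathematical expression recognition | foundation model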

🔬 Pillar 2: RL & Architecture (9 papers)

# | Title | Key Point | Tags | 🔗
14 | FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting | Proposes FantasyStyle to address inconsistency and content leakage in 3D style transfer | distillation, 3D gaussian splatting, 3DGS
15 | Selective Contrastive Learning for Weakly Supervised Affordance Grounding | Proposes selective contrastive learning to address weakly supervised affordance grounding | contrastive learning, distillation, affordance
16 | Reinforcement Learning for Large Model: A Survey | Surveys recent advances and challenges in visual reinforcement learning | reinforcement learning, RLHF, vision-language-action
17 | MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision | Proposes MedReasoner to address precise ROI localization in medical imaging | reinforcement learning, large language model, multimodal
18 | TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation | Proposes TRIDE to address depth estimation under weather effects | MAE, depth estimation, monocular depth
19 | Neural Tangent Knowledge Distillation for Optical Convolutional Networks | Proposes neural tangent knowledge distillation to address the accuracy of optical convolutional networks | distillation
20 | KARMA: Efficient Structural Defect Segmentation via Kolmogorov-Arnold Representation Learning | Proposes KARMA to address semantic segmentation of structural defects in infrastructure | representation learning
21 | Deep Space Weather Model: Long-Range Solar Flare Prediction from Multi-Wavelength Images | Proposes the Deep Space Weather Model to address long-range solar flare prediction | state space model, representation learning, masked autoencoder
22 | ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness | Proposes ME-TST+ to address temporal modeling and task correlation in micro-expression analysis | Mamba, state space model

🔬 Pillar 3: Perception & Semantics (6 papers)

# | Title | Key Point | Tags | 🔗
23 | ReferSplat: Referring Segmentation in 3D Gaussian Splatting | Proposes ReferSplat to address target segmentation in 3D scenes | 3D gaussian splatting, gaussian splatting, splatting
24 | Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction | Proposes a multi-view normal- and distance-guided Gaussian splatting method to address surface reconstruction | metric depth, 3D gaussian splatting, 3DGS
25 | SAGOnline: Segment Any Gaussians Online | Proposes SAGOnline to address efficient 3D segmentation | 3D gaussian splatting, 3DGS, gaussian splatting
26 | Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction | Proposes Mem4D to address the memory demand dilemma in dynamic scene reconstruction | scene reconstruction
27 | GRASPTrack: Geometry-Reasoned Association via Segmentation and Projection for Multi-Object Tracking | Proposes GRASPTrack to address multi-object tracking in monocular video | depth estimation, monocular depth
28 | Matrix-3D: Omnidirectional Explorable 3D World Generation | Proposes Matrix-3D to address omnidirectional explorable 3D world generation | scene reconstruction

🔬 Pillar 1: Robot Control (3 papers)

# | Title | Key Point | Tags | 🔗
29 | ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction | Proposes ReconDreamer-RL to address the gap between simulation and reality | sim2real, reinforcement learning, imitation learning
30 | AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning | Proposes AR-VRM to address data scarcity in visual robot manipulation | manipulation
31 | VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models | Proposes VISOR to address output redirection steered by visual input | manipulation, multimodal

🔬 Pillar 4: Generative Motion (3 papers)

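# | Title | Key Point | Tags | 🔗
32 | PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation | Proposes PP-Motion to address the evaluation of human motion generation | motion generation
33 | Learning an Implicit Physics Model for Image-based Fluid Simulation | Proposes an implicit physics model to address image-based fluid simulation | physically plausible
34 | Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model | Proposes Being-M0.5 to address the controllability of human motion generation | motion generation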

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | Key Point | Tags | 🔗
35 | Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model | Proposes Spatial-ORMLLM to address spatial relation understanding in the operating room | spatial relationship, large language model, multimodal
