cs.CV (2025-06-26)

📊 50 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (19 🔗6)
Pillar 3: Perception & Semantics (11 🔗3)
Pillar 2: RL & Architecture (10 🔗1)
Pillar 6: Video Extraction (3)
Pillar 1: Robot Control (3 🔗1)
Pillar 4: Generative Motion (2)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (19 papers)

# | Title | One-line summary | Tags | 🔗
1 | ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing | Proposes the ThinkSound framework to address high-fidelity challenges in video-to-audio generation | large language model, foundation model, multimodal
2 | DRISHTIKON: Visual Grounding at Multiple Granularities in Documents | Proposes DRISHTIKON to address visual grounding in document images | large language model, visual grounding
3 | Multimodal Prompt Alignment for Facial Expression Recognition | Proposes a multimodal prompt alignment framework to improve facial expression recognition accuracy | large language model, multimodal
4 | Bridging Video Quality Scoring and Justification via Large Multimodal Models | Proposes a SIG-based multimodal model to improve video quality scoring and justification | multimodal, chain-of-thought
5 | SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark | Proposes SiM3D to address single-instance, multiview, multimodal 3D anomaly detection | multimodal
6 | Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | A benchmark study of deep-learning-based atypical mitosis classification | foundation model
7 | SimVecVis: A Dataset for Enhancing MLLMs in Visualization Understanding | Proposes SimVec to address the challenges multimodal large language models face in visualization understanding | large language model, multimodal, chain-of-thought
8 | SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification | Proposes SAMURAI to address 3D object retrieval in complex indoor environments | multimodal
9 | LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection | Proposes LASFNet to simplify feature fusion in multimodal object detection | multimodal
10 | FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering | Proposes FOCUS to address visual cropping in fine-grained visual question answering | large language model, multimodal
11 | Exploring the Design Space of 3D MLLMs for CT Report Generation | Proposes 3D multimodal large language models to improve CT report generation | large language model, multimodal
12 | LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning | Proposes LLaVA-Pose to address human pose and action understanding | multimodal, instruction following
13 | GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding | Proposes the GroundFlow module to address temporal reasoning in 3D point cloud sequential grounding | large language model, visual grounding
14 | Task-Aware KV Compression For Cost-Effective Long Video Understanding | Proposes Video-X^2L to address KV compression in long video understanding | large language model, multimodal
15 | OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Proposes OracleFusion to address the difficulty of deciphering oracle bone script characters | large language model, multimodal
16 | Evidence-based diagnostic reasoning with multi-agent copilot for human pathology | Proposes PathChat+ to address insufficient diagnostic reasoning in pathology | large language model, multimodal
17 | Global and Local Entailment Learning for Natural World Imagery | Proposes Radial Cross-Modal Embeddings to address reasoning in vision-language models | foundation model
18 | ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Proposes ShotBench to address insufficient understanding of cinematic language | multimodal
19 | FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing | Proposes FaSTA$^*$ for efficient multi-turn image editing | large language model

🔬 Pillar 3: Perception & Semantics (11 papers)

# | Title | One-line summary | Tags | 🔗
20 | EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting | Proposes EndoFlow-SLAM to address optical flow constraints in endoscopic SLAM | 3D gaussian splatting, 3DGS, gaussian splatting
21 | CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization | Proposes CL-Splats to address dynamic 3D scene updating | gaussian splatting, splatting, scene reconstruction
22 | ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation | Proposes the ReME framework to address training-free open-vocabulary segmentation | scene understanding, open-vocabulary, open vocabulary
23 | Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction | Proposes CurveGaussian to address 3D parametric curve reconstruction | gaussian splatting, splatting
24 | DBMovi-GS: Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting | Proposes DBMovi-GS to address novel view synthesis from blurry dynamic video | gaussian splatting, splatting
25 | MADrive: Memory-Augmented Driving Scene Modeling | Proposes MADrive to address the limitations of autonomous driving scene reconstruction | 3D gaussian splatting, gaussian splatting, splatting
26 | PanSt3R: Multi-view Consistent Panoptic Segmentation | Proposes PanSt3R to address multi-view consistent panoptic segmentation | 3DGS, NeRF, spatial relationship
27 | PhotonSplat: 3D Scene Reconstruction and Colorization from SPAD Sensors | Proposes PhotonSplat to address 3D reconstruction under motion blur | scene reconstruction
28 | WAFT: Warping-Alone Field Transforms for Optical Flow | Proposes WAFT to address high memory consumption in optical flow estimation | optical flow
29 | User-in-the-Loop View Sampling with Error Peaking Visualization | Proposes a view sampling method based on user feedback to address AR data collection challenges | 3D gaussian splatting, gaussian splatting, splatting
30 | CoPa-SG: Dense Scene Graphs with Parametric and Proto-Relations | Proposes CoPa-SG to address the shortage of scene graph data | scene understanding

🔬 Pillar 2: RL & Architecture (10 papers)

# | Title | One-line summary | Tags | 🔗
31 | GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction | Proposes the GoIRL framework to address multimodal trajectory prediction | reinforcement learning, inverse reinforcement learning, multimodal
32 | EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | Proposes EgoAdapt to address the high computational cost of multimodal egocentric perception | policy learning, distillation, egocentric
33 | G$^{2}$D: Boosting Multimodal Learning with Gradient-Guided Distillation | Proposes G$^{2}$D to address modality imbalance in multimodal learning | distillation, multimodal
34 | StruMamba3D: Exploring Structural Mamba for Self-supervised Point Cloud Representation Learning | Proposes StruMamba3D to address the limitations of SSMs in point cloud representation learning | Mamba, SSM, state space model
35 | Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning | Proposes an asymmetric dual self-distillation framework for 3D self-supervised representation learning | representation learning, distillation
36 | Continual Self-Supervised Learning with Masked Autoencoders in Remote Sensing | Proposes CoSMAE to address continual learning in remote sensing | masked autoencoder, MAE, distillation
37 | HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Proposes HumanOmniV2 to address insufficient context understanding in multimodal reasoning | reinforcement learning, large language model, multimodal
38 | Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image | Proposes geometry- and perception-guided Gaussians to address 3D consistency in single-image generation | distillation, 3D gaussian splatting, gaussian splatting
39 | Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs | Proposes SpatialReasoner-R1 to address spatial reasoning in vision-language models | DPO, direct preference optimization, chain-of-thought
40 | Hierarchical Sub-action Tree for Continuous Sign Language Recognition | Proposes a hierarchical sub-action tree to address data scarcity in continuous sign language recognition | representation learning, large language model

🔬 Pillar 6: Video Extraction (3 papers)

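# | Title | One-line summary | Tags | 🔗
41 | Whole-Body Conditioned Egocentric Video Prediction | Proposes whole-body-conditioned egocentric video prediction to address environment modeling | egocentric
42 | Comparing Learning Paradigms for Egocentric Video Summarization | Compares learning paradigms to improve egocentric video summarization | egocentric
43 | The Role of Cyclopean-Eye in Stereo Vision | Proposes new geometric constraints to improve depth reconstruction in stereo vision | feature matching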

🔬 Pillar 1: Robot Control (3 papers)

# | Title | One-line summary | Tags | 🔗
44 | DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation | Proposes DidSee to address depth completion for non-Lambertian objects | manipulation
45 | Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition | Proposes topology-aware modeling to address unsupervised simulation-to-reality point cloud recognition | sim2real, contrastive learning
46 | M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization | Proposes M2SFormer to address detail loss in image forgery localization | manipulation

🔬 Pillar 4: Generative Motion (2 papers)

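# | Title | One-line summary | Tags | 🔗
47 | Rethinking Oversaturation in Classifier-Free Guidance via Low Frequency | Proposes a low-frequency refinement of classifier-free guidance to address oversaturation | classifier-free guidance
48 | PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling | Proposes the PhysRig framework to address the problems of traditional skinning and rigging | physically plausible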

🔬 Pillar 7: Motion Retargeting (1 paper)

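# | Title | One-line summary | Tags | 🔗
49 | Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising | Proposes a lightweight, physics-informed, zero-shot method for ultrasound plane wave denoising to address image noise | structure preservation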

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
50 | PoseMaster: Generating 3D Characters in Arbitrary Poses from a Single Image | Proposes PoseMaster to address pose standardization in 3D character modeling | character animation
