cs.CV (2025-06-25)

📊 17 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7) · Pillar 9: Embodied Foundation Models (4, 🔗 1) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 papers)

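| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | A Novel Large Vision Foundation Model (LVFM)-based Approach for Generating High-Resolution Canopy Height Maps in Plantations for Precision Forestry Management | Proposes a method based on a large vision foundation model for generating high-resolution canopy height maps. | height map, foundation model | |
| 2 | THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion | Proposes THIRDEYE to address the underuse of cues in monocular depth estimation. | depth estimation, monocular depth | |
| 3 | Video Perception Models for 3D Scene Synthesis | Proposes VIPScene to address consistency issues in 3D scene synthesis. | open-vocabulary, open vocabulary, first-person view | |
| 4 | StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation | Proposes StereoDiff to address spatiotemporal consistency in video depth estimation. | depth estimation | |
| 5 | Feature Hallucination for Self-supervised Action Recognition | Proposes a deep translational action recognition framework to improve video action recognition accuracy. | optical flow, multimodal | |
| 6 | Joint attitude estimation and 3D neural reconstruction of non-cooperative space objects | Uses NeRF for attitude estimation and 3D reconstruction of non-cooperative space objects. | NeRF, neural radiance field | |
| 7 | IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals | Proposes IPFormer to address vision-based 3D panoptic scene completion. | scene understanding | |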

🔬 Pillar 9: Embodied Foundation Models (4 papers)

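| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models | Proposes a synthetic dataset to address visual-semantic knowledge conflicts in operating rooms. | large language model, multimodal | |
| 9 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Compares foundation models with skeleton-based methods for gesture recognition in human-robot interaction. | foundation model, multimodal | |
| 10 | UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation | Proposes UniCode$^2$ to address visual encoding in multimodal understanding and generation. | large language model, multimodal | |
| 11 | How Can Multimodal Remote Sensing Datasets Transform Classification via SpatialNet-ViT? | Proposes SpatialNet-ViT to address remote sensing data classification. | multimodal | |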

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)

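| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition | Achieves scalable Earth observation data mining through foundation model composition. | distillation, foundation model | |
| 13 | Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision | Proposes vector contrastive learning to address pixel-wise pretraining in medical vision. | contrastive learning, foundation model | |
| 14 | FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | Proposes FixCLR to address semi-supervised domain generalization. | contrastive learning | |
| 15 | MMSearch-R1: Incentivizing LMMs to Search | Proposes MMSearch-R1 to address search efficiency in multimodal models. | reinforcement learning, multimodal | |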

🔬 Pillar 1: Robot Control (1 paper)

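| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations | Proposes ConViTac to address feature alignment in visual-tactile fusion. | manipulation, representation learning, contrastive learning | |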

🔬 Pillar 8: Physics-based Animation (1 paper)

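| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration | Proposes WonderFree to address view consistency and image quality in 3D scene exploration. | spatiotemporal | |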
