cs.CV (2025-06-25)

📊 17 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (7) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	A Novel Large Vision Foundation Model (LVFM)-based Approach for Generating High-Resolution Canopy Height Maps in Plantations for Precision Forestry Management	Proposes a large vision foundation model-based method for generating high-resolution canopy height maps	height map foundation model
2	THIRDEYE: Cue-Aware Monocular Depth Estimation via Brain-Inspired Multi-Stage Fusion	Proposes THIRDEYE to address the underuse of depth cues in monocular depth estimation	depth estimation monocular depth
3	Video Perception Models for 3D Scene Synthesis	Proposes VIPScene to address consistency issues in 3D scene synthesis	open-vocabulary open vocabulary first-person view
4	StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation	Proposes StereoDiff to address spatiotemporal consistency in video depth estimation	depth estimation
5	Feature Hallucination for Self-supervised Action Recognition	Proposes a deep translational action recognition framework to improve the accuracy of video action recognition	optical flow multimodal
6	Joint attitude estimation and 3D neural reconstruction of non-cooperative space objects	Uses NeRF for attitude estimation and 3D reconstruction of non-cooperative space objects	NeRF neural radiance field
7	IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals	Proposes IPFormer to address vision-based 3D panoptic scene completion	scene understanding

🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
8	Visual-Semantic Knowledge Conflicts in Operating Rooms: Synthetic Data Curation for Surgical Risk Perception in Multimodal Large Language Models	Proposes a synthetic dataset to address visual-semantic knowledge conflicts in operating rooms	large language model multimodal	✅
9	How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction?	Compares foundation models with skeleton-based approaches for gesture recognition in human-robot interaction	foundation model multimodal
10	UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation	Proposes UniCode$^2$ to address visual encoding in multimodal understanding and generation	large language model multimodal
11	How Can Multimodal Remote Sensing Datasets Transform Classification via SpatialNet-ViT?	Proposes SpatialNet-ViT to address remote sensing data classification	multimodal

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition	Achieves scalable Earth observation data mining through foundation model composition	distillation foundation model
13	Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision	Proposes vector contrastive learning to address pixel-wise pretraining in medical vision	contrastive learning foundation model
14	FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization	Proposes FixCLR to address semi-supervised domain generalization	contrastive learning
15	MMSearch-R1: Incentivizing LMMs to Search	Proposes MMSearch-R1 to address search efficiency in multimodal models	reinforcement learning multimodal

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
16	ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations	Proposes ConViTac to address feature alignment in visual-tactile fusion	manipulation representation learning contrastive learning

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
17	WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration	Proposes WonderFree to address cross-view consistency and image quality in 3D scene exploration	spatiotemporal
