cs.CV (2025-06-20)

📊 20 papers total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 3: Perception & Semantics (5 🔗3) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 6: Video Extraction (3 🔗1) · Pillar 1: Robot Control (2)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You	Proposes STRUCTURE to address data scarcity in multimodal alignment	foundation model multimodal
2	When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network	Proposes a multimodal asynchronous hybrid network for real-time anomaly detection	multimodal
3	MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation	Proposes MEXA to address expert-model aggregation in multimodal reasoning	multimodal
4	Extracting Multimodal Learngene in CLIP: Unveiling the Multimodal Generalizable Knowledge	Proposes MM-LG to efficiently extract multimodal generalizable knowledge from CLIP	multimodal
5	LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation	Proposes LaVi to address the inefficiency of vision-language models	large language model multimodal
6	Do We Need Large VLMs for Spotting Soccer Actions?	Proposes a language-model-based soccer action spotting method as an alternative to video processing	large language model
7	Multi-label Scene Classification for Autonomous Vehicles: Acquiring and Accumulating Knowledge from Diverse Datasets	Proposes KAA-CAL to address multi-label scene classification for autonomous driving	foundation model	✅

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
8	Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting	Proposes Part²GS to address articulated object modeling	3D gaussian splatting gaussian splatting splatting
9	DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches	Proposes DepthVanish to optimize adversarial patches against stereo depth estimation	depth estimation stereo depth	✅
10	RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking	Proposes RGBTrack to address real-time 6D pose estimation and tracking	6D pose estimation	✅
11	AnyTraverse: An off-road traversability framework with VLM and human operator in the loop	Proposes the AnyTraverse framework to address off-road traversability in complex environments	traversability
12	LunarLoc: Segment-Based Global Localization on the Moon	Proposes LunarLoc to address global localization on the lunar surface	VIO	✅

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
13	Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search	Proposes MICS to address the limited reasoning ability of medical multimodal large language models	curriculum learning large language model multimodal	✅
14	RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought	Proposes RealSR-R1 to address real-world image super-resolution	reinforcement learning large language model chain-of-thought
15	UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation	Proposes UniFork to address task interference in multimodal understanding and generation	representation learning multimodal

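🔬 Pillar 6: Video Extraction (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
16	VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning	Proposes VLN-R1 to address path planning in vision-language navigation	egocentric embodied AI VLN
17	Learning golf swing signatures from a single wrist-worn inertial sensor	Proposes a golf swing analysis framework based on a single wrist-worn sensor to address the shortcomings of existing methods	human mesh recovery
18	Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor Scenes	Proposes the Co-VisiON benchmark to address co-visibility reasoning on sparse image sets	feature matching	✅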

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens	Proposes the Machine Mental Imagery framework to enhance multimodal reasoning	manipulation reinforcement learning distillation
20	Self-supervised Feature Extraction for Enhanced Ball Detection on Soccer Robots	Proposes a self-supervised feature extraction method to improve ball detection on soccer robots	humanoid
