cs.CV (2026-04-02)

📊 9 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1 🔗1) · Pillar 3: Perception & Semantics (1)

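🔬 Pillar 9: Embodied Foundation Models (4 papers)

| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models | Proposes HieraVid to reduce the computational burden of video large language models. | large language model |
| 2 | UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving | Proposes UniDriveVLA to resolve the conflict between perception and reasoning in autonomous driving. | vision-language-action, VLA |
| 3 | Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining | Proposes Large-scale Codec Avatars (LCA), which uses a pretraining and post-training paradigm to improve the generalization and fidelity of 3D avatar modeling. | large language model, foundation model |
| 4 | Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks | Proposes random-label bridge training to transfer language models effectively to vision tasks. | large language model |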

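🔬 Pillar 2: RL & Architecture (2 papers)

| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 5 | F3DGS: Federated 3D Gaussian Splatting for Decentralized Multi-Agent World Modeling | Proposes F3DGS, a federated 3D Gaussian splatting method for decentralized multi-agent world modeling. | world model, world models, 3D gaussian splatting |
| 6 | UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models | Proposes the UAV-Track VLA model for multimodal vision-language-action UAV tracking in complex urban scenes. | flow matching, vision-language-action, VLA |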

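🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 7 | DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning | Proposes DriveDreamer-Policy, a geometry-aware world-action model for unified generation and planning. | motion planning, world model, world models |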

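🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 8 | Ego-Grounding for Personalized Question-Answering in Egocentric Videos | Proposes the MyEgo dataset for evaluating how well multimodal large language models perform personalized question answering in egocentric videos. | egocentric, large language model, multimodal |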

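🔬 Pillar 3: Perception & Semantics (1 paper)

| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 9 | CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects | CompassAD proposes intent-driven 3D affordance segmentation to address tasks involving functionally competing objects. | affordance |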
