cs.CV (2025-05-08)

📊 4 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (2 🔗1) · Pillar 3: Perception & Semantics (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (2 papers)

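# | Title | One-sentence summary | Tags | 🔗
1 | StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant | Proposes StreamBridge to enable real-time interaction for offline video large language models | large language model
2 | TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | Proposes TokLIP to address the high computational overhead in multimodal comprehension and generation | multimodal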

🔬 Pillar 3: Perception & Semantics (1 paper)

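# | Title | One-sentence summary | Tags | 🔗
3 | Visual Affordance Prediction: Survey and Reproducibility | Proposes a unified framework for visual affordance prediction to address inconsistencies across methods | affordance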

🔬 Pillar 7: Motion Retargeting (1 paper)

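# | Title | One-sentence summary | Tags | 🔗
4 | Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding | Proposes adaptive markup language generation to address visual document understanding | spatial relationship, instruction following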
