cs.CV (2025-08-15)

📊 5 papers in total

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (2) · Pillar 1: Robot Control (1) · Pillar 9: Embodied Foundation Models (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

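# | Title | Key point | Tags
1 | Recent Advances in Transformer and Large Language Models for UAV Applications | Systematically evaluates the advances and challenges of Transformer models in UAV applications | reinforcement learning, large language model
2 | Ovis2.5 Technical Report | Proposes Ovis2.5 to address multimodal reasoning and visual perception | DPO, multimodal, chain-of-thought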

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags
3 | TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models | Proposes TTF to address the lack of temporal information in vision-language-action models | manipulation, vision-language-action, VLA

🔬 Pillar 9: Embodied Foundation Models (1 paper)

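# | Title | Key point | Tags
4 | Controlling Multimodal LLMs via Reward-guided Decoding | Proposes a reward-guided decoding method to improve the controllability of multimodal large language models | large language model, multimodal, visual grounding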

🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | Key point | Tags
5 | Labels or Input? Rethinking Augmentation in Multimodal Hate Detection | Proposes a dual approach to improve the accuracy of multimodal hate detection | HuMoR, multimodal
