cs.CV (2025-09-06)

📊 8 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

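| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning | SpecPrune-VLA: accelerates vision-language-action models through action-aware self-speculative pruning. | vision-language-action, VLA, OpenVLA | |
| 2 | Human-in-the-Loop: Quantitative Evaluation of 3D Models Generation by Large Language Models | Proposes a human-in-the-loop framework that quantitatively evaluates the quality of 3D models generated by large language models, speeding up CAD design. | large language model, multimodal | |
| 3 | PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone Characters | Introduces the PictOBI-20k dataset for evaluating large multimodal models on visual decipherment of pictographic oracle bone characters. | multimodal | |
| 4 | Context-Aware Multi-Turn Visual-Textual Reasoning in LVLMs via Dynamic Memory and Adaptive Visual Guidance | Proposes the CAMVR framework for multi-turn visual-textual reasoning. | large language model, instruction following | |
| 5 | Unleashing Hierarchical Reasoning: An LLM-Driven Framework for Training-Free Referring Video Object Segmentation | Proposes PARSE-VOS to address language-vision alignment in dynamic video object segmentation. | large language model | |
| 6 | Leveraging Vision-Language Large Models for Interpretable Video Action Recognition with Semantic Tokenization | LVLM-VAR: interpretable video action recognition using vision-language large models and semantic tokens. | large language model | |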

🔬 Pillar 8: Physics-based Animation (1 paper)

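| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | Language-guided Recursive Spatiotemporal Graph Modeling for Video Summarization | Proposes VideoGraph, a language-guided recursive spatiotemporal graph network, for video summarization. | spatiotemporal | |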

🔬 Pillar 3: Perception & Semantics (1 paper)

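| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Stereovision Image Processing for Planetary Navigation Maps with Semi-Global Matching and Superpixel Segmentation | Proposes a stereo-vision method for building planetary navigation maps based on semi-global matching and superpixel segmentation, improving the accuracy of Mars exploration. | scene reconstruction | |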
