cs.CV (2025-09-05)
📊 20 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (7 🔗1)
Pillar 2: RL Algorithms & Architecture (5 🔗1)
Pillar 9: Embodied Foundation Models (4)
Pillar 1: Robot Control (2)
Pillar 5: Interaction & Reaction (1)
Pillar 6: Video Extraction & Matching (1)
🔬 Pillar 3: Spatial Perception & Semantics (7 papers)
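🔬 Pillar 2: RL Algorithms & Architecture (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Symbolic Graphics Programming with Large Language Models | Proposes a reinforcement-learning-based framework that improves the ability of large language models to generate precise, controllable SVG images | reinforcement learning large language model | | |
| 9 | PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination | PropVG: proposes an end-to-end, proposal-based visual grounding framework that improves target recognition in complex scenes | contrastive learning visual grounding | ✅ | |
| 10 | DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation | DuoCLR: enhances skeleton-based action segmentation through dual-surrogate contrastive learning | representation learning contrastive learning | | |
| 11 | SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition | Proposes the SL-SLR framework, which improves representations for sign language recognition through self-supervised learning | representation learning contrastive learning | | |
| 12 | Dynamic Sensitivity Filter Pruning using Multi-Agent Reinforcement Learning For DCNN's | Proposes a Differential Sensitivity Fusion Pruning algorithm for efficiently compressing deep convolutional neural networks | reinforcement learning | | |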
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Efficient Video-to-Audio Generation via Multiple Foundation Models Mapper | Proposes the Multiple Foundation Models Mapper (MFM-Mapper) to efficiently generate audio that matches video content | foundation model | | |
| 14 | UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features | UniView: enhances novel view synthesis from a single image by unifying reference features | large language model multimodal | | |
| 15 | MCANet: A Multi-Scale Class-Specific Attention Network for Multi-Label Post-Hurricane Damage Assessment using UAV Imagery | Proposes MCANet, a multi-scale class-specific attention network for multi-label post-hurricane damage assessment from UAV imagery | large language model multimodal | | |
| 16 | WatchHAR: Real-time On-device Human Activity Recognition System for Smartwatches | WatchHAR: a real-time, on-device human activity recognition system for smartwatches | multimodal | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation | OpenEgo: a large-scale multimodal egocentric dataset for dexterous manipulation | manipulation dexterous hand dexterous manipulation | | |
| 18 | LUIVITON: Learned Universal Interoperable VIrtual Try-ON | LUIVITON: a learned, universal, interoperable virtual try-on system for complex garments and diverse human bodies | humanoid SMPL foundation model | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Scale-interaction transformer: a hybrid cnn-transformer model for facial beauty prediction | Proposes the Scale-Interaction Transformer (SIT) model to improve the accuracy of facial beauty prediction | interaction transformer | | |
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Comparative Evaluation of Traditional and Deep Learning Feature Matching Algorithms using Chandrayaan-2 Lunar Data | Compares traditional and deep learning feature matching algorithms on Chandrayaan-2 lunar data for accurate image registration | feature matching | | |