cs.CV (2023-12-07)
📊 31 papers in total | 🔗 7 with code
🎯 Navigation by interest area
Pillar 3: Spatial Perception & Semantics (11)
Pillar 9: Embodied Foundation Models (10 🔗2)
Pillar 2: RL Algorithms & Architecture (4 🔗3)
Pillar 1: Robot Control (2 🔗1)
Pillar 4: Generative Motion (2)
Pillar 6: Video Extraction & Matching (1 🔗1)
Pillar 5: Interaction & Reaction (1)
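🔬 Pillar 3: Spatial Perception & Semantics (11 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | Proposes SCAN, a semantic-assisted calibration network that addresses in-vocabulary bias and domain bias in open-vocabulary segmentation. | open-vocabulary open vocabulary | | |
| 2 | GSGFormer: Generative Social Graph Transformer for Multimodal Pedestrian Trajectory Prediction | GSGFormer: a generative social graph Transformer for multimodal pedestrian trajectory prediction. | semantic map multimodal | | |
| 3 | Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation | Proposes the FUMET framework, which trains a monocular depth network without supervision using only driving videos and achieves absolute-scale, metric depth estimation. | depth estimation monocular depth metric depth | | |
| 4 | EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS | EAGLES: lightweight encodings for efficient, accelerated 3D Gaussians, significantly reducing memory usage. | 3D gaussian splatting gaussian splatting splatting | | |
| 5 | Text as Image: Learning Transferable Adapter for Multi-Label Classification | Proposes the Text as Image method, which learns a transferable adapter for multi-label image classification. | open-vocabulary open vocabulary large language model | | |
| 6 | Auto-Vocabulary Semantic Segmentation | Proposes the AutoSeg framework for auto-vocabulary semantic segmentation without predefined categories. | open-vocabulary open vocabulary large language model | | |
| 7 | MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar | Proposes MonoGaussianAvatar, which reconstructs and animates realistic head avatars from monocular video. | gaussian splatting splatting implicit representation | | |
| 8 | VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment | Proposes VOODOO 3D, a volumetric disentanglement framework for one-shot 3D head reenactment. | neural radiance field | | |
| 9 | MuRF: Multi-Baseline Radiance Fields | MuRF: proposes a multi-baseline radiance field method for sparse-view synthesis that works across different baseline settings. | NeRF | | |
| 10 | GenDeF: Learning Generative Deformation Field for Video Generation | GenDeF: achieves high-quality video generation by learning a generative deformation field. | optical flow | | |
| 11 | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection | Proposes a camera pose estimation method based on object reflections that does not rely on background information. | NeRF | | |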
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation | Proposes an augmentation-free dense contrastive knowledge distillation method that improves the efficiency and accuracy of semantic segmentation. | contrastive learning teacher-student distillation | ✅ | |
| 23 | HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image | HyperDreamer: generates and edits hyper-realistic 3D content from a single image. | dreamer | | |
| 24 | Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors | Proposes BiDiff, a bidirectional diffusion model that fuses 2D and 3D priors to improve text-to-3D generation quality. | distillation foundation model | ✅ | |
| 25 | PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation | Proposes PartDistill, which performs 3D shape part segmentation through vision-language model distillation. | distillation | ✅ | |
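🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction | Proposes PhysHOI, which achieves physics-based dynamic human-object interaction through imitation learning, without task-specific rewards. | humanoid reward design human-object interaction | | |
| 27 | Inversion-Free Image Editing with Natural Language | Proposes InfEdit, which performs inversion-free image editing with natural language while balancing consistency and efficiency. | manipulation | ✅ | |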
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | DiffusionPhase: Motion Diffusion in Frequency Domain | DiffusionPhase: proposes a frequency-domain motion diffusion method for generating high-quality, diverse human motion sequences. | motion diffusion text-to-motion motion generation | | |
| 29 | Digital Life Project: Autonomous 3D Characters with Social Intelligence | Proposes Digital Life Project, which builds autonomous 3D characters with social intelligence. | text-driven motion motion synthesis motion generation | | |
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos | LifelongMemory: uses large language models to answer queries over long-form egocentric videos. | egocentric Ego4D large language model | ✅ | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 31 | Instance Tracking in 3D Scenes from Egocentric Videos | Proposes the IT3DEgo benchmark dataset and instance tracking methods for tracking object instances in 3D scenes from egocentric video. | human-object interaction egocentric | | |