cs.CV (2023-12-10)
📊 17 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (6 🔗1)
Pillar 2: RL Algorithms & Architecture (5 🔗3)
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
Pillar 6: Video Extraction & Matching (1)
🔬 Pillar 3: Spatial Perception & Semantics (6 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding | GenDepth: generalizes monocular depth estimation to arbitrary camera parameters via ground plane embedding. | depth estimation, monocular depth, metric depth | | |
| 2 | OpenSD: Unified Open-Vocabulary Segmentation and Detection | OpenSD: proposes a unified open-vocabulary segmentation and detection framework that improves performance and mitigates task conflicts. | open-vocabulary, open vocabulary | ✅ | |
| 3 | SuperPrimitive: Scene Reconstruction at a Primitive Level | Proposes the SuperPrimitive scene representation to resolve ambiguity in monocular visual 3D reconstruction. | visual odometry, scene reconstruction | | |
| 4 | TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video | Proposes TeTriRF, which uses temporal tri-plane radiance fields for efficient free-viewpoint video compression and rendering. | NeRF, neural radiance field | | |
| 5 | ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering | Proposes ASH, an efficient and photorealistic human rendering method based on animatable Gaussian splatting. | gaussian splatting, splatting | | |
| 6 | NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences | NeVRF: proposes neural video-based radiance fields for free-viewpoint rendering of long-duration dynamic sequences. | NeRF, neural radiance field | | |
🔬 Pillar 2: RL Algorithms & Architecture (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | AM-RADIO: merges vision foundation models through multi-teacher distillation, improving performance and efficiency. | distillation, open-vocabulary, open vocabulary | ✅ | |
| 8 | IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment | Proposes IL-NeRF to address incremental learning for NeRF when camera poses are unknown. | distillation, NeRF, neural radiance field | | |
| 9 | Disentangled Representation Learning for Controllable Person Image Generation | Proposes the DRL-CPG framework for controllable person image generation. | DRL, representation learning, curriculum learning | | |
| 10 | Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains | Proposes an MLP-based spatial-wise dynamic distillation framework for efficient visual fault detection of freight trains. | distillation | ✅ | |
| 11 | RepViT-SAM: Towards Real-Time Segmenting Anything | Proposes RepViT-SAM for real-time segmentation on mobile devices. | distillation, zero-shot transfer | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Multimodality in Online Education: A Comparative Study | Proposes a multimodal-fusion method for recognizing student emotions in online education. | multimodal | | |
| 13 | Open World Object Detection in the Era of Foundation Models | Proposes FOMO, which uses foundation models for open-world object detection, and builds a new benchmark. | foundation model | ✅ | |
| 14 | Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | Uses generative language models for weakly supervised sentence component analysis to improve video-language joint learning. | large language model | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions | Proposes I'm-HOI, a 3D human-object interaction motion capture approach based on a monocular RGB camera and an object IMU. | motion diffusion model, motion diffusion, human-object interaction | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs | Proposes the K-FTE method for markerless 3D kinematics and force estimation in wild cheetahs. | quadruped locomotion | | |
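🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Layered 3D Human Generation via Semantic-Aware Diffusion Model | Proposes a semantic-aware diffusion model for high-quality 3D human generation with layer-wise editing. | SMPL | | |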