cs.CV (2023-12-10)

📊 17 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (5 🔗3) · Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1)

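🔬 Pillar 3: Perception & Semantics (6 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding | GenDepth: generalizes monocular depth estimation to arbitrary camera parameters via ground plane embedding | depth estimation, monocular depth, metric depth | |
| 2 | OpenSD: Unified Open-Vocabulary Segmentation and Detection | OpenSD: proposes a unified open-vocabulary segmentation and detection framework that improves performance and mitigates task conflicts | open-vocabulary, open vocabulary | |
| 3 | SuperPrimitive: Scene Reconstruction at a Primitive Level | Proposes the SuperPrimitive scene representation to resolve ambiguity in monocular 3D reconstruction | visual odometry, scene reconstruction | |
| 4 | TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video | Proposes TeTriRF, which uses temporal tri-plane radiance fields for efficient free-viewpoint video compression and rendering | NeRF, neural radiance field | |
| 5 | ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering | Proposes ASH, an efficient, photorealistic human rendering method based on animatable Gaussian splatting | gaussian splatting, splatting | |
| 6 | NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences | NeVRF: proposes neural video-based radiance fields for free-viewpoint rendering of long-duration dynamic sequences | NeRF, neural radiance field | |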

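🔬 Pillar 2: RL & Architecture (5 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 7 | AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | AM-RADIO: merges vision foundation models via multi-teacher distillation, improving both performance and efficiency | distillation, open-vocabulary, open vocabulary | |
| 8 | IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment | Proposes IL-NeRF to address incremental learning for NeRF when camera poses are unknown | distillation, NeRF, neural radiance field | |
| 9 | Disentangled Representation Learning for Controllable Person Image Generation | Proposes the DRL-CPG framework for controllable person image generation | DRL, representation learning, curriculum learning | |
| 10 | Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains | Proposes an MLP-based spatial-wise dynamic distillation framework for efficient visual fault detection of freight trains | distillation | |
| 11 | RepViT-SAM: Towards Real-Time Segmenting Anything | Proposes RepViT-SAM for real-time segmentation on mobile devices | distillation, zero-shot transfer | |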

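🔬 Pillar 9: Embodied Foundation Models (3 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 12 | Multimodality in Online Education: A Comparative Study | Proposes a multimodal-fusion-based method for recognizing student emotions in online education | multimodal | |
| 13 | Open World Object Detection in the Era of Foundation Models | Proposes FOMO, which uses foundation models to tackle open-world object detection, and builds a new benchmark | foundation model | |
| 14 | Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | Uses generative language models for weakly supervised sentence component analysis to improve video-language joint learning | large language model | |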

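🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 15 | I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions | Proposes I'm-HOI, a 3D human-object interaction motion capture approach based on a monocular RGB camera and an object-mounted IMU | motion diffusion model, motion diffusion, human-object interaction | |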

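🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 16 | Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs | Proposes the K-FTE method for markerless 3D kinematics and force estimation in wild cheetahs | quadruped locomotion | |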

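🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 17 | Layered 3D Human Generation via Semantic-Aware Diffusion Model | Proposes a semantic-aware diffusion model for high-quality 3D human generation with layer-wise editing | SMPL | |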
