cs.CV (2023-12-10)

📊 17 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 🔗1) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 🔗3) · Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction and Matching (Video Extraction) (1)

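🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding	GenDepth: generalizes monocular depth estimation to arbitrary camera parameters via ground plane embedding.	depth estimation monocular depth metric depth
2	OpenSD: Unified Open-Vocabulary Segmentation and Detection	OpenSD: proposes a unified open-vocabulary segmentation and detection framework that improves performance and mitigates task conflicts.	open-vocabulary open vocabulary	✅
3	SuperPrimitive: Scene Reconstruction at a Primitive Level	Proposes the SuperPrimitive scene representation to resolve ambiguity in monocular visual 3D reconstruction.	visual odometry scene reconstruction
4	TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video	Proposes TeTriRF, which uses temporal tri-plane radiance fields for efficient free-viewpoint video compression and rendering.	NeRF neural radiance field
5	ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering	Proposes ASH, an efficient and photorealistic human rendering method based on animatable Gaussian splatting.	gaussian splatting splatting
6	NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences	NeVRF: proposes neural video-based radiance fields for free-viewpoint rendering of long-duration dynamic sequences.	NeRF neural radiance field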

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
7	AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One	AM-RADIO: merges vision foundation models via multi-teacher distillation, improving performance and efficiency.	distillation open-vocabulary open vocabulary	✅
8	IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment	Proposes IL-NeRF to address incremental learning of NeRF when camera poses are unknown.	distillation NeRF neural radiance field
9	Disentangled Representation Learning for Controllable Person Image Generation	Proposes the DRL-CPG framework for controllable person image generation.	DRL representation learning curriculum learning
10	Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains	Proposes an MLP-based spatial-wise dynamic distillation framework for efficient visual fault detection of freight trains.	distillation	✅
11	RepViT-SAM: Towards Real-Time Segmenting Anything	Proposes RepViT-SAM for real-time segmentation on mobile devices.	distillation zero-shot transfer	✅

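🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	Multimodality in Online Education: A Comparative Study	Proposes a multimodal-fusion-based method for recognizing student emotions in online education.	multimodal
13	Open World Object Detection in the Era of Foundation Models	Proposes FOMO, which leverages foundation models for open-world object detection, and builds a new benchmark.	foundation model	✅
14	Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning	Uses generative language models for weakly supervised sentence component analysis to improve video-language joint learning.	large language model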

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
15	I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions	Proposes I'm-HOI, a 3D human-object interaction motion capture approach based on a monocular RGB camera and an object-mounted IMU.	motion diffusion model motion diffusion human-object interaction

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
16	Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs	Proposes the K-FTE method for markerless 3D kinematics and force estimation in wild cheetahs.	quadruped locomotion

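🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
17	Layered 3D Human Generation via Semantic-Aware Diffusion Model	Proposes a semantic-aware diffusion model for high-quality 3D human generation with layer-wise editing.	SMPL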
