cs.CV (2023-12-31)

📊 8 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (5 🔗2) · Pillar 3: Perception & Semantics (1) · Pillar 9: Embodied Foundation Models (1) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 2: RL & Architecture (5 papers)

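#	Title	One-line summary	Tags	🔗	⭐
1	SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity	SteinDreamer: reduces the variance of text-to-3D score distillation via the Stein identity	dreamer distillation monocular depth
2	Analyzing Local Representations of Self-supervised Vision Transformers	Analyzes the local representation capability of self-supervised ViTs, revealing the strengths, weaknesses, and applicability of different pretraining methods	masked autoencoder MAE contrastive learning
3	Taming Mode Collapse in Score Distillation for Text-to-3D Generation	Proposes Entropic Score Distillation (ESD) to address the Janus artifact problem in text-to-3D generation	distillation classifier-free guidance
4	Masked Modeling for Self-supervised Representation Learning on Vision and Beyond	Proposes masked modeling methods to improve self-supervised representation learning	representation learning	✅
5	Multi-Granularity Representation Learning for Sketch-based Dynamic Face Image Retrieval	Proposes a multi-granularity representation learning method to address the early retrieval problem in sketch-based dynamic face image retrieval	representation learning	✅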

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
6	Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Proposes Video-GroundingDINO to address open-vocabulary spatio-temporal video grounding	open-vocabulary open vocabulary

🔬 Pillar 9: Embodied Foundation Models (1 paper)

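#	Title	One-line summary	Tags	🔗	⭐
7	Reviving the Context: Camera Trap Species Classification as Link Prediction on Multimodal Knowledge Graphs	Proposes a camera trap species classification method based on multimodal knowledge graphs, improving out-of-distribution generalization	multimodal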

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
8	EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling	EMAGE: unified holistic co-speech gesture generation via expressive masked audio gesture modeling	VQ-VAE SMPL SMPL-X	✅
