cs.CV (2025-05-05)

📊 25 papers in total | 🔗 5 with code

🎯 Navigation by Area of Interest

Pillar 9: Embodied Foundation Models (13 🔗3)
Pillar 2: RL Algorithms & Architecture (5)
Pillar 3: Spatial Perception & Semantics (4 🔗1)
Pillar 1: Robot Control (2 🔗1)
Pillar 4: Generative Motion (1)

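🔬 Pillar 9: Embodied Foundation Models (13 papers)

# | Title | Key Point | Tags | 🔗
1 | Multimodal Deep Learning for Stroke Prediction and Detection using Retinal Imaging and Clinical Data | Proposes a multimodal deep learning approach to improve stroke prediction and detection | foundation model, multimodal |
2 | AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation | Proposes anatomical ontology-guided reasoning to improve chest X-ray interpretation | multimodal |
3 | GAME: Learning Multimodal Interactions via Graph Structures for Personality Trait Estimation | Proposes GAME for personality trait estimation from short videos | multimodal |
4 | DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction | Proposes DeepSparse to address the high radiation dose and computational cost of sparse-view CBCT reconstruction | foundation model |
5 | Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models | Proposes VELM for classifying industrial anomalies | large language model |
6 | Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities | Surveys unified multimodal understanding and generation models, which bring together two lines of work that have largely evolved independently | multimodal |
7 | Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Proposes Ming-Lite-Uni, a unified architecture for multimodal interaction | multimodal |
8 | Timing Is Everything: Finding the Optimal Fusion Points in Multimodal Medical Imaging | Proposes a sequential forward search algorithm to find the optimal fusion points in multimodal medical imaging | multimodal |
9 | Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection | Proposes image-event fusion to compensate for insufficient temporal information in video anomaly detection | multimodal |
10 | Using Knowledge Graphs to harvest datasets for efficient CLIP model training | Uses knowledge graphs to efficiently harvest datasets for training CLIP models | foundation model |
11 | RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet | Proposes RGBX-DiffusionDet for multimodal (RGB-X) object detection | multimodal |
12 | Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey | Reviews multimodal OOD detection with CLIP-like models and proposes a new framework to address the limitations of existing methods | multimodal |
13 | TeDA: Boosting Vision-Language Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment | Proposes TeDA for 3D object retrieval on unseen categories | multimodal |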

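🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

# | Title | Key Point | Tags | 🔗
14 | R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | Proposes R1-Reward to address training instability in multimodal reward modeling | reinforcement learning, reward design, large language model |
15 | VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection | Proposes VAEmo to tackle data scarcity and differences in emotional expression across modalities in multimodal emotion recognition | representation learning, contrastive learning, large language model |
16 | Text to Image Generation and Editing: A Survey | Comprehensive survey of text-to-image generation and editing techniques | Mamba, classifier-free guidance, foundation model |
17 | Learning 3D Persistent Embodied World Models | Proposes persistent embodied world models for long-horizon planning | policy learning, world model |
18 | No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves | Proposes self-representation alignment to improve the generation quality of diffusion transformers | representation learning, distillation, foundation model |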

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

# | Title | Key Point | Tags | 🔗
19 | Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models | Proposes DiffuGTS to improve the generalizability of tumor segmentation | open-vocabulary |
20 | VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery | Proposes VGLD to resolve linguistic ambiguity in monocular depth estimation | depth estimation, monocular depth, metric depth |
21 | 6D Pose Estimation on Spoons and Hands | Proposes a dietary monitoring system based on 6D pose estimation for analyzing eating behavior | 6D pose estimation |
22 | DELTA: Dense Depth from Events and LiDAR using Transformer's Attention | Proposes DELTA, which fuses event-camera and LiDAR data for depth estimation | depth estimation |

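🔬 Pillar 1: Robot Control (2 papers)

# | Title | Key Point | Tags | 🔗
23 | MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans | Proposes MetaScenes to automate replica creation from real-world 3D scenes | manipulation, sim-to-real, embodied AI |
24 | Sim2Real in endoscopy segmentation with a novel structure aware image translation | Proposes a novel structure-aware image translation model for endoscopy image segmentation | sim2real |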

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Key Point | Tags | 🔗
25 | Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation | Proposes Scenethesis to improve spatial realism in 3D scene generation | physically plausible, penetration, embodied AI |
