cs.CV (2025-06-12)

📊 32 papers in total | 🔗 13 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (14 🔗5) · Pillar 2: RL & Architecture (8 🔗5) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 4: Generative Motion (2 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (14 papers)

# | Title | Key point | Tags | 🔗
1 | MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Proposes MedSeg-R to address reasoning in medical image segmentation | large language model, multimodal
2 | Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation | Proposes Pisces to address the challenge of a unified model for multimodal image understanding and generation | large language model, foundation model, multimodal
3 | UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models | Proposes the UrbanSense framework for quantitative analysis of urban streetscapes | large language model, multimodal
4 | Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models | Proposes knowledge-tracing machine unlearning to address the diverse needs of foundation models | foundation model
5 | BrainMAP: Multimodal Graph Learning For Efficient Brain Disease Localization | Proposes BrainMAP to address inefficient brain disease localization | multimodal
6 | MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment | Proposes MF2Summ to address multimodal information fusion in video summarization | multimodal
7 | GeoCAD: Local Geometry-Controllable CAD Generation with Large Language Models | Proposes GeoCAD to address local geometry-controllable CAD generation | large language model
8 | Towards Scalable SOAP Note Generation: A Weakly Supervised Multimodal Framework | Proposes a weakly supervised multimodal framework for generating SOAP notes, addressing the clinical documentation burden | multimodal
9 | Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs | Proposes CDPruner to address visual token redundancy in multimodal large language models | large language model, multimodal
10 | Prompts to Summaries: Zero-Shot Language-Guided Video Summarization | Proposes a zero-shot video summarization method to address insufficient expression of user intent | large language model, multimodal
11 | Defensive Adversarial CAPTCHA: A Semantics-Driven Framework for Natural Adversarial Example Generation | Proposes a source-free adversarial CAPTCHA to address the vulnerability of traditional CAPTCHAs to attack | large language model, multimodal
12 | From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations | Proposes an explainable biodiversity monitoring framework to address ecosystem understanding | large language model, multimodal
13 | CogStream: Context-guided Streaming Video Question Answering | Proposes CogStream to address context dependence in streaming video question answering | large language model, multimodal
14 | CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic Design Generation | Proposes CreatiPoster to address editable multi-layer graphic design generation | multimodal

🔬 Pillar 2: RL & Architecture (8 papers)

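# | Title | Key point | Tags | 🔗
15 | Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding | Proposes Motion-R1 to address complexity in text-to-motion generation | reinforcement learning, text-to-motion, motion generation
16 | DART: Differentiable Dynamic Adaptive Region Tokenizer for Vision Foundation Models | Proposes DART to address the performance bottleneck of fixed-grid patching | Mamba, spatiotemporal, foundation model
17 | TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy | Proposes the STRIDE dataset and the TARDIS model to address dynamic environment modeling | world model, egocentric, generalist agent
18 | Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts | Proposes the CIDer framework to address missing modalities and distribution shifts in multimodal emotion recognition | distillation, multimodal
19 | M4V: Multi-Modal Mamba for Text-to-Video Generation | Proposes the M4V framework to address computational complexity in text-to-video generation | Mamba, spatiotemporal
20 | Human-Robot Navigation using Event-based Cameras and Reinforcement Learning | Proposes a human-robot navigation controller based on event cameras and reinforcement learning to address real-time navigation | reinforcement learning, imitation learning
21 | Occlusion-Aware 3D Hand-Object Pose Estimation with Masked AutoEncoders | Proposes a masked-autoencoder-based hand-object pose estimation method to address occlusion | masked autoencoder
22 | Rethinking Random Masking in Self-Distillation on ViT | Proposes an improved random masking strategy to strengthen ViT self-distillation | distillation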

🔬 Pillar 3: Perception & Semantics (5 papers)

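# | Title | Key point | Tags | 🔗
23 | PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting | Proposes PointGS to address rendering quality in sparse-view synthesis | 3D gaussian splatting, 3DGS, gaussian splatting
24 | Leveraging 6DoF Pose Foundation Models For Mapping Marine Sediment Burial | Proposes PoseIDON to address estimation of seabed sediment burial depth | depth estimation, foundation model
25 | LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System | Proposes LRSLAM to address computation and memory challenges in dense visual SLAM | visual SLAM
26 | SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis | Proposes SceneCompleter to address 3D scene completion and consistency of generated views | scene understanding
27 | Post-Training Quantization for Video Matting | Proposes a post-training quantization framework to address resource constraints of video matting models | optical flow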

🔬 Pillar 4: Generative Motion (2 papers)

# | Title | Key point | Tags | 🔗
28 | DanceChat: Large Language Model-Guided Music-to-Dance Generation | Proposes DanceChat to address the semantic gap between music and dance generation | motion synthesis, large language model
29 | ReconMOST: Multi-Layer Sea Temperature Reconstruction with Observations-Guided Diffusion | Proposes ReconMOST to address ocean temperature reconstruction | physically plausible

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | Key point | Tags | 🔗
30 | HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation | Proposes the HyBiomass dataset to address benchmark evaluation of forest biomass estimation | HSI, foundation model

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags | 🔗
31 | Ground Reaction Force Estimation via Time-aware Knowledge Distillation | Proposes a time-aware knowledge distillation framework to address GRF estimation | locomotion, distillation

🔬 Pillar 7: Motion Retargeting (1 paper)

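# | Title | Key point | Tags | 🔗
32 | DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers | Proposes a diffusion-transformer-based framework to address human-product demonstration video generation | spatial relationship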
