cs.CV (2025-05-31)

📊 15 papers total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (4 🔗1)
Pillar 1: Robot Control (3 🔗2)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 3: Perception & Semantics (2)
Pillar 6: Video Extraction (1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | One-line summary | Tags
1 | SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery | Proposes SatDreamer360 to generate multiview-consistent ground-level scenes from satellite imagery | dreamer, height map
2 | SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation | Proposes SenseFlow to address convergence problems in large-scale text-to-image distillation | flow matching, distillation
3 | From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models | Proposes DiSRT to evaluate the holistic perception ability of self-supervised vision models | MAE, spatial relationship
4 | CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning | Proposes CReFT-CAD to address orthographic projection reasoning in CAD | reinforcement learning, instruction following

🔬 Pillar 1: Robot Control (3 papers)

# | Title | One-line summary | Tags
5 | Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward | Proposes multimodal generative AI with autoregressive LLMs to improve human motion understanding and generation | humanoid, text-to-motion, motion synthesis
6 | XYZ-IBD: A High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Proposes the XYZ-IBD dataset to address 6D pose estimation in industrial environments | manipulation, depth estimation, 6D pose estimation
7 | SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models | Proposes the SEED dataset to address the challenges of sequential facial attribute editing | manipulation

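🔬 Pillar 9: Embodied Foundation Models (3 papers)

# | Title | One-line summary | Tags
8 | Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning | Proposes a frame-aware reasoning approach to video understanding to improve the performance of multimodal LLMs | large language model, multimodal, chain-of-thought
9 | HueManity: Probing Fine-Grained Visual Perception in MLLMs | Proposes the HueManity benchmark to evaluate the visual perception ability of multimodal large language models | large language model, multimodal
10 | Common Inpainted Objects In-N-Out of Context | Proposes the COinCO dataset to address the lack of contextual examples in vision datasets | large language model, multimodal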

🔬 Pillar 3: Perception & Semantics (2 papers)

# | Title | One-line summary | Tags
11 | Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties | Proposes an uncertainty-based learning method to improve optical flow and stereo depth estimation | depth estimation, stereo depth, optical flow
12 | Test-time Vocabulary Adaptation for Language-driven Object Detection | Proposes VocAda to address vocabulary adaptation in open-vocabulary object detection | open-vocabulary, open vocabulary

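🔬 Pillar 6: Video Extraction (1 paper)

# | Title | One-line summary | Tags
13 | Sequence-Based Identification of First-Person Camera Wearers in Third-Person Views | Proposes the TF2025 dataset and a sequence-based identification method to address multi-camera interaction | egocentric, egocentric vision, Ego4D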

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags
14 | Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models | Proposes parallel rescaling to address generation consistency in personalized diffusion models | classifier-free guidance

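🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags
15 | Event-based multi-view photogrammetry for high-dynamic, high-velocity target measurement | Proposes an event-based multi-view photogrammetry method for measuring high-speed dynamic targets | spatiotemporal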
