cs.CV (2025-06-19)

📊 18 papers | 🔗 6 with code

🎯 Navigation by area of interest

Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

# | Title | One-line summary | Tags | 🔗
1 | TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis | Proposes TrajSceneLLM for semantic analysis of GPS trajectories | multimodal |
2 | P2MFDS: A Privacy-Preserving Multimodal Fall Detection System for Elderly People in Bathroom Environments | Proposes a privacy-preserving multimodal system for detecting falls of elderly people in bathrooms | multimodal |
3 | Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details | Presents Hunyuan3D 2.5 for generating high-fidelity 3D assets | foundation model |
4 | Proxy-Embedding as an Adversarial Teacher: An Embedding-Guided Bidirectional Attack for Referring Expression Segmentation Models | Proposes PEAT, an adversarial attack on referring expression segmentation (RES) models | multimodal |
5 | Loss-Oriented Ranking for Automated Visual Prompting in LVLMs | Proposes AutoV to automate visual prompt selection | large language model |
6 | DIGMAPPER: A Modular System for Automated Geologic Map Digitization | Proposes DIGMAPPER for automated digitization of geologic maps | large language model |

🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | One-line summary | Tags | 🔗
7 | GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Proposes GRPO-CARE to improve consistency in multimodal reasoning | reinforcement learning, large language model, multimodal |
8 | MoiréXNet: Adaptive Multi-Scale Demoiréing with Linear Attention Test-Time Training and Truncated Flow Matching Prior | Proposes MoiréXNet for removing moiré patterns from images and videos | flow matching, linear attention |
9 | LBMamba: Locally Bi-directional Mamba | Proposes LBMamba to improve the computational efficiency and accuracy of Mamba models | Mamba, SSM, state space model |
10 | MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval | Proposes MambaHash for large-scale image retrieval | Mamba |

🔬 Pillar 3: Perception & Semantics (4 papers)

# | Title | One-line summary | Tags | 🔗
11 | EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training | Proposes EndoMUST for monocular depth estimation in robotic endoscopy | depth estimation, monocular depth, optical flow |
12 | R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision | Surveys robust rendering, restoration, and enhancement for 3D low-level vision (R3eVision) | 3D gaussian splatting, 3DGS, gaussian splatting |
13 | Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation | Proposes OVSNet to improve open-vocabulary segmentation performance | open-vocabulary, open vocabulary |
14 | VideoGAN-based Trajectory Proposal for Automated Vehicles | Proposes a VideoGAN-based method for generating trajectory proposals for automated vehicles | occupancy grid, spatial relationship, multimodal |

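🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-line summary | Tags | 🔗
15 | Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors | Studies CNN-based detectors for identifying visual artifacts in face-swapping videos | manipulation |
16 | Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks | Proposes a unified framework for evaluating the robustness of OCR-based visual document understanding | manipulation |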

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | One-line summary | Tags | 🔗
17 | How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Proposes a lightweight textual memory approach for online episodic-memory video question answering | egocentric, Ego4D, large language model |

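🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
18 | Dense 3D Displacement Estimation for Landslide Monitoring via Fusion of TLS Point Clouds and Embedded RGB Images | Proposes a hierarchical partitioning method to overcome sparse displacement estimates in landslide monitoring | geometric consistency |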
