cs.CV (2025-06-19)
📊 18 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (6 🔗2)
Pillar 2: RL & Architecture (4 🔗2)
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 1: Robot Control (2)
Pillar 6: Video Extraction (1)
Pillar 7: Motion Retargeting (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | TrajSceneLLM: A Multimodal Perspective on Semantic GPS Trajectory Analysis | Proposes TrajSceneLLM for semantic analysis of GPS trajectories | multimodal | ✅ | |
| 2 | P2MFDS: A Privacy-Preserving Multimodal Fall Detection System for Elderly People in Bathroom Environments | Proposes a privacy-preserving multimodal fall detection system to address bathroom falls among elderly people | multimodal | ✅ | |
| 3 | Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details | Proposes Hunyuan3D 2.5 for generating high-fidelity 3D assets | foundation model | | |
| 4 | Proxy-Embedding as an Adversarial Teacher: An Embedding-Guided Bidirectional Attack for Referring Expression Segmentation Models | Proposes PEAT, an adversarial attack on referring expression segmentation models | multimodal | | |
| 5 | Loss-Oriented Ranking for Automated Visual Prompting in LVLMs | Proposes AutoV to automate visual prompt selection | large language model | | |
| 6 | DIGMAPPER: A Modular System for Automated Geologic Map Digitization | Proposes DIGMAPPER for automated digitization of geologic maps | large language model | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Proposes GRPO-CARE to address consistency in multimodal reasoning | reinforcement learning, large language model, multimodal | | |
| 8 | MoiréXNet: Adaptive Multi-Scale Demoiréing with Linear Attention Test-Time Training and Truncated Flow Matching Prior | Proposes MoiréXNet for image and video demoiréing | flow matching, linear attention | | |
| 9 | LBMamba: Locally Bi-directional Mamba | Proposes LBMamba to improve the computational efficiency and accuracy of Mamba models | Mamba, SSM, state space model | ✅ | |
| 10 | MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval | Proposes MambaHash for large-scale image retrieval | Mamba | ✅ | |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training | Proposes EndoMUST for monocular depth estimation in robotic endoscopy | depth estimation, monocular depth, optical flow | ✅ | |
| 12 | R3eVision: A Survey on Robust Rendering, Restoration, and Enhancement for 3D Low-Level Vision | Presents R3eVision, a survey of robust rendering and restoration in 3D low-level vision | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 13 | Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation | Proposes OVSNet to improve open-vocabulary segmentation performance | open-vocabulary, open vocabulary | | |
| 14 | VideoGAN-based Trajectory Proposal for Automated Vehicles | Proposes a VideoGAN-based trajectory proposal method for trajectory generation in automated vehicles | occupancy grid, spatial relationship, multimodal | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors | Proposes a CNN-based detection method to identify visual artifacts in face-swapping videos | manipulation | | |
| 16 | Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks | Proposes a unified framework for evaluating the robustness of OCR-based visual document understanding | manipulation | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Proposes a lightweight textual memory approach for online episodic-memory video question answering | egocentric, Ego4D, large language model | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Dense 3D Displacement Estimation for Landslide Monitoring via Fusion of TLS Point Clouds and Embedded RGB Images | Proposes a hierarchical partitioning approach to address sparse displacement estimation in landslide monitoring | geometric consistency | ✅ | |