cs.CV (2025-05-13)
📊 22 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (7 🔗1)
Pillar 2: RL & Architecture (6)
Pillar 9: Embodied Foundation Models (6 🔗2)
Pillar 1: Robot Control (2 🔗1)
Pillar 8: Physics-based Animation (1 🔗1)
🔬 Pillar 3: Perception & Semantics (7 papers)
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection | Proposes trajectory-aware adaptive token selection to address the masking-strategy problem in video modeling | reinforcement learning, PPO, masked autoencoder | | |
| 9 | DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art | Proposes DFA-CON to detect copyright infringement in deepfake art | contrastive learning, foundation model | | |
| 10 | Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Proposes a reinforcement-learning-based framework for dynamic security policy management to address cloud security challenges | reinforcement learning, deep reinforcement learning | | |
| 11 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Proposes OpenThinkIMG to address the challenges of visual tool reinforcement learning | reinforcement learning | | |
| 12 | Leveraging Multi-Modal Information to Enhance Dataset Distillation | Proposes a multimodal dataset distillation framework to improve dataset performance | distillation | | |
| 13 | MoKD: Multi-Task Optimization for Knowledge Distillation | Proposes MoKD to address gradient conflict and gradient dominance in knowledge distillation | distillation | | |
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care | Proposes Meta-EyeFM to address multi-task integration in primary eye care diagnostics | large language model, foundation model | | |
| 15 | Generative AI for Autonomous Driving: Frontiers and Opportunities | Surveys the applications and challenges of generative AI in autonomous driving | embodied AI, large language model, multimodal | ✅ | |
| 16 | Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction | Proposes a multimodal deep learning framework to improve calorie estimation accuracy | multimodal | | |
| 17 | Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training | Proposes PRIOR to address noise in vision-language models | large language model | | |
| 18 | Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion | Proposes a visual-ingredient feature fusion method to improve food nutrition estimation | multimodal | ✅ | |
| 19 | Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion | Proposes ResULIC to address the low efficiency of existing image compression | multimodal | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection | Proposes the TT-DF dataset to address shortcomings in human body forgery detection | manipulation, optical flow, spatiotemporal | ✅ | |
| 21 | Removing Watermarks with Partial Regeneration using Semantic Information | Proposes SemanticRegen to address the fragility of watermark protection | manipulation | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series | Proposes TiMo to capture multi-scale spatiotemporal relationships in satellite image time series analysis | spatiotemporal, foundation model | ✅ | |