cs.CV (2025-11-01)
📊 11 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (4 🔗1)
Pillar 3: Perception & SLAM (4)
Pillar 1: Robot Control (2 🔗1)
Pillar 9: Embodied Foundation Models (1)
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond | Proposes UniFER-7B to improve the reasoning and interpretability of multimodal large language models in facial expression recognition. | reinforcement learning, large language model, foundation model | | |
| 2 | Towards classification-based representation learning for place recognition on LiDAR scans | Proposes a classification-based representation learning method for LiDAR point clouds to address place recognition. | representation learning, contrastive learning | | |
| 3 | Saliency-R1: Incentivizing Unified Saliency Reasoning Capability in MLLM with Confidence-Guided Reinforcement Learning | Saliency-R1: uses confidence-guided reinforcement learning to improve the unified saliency reasoning capability of MLLMs. | reinforcement learning | | |
| 4 | VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning | VinciCoder: unifies multimodal code generation via coarse-to-fine visual reinforcement learning. | reinforcement learning | ✅ | |
🔬 Pillar 3: Perception & SLAM (4 papers)
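| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | 4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Guassian Splatting | Proposes 4D neural voxel splatting for efficient dynamic scene rendering and novel view synthesis. | 3D gaussian splatting, gaussian splatting, novel view synthesis | | |
| 6 | Weakly Supervised Pneumonia Localization from Chest X-Rays Using Deep Neural Network and Grad-CAM Explanations | Proposes a pneumonia localization method based on weakly supervised deep learning and Grad-CAM to improve the efficiency of chest X-ray diagnosis. | localization | | |
| 7 | Benchmarking individual tree segmentation using multispectral airborne laser scanning data: the FGI-EMIT dataset | FGI-EMIT: a benchmark dataset for tree segmentation from multispectral LiDAR, with a performance evaluation of deep learning methods. | point cloud | | |
| 8 | Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models | Diff4Splat: controllable 4D scene generation from a single image, based on a dynamic reconstruction model. | novel view synthesis | | |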
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback | OmniTrack++: omnidirectional multi-object tracking by learning large-FoV trajectory feedback. | quadruped, legged robot, bipedal | ✅ | |
| 10 | iFlyBot-VLA Technical Report | Proposes iFlyBot-VLA, a vision-language-action large model based on a dual-level action representation, to improve robot manipulation. | manipulation, cross-embodiment | | |
🔬 Pillar 9: Embodied Foundation Models (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Oitijjo-3D: Generative AI Framework for Rapid 3D Heritage Reconstruction from Street View Imagery | Oitijjo-3D: a generative AI framework for rapid 3D heritage reconstruction from street-view imagery. | multimodal | | |