cs.CV (2025-08-08)
📊 4 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MMFformer: Multimodal Fusion Transformer Network for Depression Detection | Proposes MMFformer to address multimodal depression detection | multimodal | ✅ | |
| 2 | $Δ$-AttnMask: Attention-Guided Masked Hidden States for Efficient Data Selection and Augmentation | Proposes $Δ$-AttnMask to address data selection in visual instruction tuning | large language model, multimodal, instruction following | | |
| 3 | Effective Training Data Synthesis for Improving MLLM Chart Understanding | Proposes an effective data synthesis method to improve the chart understanding of multimodal large language models | large language model, multimodal | ✅ | |
🔬 Pillar 2: RL Algorithms and Architecture (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving | Proposes GMF-Drive to address the fusion efficiency of existing autonomous driving models | Mamba, SSM | | |