cs.CV (2025-08-30)

📊 22 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 2: RL & Architecture (7 🔗3) · Pillar 6: Video Extraction (3) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	A Modality-agnostic Multi-task Foundation Model for Human Brain Imaging	Proposes BrainFM to address multimodal generalization in brain imaging	foundation model	✅
2	LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression	Proposes LightVLM to accelerate inference in multimodal models	multimodal
3	Adaptive Point-Prompt Tuning: Fine-Tuning Heterogeneous Foundation Models for 3D Point Cloud Analysis	Proposes an adaptive point-prompt tuning method for 3D point cloud analysis	foundation model
4	A Multimodal and Multi-centric Head and Neck Cancer Dataset for Segmentation, Diagnosis and Outcome Prediction	Presents a multimodal head and neck cancer dataset to advance tumor segmentation and outcome prediction	multimodal
5	SurgLLM: A Versatile Large Multimodal Model with Spatial Focus and Temporal Awareness for Surgical Video Understanding	Proposes SurgLLM to address insufficient spatial and temporal awareness in surgical video understanding	multimodal	✅
6	TrimTokenator: Towards Adaptive Visual Token Pruning for Large Multimodal Models	Proposes a visual token pruning strategy to improve the inference efficiency of multimodal models	multimodal
7	Two Causes, Not One: Rethinking Omission and Fabrication Hallucinations in MLLMs	Proposes visual potential field calibration to mitigate hallucinations in multimodal large language models	large language model multimodal
8	A Dataset Generation Scheme Based on Video2EEG-SPGN-Diffusion for SEED-VD	Proposes Video2EEG-SPGN-Diffusion to generate EEG datasets under video stimuli	multimodal
9	Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models	Proposes SPO-VLM to defend vision language models against adversarial attacks	visual grounding

🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding	Proposes VideoRewardBench to address the lack of evaluation for multimodal reward models in video understanding	reinforcement learning multimodal
11	SemaMIL: Semantic-Aware Multiple Instance Learning with Retrieval-Guided State Space Modeling for Whole Slide Images	Proposes SemaMIL for multiple instance learning on whole slide images	SSM state space model
12	MorphGen: Morphology-Guided Representation Learning for Robust Single-Domain Generalization in Histopathological Cancer Classification	Proposes MorphGen to address domain generalization in histopathological cancer classification	representation learning contrastive learning	✅
13	Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation	Proposes a black-box distillation method for training local models	distillation open-vocabulary open vocabulary	✅
14	Context-Aware Knowledge Distillation with Adaptive Weighting for Image Classification	Proposes an adaptive knowledge distillation framework to improve image classification performance	distillation
15	LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables	Proposes LUT-Fuse for real-time infrared and visible image fusion	distillation	✅
16	Multi-Focused Video Group Activities Hashing	Proposes multi-focused video group activity hashing for video retrieval	representation learning spatiotemporal

🔬 Pillar 6: Video Extraction (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
17	AQFusionNet: Multimodal Deep Learning for Air Quality Index Prediction with Imagery and Sensor Data	Proposes AQFusionNet for air quality monitoring in resource-constrained regions	sparse sensors multimodal
18	HERO-VQL: Hierarchical, Egocentric and Robust Visual Query Localization	Proposes HERO-VQL for visual query localization in egocentric video	egocentric
19	Learning Yourself: Class-Incremental Semantic Segmentation with Language-Inspired Bootstrapped Disentanglement	Proposes a language-inspired self-learning framework to address catastrophic semantic entanglement in class-incremental semantic segmentation	feature matching

🔬 Pillar 3: Perception & Semantics (2 papers)

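#	Title	One-line summary	Tags	🔗	⭐
20	DGL-RSIS: Decoupling Global Spatial Context and Local Class Semantics for Training-Free Remote Sensing Image Segmentation	Proposes DGL-RSIS to remove the training requirement in remote sensing image segmentation	open-vocabulary open vocabulary multimodal
21	Encoder-Only Image Registration	Proposes an encoder-only image registration framework to address computational complexity and large deformations	optical flow	✅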

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
22	Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation	Proposes Face-MoGLE for controllable face generation	manipulation multimodal	✅
