cs.CV (2025-10-26)

📊 20 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗4) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (4)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

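| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation | Windsock: an adaptive multimodal retrieval-augmented generation method that improves the performance of multimodal large language models. | large language model, multimodal | |
| 2 | SARVLM: A Vision Language Foundation Model for Semantic Understanding and Target Recognition in SAR Imagery | Proposes SARVLM, a vision-language foundation model for semantic understanding and target recognition in SAR imagery. | foundation model, multimodal | |
| 3 | DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection | Builds a multimodal deepfake detection benchmark to address the societal risks posed by forged audio-visual content. | multimodal | |
| 4 | Open Multimodal Retrieval-Augmented Factual Image Generation | Proposes the ORIG framework, which uses open multimodal retrieval augmentation to address inaccurate knowledge in factual image generation. | multimodal | |
| 5 | GateFuseNet: An Adaptive 3D Multimodal Neuroimaging Fusion Network for Parkinson's Disease Diagnosis | GateFuseNet: an adaptive 3D multimodal neuroimaging fusion network for Parkinson's disease diagnosis. | multimodal | |
| 6 | FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment | FairJudge: uses a multimodal LLM to evaluate social attributes and prompt-image alignment, improving fairness auditing. | multimodal, instruction following | |
| 7 | Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs | Targets the attention sink problem of LLMs in AVSR and proposes a decoupling loss to improve recognition accuracy. | large language model, multimodal | |
| 8 | LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction | Proposes an LLM-based multimodal fusion method to improve the robustness and generalization of commercial memorability prediction. | multimodal | |
| 9 | VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree | VADTree: explainable, training-free video anomaly detection via a hierarchical granularity-aware tree. | large language model | |
| 10 | RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance | RoboSVG: a unified framework for interactive SVG generation with multimodal guidance. | multimodal | |
| 11 | PSScreen V2: Partially Supervised Multiple Retinal Disease Screening | PSScreen V2: a partially supervised self-training framework for screening multiple retinal diseases. | foundation model | |
| 12 | STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models | STATUS Bench: a rigorous benchmark for evaluating how well vision-language models understand object states. | multimodal | |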

🔬 Pillar 3: Perception & Semantics (4 papers)

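| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 13 | LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering | LVD-GS: Gaussian Splatting SLAM for dynamic scenes, based on hierarchical explicit-implicit representation collaborative rendering. | 3D gaussian splatting, gaussian splatting, splatting | |
| 14 | Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views | Proposes the Look and Tell dataset for studying multimodal referential communication across egocentric and exocentric views. | scene reconstruction, egocentric, multimodal | |
| 15 | DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss | DynaPose4D: a high-quality 4D dynamic content generation method based on a pose alignment loss. | 3D gaussian splatting, gaussian splatting, splatting | |
| 16 | Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models | Proposes a zero-shot defect inspection method for wind turbine blades based on knowledge-augmented vision-language models. | open-vocabulary, open vocabulary, multimodal | |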

🔬 Pillar 2: RL & Architecture (4 papers)

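| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 17 | Edge Collaborative Gaussian Splatting with Integrated Rendering and Communication | Proposes ECO-GS, which improves rendering quality on low-cost devices through edge collaborative Gaussian splatting. | imitation learning, gaussian splatting, splatting | |
| 18 | Mutual Information guided Visual Contrastive Learning | Proposes mutual-information-guided visual contrastive learning to improve the generalization of representation learning in open environments. | representation learning, contrastive learning | |
| 19 | Alias-Free ViT: Fractional Shift Invariance via Linear Attention | Proposes Alias-Free ViT, which achieves fractional shift invariance through linear attention and improves ViT robustness. | linear attention | |
| 20 | Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity | Proposes a single-teacher view augmentation method for knowledge distillation that improves student model performance through angular diversity. | distillation | |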
