cs.CV (2025-09-27)

📊 28 papers in total | 🔗 3 with code

🎯 Research Area Navigation

Pillar 9: Embodied Foundation Models (11 papers, 🔗 1)
Pillar 2: RL Algorithms & Architecture (7 papers, 🔗 2)
Pillar 3: Spatial Perception & Semantics (3 papers)
Pillar 7: Motion Retargeting (3 papers)
Pillar 4: Generative Motion (3 papers)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

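# | Title | One-line summary | Tags | 🔗
1 | Culture In a Frame: C³B as a Comic-Based Benchmark for Multimodal Culturally Awareness | Proposes C³B, a comic-based cross-cultural benchmark for evaluating the cultural awareness of multimodal large language models | large language model, multimodal
2 | Planning with Unified Multimodal Models | Uni-Plan: a planning framework built on unified multimodal models that improves performance on long-horizon decision-making tasks | large language model, multimodal
3 | DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice | DentVLM: a multimodal vision-language model for comprehensive dental diagnosis and improved clinical practice | multimodal
4 | Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning | Proposes an LLM-LMM framework that decouples reasoning from perception to make visual reasoning more faithful | large language model, multimodal, chain-of-thought
5 | Learning Regional Monsoon Patterns with a Multimodal Attention U-Net | Proposes a multimodal attention U-Net for high-resolution regional monsoon rainfall prediction over India | multimodal
6 | TATTOO: Training-free AesTheTic-aware Outfit recOmmendation | Proposes TATTOO, a training-free, aesthetics-aware outfit recommendation method | large language model, multimodal, chain-of-thought
7 | GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval | GRAPE: supervises query rewriting through ranking to improve retrieval | large language model, multimodal
8 | Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning | Proposes a diagnostic framework built on fire-themed cultural imagery that exposes biases in how vision-language models understand culture | multimodal
9 | SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction | SynDoc: a hybrid discriminative-generative framework for improving synthetic-to-real domain-adaptive key information extraction from documents | multimodal
10 | Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection | Proposes a self-reflection-based self-consistency method that reduces hallucinations in vision-language models | instruction following
11 | Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models | Proposes CADC, a capability-attribution data curation framework that makes instruction tuning of vision-language models more efficient | multimodal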

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

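# | Title | One-line summary | Tags | 🔗
12 | Streamline pathology foundation model by cross-magnification distillation | Proposes XMAG, a lightweight pathology foundation model obtained through cross-magnification distillation, to speed up clinical deployment | distillation, foundation model
13 | Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification | Proposes a balanced diffusion-guided fusion framework that addresses modality imbalance in multimodal remote sensing classification | Mamba, multimodal
14 | Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models | Proposes RLStealer, a reinforcement-learning-based framework for stealing prompt templates from text-to-image models | reinforcement learning, large language model, multimodal
15 | RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation | RestoRect: an image restoration method based on latent rectified flow and feature distillation | distillation, feature matching
16 | CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation | CasPoinTr: a point cloud completion framework built on cascaded networks and knowledge distillation | distillation
17 | LRPO: Enhancing Blind Face Restoration through Online Reinforcement Learning | Proposes LRPO, a framework that improves blind face restoration through online reinforcement learning | reinforcement learning
18 | C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection | Proposes C3-OWD, which combines curriculum learning and cross-modal contrastive learning to make open-world object detection robust and generalizable | contrastive learning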

🔬 Pillar 3: Spatial Perception & Semantics (3 papers)

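# | Title | One-line summary | Tags | 🔗
19 | OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting | OracleGS: sparse-view Gaussian splatting guided by grounded generative priors | 3D gaussian splatting, gaussian splatting, splatting
20 | Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos | Proposes OriGS, an orientation-anchored hyper-Gaussian method for high-quality 4D reconstruction from monocular videos | 3D gaussian splatting, gaussian splatting, splatting
21 | FM-SIREN & FM-FINER: Nyquist-Informed Frequency Multiplier for Implicit Neural Representation with Periodic Activation | FM-SIREN/FM-FINER: a Nyquist-informed frequency multiplier that improves implicit neural representations with periodic activations | NeRF, neural radiance field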

🔬 Pillar 7: Motion Retargeting (3 papers)

# | Title | One-line summary | Tags | 🔗
22 | GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization | GeLoc3r: improves relative camera pose regression with geometric consistency regularization | geometric consistency
23 | Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction | Sparse2Dense: a keypoint-driven generative framework for human video compression and vertex prediction | geometric consistency
24 | CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP | CoPatch: exploits untapped spatial knowledge in CLIP for zero-shot referring image segmentation | spatial relationship

🔬 Pillar 4: Generative Motion (3 papers)

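# | Title | One-line summary | Tags | 🔗
25 | 3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras | Proposes 3DPCNet to canonicalize 3D poses captured by monocular RGB cameras | physically plausible
26 | Generative Modeling of Shape-Dependent Self-Contact Human Poses | Proposes a shape-conditioned generative model of self-contact human poses that improves single-view pose estimation accuracy | penetration
27 | Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing | Vid-Freeze: defends images against malicious image-to-video generation through temporal freezing | motion synthesis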

🔬 Pillar 8: Physics-based Animation (1 paper)

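# | Title | One-line summary | Tags | 🔗
28 | Evaluating point-light biological motion in multimodal large language models | The ActPLD benchmark reveals that multimodal large language models fall short in understanding point-light biological motion | spatiotemporal, large language model, multimodal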
