cs.CV (2025-09-03)

📊 24 papers in total | 🔗 6 with code

🎯 Navigation by interest area

Pillar 2: RL & Architecture (10 🔗3) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (2) · Pillar 7: Motion Retargeting (2 🔗2) · Pillar 3: Perception & Semantics (1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL & Architecture (10 papers)

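# | Title | One-line summary | Tags | 🔗
1 | Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics | The domain-specific RETFound outperforms generalist vision foundation models on ocular disease and oculomics tasks | MAE, foundation model |
2 | RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion | Proposes the RTGMFF framework to improve the accuracy of fMRI-based brain disorder diagnosis | Mamba, multimodal |
3 | Resilient Multimodal Industrial Surface Defect Detection with Uncertain Sensors Availability | Proposes a robust multimodal industrial surface defect detection method that handles uncertain sensor availability | contrastive learning, multimodal |
4 | Empowering Lightweight MLLMs with Reasoning via Long CoT SFT | Long CoT SFT equips lightweight MLLMs with reasoning ability | reinforcement learning, multimodal, chain-of-thought |
5 | AIVA: An AI-based Virtual Companion for Emotion-aware Interaction | AIVA: an AI-based virtual companion for emotion-aware interaction | contrastive learning, large language model, multimodal |
6 | Teacher-Student Model for Detecting and Classifying Mitosis in the MIDOG 2025 Challenge | Proposes a Teacher-Student model for mitosis detection and classification that improves domain generalization | representation learning, teacher-student |
7 | PPORLD-EDNetLDCT: A Proximal Policy Optimization-Based Reinforcement Learning Framework for Adaptive Low-Dose CT Denoising | Proposes PPORLD-EDNetLDCT, a reinforcement learning framework based on proximal policy optimization, for adaptive low-dose CT denoising | reinforcement learning, PPO |
8 | Multi Attribute Bias Mitigation via Representation Learning | Proposes the GMBM framework, which mitigates multi-attribute bias in vision models through representation learning | representation learning |
9 | PointAD+: Learning Hierarchical Representations for Zero-shot 3D Anomaly Detection | PointAD+: learns hierarchical representations for zero-shot 3D anomaly detection | representation learning, spatial relationship |
10 | Towards Efficient General Feature Prediction in Masked Skeleton Modeling | Proposes a general feature prediction framework that speeds up masked skeleton modeling and improves its action recognition performance | masked autoencoder, MAE |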

🔬 Pillar 9: Embodied Foundation Models (6 papers)

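# | Title | One-line summary | Tags | 🔗
11 | Mitigating Multimodal Hallucinations via Gradient-based Self-Reflection | Proposes GACD, a gradient-based self-reflection method that mitigates hallucinations in multimodal large language models | large language model, multimodal, visual grounding |
12 | Decoding Visual Neural Representations by Multimodal with Dynamic Balancing | Proposes a dynamically balanced multimodal decoding framework that improves the accuracy of decoding visual neural representations from EEG signals | multimodal |
13 | Scalable and Loosely-Coupled Multimodal Deep Learning for Breast Cancer Subtyping | Proposes a scalable, loosely coupled multimodal deep learning framework for breast cancer molecular subtype classification | multimodal |
14 | OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation | OneCAT: a decoder-only autoregressive unified multimodal model for efficient understanding, generation, and editing | large language model, multimodal |
15 | Parameter-Efficient Adaptation of mPLUG-Owl2 via Pixel-Level Visual Prompts for NR-IQA | Proposes a parameter-efficient fine-tuning method for mPLUG-Owl2 based on pixel-level visual prompts, for no-reference image quality assessment | large language model, multimodal |
16 | InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System | InstaDA: augments instance segmentation data with a training-free dual-agent system | large language model |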

🔬 Pillar 1: Robot Control (2 papers)

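# | Title | One-line summary | Tags | 🔗
17 | High-Fidelity Digital Twins for Bridging the Sim2Real Gap in LiDAR-Based ITS Perception | Proposes a high-fidelity digital twin framework to address Sim2Real transfer in LiDAR perception | sim2real |
18 | UrbanTwin: Building High-Fidelity Digital Twins for Sim2Real LiDAR Perception and Evaluation | UrbanTwin: builds high-fidelity digital twins for Sim2Real LiDAR perception and evaluation | sim2real |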

🔬 Pillar 7: Motion Retargeting (2 papers)

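# | Title | One-line summary | Tags | 🔗
19 | LayoutGKN: Graph Similarity Learning of Floor Plans | LayoutGKN: improves floor plan matching efficiency through graph similarity learning | spatial relationship |
20 | InfraDiffusion: zero-shot depth map restoration with diffusion models and prompted segmentation from sparse infrastructure point clouds | InfraDiffusion: zero-shot depth map restoration for sparse infrastructure point clouds using diffusion models and prompted segmentation | geometric consistency |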

🔬 Pillar 3: Perception & Semantics (1 paper)

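# | Title | One-line summary | Tags | 🔗
21 | Reg3D: Reconstructive Geometry Instruction Tuning for 3D Scene Understanding | Proposes Reg3D, which improves 3D scene understanding through reconstructive geometry instruction tuning | scene understanding, geometric consistency, multimodal |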

🔬 Pillar 5: Interaction & Reaction (1 paper)

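# | Title | One-line summary | Tags | 🔗
22 | Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation | Proposes D-GPTScore, which uses decomposed evaluation to address the mismatch between concept customization evaluation and human preferences | multi-person interaction, large language model, multimodal |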

🔬 Pillar 8: Physics-based Animation (1 paper)

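# | Title | One-line summary | Tags | 🔗
23 | Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data | Strefer: strengthens the space-time referring and reasoning abilities of video LLMs with synthetic instruction data | spatiotemporal, large language model |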

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
24 | Towards Realistic Hand-Object Interaction with Gravity-Field Based Diffusion Bridge | Proposes a gravity-field-driven diffusion bridge to address hand-object interaction | physically plausible, penetration |
