cs.CV (2026-01-07)

📊 30 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (11) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 3: Perception & Semantics (5) · Pillar 1: Robot Control (3 🔗2) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (11 papers)

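# | Title | Key point | Tags | 🔗
1 | Pixel-Wise Multimodal Contrastive Learning for Remote Sensing Images | Proposes PIMC, a pixel-wise multimodal contrastive learning method that improves time-series analysis of remote sensing images | contrastive learning, multimodal
2 | Semantic Belief-State World Model for 3D Human Motion Prediction | Proposes the Semantic Belief-State World Model (SBWM) to address long-horizon drift in 3D human motion prediction | reinforcement learning, world model, latent dynamics
3 | Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations | Proposes the SVL-DRL framework to handle noisy annotations in medical image segmentation | reinforcement learning, deep reinforcement learning, DRL
4 | MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction | Proposes MVP, which enhances video large language models through self-supervised masked video prediction | reinforcement learning, large language model
5 | From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs | Proposes NNGPT, which uses LLMs and performance feedback to automatically design optimal data augmentation strategies | reinforcement learning, large language model, chain-of-thought
6 | REFA: Real-time Egocentric Facial Animations for Virtual Reality | Proposes a calibration-free real-time facial animation system based on infrared cameras inside a VR headset | distillation, egocentric
7 | Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model | Proposes REACT, a frame reward model-based framework for evaluating structural distortions in generated videos | reinforcement learning, chain-of-thought
8 | CrackSegFlow: Controllable Flow-Matching Synthesis for Generalizable Crack Segmentation with the CSF-50K Benchmark | Proposes CrackSegFlow, together with the CSF-50K benchmark, to improve the generalization and controllability of crack segmentation | flow matching
9 | ToTMNet: FFT-Accelerated Toeplitz Temporal Mixing Network for Lightweight Remote Photoplethysmography | Proposes ToTMNet, an FFT-accelerated Toeplitz temporal mixing network for lightweight remote photoplethysmography estimation | MAE, PULSE
10 | Diffusion-DRF: Differentiable Reward Flow for Video Diffusion Fine-Tuning | Proposes Diffusion-DRF to address the reward signal problem in fine-tuning video diffusion models | DPO, direct preference optimization
11 | Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models | Proposes LocalDPO, which improves the generation quality of video diffusion models through localized detail preference optimization | preference learning, DPO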

🔬 Pillar 9: Embodied Foundation Models (7 papers)

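# | Title | Key point | Tags | 🔗
12 | Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts | Proposes an active visual context refinement method to improve the reasoning consistency of large models under cross-modal conflicts | multimodal, chain-of-thought
13 | Scanner-Induced Domain Shifts Undermine the Robustness of Pathology Foundation Models | Reveals the vulnerability of pretrained pathology models to scanner differences and highlights the importance of calibration and embedding stability | foundation model
14 | EvalBlocks: A Modular Pipeline for Rapidly Evaluating Foundation Models in Medical Imaging | EvalBlocks: a modular pipeline for rapid evaluation of foundation models in medical imaging | foundation model
15 | MGPC: Multimodal Network for Generalizable Point Cloud Completion With Modality Dropout and Progressive Decoding | MGPC: a multimodal network that achieves generalizable point cloud completion via modality dropout and progressive decoding | multimodal
16 | I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing | I2E: a text-guided image editing framework based on interactive environments that tackles complex compositional editing tasks | vision-language-action, chain-of-thought
17 | Klear: Unified Multi-Task Audio-Video Joint Generation | Klear: a unified multi-task audio-video joint generation model that addresses alignment, generalization, and data scarcity | instruction following
18 | SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models | Proposes SDCD, a structure-disrupted contrastive decoding algorithm for mitigating object hallucination in large vision-language models | multimodal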

🔬 Pillar 3: Perception & Semantics (5 papers)

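# | Title | Key point | Tags | 🔗
19 | IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting | IDESplat: iterative depth probability estimation that improves generalizable 3D Gaussian splatting reconstruction | depth estimation, 3D gaussian splatting, gaussian splatting
20 | Bayesian Monocular Depth Refinement via Neural Radiance Fields | Proposes MDENeRF, which iteratively refines monocular depth estimates with neural radiance fields to improve geometric detail | depth estimation, monocular depth, NeRF
21 | G2P: Gaussian-to-Point Attribute Alignment for Boundary-Aware 3D Semantic Segmentation | Proposes G2P, which achieves boundary-aware 3D semantic segmentation through Gaussian-to-point attribute alignment | 3D gaussian splatting, gaussian splatting, splatting
22 | Systematic Evaluation of Depth Backbones and Semantic Cues for Monocular Pseudo-LiDAR 3D Detection | Systematically evaluates depth backbones and semantic cues to improve the accuracy of monocular pseudo-LiDAR 3D detection | metric depth, Depth Anything
23 | Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction | Gen3R: combines reconstruction priors with video diffusion models for scene-level 3D generation | VGGT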

🔬 Pillar 1: Robot Control (3 papers)

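# | Title | Key point | Tags | 🔗
24 | CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval | Proposes CSMCIR, which improves composed image retrieval through symmetric alignment and a memory bank | manipulation, large language model, multimodal
25 | Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions | Proposes the SiT-Bench benchmark for evaluating the spatial intelligence of large language models without pixel input | manipulation, world model, egocentric
26 | Choreographing a World of Dynamic Objects | CHORD: a general framework that generates dynamic objects and scenes by distilling video information | manipulation, distillation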

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | Key point | Tags | 🔗
27 | TRec: Egocentric Action Recognition using 2D Point Tracks | TRec: uses 2D point tracks for egocentric action recognition, improving recognition accuracy | egocentric

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | Key point | Tags | 🔗
28 | SpatiaLoc: Leveraging Multi-Level Spatial Enhanced Descriptors for Cross-Modal Localization | SpatiaLoc: achieves cross-modal localization with multi-level spatially enhanced descriptors | spatial relationship

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Key point | Tags | 🔗
29 | FUSION: Full-Body Unified Motion Prior for Body and Hands via Diffusion | FUSION: a diffusion-based full-body unified motion prior for generating body and hand motion | motion synthesis

🔬 Pillar 8: Physics-based Animation (1 paper)

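# | Title | Key point | Tags | 🔗
30 | MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction | Proposes MFC-RFNet, which combines multi-scale features with rectified flow to improve the accuracy of radar echo sequence prediction | spatiotemporal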
