cs.CV(2025-05-19)

📊 45 papers in total | 🔗 12 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (16 🔗5) · Pillar 9: Embodied Foundation Models (13 🔗5) · Pillar 3: Perception & Semantics (11 🔗2) · Pillar 8: Physics-based Animation (2) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL & Architecture (16 papers)

# | Title | One-line summary | Tags | 🔗
1 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | Proposes KinTwin to address inverse-dynamics computation in movement analysis | imitation learning markerless motion capture
2 | Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | Improves reinforcement learning for multimodal reasoning by modeling difficulty priors | reinforcement learning multimodal
3 | Mamba-Adaptor: State Space Model Adaptor for Visual Recognition | Proposes Mamba-Adaptor to address long-range forgetting and spatial modeling in visual recognition | Mamba SSM state space model
4 | G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | Proposes VLM-Gym to address the weak decision-making of vision-language models in games | reinforcement learning multimodal
5 | SPKLIP: Aligning Spike Video Streams with Natural Language | Proposes SPKLIP to align spike video with natural language | contrastive learning VLA multimodal
6 | AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use | Proposes AutoMat to tackle the conversion of microscopy images into crystal structures | MAE large language model multimodal
7 | BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Proposes the BusterX framework for detecting and explaining AI-generated video forgeries | reinforcement learning large language model multimodal
8 | Few-Step Diffusion via Score identity Distillation | Proposes Score identity Distillation to address high-resolution image generation | distillation classifier-free guidance
9 | Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping | Proposes the Sat2Sound framework for soundscape mapping | representation learning contrastive learning multimodal
10 | Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking | Proposes Safe-Sora to address copyright protection for AI-generated video | Mamba state space model spatiotemporal
11 | DD-Ranking: Rethinking the Evaluation of Dataset Distillation | Proposes DD-Ranking to address inaccurate evaluation of dataset distillation | distillation
12 | RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection | Proposes RMMSS to address robustness in multimodal semantic segmentation | distillation
13 | Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID | Proposes the RLQ framework for clothes-changing re-identification on low-quality images | distillation
14 | RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers | Proposes RoPECraft for video motion transfer | flow matching optical flow
15 | Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction | Proposes Touch2Shape to capture local detail in 3D shape reconstruction | reinforcement learning reward design
16 | Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach | Proposes a Slow-Fast tracking method for low-latency visual object tracking | representation learning distillation

🔬 Pillar 9: Embodied Foundation Models (13 papers)

# | Title | One-line summary | Tags | 🔗
17 | FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning | Proposes FEALLM to address multimodal challenges in facial emotion analysis | large language model multimodal
18 | Specialized Foundation Models for Intelligent Operating Rooms | Proposes the ORQA model for operating-room intelligence | foundation model multimodal
19 | Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector | Proposes a multimodal frequency-driven detector for semantic change detection of roads and bridges | multimodal
20 | Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | Proposes Reasoning-OCR to address complex logical reasoning problems | multimodal
21 | FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Proposes FLASH to speed up decoding in multimodal tasks | multimodal
22 | VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection | Proposes VLC Fusion to address environmental adaptability in multimodal sensor fusion | language conditioned
23 | Any-to-Any Learning in Computational Pathology via Triplet Multimodal Pretraining | Proposes the ALTER framework for multimodal fusion in computational pathology | multimodal
24 | Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering | Proposes temporal-aware activation engineering to mitigate hallucination in video large language models | large language model multimodal
25 | Computer Vision Models Show Human-Like Sensitivity to Geometric and Topological Concepts | Uses computer vision models to probe human sensitivity to geometric and topological concepts | multimodal
26 | From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | Proposes an attention-guided selection method to improve vision-language model performance | large language model
27 | Industrial Synthetic Segment Pre-training | Proposes an industrial synthetic segmentation pre-training dataset to address the shortage of image data | foundation model
28 | Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents | Proposes the MONDAY dataset for cross-platform mobile OS navigation | large language model
29 | Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding | Proposes a temporal-oriented recipe to improve large vision-language models on video understanding | large language model

🔬 Pillar 3: Perception & Semantics (11 papers)

# | Title | One-line summary | Tags | 🔗
30 | Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation | Proposes hybrid 3D-4D Gaussian splatting for dynamic scene representation | gaussian splatting splatting scene reconstruction
31 | 3D Visual Illusion Depth Estimation | Proposes a 3D visual illusion depth estimation framework to improve depth estimation accuracy | depth estimation monocular depth spatial relationship
32 | eStonefish-scenes: A synthetically generated dataset for underwater event-based optical flow prediction tasks | Proposes eStonefish-scenes for underwater event-based optical flow prediction | visual odometry optical flow
33 | IPENS:Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion | Proposes IPENS for unsupervised multi-target segmentation in plant phenotype extraction | NeRF
34 | TACOcc:Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy | Proposes TACOcc to address fusion in multimodal 3D occupancy prediction | 3D gaussian splatting gaussian splatting splatting
35 | Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps | Proposes the F-SUM model to predict reaction time for scene comprehension | scene understanding
36 | Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Proposes a new method for view synthesis from uncalibrated videos | gaussian splatting splatting
37 | Event-Driven Dynamic Scene Depth Completion | Proposes EventDC for depth completion in dynamic scenes | depth estimation
38 | FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching | Proposes FlowCut for unsupervised video instance segmentation | optical flow
39 | Just Dance with $π$! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection | Proposes PI-VAD to address insufficient modalities in weakly supervised video anomaly detection | optical flow
40 | IA-MVS: Instance-Focused Adaptive Depth Sampling for Multi-View Stereo | Proposes IA-MVS to improve depth estimation accuracy in multi-view stereo | depth estimation

🔬 Pillar 8: Physics-based Animation (2 papers)

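# | Title | One-line summary | Tags | 🔗
41 | Joint Depth and Reflectivity Estimation using Single-Photon LiDAR | Proposes a joint depth and reflectivity estimation method for reconstruction in dynamic scenes | PULSE TAMP
42 | Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation | Proposes Long-RVOS for long-video object segmentation | spatiotemporal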

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | One-line summary | Tags | 🔗
43 | HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos | Proposes HiERO to strengthen reasoning on egocentric videos | egocentric egocentric vision Ego4D

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
44 | GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization | Proposes GeoRanker for worldwide image geolocalization | spatial relationship multimodal

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
45 | FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | Proposes FinePhys to address physical consistency in fine-grained human action generation | physically plausible
