cs.CV (2025-05-19)

📊 45 papers in total | 🔗 12 with code

🎯 Navigation by Area of Interest

Pillar 2: RL Algorithms & Architecture (16 🔗5)
Pillar 9: Embodied Foundation Models (13 🔗5)
Pillar 3: Spatial Perception & Semantics (11 🔗2)
Pillar 8: Physics-based Animation (2)
Pillar 6: Video Extraction & Matching (1)
Pillar 7: Motion Retargeting (1)
Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (16 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture	Proposes KinTwin to tackle inverse dynamics computation in motion analysis	imitation learning, markerless motion capture
2	Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning	Models a difficulty prior to improve reinforcement learning for multimodal reasoning	reinforcement learning, multimodal
3	Mamba-Adaptor: State Space Model Adaptor for Visual Recognition	Proposes Mamba-Adaptor to address long-range forgetting and weak spatial modeling in visual recognition	Mamba, SSM, state space model
4	G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning	Proposes VLM-Gym to address the weak decision-making of vision-language models in games	reinforcement learning, multimodal	✅
5	SPKLIP: Aligning Spike Video Streams with Natural Language	Proposes SPKLIP to align spike video streams with natural language	contrastive learning, VLA, multimodal
6	AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use	Proposes AutoMat to convert microscopy images into crystal structures	MAE, large language model, multimodal	✅
7	BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation	Proposes the BusterX framework to detect and explain AI-generated video forgeries	reinforcement learning, large language model, multimodal
8	Few-Step Diffusion via Score identity Distillation	Proposes Score identity Distillation for few-step high-resolution image generation	distillation, classifier-free guidance	✅
9	Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping	Proposes the Sat2Sound framework for zero-shot soundscape mapping	representation learning, contrastive learning, multimodal
10	Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking	Proposes Safe-Sora to protect the copyright of AI-generated videos	Mamba, state space model, spatiotemporal	✅
11	DD-Ranking: Rethinking the Evaluation of Dataset Distillation	Proposes DD-Ranking to fix inaccurate evaluation of dataset distillation	distillation
12	RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection	Proposes RMMSS to improve robustness in multi-modal semantic segmentation	distillation
13	Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID	Proposes the RLQ framework for clothes-changing re-identification on low-quality images	distillation
14	RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers	Proposes RoPECraft for training-free video motion transfer	flow matching, optical flow
15	Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction	Proposes Touch2Shape to capture local detail in 3D shape reconstruction	reinforcement learning, reward design
16	Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach	Proposes a slow-fast tracking approach for low-latency visual object tracking	representation learning, distillation	✅

🔬 Pillar 9: Embodied Foundation Models (13 papers)

#	Title	One-line summary	Tags	🔗	⭐
17	FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning	Proposes FEALLM to address multimodal challenges in facial emotion analysis	large language model, multimodal	✅
18	Specialized Foundation Models for Intelligent Operating Rooms	Proposes the ORQA model toward intelligent operating rooms	foundation model, multimodal
19	Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector	Proposes a multimodal frequency-driven detector for semantic change detection of roads and bridges	multimodal	✅
20	Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?	Proposes Reasoning-OCR to test complex logical reasoning from OCR cues	multimodal	✅
21	FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	Proposes FLASH to speed up decoding in multimodal tasks	multimodal	✅
22	VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection	Proposes VLC Fusion to help multimodal sensor fusion adapt to changing environmental conditions	language conditioned
23	Any-to-Any Learning in Computational Pathology via Triplet Multimodal Pretraining	Proposes the ALTER framework for multimodal fusion in computational pathology	multimodal
24	Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering	Proposes temporal-aware activation engineering to mitigate hallucination in video LLMs	large language model, multimodal
25	Computer Vision Models Show Human-Like Sensitivity to Geometric and Topological Concepts	Tests whether computer vision models show human-like sensitivity to geometric and topological concepts	multimodal
26	From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection	Proposes attention-based selection to improve vision-language model performance	large language model	✅
27	Industrial Synthetic Segment Pre-training	Proposes an industrial synthetic segment pre-training dataset to address scarce image data	foundation model
28	Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents	Proposes the MONDAY dataset for cross-platform mobile OS navigation	large language model
29	Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding	Proposes a temporal-oriented recipe for transferring large vision-language models to video understanding	large language model

🔬 Pillar 3: Spatial Perception & Semantics (11 papers)

#	Title	One-line summary	Tags	🔗	⭐
30	Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation	Proposes hybrid 3D-4D Gaussian splatting for dynamic scene representation	gaussian splatting, splatting, scene reconstruction
31	3D Visual Illusion Depth Estimation	Proposes a depth estimation framework for 3D visual illusions to improve depth accuracy	depth estimation, monocular depth, spatial relationship
32	eStonefish-scenes: A synthetically generated dataset for underwater event-based optical flow prediction tasks	Proposes eStonefish-scenes for underwater event-based optical flow prediction	visual odometry, optical flow
33	IPENS:Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion	Proposes IPENS for unsupervised multi-object segmentation in plant phenotype extraction	NeRF
34	TACOcc:Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy	Proposes TACOcc to improve fusion in multimodal 3D occupancy prediction	3D gaussian splatting, gaussian splatting, splatting
35	Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps	Proposes the F-SUM model to predict reaction time for scene understanding	scene understanding
36	Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos	Proposes a new method for novel view synthesis from uncalibrated videos	gaussian splatting, splatting	✅
37	Event-Driven Dynamic Scene Depth Completion	Proposes EventDC for depth completion in dynamic scenes	depth estimation
38	FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching	Proposes FlowCut for unsupervised video instance segmentation	optical flow
39	Just Dance with π! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection	Proposes PI-VAD to address insufficient modalities in weakly-supervised video anomaly detection	optical flow
40	IA-MVS: Instance-Focused Adaptive Depth Sampling for Multi-View Stereo	Proposes IA-MVS to improve depth estimation accuracy in multi-view stereo	depth estimation	✅

🔬 Pillar 8: Physics-based Animation (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
41	Joint Depth and Reflectivity Estimation using Single-Photon LiDAR	Proposes joint depth and reflectivity estimation for reconstruction in dynamic scenes	PULSE, TAMP
42	Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation	Proposes the Long-RVOS benchmark for referring object segmentation in long videos	spatiotemporal

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
43	HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos	Proposes HiERO to enhance reasoning on egocentric videos	egocentric, egocentric vision, Ego4D

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
44	GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization	Proposes GeoRanker for worldwide image geolocalization	spatial relationship, multimodal

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
45	FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance	Proposes FinePhys to improve physical consistency in fine-grained human action generation	physically plausible
