cs.CV (2025-05-06)

📊 23 papers in total | 🔗 6 with code

🎯 Navigate by Area of Interest

Pillar 3: Spatial Perception & Semantics (7 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL Algorithms & Architecture (5) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
1	3D Gaussian Splatting Data Compression with Mixture of Priors	Proposes a mixture-of-priors strategy for compressing 3D Gaussian Splatting data	3D Gaussian splatting, 3DGS, Gaussian splatting
2	LiftFeat: 3D Geometry-Aware Local Feature Matching	Proposes LiftFeat for 3D geometry-aware local feature matching	depth estimation, monocular depth, feature matching	✅
3	Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation	Introduces the Show or Tell benchmark for evaluating visual and textual prompts in semantic segmentation	open-vocabulary, large language model
4	OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection	Proposes OS-W2S, an automatic labeling engine for language-guided open-set aerial object detection	open-vocabulary, visual grounding	✅
5	TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion	Proposes TimeTracker, event-based continuous point tracking for video frame interpolation under non-linear motion	optical flow, spatiotemporal
6	Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment	Proposes an ear-movement detection method for assessing the affective state of horses	optical flow	✅
7	3D Surface Reconstruction with Enhanced High-Frequency Details	Proposes FreNeuS to recover the high-frequency surface details that 3D reconstruction tends to lose	implicit representation

🔬 Pillar 9: Embodied Foundation Models (7 papers)

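#	Title	One-Line Summary	Tags	🔗	⭐
8	Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning	Proposes a unified multimodal chain-of-thought reward model, trained via reinforcement fine-tuning, to improve accuracy on visual tasks	multimodal chain-of-thought
9	PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing	Proposes PhysLLM to reduce noise sensitivity in remote physiological signal measurement	large language model
10	UPMAD-Net: A Brain Tumor Segmentation Network with Uncertainty Guidance and Adaptive Multimodal Feature Fusion	Proposes UPMAD-Net to handle uncertainty and multimodal feature fusion in brain tumor segmentation	multimodal	✅
11	Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach	Proposes a capabilities-encoding approach for efficiently benchmarking remote-sensing foundation models	foundation model	✅
12	Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges	Surveys deep learning for sports video event detection, covering tasks, datasets, methods, and open challenges	multimodal
13	Multi-Agent System for Comprehensive Soccer Understanding	Proposes a multi-agent system to overcome the narrow scope of existing soccer understanding approaches	multimodal
14	SD-VSum: A Method and Dataset for Script-Driven Video Summarization	Proposes SD-VSum, a method and dataset for script-driven video summarization	multimodal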

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

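#	Title	One-Line Summary	Tags	🔗	⭐
15	STG: Spatiotemporal Graph Neural Network with Fusion and Spatiotemporal Decoupling Learning for Prognostic Prediction of Colorectal Cancer Liver Metastasis	Proposes the STG framework for prognostic prediction of colorectal cancer liver metastasis	contrastive learning, spatiotemporal, multimodal
16	Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning	Proposes methods to quantify and reduce the modality gap in image-text representation learning	representation learning, multimodal
17	MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models	Proposes MambaStyle, which uses state-space models to make StyleGAN inversion and real-image editing more efficient	Mamba
18	Real-Time Person Image Synthesis Using a Flow Matching Model	Proposes a flow-matching-based method that speeds up person image synthesis to real time	flow matching
19	seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models	Proposes seq-JEPA, which learns invariant and equivariant representations to make self-supervised representations more flexible	world model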

🔬 Pillar 7: Motion Retargeting (2 papers)

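#	Title	One-Line Summary	Tags	🔗	⭐
20	Fixed-Length Dense Fingerprint Representation	Proposes a fixed-length dense fingerprint representation to address challenges in fingerprint matching	spatial relationship	✅
21	Blending 3D Geometry and Machine Learning for Multi-View Stereopsis	Proposes GC MVSNet++ to enforce geometric consistency in multi-view stereo	geometric consistency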

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
22	StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data	Proposes StableMotion for cleaning up motion-capture data, trained on unpaired corrupted data	motion generation

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
23	GUAVA: Generalizable Upper Body 3D Gaussian Avatar	Proposes the GUAVA framework for reconstructing upper-body 3D Gaussian avatars from a single image	SMPL-X