cs.CV (2025-05-06)

📊 23 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL Algorithms & Architecture (5) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

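| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Proposes a mixture-of-priors strategy for compressing 3D Gaussian splatting data | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Proposes LiftFeat for 3D geometry-aware local feature matching | depth estimation, monocular depth, feature matching | |
| 3 | Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation | Proposes the Show or Tell benchmark to evaluate visual and textual prompts in semantic segmentation | open-vocabulary, open vocabulary, large language model | |
| 4 | OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection | Proposes the OS-W2S engine for language-guided open-set aerial object detection | open-vocabulary, open vocabulary, visual grounding | |
| 5 | TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion | Proposes TimeTracker for video frame interpolation under non-linear motion | optical flow, spatiotemporal | |
| 6 | Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment | Proposes an ear movement detection method for assessing equine affective state | optical flow | |
| 7 | 3D Surface Reconstruction with Enhanced High-Frequency Details | Proposes FreNeuS to address insufficient surface detail in 3D reconstruction | implicit representation | |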

🔬 Pillar 9: Embodied Foundation Models (7 papers)

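| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 8 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | Proposes a unified multimodal chain-of-thought reward model to improve accuracy on vision tasks | multimodal, chain-of-thought | |
| 9 | PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | Proposes PhysLLM to address noise sensitivity in remote physiological signal measurement | large language model | |
| 10 | UPMAD-Net: A Brain Tumor Segmentation Network with Uncertainty Guidance and Adaptive Multimodal Feature Fusion | Proposes UPMAD-Net to address uncertainty and multimodal feature fusion in brain tumor segmentation | multimodal | |
| 11 | Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Proposes a capabilities encoding method for efficiently evaluating remote sensing foundation models | foundation model | |
| 12 | Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges | Proposes a deep learning framework to address the challenges of sports video event detection | multimodal | |
| 13 | Multi-Agent System for Comprehensive Soccer Understanding | Proposes a comprehensive framework to address limitations in soccer understanding | multimodal | |
| 14 | SD-VSum: A Method and Dataset for Script-Driven Video Summarization | Proposes SD-VSum for script-driven video summarization | multimodal | |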

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

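| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 15 | STG: Spatiotemporal Graph Neural Network with Fusion and Spatiotemporal Decoupling Learning for Prognostic Prediction of Colorectal Cancer Liver Metastasis | Proposes the STG framework for predicting colorectal cancer liver metastasis | contrastive learning, spatiotemporal, multimodal | |
| 16 | Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | Proposes a new method to quantify and reduce the modality gap in image-text representation learning | representation learning, multimodal | |
| 17 | MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models | Proposes MambaStyle to improve the efficiency of GAN inversion and editing | Mamba | |
| 18 | Real-Time Person Image Synthesis Using a Flow Matching Model | Proposes a real-time person image synthesis method based on a flow matching model to address generation speed | flow matching | |
| 19 | seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models | Proposes seq-JEPA to address representation flexibility in self-supervised learning | world model | |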

🔬 Pillar 7: Motion Retargeting (2 papers)

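| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 20 | Fixed-Length Dense Fingerprint Representation | Proposes a fixed-length dense fingerprint representation to address fingerprint matching challenges | spatial relationship | |
| 21 | Blending 3D Geometry and Machine Learning for Multi-View Stereopsis | Proposes GC MVSNet++ to address geometric consistency in multi-view stereo | geometric consistency | |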

🔬 Pillar 4: Generative Motion (1 paper)

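| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 22 | StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data | Proposes StableMotion for cleaning up motion capture data | motion generation | |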

🔬 Pillar 6: Video Extraction & Matching (1 paper)

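| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 23 | GUAVA: Generalizable Upper Body 3D Gaussian Avatar | Proposes the GUAVA framework for reconstructing a 3D human avatar from a single image | SMPL-X | |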
