cs.CV (2025-05-12)

📊 22 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 3: Perception & Semantics (7 🔗2) · Pillar 2: RL & Architecture (4) · Pillar 6: Video Extraction (1 🔗1) · Pillar 8: Physics-based Animation (1)

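🔬 Pillar 9: Embodied Foundation Models (9 papers)

# | Title | Key point | Tags | 🔗
1 | Vision Foundation Model Embedding-Based Semantic Anomaly Detection | Proposes a semantic anomaly detection method based on vision foundation model embeddings | foundation model
2 | Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning | Proposes Skywork-VL Reward to improve multimodal understanding and reasoning | multimodal
3 | Visually Interpretable Subtask Reasoning for Visual Question Answering | Proposes VISTAR to address multi-step reasoning in visual question answering | large language model, multimodal
4 | Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning | Proposes the Re-Critic framework to address hallucination in vision-language models | multimodal, chain-of-thought
5 | Gameplay Highlights Generation | Proposes automatic generation of gameplay highlights to improve players' sharing experience | multimodal
6 | Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs | Proposes a self-supervised event representation method to address challenges in event data processing | TAMP
7 | Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs | Uses graph neural network-based document layout analysis to improve processing of public affairs documents | multimodal
8 | L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers | Proposes L-SWAG to address applying zero-cost neural architecture search to Vision Transformers | large language model
9 | Synthetic Similarity Search in Automotive Production | Proposes synthetic-data-based similarity search to optimize quality inspection in automotive production | foundation model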

🔬 Pillar 3: Perception & Semantics (7 papers)

# | Title | Key point | Tags | 🔗
10 | TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset | Proposes TUM2TWIN to address the shortage of urban digital twin datasets | gaussian splatting, splatting, NeRF
11 | SLAG: Scalable Language-Augmented Gaussian Splatting | Proposes SLAG to address encoding efficiency for large-scale scenes | gaussian splatting, splatting
12 | TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian | Proposes TUGS to address reconstruction of complex underwater scenes | gaussian splatting, splatting, NeRF
13 | GIFStream: 4D Gaussian-based Immersive Video with Feature Stream | Proposes GIFStream to balance storage and quality for immersive video | gaussian splatting, splatting
14 | Geometric Prior-Guided Neural Implicit Surface Reconstruction in the Wild | Proposes a geometric prior-guided neural implicit surface reconstruction method to handle complex scenes | NeRF, neural radiance field
15 | Asynchronous Multi-Object Tracking with an Event Camera | Proposes an asynchronous event-based multi-object tracking algorithm to address object detection in dynamic environments | optical flow
16 | Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions | Reviews the applications and challenges of deep learning in vision-based traffic accident anticipation | scene understanding

🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | Key point | Tags | 🔗
17 | SAMChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Small Scale Remote Sensing | Proposes SAMChat to address small-scale remote sensing image analysis | reinforcement learning, large language model, multimodal
18 | Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models | Proposes PEAP-LLM to address complex indoor navigation | DPO, direct preference optimization, large language model
19 | DanceGRPO: Unleashing GRPO on Visual Generation | Proposes DanceGRPO to address optimization stability in visual generation | reinforcement learning, RLHF, foundation model
20 | RealRep: Generalized SDR-to-HDR Conversion via Attribute-Disentangled Representation Learning | Proposes RealRep to address appearance diversity in SDR-to-HDR conversion | representation learning

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | Key point | Tags | 🔗
21 | Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection | Proposes the GLFM method to address feature confusion in multi-class point cloud anomaly detection | feature matching

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | Key point | Tags | 🔗
22 | Hybrid Spiking Vision Transformer for Object Detection with Event Cameras | Proposes a hybrid spiking vision transformer to address object detection with event cameras | spatiotemporal
