cs.CV (2025-05-12)

📊 22 papers in total | 🔗 5 with code

🎯 Topic Navigation

Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 3: Spatial Perception & Semantics (7 🔗2)
Pillar 2: RL Algorithms & Architecture (4)
Pillar 6: Video Extraction & Matching (1 🔗1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Vision Foundation Model Embedding-Based Semantic Anomaly Detection	Proposes a semantic anomaly detection method built on vision foundation model embeddings	foundation model
2	Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning	Proposes Skywork-VL Reward to improve multimodal understanding and reasoning	multimodal
3	Visually Interpretable Subtask Reasoning for Visual Question Answering	Proposes VISTAR to handle multi-step reasoning in visual question answering	large language model multimodal	✅
4	Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning	Proposes the Re-Critic framework to mitigate hallucination in vision-language models	multimodal chain-of-thought
5	Gameplay Highlights Generation	Proposes automatic generation of gameplay highlight clips to improve the player sharing experience	multimodal
6	Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs	Proposes a self-supervised event representation to address the challenges of processing event data	TAMP	✅
7	Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs	Applies graph-neural-network-based document layout analysis to improve the processing of public-affairs documents	multimodal
8	L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers	Proposes L-SWAG to make zero-cost neural architecture search applicable to Vision Transformers	large language model
9	Synthetic Similarity Search in Automotive Production	Proposes similarity search based on synthetic data to improve quality inspection in automotive production	foundation model

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset	Introduces TUM2TWIN to address the shortage of urban digital twin datasets	gaussian splatting splatting NeRF
11	SLAG: Scalable Language-Augmented Gaussian Splatting	Proposes SLAG to improve encoding efficiency for large-scale scenes	gaussian splatting splatting	✅
12	TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian	Proposes TUGS to reconstruct complex underwater scenes	gaussian splatting splatting NeRF
13	GIFStream: 4D Gaussian-based Immersive Video with Feature Stream	Proposes GIFStream to balance storage cost and quality in immersive video	gaussian splatting splatting	✅
14	Geometric Prior-Guided Neural Implicit Surface Reconstruction in the Wild	Proposes a geometric-prior-guided neural implicit surface reconstruction method for complex scenes	NeRF neural radiance field
15	Asynchronous Multi-Object Tracking with an Event Camera	Proposes an asynchronous event-based multi-object tracking algorithm to address object detection in dynamic environments	optical flow
16	Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions	Surveys the applications and challenges of deep learning in vision-based traffic accident anticipation	scene understanding

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
17	SAMChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Small Scale Remote Sensing	Proposes SAMChat for small-scale remote sensing image analysis	reinforcement learning large language model multimodal
18	Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models	Proposes PEAP-LLM to address complex indoor navigation	DPO direct preference optimization large language model
19	DanceGRPO: Unleashing GRPO on Visual Generation	Proposes DanceGRPO to improve optimization stability in visual generation	reinforcement learning RLHF foundation model
20	RealRep: Generalized SDR-to-HDR Conversion via Attribute-Disentangled Representation Learning	Proposes RealRep to handle the diversity of SDR appearances in SDR-to-HDR conversion	representation learning

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
21	Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection	Proposes GLFM to resolve feature confusion in multi-class point cloud anomaly detection	feature matching	✅

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
22	Hybrid Spiking Vision Transformer for Object Detection with Event Cameras	Proposes a hybrid spiking Vision Transformer for object detection with event cameras	spatiotemporal
