cs.CV (2025-06-17)

📊 25 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (RL & Architecture) (10 🔗2) · Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 6: Video Extraction and Matching (Video Extraction) (1 🔗1) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (10 papers)

# | Title | Key point | Tags | 🔗
1 | Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning | Proposes the MISO model for fine-scale soil mapping in Alaska | contrastive learning, foundation model, multimodal
2 | I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | Proposes SpeechRefer for 3D visual grounding under noisy and ambiguous speech input | contrastive learning, multimodal, visual grounding
3 | Earth Observation Foundation Model PhilEO: Pretraining on the MajorTOM and FastTOM Datasets | Proposes PhilEO to improve the pretraining efficiency of Earth observation models | Mamba, SSM, foundation model
4 | DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization | Proposes the DETONATE benchmark to improve the alignment of text-to-image models | DPO, direct preference optimization, large language model
5 | PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Proposes PeRL to address spatial-relationship understanding in multi-image reasoning | reinforcement learning, multimodal
6 | KDMOS: Knowledge Distillation for Motion Segmentation | Proposes KDMOS to address real-time performance and accuracy in moving object segmentation | distillation, scene flow
7 | Align Your Flow: Scaling Continuous-Time Flow Map Distillation | Proposes a continuous-time flow map distillation method to improve generative model efficiency | flow matching, distillation
8 | SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Proposes SIRI-Bench to evaluate the spatial intelligence of vision-language models | reinforcement learning, large language model
9 | Model compression using knowledge distillation with integrated gradients | Proposes a knowledge distillation method based on integrated gradients for model compression | distillation
10 | HydroChronos: Forecasting Decades of Surface Water Change | Proposes HydroChronos to address the lack of data for forecasting surface water dynamics | MAE, spatiotemporal

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 papers)

# | Title | Key point | Tags | 🔗
11 | 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting | Proposes 3DGS-IEval-15K to address the evaluation of 3D Gaussian splatting compression | 3D gaussian splatting, 3DGS, gaussian splatting
12 | SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting | Proposes SyncTalk++ for high-fidelity synchronized talking-head synthesis | gaussian splatting, splatting
13 | Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction | Proposes active view selection based on neural uncertainty maps to optimize 3D reconstruction | 3D gaussian splatting, gaussian splatting, splatting
14 | Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment | Proposes Leader360V to address the shortage of 360 video datasets | scene understanding, large language model, foundation model
15 | DiFuse-Net: RGB and Dual-Pixel Depth Estimation using Window Bi-directional Parallax Attention and Cross-modal Transfer Learning | Proposes DiFuse-Net for RGB and dual-pixel depth estimation | depth estimation
16 | MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution | Proposes the MOL framework to address data scarcity in micro-expression recognition | optical flow

🔬 Pillar 9: Embodied Foundation Models (6 papers)

# | Title | Key point | Tags | 🔗
17 | VisText-Mosquito: A Unified Multimodal Benchmark Dataset for Visual Detection, Segmentation, and Textual Reasoning on Mosquito Breeding Sites | Proposes VisText-Mosquito for detecting and analyzing mosquito breeding sites | multimodal
18 | Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection | Proposes RAM-APL for fine-grained one-shot subset selection | foundation model
19 | MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | Proposes MoTE to improve the memory efficiency of large multimodal models | multimodal
20 | ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | Proposes ASCD to reduce hallucination in multimodal large language models | large language model, multimodal
21 | Dense360: Dense Understanding from Omnidirectional Panoramas | Proposes Dense360 to address the challenges of panoramic image understanding | large language model, multimodal
22 | YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework | Proposes YOLOv11-RGBT to address the shortcomings of existing multispectral object detection frameworks | multimodal

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

# | Title | Key point | Tags | 🔗
23 | EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Proposes EVA02-AT to address multiple challenges in egocentric video-language understanding | egocentric, Ego4D, foundation model

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags | 🔗
24 | CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Proposes Causal Diffusion Policy to address data-quality and real-time issues in robot control | manipulation, policy learning, diffusion policy

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Key point | Tags | 🔗
25 | Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models | Proposes decoupled classifier-free guidance for counterfactual generation | classifier-free guidance
