cs.CV (2025-06-17)

📊 25 papers in total | 🔗 8 with code

🎯 Areas of Interest

Pillar 2: RL Algorithms & Architecture (10 🔗2) · Pillar 3: Spatial Perception & Semantics (6 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 6: Video Extraction & Matching (1 🔗1) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

#	Title	One-line Summary	Tags	🔗	⭐
1	Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning	Proposes the MISO model for fine-scale soil mapping in Alaska	contrastive learning, foundation model, multimodal	✅
2	I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs	Proposes SpeechRefer for 3D visual grounding from noisy and ambiguous speech inputs	contrastive learning, multimodal, visual grounding
3	Earth Observation Foundation Model PhilEO: Pretraining on the MajorTOM and FastTOM Datasets	Proposes PhilEO to improve the pretraining efficiency of Earth observation models	Mamba, SSM, foundation model
4	DETONATE: A Benchmark for Text-to-Image Alignment and Kernelized Direct Preference Optimization	Proposes the DETONATE benchmark to improve the alignment of text-to-image models	DPO, direct preference optimization, large language model
5	PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning	Proposes PeRL to improve understanding of spatial relationships in multi-image reasoning	reinforcement learning, multimodal
6	KDMOS:Knowledge Distillation for Motion Segmentation	Proposes KDMOS to address both real-time performance and accuracy in moving-object segmentation	distillation, scene flow	✅
7	Align Your Flow: Scaling Continuous-Time Flow Map Distillation	Proposes a continuous-time flow map distillation method to make generative models more efficient	flow matching, distillation
8	SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks	Proposes SIRI-Bench to evaluate the spatial intelligence of vision-language models	reinforcement learning, large language model
9	Model compression using knowledge distillation with integrated gradients	Proposes a knowledge distillation method based on integrated gradients for model compression	distillation
10	HydroChronos: Forecasting Decades of Surface Water Change	Proposes HydroChronos to address the scarcity of data for forecasting surface water dynamics	MAE, spatiotemporal

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

#	Title	One-line Summary	Tags	🔗	⭐
11	3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting	Proposes 3DGS-IEval-15K to address quality evaluation of compressed 3D Gaussian Splatting representations	3D gaussian splatting, 3DGS, gaussian splatting	✅
12	SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting	Proposes SyncTalk++ for high-fidelity, synchronized talking-head synthesis	gaussian splatting, splatting	✅
13	Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction	Proposes active view selection guided by neural uncertainty maps to improve 3D reconstruction	3D gaussian splatting, gaussian splatting, splatting
14	Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment	Proposes Leader360V to address the scarcity of 360° video datasets	scene understanding, large language model, foundation model
15	DiFuse-Net: RGB and Dual-Pixel Depth Estimation using Window Bi-directional Parallax Attention and Cross-modal Transfer Learning	Proposes DiFuse-Net for depth estimation from RGB and dual-pixel inputs	depth estimation
16	MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution	Proposes the MOL framework to address data scarcity in micro-expression recognition	optical flow	✅

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line Summary	Tags	🔗	⭐
17	VisText-Mosquito: A Unified Multimodal Benchmark Dataset for Visual Detection, Segmentation, and Textual Reasoning on Mosquito Breeding Sites	Proposes VisText-Mosquito for detecting and analyzing mosquito breeding sites	multimodal	✅
18	Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection	Proposes RAM-APL for fine-grained one-shot subset selection	foundation model
19	MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models	Proposes MoTE to improve the memory efficiency of large multimodal models	multimodal
20	ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM	Proposes ASCD to reduce hallucination in multimodal large language models	large language model, multimodal
21	Dense360: Dense Understanding from Omnidirectional Panoramas	Proposes Dense360 to address the challenges of understanding panoramic images	large language model, multimodal
22	YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework	Proposes YOLOv11-RGBT to address the lack of a comprehensive multispectral object detection framework	multimodal	✅

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
23	EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization	Proposes EVA02-AT to address multiple challenges in egocentric video-language understanding	egocentric, Ego4D, foundation model	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
24	CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion	Proposes Causal Diffusion Policy to address data-quality and real-time constraints in robot control	manipulation, policy learning, diffusion policy

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
25	Decoupled Classifier-Free Guidance for Counterfactual Diffusion Models	Proposes decoupled classifier-free guidance for counterfactual generation	classifier-free guidance
