cs.CV (2025-08-13)

📊 44 papers in total | 🔗 10 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (17 🔗3)
Pillar 3: Perception & Semantics (12 🔗3)
Pillar 2: RL & Architecture (6 🔗2)
Pillar 1: Robot Control (3 🔗2)
Pillar 8: Physics-based Animation (3)
Pillar 6: Video Extraction (2)
Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (17 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations	Proposes using multimodal large language models to enhance semantic understanding in video recommender systems	large language model multimodal
2	Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model	Proposes a multimodal learning approach to enhance face morphing attack detection	foundation model multimodal
3	ViMoNet: A Multimodal Vision-Language Framework for Human Behavior Understanding from Motion and Video	Proposes ViMoNet to address multimodal data fusion in human behavior understanding	large language model multimodal
4	IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding	Proposes IAG, an input-aware backdoor attack targeting VLM-based visual grounding systems	multimodal visual grounding
5	MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning	Proposes MANGO to address feature capture in multimodal fusion learning	multimodal
6	January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis	Introduces the January Food Benchmark to address standardization of nutrition analysis	multimodal
7	Multimodal Sheaf-based Network for Glioblastoma Molecular Subtype Prediction	Proposes a sheaf-based multimodal network to address glioblastoma molecular subtype prediction	multimodal	✅
8	NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation	Proposes NEURAL to address compression of multimodal medical imaging data in resource-constrained clinical settings	multimodal	✅
9	The Brain Resection Multimodal Image Registration (ReMIND2Reg) 2025 Challenge	Introduces the ReMIND2Reg challenge to address image registration in brain tumor surgery	multimodal
10	CellSymphony: Deciphering the molecular and phenotypic orchestration of cells with single-cell pathomics	Proposes CellSymphony to address the integration of cell feature extraction with spatial transcriptomics data	foundation model multimodal
11	Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation	Proposes Echo-4o to address data scarcity in image generation	foundation model multimodal
12	DINOv3	Proposes DINOv3 to address feature-map degradation in self-supervised learning	foundation model
13	iWatchRoad: Scalable Detection and Geospatial Visualization of Potholes for Smart Cities	Proposes iWatchRoad to address pothole detection on Indian roads	TAMP
14	On the dynamic evolution of CLIP texture-shape bias and its relationship to human alignment and model robustness	Analyzes texture-shape bias over the course of CLIP training and its relationship to human perception	multimodal
15	Preacher: Paper-to-Video Agentic System	Proposes Preacher to address multiple limitations in paper-to-video generation	chain-of-thought	✅
16	Learning Spatial Decay for Vision Transformers	Proposes a spatial-decay transformer to improve spatial attention in vision transformers	large language model
17	Gen-AFFECT: Generation of Avatar Fine-grained Facial Expressions with Consistent identiTy	Proposes Gen-AFFECT to address expression consistency in personalized avatar generation	multimodal

🔬 Pillar 3: Perception & Semantics (12 papers)

#	Title	One-line summary	Tags	🔗	⭐
18	A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation	Surveys applications of 3D Gaussian Splatting in segmentation, editing, and generation	3D gaussian splatting 3DGS gaussian splatting	✅
19	GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors	Proposes GSFixer to address artifacts in 3D Gaussian Splatting reconstruction	3D gaussian splatting 3DGS gaussian splatting	✅
20	CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios	Proposes CitySeg to address semantic segmentation of city-scale point clouds	open-vocabulary open vocabulary foundation model
21	EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting	Proposes EntropyGS for efficient entropy coding of 3D Gaussian Splatting data	3D gaussian splatting 3DGS gaussian splatting
22	Surg-InvNeRF: Invertible NeRF for 3D tracking and reconstruction in surgical vision	Proposes an invertible NeRF to address 3D tracking and reconstruction in surgical vision	NeRF neural radiance field
23	HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics	Proposes HumanGenesis to address geometric inconsistency and motion generalization in synthetic human dynamics	3D gaussian splatting gaussian splatting splatting
24	Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation	Proposes the GOAL framework to address uncertainty in indoor object goal navigation	semantic map large language model	✅
25	Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation	Proposes SAD-Splat to address ambiguity in semantic segmentation of 3D aerial imagery	scene understanding foundation model
26	PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image	Proposes the PERSONA framework to generate a personalized 3D human avatar from a single image	3DGS NeRF
27	RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians	Proposes RayletDF to address 3D surface reconstruction	3DGS
28	E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras	Proposes E-4DGS to address lighting and blur issues in dynamic scene reconstruction	scene reconstruction
29	SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing	Proposes SVG-Head to address high-fidelity head reconstruction and real-time editing	implicit representation

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
30	SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images	Proposes SkySplat to address 3D reconstruction from multi-temporal sparse satellite images	MAE 3D gaussian splatting 3DGS
31	HyperKD: Distilling Cross-Spectral Knowledge in Masked Autoencoders via Inverse Domain Shift with Spatial-Aware Masking and Specialized Loss	Proposes HyperKD to address knowledge distillation in hyperspectral remote sensing	representation learning masked autoencoder MAE
32	Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory	Proposes M3-Agent to address long-term memory in multimodal agents	reinforcement learning multimodal	✅
33	WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization	Proposes WeatherPrompt to address weather interference in drone visual geo-localization	representation learning contrastive learning
34	BridgeTA: Bridging the Representation Gap in Knowledge Distillation via Teacher Assistant for Bird's Eye View Map Segmentation	Proposes BridgeTA to address the representation gap in knowledge distillation	teacher-student distillation
35	SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection	Proposes joint audio-visual learning to address face forgery detection	representation learning	✅

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
36	Physical Autoregressive Model for Robotic Manipulation without Action Pretraining	Proposes a physical autoregressive model to address data scarcity in robotic manipulation	manipulation	✅
37	RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization	Proposes RelayFormer to address localization of manipulated regions in images and videos	manipulation	✅
38	LIA-X: Interpretable Latent Portrait Animator	Proposes LIA-X to address limited interpretability and controllability	manipulation

🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
39	OneVAE: Joint Discrete and Continuous Optimization Helps Discrete Video VAE Train Better	Proposes OneVAE to address unstable training of discrete video VAEs	spatiotemporal
40	Noise-adapted Neural Operator for Robust Non-Line-of-Sight Imaging	Proposes a noise-adapted neural operator to address non-line-of-sight imaging	spatiotemporal
41	Animate-X++: Universal Character Image Animation with Dynamic Backgrounds	Proposes Animate-X++ to address character animation with dynamic backgrounds	character animation

🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
42	GoViG: Goal-Conditioned Visual Navigation Instruction Generation	Proposes GoViG to address vision-based navigation instruction generation	egocentric large language model multimodal
43	Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors	Proposes a lightweight texture module to improve the accuracy of monocular 3D hand reconstruction	hand reconstruction

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
44	Episodic Memory Representation for Long-form Video Understanding	Proposes Video-EM to address context limitations in long-form video understanding	spatial relationship large language model chain-of-thought
