cs.CV (2025-08-26)

📊 20 papers in total | 🔗 6 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 5: Interaction & Reaction (2 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding	Proposes a dual-enhancement method for monocular 3D visual grounding	visual grounding
2	Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction	Proposes the DeReF framework to address information fusion in cancer survival prediction	multimodal
3	Beyond the Textual: Generating Coherent Visual Options for MCQs	Proposes a cross-modal option synthesis framework for generating multiple-choice questions with visual options	multimodal chain-of-thought
4	Autoregressive Universal Video Segmentation Model	Proposes an autoregressive universal video segmentation model for prompt-free segmentation	foundation model
5	Event-Enriched Image Analysis Grand Challenge at ACM Multimedia 2025	Introduces the EVENTA challenge to address event-level multimodal understanding	multimodal	✅
6	OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward	Proposes OwlCap to address the motion-detail imbalance in video captioning	large language model

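🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	ColorGS: High-fidelity Surgical Scene Reconstruction with Colored Gaussian Splatting	Proposes ColorGS to address color and deformation modeling in tissue reconstruction from endoscopic video	3D gaussian splatting, 3DGS, gaussian splatting
8	Can we make NeRF-based visual localization privacy-preserving?	Proposes ppNeSF to address privacy concerns in NeRF-based visual localization	NeRF
9	PseudoMapTrainer: Learning Online Mapping without HD Maps	Proposes PseudoMapTrainer to remove the dependence of online mapping training on HD maps	gaussian splatting, splatting
10	SoccerNet 2025 Challenges Results	The SoccerNet 2025 challenges advance research on soccer video understanding	depth estimation, monocular depth
11	Robust and Label-Efficient Deep Waste Detection	Proposes an ensemble-based semi-supervised learning framework to improve the efficiency of waste detection	open-vocabulary, open vocabulary	✅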

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
12	MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation	Proposes the MIDAS framework for real-time multimodal interactive digital human synthesis	world model, large language model, multimodal
13	Geo2Vec: Shape- and Distance-Aware Neural Representation of Geospatial Entities	Proposes Geo2Vec to address the high computational cost of representation learning for geospatial entities	representation learning, spatial relationship	✅
14	Flatness-aware Curriculum Learning via Adversarial Difficulty	Proposes an adversarial difficulty measure to combine curriculum learning with flat minima	curriculum learning
15	Clustering-based Feature Representation Learning for Oracle Bone Inscriptions Detection	Proposes a clustering-based feature representation learning method for oracle bone inscription detection	representation learning

🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	Rethinking Human-Object Interaction Evaluation for both Vision-Language Models and HOI-Specific Methods	Proposes a new benchmark dataset for evaluating human-object interaction detection methods	human-object interaction, HOI
17	DQEN: Dual Query Enhancement Network for DETR-based HOI Detection	Proposes a dual query enhancement network for DETR-based HOI detection	human-object interaction, HOI	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation	Proposes OmniHuman-1.5 to address emotional expressiveness in video avatar animation	physically plausible, character animation, large language model	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
19	All-in-One Slider for Attribute Manipulation in Diffusion Models	Proposes an all-in-one slider to address attribute manipulation in generated images	manipulation	✅

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
20	Wan-S2V: Audio-Driven Cinematic Video Generation	Proposes Wan-S2V to address complex film and television animation generation	character animation
