cs.CV(2025-10-08)

📊 23 papers in total | 🔗 1 with code

🎯 Browse by Interest Area

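Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (7 🔗1) · Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6) · Pillar 6: Video Extraction and Matching (Video Extraction) (2) · Pillar 1: Robot Control (1)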

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7 papers)

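| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts | Proposes reinforcement-learning-based post-training of motion diffusion models, achieving motion transfer from textual prompts alone. | reinforcement learning, diffusion policy, motion diffusion model | |
| 2 | Implicit-Knowledge Visual Question Answering with Structured Reasoning Traces | Proposes the MODELNAME framework, which improves implicit-knowledge visual question answering through structured reasoning traces. | distillation, large language model, multimodal | |
| 3 | DeRainMamba: A Frequency-Aware State Space Model with Detail Enhancement for Image Deraining | Proposes DeRainMamba, an image deraining method that combines frequency awareness with detail enhancement. | Mamba, state space model | |
| 4 | Look before Transcription: End-to-End SlideASR with Visually-Anchored Policy Optimization | Proposes VAPO, which uses visually-anchored policy optimization to improve recognition accuracy for domain terminology in SlideASR. | reinforcement learning, large language model, chain-of-thought | |
| 5 | Knowledge-Aware Mamba for Joint Change Detection and Classification from MODIS Times Series | Proposes a knowledge-driven Mamba to address change detection in MODIS time series. | Mamba | |
| 6 | Temporal Prompting Matters: Rethinking Referring Video Object Segmentation | Proposes the Tenet framework, which uses temporal prompts to solve Referring Video Object Segmentation efficiently. | preference learning, foundation model | |
| 7 | TTRV: Test-Time Reinforcement Learning for Vision Language Models | Proposes TTRV, a test-time reinforcement learning method for vision-language models that requires no labeled data. | reinforcement learning | |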

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (7 papers)

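| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Evaluating Fundus-Specific Foundation Models for Diabetic Macular Edema Detection | Evaluates the performance of fundus-specific foundation models on diabetic macular edema detection. | foundation model | |
| 9 | VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance | Proposes VA-Adapter, which applies an ultrasound foundation model to echocardiography probe guidance to improve image quality. | foundation model | |
| 10 | DreamOmni2: Multimodal Instruction-based Editing and Generation | DreamOmni2: proposes a multimodal instruction-driven framework for image editing and generation, extending its application scenarios. | multimodal | |
| 11 | Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods | Proposes VTC-Bench for more accurate evaluation of visual token compression methods in multimodal large models. | large language model, multimodal | |
| 12 | Evaluating LLMs for Historical Document OCR: A Methodological Framework for Digital Humanities | Proposes an LLM evaluation framework for historical document OCR that addresses temporal bias and period-specific errors. | large language model, multimodal | |
| 13 | TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility | TRAVL: a recipe for improving the ability of video-language models to judge physical plausibility. | multimodal | |
| 14 | Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking | Proposes EDJE, an efficient discriminative joint encoder for large-scale vision-language reranking. | multimodal | |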

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 papers)

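| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Generating Surface for Text-to-3D using 2D Gaussian Splatting | Proposes DirectGaussian to address geometric consistency in 3D content generation. | gaussian splatting, splatting, geometric consistency | |
| 16 | Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers | Proposes a pixel-level monocular depth estimation model based on semantics-prompted diffusion Transformers that produces high-quality point clouds. | depth estimation, monocular depth, foundation model | |
| 17 | SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis | SCas4D: structural cascaded optimization that accelerates 4D novel view synthesis for persistent dynamic scenes. | 3D gaussian splatting, gaussian splatting, splatting | |
| 18 | Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry | Analyzes DINOv2 with SAEs, revealing the functional specialization and Minkowski geometric properties of its representations. | depth estimation, monocular depth | |
| 19 | MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis | MV-Performer: proposes a diffusion model for generating realistic, synchronized multi-view performer videos. | depth estimation, monocular depth | |
| 20 | Out-of-Distribution Detection in LiDAR Semantic Segmentation Using Epistemic Uncertainty from Hierarchical GMMs | Proposes an OOD detection method for LiDAR semantic segmentation based on uncertainty from hierarchical GMMs. | scene understanding | |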

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

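| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation | Proposes TalkCuts, a large-scale dataset for research on multi-shot human speech video generation. | SMPL, SMPL-X, multimodal | |
| 22 | MoRe: Monocular Geometry Refinement via Graph Optimization for Cross-View Consistency | Proposes MoRe, which refines monocular geometry via graph optimization to improve cross-view consistency and scale alignment. | feature matching, foundation model | |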

🔬 Pillar 1: Robot Control (1 paper)

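| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation | Proposes WristWorld, which uses a 4D world model to generate wrist-view videos from anchor views, improving robotic manipulation performance. | manipulation, world model, VGGT | |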
