| 10 |
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis |
DEMO: disentangled motion latent flow matching for fine-grained controllable talking portrait synthesis |
flow matching motion latent |
|
|
| 11 |
EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition |
EGD-YOLO: a lightweight multimodal framework that uses Ghost-enhanced YOLOv8n and EMA attention for robust drone-bird discrimination under adverse conditions |
VIP multimodal |
|
|
| 12 |
Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection |
Proposes FS-VFM, which uses self-supervised learning to improve generalization on face security tasks |
distillation foundation model |
✅ |
|
| 13 |
MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition |
Proposes MSF-Mamba, which uses motion-aware state fusion to improve Mamba's efficiency and accuracy in micro-gesture recognition |
Mamba SSM state space model |
|
|
| 14 |
Unified Open-World Segmentation with Multi-Modal Prompts |
COSINE: a unified open-world segmentation model driven by multi-modal prompts |
representation learning open-vocabulary |
|
|
| 15 |
Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans |
Proposes a multi-label abnormality analysis method for 3D CT scans based on structured spectral graph representation learning |
representation learning spatial relationship |
|
|
| 16 |
OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment |
OmniQuality-R: improving reward model performance through all-encompassing quality assessment |
reinforcement learning chain-of-thought |
|
|
| 17 |
Mesh-Gait: A Unified Framework for Gait Recognition Through Multi-Modal Representation Learning from 2D Silhouettes |
Mesh-Gait: a unified gait recognition framework based on multi-modal representation learning from 2D silhouettes |
representation learning |
|
|