| 1 |
M-PACE: Mother Child Framework for Multimodal Compliance |
M-PACE: a mother-child framework for multimodal compliance that significantly reduces inference cost. |
large language model multimodal |
|
|
| 2 |
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR |
Baseer: a vision-language model for Arabic document OCR that sets a new state of the art. |
large language model multimodal |
|
|
| 3 |
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding |
ViSpec: accelerates vision-language model inference with vision-aware speculative decoding. |
large language model multimodal |
✅ |
|
| 4 |
Dense Video Understanding with Gated Residual Tokenization |
Proposes the Gated Residual Tokenization (GRT) framework for efficient high-frame-rate video understanding. |
large language model |
|
|