| 1 |
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations |
Proposes using multimodal large language models to improve semantic understanding in video recommendation systems |
large language model multimodal |
|
|
| 2 |
Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model |
Proposes a multimodal learning approach to strengthen face morphing attack detection |
foundation model multimodal |
|
|
| 3 |
ViMoNet: A Multimodal Vision-Language Framework for Human Behavior Understanding from Motion and Video |
Proposes ViMoNet to address multimodal data fusion in human behavior understanding |
large language model multimodal |
|
|
| 4 |
IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding |
Proposes IAG to address the problem of backdoor attacks on VLM-based visual grounding systems |
multimodal visual grounding |
|
|
| 5 |
MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning |
Proposes MANGO to address feature capture in multimodal fusion learning |
multimodal |
|
|
| 6 |
January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis |
Proposes the January Food Benchmark to address standardization in nutrition analysis |
multimodal |
|
|
| 7 |
Multimodal Sheaf-based Network for Glioblastoma Molecular Subtype Prediction |
Proposes a sheaf-based multimodal network for glioblastoma molecular subtype prediction |
multimodal |
✅ |
|
| 8 |
NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation |
Proposes NEURAL to address compression of multimodal medical imaging data in resource-constrained clinical settings |
multimodal |
✅ |
|
| 9 |
The Brain Resection Multimodal Image Registration (ReMIND2Reg) 2025 Challenge |
Proposes the ReMIND2Reg challenge to address image registration in brain tumor surgery |
multimodal |
|
|
| 10 |
CellSymphony: Deciphering the molecular and phenotypic orchestration of cells with single-cell pathomics |
Proposes CellSymphony to address cell feature extraction and its integration with spatial transcriptomics data |
foundation model multimodal |
|
|
| 11 |
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation |
Proposes Echo-4o to address data scarcity in image generation |
foundation model multimodal |
|
|
| 12 |
DINOv3 |
Proposes DINOv3 to address feature map degradation in self-supervised learning |
foundation model |
|
|
| 13 |
iWatchRoad: Scalable Detection and Geospatial Visualization of Potholes for Smart Cities |
Proposes iWatchRoad to address pothole detection on Indian roads |
TAMP |
|
|
| 14 |
On the dynamic evolution of CLIP texture-shape bias and its relationship to human alignment and model robustness |
Analyzes texture-shape bias during CLIP training and its relationship to human perception |
multimodal |
|
|
| 15 |
Preacher: Paper-to-Video Agentic System |
Proposes Preacher to address multiple limitations of paper-to-video generation |
chain-of-thought |
✅ |
|
| 16 |
Learning Spatial Decay for Vision Transformers |
Proposes a spatial decay Transformer to improve spatial attention in Vision Transformers |
large language model |
|
|
| 17 |
Gen-AFFECT: Generation of Avatar Fine-grained Facial Expressions with Consistent identiTy |
Proposes GEN-AFFECT to address expression consistency in personalized avatar generation |
multimodal |
|
|