| 1 |
STROKEVISION-BENCH: A Multimodal Video And 2D Pose Benchmark For Tracking Stroke Recovery |
StrokeVision-Bench: a multimodal video and 2D pose benchmark dataset for tracking stroke recovery |
multimodal |
|
|
| 2 |
Toward a robust lesion detection model in breast DCE-MRI: adapting foundation models to high-risk women |
Proposes a breast DCE-MRI lesion detection model for high-risk women, based on a Medical Slice Transformer and KAN. |
foundation model |
|
|
| 3 |
MedDINOv3: How to adapt vision foundation models for medical image segmentation? |
MedDINOv3: a method for adapting vision foundation models to medical image segmentation |
foundation model |
✅ |
|
| 4 |
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds |
OmniActor: a generalist GUI and embodied agent for 2D and 3D worlds |
generalist agent, large language model, multimodal |
|
|
| 5 |
A Multimodal Cross-View Model for Predicting Postoperative Neck Pain in Cervical Spondylosis Patients |
Proposes the ABPDC and FPRAN models to predict postoperative neck pain recovery in cervical spondylosis patients |
multimodal |
|
|
| 6 |
FusWay: Multimodal hybrid fusion approach. Application to Railway Defect Detection |
Proposes the FusWay multimodal fusion method to improve railway defect detection accuracy. |
multimodal |
|
|
| 7 |
Why Do MLLMs Struggle with Spatial Understanding? A Systematic Analysis from Data to Architecture |
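Systematically analyzes the bottlenecks in MLLM spatial understanding, proposes the MulSeT benchmark, and examines the effects of data and architecture |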
large language model, multimodal |
|
|
| 8 |
DIET-CP: Lightweight and Data Efficient Self Supervised Continued Pretraining |
DIET-CP: a lightweight and data-efficient self-supervised continued pretraining method |
foundation model |
|
|
| 9 |
Understanding Space Is Rocket Science -- Only Top Reasoning Models Can Solve Spatial Understanding Tasks |
Proposes the RocketScience benchmark, revealing the shortcomings of existing VLMs in understanding spatial relations |
chain-of-thought |
✅ |
|