Diagnosing Shoulder Disorders Using Multimodal Large Language Models and Consumer-Grade Cameras

Authors: Jindong Hong, Wencheng Zhang, Shiqin Qiao, Jianhai Chen, Jianing Qiu, Chuanyang Zheng, Qian Xu, Yun Ji, Qianyue Wen, Weiwei Sun, Hao Li, Huizhen Li, Huichao Wang, Kai Wu, Meng Li, Yijun He, Lingjie Luo, Jiankai Sun

Categories: cs.CV, cs.AI, cs.CL, cs.LG

Published: 2025-10-10

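💡 One-Sentence Summary

Proposes HMVDx, a hybrid motion video diagnosis framework that uses multimodal large language models for the preliminary diagnosis of shoulder disorders.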

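🎯 Matched Area: Pillar 9: Embodied Foundation Models

Keywords: shoulder disorders, multimodal large language models, video diagnosis, hybrid motion video diagnosis, medical decision-making, Usability Index, action understanding, disease diagnosis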

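📋 Key Points

Early diagnosis of shoulder disorders with existing methods is costly and constrained by limited resources, especially in regions where medical resources are scarce.
The paper proposes a Hybrid Motion Video Diagnosis framework (HMVDx) that assigns action understanding and disease diagnosis to two separate multimodal large language models (MLLMs).
In experiments, HMVDx improves accuracy in diagnosing shoulder joint injuries by 79.6% compared with direct video diagnosis.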

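📝 Abstract (Summary)

Shoulder disorders, such as frozen shoulder, are a common health problem worldwide, with a high incidence among the elderly and among workers who perform repetitive shoulder tasks. In regions with scarce medical resources, early and accurate diagnosis is a major challenge, and low-cost, easily scalable auxiliary diagnostic solutions are urgently needed. This study uses videos captured by consumer-grade devices as the basis for diagnosis and proposes a Hybrid Motion Video Diagnosis framework (HMVDx), in which two multimodal large language models (MLLMs) handle action understanding and disease diagnosis respectively. It also proposes a novel Usability Index to evaluate the effectiveness of MLLMs across the entire medical diagnostic pathway. Experiments show that HMVDx improves accuracy in diagnosing shoulder joint injuries by 79.6% compared with direct video diagnosis.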

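🔬 Method Details

Problem definition: The study targets the early diagnosis of shoulder disorders. Existing approaches are hard to apply in resource-scarce regions, which makes early and accurate diagnosis difficult there.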

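Core idea: The Hybrid Motion Video Diagnosis framework (HMVDx) handles action understanding and disease diagnosis as two separate tasks, each performed by a multimodal large language model, with the aim of improving diagnostic accuracy and efficiency.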

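Technical framework: HMVDx consists of two main modules, an action understanding module and a disease diagnosis module. Each is carried out by a different multimodal large language model, and together they form a complete diagnostic pipeline.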
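The two-module structure described above can be sketched as a simple two-stage pipeline. This is a minimal illustration, not the authors' implementation: the names `run_two_stage_diagnosis`, `DiagnosisResult`, `fake_describe_motion`, and `fake_diagnose` are hypothetical, and the assumption that the second model receives only the text produced by the first is ours. The abstract states only that the two tasks are completed by two MLLMs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: each stage wraps some MLLM behind a callable.
ActionModel = Callable[[str], str]      # video path -> motion description
DiagnosisModel = Callable[[str], str]   # motion description -> diagnosis text


@dataclass
class DiagnosisResult:
    """Keeps the intermediate output so each stage can be inspected separately."""
    video_path: str
    motion_description: str
    diagnosis: str


def run_two_stage_diagnosis(
    video_path: str,
    describe_motion: ActionModel,
    diagnose: DiagnosisModel,
) -> DiagnosisResult:
    # Stage 1 (action understanding): turn the video into a text description.
    motion_description = describe_motion(video_path)
    # Stage 2 (disease diagnosis): reason over the description only (assumption).
    diagnosis = diagnose(motion_description)
    return DiagnosisResult(video_path, motion_description, diagnosis)


# Placeholder stages standing in for real MLLM calls; they return dummy text.
def fake_describe_motion(video_path: str) -> str:
    return f"Placeholder motion description for {video_path}"


def fake_diagnose(motion_description: str) -> str:
    return "Placeholder diagnosis based on: " + motion_description


if __name__ == "__main__":
    result = run_two_stage_diagnosis("clip.mp4", fake_describe_motion, fake_diagnose)
    print(result.motion_description)
    print(result.diagnosis)
```

Passing the stages in as callables keeps the sketch independent of any particular model or API; a real system would replace the two placeholder functions with calls to the chosen MLLMs.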

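Key innovation: The proposed Usability Index is a central contribution. It follows the logical process of medical decision-making (action recognition, movement diagnosis, and final diagnosis) to evaluate how effective MLLMs are in medical applications, giving a broader view than traditional evaluation indicators alone.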

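Key design: The models are described as using specific loss functions and parameter settings to optimize action recognition and disease diagnosis performance; the abstract does not state what these are.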

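📊 Experimental Highlights

In experiments, HMVDx improves accuracy in diagnosing shoulder joint injuries by 79.6% compared with direct video diagnosis, which points to the potential of multimodal large language models for medical video understanding.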

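🎯 Application Scenarios

Potential applications include medical diagnosis, telemedicine, and health monitoring, particularly in regions with scarce medical resources. Because it relies on low-cost consumer-grade devices, the approach could enable broader screening and earlier diagnosis of shoulder disorders, improving the accessibility and efficiency of care.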

📄 Abstract (Original)

Shoulder disorders, such as frozen shoulder (a.k.a. adhesive capsulitis), are common conditions affecting the health of people worldwide, and have a high incidence rate among the elderly and workers engaged in repetitive shoulder tasks. In regions with scarce medical resources, achieving early and accurate diagnosis poses significant challenges, and there is an urgent need for low-cost and easily scalable auxiliary diagnostic solutions. This research introduces videos captured by consumer-grade devices as the basis for diagnosis, reducing the cost for users. We focus on the innovative application of Multimodal Large Language Models (MLLMs) in the preliminary diagnosis of shoulder disorders and propose a Hybrid Motion Video Diagnosis framework (HMVDx). This framework divides the two tasks of action understanding and disease diagnosis, which are respectively completed by two MLLMs. In addition to traditional evaluation indicators, this work proposes a novel metric called Usability Index by the logical process of medical decision-making (action recognition, movement diagnosis, and final diagnosis). This index evaluates the effectiveness of MLLMs in the medical field from the perspective of the entire medical diagnostic pathway, revealing the potential value of low-cost MLLMs in medical applications for medical practitioners. In experimental comparisons, the accuracy of HMVDx in diagnosing shoulder joint injuries has increased by 79.6% compared with direct video diagnosis, a significant technical contribution to future research on the application of MLLMs for video understanding in the medical field.
