
Large Language Models' Responses to Spinal Cord Injury: A Comparative Study of Performance

Literature Details


Indexed in: SCIE

Affiliations:
[1] Capital Med Univ, Xuanwu Hosp, Dept Neurosurg, 45 Changchun St, Beijing 10053, Peoples R China
[2] CHINA INI, Spine Ctr, Beijing, Peoples R China
[3] Shandong Univ, Qilu Hosp, Dept Orthopaed, 107 Wenhua West Rd, Jinan 250012, Peoples R China
[4] Baylor Coll Med, Houston, TX USA
[5] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
[6] Capital Med Univ, Beijing Ditan Hosp, Ctr Integrat Med, Beijing, Peoples R China
[7] Capital Med Univ, Beijing, Peoples R China

Keywords: Large language model; Spinal cord injury; Quality assessment; Readability assessment; Accuracy assessment

Abstract:
With the increasing application of large language models (LLMs) in medicine, their potential in patient education and clinical decision support is becoming increasingly prominent. Given the complex pathogenesis, diverse treatment options, and lengthy rehabilitation periods of spinal cord injury (SCI), patients are increasingly turning to advanced online resources for relevant medical information. This study analyzed responses from four LLMs (ChatGPT-4o, Claude-3.5 Sonnet, Gemini-1.5 Pro, and Llama-3.1) to 37 SCI-related questions spanning pathogenesis, risk factors, clinical features, diagnostics, treatments, and prognosis. Quality and readability were assessed with the Ensuring Quality Information for Patients (EQIP) tool and Flesch-Kincaid metrics, respectively. Accuracy was independently scored by three senior spine surgeons using consensus scoring. Performance varied among the models. Gemini ranked highest in EQIP scores, suggesting superior information quality. Although the readability of all four models' responses was generally low, requiring college-level reading comprehension, all four were nevertheless able to simplify complex content effectively. Notably, ChatGPT led in accuracy, achieving a significantly higher proportion of "Good" ratings (83.8%) than Claude (78.4%), Gemini (54.1%), and Llama (62.2%). Comprehensiveness scores were high across all models. Furthermore, the LLMs exhibited strong self-correction abilities: after being prompted for revision, the accuracy of ChatGPT's and Claude's responses improved by 100% and 50%, respectively, while Gemini and Llama each improved by 67%. This study represents the first systematic comparison of leading LLMs in the context of SCI. While Gemini excelled in response quality, ChatGPT provided the most accurate and comprehensive responses.
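For readers unfamiliar with the two quantitative instruments named in the abstract, the sketch below illustrates how such scores could be computed. The Flesch-Kincaid formulas are the standard published ones; the naive syllable counter and the `eqip_score` convention (yes = 1, partly = 0.5, no = 0, averaged over applicable items and expressed as a percentage) are simplifying assumptions for illustration, not the study's actual pipeline.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic (an assumption): count vowel groups and subtract a
    # silent trailing 'e'; production tools use dictionaries or better rules.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid(text: str) -> tuple[float, float]:
    # Standard formulas:
    #   reading ease = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)
    #   grade level  = 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0, 0.0
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

def eqip_score(answers: list[str]) -> float:
    # EQIP-style score under a commonly used convention (an assumption here):
    # 'yes' = 1, 'partly' = 0.5, 'no' = 0; 'n/a' items are excluded; the
    # score is the mean over rated items, expressed as a percentage.
    points = {"yes": 1.0, "partly": 0.5, "no": 0.0}
    rated = [points[a] for a in answers if a in points]
    return 100.0 * sum(rated) / len(rated) if rated else 0.0

if __name__ == "__main__":
    sample = ("Spinal cord injury disrupts motor and sensory pathways below "
              "the lesion. Early surgical decompression may improve outcomes.")
    ease, grade = flesch_kincaid(sample)
    print(f"Reading ease {ease:.1f}, grade level {grade:.1f}")
    print(f"EQIP score {eqip_score(['yes', 'partly', 'no', 'yes', 'n/a']):.1f}%")
```

On the Flesch-Kincaid scale, a grade level of roughly 13 or above corresponds to college-level reading, which is consistent with the low readability the abstract reports for all four models.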

CAS (Chinese Academy of Sciences) journal tier:
Publication-year [2025] edition:
Major category | Tier 3, Medicine
Subcategories | Tier 3, Health Care Sciences & Services; Tier 4, Medical Informatics
Latest [2025] edition:
Major category | Tier 3, Medicine
Subcategories | Tier 3, Health Care Sciences & Services; Tier 4, Medical Informatics
JCR quartile:
Publication-year [2023] edition: Q1 Health Care Sciences & Services; Q2 Medical Informatics
Latest [2023] edition: Q1 Health Care Sciences & Services; Q2 Medical Informatics


First author's affiliations: [1] Capital Med Univ, Xuanwu Hosp, Dept Neurosurg, 45 Changchun St, Beijing 10053, Peoples R China [2] CHINA INI, Spine Ctr, Beijing, Peoples R China
Corresponding author's affiliations: [1] Capital Med Univ, Xuanwu Hosp, Dept Neurosurg, 45 Changchun St, Beijing 10053, Peoples R China [2] CHINA INI, Spine Ctr, Beijing, Peoples R China