
The actual performance of large language models in providing liver cirrhosis-related information: A comparative study

Publication details

Resource type:
WOS category:
PubMed category:

Indexed in: SCIE

Affiliations: [1] Capital Med Univ, Beijing Ditan Hosp, Ctr Integrat Med, Beijing, Peoples R China; [2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China; [3] Capital Med Univ, Xuanwu Hosp, Dept Neurosurg, Beijing, Peoples R China
Source:
ISSN:

Keywords: Large Language Model; Liver Cirrhosis; Quality Assessment; Readability Assessment; Accuracy Assessment

Abstract:
Objective: As large language models (LLMs) become more prevalent in the medical field, patients are increasingly turning to advanced online resources for information about liver cirrhosis, a condition that requires long-term management. A comprehensive evaluation of the real-world performance of LLMs in this specialized medical area is therefore necessary. Methods: This study evaluates the performance of four mainstream LLMs (ChatGPT-4o, Claude-3.5 Sonnet, Gemini-1.5 Pro, and Llama-3.1) in answering 39 questions related to liver cirrhosis. Information quality, readability, and accuracy were assessed using the Ensuring Quality Information for Patients (EQIP) tool, Flesch-Kincaid metrics, and consensus scoring, respectively. The models' ability to simplify their answers and to self-correct was also assessed. Results: Significant performance differences were observed among the models. Gemini scored highest in providing high-quality information. Although the readability of all four LLMs was generally low, requiring college-level reading comprehension, they exhibited strong capabilities in simplifying complex information. ChatGPT performed best in terms of accuracy, with 80% of its answers rated "Good", compared with Claude (72%), Llama (64%), and Gemini (49%). All models received high scores for comprehensiveness. Each of the four LLMs demonstrated some degree of self-correction ability, improving the accuracy of initial answers when given simple prompts. ChatGPT's and Llama's accuracy improved by 100%, Claude's by 50%, and Gemini's by 67%. Conclusion: LLMs demonstrate excellent performance in generating health information related to liver cirrhosis, yet they differ in answer quality, readability, and accuracy. Future research should enhance their value in healthcare, ultimately achieving reliable, accessible, and patient-centered dissemination of medical information.
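Note: the Flesch-Kincaid metrics named in the abstract can be illustrated with a minimal sketch. The two formulas are the standard Flesch Reading Ease and Flesch-Kincaid Grade Level definitions; the syllable counter is a rough vowel-group heuristic, and the names `count_syllables` and `readability` are assumptions made for illustration, not the tooling used in the study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    answer = "Cirrhosis is scarring of the liver. It develops over many years."
    print(readability(answer))
```

The grade level is usually read as a US school grade, so values of roughly 13 and above are commonly taken to indicate college-level text. Published studies typically rely on a validated syllable counter rather than a heuristic like this one, so scores from this sketch should be treated as approximate.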

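Funding:
Language:
WOS:
PubMed ID:
Chinese Academy of Sciences (CAS) journal ranking:
Edition for the year of publication [2025]:
Major category | Tier 2: Medicine
Subcategories | Tier 2: Health Care Sciences & Services; Tier 3: Computer Science, Information Systems; Tier 3: Medical Informatics
Latest edition [2025]:
Major category | Tier 2: Medicine
Subcategories | Tier 2: Health Care Sciences & Services; Tier 3: Computer Science, Information Systems; Tier 3: Medical Informatics
JCR quartile:
Edition for the year of publication [2023]:
Q1: HEALTH CARE SCIENCES & SERVICES; Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS; Q2: MEDICAL INFORMATICS
Latest edition [2023]:
Q1: HEALTH CARE SCIENCES & SERVICES; Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS; Q2: MEDICAL INFORMATICS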

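Impact factor: latest [2023 edition]; latest five-year average; year of publication [2023 edition]; five-year average for the year of publication; year before publication [2022 edition]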

First author:
First author affiliation: [1] Capital Med Univ, Beijing Ditan Hosp, Ctr Integrat Med, Beijing, Peoples R China
Co-first authors:
Corresponding author:
Recommended citation (GB/T 7714):
APA:
MLA:

Last updated: 2025-06-01

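Copyright © 2020 Xuanwu Hospital, Capital Medical University. Technical support: Chongqing Juhe Technology Co., Ltd. (重庆聚合科技有限公司). Address: Xuanwu Hospital, 45 Changchun Street, Xicheng District, Beijing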