Intelligent Head and Neck CTA Report Quality Detection with Large Language Models

Publication details


Indexed in: SCIE

Affiliations: [1]Capital Med Univ, Xuanwu Hosp, Informat Ctr, Beijing 100053, Peoples R China [2]Capital Med Univ, Xuanwu Hosp, Dept Radiol & Nucl Med, Beijing 100053, Peoples R China [3]Beijing Key Lab Magnet Resonance Imaging & Brain I, Beijing 100053, Peoples R China

Keywords: Artificial intelligence; Large language models; Radiological report; Reports quality

Abstract:
This study aims to identify common errors in head and neck CTA reports using GPT-4, ERNIE Bot, and SparkDesk, and to evaluate their potential for supporting quality control of Chinese radiological reports. We collected 10,000 head and neck CTA imaging reports from Xuanwu Hospital (Dataset 1) and 5,000 multi-center reports (Dataset 2). We identified six common types of errors and detected them using the three large language models. The overall quality of the reports was assessed using a 5-point Likert scale. We used the Wilcoxon rank-sum test and the Friedman test to compare error detection rates and to evaluate the models' performance on different error types and overall scores. For Dataset 2, we manually reviewed the reports, annotated the six error types, and assigned overall scores, while also recording the time taken for manual scoring and model detection. Model performance was evaluated using accuracy, precision, recall, and F1 score. The intraclass correlation coefficient (ICC) measured consistency between manual and model scores, and ANOVA compared evaluation times. In Dataset 1, the error detection rates for final reports were significantly lower than those for preliminary reports across all three models. The Friedman test indicated significant differences in error rates among the three models. In Dataset 2, the detection accuracy of the three LLMs for the six error types was above 95%. GPT-4 showed moderate consistency with manual scores (ICC = 0.517), while ERNIE Bot and SparkDesk showed slightly lower consistency (ICC = 0.431 and 0.456, respectively; P < 0.001). The models evaluated one hundred radiology reports significantly faster than human reviewers. LLMs can differentiate the quality of radiology reports and identify error types, significantly enhancing the efficiency of quality control reviews and providing substantial research and practical value in this field.
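
The abstract names accuracy, precision, recall, and F1 score but does not say how they were computed. As a minimal sketch, assuming each error type is treated as a binary label (1 = error present, 0 = absent) and the manual annotation serves as ground truth, the four metrics follow from confusion-matrix counts. The function name `binary_metrics` and the sample labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one error type (1 = error present).

    Assumption: y_true holds manual annotations and y_pred holds model output.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # error correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean report correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed error

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight reports and a single error type.
manual = [1, 1, 0, 0, 1, 0, 0, 0]
model = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = binary_metrics(manual, model)
print(metrics)
```

When errors are rare, accuracy can stay high even if most errors are missed, so precision and recall are the more informative of the four in that setting.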

First author affiliation: [2]Capital Med Univ, Xuanwu Hosp, Dept Radiol & Nucl Med, Beijing 100053, Peoples R China [3]Beijing Key Lab Magnet Resonance Imaging & Brain I, Beijing 100053, Peoples R China
Corresponding author affiliation: [2]Capital Med Univ, Xuanwu Hosp, Dept Radiol & Nucl Med, Beijing 100053, Peoples R China [3]Beijing Key Lab Magnet Resonance Imaging & Brain I, Beijing 100053, Peoples R China
