MLeVLM: Improve Multi-level Progressive Capabilities based on Multimodal Large Language Model for Medical Visual Question Answering

Document details

Resource type:
WOS category:

Indexed in: CPCI (ISTP)

Affiliations: [1]Peking Univ, Sch Comp Sci, Beijing, Peoples R China [2]Peking Univ, Sch Software Microelectron, Beijing, Peoples R China [3]Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China [4]Capital Med Univ, Xuanwu Hosp, Beijing, Peoples R China [5]Peking Univ, Sixth Hosp, Beijing, Peoples R China [6]Peking Univ, Peoples Hosp, Beijing, Peoples R China [7]Peking Univ, First Hosp, Beijing, Peoples R China
Source:

Abstract:
Medical visual question answering (MVQA) requires an in-depth understanding of medical images and questions to provide reliable answers. We summarize the multi-level progressive capabilities that models need to focus on in MVQA: recognition, details, diagnosis, knowledge, and reasoning. Existing MVQA models tend to neglect these capabilities because of unspecific data and plain architectures. To address these issues, this paper proposes the Multi-level Visual Language Model (MLeVLM) for MVQA. On the data side, we construct a high-quality multi-level instruction dataset, MLe-VQA, via GPT-4; it covers multi-level questions and answers as well as reasoning processes from visual clues to semantic cognition. On the architecture side, we propose a multi-level feature alignment module, comprising an attention-based token selector and a context merger, which efficiently aligns features at different levels from visual to semantic. To better evaluate the model's capabilities, we manually construct a multi-level MVQA evaluation benchmark named MLe-Bench. Extensive experiments demonstrate the effectiveness of the constructed multi-level instruction dataset and the multi-level feature alignment module. The results also show that MLeVLM outperforms existing medical multimodal large language models.

Funding:
Language:
WOS:
First author:
First author affiliation: [1]Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Co-first authors:
Corresponding author:
Recommended citation (GB/T 7714):
APA:
MLA:
