Details Page - Xuanwu Hospital, Capital Medical University Knowledge Base

MLeVLM: Improve Multi-level Progressive Capabilities based on Multimodal Large Language Model for Medical Visual Question Answering

Publication Details

Indexed in: CPCI (ISTP)

Affiliations:
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[2] Peking Univ, Sch Software Microelectron, Beijing, Peoples R China
[3] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
[4] Capital Med Univ, Xuanwu Hosp, Beijing, Peoples R China
[5] Peking Univ, Sixth Hosp, Beijing, Peoples R China
[6] Peking Univ, Peoples Hosp, Beijing, Peoples R China
[7] Peking Univ, First Hosp, Beijing, Peoples R China

Abstract:

Medical visual question answering (MVQA) requires an in-depth understanding of both medical images and questions to provide reliable answers. We summarize the multi-level progressive capabilities that models need to focus on in MVQA: recognition, details, diagnosis, knowledge, and reasoning. Existing MVQA models tend to neglect these capabilities because of non-specific data and plain architectures. To address these issues, this paper proposes the Multi-level Visual Language Model (MLeVLM) for MVQA. On the data side, we use GPT-4 to construct MLe-VQA, a high-quality multi-level instruction dataset that covers multi-level questions and answers as well as reasoning processes from visual clues to semantic cognition. On the architecture side, we propose a multi-level feature alignment module, including an attention-based token selector and a context merger, which efficiently aligns features at different levels, from visual to semantic. To better evaluate the model's capabilities, we manually construct a multi-level MVQA evaluation benchmark named MLe-Bench. Extensive experiments demonstrate the effectiveness of the constructed multi-level instruction dataset and the multi-level feature alignment module, and show that MLeVLM outperforms existing medical multimodal large language models.
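The abstract names an attention-based token selector and a context merger but does not describe how they work. Purely as a hypothetical illustration of that general idea (not the authors' implementation), the sketch below scores visual tokens against a single query with scaled dot-product attention, keeps the top-k tokens, and merges the remaining tokens into one attention-weighted context token. The function names `select_and_merge` and `softmax`, the single-query setup, and the choice of k are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_and_merge(tokens: List[Vector], query: Vector, k: int) -> List[Vector]:
    """Hypothetical token selector + context merger (illustration only).

    Keeps the k tokens that receive the highest attention from `query`
    and merges all other tokens into a single context token.
    """
    d = len(query)
    # Scaled dot-product attention scores between the query and each token.
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    # Rank tokens by score; Python's sort is stable, so ties keep input order.
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    keep = sorted(ranked[:k])  # restore the original token order
    rest = ranked[k:]
    selected = [list(tokens[i]) for i in keep]
    if not rest:
        return selected
    # Merge the discarded tokens into one context token,
    # weighting each by its softmax-normalized attention score.
    weights = softmax([scores[i] for i in rest])
    dim = len(tokens[0])
    context = [sum(w * tokens[i][j] for w, i in zip(weights, rest)) for j in range(dim)]
    return selected + [context]
```

With four 2-D tokens and `k=2`, the call returns three vectors: the two highest-scoring tokens in their original order, followed by one merged context token. Merging rather than simply dropping the unselected tokens is one common way to shorten the visual sequence while retaining some of the discarded information.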

First author's affiliation: [1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China