Affiliations: [1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China; [2] Univ Oxford, Sir William Dunn Sch Pathol, Oxford, England; [3] Harvard Med Sch, Massachusetts Gen Hosp, Cutaneous Biol Res Ctr, Boston, MA USA; [4] MIT, Sci & Artificial Intelligence Lab, Cambridge, MA USA; [5] Capital Med Univ, Xuanwu Hosp, Dept Neurosurg, Beijing, Peoples R China; [6] Ningbo Univ, Coll Informat Sci & Engn, Ningbo, Peoples R China; [7] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen, Peoples R China; [8] City Univ Hong Kong, Hong Kong Inst Data Sci, Kowloon, Hong Kong, Peoples R China
Imbalanced datasets are a persistent challenge in bioinformatics, particularly for drug-drug interaction (DDI) risk level datasets. Such imbalance can lead to biased models that perform poorly on underrepresented classes. Two strategies can address this issue: constructing a balanced dataset, or employing more advanced features and models. In this study, we introduce a novel approach called DDintensity, which uses pre-trained deep learning models as embedding generators, combined with LSTM-attention models, to address the imbalance in DDI risk level datasets. We tested embeddings from several domains, including images, graphs, and text corpora. Among these, embeddings generated by BioGPT achieved the highest performance, with an Area Under the Curve (AUC) of 0.97 and an Area Under the Precision-Recall curve (AUPR) of 0.92. Our model was trained on the DDinter dataset and further validated on the MecDDI dataset. Additionally, case studies on two chemotherapeutic drugs used in oncology, DB00398 (Sorafenib) and DB01204 (Mitoxantrone), were conducted to demonstrate the specificity and effectiveness of this method. Our approach scales across DDI modalities and supports the discovery of novel interactions. In summary, we introduce DDintensity as a solution for imbalanced datasets in bioinformatics based on pre-trained deep learning embeddings.
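The abstract reports both AUC and AUPR, two metrics that can diverge on imbalanced data. The following is a minimal sketch of how each can be computed, using toy labels and scores that are not taken from the study. The function names `roc_auc` and `average_precision` are illustrative, and average precision is used here as one common estimator of AUPR, which is an assumption about how the metric might be computed rather than the authors' procedure.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at each rank where a positive appears.

    Assumes no tied scores; ties would need an explicit ordering rule.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Toy imbalanced example (hypothetical data): 2 positives among 10 samples.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(roc_auc(labels, scores))            # 0.875
print(average_precision(labels, scores))  # 0.75
```

In this toy case the minority class is ranked reasonably well overall, so AUC stays high, while the two negatives ranked above the second positive pull the precision-based score down more sharply. This is one reason both metrics are commonly reported for imbalanced problems.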
Funding:
National Natural Science Foundation of China [32170654]; Shenzhen Research Institute, City University of Hong Kong; Research Grants Council of the Hong Kong Special Administrative Region [CityU 11203723]; City University of Hong Kong [2021SIRG036, CityU 9667265, CityU 11203221]; Innovation and Technology Commission [ITB/FBL/9037/22/S]; Cancer Research UK [BVR01170]
First author affiliations: [1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China; [2] Univ Oxford, Sir William Dunn Sch Pathol, Oxford, England
Corresponding author:
Corresponding author affiliations: [1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China; [7] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen, Peoples R China; [8] City Univ Hong Kong, Hong Kong Inst Data Sci, Kowloon, Hong Kong, Peoples R China
Recommended citation (GB/T 7714):
Xie Weidun, Chen Xingjian, Huang Lei, et al. DDintensity: Addressing imbalanced drug-drug interaction risk levels using pre-trained deep learning model embeddings[J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2025, 168. doi:10.1016/j.artmed.2025.103202.
APA:
Xie, Weidun, Chen, Xingjian, Huang, Lei, Zheng, Zetian, Wang, Yuchen, ... & Wong, Ka-chun. (2025). DDintensity: Addressing imbalanced drug-drug interaction risk levels using pre-trained deep learning model embeddings. ARTIFICIAL INTELLIGENCE IN MEDICINE, 168.
MLA:
Xie, Weidun, et al. "DDintensity: Addressing imbalanced drug-drug interaction risk levels using pre-trained deep learning model embeddings." ARTIFICIAL INTELLIGENCE IN MEDICINE 168 (2025).