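Application of layer by layer Transformer in class-imbalanced data
Citation: Yang Jingdong, Li Yiwei, Jiang Biao, Jiang Quan, Han Man, Song Mengge. Application of layer by layer Transformer in class-imbalanced data[J]. Application Research of Computers, 2023, 40(10): 3047-3052
Authors: Yang Jingdong, Li Yiwei, Jiang Biao, Jiang Quan, Han Man, Song Mengge
Affiliations: 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology; 2. Guang'anmen Hospital, China Academy of Chinese Medical Sciences
Funding: Supported by the National Natural Science Foundation of China (81973749)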
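Abstract: Class imbalance in clinical medical scale data easily distorts models, and on scale-data tasks deep learning frameworks struggle to match traditional machine learning methods. To address both problems, this paper proposes a Transformer network based on cascaded under-sampling, the layer by layer Transformer (LLT). LLT applies cascaded under-sampling to delete majority-class data layer by layer, balancing the classes and reducing the effect of class imbalance on the classifier. It also uses an attention mechanism to evaluate the relevance of the input features, which performs feature selection, refines feature extraction and improves model performance. Experiments on rheumatoid arthritis (RA) data show that, without changing the sample distribution, the proposed cascaded under-sampling method raises the recognition rate of the minority classes by 6.1%, which is 1.4% and 10.4% higher than the commonly used NEARMISS and ADASYN, respectively. On the RA scale data, LLT reaches an accuracy of 72.6% and an F1-score of 71.5%, with an AUC of 0.89 and an mAP of 0.79, outperforming current mainstream scale-data classifiers such as RF, XGBoost and GBDT. Finally, the paper visualizes the modeling process and analyzes the features that influence RA, which provides useful guidance for the clinical diagnosis of RA.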

Keywords: tabular data classification; class imbalance; cascaded under-sampling; Transformer
Received: 2023-01-31
Revised: 2023-04-24
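The abstract does not say how the cascaded under-sampling decides which majority-class samples to delete at each layer. The Python below is a minimal sketch under stated assumptions, not the authors' method: each layer fits a nearest-centroid classifier (a hypothetical stand-in for the per-layer model) and deletes the largest class's most confidently classified samples, loosely following the BalanceCascade idea of discarding majority samples that are already easy to classify. The names `cascaded_undersample`, `centroid`, `sq_dist` and the `n_layers` parameter are illustrative only.

```python
from collections import Counter
from math import ceil


def centroid(rows):
    """Feature-wise mean of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cascaded_undersample(X, y, n_layers=3):
    """Hypothetical sketch: shrink the largest class layer by layer.

    Each layer fits a nearest-centroid classifier (an assumed stand-in for
    the per-layer model) and deletes the largest class's samples with the
    biggest margin, i.e. the easiest, most redundant ones. Deletions are
    spread over the layers; the last layer removes any remaining surplus.
    """
    X, y = list(X), list(y)
    for layer in range(n_layers):
        counts = Counter(y)
        label, n_maj = counts.most_common(1)[0]
        n_min = min(counts.values())
        surplus = n_maj - n_min
        if surplus == 0:
            break  # already balanced (or only one class present)
        # A class centroid does not depend on the other classes' sizes,
        # so fitting on all remaining data is equivalent to a balanced fit.
        cents = {c: centroid([x for x, t in zip(X, y) if t == c]) for c in counts}

        def margin(x):
            # Distance to the nearest rival centroid minus distance to the
            # majority centroid: a large positive margin marks an easy sample.
            rival = min(sq_dist(x, cents[c]) for c in cents if c != label)
            return rival - sq_dist(x, cents[label])

        maj = [i for i, t in enumerate(y) if t == label]
        budget = ceil(surplus / (n_layers - layer))  # deletions for this layer
        drop = set(sorted(maj, key=lambda i: margin(X[i]), reverse=True)[:budget])
        X = [x for i, x in enumerate(X) if i not in drop]
        y = [t for i, t in enumerate(y) if i not in drop]
    return X, y
```

With eight one-feature majority samples at 0 to 7 and two minority samples at 10 and 11, three layers each delete two majority samples, and the survivors are the majority points nearest the minority class (6 and 7). Easy, redundant samples are removed first while boundary samples are kept; with two classes the last layer clears whatever surplus remains, so the output is exactly balanced.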

Application of layer by layer Transformer in class-imbalanced data
Yang Jingdong, Li Yiwei, Jiang Biao, Jiang Quan, Han Man and Song Mengge. Application of layer by layer Transformer in class-imbalanced data[J]. Application Research of Computers, 2023, 40(10): 3047-3052
Authors: Yang Jingdong, Li Yiwei, Jiang Biao, Jiang Quan, Han Man, Song Mengge
Affiliation: 1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology; 2. Guang'anmen Hospital, China Academy of Chinese Medical Sciences
Abstract: Class imbalance in clinical medical scale data tends to degrade models, and deep learning frameworks struggle to match traditional machine learning methods on scale-data tasks. To address these problems, this paper proposed a layer by layer Transformer (LLT) network model based on cascaded under-sampling. LLT used cascaded under-sampling to delete majority-class data layer by layer, balancing the classes and reducing the impact of class imbalance on the classifier. LLT also used an attention mechanism to evaluate the relevance of the input features, achieving feature selection, refining feature extraction and improving model performance. Rheumatoid arthritis (RA) data were used as test samples. Experimental results show that, without changing the sample distribution, the proposed cascaded under-sampling method increases the recognition rate of the minority classes by 6.1%, which is 1.4% and 10.4% higher than the commonly used NEARMISS and ADASYN, respectively. On the RA tabular data, LLT reaches an accuracy of 72.6% and an F1-score of 71.5%, with an AUC of 0.89 and an mAP of 0.79, outperforming current mainstream tabular data classification models such as RF, XGBoost and GBDT. This paper also visualized the model process and analyzed the features affecting RA, which offers useful guidance for the clinical diagnosis of RA.
Keywords: tabular data classification; class imbalance; cascaded under-sampling; Transformer
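The abstract states only that attention is used to assess the relevance of the input features. One hedged reading, sketched below in Python, treats every feature of a sample as a token, runs a single self-attention head over those tokens, and scores feature j by the average attention it receives. The per-feature tokenizer x[j] * w_j + b_j, the random untrained weights, and the names `feature_attention_scores`, `select_features`, `dim` and `seed` are all assumptions for illustration, not details of LLT; in a trained model the weights would be learned and the scores averaged over many samples.

```python
import math
import random


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def feature_attention_scores(x, dim=4, seed=0):
    """Score each feature of one sample with a single self-attention head.

    Assumption: feature j becomes the token x[j] * w_j + b_j, and all
    weights are random (untrained), so only the mechanics are meaningful.
    """
    rng = random.Random(seed)
    d = len(x)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    w, b = rand_matrix(d, dim), rand_matrix(d, dim)        # hypothetical tokenizer
    Wq, Wk = rand_matrix(dim, dim), rand_matrix(dim, dim)  # query/key projections
    tokens = [[x[j] * w[j][k] + b[j][k] for k in range(dim)] for j in range(d)]
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    # Row i of A: scaled dot-product attention of token i over all tokens (sums to 1).
    A = [softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in K])
         for q in Q]
    # Relevance of feature j: mean attention it receives (column mean of A).
    return [sum(A[i][j] for i in range(d)) / d for j in range(d)]


def select_features(scores, top_k):
    """Indices of the top_k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
```

Because every row of the attention matrix sums to 1, the feature scores also sum to 1, so they can be ranked directly; keeping the top-k, for example `select_features([0.1, 0.5, 0.3, 0.1], 2)` returning `[1, 2]`, is one simple way to turn attention weights into feature selection.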