
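Cancer classification method based on feature interaction and weight integration
Cite this article: Chen Hao Nan (陈昊楠), Jin Min (金敏). Cancer classification method based on feature interaction and weight integration[J]. Application Research of Computers (计算机应用研究), 2021, 38(4): 1051-1057. DOI: 10.19734/j.issn.1001-3695.2020.04.0106
Authors: Chen Hao Nan (陈昊楠), Jin Min (金敏)
Affiliation: College of Information Science and Engineering, Hunan University, Changsha 410082
Funding: Supported by the National Natural Science Foundation of China
Abstract: In cancer classification research, performing feature selection and building classification models on gene expression data that are high-dimensional, highly redundant, and unbalanced in class distribution has long been a difficulty that limits classification accuracy. To improve the accuracy of cancer classification, a cancer classification method based on feature interaction and weight integration is proposed. At the feature selection level, the method exploits the gaining interaction of multiple features on classification information to select feature combinations whose joint mutual information with the label is greater than the sum of their individual mutual information, and uses conditional mutual information to select low-redundancy features, addressing the high dimensionality and high redundancy of gene expression data. At the classification model level, a re-learning ensemble model combined with a weight integration feedback mechanism is proposed; it integrates the differing abilities of different models to fit samples of different classes and constructs class weights that do not depend on the number of samples, addressing the unbalanced class distribution of the data. The method was tested on classification of six kinds of cancer data: accuracy, sensitivity, precision, and F-measure all remained above 99.39%, and specificity was above 94.74%, indicating that the method can effectively improve the accuracy and stability of cancer classification and generalizes across different cancers.

Keywords: cancer classification; data science; feature interaction; multivariate heterogeneous model; weight integration feedback; re-learning ensemble model
Received: 2020-04-23
Revised: 2021-03-11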
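
The feature selection criterion in the abstract (joint mutual information with the label exceeding the sum of the individual mutual information, plus a conditional mutual information check for redundancy) can be illustrated on toy data. The following is a minimal sketch, not the authors' implementation: it assumes discretized features, uses plain plug-in entropy estimates, and all function names and the XOR toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (bits) of the joint distribution of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(features, y):
    """I(features; y), where `features` is a list of one or more columns."""
    return entropy(*features) + entropy(y) - entropy(*features, y)


def conditional_mutual_information(x, y, z):
    """I(x; y | z) = H(x, z) + H(y, z) - H(x, y, z) - H(z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_gain(xi, xj, y):
    """Joint MI minus the sum of individual MIs; positive means gaining interaction."""
    return (mutual_information([xi, xj], y)
            - mutual_information([xi], y)
            - mutual_information([xj], y))


# Hypothetical toy data: the label is the XOR of two binarized "genes",
# so neither gene is informative alone but the pair determines the label.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(g1, g2)]
g3 = list(g1)  # exact copy of g1, i.e. a fully redundant feature

gain = interaction_gain(g1, g2, label)
# Information g3 adds about the label once g1 is already known.
redundancy_check = conditional_mutual_information(g3, label, g1)

print(f"interaction gain of (g1, g2): {gain:.3f} bits")
print(f"I(g3; label | g1): {redundancy_check:.3f} bits")
```

In this sketch a pair with positive gain would be kept as an interacting combination, while a candidate whose conditional mutual information given the already selected features is near zero would be treated as redundant. The thresholds and search strategy used in the paper are not stated in the abstract.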

Cancer classification method based on feature interaction and weight integration
Chen Hao Nan and Jin Min. Cancer classification method based on feature interaction and weight integration[J]. Application Research of Computers, 2021, 38(4): 1051-1057. DOI: 10.19734/j.issn.1001-3695.2020.04.0106
Authors:Chen Hao Nan and Jin Min
Affiliation: College of Computer Science & Electronic Engineering, Hunan University, Changsha 410082, China
Abstract: In cancer classification, gene expression profile data are high-dimensional, highly redundant, and unbalanced in class distribution, all of which affect classification accuracy. To improve the accuracy of cancer classification, this paper proposes a cancer classification method based on feature interaction and weight integration. At the feature selection level, the method uses the gaining interaction among multiple features to select feature combinations whose joint mutual information is greater than the sum of their individual mutual information, and further uses conditional mutual information to select low-redundancy features. At the classification model level, a re-learning ensemble model combined with a weight integration feedback mechanism takes into account the different fitting abilities of multiple models for different classes of samples. The model constructs class weights that do not depend on the number of samples, addressing the problem of unbalanced class distribution. Comparative experiments on six kinds of cancer data show that accuracy, sensitivity, precision, and F-measure all remain above 99.39% and specificity is above 94.74%, indicating that the method can improve the accuracy and stability of cancer classification and generalizes across different cancers.
Keywords:cancer classification  data science  feature interaction  multivariate heterogeneous model  weighted integration feedback  re-learning ensemble model
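
For reference, the five indicators reported in the abstract have standard definitions in terms of confusion-matrix counts. The sketch below computes them for a binary case; the function name and the counts are hypothetical, and how the paper averages these metrics across classes or datasets is not stated in the abstract.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With unbalanced classes, accuracy can stay high even when the minority class is poorly recognized, which is one reason to read sensitivity and specificity separately.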
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).