Analysis of complexity indices for classification problems: Cancer gene expression data
Authors: Ana C. Lorena, Ivan G. Costa, Newton Spolaôr
Affiliation: a Centro de Matemática, Computação e Cognição, Universidade Federal do ABC, Brazil
b Centro de Informática, Universidade Federal de Pernambuco, Brazil
Abstract: Currently, cancer diagnosis at a molecular level has been made possible through the analysis of gene expression data. More specifically, machine learning (ML) techniques are commonly used to build automatic diagnosis models (classifiers) from cancer gene expression data. Such data often present characteristics that can harm the generalization ability of the resulting classifiers, among them data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract intrinsic complexity information from the data. Such measures can be used to analyze, among other things, which particular characteristics of cancer gene expression data most affect the prediction ability of support vector machine classifiers. In this context, we also show that, by applying a proper feature selection procedure to the data, one can reduce the influence of those characteristics on the error rates of the induced classifiers.
Keywords: Classification; Gene expression data; Complexity indices; Linear separability
This article is indexed in ScienceDirect and other databases.