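Research on a Classification Method for Imbalanced Microarray Data Based on Oversampling Technology and Random Forest
Cite this article: YU Hua-long, GAO Shang, ZHAO Jing, QIN Bin. Research on a classification method for imbalanced microarray data based on oversampling technology and random forest[J]. Computer Science, 2012, 39(5): 190-194
Authors: YU Hua-long, GAO Shang, ZHAO Jing, QIN Bin
Affiliations: 1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003
2. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
Funding: National Natural Science Foundation of China; Ministry of Education Doctoral Program Foundation (New Teachers project); Research start-up project for introduced talent, Jiangsu University of Science and Technology
Abstract: In recent years, diagnosing diseases, especially cancer, with DNA microarray technology has gradually become one of the research hotspots in bioinformatics. Compared with other data carriers, microarray data usually have some distinctive characteristics. Targeting one of them, the imbalanced distribution of samples across classes, this paper proposes an oversampling technique based on probability distributions. The technique creates reasonable pseudo samples for the minority class so that all classes contain a balanced number of samples; a random forest classifier is then used for classification. The effectiveness and feasibility of the method have been verified on two benchmark microarray datasets. Experimental results show that, compared with traditional methods, the method achieves better classification performance.

Keywords: Microarray data; Sample distribution imbalance; Oversampling technology; Probability distribution; Random forest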

Classification for Imbalanced Microarray Data Based on Oversampling Technology and Random Forest
YU Hua-long , GAO Shang , ZHAO Jing , QIN Bin. Classification for Imbalanced Microarray Data Based on Oversampling Technology and Random Forest[J]. Computer Science, 2012, 39(5): 190-194
Authors: YU Hua-long, GAO Shang, ZHAO Jing, QIN Bin
Affiliations: 1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China; 2. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Abstract: In recent years, applying DNA microarray technology to the diagnosis of disease, especially cancer, has become one of the hot topics in bioinformatics. In contrast with many other data carriers, microarray data generally have some unique characteristics. A novel oversampling technique based on probability distribution was proposed to solve the problem caused by one of these characteristics, the imbalanced distribution of samples in microarray data. With this technique, reasonable pseudo samples are created for the minority class to guarantee the balance between the two classes. A random forest is then used to classify the samples belonging to the different classes. The method's effectiveness and feasibility were verified on two benchmark microarray datasets. Experimental results show that the proposed method obtains better classification performance than some traditional approaches.
Keywords: Microarray data; Sample distribution imbalance; Oversampling technology; Probability distribution; Random forest
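The abstract does not say which probability distribution the oversampling technique fits. As a minimal sketch, assuming each gene's expression in the minority class follows an independent normal distribution, the idea of drawing pseudo samples until the classes are balanced can be written with only the Python standard library. The function names `gaussian_oversample` and `balance` and the Gaussian assumption are hypothetical illustrations, not the authors' implementation.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def gaussian_oversample(minority: Sequence[Sequence[float]],
                        n_new: int,
                        seed: int = 0) -> List[List[float]]:
    """Draw n_new pseudo samples for the minority class.

    Assumption (not from the paper): every gene is modelled as an independent
    normal distribution fitted to the minority-class samples.
    """
    if len(minority) < 2:
        raise ValueError("at least two minority samples are needed to estimate a spread")
    rng = random.Random(seed)
    n_genes = len(minority[0])
    # Collect each gene's values across all minority samples.
    columns = [[row[j] for row in minority] for j in range(n_genes)]
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample standard deviation
    # Each pseudo sample is drawn gene by gene from N(mean_j, std_j^2).
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_new)]


def balance(X: Sequence[Sequence[float]], y: Sequence[int],
            minority_label: int, seed: int = 0) -> Tuple[List, List]:
    """Append pseudo samples until the minority class is as large as the largest class."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    largest = max(list(y).count(c) for c in set(y))
    n_new = largest - len(minority)
    pseudo = gaussian_oversample(minority, n_new, seed) if n_new > 0 else []
    return list(X) + pseudo, list(y) + [minority_label] * len(pseudo)


if __name__ == "__main__":
    # Toy data: 3 majority samples (label 0) and 2 minority samples (label 1), 2 genes each.
    X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 1.0], [3.1, 0.9]]
    y = [0, 0, 0, 1, 1]
    X_bal, y_bal = balance(X, y, minority_label=1)
    print(y_bal.count(0), y_bal.count(1))  # 3 3
```

The balanced data would then be passed to a random forest classifier, for example scikit-learn's `RandomForestClassifier` through its `fit(X, y)` method. Modelling genes independently ignores correlations between them, which keeps the sketch short but is a simplification. Pseudo samples should be generated from the training data only, so that test samples do not leak into the synthetic ones.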
This article is indexed in CNKI, Wanfang Data, and other databases.