
Influence of missing-value treatment in cDNA microarray data on disease classification based on gene expression profiles
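Citation: Wang Dong, Guo Zheng, Li Xia, Lv Yingli, Zhu Jing, Wang Chenguang. Influence of missing values replacement on disease classification analysis based on gene expression profiles[J]. High Technology Letters (高技术通讯), 2006, 16(5): 501-505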
Authors: Wang Dong  Guo Zheng  Li Xia  Lv Yingli  Zhu Jing  Wang Chenguang
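Affiliations: 1. Department of Bioinformatics, Harbin Medical University, Harbin 150086
2. Department of Bioinformatics, Harbin Medical University, Harbin 150086; School of Life Science and Technology, Tongji University, Shanghai 200092; Harbin Gene-Tech Biochip Development Inc., Ltd., Harbin 150090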
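Abstract: Four cDNA microarray data sets were selected. For genes with missing detections, missing values were imputed either by replacing them with zeros or by the K-nearest-neighbor (KNN) method, and the effect of each treatment on the classification performance of three classifiers (support vector machine, K-nearest-neighbor classifier and decision tree) was analyzed. The results show that, for cDNA gene expression profile data, imputing missing values for genes whose missing-detection rate is no higher than 5% is a good strategy: it retains more genes for subsequent functional analysis while still maintaining high disease classification performance.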
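The imputation strategy summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the expression matrix is genes × samples with None marking a missing spot; the KNN step follows the common KNNimpute idea of averaging the values of the k most similar genes in the missing column, here with an unweighted mean and root-mean-square distance over co-observed samples. The abstract does not state the distance measure or the K that was used, so the helper names filter_genes, zero_fill and knn_impute and the default k=10 are hypothetical.

```python
import math


def missing_rate(row):
    """Fraction of a gene's measurements that are missing (None)."""
    return sum(v is None for v in row) / len(row)


def filter_genes(matrix, max_rate=0.05):
    """Keep only genes whose missing rate is at most max_rate (5% by default)."""
    return [row for row in matrix if missing_rate(row) <= max_rate]


def zero_fill(matrix):
    """Zero replacement: in log-ratio cDNA data, 0 means equal signal in both channels."""
    return [[0.0 if v is None else v for v in row] for row in matrix]


def _distance(a, b):
    """RMS difference over samples observed in both genes (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix, k=10):
    """Fill each missing entry with the unweighted mean of that sample's values
    in the k most similar genes that were observed in that sample."""
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes that have a value in column j.
            candidates = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if math.isfinite(d):
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            # Assumed fallback: if no usable neighbour exists, use 0.0.
            result[i][j] = (sum(v for _, v in nearest) / len(nearest)
                            if nearest else 0.0)
    return result


if __name__ == "__main__":
    # Toy data: 4 genes x 4 samples; None marks a missing detection.
    data = [
        [1.0, 2.0, None, 4.0],
        [1.1, 2.1, 3.1, 4.1],
        [0.9, 1.9, 2.9, 3.9],
        [5.0, None, None, 1.0],
    ]
    # With only 4 samples a looser threshold is needed to keep any gene
    # that has a missing value; real data would use the 5% default.
    kept = filter_genes(data, max_rate=0.25)
    print(len(kept))          # 3 (the gene with 50% missing is dropped)
    print(zero_fill(kept)[0])
    print(knn_impute(kept, k=2)[0])
```

Filtering before imputing mirrors the strategy in the abstract: genes with too many missing detections are discarded, and only the remaining, mostly complete genes are imputed.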

Keywords: gene expression profile  missing value  classification
Received: 2005-06-06
Revised: 2005-06-06

Influence of missing values replacement on disease classification analysis based on gene expression profiles
Wang Dong,Guo Zheng,Li Xia,Lv Yingli,Zhu Jing,Wang Chenguang. Influence of missing values replacement on disease classification analysis based on gene expression profiles[J]. High Technology Letters, 2006, 16(5): 501-505
Authors:Wang Dong  Guo Zheng  Li Xia  Lv Yingli  Zhu Jing  Wang Chenguang
Affiliation: Department of Bioinformatics, Harbin Medical University, Harbin 150086; School of Biological Science and Technology, Tongji University, Shanghai 200092; Harbin Gene-Tech Biochip Development Inc., Ltd., Harbin 150090
Abstract: In this article, two missing-value treatments (replacement with zeros and KNN estimation) were combined with three classifiers, support vector machine (SVM), K-nearest neighbor (KNN) and decision tree (DT), and their effect on classification was evaluated on four data sets. The results showed that when imputation was restricted to genes whose missing-value rate was no higher than 5%, enough genes remained for classification and high classification accuracy could still be achieved.
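To compare missing-value treatments by their effect on classification, each treated matrix can be scored with the same classifier. The sketch below covers only the K-nearest-neighbor classifier (SVM and decision trees would normally come from a library such as scikit-learn). Leave-one-out cross-validation, Euclidean distance, k=3 and the name loocv_knn_accuracy are assumptions for illustration; the abstract does not say how accuracy was estimated.

```python
import math
from collections import Counter


def loocv_knn_accuracy(matrix, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier (hypothetical helper).

    matrix: genes x samples, already free of missing values
    labels: one class label per sample (i.e. per column)
    """
    # Classification is over samples, so transpose to samples x genes.
    samples = [list(col) for col in zip(*matrix)]
    correct = 0
    for i, x in enumerate(samples):
        # Euclidean distance from the held-out sample to every other sample.
        neighbours = sorted(
            (math.dist(x, y), labels[j])
            for j, y in enumerate(samples)
            if j != i
        )
        # Majority vote among the k nearest training samples.
        votes = Counter(label for _, label in neighbours[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(samples)


if __name__ == "__main__":
    # Toy data: 2 genes x 6 samples forming two well-separated classes.
    expr = [
        [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
        [0.0, 0.2, 0.1, 5.0, 5.2, 5.1],
    ]
    classes = ["A", "A", "A", "B", "B", "B"]
    print(loocv_knn_accuracy(expr, classes, k=3))
```

Calling the function on the matrices produced by two different imputation methods, with the same labels and k, gives directly comparable accuracies.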
Keywords:gene expression profile   missing value   classification
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
