
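Research on a Tumor Classification and Detection Algorithm Based on Principal Component Analysis
Cite this article:WANG Shu-lin, WANG Ji, CHEN Huo-wang, ZHANG Bo-yun. Research on a Tumor Classification and Detection Algorithm Based on Principal Component Analysis[J]. Computer Engineering & Science, 2007, 29(9): 84-90.
Authors:WANG Shu-lin  WANG Ji  CHEN Huo-wang  ZHANG Bo-yun
Affiliations:1. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073; School of Computer and Communications, Hunan University, Changsha, Hunan 410082
2. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073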
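Abstract:Tumor diagnosis based on gene expression profiles is expected to become a fast and effective diagnostic method in clinical medicine. However, gene expression data are very high-dimensional, contain very few samples, and are noisy, which makes extracting tumor-related informative genes a challenging task. Therefore, after analyzing the methods currently used for tumor classification and detection, this paper proposes a hybrid feature extraction method that combines gene feature scoring with principal component analysis. Experiments show that this method extracts classification features effectively and greatly reduces the dimensionality of gene expression data while maintaining high tumor recognition accuracy, which substantially improves classifier performance. Two tumor-related gene expression datasets were used to validate the hybrid feature extraction method. Classification experiments with support vector machines show that the proposed hybrid method not only achieves high cross-validation accuracy but also allows the classification results to be visualized. For the colon cancer tissue sample set, the cross-validation accuracy reaches 95.16%; for the acute leukemia tissue sample set, it reaches 100%.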

Keywords:support vector machine  gene expression profile  tumor classification  principal component analysis
Article ID:1007-130X(2007)09-0084-07
Revision date:2006-04-18

Research of a Tumor Diagnosis Algorithm Based on Principal Component Analysis
WANG Shu-lin,WANG Ji,CHEN Huo-wang,ZHANG Bo-yun.Research of a Tumor Diagnosis Algorithm Based on Principal Component Analysis[J].Computer Engineering & Science,2007,29(9):84-90.
Authors:WANG Shu-lin  WANG Ji  CHEN Huo-wang  ZHANG Bo-yun
Affiliation:1. School of Computer Science, National University of Defense Technology, Changsha 410073; 2. School of Computer and Communications, Hunan University, Changsha 410082, China
Abstract:Tumor diagnosis based on gene expression profiles is expected to become a fast and effective clinical method in the near future. Although DNA microarray experiments provide a huge amount of gene expression data, only a few genes are related to tumors. Moreover, extracting tumor-related genes from gene expression profiles is difficult because the data are high-dimensional, the sample sets are small, and the profiles contain considerable noise and redundancy. In this paper we propose a novel feature extraction approach that projects high-dimensional data onto a lower-dimensional feature space, which improves SVM-based classification performance on gene expression data. To validate the proposed approach, we examined two gene expression datasets (a colon dataset and a leukemia dataset) using SVM classifiers with different parameters. Experimental results show that SVM achieves superior performance when classifying gene expression data using the principal components extracted from the top-ranked genes selected by the gene ranking method. SVM classifiers achieved a cross-validation accuracy of 95.16% on the colon dataset and 100% on the leukemia dataset. Another advantage of the proposed method is that the sample classification results can be visualized as 2D or 3D scatter plots.
Keywords:SVM  gene expression profile  tumor classification  principal component analysis
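
The feature extraction step described in the abstract (rank genes, keep the top-ranked ones, then project them onto a few principal components) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the gene scoring function, so a signal-to-noise score is assumed here; the data are synthetic; the function names `signal_to_noise`, `select_top_genes`, and `pca_project` are hypothetical; and the SVM classification step is left out.

```python
import numpy as np

def signal_to_noise(X, y):
    """Score each gene: |difference of class means| / (sum of class std devs).
    Assumed scoring rule; the abstract does not specify which one is used."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)

def select_top_genes(X, y, k):
    """Return the column indices of the k highest-scoring genes."""
    return np.argsort(signal_to_noise(X, y))[::-1][:k]

def pca_project(X, n_components):
    """Center the data and project it onto its leading principal components.
    Returns the projected samples and the variance along every component."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, (s ** 2) / (len(X) - 1)

# Synthetic stand-in for a microarray: 40 samples x 500 genes, two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 500))
X[y == 1, :10] += 2.0  # hypothetical: only the first 10 genes differ by class

top = select_top_genes(X, y, k=50)                      # 500 genes -> 50 genes
Z, variances = pca_project(X[:, top], n_components=3)   # 50 genes -> 3 components
print(Z.shape)
```

The three columns of `Z` are the kind of low-dimensional features that could be passed to a classifier or drawn as a 2D or 3D scatter plot. When estimating accuracy by cross-validation, the gene scoring and the projection should be fitted on the training folds only; scoring genes on all samples first, as this short sketch does, lets label information leak into the held-out samples.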
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.