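Research on a Heuristic Breadth-First Search Algorithm for Tumor Informative Genes
Cite this article: WANG Shu-Lin, WANG Ji, CHEN Huo-Wang, LI Shu-Tao, ZHANG Bo-Yun. Research on a Heuristic Breadth-First Search Algorithm for Tumor Informative Genes [J]. Chinese Journal of Computers (计算机学报), 2008, 31(4): 636-649.
Authors: WANG Shu-Lin  WANG Ji  CHEN Huo-Wang  LI Shu-Tao  ZHANG Bo-Yun
Affiliations: 1. School of Computer Science, National University of Defense Technology, Changsha 410073; School of Computer and Communication, Hunan University, Changsha 410082
2. School of Computer Science, National University of Defense Technology, Changsha 410073
3. College of Electrical and Information Engineering, Hunan University, Changsha 410082
Funding: Supported by the Hunan Provincial Natural Science Foundation for Distinguished Young Scholars (06JJ1010)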
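Abstract: Tumor detection based on gene expression profiles is expected to become a fast and effective molecular diagnostic method in clinical medicine. However, gene expression profile data have very high dimensionality, very small sample sizes, and heavy noise, which makes selecting tumor informative genes a challenging task. Based on the characteristics of tumor gene expression profile sample sets, this paper proposes a heuristic breadth-first search algorithm for finding informative genes that uses support vector machine classification performance as its evaluation criterion. Its advantage is that it can simultaneously find multiple informative gene subsets that contain as few genes as possible while classifying as accurately as possible. Three tumor sample sets were used in the experiments to verify the feasibility and effectiveness of the new algorithm. For the acute leukemia sample set, the hard-to-classify colon cancer sample set, and the multi-subtype small round blue cell tumor sample set, only 2, 4, and 4 informative genes, respectively, were needed to reach 100% recognition accuracy under 4-fold cross-validation. Compared with other leading tumor classification methods, the experimental results show a clear advantage in both the number of informative genes and their classification performance. To avoid the influence of different partitions of the sample set on classification performance, a full-fold cross-validation evaluation method is proposed that reflects the classification performance of an informative gene subset more objectively.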

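Keywords: gene expression profiles  tumor classification  informative gene selection  support vector machines  full-fold cross-validation method
Revised: December 16, 2006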

Heuristic Breadth-First Search Algorithm for Informative Gene Selection Based on Gene Expression Profiles
WANG Shu-Lin,WANG Ji,CHEN Huo-Wang,LI Shu-Tao,ZHANG Bo-Yun.Heuristic Breadth-First Search Algorithm for Informative Gene Selection Based on Gene Expression Profiles[J].Chinese Journal of Computers,2008,31(4):636-649.
Authors:WANG Shu-Lin  WANG Ji  CHEN Huo-Wang  LI Shu-Tao  ZHANG Bo-Yun
Abstract: Tumor diagnosis based on gene expression profiles is expected to develop into a fast and effective clinical method in the near future. Although DNA microarray experiments provide a huge amount of gene expression data, only a few genes in the profiles are related to tumors. Moreover, selecting tumor-related informative genes is difficult because gene expression profiles are high-dimensional, come from small sample sets, and contain considerable noise. Given these characteristics, a novel heuristic breadth-first search algorithm based on support vector machines is proposed. Although time-consuming, it can simultaneously find many informative gene subsets in which the number of genes is close to minimal and the classification performance is close to the highest attainable. Three tumor sample sets are examined with the new approach, and the experiments show that it is feasible and effective for tumor classification. A 4-fold cross-validation accuracy of 100% is achieved with only two, four, and four genes for the leukemia, colon tumor, and SRBCT (small round blue cell tumor) datasets, respectively, which is superior to the results of other tumor classification methods. To avoid the effect of different partitions of the sample set, a full-fold cross-validation method is proposed that evaluates the classification performance of an informative gene subset more objectively.
Keywords:gene expression profiles  tumor classification  informative gene selection  support vector machines  full-fold cross-validated method
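
The abstract describes a level-by-level search over gene subsets scored by cross-validated classification accuracy, but it does not give the pruning rule, the search width, or the SVM settings. The following is only a minimal sketch of that general idea under stated assumptions: the function names, the `width` and `max_size` parameters, and the toy data are all hypothetical, and a nearest-centroid classifier stands in for the support vector machine so the example needs nothing beyond the Python standard library. It is not the authors' implementation.

```python
def k_fold_indices(n, k):
    """Assign sample indices 0..n-1 to k folds by stride (an assumed split)."""
    return [list(range(i, n, k)) for i in range(k)]


def centroid_predict(train_X, train_y, x):
    """Nearest-class-centroid prediction; a stand-in for the paper's SVM."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((acc[j] / counts[label] - x[j]) ** 2 for j in range(len(x)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, genes, k=4):
    """k-fold cross-validation accuracy using only the selected gene columns."""
    Xs = [[row[g] for g in genes] for row in X]
    correct = 0
    for fold in k_fold_indices(len(Xs), k):
        held_out = set(fold)
        train = [i for i in range(len(Xs)) if i not in held_out]
        for i in fold:
            pred = centroid_predict([Xs[t] for t in train],
                                    [y[t] for t in train], Xs[i])
            correct += int(pred == y[i])
    return correct / len(Xs)


def heuristic_bfs(X, y, max_size=3, width=5, target=1.0, k=4):
    """Grow gene subsets one gene per level, keeping the `width` best per level.

    Returns every subset at the first level where some subset reaches
    `target` accuracy, so the smallest qualifying subsets are found together.
    """
    n_genes = len(X[0])
    frontier = [()]
    ranked = []
    for _ in range(max_size):
        scored = {}
        for subset in frontier:
            for g in range(n_genes):
                if g in subset:
                    continue
                cand = tuple(sorted(subset + (g,)))
                if cand not in scored:
                    scored[cand] = cv_accuracy(X, y, cand, k)
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        winners = [(s, a) for s, a in ranked if a >= target]
        if winners:
            return winners
        frontier = [s for s, _ in ranked[:width]]
    return ranked[:width]


# Toy data (hypothetical): 8 samples, 3 "genes"; only gene 1 separates classes.
X = [
    [0.5, 1.0, 0.3], [0.1, 1.2, 0.9], [0.9, 0.9, 0.4], [0.4, 1.1, 0.8],
    [0.5, 5.0, 0.3], [0.1, 5.2, 0.9], [0.9, 4.9, 0.4], [0.4, 5.1, 0.8],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
result = heuristic_bfs(X, y, max_size=2, width=2)
print(result)
```

Stopping at the first level that reaches the target accuracy is one simple way to favor small subsets while still returning several of them at once. A real experiment would replace `centroid_predict` with an SVM and would need a pruning heuristic suited to thousands of genes.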
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.