
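K-split Lasso: An Effective Method for Selecting Tumor Feature Genes
Cite this article: ZHANG Jing, HU Xuegang, ZHANG Yuhong, SHI Wanfeng. K-split Lasso: an effective method for selecting tumor feature genes [J]. 计算机科学与探索, 2012(12): 1136-1143.
Authors: ZHANG Jing, HU Xuegang, ZHANG Yuhong, SHI Wanfeng
Affiliation: School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Funding: National Natural Science Foundation of China, No. 60975034; Natural Science Foundation of Anhui Province, No. 1208085QF122; Fundamental Research Funds for the Central Universities, Nos. 2011HGBZ1329, 2011HGQC1013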
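Abstract: With the advent of DNA microarray technology, a large number of gene expression profile datasets for different tumors have been published online, making tumor feature gene selection and subtype classification an active research topic in bioinformatics. Building on the Lasso (least absolute shrinkage and selection operator) method, this paper proposes the K-split Lasso feature selection method. Its basic idea is to divide the dataset evenly into K parts, apply Lasso feature selection to each part separately, then merge the feature subsets selected from each part and run feature selection again to obtain the final feature genes. Experiments use a support vector machine as the classifier. The results show that K-split Lasso reduces redundant features, improves classification accuracy, and is stable. Because each computation works on fewer dimensions, K-split Lasso addresses the problem of excessive computational cost and, to some extent, the overfitting problem. K-split Lasso is therefore an effective method for selecting tumor feature genes.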

Keywords: tumor gene expression profiles; Lasso; feature selection; support vector machine

K-split Lasso: An Effective Feature Selection Method for Tumor Gene Expression Data
ZHANG Jing, HU Xuegang, ZHANG Yuhong, SHI Wanfeng. K-split Lasso: An Effective Feature Selection Method for Tumor Gene Expression Data[J]. Journal of Frontiers of Computer Science and Technology, 2012(12): 1136-1143.
Authors:ZHANG Jing  HU Xuegang  ZHANG Yuhong  SHI Wanfeng
Affiliation:School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Abstract: With the advent of DNA microarray technology, a large number of open-access tumor gene expression datasets have become available online. Informative gene selection and tumor subtype classification have consequently become a major research topic in bioinformatics. This paper proposes the K-split Lasso (least absolute shrinkage and selection operator) method for gene selection. The main idea is to divide the feature set into K parts, select genes from each part using Lasso, then merge the selected genes into one feature subset and apply feature selection again to obtain the final informative genes. With a support vector machine as the classifier, the experimental results indicate that K-split Lasso reduces redundant features, improves classification accuracy, and has good stability. Because each Lasso run works on fewer dimensions, K-split Lasso also reduces the computational cost and, to some extent, mitigates overfitting. K-split Lasso is therefore an effective method for selecting informative tumor genes.
Keywords:tumor gene expression profiles  Lasso  feature selection  support vector machine
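
The two-stage procedure described in the abstract (split, select per part, merge, select again) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the coordinate-descent Lasso solver, the strided split of feature indices, the toy regression data, and every name (`k_split_lasso`, `lasso_select`, `alpha`, `make_toy_data`) are assumptions introduced here. The paper's experiments use tumor expression data with a support vector machine classifier, which this sketch omits.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes roughly centred data; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual, with w[j]'s own
            # contribution added back in.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
    return w


def lasso_select(X, y, feature_ids, alpha):
    """Return the members of feature_ids whose Lasso coefficient is non-zero."""
    if not feature_ids:
        return []
    sub_X = [[row[j] for j in feature_ids] for row in X]
    w = lasso_coordinate_descent(sub_X, y, alpha)
    return [j for j, wj in zip(feature_ids, w) if wj != 0.0]


def k_split_lasso(X, y, k, alpha):
    """Two-stage selection: Lasso on each of k feature parts, then on the merged survivors."""
    p = len(X[0])
    # Stage 1: split feature indices into k near-equal parts (strided split,
    # an assumption of this sketch) and select within each part.
    parts = [list(range(p))[i::k] for i in range(k)]
    merged = []
    for part in parts:
        merged.extend(lasso_select(X, y, part, alpha))
    # Stage 2: rerun Lasso on the merged candidates only.
    return sorted(lasso_select(X, y, sorted(merged), alpha))


def make_toy_data(n=200, p=12, seed=0):
    """Synthetic data (hypothetical): only features 0 and 5 influence y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[5] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


X, y = make_toy_data()
selected = k_split_lasso(X, y, k=3, alpha=0.5)
print(selected)
```

Each stage-1 Lasso run sees only about p/K features, which is where the reduced per-run cost mentioned in the abstract would come from. Stage 2 then removes features that looked useful within one part but are redundant once the candidates from all parts are considered together.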
This article is indexed in CNKI, Wanfang Data, and other databases.