
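Application of the K-means Clustering Algorithm to Tumor Gene Variant Identification
Cite this article: 叶骁 (Ye Xiao). K-means聚类算法在肿瘤基因变异识别中的应用[J]. 计算机应用与软件 (Computer Applications and Software), 2019, 36(3): 287-290, 333.
Author: 叶骁 (Ye Xiao)
Affiliation: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433
Abstract: The rapid growth of next-generation sequencing (NGS) data has accelerated the exploration of genes, while also posing greater challenges for sequencing data analysis. Identifying cancer-cell-specific variants is an important foundational task in sequencing data analysis. Most current variant identification tools use Bayesian models, and their specificity, sensitivity, and speed fall far short of what is needed. K-means is a simple and efficient unsupervised clustering algorithm. Building on it, the method maps site information to multidimensional features and then clusters the sites into two classes. The algorithm markedly improves accuracy and recall, and experimental results verify its effectiveness.

Keywords: K-means  Variant identification  Next-generation sequencing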

USING K-MEANS CLUSTERING ALGORITHM FOR CANCER GENE VARIANT DETECTING
Ye Xiao.USING K-MEANS CLUSTERING ALGORITHM FOR CANCER GENE VARIANT DETECTING[J].Computer Applications and Software,2019,36(3):287-290,333.
Authors:Ye Xiao
Affiliation:(Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China)
Abstract:The rapid growth of next-generation sequencing (NGS) data has accelerated the exploration of genes and has also made sequencing data analysis more challenging. Identifying cancer-specific mutations is a fundamental task in sequencing data analysis. Most current mutation identification tools use Bayesian models, but their specificity, sensitivity, and speed fall far short of what is needed. K-means is a concise and efficient unsupervised clustering algorithm. The proposed algorithm maps site information to multidimensional features and then clusters the sites into two classes. It clearly improves accuracy and recall, and experimental results verify its effectiveness.
Keywords:K-means  Variant calling  Next-generation sequencing
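
The abstract describes the approach only in outline: each site is mapped to a feature vector, and the sites are then clustered with K-means into two classes. A minimal Python sketch of that idea follows. It is not the author's implementation. The two features (variant allele fraction in the tumor sample and in the matched normal sample), the toy values, the function name kmeans_two_clusters, and the rule for choosing which cluster holds the candidate variants are all assumptions made for illustration; the abstract does not say which features the paper uses.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans_two_clusters(
    points: List[Point], max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd-style K-means with k = 2 (hypothetical helper)."""
    rng = random.Random(seed)
    # Start from two distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, 2)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each site joins its nearest centroid.
        new_labels = [
            min((0, 1), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


# Toy data (made up): one row per site, (tumor VAF, normal VAF).
sites = [
    (0.42, 0.01), (0.38, 0.00), (0.45, 0.02), (0.35, 0.01),
    (0.02, 0.01), (0.03, 0.02), (0.01, 0.00), (0.04, 0.03),
]

labels, centroids = kmeans_two_clusters(sites)

# Assumption: the cluster with the higher mean tumor VAF holds the
# candidate tumor-specific variants.
variant_cluster = max((0, 1), key=lambda c: centroids[c][0])
candidate_sites = [i for i, label in enumerate(labels) if label == variant_cluster]
print(candidate_sites)
```

K-means itself only separates the sites into two groups. Deciding which group is the "variant" class needs an extra rule, such as the centroid comparison assumed here, and real features would normally be scaled to comparable ranges before clustering.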
This article is indexed in databases including 维普 (VIP) and 万方数据 (Wanfang Data).