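A New Algorithm for Identifying the Location of CpG Islands
Cite this article: Liu Wei, Chen Ling. A New Algorithm for Identifying the Location of CpG Islands [J]. 小型微型计算机系统 (Mini-micro Systems), 2012, 33(7): 1557-1563.
Authors: Liu Wei, Chen Ling
Affiliations: 1. Department of Computer Science, College of Information Engineering, Yangzhou University, Yangzhou 225127
2. Department of Computer Science, College of Information Engineering, Yangzhou University, Yangzhou 225127; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093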
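Abstract: With genome sequencing completed for most organisms, gene identification has become particularly important. CpG islands have important biological significance in the genome, so identifying them helps identify genes. Most existing models for locating CpG islands suffer from shortcomings such as label bias and the need for independence assumptions. To address this, a new method for locating CpG islands based on the conditional random fields (CRFs) model is proposed. The method recasts the CpG island location problem as a sequence labeling problem and, based on the properties of CpG island locations, designs corresponding algorithms for model construction, training, and decoding. With the proposed algorithm, the most probable label sequence for an input sequence can be determined, thereby identifying the locations of CpG islands. Tests on data from a standard database show that the algorithm is feasible and efficient, and that it achieves higher accuracy than the HMM method.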

Keywords: conditional random fields model; CpG islands; sequence labeling

Novel Method for CpG Islands Location Identification
LIU Wei, CHEN Ling. Novel Method for CpG Islands Location Identification [J]. Mini-micro Systems, 2012, 33(7): 1557-1563.
Authors:LIU Wei  CHEN Ling
Affiliation: 1. Department of Computer Science, Yangzhou University, Yangzhou 225127, China; 2. National Key Lab of Novel Software Technology, Nanjing University, Nanjing 210093, China
Abstract: Now that the genomes of most organisms have been sequenced, gene prediction has become one of the most important tasks. CpG islands have important biological significance in genomes, so identifying their locations is helpful for gene prediction. Existing models have shortcomings: generative models require strong independence assumptions, while the maximum entropy Markov model and other non-generative models exhibit the label-bias problem. To overcome these, we present a novel method for CpG island location identification based on the conditional random fields (CRFs) model. The method transforms the problem of CpG island location identification into a sequential data labeling problem. Based on the properties of CpG island locations, we design corresponding methods for model construction, training, and decoding. We also design the corresponding feature functions and obtain their weights, which define the joint distribution over the label sequence given the observation sequence, through a learning procedure on training data. Using the resulting distribution model, we determine the label sequence with maximum probability and thereby identify the locations of CpG islands. We test our algorithm on data sets from a standard database. The experimental results show that our algorithm is practicable and efficient, and that it achieves higher accuracy than the HMM method.
Keywords:conditional random fields model  CpG islands  sequential data labeling
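The decoding step described in the abstract (finding the label sequence with maximum probability under a linear-chain model) can be illustrated with a small Viterbi sketch. This is not the authors' implementation: the two labels (`N` for background, `I` for island), the feature functions, and every weight below are hypothetical and hand-set for illustration, whereas the paper learns its weights from training data. The sketch also assumes the input contains only uppercase A, C, G, and T.

```python
from typing import Dict, List

LABELS = ("N", "I")  # N = background, I = inside a CpG island (assumed label set)

# Hypothetical, hand-set weights; a real CRF would learn these from data.
TRANSITION: Dict[tuple, float] = {
    ("N", "N"): 1.0, ("N", "I"): -1.0,
    ("I", "N"): -1.0, ("I", "I"): 1.0,
}
BASE_WEIGHT: Dict[tuple, float] = {
    ("I", "C"): 0.5, ("I", "G"): 0.5, ("I", "A"): -0.5, ("I", "T"): -0.5,
    ("N", "C"): -0.5, ("N", "G"): -0.5, ("N", "A"): 0.5, ("N", "T"): 0.5,
}
# Fires when the position belongs to a "CG" dinucleotide. A feature like this
# looks at neighbouring observations, which a CRF permits.
CPG_WEIGHT: Dict[str, float] = {"I": 1.5, "N": -1.5}


def node_score(seq: str, i: int, label: str) -> float:
    """Sum of weighted state features for putting `label` at position i."""
    score = BASE_WEIGHT[(label, seq[i])]
    in_cpg = seq[i:i + 2] == "CG" or (i > 0 and seq[i - 1:i + 1] == "CG")
    if in_cpg:
        score += CPG_WEIGHT[label]
    return score


def viterbi(seq: str) -> str:
    """Return the highest-scoring label string for `seq`."""
    if not seq:
        return ""
    # best[i][label] = best score of any labeling of seq[:i+1] ending in label
    best: List[Dict[str, float]] = [{lab: node_score(seq, 0, lab) for lab in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(seq)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, cur)])
            scores[cur] = best[-1][prev] + TRANSITION[(prev, cur)] + node_score(seq, i, cur)
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


print(viterbi("ATATATCGCGCGCGATATAT"))
```

Because the normalizing constant of a CRF does not depend on the labels, maximizing the unnormalized score as above is equivalent to maximizing the conditional probability of the label sequence.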
This article is indexed in CNKI, Wanfang Data, and other databases.