Protein-coding region discovery in organisms underrepresented in databases.
Authors: Y Quentin, C Voiblet, F Martin, G Fichant
Affiliation: LCB-IBSM CNRS, Marseille, France.
Abstract: The prediction of coding sequences has received a lot of attention during the last decade. We can distinguish two kinds of methods, those that rely on training with sets of example and counter-example sequences, and those that exploit the intrinsic properties of the DNA sequences to be analyzed. The former are generally more powerful but their domains of application are limited by the availability of a training set. The latter avoid this drawback but can only be applied to sequences that are long enough to allow computation of the statistics. Here, we present a method that fills the gap between the two approaches. A learning step is applied using a set of sequences that are assumed to contain coding and non-coding regions, but with the boundaries of these regions unknown. A test step then uses the discriminant function obtained during the learning to predict coding regions in sequences from the same organism. The learning relies upon a correspondence analysis and prediction is presented on a graphical display. The method has been evaluated on a sample of yeast sequences, and the analysis of a set of expressed sequence tags from the Eucalyptus globulus-Pisolithus tinctorius ectomycorrhiza illustrates the relevance of the approach in its biological context.
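Illustrative sketch (not from the paper): the abstract does not say how the correspondence analysis is set up, so the following is only a generic example of the technique, assuming that each row of the input table holds the codon counts of one sequence read in one frame. The names codon_counts and first_ca_axis are hypothetical, and the first axis is found by plain power iteration rather than by whatever numerical method the authors used.

```python
import math
from itertools import product

# The 64 codons in a fixed order (assumed column layout of the count table).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_counts(seq, frame=0):
    """Count codons of seq read in frame 0, 1 or 2; ambiguous triplets are skipped."""
    counts = [0] * len(CODONS)
    for i in range(frame, len(seq) - 2, 3):
        j = _INDEX.get(seq[i:i + 3].upper())
        if j is not None:
            counts[j] += 1
    return counts


def first_ca_axis(table, iterations=200):
    """Return (row_scores, inertia) on the first correspondence-analysis axis.

    table is a list of rows of non-negative counts; every row must have a
    non-zero total. Columns that are empty in all rows are dropped.
    """
    cols = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    total = float(sum(sum(row) for row in table))
    r = [sum(row) / total for row in table]                    # row masses
    c = [sum(row[j] for row in table) / total for j in cols]   # column masses

    # Standardized residuals: (observed - expected) / sqrt(expected).
    S = [[(row[j] / total - r[i] * c[k]) / math.sqrt(r[i] * c[k])
          for k, j in enumerate(cols)]
         for i, row in enumerate(table)]

    # Power iteration on S^T S gives the leading right singular vector of S.
    v = [math.sin(k + 1.0) for k in range(len(cols))]  # deterministic start
    for _ in range(iterations):
        u = [sum(s * x for s, x in zip(row, v)) for row in S]
        w = [sum(S[i][k] * u[i] for i in range(len(S))) for k in range(len(cols))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    u = [sum(s * x for s, x in zip(row, v)) for row in S]
    scores = [u[i] / math.sqrt(r[i]) for i in range(len(S))]  # row principal coordinates
    inertia = sum(x * x for x in u)                            # squared singular value
    return scores, inertia
```

In such a sketch, rows with similar codon profiles receive scores of the same sign on the first axis, and one could imagine thresholding those scores as a crude discriminant. Whether the published method works this way is an assumption, not something the abstract states.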
This article is indexed in ScienceDirect and other databases.