A statistical approach to high-throughput screening of predicted orthologs
Authors: Jeong Eun Min, Matthew D. Whiteside, Fiona S.L. Brinkman, Brad McNeney, Jinko Graham
Affiliation:
  • a Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
  • b Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
Abstract:
    Orthologs are genes in different species that have diverged from a common ancestral gene after speciation. In contrast, paralogs are genes that have diverged after a gene duplication event. For many comparative analyses, it is of interest to identify orthologs with similar functions. Such orthologs tend to support species divergence (ssd-orthologs) in the sense that they have diverged only due to speciation, to the same relative degree as their species. However, due to incomplete sequencing or gene loss in a species, predicted orthologs can sometimes be paralogs or other non-ssd-orthologs. To increase the specificity of ssd-ortholog prediction, Fulton et al. [Fulton, D., Li, Y., Laird, M., Horsman, B., Roche, F., Brinkman, F., 2006. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics 7 (1), 270] developed Ortholuge, a bioinformatics tool that identifies predicted orthologs with atypical genetic divergence. However, when the initial list of putative orthologs contains a non-negligible number of non-ssd-orthologs, the cut-off values that Ortholuge generates for orthology classification are difficult to interpret and can be too high, leading to decreased specificity of ssd-ortholog prediction. Therefore, we propose a complementary statistical approach to determining cut-off values. A benefit of the proposed approach is that it gives the user an estimated conditional probability that a predicted ortholog pair is unusually diverged. This enables the interpretation and selection of cut-off values based on a direct measure of the relative composition of ssd-orthologs versus non-ssd-orthologs. In a simulation comparison of the two approaches, we find that the statistical approach provides more stable cut-off values and improves the specificity of ssd-ortholog prediction for low-quality data sets of predicted orthologs.
    Keywords: Orthologs; Comparative genomics; Local-fdr; Mixture distribution
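
    The estimated conditional probability described in the abstract can be illustrated with a two-component mixture, in the spirit of the local-fdr keyword. The sketch below is only an illustration under stated assumptions: the abstract does not specify the mixture components, so the Gaussian densities, the parameter values, and the function names `prob_unusually_diverged` and `cutoff_for` are all hypothetical and are not the authors' implementation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed component shape)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_unusually_diverged(x, pi0, typical, atypical):
    """Posterior probability that a pair with divergence statistic x
    belongs to the atypical (non-ssd) component of the mixture.

    pi0      -- assumed proportion of typical (ssd-ortholog) pairs
    typical  -- (mean, sd) of the typical component (assumption)
    atypical -- (mean, sd) of the atypical component (assumption)

    The local fdr is 1 minus this value. For x far in both tails the
    densities can underflow to zero; this sketch does not guard that.
    """
    f0 = pi0 * normal_pdf(x, *typical)
    f1 = (1.0 - pi0) * normal_pdf(x, *atypical)
    return f1 / (f0 + f1)


def cutoff_for(target, pi0, typical, atypical, lo, hi, tol=1e-9):
    """Bisection search for the statistic value at which the posterior
    probability reaches `target`.

    Assumes the posterior is increasing in x on [lo, hi], which holds
    for equal-variance components with atypical mean > typical mean.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_unusually_diverged(mid, pi0, typical, atypical) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    pi0, typical, atypical = 0.8, (0.0, 1.0), (4.0, 1.0)
    for x in (1.0, 2.0, 3.0):
        p = prob_unusually_diverged(x, pi0, typical, atypical)
        print(f"x = {x:.1f}: P(unusually diverged | x) = {p:.3f}")
    c = cutoff_for(0.9, pi0, typical, atypical, lo=-5.0, hi=10.0)
    print(f"cut-off for posterior 0.9: {c:.3f}")
```

    Choosing the cut-off through a target posterior probability, rather than a fixed threshold on the statistic, ties the cut-off directly to the relative composition of the two components, which is the interpretability benefit the abstract describes.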
    This article is indexed in ScienceDirect and other databases.