A statistical approach to high-throughput screening of predicted orthologs |
| |
Authors: | Jeong Eun Min, Matthew D. Whiteside, Fiona S.L. Brinkman, Brad McNeney, Jinko Graham |
| |
Affiliation: | a Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; b Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada |
| |
Abstract: | Orthologs are genes in different species that have diverged from a common ancestral gene after speciation. In contrast, paralogs are genes that have diverged after a gene duplication event. For many comparative analyses, it is of interest to identify orthologs with similar functions. Such orthologs tend to support species divergence (ssd-orthologs) in the sense that they have diverged only due to speciation, to the same relative degree as their species. However, due to incomplete sequencing or gene loss in a species, predicted orthologs can sometimes be paralogs or other non-ssd-orthologs. To increase the specificity of ssd-ortholog prediction, Fulton et al. [Fulton, D., Li, Y., Laird, M., Horsman, B., Roche, F., Brinkman, F., 2006. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics 7 (1), 270] developed Ortholuge, a bioinformatics tool that identifies predicted orthologs with atypical genetic divergence. However, when the initial list of putative orthologs contains a non-negligible number of non-ssd-orthologs, the cut-off values that Ortholuge generates for orthology classification are difficult to interpret and can be too high, leading to decreased specificity of ssd-ortholog prediction. Therefore, we propose a complementary statistical approach to determining cut-off values. A benefit of the proposed approach is that it gives the user an estimated conditional probability that a predicted ortholog pair is unusually diverged. This enables the interpretation and selection of cut-off values based on a direct measure of the relative composition of ssd-orthologs versus non-ssd-orthologs. In a simulation comparison of the two approaches, we find that the statistical approach provides more stable cut-off values and improves the specificity of ssd-ortholog prediction for low-quality data sets of predicted orthologs. |
| |
Keywords: | Orthologs; Comparative genomics; Local-fdr; Mixture distribution |
This article is indexed in ScienceDirect and other databases. |
|
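The abstract describes reporting an estimated conditional probability that a predicted ortholog pair is unusually diverged, and the keywords point to a local-fdr calculation on a mixture distribution. The sketch below illustrates that general idea only. It assumes a two-component Gaussian mixture whose parameters have already been estimated; the Gaussian form, the parameter values, and the function names (`prob_unusually_diverged`, `smallest_cutoff`) are hypothetical and are not taken from the paper or from Ortholuge.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_unusually_diverged(x, pi0, null_mean, null_sd, alt_mean, alt_sd):
    """Posterior probability that a divergence score x comes from the
    'unusually diverged' (non-ssd-ortholog) component of the mixture.

    pi0 is the assumed proportion of ssd-orthologs (the null component).
    One minus the returned value is the local fdr of calling the pair
    unusually diverged.
    """
    f0 = pi0 * normal_pdf(x, null_mean, null_sd)
    f1 = (1.0 - pi0) * normal_pdf(x, alt_mean, alt_sd)
    return f1 / (f0 + f1)


def smallest_cutoff(grid, threshold, pi0, null_mean, null_sd, alt_mean, alt_sd):
    """Smallest grid value at or above the null mean whose posterior
    probability of unusual divergence reaches the threshold.

    Only values at or above the null mean are scanned, because with unequal
    component spreads the posterior can also rise in the far lower tail.
    Returns None if no grid value qualifies.
    """
    for x in sorted(g for g in grid if g >= null_mean):
        p = prob_unusually_diverged(x, pi0, null_mean, null_sd, alt_mean, alt_sd)
        if p >= threshold:
            return x
    return None


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(pi0=0.8, null_mean=0.0, null_sd=1.0, alt_mean=3.0, alt_sd=1.5)
    for score in (0.0, 1.5, 3.0):
        print(score, prob_unusually_diverged(score, **params))
```

Choosing a cut-off by thresholding this probability ties the cut-off directly to the estimated share of non-ssd-orthologs among pairs with that score, which is the kind of interpretation the abstract attributes to the statistical approach.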