Twilight zone of protein sequence alignments |
| |
Authors: | Rost Burkhard |
| |
Affiliation: | (1) EMBL, 69 012 Heidelberg; (2) LION Bioscience AG, Im Neuenheimer Feld 517, 69 120 Heidelberg, Germany; and (3) Columbia University, Department of Biochemistry and Molecular Biophysics, 650 West 168th Street, New York, NY 10032, USA |
| |
Abstract: | Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20–35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25%, less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches. |
| |
Keywords: | alignment quality analysis/ evolutionary conservation/ genome analysis/ protein sequence alignment/ sequence space hopping |
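The length-dependent cut-off behind result (ii), the filter in (iii), and the family-overlap test in (iv) can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the paper's implementation. Gaps are assumed to be written as `-` and excluded from the alignment length. Percentage similarity is assumed to be supplied by whatever alignment program produced the pair. All function and class names are hypothetical. `identity_threshold` uses the length-dependent curve as it is commonly cited from the full paper (480·L^(−0.32·(1+e^(−L/1000))) for L ≤ 450 residues, 19.5% beyond). This abstract does not state that curve, so check it against the paper before relying on it.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> Tuple[int, float]:
    """Return (L, pI) for two aligned sequences of equal length.

    L counts columns where neither sequence has a gap ('-', an assumed
    convention); pI is the percentage of those columns with identical residues.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0, 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return len(pairs), 100.0 * identical / len(pairs)


def identity_threshold(length: int) -> float:
    """Length-dependent % identity cut-off (assumed form of the curve, see lead-in).

    Short alignments need much higher identity; beyond 450 residues the
    cut-off is flat at 19.5%.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length > 450:
        return 19.5
    return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))


def above_threshold(length: int, pct_identity: float) -> bool:
    """True if the pair lies on or above the length-dependent cut-off."""
    return pct_identity >= identity_threshold(length)


@dataclass
class PairHit:
    length: int            # aligned residues (gaps excluded)
    pct_identity: float
    pct_similarity: float  # as reported by the alignment program (assumption)


def more_similar_than_identical(hits: Iterable[PairHit]) -> List[PairHit]:
    """Rule (iii): drop every pair whose % similarity is below its % identity."""
    return [h for h in hits if h.pct_similarity >= h.pct_identity]


def linked_via_intermediates(family_a: Set[str], family_b: Set[str]) -> bool:
    """Rule (iv): predict two families as related if they share any protein.

    Family membership (sets of sequence IDs) is assumed to come from
    earlier database searches seeded with each query.
    """
    return not family_a.isdisjoint(family_b)
```

For the example in (ii), 10 matching residues over 16 aligned positions give 62.5%. The curve above requires roughly 83% at that length, so `above_threshold(16, 62.5)` is `False`, while the same 62.5% over a few hundred residues would clear it easily. The threshold is a function of length rather than a single percentage. That is the point of result (ii): a fixed identity cut-off either misses long, distant homologs or admits short, spurious matches.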