Sequence clustering strategies improve remote homology recognitions while reducing search times |
| |
Authors: | Li, Weizhong Jaroszewski, Lukasz Godzik, Adam |
| |
Affiliation: | The Burnham Institute, La Jolla, CA 92037, USA |
| |
Abstract: | Sequence databases are rapidly growing, thereby increasing the coverage of protein sequence space, but this coverage is uneven because most sequencing efforts have concentrated on a small number of organisms. The resulting granularity of sequence space creates many problems for profile-based sequence comparison programs. In this paper, we suggest several strategies that address these problems, and at the same time speed up the searches for homologous proteins and improve the ability of profile methods to recognize distant homologies. One of our strategies combines database clustering, which removes highly redundant sequence, and a two-step PSI-BLAST (PDB-BLAST), which separates sequence spaces of profile composition and space of homology searching. The combination of these strategies improves distant homology recognitions by more than 100%, while using only 10% of the CPU time of the standard PSI-BLAST search. Another method, intermediate profile searches, allows for the exploration of additional search directions that are normally dominated by large protein sub-families within very diverse families. All methods are evaluated with a large fold-recognition benchmark. |
| |
Keywords: | |
This article is indexed in Oxford and other databases. |
|