A test for the statistical significance of DNA sequence similarities for application in databank searches
Authors: R F Mott, T B Kirkwood, R N Curnow
Affiliation: Laboratory of Mathematical Biology, National Institute for Medical Research, London, UK.
Abstract: A method based on word searching is developed that provides a rapid test of the statistical significance of DNA sequence similarities for use in databank searches. The method allows for the lengths and dinucleotide compositions of the sequences being compared. A method is also described for calculating the power of the test, i.e. the probability that a given similarity will be detected as statistically significant. The effects of the scoring method, word length, sequence length and sequence composition on the power of the test are examined. A novel scoring method is shown to be superior to the method currently used in most word-searching algorithms.
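The abstract does not give the authors' formulas, so what follows is only a minimal sketch of the general idea under stated assumptions, not the method of the paper. It scores two sequences by counting shared words of length k, then estimates significance by Monte Carlo simulation against a null model that respects length and dinucleotide composition. The null model is a first-order Markov chain fitted to each sequence's dinucleotide counts. This is a swapped-in technique standing in for the analytic test the paper develops. The scoring rule and all function names (`word_match_score`, `fit_markov1`, `sample_markov1`, `empirical_p_value`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumption: sequences are uppercase DNA over A, C, G, T


def kmers(seq, k):
    """All overlapping words of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_match_score(a, b, k):
    """Hypothetical score: number of k-word positions in `a` whose word also occurs in `b`."""
    words_b = set(kmers(b, k))
    return sum(1 for w in kmers(a, k) if w in words_b)


def fit_markov1(seq):
    """Fit a first-order Markov chain from dinucleotide counts (add-one smoothing)."""
    counts = defaultdict(Counter)
    for x, y in zip(seq, seq[1:]):
        counts[x][y] += 1
    trans = {}
    for x in ALPHABET:
        total = sum(counts[x][y] + 1 for y in ALPHABET)
        trans[x] = [(counts[x][y] + 1) / total for y in ALPHABET]
    # Start distribution from smoothed mononucleotide frequencies.
    start = [(seq.count(c) + 1) / (len(seq) + 4) for c in ALPHABET]
    return start, trans


def sample_markov1(model, n, rng):
    """Generate a random sequence of length n (n >= 1) from a fitted chain."""
    start, trans = model
    out = [rng.choices(ALPHABET, weights=start)[0]]
    while len(out) < n:
        out.append(rng.choices(ALPHABET, weights=trans[out[-1]])[0])
    return "".join(out)


def empirical_p_value(a, b, k, n_sim=200, seed=0):
    """Monte Carlo p-value: how often do composition-matched random pairs score as high?"""
    rng = random.Random(seed)
    observed = word_match_score(a, b, k)
    model_a, model_b = fit_markov1(a), fit_markov1(b)
    hits = 0
    for _ in range(n_sim):
        ra = sample_markov1(model_a, len(a), rng)
        rb = sample_markov1(model_b, len(b), rng)
        if word_match_score(ra, rb, k) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_sim + 1)
```

For example, `empirical_p_value(query, target, 8)` returns the observed score and its estimated p-value. Simulation is used here only because it is short to write. It is far slower than an analytic test of the kind the abstract describes, which is what makes such a test practical for databank searches.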