Abstract: | This paper studies the use of text signatures in string searching. A text signature is a coded representation of a unit of text, formed by hashing substrings to bit positions that are then set to one. Instead of exhaustively searching an entire line of text, the signature can be examined first to determine whether complete processing is warranted. A hashing function that minimizes the number of collisions in a signature is described. Experimental results are given for two signature lengths, using both a text file and a program file. An analysis of the results and a discussion of the method's utility and applications conclude the paper. |
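
The signature idea summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 64-bit signature length (`SIG_BITS`), the use of two-character substrings (`GRAM`), and the simple polynomial hash in `bit_for`, which is not the collision-minimizing function the paper describes.

```python
SIG_BITS = 64  # assumed signature length in bits (hypothetical value)
GRAM = 2       # assumed substring length (hypothetical value)


def bit_for(sub: str) -> int:
    """Map a substring to a bit position with an illustrative polynomial hash."""
    h = 0
    for ch in sub:
        h = (h * 31 + ord(ch)) % SIG_BITS
    return h


def signature(text: str) -> int:
    """Build a signature by setting one bit per overlapping substring."""
    sig = 0
    for i in range(len(text) - GRAM + 1):
        sig |= 1 << bit_for(text[i:i + GRAM])
    return sig


def may_contain(line_sig: int, pattern_sig: int) -> bool:
    """True if every bit set for the pattern is also set for the line."""
    return (line_sig & pattern_sig) == pattern_sig


def search(lines, pattern):
    """Return indices of lines containing pattern, checking signatures first."""
    pattern_sig = signature(pattern)
    hits = []
    for i, line in enumerate(lines):
        # Cheap filter: skip the full scan when the signature rules the line out.
        if may_contain(signature(line), pattern_sig) and pattern in line:
            hits.append(i)
    return hits
```

A signature test can produce false positives, because distinct substrings may hash to the same bit, but never false negatives, so the full substring check is still needed for lines that pass the filter. In a realistic setting the line signatures would presumably be computed once and stored rather than rebuilt for every query, as this sketch does for brevity.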