Automatic motif discovery in an enzyme database using a genetic algorithm-based approach |
| |
Authors: | D F Tsunoda, H S Lopes |
| |
Affiliation: | (1) Laboratório de Bioinformática / CPGEI, CEFET-PR, Av. 7 de setembro, 3165 80230-901 Curitiba (PR), Brazil |
| |
Abstract: | Proteins can be grouped into families according to features such as hydrophobicity, composition or structure, with the aim
of establishing their common biological functions. This paper presents a system conceived to discover features (particular
sequences of amino acids, or motifs) that occur frequently in proteins of a given family but rarely in proteins of other
families. These features can be used to classify unknown proteins, that is, to predict their function by analyzing their
primary structure. Experiments were run on the enzyme subset extracted from the Protein Data Bank. The heuristic method
is based on a genetic algorithm with operators specially tailored to the problem. The motifs found were used to build
a decision tree with the C4.5 algorithm. The results were compared with motifs found by MEME, a freely available web tool.
A further comparison was made with the classification results of two other systems: a neural network-based tool and a hidden
Markov model-based tool (HMMER). Final performance was measured using sensitivity (Se) and specificity (Sp): the proposed tool
(Se = 78.79, Sp = 95.82) and the neural network-based tool (Se = 74.65, Sp = 94.80) obtained similar results, while MEME and
HMMER performed worse. Compared with the other approaches, the proposed system has the advantage of producing comprehensible
rules. These results for the enzyme dataset suggest that the proposed evolutionary computation method is very efficient at
finding patterns for protein classification. |
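The motif search summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed-length motif encoding with '.' as a wildcard, the fitness Se × Sp, the generic one-point crossover and point mutation (standing in for the paper's specially tailored operators), truncation selection with elitism, and the toy sequences are all assumptions made for illustration.

```python
import random
import re

# Assumed encoding: a motif is a fixed-length string over the 20 standard
# amino acids plus '.', which (as in regular expressions) matches any residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SYMBOLS = AMINO_ACIDS + "."


def occurs(motif, sequence):
    """Return True if the motif occurs anywhere in the sequence."""
    return re.search(motif, sequence) is not None


def se_sp(motif, family, others):
    """Sensitivity on the target family, specificity on the other families."""
    tp = sum(occurs(motif, s) for s in family)
    tn = sum(not occurs(motif, s) for s in others)
    return tp / len(family), tn / len(others)


def fitness(motif, family, others):
    # Assumption: fitness = Se * Sp, rewarding motifs that are frequent in the
    # family and rare elsewhere. The abstract does not state the formula.
    se, sp = se_sp(motif, family, others)
    return se * sp


def mutate(motif, rng, rate=0.25):
    # Generic point mutation, not the paper's tailored operator.
    return "".join(rng.choice(SYMBOLS) if rng.random() < rate else c for c in motif)


def crossover(a, b, rng):
    # One-point crossover between two parent motifs of equal length.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(family, others, length=4, pop_size=60, generations=80, seed=1):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(SYMBOLS) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, family, others), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]             # elitism: keep the best motif
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=lambda m: fitness(m, family, others))


# Toy, hypothetical sequences: the "family" shares the segment GKST.
family = ["MAGKSTLLRE", "PQGKSTAVWE", "LLGKSTNDRA"]
others = ["MDEEHHPQRA", "WWNNQQLLAA", "PPRRDDEEQA"]
best = evolve(family, others)
print(best, se_sp(best, family, others))
```

Elitism keeps the best fitness from decreasing across generations; a motif found this way could then serve as a binary presence/absence attribute for a decision-tree learner such as C4.5.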
| |
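Sensitivity and specificity follow the standard definitions Se = TP / (TP + FN) and Sp = TN / (TN + FP). The sketch below computes them one-versus-rest for each family and then averages them. The unweighted (macro) average, the scaling to percentages and the EC-class labels are assumptions for illustration, since the abstract does not say how per-family values were combined.

```python
def se_sp_one_vs_rest(y_true, y_pred, cls):
    """Sensitivity and specificity of one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    se = tp / (tp + fn)  # Se = TP / (TP + FN)
    sp = tn / (tn + fp)  # Sp = TN / (TN + FP)
    return se, sp


def macro_se_sp(y_true, y_pred):
    # Assumption: unweighted mean over classes, reported as percentages.
    # Requires at least two classes, otherwise TN + FP is zero.
    classes = sorted(set(y_true))
    scores = [se_sp_one_vs_rest(y_true, y_pred, c) for c in classes]
    n = len(scores)
    return (100 * sum(se for se, _ in scores) / n,
            100 * sum(sp for _, sp in scores) / n)


# Hypothetical labels for six enzymes.
y_true = ["EC1", "EC1", "EC2", "EC2", "EC3", "EC3"]
y_pred = ["EC1", "EC2", "EC2", "EC2", "EC3", "EC1"]
print(macro_se_sp(y_true, y_pred))
```

For these labels the per-class pairs are (0.5, 0.75) for EC1, (1.0, 0.75) for EC2 and (0.5, 1.0) for EC3, so the macro averages are about 66.67 for Se and 83.33 for Sp.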