Neural Networks for Full-Scale Protein Sequence Classification: Sequence Encoding with Singular Value Decomposition
Authors: Cathy Wu (1), Michael Berry (2), Sailaja Shivakumar (1), Jerry McLarty (1)
Affiliation: (1) Department of Epidemiology/Biomathematics, The University of Texas Health Center at Tyler, 75710 Tyler, Texas; (2) Department of Computer Science, University of Tennessee, 37996-1301 Knoxville, Tennessee
Abstract: A neural network classification method has been developed as an alternative approach to the search/organization problem of protein sequence databases. The neural networks used are three-layered, feed-forward, back-propagation networks. The protein sequences are encoded into neural input vectors by a hashing method that counts occurrences of n-gram words. A new SVD (singular value decomposition) method, which compresses the long and sparse n-gram input vectors and captures semantics of n-gram words, has improved the generalization capability of the network. A full-scale protein classification system has been implemented on a Cray supercomputer to classify unknown sequences into 3311 PIR (Protein Identification Resource) superfamilies/families at a speed of less than 0.05 CPU second per sequence. The sensitivity is close to 90% overall, and approaches 100% for large superfamilies. The system could be used to reduce the database search time and is being used to help organize the PIR protein sequence database.
Keywords: neural networks, database search, protein classification, sequence analysis, superfamily, singular value decomposition (SVD)
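
The encoding described in the abstract (n-gram counting followed by SVD compression) can be illustrated with a minimal sketch. This is not the authors' implementation: the 20-letter amino acid alphabet, the choice n = 2, the reduced dimension k = 2, the toy sequences, and all function names are assumptions made only for illustration, and the projection step uses the standard "folding-in" formula from latent semantic indexing rather than anything stated in the abstract.

```python
from itertools import product
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: standard 20-letter alphabet

def ngram_index(n):
    """Map every possible n-gram to a fixed position in the input vector."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=n))}

def ngram_counts(seq, index, n):
    """Count occurrences of each n-gram in seq; unknown n-grams are skipped."""
    vec = np.zeros(len(index))
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])
        if j is not None:
            vec[j] += 1
    return vec

def fit_svd(term_matrix, k):
    """Return the top-k left singular vectors and singular values."""
    u, s, _ = np.linalg.svd(term_matrix, full_matrices=False)
    return u[:, :k], s[:k]

def compress(vec, u_k, s_k):
    """Project an n-gram count vector onto the k-dimensional SVD basis."""
    return (vec @ u_k) / s_k

n, k = 2, 2  # hypothetical settings, chosen to keep the example small
index = ngram_index(n)
training = ["ACDEFGHIK", "ACDEFGHLM", "WYWYWYVVV"]  # made-up toy sequences
# Rows are n-grams, columns are training sequences.
term_matrix = np.column_stack([ngram_counts(s, index, n) for s in training])
u_k, s_k = fit_svd(term_matrix, k)
reduced = compress(ngram_counts("ACDEFGHIK", index, n), u_k, s_k)
```

Here the 400-dimensional sparse bigram vector is reduced to k values, which would then serve as the input to a classifier such as the feed-forward network the abstract mentions.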
This article is indexed in SpringerLink and other databases.