Similar literature
20 similar documents found (search time: 31 ms)
1.
Wu Cathy, Berry Michael, Shivakumar Sailaja, McLarty Jerry. Machine Learning, 1995, 21(1-2): 177-193
A neural network classification method has been developed as an alternative approach to the search/organization problem of protein sequence databases. The neural networks used are three-layered, feed-forward, back-propagation networks. The protein sequences are encoded into neural input vectors by a hashing method that counts occurrences of n-gram words. A new SVD (singular value decomposition) method, which compresses the long and sparse n-gram input vectors and captures the semantics of n-gram words, has improved the generalization capability of the network. A full-scale protein classification system has been implemented on a Cray supercomputer to classify unknown sequences into 3311 PIR (Protein Identification Resource) superfamilies/families at a speed of less than 0.05 CPU seconds per sequence. The sensitivity is close to 90% overall, and approaches 100% for large superfamilies. The system could be used to reduce the database search time and is being used to help organize the PIR protein sequence database.
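
The n-gram encoding step described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names `ngram_counts` and `encode`, the positional indexing scheme used as the hash, and the normalization by the total n-gram count are all assumptions, and the SVD compression step is omitted.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int = 2) -> Counter:
    """Count overlapping n-gram words in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def encode(sequence: str, alphabet: str, n: int = 2) -> list:
    """Map a sequence to a fixed-length vector of n-gram frequencies.

    Hypothetical hashing scheme: each n-gram is read as a number in
    base len(alphabet), which gives its slot in the output vector.
    """
    index = {ch: k for k, ch in enumerate(alphabet)}
    vec = [0.0] * (len(alphabet) ** n)
    counts = ngram_counts(sequence, n)
    total = sum(counts.values())
    for gram, count in counts.items():
        # Skip n-grams that contain symbols outside the alphabet.
        if any(ch not in index for ch in gram):
            continue
        pos = 0
        for ch in gram:
            pos = pos * len(alphabet) + index[ch]
        vec[pos] = count / total
    return vec
```

With a 20-letter amino acid alphabet and n = 2, such a vector has 400 entries, most of them zero for a short sequence, which is the kind of long, sparse input the abstract says the SVD step compresses.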

2.
A modified counter-propagation (CP) algorithm with a supervised learning vector quantizer (LVQ) and dynamic node allocation has been developed for rapid classification of molecular sequences. The molecular sequences were encoded into neural input vectors using an n-gram hashing method for word extraction and a singular value decomposition (SVD) method for vector compression. The neural networks used were three-layered, forward-only CP networks that performed nearest neighbor classification. Several factors affecting the CP performance were evaluated, including weight initialization, Kohonen layer dimensioning, winner selection and weight update mechanisms. The performance of the modified CP network was compared with the back-propagation (BP) neural network and the k-nearest neighbor method. The major advantages of the CP network are its training and classification speed and its capability to extract statistical properties of the input data. The combined BP and CP networks can classify nucleic acid or protein sequences with close to 100% accuracy at a rate about one order of magnitude faster than other currently available methods.

3.
4.
To study the dynamics of protein and nucleic acid conformations, a molecular folding-unfolding system (FUS, written in Lisp) has been developed. Secondary structure features of proteins and nucleic acids are graphically represented by cubes in a modified 'Blocks World' paradigm. Modeling of protein and nucleic acid unfolding (denaturation) and folding of their three-dimensional structure is possible by the use of high level 'block' operators which allow displacement of these structural features in space. Due to the flexible nature of this program, FUS is a useful tool for the rapid evaluation of user-defined rules governing conformational changes. The use of FUS to unfold three common proteins (prealbumin, flavodoxin and triose phosphate isomerase) and a tRNA is presented.

5.
A program has been developed that provides molecular biologists with multiple tools for searching databases, yet uses a very simple interface. PATMAT can use protein or (translated) DNA sequences, patterns or blocks of aligned proteins as queries of databases consisting of amino acid or nucleotide sequences, patterns or blocks. The ability to search databases of blocks by 'on-the-fly' conversion to scoring matrices provides a new tool for detection and evaluation of distant relationships. PATMAT uses a pull-down, menu-driven interface to carry out its multiple searching, extraction and viewing functions. Each query or database type is recognized and reported, and the appropriate search is carried out, with matches and alignments reported in windows as they occur. Any of the high scoring matches can be exported to a file, viewed and recalled as a query using only a few keystrokes or mouse selections. Searches of multiple database files are carried out by user selection within a window. PATMAT runs under DOS; the searching engine also runs under UNIX.

6.
In this study, a mining system is proposed for finding protein–protein interaction literature in databases on the Internet. The system identifies discriminating words for protein–protein interaction using statistics and results from the literature. A threshold is also evaluated to check whether a given article is related to protein–protein interactions. In addition, a keypage-based search mechanism is used to find papers on protein–protein interactions that are related to a given document. To expand the search space and ensure better performance of the system, mechanisms for protein name identification and databases of protein names are also developed. The system is designed with a web-based user interface and a job-dispatching kernel. Experiments are conducted and the results have been checked by a biomedical expert. The experimental results indicate that the proposed mining system helps researchers find protein–protein interaction literature among the overwhelming amount of information available in biomedical databases on the Internet.

7.
Biomolecular array technology is an invaluable tool for rapid screening of nucleic acid mixtures. This approach has been tremendously successful both in its breadth of application and its commercial value. Entire genomes, including the human genome, have been screened by molecular array techniques. Arrays are a rapid and now routine method for analysis of expression patterns and their association with physiological states. Such a rapid, high throughput analysis of cellular expression is key to the expansion of our basic knowledge of the relationship between gene expression and organismal function, as well as to the understanding of the genetic component of disease states and the predisposition to disease. Despite the success of array technology for nucleic acid applications, a similar trend for proteins has not occurred. Due, in part, to the difficulties involved in production and labeling of proteins for solid state analysis, solid state arrays of proteins are not widely utilized. Protein function and interaction have traditionally been addressed by the combination of 2D gel electrophoretic separation and mass spectrometry to examine individual protein spots, a slow, tedious and expensive process. Another approach uses in vivo methods for examining protein-protein interactions by the two-hybrid system in yeast and mammalian cells [1]. Although the two-hybrid system has shown some success in finding new interactions between proteins in important cellular pathways, it is far more difficult, costly and time consuming than the solid state methods used for nucleic acids. BioForce Laboratory, Inc., has developed a solid state method for examining the interaction between a wide range of molecules in an array format. This technology involves several key technological innovations.

8.
9.
Computers & Chemistry, 1997, 21(4): 237-256
Artificial neural networks provide a unique computing architecture whose potential has attracted interest from researchers across different disciplines. As a technique for computational analysis, neural network technology is very well suited for the analysis of molecular sequence data. It has been applied successfully to a variety of problems, ranging from gene identification to protein structure prediction and sequence classification. This article provides an overview of major neural network paradigms, discusses design issues, and reviews current applications in DNA/RNA and protein sequence analysis.

10.
Sovereign ratings have grown in importance since the beginning of the financial crisis. However, the opacity of credit rating agencies has been criticised by several authors, who highlight the value of designing more objective alternative methods. This paper tackles the sovereign credit rating classification problem from an ordinal classification perspective by employing a pairwise class distances projection to build a classification model based on standard regression techniques. In this work the ϵ-SVR is selected as the regressor tool. The quality of the projection is validated through the classification results obtained for four performance metrics when applied to Standard & Poor's, Moody's and Fitch sovereign rating data of EU27 countries during the period 2007–2010. This validated projection is later used for ranking visualization, which might be suitable for building a decision support system.

11.
Towards Deeper Understanding of the Search Interfaces of the Deep Web (total citations: 2; self-citations: 0; citations by others: 2)
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be considered as containing an interface schema with multiple attributes and rich semantic/meta-information; however, the schema is not formally defined in HTML. Many Web applications, such as Web database integration and deep Web crawling, require the construction of the schemas. In this paper, we first propose a schema model for representing complex search interfaces, and then present a layout-expression based approach to automatically extract the logical attributes from search interfaces. We also rephrase the identification of different types of semantic information as a classification problem, and design several Bayesian classifiers to help derive semantic information from extracted attributes. A system, WISE-iExtractor, has been implemented to automatically construct the schema from any Web search interface. Our experimental results on real search interfaces indicate that this system is highly effective.

12.
The capture, analysis and classification of sedimentary organic matter in palynological preparations have been semi-automated. First, the morphological and textural discriminatory features used in classification schemes are measured using a computer-controlled stage and a digital camera mounted on a microscope in combination with Halcon image analysis algorithms. Second, the Exhaustive CHi-square Automatic Interaction Detector classification tree algorithm is applied to all feature measurements to establish their saliency as classification discriminators. Third, the results of the classification tree algorithm are used to determine the inputs used by the actual classifier, which consists of a series of artificial neural networks (ANNs). The Gamma test (GT) is introduced as a tool to help facilitate the best use of limited data and to ensure that the ANNs are not overtrained. The results show that the system developed is able to achieve an average correct classification rate of 87%. This is encouraging enough to prompt further research that could result in a commercially viable system. In the future, work will concentrate on refining the image capture component of the system and increasing the size of those databases that have been shown both empirically and by the GT to be too small to facilitate the construction of accurate classifiers.

13.
In this paper, a hybrid intelligent system that consists of the Fuzzy Min–Max neural network, the Classification and Regression Tree, and the Random Forest model is proposed, and its efficacy as a decision support tool for medical data classification is examined. The hybrid intelligent system aims to exploit the advantages of the constituent models and, at the same time, alleviate their limitations. It is able to learn incrementally from data samples (owing to the Fuzzy Min–Max neural network), explain its predicted outputs (owing to the Classification and Regression Tree), and achieve high classification performance (owing to the Random Forest). To evaluate the effectiveness of the hybrid intelligent system, three benchmark medical data sets from the UCI Repository of Machine Learning, viz., Breast Cancer Wisconsin, Pima Indians Diabetes, and Liver Disorders, are used. A number of performance metrics that are useful in medical applications, including accuracy, sensitivity, specificity, and the area under the Receiver Operating Characteristic curve, are computed. The results are analyzed and compared with those from other methods published in the literature. The experimental outcomes demonstrate that the hybrid intelligent system is effective in undertaking medical data classification tasks. More importantly, the hybrid intelligent system is able not only to produce good results but also to elucidate its knowledge base with a decision tree. As a result, domain users (i.e., medical practitioners) are able to comprehend the predictions given by the hybrid intelligent system and hence to accept its role as a useful medical decision support tool.

14.
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a robust and effective approach for speech/music discrimination, which relies on a two-stage cascaded classification scheme. The cascaded classification scheme is composed of a statistical pattern recognition classifier followed by a genetic fuzzy system. For the first stage of the classification scheme, other widely used classifiers, such as neural networks and support vector machines, have also been considered in order to assess the robustness of the proposed classification scheme. A comparison with well-proven signal features is also performed. In this work, the most commonly used genetic learning algorithms (Michigan and Pittsburgh) have been evaluated in the proposed two-stage classification scheme. The genetic fuzzy system gives rise to an improvement of about 4% in the classification accuracy rate. Experimental results show the good performance of the proposed approach, with a classification accuracy rate of about 97% for the best trial.

15.
This article proposes a tabu search approach to solve a mathematical programming formulation of the linear classification problem, which consists of determining a hyperplane that separates two groups of points as well as possible in ℝ^m. The tabu search approach proposed is based on a non-standard formulation using linear system infeasibility. The search space is the set of bases defined on the matrix that describes the linear system. The moves are performed by pivoting on a specified row and column. On real machine learning databases, our approach compares favorably with implementations based on parametric programming and irreducible infeasible constraint sets. Additional computational results for randomly generated instances confirm that our method provides a suitable alternative to the mixed integer programming formulation that is solved by a commercial code when the number of attributes m increases.
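
The objective of the linear classification problem mentioned above can be illustrated by counting how many points a candidate hyperplane puts on the wrong side. This is only a sketch of the quantity being minimized, not the tabu search or the infeasibility-based formulation from the article; the function name `misclassified`, the ±1 label convention, and the choice to count points lying exactly on the hyperplane as errors are assumptions.

```python
def misclassified(w, b, points, labels):
    """Count points on the wrong side of the hyperplane w . x + b = 0.

    Assumed convention: labels are +1 or -1, and a point lying exactly
    on the hyperplane is counted as misclassified.
    """
    errors = 0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:
            errors += 1
    return errors
```

A search method for this problem would compare candidate pairs (w, b) by a count of this kind, preferring hyperplanes with fewer misclassified points.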

16.
Pattern recognition has a long history within electrical engineering but has recently become much more widespread as the automated capture of signals and images has become cheaper. Many applications of neural networks are to classification, and so fall within the field of pattern recognition and classification. In this paper, we explore how probabilistic neural networks fit into the earlier framework of pattern recognition for partial discharge (PD) patterns, since PD patterns are an important tool for the diagnosis of HV insulation systems. Skilled humans can identify possible insulation defects in various representations of PD data. One of the most widely used representations is the phase resolved PD (PRPD) pattern. This paper also describes a method for the automated recognition of PRPD patterns using a novel complex probabilistic neural network system for the actual classification task. The efficacy of the composite neural network developed using probabilistic neural networks is examined.

17.
In this paper, we present CoLD (colorectal lesions detector), an innovative detection system to support colorectal cancer diagnosis and detection of pre-cancerous polyps by processing endoscopy images or video frame sequences acquired during colonoscopy. It utilizes second-order statistical features that are calculated on the wavelet transformation of each image to discriminate between regions of normal and abnormal tissue. An artificial neural network performs the classification of the features. CoLD integrates the feature extraction and classification algorithms under a graphical user interface, which allows both novice and expert users to make effective use of all of the system's functions. It has been developed in close cooperation with gastroenterology specialists and has been tested on various colonoscopy videos. The detection accuracy of the proposed system has been estimated to be more than 95%. The results indicate that it can be used as a supplementary diagnostic tool for colorectal lesions.

18.
PRONUC is a menu-driven software package from which a molecular biologist may gain access to a variety of tools for the analysis of protein and nucleic acid sequences. Features include various algorithms for sequence comparisons, secondary structure prediction, sequence manipulation (translation, complementation, etc.) and finding restriction enzyme cut-sites. The sequences under study can be retrieved from several databases of published sequences, or a user's sequence(s) can be entered by means of a sequence editor or retrieved from a database constructed by the user. PRONUC comes with a comprehensive manual and on-line help, which reflect several years of user feedback, and is available for Digital VAX computer systems running the VMS or micro-VMS operating system.

19.
The main objective of this work is to automatically design neural network models with sigmoid basis units for binary classification tasks. The classifiers that are obtained achieve a double objective: a high classification level in the dataset and a high classification level for each class. We present MPENSGA2, a Memetic Pareto Evolutionary approach based on the NSGA2 multiobjective evolutionary algorithm, which has been adapted to design Artificial Neural Network models; the NSGA2 algorithm is augmented with a local search that uses the improved Resilient Backpropagation with backtracking (IRprop+) algorithm. To analyze the robustness of this methodology, it was applied to four complex classification problems in predictive microbiology to describe the growth/no-growth interface of food-borne microorganisms such as Listeria monocytogenes, Escherichia coli R31, Staphylococcus aureus and Shigella flexneri. The results obtained in Correct Classification Rate (CCR), Sensitivity (S) as the minimum of sensitivities for each class, Area Under the receiver operating characteristic Curve (AUC), and Root Mean Squared Error (RMSE) show that the generalization ability and the classification rate in each class can be more efficiently improved within a multiobjective framework than within a single-objective framework.
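
Two of the measures named above, CCR and sensitivity taken as the minimum of the per-class sensitivities, can be computed from true and predicted labels as in the following sketch. The function name `ccr_and_min_sensitivity` and the list-based interface are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
def ccr_and_min_sensitivity(y_true, y_pred):
    """Return (CCR, S) for a set of predictions.

    CCR is the fraction of all samples classified correctly.
    S is the smallest per-class sensitivity, i.e. the correct
    classification rate of the worst-classified class.
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    ccr = correct / len(y_true)

    sensitivities = []
    for cls in sorted(set(y_true)):
        in_class = sum(t == cls for t in y_true)
        hits = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        sensitivities.append(hits / in_class)
    return ccr, min(sensitivities)
```

The two values can diverge on imbalanced data: a classifier can reach a high CCR while classifying the minority class poorly, which is why the abstract treats them as two separate objectives.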

20.
A principal task in dissecting the genetics of complex traits is to identify causal genes for disease phenotypes. Millions of genes have been sequenced in the data-driven genomics era, but knowledge of their causal relationships with disease phenotypes remains limited, due to the difficulty of elucidating underlying causal genes by laboratory-based strategies. Here, we propose an innovative deep learning computational modeling alternative (the DPPCG framework) for identifying causal (coding) genes for a specific disease phenotype. Taking male infertility as an example, we introduce proteins as intermediate cell variables, leveraging integrated deep knowledge representations (Word2vec, ProtVec, Node2vec, and Space2vec) quantitatively represented as 'protein deep profiles'. We adopt a deep convolutional neural network (CNN) classifier to model the relationships of protein deep profiles with male infertility, training deep CNN models for single-label binary classification and multi-label eight-class classification. We demonstrate the capabilities of the DPPCG framework by integrating and fully harnessing the utility of heterogeneous biomedical big data, including literature, protein sequences, protein–protein interactions, gene expressions, and gene–phenotype relationships, and by indirectly predicting 794 causal genes of male infertility and associated pathological processes. We present this research in an interactive 'Smart Protein' intelligent (demo) system (http://www.smartprotein.cloud/public/home). Researchers can benefit from our intelligent system by (i) accessing a shallow gene/protein-radar service involving research status and a knowledge graph-based vertical search; (ii) querying and downloading protein deep profile matrices; (iii) accessing intelligent recommendations for causal genes of male infertility and associated pathological processes, and references for model architectures, parameter settings, and training outputs; and (iv) carrying out personalized analysis such as online K-Means clustering.
