Similar literature
1.
A prediction scheme has been developed for the IBM PC and compatibles. It contains computer programs that make use of the protein secondary structure prediction algorithms of Nagano (1977a,b), Garnier et al. (1978), Burgess et al. (1974), Chou and Fasman (1974a,b), Lim (1974) and Dufton and Hider (1977). The results of the individual prediction methods are combined, as described by Hamodrakas et al. (1982), by the program PLOTPROG to produce joint prediction histograms for a protein for three types of secondary structure: alpha-helix, beta-sheet and beta-turns. The scheme requires uniform input for the prediction programs, which can be produced by any word processor, spreadsheet, editor or database program, and produces uniform output on a printer, a graphics screen or a file. The scheme is independent of any additional software and runs under DOS 2.0 or later releases.

2.
This article briefly describes our program Jamsek, written in FORTRAN for an ICL 2950/10 computer. Jamsek combines the statistical and stereochemical rules most frequently encountered in the literature for predicting protein secondary structure from sequence into a single algorithm. The composite algorithm does not work better than the best existing single algorithms of Garnier et al. (J. Mol. Biol., 120, 97-120, 1978) or Lim (J. Mol. Biol., 88, 873-894, 1974) if the percentage of residues with a correctly predicted secondary structure is taken as the criterion. However, it is fairly reliable in predicting the total amount of alpha-helices and beta-sheets in proteins, in predicting the secondary structure of highly ordered proteins or their parts, and in identifying long alpha-helices. It surpasses the previous algorithms in that it lets the user judge the confidence of the prediction for particular secondary structure elements, thanks to the simultaneous availability of four independent predictions of the secondary structure and other relevant data (hydrophobic profile and helical wheel representation). The main body of this article is devoted to demonstrating that the output data of Jamsek can readily be used to predict protein topological class, to identify globular proteins containing hydrophobic alpha-helices and, as an auxiliary means, to distinguish between protein-coding and non-coding nucleotide sequences.

3.
A modified Chou and Fasman protein structure algorithm   Total citations: 6, self-citations: 0, citations by others: 6
A FORTRAN program, PRSTRC, has been developed for protein secondary structure prediction; it is a modified Chou and Fasman (1978) analysis. This implementation computes a running average of amino acid structure occurrence frequencies, uses a simple set of nucleation conditions, and gives the user control over nucleation threshold and cutoff parameters. The algorithm includes prediction of the newly defined secondary structure elements, omega loops (1986). It also generates a charge distribution and a hydropathy profile. Output includes a simple graphic display for a printer, or for a CRT using color addition. Correct structures are predicted for T. dyscritum hemerythrin and the variable domain of mouse immunoglobulin k-chain.
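
The running-average and nucleation-threshold steps mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the PRSTRC code: the propensity table holds toy values, and the function names, the window length of 4 and the threshold of 1.05 are all assumptions made for the example.

```python
from typing import Dict, List

# Toy helix propensities for illustration only (hypothetical values,
# not the tables used by PRSTRC or published by Chou and Fasman).
TOY_HELIX_PROPENSITY: Dict[str, float] = {
    "A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6, "S": 0.8,
}


def running_average(sequence: str, table: Dict[str, float],
                    window: int = 4, default: float = 1.0) -> List[float]:
    """Mean propensity of each run of `window` consecutive residues."""
    values = [table.get(residue, default) for residue in sequence]
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def nucleation_sites(averages: List[float], threshold: float = 1.05) -> List[int]:
    """Start indices of windows whose average exceeds the threshold."""
    return [i for i, avg in enumerate(averages) if avg > threshold]


if __name__ == "__main__":
    profile = running_average("AELGPSAEL", TOY_HELIX_PROPENSITY)
    print(nucleation_sites(profile))
```

Exposing `window` and `threshold` as parameters mirrors the user control over nucleation threshold and cutoff that the abstract describes; the specific defaults here carry no authority.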

4.
Amino acid propensity score is one of the earliest successful methods used in protein secondary structure prediction. However, the score performs poorly on small datasets and low-identity protein sequences. With current in silico methods, secondary structure can be predicted from local folds or local protein structure. In biology, the evolution of secondary structure produces local protein structures of different lengths. To predict secondary structures precisely, we propose a derived feature vector, DPS, that utilizes the optimal length of the local protein structure. DPS unifies the amino acid propensity score and the dihedral angle score. This new feature vector is further normalized to level the edges. Prediction is performed by support vector machines (SVM) over the DPS feature vectors, with class labels generated by a secondary structure assignment method (SSAM) and a secondary structure prediction method (SSPM). All experiments are carried out on the RS126 sequences. The results also highlight the overall accuracy of our method compared to other state-of-the-art methods. The performance of our method was acceptable, specifically when dealing with small numbers of sequences and low-identity sequences.

5.
The rapid generation of mutation data matrices from protein sequences.   Total citations: 35, self-citations: 0, citations by others: 35
An efficient means for generating mutation data matrices from large numbers of protein sequences is presented here. By means of an approximate peptide-based sequence comparison algorithm, the set of sequences is clustered at the 85% identity level. The most closely related pairs of sequences are aligned, and the observed amino acid exchanges are tallied in a matrix. The raw mutation frequency matrix is processed in a similar way to that described by Dayhoff et al. (1978), so the resulting matrices may easily be used in current sequence analysis applications in place of the standard mutation data matrices, which have not been updated for 13 years. The method is fast enough to process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1, and to generate a matrix from a specific family or class of proteins in minutes. Differences observed between our 250 PAM mutation data matrix and the matrix calculated by Dayhoff et al. are briefly discussed.
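
The tallying step described in this abstract (counting observed amino acid exchanges in aligned sequence pairs) can be illustrated with a small sketch. The clustering, the alignment itself and the Dayhoff-style post-processing are not shown; the function name `tally_exchanges`, the `-` gap character and the symmetric counting are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_exchanges(aligned_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count residue pairings column by column in pre-aligned sequence pairs.

    Pairs are stored in sorted order, so (A, G) and (G, A) share one cell,
    which makes the raw count matrix symmetric. Gap columns are skipped.
    """
    counts: Counter = Counter()
    for seq_a, seq_b in aligned_pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        for res_a, res_b in zip(seq_a, seq_b):
            if res_a == "-" or res_b == "-":
                continue
            counts[tuple(sorted((res_a, res_b)))] += 1
    return counts


if __name__ == "__main__":
    pairs = [("ACD", "AGD"), ("AC-", "GCA")]
    for pair, count in sorted(tally_exchanges(pairs).items()):
        print(pair, count)
```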

6.
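The encoding scheme is one of the important factors affecting the accuracy of protein secondary structure prediction. For single-sequence protein secondary structure prediction, a new combined encoding method is proposed. The encoding classifies amino acids according to their propensity factors for each type of secondary structure and their hydrophobicity values, and represents each class of amino acids in binary form. Under identical experimental conditions, the CB513 data set is first encoded with the different encoding schemes, and a support vector machine is then trained to build the prediction model. The experimental results show that the prediction accuracy of the proposed encoding is 1.48% and 10.68% higher than that of the 20-bit orthogonal encoding and the 5-bit encoding, respectively. The encoding is therefore well suited to structure prediction for non-homologous or low-homology proteins.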
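
For orientation, the 20-bit orthogonal encoding used as a baseline in this abstract can be sketched as below. This shows only the baseline, not the proposed combined encoding, whose class assignments the abstract does not give. The residue ordering, the sliding-window helper and its half-width are assumptions made for the example.

```python
from typing import List

# Assumed residue order; any fixed ordering of the 20 standard amino acids works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[int]:
    """20-bit orthogonal code: a single 1 at the residue's index.

    Unknown symbols map to the all-zero vector.
    """
    vector = [0] * len(AMINO_ACIDS)
    index = AMINO_ACIDS.find(residue)
    if index >= 0:
        vector[index] = 1
    return vector


def encode_window(sequence: str, center: int, half_width: int = 1) -> List[int]:
    """Concatenate the codes of the residues around `center`.

    Positions that fall outside the sequence are padded with zeros.
    """
    features: List[int] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


if __name__ == "__main__":
    print(len(encode_window("ACDEF", center=2, half_width=2)))
```

A window of 2 * half_width + 1 residues yields 20 bits per position, which is the feature-length cost that a more compact binary encoding of amino acid classes would reduce.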

7.
Luo Linbo, Chen Qi. 《微机发展》 (Microcomputer Development), 2010, (2): 206-208, 212
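Amino acids are the basic units of proteins, and for protein classification and prediction the method used to extract features from the amino acid sequence is a very important factor. This paper describes and briefly evaluates feature extraction algorithms based on amino acid composition and position, such as entropy density and n-order coupled composition, and methods based on amino acid properties, such as autocorrelation functions and pseudo amino acid composition. Composition-based methods are simple to implement, computationally cheap and applicable to all amino acid sequences, but they lose the sequence-order information and the interactions between residues; methods based on positional information or physicochemical properties are computationally very expensive. Researchers can choose a feature extraction method according to their requirements for the protein.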

8.
Protein secondary structure prediction is one of the most important problems in bioinformatics. After more than 30 years of study there have been some breakthroughs. In particular, the introduction of ensemble and hybrid prediction models has improved prediction accuracy, but inferring tertiary structure from secondary structure remains a distant goal. As an extension of KDTICM theory [Bingru, Yang (2004). Knowledge discovery based on theory of inner cognition mechanism and application. Beijing: Electronic Industry Press], this paper proposes KAAPRO, a method for protein secondary structure prediction based on the Maradbcm algorithm, which is derived from the KDD1 model and combined with CBA. A gradually enhanced, multi-layer systematic prediction model, the compound pyramid model, is also proposed. The kernel of this model is KAAPRO. Domain knowledge is used throughout the model, and the physical–chemical attributes are chosen by causal cellular automata. In the experiment, the test proteins used by Muggleton et al. (Muggleton, S. H., King, R., Sternberg, M. (1992). Protein secondary structure prediction using logic-based machine learning. Protein Engineering, 5(7), 647–657) are predicted. The structures of amino acids whose structural traits are obscure are predicted well by KAAPRO. Hence the results of the model are also satisfactory.

9.
Protein Structure from Contact Maps: A Case-Based Reasoning Approach   Total citations: 1, self-citations: 0, citations by others: 1
Determining the three-dimensional structure of a protein is an important step in understanding biological function. Despite advances in experimental methods (crystallography and NMR) and protein structure prediction techniques, the gap between the number of known protein sequences and determined structures continues to grow. Approaches to protein structure prediction vary from those that apply physical principles to those that consider known amino acid sequences and previously determined protein structures. In this paper we consider a two-step approach to structure prediction: (1) predict contacts between amino acids using sequence data; (2) predict protein structure using the predicted contact maps. Our focus is on the second step of this approach. In particular, we apply a case-based reasoning framework to determine the alignment of secondary structures based on previous experiences stored in a case base, along with detailed knowledge of the chemical and physical properties of proteins. Case-based reasoning is founded on the premise that similar problems have similar solutions. Our hypothesis is that we can use previously determined structures and their contact maps to predict the structure for novel proteins from their contact maps. The paper presents an overview of contact maps along with the general principles behind our methodology of case-based reasoning. We discuss details of the implementation of our system and present empirical results using contact maps retrieved from the Protein Data Bank. Funding provided by: The Natural Science and Engineering Research Council (Ottawa); Institute for Robotics and Intelligent Systems (Ottawa); Protein Engineering Network Center of Excellence (Edmonton).

10.
One of the main research problems in structural bioinformatics is the prediction of three-dimensional (3-D) structures of polypeptides or proteins. The rate at which amino acid sequences are identified is increasing much faster than the rate of 3-D protein structure determination by experimental methods such as X-ray diffraction and NMR techniques. The determination of protein structures is both experimentally expensive and time consuming. Predicting the correct 3-D structure of a protein molecule is an intricate and arduous task. The protein structure prediction (PSP) problem is, in computational complexity theory, an NP-complete problem. In order to reduce computing time, current efforts have targeted hybridizations between ab initio and knowledge-based methods, aiming at efficient prediction of the correct structure of polypeptides. In this article we present a hybrid method for the 3-D protein structure prediction problem. An artificial neural network knowledge-based method that predicts approximate 3-D protein structures is combined with an ab initio strategy. Molecular dynamics (MD) simulation is used to refine the approximate 3-D protein structures. In the refinement step, global interactions between each pair of atoms in the molecule (including non-bonded interactions) are evaluated. The MD protocol we developed enables us to correct deviations in the polypeptide torsion angles of the predicted structures and to improve their stereochemical quality. The results show that the time needed to predict native-like 3-D structures is considerably reduced. We test our computational strategy with four mini proteins whose sizes vary from 19 to 34 amino acid residues. The structures obtained at the end of 32.0 nanoseconds (ns) of MD simulation were topologically comparable to their corresponding experimental structures.

11.
Protein structure prediction is currently one of the main open challenges in bioinformatics. The protein contact map is a useful and commonly used representation of protein 3D structure; it records binary proximities (contact or non-contact) between each pair of amino acids of a protein. In this work, we propose a multi-objective evolutionary approach for contact map prediction based on the physico-chemical properties of amino acids. The evolutionary algorithm produces a set of decision rules that identify contacts between amino acids. The rules obtained by the algorithm impose a set of conditions based on amino acid properties to predict contacts. We present results obtained by our approach on four different protein data sets. A statistical study was also performed to extract valid conclusions from the set of prediction rules generated by our algorithm. The results obtained confirm the validity of our proposal.
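
To make the contact map representation concrete, the sketch below builds a binary map from known 3D coordinates. It illustrates the representation only, not the evolutionary rule-learning method of the abstract. The use of one coordinate per residue (for example the C-alpha atom), the 8.0 Å distance threshold and the `min_separation` parameter are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_map(coords: Sequence[Point], threshold: float = 8.0,
                min_separation: int = 1) -> List[List[int]]:
    """Binary contact map: entry [i][j] is 1 when residues i and j lie
    within `threshold` of each other and are at least `min_separation`
    positions apart in the sequence; otherwise 0. The map is symmetric."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = 1
                cmap[j][i] = 1
    return cmap


if __name__ == "__main__":
    toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(toy_coords):
        print(row)
```

`math.dist` requires Python 3.8 or later. A predictor of the kind described above would try to reproduce such a map from sequence-derived properties alone.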

12.
The study of the spatial folding of peptides is a very difficult task that requires time-consuming computation. The complexity of the problem demands tools that predict, in a simple manner, basic properties such as the secondary structure, starting from the amino acid sequence, which contains all the information necessary to determine the folding of a protein. The study of secondary structure is of considerable interest, in particular the prediction of regular structures, because these regions, like alpha-helices and beta-sheets, may form nucleation sites (M.J.E. Sternberg and J.M. Thornton, Nature 271 (1978) 15-20; B. Robson and R.H. Pain, Biochem. J. 155 (1976) 331-344). The aim of this paper is to propose a procedure for secondary structure prediction based on statistics (B. Robson and J. Garnier, Introduction to Proteins and Protein Engineering (Elsevier, Amsterdam, 1986); J. Garnier, D.J. Osguthorpe and B. Robson, J. Mol. Biol. 120 (1977) 97-120) and heuristic rules, also taking experimental data into account.

13.
We describe a simple approach for finding identical amino acid clusters on the outer surface of α-helical coiled-coil proteins by examining the sequence of amino acids that compose the protein. Finding such similarities is an important immunological problem, since these may correspond to cross-reactive epitopes, i.e., sites at which antibodies produced against one protein also bind to another conformationally similar protein. Because of the regularities inherent in a coiled-coil structure, the position of each amino acid on the structure is predicted. Based on this prediction, our algorithm finds similarities on the outer surface of the proteins. The matches found by our algorithm serve as an important screening process, intended to indicate which experiments to conduct to determine sites that correspond to cross-reactive epitopes. The location of several cross-reactive epitopes between M proteins and myosins had been verified experimentally. Although our approach makes many simplifying assumptions, these epitopes always correspond to clusters of identical amino acids, which our algorithm predicted to be contiguous on the outer surface. Our algorithm runs in O(n+m+r) time and O(n+m) space, where n and m are the lengths of the protein sequences, and r is the number of matching amino acids that appear in the same structural position of the α-helix in both sequences. Received June 7, 1997; revised March 23, 1998.

14.
Accurate protein secondary structure prediction plays an important role in direct tertiary structure modeling, and can also significantly improve sequence analysis and sequence-structure threading for structure and function determination. Hence improving the accuracy of secondary structure prediction is essential for future developments throughout the field of protein research. In this article, we propose a mixed-modal support vector machine (SVM) method for predicting protein secondary structure. Using the evolutionary information contained in the physicochemical properties of each amino acid and a position-specific scoring matrix generated by a PSI-BLAST multiple sequence alignment as input for a mixed-modal SVM, secondary structure can be predicted with significantly increased accuracy. Using a Knowledge Discovery Theory based on the Inner Cognitive Mechanism (KDTICM) method, we have proposed a compound pyramid model, which is composed of three layers of intelligent interface that integrate a mixed-modal SVM (MMS) module, a modified Knowledge Discovery in Databases (KDD1) process, a mixed-modal back propagation neural network (MMBP) module and so on. Testing against data sets of non-redundant protein sequences returned values for the Q3 accuracy measure that ranged from 84.0% to 85.6%, while values for the SOV99 segment overlap measure ranged from 79.8% to 80.6%. When compared with currently available secondary structure prediction methods on a blind test dataset from the CASP8 meeting, our new approach shows superior accuracy. Availability: http://www.kdd.ustb.edu.cn/protein_Web/.
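
The Q3 measure quoted in this abstract is the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. A minimal sketch follows; the function name and the H/E/C string representation are assumptions made for the example, and the SOV99 segment overlap measure is not shown.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state equals the
    observed state. Both strings hold one state letter per residue."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    print(q3_accuracy("HHHECCCC", "HHHEECCC"))
```

The same per-residue comparison gives a Q8 figure when the strings use the eight-state alphabet instead of three states.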

15.
Zhang Lei, Li Zheng, Zheng Fengbin, Yang Wei. 《计算机应用》 (Journal of Computer Applications), 2017, 37(5): 1512-1515
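Protein secondary structure prediction is an important problem in structural biology. For eight-class protein secondary structure prediction, a deep learning prediction algorithm based on recurrent and feed-forward neural networks is proposed. The algorithm models local and long-range interactions between amino acids with a bidirectional recurrent neural network, whose hidden-layer outputs are then fed into a three-layer feed-forward neural network to predict the eight secondary structure classes. Experimental results show that the proposed algorithm achieves a Q8 accuracy of 67.9% on the CB513 data set, significantly better than SSpro8 and SC-GSN.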

16.
Computers & Chemistry, 1998, 21(4): 279-294
The preference functions method is described for the prediction of membrane-buried helices in membrane proteins. The preference of an amino acid residue in a sequence for the α-helix conformation is a non-linear function of the average hydrophobicity of its sequence neighbors. Kyte–Doolittle hydropathy values are used to extract preference functions from a training data set of integral membrane proteins of partially known secondary structure. Preference functions for β-sheet, turn and undefined conformation are also extracted by including β-class soluble proteins of known structure in the training data set. Conformational preferences are compared for each residue in the tested sequence, and the predicted secondary structure is the one with the highest preference. This procedure is incorporated in an algorithm that performs accurate prediction of transmembrane helical segments. The correct sequence location and secondary structure of transmembrane segments are predicted for 20 of 21 reference membrane polypeptides with known crystal structure that were not included in the training data set. Comparison with hydrophobicity plots revealed that our preference profiles are more accurate and exhibit higher resolution and less noise. Shorter unstable or movable membrane-buried α-helices are also predicted to exist in different membrane proteins with transport function. For instance, in the sequences of voltage-gated ion channels and glutamate receptors, N-terminal parts of known P-segments can be located as characteristic α-helix preference peaks. Our e-mail server, predict@drava.etfos.hr, returns a preference profile and secondary structure prediction for a suspected or known membrane protein when its sequence is submitted.
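
The input to the preference functions in this abstract is the average Kyte–Doolittle hydropathy of a residue's sequence neighbors. The sketch below computes such a windowed average. It covers only this averaging step: the non-linear preference functions themselves are fitted from training data and are not reproduced. The window length of 5, the truncation of windows at the sequence ends and the inclusion of the central residue in the average are assumptions made for the example.

```python
from typing import List

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 5) -> List[float]:
    """Mean hydropathy in a window centred on each residue.

    Windows are truncated at the sequence ends. A non-standard residue
    letter raises KeyError.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    half = window // 2
    profile = []
    for i in range(len(values)):
        segment = values[max(0, i - half):min(len(values), i + half + 1)]
        profile.append(sum(segment) / len(segment))
    return profile


if __name__ == "__main__":
    for residue, score in zip("MKLLIVAG", hydropathy_profile("MKLLIVAG")):
        print(residue, round(score, 2))
```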

17.
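Because interactions between different types of amino acids affect protein structure prediction differently, this paper combines convolutional neural network and long short-term memory network models into a convolutional long short-term memory network and applies it to eight-class protein secondary structure prediction. Protein sequences are first represented using the category information of the amino acid sequence and the evolutionary information of the amino acid structure, and convolution is used to extract local correlation features between amino acid residues. A bidirectional long short-term memory network then extracts long-range interactions between residues within the protein sequence. Finally, the extracted local correlation features and long-range interactions are used to predict the eight secondary structure classes. Experiments show that, compared with baseline methods, the model improves eight-class prediction accuracy and scales well.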

18.
Information Fusion, 2009, 10(3): 217-232
Protein secondary structure prediction is still a challenging problem today. Although a number of prediction methods have been presented in the literature, the various prediction tools available on-line produce results whose quality is not always fully satisfactory. A user therefore has to know which predictor to use for a given protein. In this paper, we propose a server implementing a method to improve the accuracy of protein secondary structure prediction. The method is based on integrating the prediction results computed by some available on-line prediction tools to obtain a combined prediction of higher quality. Given an input protein p whose secondary structure has to be predicted, and a group of proteins F whose secondary structures are known, the server currently works according to a two-phase approach: (i) it selects a set of predictors that are good at predicting the secondary structure of proteins in F (and therefore, supposedly, that of p as well), and (ii) it integrates the prediction results delivered for p by the selected team of prediction tools. By exploiting our system, the user is thus relieved of the burden of selecting the most appropriate predictor for the given input protein, while being assured that a prediction result at least as good as the best available one will be delivered. The correctness of the resulting prediction is measured with reference to the EVA accuracy parameters used in several editions of CASP.
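
The integration phase (ii) can be pictured with a simple per-residue majority vote, sketched below. This is a stand-in chosen for illustration: the abstract does not say how the server integrates its selected predictors, and phase (i), the selection of predictors on the reference group F, is not shown. The function name and the tie-breaking rule are assumptions.

```python
from collections import Counter
from typing import Sequence


def consensus(predictions: Sequence[str]) -> str:
    """Per-residue majority vote over equal-length prediction strings.

    On a tie, the state that appears first among the predictors at that
    position wins (Counter keeps first-seen order for equal counts).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have equal length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*predictions))


if __name__ == "__main__":
    print(consensus(["HHEC", "HHCC", "HECC"]))
```

Listing the most trusted predictor first makes the tie-breaking rule favor it, which is one simple way to reflect a ranking obtained from a reference set.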

19.
A co-training method for protein secondary structure prediction*   Total citations: 1, self-citations: 1, citations by others: 0
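Machine learning methods for protein secondary structure prediction tend to ignore amino acid hydrophobicity and the long-range interactions between amino acids, and their accuracy is limited; comparative experiments were carried out to analyze this situation. Each amino acid in a protein is replaced by its hydrophobic energy value to obtain a sequence of hydrophobic energy values. The experimental results show that training a BP network on long hydrophobic energy sequences gives good prediction of the E structure, in which long-range interactions dominate. Because the profile encoding features and the hydrophobic energy features are independent, redundant views, a Cotraining algorithm is proposed based on the co-training idea. Its main steps are to train an SVM classifier in the profile feature space and a BP neural network classifier in the hydrophobicity feature space, which then cooperate to predict the secondary structure of each amino acid.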

20.
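Constructing the feature vector is a key problem in protein secondary structure prediction. Existing methods usually generate the PSSM using only the BLOSUM62 substitution matrix and give little consideration to the amino acid residue mutations that occur during protein evolution. This paper proposes constructing protein feature vectors from multiple evolutionary matrices, fusing PSSMs that correspond to different evolutionary times; the result reflects not only the positional information of the amino acids in the sequence but also the effect of mutations at amino acid sites during sequence evolution. Feature vectors are constructed by combining matrices of different degrees of evolution; logistic regression, random forest and multi-class support vector machines are used as predictors; parameters are optimized by grid search and cross-validation; and several groups of experiments are run on the public RS126, CB513 and 25PDB data sets. Comparative results show that the proposed feature vector construction method based on multiple evolutionary matrices effectively improves the accuracy of protein secondary structure prediction.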
