Similar articles
 19 similar articles found (search time: 254 ms)
1.
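Amino acid sequence encoding has long been the main reason that algorithms for protein structure prediction face a large input space. Only a better encoding of the amino acid sequence can lay the foundation for subsequent computational analysis. This work proposes and implements a new hybrid encoding method that takes into account both the partitioning of the amino acid sequence and long-range interaction effects: orthogonal encoding distinguishes each individual amino acid, a basic orthogonal matrix captures the physical, chemical, and biological similarity between amino acids, and class-membership probabilities capture the tendency of the amino acids in the current protein sequence to form different secondary structures. The method thereby improves the encoding of amino acid residue sequences. Existing algorithms are used to compare the effect of different encodings on protein secondary structure prediction, and the results confirm that the proposed encoding improves prediction accuracy.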
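The orthogonal (one-hot) component mentioned in this abstract can be illustrated with a minimal sketch. The sliding window, the zero padding at chain ends, and the names `one_hot` and `encode_windows` are assumptions made for illustration; the abstract does not describe how the paper builds its input vectors, and the similarity-matrix and probability components of the hybrid encoding are not covered here.

```python
# Minimal sketch of 20-bit orthogonal (one-hot) residue encoding.
# The window size and zero padding are illustrative assumptions,
# not details taken from the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-element 0/1 vector; unknown residues map to all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1
    return vec


def encode_windows(sequence, window=5):
    """Encode each residue by the one-hot codes of a window centred on it.

    Positions that fall outside the chain are padded with zero vectors.
    """
    half = window // 2
    features = []
    for center in range(len(sequence)):
        row = []
        for pos in range(center - half, center + half + 1):
            residue = sequence[pos] if 0 <= pos < len(sequence) else None
            row.extend(one_hot(residue))
        features.append(row)
    return features
```

With a window of 3, each residue of `"ACD"` becomes a 60-element vector. Every extra window position adds 20 inputs, which is the input-space growth the abstract refers to.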

2.
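Amino acid sequence encoding has long been the main reason that algorithms for protein structure prediction face a large input space. Only a better encoding of the amino acid sequence can lay the foundation for subsequent computational analysis. This work proposes and implements a new hybrid encoding method that takes into account both the partitioning of the amino acid sequence and long-range interaction effects: orthogonal encoding distinguishes each individual amino acid, a basic orthogonal matrix captures the physical, chemical, and biological similarity between amino acids, and class-membership probabilities capture the tendency of the amino acids in the current protein sequence to form different secondary structures. The method thereby improves the encoding of amino acid residue sequences. Existing algorithms are used to compare the effect of different encodings on protein secondary structure prediction, and the results confirm that the proposed encoding improves prediction accuracy.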

3.
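Encoding is one of the important factors affecting the accuracy of protein secondary structure prediction. For single-sequence protein secondary structure prediction, a new composite encoding method is proposed. Amino acids are grouped according to their propensity factors for each type of secondary structure and their hydrophobicity values, and each group is represented in binary form. Under identical experimental conditions, the CB513 dataset is first encoded with the different encodings, and a support vector machine is then trained and used for prediction. The experimental results show that the prediction accuracy of the proposed encoding is 1.48% and 10.68% higher than that of the 20-bit orthogonal encoding and the 5-bit encoding, respectively. The encoding is therefore well suited to structure prediction for non-homologous or low-homology proteins.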

4.
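Protein structure and function have long been a research focus in the life sciences. Although protein secondary structure prediction is widely used, its accuracy has been limited by the available algorithms. In this paper, a composite encoding replaces the traditional amino acid encoding and, taking into account the influence of amino acid hydrophobicity on protein structure, a new support vector machine algorithm is proposed. 7-fold cross-validation shows that the algorithm improves the accuracy of protein secondary structure prediction and saves computational resources.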

5.
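Prediction of protein secondary structure types is one of the current hot topics in bioinformatics research. A numerical encoding model of amino acids is used to convert the amino acid sequence into a digital signal, the pseudo amino acid composition is computed with an LZ complexity algorithm, and the pseudo amino acid composition is then classified with the OET-KNN algorithm. Jackknife test results show that the algorithm substantially improves the prediction success rate.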
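As a rough illustration of the LZ complexity measure named in this abstract, the sketch below counts the phrases in the Lempel-Ziv (1976) exhaustive parsing of a symbol string. How the paper maps residues to digits and how it turns complexity values into pseudo amino acid components is not stated in the abstract, so the function `lz_complexity` and its use on raw strings are assumptions.

```python
def lz_complexity(symbols):
    """Count phrases in the Lempel-Ziv (1976) exhaustive parsing of a string.

    Each new phrase is the shortest substring starting at the current
    position that has not already occurred in the text preceding its
    final symbol.
    """
    n = len(symbols)
    start = 0
    phrases = 0
    while start < n:
        length = 1
        # Grow the candidate while it can still be copied from earlier text.
        while (start + length <= n
               and symbols[start:start + length] in symbols[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases
```

For the binary string `0001101001000101` the parsing is `0 | 001 | 10 | 100 | 1000 | 101`, giving a complexity of 6. The same function works on any string, including one-letter residue codes.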

6.
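Xiao Xuan, Xiao Chuncai, Wang Pu. Application Research of Computers (《计算机应用研究》), 2010, 27(10): 3698-3700
Protein secondary structure prediction plays a very important role in protein structure prediction. Representing proteins by pseudo amino acid composition can raise the success rate of protein structure and function prediction. Using the grayscale image of the protein distance matrix, a method for constructing pseudo amino acid components based on geometric moments is proposed and combined with the amino acid composition to predict protein secondary structure types. The internationally recognized Jackknife test shows a prediction success rate of 95.10%, considerably higher than that of other methods, indicating that the method classifies effectively.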

7.
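The polyproline type II helix is a special and rare protein secondary structure. To save the time and cost of determining this structure experimentally, this paper designs a deep learning algorithm based on a convolutional neural network to predict polyproline type II helices. First, the protein sequence information is feature-encoded to generate a feature matrix; the encodings include amino acid orthogonal codes, amino acid physicochemical properties, and the position-specific scoring matrix. Second, the normalized feature matrix is fed into the convolutional neural network, which automatically extracts local deep features of the protein sequence and outputs the prediction for polyproline type II helices. Experimental results show that the algorithm performs markedly better than six traditional machine learning algorithms such as the support vector machine.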

8.
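A protein secondary structure prediction method based on structural features is proposed. Principal component analysis is first applied to the physicochemical properties of the amino acids to extract the main influencing factors, which are fused into a 3-bit encoding. Next, three propensity factors of each amino acid for specific secondary structures are appended to the original 3-bit encoding. Once encoding is complete, a support vector machine is used for prediction. Experimental results show that the improved encoding outperforms both the 3-bit encoding obtained from principal component analysis alone and the 5-bit encoding, and can be used effectively for protein secondary structure prediction.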

9.
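A co-training method for protein secondary structure prediction*   Total citations: 1, self-citations: 1, citations by others: 0
Machine learning methods for protein secondary structure prediction tend to ignore amino acid hydrophobicity and the long-range interactions between amino acids, and their accuracy is not high; comparative experiments were carried out to analyze this situation. Each amino acid in a protein is replaced by its corresponding hydrophobic energy value, giving a sequence of hydrophobic energy values. The experimental results show that training a BP network on long sequences of hydrophobic energy values gives good predictions for the E structure, in which long-range interactions dominate. Because the Profile encoding features and the hydrophobic energy features are independent, redundant views, a Cotraining algorithm based on the idea of co-training is proposed. Its main steps are to train an SVM classifier in the Profile feature space and a BP neural network classifier in the hydrophobicity feature space, and to have the two cooperate in predicting the secondary structure of each amino acid.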
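The substitution step described here, replacing each residue with a hydrophobicity value, can be sketched as follows. The abstract does not say which hydrophobic energy scale the authors use; the Kyte-Doolittle hydropathy index below is a stand-in assumption, and `hydropathy_profile` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy index, used here only as a stand-in scale;
# the abstract does not name the scale the authors actually used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, default=0.0):
    """Replace every residue with its hydropathy value.

    Residues missing from the table (for example 'X') get `default`.
    """
    return [KYTE_DOOLITTLE.get(aa, default) for aa in sequence.upper()]
```

The resulting list has one number per residue, so a long stretch of it can be fed to a network as a compact numeric view of the chain.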

10.
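For multiple-sequence protein secondary structure prediction, a large-margin multi-metric learning model is proposed that automatically constructs prototypes from the training set and adaptively learns distance metrics. The method first constructs prototypes for each class of samples with Euclidean-distance K-means clustering, then minimizes the objective loss function with a fast subgradient descent algorithm in order to learn multiple local linear transformations of the input space. In particular, the metric learning model can be formulated as a convex semidefinite program, so parameter estimation does not suffer from local minima. Experimental results on the CB513 dataset show that the proposed method not only achieves good prediction accuracy but also predicts protein secondary structure quickly.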

11.
Precise prediction of protein secondary structure from the associated amino acid sequence is of great importance in bioinformatics, yet it remains a challenging task for machine learning algorithms. As a major step toward predicting the ultimate three-dimensional structure, the secondary structure assignment specifies the protein function. Taking a multilayer perceptron neural network, pruned for optimum hidden-layer size, as the reference network, this article devises advanced kinds of recurrent neural network (RNN) to enhance secondary structure prediction. To better model the strong correlations between secondary structure elements, types of modular reciprocal recurrent neural networks (MRR-NN) are examined. Additionally, to take into account the long-range interactions between amino acids in the formation of secondary structure, bidirectional RNNs are investigated. A multilayer bidirectional recurrent neural network (MBR-NN) is finally applied to capture the predominant long-term dependencies. Ultimately, a modular prediction system based on the interactive combination of the MRR-NN and MBR-NN raises the percentage accuracy (Q3) to 76.91% and the segment overlap (SOV) to 68.13% when tested on the PSIPRED dataset. The coupling effects of the secondary structure types, as well as the sequential information of amino acids along the protein chain, can be captured well by integrating the MRR-NN and the MBR-NN.
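The Q3 figure quoted in this abstract is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that calculation follows; the function name `q3` and the string-based label format are illustrative assumptions, and the SOV score is not covered.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, predicting `HHEC` for a chain whose observed labels are `HHCC` gets three of four residues right, so `q3("HHEC", "HHCC")` is 75.0.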

12.
A method is presented for predicting the secondary structure of globular proteins from their amino acid sequence. It is based on a rigorous statistical exploitation of the well-known biological fact that the amino acid compositions of each secondary structure are different. We also propose an evaluation process that allows us to estimate the capacity of a method to predict the secondary structure of a new protein which does not have any homologous proteins whose structure is already known. This evaluation process shows that our method has a prediction accuracy of 58.7% over three states for the 62 proteins of the Kabsch and Sander (1983a) data bank. This result is better than that obtained by the most widely used methods (Lim (1974), Chou and Fasman (1978) and Garnier et al. (1978)) and also than that obtained by a recent method based on local homologies (Levin et al., 1986). Our prediction method is very simple and may be implemented on any microcomputer and even on programmable pocket calculators. A simple Pascal implementation of the prediction algorithm is given. The interpretation of our results in terms of protein folding and directions for further work are discussed.

13.
The paper discusses numerical results of predicting protein secondary structure using Bayesian classification procedures based on nonstationary Markov chains. A new approach is used, based on classifying pairs of states for pairs of neighboring amino acids. It improves the prediction accuracy compared with classifying the state of a single amino acid. Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 59–64, March–April 2007.

14.
Amino acid propensity score is one of the earliest successful methods used in protein secondary structure prediction. However, the score performs poorly on small datasets and low-identity protein sequences. With current in silico methods, secondary structure can be predicted from local folds or local protein structure. In biology, the evolution of secondary structure produces local protein structures of different lengths. To predict secondary structures precisely, we propose a derivative feature vector, DPS, that utilizes the optimal length of the local protein structure. DPS unifies the amino acid propensity score and the dihedral angle score. This new feature vector is further normalized to level the edges. Prediction is performed by support vector machines (SVM) over the DPS feature vectors, with class labels generated by a secondary structure assignment method (SSAM) and a secondary structure prediction method (SSPM). All experiments are carried out on the RS126 sequences. The results also highlight the overall accuracy of the proposed method compared with other state-of-the-art methods. Its performance was acceptable specifically when dealing with small numbers of sequences and low-identity sequences.

15.
One of the main research problems in structural bioinformatics is the prediction of the three-dimensional (3-D) structures of polypeptides or proteins. Amino acid sequences are currently being identified much faster than 3-D protein structures can be determined by experimental methods such as X-ray diffraction and NMR techniques. The determination of protein structures is both experimentally expensive and time consuming. Predicting the correct 3-D structure of a protein molecule is an intricate and arduous task. The protein structure prediction (PSP) problem is, in computational complexity theory, an NP-complete problem. In order to reduce computing time, current efforts have targeted hybridizations between ab initio and knowledge-based methods, aiming at efficient prediction of the correct structure of polypeptides. In this article we present a hybrid method for the 3-D protein structure prediction problem. An artificial neural network knowledge-based method that predicts approximate 3-D protein structures is combined with an ab initio strategy. Molecular dynamics (MD) simulation is used to refine the approximate 3-D protein structures. In the refinement step, global interactions between each pair of atoms in the molecule (including non-bond interactions) are evaluated. The developed MD protocol enables us to correct the deviations of polypeptide torsion angles in the predicted structures and improve their stereochemical quality. The results show that the time to predict native-like 3-D structures is considerably reduced. We test our computational strategy with four mini proteins whose sizes vary from 19 to 34 amino acid residues. The structures obtained at the end of 32.0 nanoseconds (ns) of MD simulation were topologically comparable to their corresponding experimental structures.

16.
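Yang Bingru, Zhou Zhun, Hou Wei. Application Research of Computers (《计算机应用研究》), 2009, 26(12): 4617-4620
Protein secondary structure prediction is one of the most important tasks in bioinformatics. After more than thirty years of research, some progress has been made; in particular, the recent introduction of ensemble and hybrid prediction models has improved prediction accuracy to some extent. However, a large gap remains before tertiary structure can be derived from secondary structure. To effectively improve the accuracy of protein secondary structure prediction, building on extended research on the KDTICM theory and on the KDD* model, and using KAAPRO, a KDD*-based association-analysis method for protein secondary structure prediction, this work proposes, on the basis of a compound distance measure built from support and confidence, a CBA (classification based on association)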

17.
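Because interactions between different types of amino acids affect protein structure prediction differently, this paper fuses a convolutional neural network with a long short-term memory network to form a convolutional long short-term memory neural network and applies it to 8-class protein secondary structure prediction. Protein sequences are first represented by the class information of the amino acid sequence and the evolutionary information of the amino acid structure, and convolution is used to extract local correlation features between amino acid residues. A bidirectional long short-term memory network then extracts the long-range interactions between residues within the protein sequence. Finally, the extracted local correlation features and long-range interactions are used to predict the 8-class secondary structure. Experiments show that, compared with baseline methods, the model improves 8-class secondary structure prediction accuracy and scales well.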

18.
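Nan Yuhong, Chen Qi. Microcomputer Development (《微机发展》), 2011, (10): 168-170, 175
An easily modifiable algorithm for protein secondary structure prediction is proposed. Using the PDB text data of the Protein Data Bank as the data source, all protein amino acid sequences are extracted to build a sample database. Prediction of secondary structure sequence fragments is then implemented for α-helices and β-sheets separately, using different improved methods based on a hash dictionary. During prediction, a subset of samples is drawn at random from 68,421 proteins as the test set, and each unknown sequence is segmented and compared against the fragments in the hash dictionary using forward maximum matching segmentation. The experimental results show a prediction accuracy of 83.9% for unknown sequence fragments, and the method reflects the order in which fragments are connected fairly well.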
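Forward maximum matching against a hash dictionary, the segmentation technique named in this abstract, can be sketched as below. The dictionary contents, the single-residue fallback for unmatched positions, and the function name `forward_max_match` are illustrative assumptions; the abstract does not describe the paper's dictionary construction or its α-helix and β-sheet specific refinements.

```python
def forward_max_match(sequence, dictionary, max_len=None):
    """Segment `sequence` left to right, always taking the longest
    fragment found in `dictionary` (a dict: fragment -> structure label).

    Positions where no fragment matches are emitted as single residues
    with label None (an assumption made for this sketch).
    """
    if max_len is None:
        max_len = max((len(key) for key in dictionary), default=1)
    segments = []
    i = 0
    while i < len(sequence):
        # Try the longest candidate first, then shrink one residue at a time.
        for length in range(min(max_len, len(sequence) - i), 0, -1):
            fragment = sequence[i:i + length]
            if fragment in dictionary:  # O(1) average-case hash lookup
                segments.append((fragment, dictionary[fragment]))
                break
        else:
            fragment = sequence[i]
            segments.append((fragment, None))
        i += len(fragment)
    return segments
```

With a toy dictionary `{"AEEL": "H", "AE": "H", "VKV": "E"}`, the sequence `AEELGVKV` is split into `AEEL` (H), an unmatched `G`, and `VKV` (E); the longer `AEEL` wins over `AE` because longer candidates are tried first.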

19.
《Information Fusion》2009,10(3):217-232
Protein secondary structure prediction is still a challenging problem today. Even though a number of prediction methods have been presented in the literature, the various prediction tools available on-line produce results whose quality is not always fully satisfactory. Therefore, a user has to know which predictor to use for a given protein to be analyzed. In this paper, we propose a server implementing a method to improve the accuracy of protein secondary structure prediction. The method is based on integrating the prediction results computed by some available on-line prediction tools to obtain a combined prediction of higher quality. Given an input protein p whose secondary structure has to be predicted, and a group of proteins F whose secondary structures are known, the server currently works according to a two-phase approach: (i) it selects a set of predictors good at predicting the secondary structure of proteins in F (and, therefore, supposedly, that of p as well), and (ii) it integrates the prediction results delivered for p by the selected team of prediction tools. Therefore, by exploiting our system, the user is relieved of the burden of selecting the most appropriate predictor for the given input protein while, at the same time, being assured that a prediction result at least as good as the best available one will be delivered. The correctness of the resulting prediction is measured with reference to the EVA accuracy parameters used in several editions of CASP.
