Similar literature
 20 similar documents found (search time: 31 ms)
1.
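Predicting protein secondary structure remains a hard problem in bioinformatics. Because the accuracy of secondary structure prediction plays a very important role in protein structure research, a protein secondary structure prediction algorithm based on a hybrid SVM method is proposed, building on KDTICM theory. The algorithm uses the physicochemical properties of proteins together with the position-specific scoring matrix generated by PSI-SEARCH as input to a two-layer SVM, which substantially improves prediction accuracy. Comparative experiments show that the prediction accuracy and generality of the new algorithm are clearly better than those of other typical prediction methods currently in use.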

2.
The prediction of secondary structure is an important topic in bioinformatics, even though the methods have matured and algorithm development is a far less active area than a decade ago. Accurate prediction is very useful to biologists in its own right, and it is also an essential component of tertiary structure prediction, which in contrast is far from solved and continues to be a highly active area of research. In addition, sequence comparison methods have more recently incorporated local structure tracks. The extra information used by the new methods has led to considerable improvements in fold recognition and alignment accuracy. In this paper, a novel method for protein secondary structure prediction is presented. Using the evolutionary information contained in amino acids' physicochemical properties, the position-specific scoring matrix generated by PSI-BLAST, and HMMER3 profiles as input to a hybrid back propagation system, secondary structure can be predicted with significantly increased accuracy. Based on the theory of knowledge discovery based on inner cognitive mechanism (KDTICM), we have constructed a compound pyramid model (CPM), which is composed of four layers of intelligent interfaces and integrates several components, such as the hybrid back propagation method (HBP), modified knowledge discovery in databases (KDD*), the hybrid SVM method (HSVM) and so on. Experiments on three standard datasets (RS126, CB513 and CASP8) show that CPM produces higher Q3 and SOV scores than existing widely used schemes such as PSIPRED, PHD and Predator, as well as previously developed prediction methods. On the RS126 and CB513 datasets, its Q3 and SOV99 scores are considerably higher than the best reported scores. It was also tested on target proteins of the critical assessment of protein structure prediction experiment (CASP8) and achieved better overall prediction accuracy than traditional methods, including the popular PSIPRED method. Available: .

3.
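To improve the prediction accuracy of protein O-linked glycosylation sites, a method combining independent component analysis with support vector machines is proposed. The experimental samples (protein sequences) are encoded with sparse coding using a window length of w = 21. For both training and test samples, independent component analysis (ICA) is first used to extract 120 independent components (features), which are then used as input to a support vector machine (SVM) that performs the prediction (classification) in feature space. The experimental results show that ICA+SVM outperforms both PCA+SVM and SVM alone, with a prediction accuracy of 88%. Furthermore, experiments on samples from the same protein sequences at different window lengths show that the longer the window, the higher the prediction accuracy.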
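
The sparse window encoding mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact encoding, so a one-hot scheme with 20 amino acid slots plus one padding slot is assumed, and the names `sparse_encode_window` and `PAD` are hypothetical. The ICA and SVM stages are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical symbol for positions beyond the sequence ends


def sparse_encode_window(sequence, center, window=21):
    """One-hot encode the residues in a window centered on `center`.

    Each window position becomes a 21-dimensional 0/1 vector: 20 slots
    for the standard amino acids plus one slot for padding or unknown
    residues (an assumption, not taken from the abstract).
    """
    half = window // 2
    alphabet = AMINO_ACIDS + PAD
    vector = []
    for pos in range(center - half, center + half + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else PAD
        if residue not in AMINO_ACIDS:
            residue = PAD
        one_hot = [0] * len(alphabet)
        one_hot[alphabet.index(residue)] = 1
        vector.extend(one_hot)
    return vector


# Encode the window around the serine at index 3 of a made-up sequence.
features = sparse_encode_window("MKTSPAVLLSTQ", center=3, window=21)
print(len(features))  # 21 positions x 21 symbols
print(sum(features))  # one active bit per window position
```

With w = 21 this yields a 441-dimensional binary vector per candidate site, which is the kind of high-dimensional sparse input that a dimensionality reduction step such as ICA would then compress.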

4.
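Nan Yuhong, Chen Qi. 《微机发展》 (Microcomputer Development), 2011, (10): 168-170, 175
An easily modifiable algorithm for protein secondary structure prediction is proposed. Using PDB text data from the Protein Data Bank as the data source, all protein amino acid sequences are extracted to build a sample database. Then, separately for α-helices and β-sheets, different improved methods based on a hash dictionary are implemented to predict secondary structure sequence fragments. During prediction, a subset of the 68,421 proteins is randomly drawn as the test set, and unknown sequences are segmented and compared against the fragments in the hash dictionary using forward maximum matching segmentation. In the experiments, the prediction accuracy for unknown sequence fragments reached 83.9%, and the method reflects the connection order between fragments well.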
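
Forward maximum matching against a hash dictionary, as named in this abstract, can be sketched as below. This is the generic textbook procedure, not the authors' improved variants; the function name `forward_max_match` and the fragment set are hypothetical, and unmatched residues are assumed to fall back to single-character segments.

```python
def forward_max_match(sequence, dictionary, max_len=None):
    """Greedy left-to-right segmentation against a set of known fragments."""
    if max_len is None:
        max_len = max((len(f) for f in dictionary), default=1)
    segments = []
    i = 0
    while i < len(sequence):
        # Try the longest candidate first, then shrink one residue at a time.
        for size in range(min(max_len, len(sequence) - i), 0, -1):
            candidate = sequence[i:i + size]
            # A single residue is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical fragment dictionary; a Python set is hash-based.
helix_fragments = {"AEELLKK", "AEEL", "LKK", "GS"}
print(forward_max_match("AEELLKKGSAEELP", helix_fragments))
# ['AEELLKK', 'GS', 'AEEL', 'P']
```

Because membership tests on a set take constant time on average, the cost is bounded by the sequence length times the longest fragment length.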

5.
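The encoding scheme is one of the important factors affecting the accuracy of protein secondary structure prediction. For single-sequence protein secondary structure prediction, a new composite encoding method is proposed. Amino acids are grouped according to their propensity factors for each type of secondary structure and their hydrophobicity values, and each group is represented by a binary code. Under identical experimental conditions, the CB513 dataset is first encoded with different encoding schemes, and a support vector machine is then trained and used for prediction. The results show that the prediction accuracy of the proposed encoding is 1.48% and 10.68% higher than that of 20-bit orthogonal encoding and 5-bit encoding, respectively. The encoding is therefore well suited to structure prediction for non-homologous or low-homology proteins.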

6.
Amino acid propensity score is one of the earliest successful methods used in protein secondary structure prediction. However, the score performs poorly on small datasets and low-identity protein sequences. With current in silico methods, secondary structure can be predicted from local folds or local protein structure. In biology, the evolution of secondary structure produces local protein structures of different lengths. To predict secondary structures precisely, we propose a derivative feature vector, DPS, that utilizes the optimal length of the local protein structure. DPS unifies the amino acid propensity score and the dihedral angle score. This new feature vector is further normalized to level the edges. Prediction is performed by support vector machines (SVM) over the DPS feature vectors, with class labels generated by a secondary structure assignment method (SSAM) and a secondary structure prediction method (SSPM). All experiments are carried out on the RS126 sequences. The results also highlight the overall accuracy of our method compared with other state-of-the-art methods. The performance of our method was acceptable, specifically in dealing with small numbers of sequences and low-identity sequences.

7.
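Co-training prediction method for protein secondary structure*   Total citations: 1, self-citations: 1, citations by others: 0
Machine learning methods for protein secondary structure prediction tend to ignore the hydrophobicity of amino acids and the long-range interactions between them, and their accuracy is limited; comparative experiments were carried out to analyze this. Each amino acid in a protein is replaced by its hydrophobic energy value, yielding a sequence of hydrophobic energy values. The experiments show that training a BP network on long hydrophobic energy sequences gives good predictions for the E structure, in which long-range interactions dominate. Because the profile encoding features and the hydrophobic energy features are independent, redundant views, a Cotraining algorithm based on the co-training idea is proposed. Its main steps are to train an SVM classifier in the profile feature space, train a BP neural network classifier in the hydrophobicity feature space, and have the two cooperate to predict amino acid secondary structure.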

8.
As many structures of protein–DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One reason is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein–DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other uses both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. In both cross-validation and independent testing, the second SVM model, which used both DNA and protein sequence data, performed better than the first model, which used DNA sequence data only.
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone.
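
The three figures of merit reported in this abstract (accuracy, F-measure, and Matthews correlation coefficient) can be computed from binary confusion-matrix counts as in the sketch below. The standard definitions are used; the abstract does not state how the authors averaged the F-measure, so this may not reproduce their exact numbers. The function name `binary_metrics` and the example counts are made up for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # MCC is defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc


# Illustrative counts only, not data from the paper.
acc, f1, mcc = binary_metrics(tp=50, fp=10, tn=30, fn=10)
print(round(acc, 3), round(f1, 3), round(mcc, 3))  # 0.8 0.833 0.583
```

MCC uses all four cells of the confusion matrix, which is why it can sit near 0.33 even when accuracy is around 67%, as in the results quoted above.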

9.
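Membrane proteins carry out important biological functions. Predicting from sequence information whether a protein is a β-barrel transmembrane protein is an important preliminary step for structure prediction and functional analysis, and a challenging problem in protein prediction. For this two-class problem, amino acid position and physicochemical features were extracted from 208 β-barrel transmembrane protein sequences, and a support vector machine (SVM) was used for prediction. The binary classification accuracy and correlation coefficient reached 88.36% and 0.7723, respectively.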

10.
Precise prediction of protein secondary structures from the associated amino acid sequence is of great importance in bioinformatics and yet a challenging task for machine learning algorithms. As a major step toward predicting the ultimate three-dimensional structures, the secondary structure assignment specifies the protein function. Taking a multilayer perceptron neural network, pruned for optimum size of hidden layers, as the reference network, advanced kinds of recurrent neural network (RNN) are devised in this article to enhance secondary structure prediction. To better model the strong correlations between secondary structure elements, types of modular reciprocal recurrent neural networks (MRR-NN) are examined. Additionally, to take into account the long-range interactions between amino acids in the formation of the secondary structure, bidirectional RNNs are investigated. A multilayer bidirectional recurrent neural network (MBR-NN) is finally applied to capture the predominant long-term dependencies. A modular prediction system based on the interactive combination of the MRR-NN and MBR-NN raises the percentage accuracy (Q3) to 76.91% and the segment overlap (SOV) to 68.13% when tested on the PSIPRED dataset. The coupling effects of the secondary structure types, as well as the sequential information of amino acids along the protein chain, can be captured well by the integration of the MRR-NN and the MBR-NN.

11.
Protein secondary structure prediction is one of the most important problems in bioinformatics. After more than 30 years of study, there have been some breakthroughs. In particular, the introduction of ensemble and hybrid prediction models has improved prediction accuracy, but inferring tertiary structures from secondary ones remains a distant goal. As an extension of KDTICM theory [Bingru, Yang (2004). Knowledge discovery based on theory of inner cognition mechanism and application. Beijing: Electronic Industry Press], this paper proposes KAAPRO, a method for protein secondary structure prediction based on the Maradbcm algorithm, which is induced by the KDD1 model and combined with CBA. A gradually enhanced, multi-layer systematic prediction model, the compound pyramid model, is also proposed; KAAPRO is the kernel of this model. Domain knowledge is used throughout the model, and the physical–chemical attributes are chosen by causal cellular automata. In the experiment, the test proteins used in Muggleton et al. (Muggleton, S. H., King, R., Sternberg, M. (1992). Protein secondary structure prediction using logic-based machine learning. Protein Engineering, 5(7), 647–657) are predicted. The structures of amino acids whose structural traits are obscure are predicted well by KAAPRO, and the result of the overall model is satisfactory as well.

12.
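Research and application of an integrated grey support vector machine prediction model   Total citations: 1, self-citations: 1, citations by others: 1
Lin Yaojin, Zhou Zhongmei, Wu Shunxiang. 《计算机应用》 (Journal of Computer Applications), 2009, 29(12): 3287-3289
The grey prediction GM(1,1) model is analyzed and an integrated grey support vector machine prediction model is proposed. Three factors that affect the accuracy of GM(1,1), namely the computation of the background value, the choice of the initial value, and the smoothness of the data sequence, are each improved, giving a background GM model, an initial-value GM model, and a smoothness GM model. Exploiting the characteristics of support vector machines, the three sets of values obtained by passing the one-dimensional original data sequence through the three grey models are used as the SVM input and the original sequence as the SVM output, and training yields the best support vector regression model. Simulation results demonstrate the effectiveness of the model.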
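
For orientation, the baseline GM(1,1) model that this abstract sets out to improve can be sketched as follows. This is the textbook formulation (accumulated series, mean background value, least-squares estimate of the development coefficient a and grey input b); it is not the authors' background, initial-value, or smoothness variants, and it omits the support vector regression stage. The function names `gm11_fit` and `gm11_predict` and the sample series are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0.

    Needs at least three points so the least-squares fit is well defined.
    """
    # Accumulated generating operation: x1[k] = x0[0] + ... + x0[k].
    x1 = []
    running = 0.0
    for value in x0:
        running += value
        x1.append(running)
    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Least squares for x0[k] = -a * z[k] + b (simple linear regression).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    return -slope, (sy - slope * sz) / m


def gm11_predict(x0, a, b, steps):
    """Return fitted values for x0 followed by `steps` forecasts (a != 0)."""
    c = x0[0] - b / a
    x1_hat = [c * math.exp(-a * k) + b / a for k in range(len(x0) + steps)]
    # Inverse accumulation restores the original scale.
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1]
                          for k in range(1, len(x1_hat))]


series = [100.0, 110.0, 121.0, 133.1, 146.41]  # made-up 10% growth series
a, b = gm11_fit(series)
print(gm11_predict(series, a, b, steps=1)[-1])  # close to the true 161.051
```

The background value z and the initial condition x0[0] in this sketch are exactly the two quantities that the background GM and initial-value GM models described above modify.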

13.
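Predicting protein secondary structure with a hybrid ANN/HMM model   Total citations: 1, self-citations: 1, citations by others: 0
To address the low accuracy of the 3-state hidden Markov model (HMM) in protein secondary structure prediction, a 15-state HMM is proposed and combined, through an improved algorithm, with a BP neural network to predict secondary structure. The study uses 492 protein sequences selected from the CB513 dataset, randomly divided into 7 equal groups. The hybrid model is applied and its accuracy is assessed by 7-fold cross-validation, giving a Q3 accuracy of 77.21% and an SOV of 72.52%. The results show that the hybrid model accounts for the interactions between neighboring amino acid residues while also capturing, to some extent, long-range correlations in secondary structure, which leads to good prediction accuracy.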

14.
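This paper introduces the application of the covering algorithm, a constructive machine learning method, to protein secondary structure prediction. Compared with ordinary neural networks, the method is intuitive and computationally simple, and it recognizes 100% of the training samples. In addition, on the premise that structure prediction from a homologous family should be more accurate than from a single sequence, a probability-based profile encoding is adopted; compared with earlier prediction methods, the approach offers better stability and accuracy.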

15.

We employ the support vector machine (SVM) classifier, with different types of kernels, to investigate whether observable variables of individuals and their household information can describe their decision to watch films at theaters in Brazil. Using a very large dataset of 340,000 individuals living in metropolitan areas across a large developing economy, we performed Knowledge Discovery in Databases to classify film consumers, which resulted in 80% of instances being correctly classified. To reduce the degrees of freedom for the SVM and to learn the more important determinants of film consumption, we apply Linear Discriminant Analysis, which allows us to identify the key determinants of this consumption. The main individual characteristics are age, education (that merges to be a student), income, and preferences for cultural goods. The main geographic characteristics are the timing of the sample, population concentration, and the supply of movie theaters. The results point to an ineffective policy for the sector at the time investigated.


16.
Alan S. Lapedes, Evan W. Steeg, Robert M. Farber. Machine Learning, 1995, 21(1-2): 103-124
We present an adaptive neural network method that determines new classes of protein secondary structure that are significantly more predictable from local amino acid sequence than conventional classifications. Accurate prediction of the conventional secondary-structure classes, alpha-helix, beta-strand, and coil, from primary sequence has long been an important problem in computational molecular biology, with many ramifications, including multiple-sequence alignment, prediction of functionally important regions of proteins, and prediction of tertiary structure from primary sequence. The algorithm presented here uses adaptive networks to simultaneously examine both sequence and structure data, as available from, for example, the Brookhaven Protein Database, and to determine new secondary-structure classes that can be predicted from sequence with high accuracy. These new classes have both similarities to, and differences from, conventional secondary-structure classes. They represent a new, nontrivial classification of protein secondary structure that is predictable from primary sequence.

17.
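Among remote-homology-detection approaches to protein structure prediction, methods based on support vector machines achieve higher accuracy than other methods. However, such methods can only decide whether a target protein belongs to a particular protein structure, whereas practical applications often need a concrete structure prediction directly. A protein structure prediction method based on multi-class support vector machines is proposed, in which a weighted one-versus-rest multi-class scheme is used to jointly evaluate the outputs of standard support vector machines, obtaining a unique...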

18.
A method is presented for predicting the secondary structure of globular proteins from their amino acid sequence. It is based on a rigorous statistical exploitation of the well-known biological fact that the amino acid compositions of each secondary structure are different. We also propose an evaluation process that allows us to estimate the capacity of a method to predict the secondary structure of a new protein that has no homologous proteins of known structure. This evaluation process shows that our method has a prediction accuracy of 58.7% over three states for the 62 proteins of the Kabsch and Sander (1983a) data bank. This result is better than that obtained by the most widely used methods, namely Lim (1974), Chou and Fasman (1978) and Garnier et al. (1978), and also better than that obtained by a recent method based on local homologies (Levin et al., 1986). Our prediction method is very simple and may be implemented on any microcomputer and even on programmable pocket calculators. A simple Pascal implementation of the prediction algorithm is given. The interpretation of our results in terms of protein folding and directions for further work are discussed.

19.
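A word sequence kernel applied to spam filtering   Total citations: 1, self-citations: 0, citations by others: 1
Commonly used kernel functions in support vector machines (SVM) ignore text structure and therefore lose a large amount of semantic information. To address this, a word sequence kernel (WSK) with a category relevance measure is proposed and applied to spam filtering. Features are first extracted from the email text and their category relevance measures computed; the word sequence kernel is then used as the kernel function to train the SVM, with the category relevance measure used during training to compute the decay coefficient of each word; finally the emails are classified. Experiments show that, compared with commonly used kernel functions and the string kernel, the improved word sequence kernel achieves higher classification accuracy and improves the accuracy of spam filtering.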

20.
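Research and implementation of a service-oriented knowledge discovery architecture   Total citations: 11, self-citations: 0, citations by others: 11
Yang Li, Zuo Chun, Wang Yuguo. 《计算机学报》 (Chinese Journal of Computers), 2005, 28(4): 445-457
A Knowledge Discovery Service (KDS) is a high-level service application that is data-, computation-, and semantics-intensive, and users usually need very comprehensive knowledge to use it correctly. Building a KDS architecture that is end-user-oriented, intelligent, and quality-assured faces many difficulties. Existing research has proposed using data mining ontologies and execution-time prediction to help users choose correct, high-quality KDSs. However, a data mining ontology merely enumerates data mining methods and cannot guarantee service quality, while execution-time prediction does not reflect the characteristics of KDS itself, so it is difficult to obtain satisfactory service. To help end users build knowledge discovery applications on a Service Oriented Architecture (SOA) in a self-service manner, this paper proposes a new service-oriented knowledge discovery architecture, SOA4KD. It divides users' knowledge discovery requirements into content requirements and quality requirements and proposes an extended knowledge discovery task ontology, EKDTO, which captures user intent in natural language. Taking the service nature of KDS into account and analyzing the characteristics of KDS itself, it proposes a KDS quality ontology, KDSQO, and uses meta-learning to select the most suitable KDS. Compared with existing architectures, it offers new methods and techniques for providing end users with high-quality knowledge discovery services, and provides a new reference model for the design and implementation of service-oriented knowledge discovery systems.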
