Similar literature
 Found 20 similar articles (search time: 500 ms)
1.
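To extract empirical knowledge from protein structure databases and predict protein interaction sites, this work proposes a method that uses protein sequence profiles as feature vectors and a support vector machine (SVM) for training and prediction. Starting from the primary sequence, the sequence profiles of residues neighboring the target residue along the chain form the input feature vector, and an SVM-based predictor is built to predict protein interaction sites. The prediction accuracy reaches 70.47%, with a correlation coefficient CC = 0.1919. The experimental results indicate that combining protein sequence profiles with an SVM is an effective approach to predicting protein interaction sites.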
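The abstract above builds its input vector from the profiles of residues that neighbor the target residue along the chain. A minimal sketch of that windowing step follows. The function name `window_features`, the zero-padding at the chain termini, and the toy 3-column profile are assumptions for illustration; the abstract does not state the window size or how the ends are handled, and a real sequence profile has 20 columns per residue.

```python
from typing import List


def window_features(profile: List[List[float]], center: int, half_width: int) -> List[float]:
    """Concatenate profile rows for residues center-half_width .. center+half_width.

    Positions that fall off either end of the chain are zero-padded
    (an assumption; the abstract does not say how termini are treated).
    """
    n_cols = len(profile[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos])
        else:
            features.extend([0.0] * n_cols)  # padding beyond the termini
    return features


# Toy profile: 4 residues x 3 columns (hypothetical values).
toy_profile = [
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
    [0.9, 0.0, 0.1],
]

# Feature vector for the first residue with one neighbor on each side.
vec = window_features(toy_profile, center=0, half_width=1)
```

Each residue's vector, labeled as interface or non-interface, would then be passed to an SVM implementation of the reader's choice.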

2.
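Wang Feilu (王菲露), Song Yang (宋杨). Computer Simulation (《计算机仿真》), 2012, 29(2): 184-187
In biochemical experiments, the accuracy of protein secondary structure prediction is limited by randomness in how the collected information is processed and how parameters are selected and set. To address this, and exploiting the fast learning and robustness of generalized regression neural networks (GRNNs), a GRNN-based method for predicting protein secondary structure is proposed. Because the encoding scheme strongly affects prediction accuracy, several GRNN predictors are first built using a 5-bit encoding with different sliding-window sizes, giving fairly good results. The input vectors are then constructed from sequence profile (Profile) encoding, which is rich in evolutionary information, and the GRNN predictors are rebuilt with several spread values for each sliding-window size, which greatly improves prediction accuracy. Simulation results demonstrate the effectiveness and feasibility of the prediction model.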
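To make the role of the spread value concrete, here is a minimal sketch of a GRNN prediction step in its textbook form: a Gaussian-kernel weighted average of the training targets. The function name `grnn_predict`, the exact kernel width convention (`exp(-d^2 / (2 * spread^2))`), and the toy data are assumptions; the paper's own toolbox may scale spread differently.

```python
import math
from typing import List, Sequence


def grnn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Sequence[float]],
    query: Sequence[float],
    spread: float,
) -> List[float]:
    """Return the kernel-weighted average of training targets for one query."""
    weights = []
    for x in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))  # squared Euclidean distance
        weights.append(math.exp(-d2 / (2.0 * spread ** 2)))
    total = sum(weights)  # can underflow to 0 if spread is tiny relative to distances
    n_out = len(train_y[0])
    return [sum(w * y[k] for w, y in zip(weights, train_y)) / total for k in range(n_out)]


# Toy data: two encoded windows with one-hot class targets (hypothetical).
toy_x = [[0.0, 0.0], [1.0, 1.0]]
toy_y = [[1.0, 0.0], [0.0, 1.0]]

near_first = grnn_predict(toy_x, toy_y, query=[0.0, 0.0], spread=0.5)
midpoint = grnn_predict(toy_x, toy_y, query=[0.5, 0.5], spread=0.5)
```

A small spread makes the output follow the nearest training pattern closely, while a large spread smooths across many patterns, which is presumably why the abstract tunes it per window size.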

3.
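Predicting the structural class of non-homologous proteins from primary sequence using the SIMCA method   Total citations: 1 (self-citations: 1; citations by others: 1)
Correctly identifying a protein's structural class is very important for predicting its tertiary structure, so more advanced algorithms are needed to improve prediction accuracy. The SIMCA method was applied to amino acid composition, feature parameters extracted with autocorrelation coefficients, and amino acid pair content to predict protein structural classes. With the hydrophobicity values of Miyazawa and Jernigan, the self-consistency test accuracies for the All-α, All-β, and αβ classes were 89%, 91%, and 89%, and the independent test accuracies were 74%, 87%, and 91%, respectively. After amino acid pair content was introduced, the self-consistency test accuracies were 86%, 89%, and 90%, and the independent test accuracies were 77%, 88%, and 93%. SIMCA outperformed the Bayesian discriminant function method, and introducing amino acid pairs can improve prediction accuracy.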
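Two of the feature sets named in the abstract above, amino acid composition and amino acid pair content, can be sketched as follows. The function names, the alphabetical residue ordering, and the use of overlapping adjacent pairs are assumptions; the abstract does not define how pairs are counted, and the autocorrelation features and SIMCA itself are not shown.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 amino acids in the sequence (20 values)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def pair_content(seq: str) -> List[float]:
    """Fraction of each adjacent residue pair (400 values, overlapping pairs)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


# Toy sequence (hypothetical).
comp = composition("ACAC")
pairs = pair_content("ACAC")
```

Non-standard residue codes are counted in the denominators but get no feature slot here; a real pipeline would need to decide how to treat them.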

4.
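Predicting intrinsically disordered proteins using central amino acid composition   Total citations: 1 (self-citations: 0; citations by others: 1)
In predicting intrinsically disordered protein structure, feature extraction is difficult for short disordered regions. To address this, a prediction method that incorporates central amino acid composition is proposed. A sliding window is used to compute the frequency with which each of the 20 amino acids occurs within the window, and a sub-predictor is built on these frequencies. The statistical probability that the amino acid at the window center forms a disordered structure is computed and used as a new feature parameter. The sub-predictor output and the new feature parameter are each assigned a coefficient and combined as a weighted sum, giving a combined-model predictor for intrinsically disordered protein structure. Experimental results show that the predictor significantly improves prediction accuracy for short disordered regions while maintaining high accuracy for long disordered regions.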
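The two ingredients described above, window frequencies and a weighted combination with the center-residue statistic, might look like the sketch below. Everything named here is hypothetical: `window_frequencies`, `combined_score`, the truncation of windows at the chain ends, and the example weights and scores. The abstract gives neither the sub-predictor's form nor the coefficient values.

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_frequencies(seq: str, center: int, half_width: int) -> Dict[str, float]:
    """Frequency of each amino acid inside the window around `center`.

    The window is truncated at the chain ends (an assumption).
    """
    lo = max(0, center - half_width)
    hi = min(len(seq), center + half_width + 1)
    window = seq[lo:hi]
    return {a: window.count(a) / len(window) for a in AMINO_ACIDS}


def combined_score(sub_score: float, center_propensity: float,
                   w_sub: float, w_center: float) -> float:
    """Weighted sum of the sub-predictor output and the center-residue statistic."""
    return w_sub * sub_score + w_center * center_propensity


# Toy values: the sub-predictor score and propensity are made up for illustration.
freqs = window_frequencies("AAGPA", center=2, half_width=1)
score = combined_score(sub_score=0.4, center_propensity=0.8, w_sub=0.7, w_center=0.3)
```

In practice the frequencies would feed the sub-predictor, and the two coefficients would be fitted on training data rather than set by hand.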

5.
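In a pattern recognition system, different training samples contribute differently to building the classification model. Previous methods for predicting protein association structure (蛋白质关联结构) randomly select part of the sample set as training samples for the classifier, which lowers the classifier's prediction accuracy. To reduce this effect, this paper proposes a prediction method based on sample selection and a BP neural network. The method encodes attributes related to protein association structure and uses sample selection to pick a set of high-quality samples from the encoded sample set to build the prediction model. Using the proposed encoding, 200 proteins obtained from the Protein Data Bank (PDB) were encoded, training samples were selected with the nearest-neighbor algorithm, and a BP neural network was used to build the prediction model. Experimental results show that selecting training samples effectively improves prediction accuracy.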

6.
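Predicting protein thermostability from primary structure information is important for computational screening of thermostable proteins. This paper uses the k-nearest neighbor algorithm to predict protein thermostability from sequence, evaluated by three methods: a self-consistency test, cross-validation, and an independent sample test. With only the composition of the 20 amino acids as feature variables, the recognition accuracies were 100%, 87.7%, and 89.6%, respectively. After 8 new variables were introduced, the accuracies were 100%, 89.6%, and 90.2%, and the accuracy for small protein molecules rose by 2.4%. The effect of protein size on recognition performance is also discussed.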
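A minimal k-nearest neighbor classifier of the kind the abstract above describes is sketched here. The Euclidean distance, majority vote, value of k, class labels, and 2-dimensional toy features are all assumptions for illustration; the paper uses 20 composition features (later 28) and does not state its distance measure or k in the abstract.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int) -> str:
    """Label the query by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: (feature vector, label); values are hypothetical.
toy_train = [
    ([0.10, 0.05], "thermophilic"),
    ([0.12, 0.06], "thermophilic"),
    ([0.11, 0.07], "thermophilic"),
    ([0.02, 0.15], "mesophilic"),
    ([0.03, 0.14], "mesophilic"),
]

label = knn_predict(toy_train, query=[0.09, 0.06], k=3)
```

`math.dist` requires Python 3.8 or later.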

7.
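Yang Bingru (杨炳儒), Zhou Zhun (周谆), Hou Wei (侯伟). Application Research of Computers (《计算机应用研究》), 2009, 26(12): 4617-4620
Protein secondary structure prediction is one of the most important tasks in bioinformatics. After more than thirty years of research some progress has been made; in particular, the recent introduction of ensemble and hybrid prediction models has improved prediction accuracy to some extent. However, a large gap remains before tertiary structure can be derived from secondary structure. To effectively improve the accuracy of protein secondary structure prediction, building on extension studies of the KDTICM theory and on the KDD* model, and using KAAPRO, an association-analysis method for protein secondary structure prediction based on the KDD* model, this work proposes, with a compound distance measure based on support and confidence, a CBA (classification based on association)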

8.
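AlphaFold, developed by DeepMind, has achieved an unprecedented breakthrough in protein structure prediction and has had a revolutionary impact on life science research. Large-scale structure prediction made it possible to build the AlphaFold structure prediction database, which contains more than 200 million proteins and covers the complete proteomes of dozens of species. This review describes recent progress in using statistical physics methods to study protein evolution in the "post-AlphaFold era". Traditional studies of protein evolution usually focus on the sequences or structures of proteins within a single family (the microscopic view). With the massive number of protein structures predicted by AlphaFold, researchers can widen their view to large collections of proteins, or even directly compare all the proteins of different species, and mine them for statistical trends (the macroscopic view). Using the AlphaFold database and comparing proteins of similar chain length across more than 40 model organisms, researchers have found statistical regularities in protein molecular evolution. As species complexity increases, protein structures tend toward greater flexibility and modularity, protein sequences tend to show more pronounced segregation of hydrophilic and hydrophobic segments, and the functional specificity of proteins continues to rise. These AlphaFold-based statistical studies link molecular evolution to species evolution and help explain how biological complexity evolves.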

9.
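Proteins interact with other molecules through binding sites, so predicting protein binding sites is of great importance. Many prediction methods exist, but they suffer from low hit rates or high computational cost. This paper introduces a protein site prediction method based on structural alignment. A homology index is used during structural alignment to find the corresponding homologous templates, which are then structurally aligned with the target; the ligands of structurally similar templates are mapped onto the target protein, and the resulting sites are analyzed by clustering. The results show that, compared with other prediction methods, this method reduces computational cost and improves prediction accuracy.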

10.
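In bioinformatics, artificial intelligence methods have been highly successful at predicting the physicochemical properties and biological activity of drug molecules, and neural networks in particular are widely used in drug development. However, shallow neural networks have low prediction accuracy and deep neural networks are prone to overfitting, while model ensembling (模型融合) strategies promise to improve the predictive ability of weak learners. This work therefore applies model ensembling to the prediction of drug molecule properties for the first time: the chemical structures of drug molecules are encoded, and shallow neural networks are combined by averaging and by stacking to improve the prediction of drug molecule pKa. Compared with deep learning methods, the stacking model has higher prediction accuracy, with a correlation coefficient of 0.86. Combining several weak-learner neural networks allows them to reach the prediction accuracy of a deep neural network while retaining better generalization. The results indicate that model ensembling can improve the accuracy and reliability of neural network predictions of drug molecule pKa.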

11.
After the atomic coordinates themselves, the most important data in a homology model are the spatial reliability estimates associated with each of the atoms (atom annotation). Recent blind homology modeling predictions have demonstrated that principally correct sequence-structure alignments are achievable to sequence identities as low as 25% [Martin, A.C., MacArthur, M.W., Thornton, J.M., 1997. Assessment of comparative modeling in CASP2. Proteins Suppl(1), 14-28]. The locations and extent of spatial deviations in the backbone between correctly aligned homologous protein structures remained very poorly estimated however, and these errors were the cause of errant loop predictions [Abagyan, R., Batalov, S., Cardozo, T., Totrov, M., Webber, J., Zhou, Y., 1997. Homology modeling with internal coordinate mechanics: deformation zone mapping and improvements of models via conformational search. Proteins Suppl(1), 29-37]. In order to derive accurate measures for local backbone deviations, we made a systematic study of static local backbone deviations between homologous pairs of protein structures. We found that 'through space' proximity to gaps and chain termini, local three-dimensional 'density', three-dimensional environment conservation, and B-factor of the template contribute to local deviations in the backbone in addition to local sequence identity. Based on these findings, we have identified the meaningful ranges of values within which each of these parameters correlates with static local backbone deviation and produced a combined scoring function to greatly improve the estimation of local backbone deviations. The optimized function has more than twice the accuracy of local sequence identity or B-factor alone and was validated in a recent blind structure prediction experiment. This method may be used to evaluate the utility of a preliminary homology model for a particular biological investigation (e.g. drug design) or to provide an improved starting point for molecular mechanics loop prediction methods.

12.
The antigenic index: a novel algorithm for predicting antigenic determinants   Total citations: 23 (self-citations: 0; citations by others: 23)
In this paper, we introduce a computer algorithm which can be used to predict the topological features of a protein directly from its primary amino acid sequence. The computer program generates values for surface accessibility parameters and combines these values with those obtained for regional backbone flexibility and predicted secondary structure. The output of this algorithm, the antigenic index, is used to create a linear surface contour profile of the protein. Because most, if not all, antigenic sites are located within surface exposed regions of a protein, the program offers a reliable means of predicting potential antigenic determinants. We have tested the ability of this program to generate accurate surface contour profiles and predict antigenic sites from the linear amino acid sequences of well-characterized proteins and found a strong correlation between the predictions of the antigenic index and known structural and biological data.

13.
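With the development of bioinformatics, motif discovery has become a way to extract useful biological information from biological sequences. This paper introduces some concepts related to motifs and discusses the EM (expectation maximization) algorithm, which is the foundation of the motif discovery algorithm MEME. Because MEME is built on EM, the MEME algorithm is then introduced, and basic issues such as its time complexity and performance are discussed in detail, along with its limitations and areas for improvement. Practice shows that MEME is a good motif discovery algorithm: it can identify one or more motifs in protein or DNA sequences and is highly flexible.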

14.
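In protein sequence alignment studies, proteins that share similar patterns often have similar functions. Known protein sequence patterns make it convenient to study and confirm the functional structure of new protein sequences, so the discovery of protein sequence patterns has become a meaningful topic. The pattern-driven Pratt algorithm is improved to raise its efficiency: a fuzzy query method is added to the original algorithm so that the most representative protein patterns can be found more quickly in a set of unrelated protein sequences.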

15.
SOMAP: a novel interactive approach to multiple protein sequences alignment   Total citations: 3 (self-citations: 0; citations by others: 3)
A novel interactive method for generating multiple protein sequence alignments is described. The program has no internal limit to the number or length of sequences it can handle and is designed for use with DEC VAX processors running the VMS operating system. The approach used is essentially one of manual sequence manipulation, aided by built-in symbolic displays of identities and similarities, and strict and 'fuzzy' (ambiguous) pattern-matching facilities. Additional flexibility is provided by means of an interface to a publicly available automatic alignment system and to a comprehensive sequence analysis package.

16.
The protein structure code: what is its present status?   Total citations: 3 (self-citations: 0; citations by others: 3)
Current methods of prediction of protein conformation are reviewed and the algorithms on which they rely are presented. For non-homologous proteins and after cross-validation the reported methods exhibit a probability index, i.e. the per cent of correctly predicted residues per predicted residues, of 63-65% with a standard deviation of the order of 7% for three conformational states--helix, beta-strand and coil. This present limitation in the accuracy of predictions that use only the information of the local sequence can be related essentially to the effect of long-range interactions specific for each protein family. The methods based on sequence similarity can improve the accuracy of prediction by expressing explicitly the homology of the protein to be predicted with proteins in the database. In these circumstances the probability index can reach 87% with a standard deviation of 6.6%. This property can be used for modeling homologous proteins by aiding in amino acid sequence alignments. The prediction of the tertiary structure of a protein is still limited to the case of modeling a structure based on the known three-dimensional structure of a homologous protein.
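The probability index defined in the abstract above, the per cent of correctly predicted residues per predicted residues over three states, can be computed as in this short sketch. The function name `percent_correct`, the one-letter state codes (H for helix, E for beta-strand, C for coil), and the toy strings are assumptions for illustration.

```python
def percent_correct(predicted: str, observed: str) -> float:
    """Per cent of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


# Toy example: 8 residues, 7 of which are predicted correctly.
index = percent_correct("HHHEECCC", "HHHEEECC")  # 87.5
```

The standard deviations quoted in the abstract would come from computing this index per protein and taking the spread across the test set.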

17.
The objective of this study is to develop a new online model for wheel wear that takes into account the track flexibility. The proposed model consists of two parts that interact with each other, namely, (a) a locomotive/track coupled dynamics model considering the track flexibility, which is validated by field measurement results, and (b) a model for the wear estimation. The wheel wear prediction model can be employed in online solutions rather than in post-processing. The effect of including the track flexibility on the wear estimation is investigated by comparing the results with those obtained for a rigid track. Moreover, the effect of the wheel profile updating strategy on the wheel wear is also examined. The simulation results indicate that the track flexibility cannot be neglected for the wheel wear prediction. The wear predicted with the rigid track model is generally larger than that predicted with the flexible track model. The strategy of maintaining unchanged wheel profiles during the dynamic simulation coincides with the online updating strategy in terms of the predicted wear.

18.
Lapedes, Alan S.; Steeg, Evan W.; Farber, Robert M. Machine Learning, 1995, 21(1-2): 103-124
We present an adaptive, neural network method that determines new classes of protein secondary structure that are significantly more predictable from local amino-acid sequence than conventional classifications. Accurate prediction of the conventional secondary-structure classes, alpha-helix, beta-strand, and coil, from primary sequence has long been an important problem in computational molecular biology, with many ramifications, including multiple-sequence alignment, prediction of functionally important regions of proteins, and prediction of tertiary structure from primary sequence. The algorithm presented here uses adaptive networks to simultaneously examine both sequence and structure data, as available from, for example, the Brookhaven Protein Database, and to determine new secondary-structure classes that can be predicted from sequence with high accuracy. These new classes have both similarities to, and differences from, conventional secondary-structure classes. They represent a new, nontrivial classification of protein secondary structure that is predictable from primary sequence.

19.
The unfolding of a protein can be described as a transition from a predominantly rigid, folded structure to an ensemble of denatured states. During unfolding, the hydrogen bonds and salt bridges break, destabilizing the secondary and tertiary structure. Our previous work shows that the network of covalent bonds, salt bridges, hydrogen bonds, and hydrophobic interactions forms constraints that define which regions of the native protein are flexible or rigid (structurally stable). Here, we test the hypothesis that information about the folding pathway is encoded in the energetic hierarchy of non-covalent interactions in the native-state structure. The incremental thermal denaturation of protein structures is simulated by diluting the network of salt bridges and hydrogen bonds, breaking them one by one, from weakest to strongest. The structurally stable and flexible regions are identified at each step, providing information about the evolution of flexible regions during denaturation. The folding core, or center of structure formation during folding, is predicted as the region formed by two or more secondary structures having the greatest stability against denaturation. For 10 proteins with different architectures, we show that the predicted folding cores from this flexibility/stability analysis are in good agreement with those identified by native-state hydrogen-deuterium exchange experiments.

20.
Protein xylosyltransferases are the group of enzymes which are involved in transferring xylose from UDP-D-xylose to a serine residue in a protein. These enzymes are commonly found in multicellular organisms and in some unicellular organisms. Previously we had identified the xylosyltransferase (XT) genes in EST sequences of a unicellular organism, Trichomonas vaginalis, through an in silico approach based on sequence homology. To corroborate that these genes are putative XT genes, we designed a workflow based on the sequence characteristics of xylosyltransferase to verify whether any of the putative XT gene sequences have sequence motifs. The XT genes in T. vaginalis predicted by Hidden Markov Model (HMM) were further analyzed with PfamHMM to identify whether each putative sequence belongs to a known protein family, with TMHMM to examine whether the predicted XTs are Golgi xylosyltransferases, and with MEME to find the conserved motifs. The results confirmed our earlier study that these XTs are related to N-linked XTs in plants. To confirm the in silico results further, we analyzed the N-linked glycans of T. vaginalis, and the empirical data also confirmed the computational analysis.
