Similar literature
19 similar articles found (search time: 328 ms)
1.
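As genetic engineering produces large numbers of new sequences, protein sequence databases are growing rapidly. The need to analyze functional groups and family relationships in massive protein data sets has made protein sequence clustering an important research goal in structural and functional genomics, and applying data mining techniques to cluster biological data has become a hot topic in bioinformatics. The CLARA partitioning algorithm is widely used in other fields but has rarely been applied to clustering large protein sequence data sets. This article applies CLARA to cluster protein sequences selected from a benchmark database and compares the results with those of several other protein clustering algorithms.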
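
The CLARA idea named in this abstract (run a k-medoids search on small random samples, then keep the medoid set that is cheapest on the full data) can be sketched as follows. This is a minimal illustration, not the article's implementation: the function names, the greedy swap search standing in for PAM, the default sample size `40 + 2*k`, and the caller-supplied `dist` function are all assumptions.

```python
import random

def total_cost(items, medoids, dist):
    """Sum of distances from every item to its nearest medoid."""
    return sum(min(dist(x, m) for m in medoids) for x in items)

def pam(sample, k, dist):
    """Greedy swap-based k-medoids on a small sample (a simplified PAM)."""
    medoids = list(sample[:k])
    best = total_cost(sample, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(sample, trial, dist)
                if cost < best:  # accept any swap that lowers the cost
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(items, k, dist, n_samples=5, sample_size=None, seed=0):
    """Run PAM on several random samples; score each result on ALL items."""
    rng = random.Random(seed)
    size = min(len(items), sample_size or 40 + 2 * k)  # assumed default
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(items, size)
        medoids = pam(sample, k, dist)
        cost = total_cost(items, medoids, dist)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

For protein sequences, `dist` would have to be some sequence dissimilarity (for example one derived from alignment scores); the abstract does not say which measure the article uses.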

2.
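Outer membrane proteins are located on the bacterial surface and are therefore of significant research value for antibiotic and vaccine development. Accurately discriminating outer membrane proteins from globular proteins and inner membrane proteins is an important task, both for identifying outer membrane proteins in genomic sequences and for predicting their secondary and tertiary structures. In recent years several methods have been proposed for predicting outer membrane proteins from protein sequences. This paper uses a new kernel method, the kernel nearest neighbor algorithm, combined with the subsequence distribution of protein sequences to predict outer membrane proteins, and compares it with support vector machines and the traditional nearest neighbor algorithm. The results show that the proposed algorithm performs no worse than existing prediction methods while being simpler and easier to implement. We also find that residue order plays an important role in outer membrane protein prediction.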
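
A kernel nearest neighbor classifier measures distance in the kernel's feature space, using d²(x, y) = K(x, x) - 2K(x, y) + K(y, y). The sketch below assumes a simple k-mer count (spectrum) kernel as a stand-in for the "subsequence distribution"; the abstract does not specify the kernel or the features, so `spectrum_kernel`, `k=2`, and the toy labels are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k=2):
    """Count every length-k subsequence (k-mer) of the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=2):
    """Inner product of k-mer count vectors (an assumed kernel choice)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[s] * cb[s] for s in ca)  # Counter gives 0 for missing keys

def kernel_distance_sq(a, b, kernel):
    """Squared distance in the feature space induced by the kernel."""
    return kernel(a, a) - 2 * kernel(a, b) + kernel(b, b)

def kernel_nn_predict(query, training, kernel=spectrum_kernel):
    """Return the label of the training sequence nearest to the query."""
    _, label = min(
        training,
        key=lambda item: kernel_distance_sq(query, item[0], kernel),
    )
    return label
```

With a linear kernel like this one the result is the same as ordinary nearest neighbor on k-mer counts; the kernel formulation matters once a nonlinear kernel is substituted for `spectrum_kernel`.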

3.
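Chemical modification experiments were combined with computation of protein structural information to study the relationship between chemical modification of amino acid residues in an enzyme and its structural information. Taking the thermophilic xylanase from Thermotoga maritima as the subject and using PDB entry 1VBR as the template, structural properties such as solvent accessibility, hydrogen bonds, and number of salt bridges were calculated for the tryptophan, glutamic acid, and aspartic acid residues in the sequence and compared with the experimental chemical modification results for the enzyme. The results show that, of the three tryptophans in the active center, the two with high accessibility, Trp802 and Trp602, have a larger effect on enzyme activity; the glutamic acid and aspartic acid residues in the sequence form many hydrogen bonds and salt bridges, and modifying them strongly affects the thermostability of the enzyme. These results help deepen the understanding of structural properties related to chemical modification in proteins and lay a foundation for structure-based enzyme modification.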

4.
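A neural-network-based gene classifier   Cited by: 8 (self-citations: 4, citations by others: 4)
With the results of the Human Genome Project and the development of bioinformatics, the volume of DNA, RNA, and protein data has grown at an unprecedented rate. Mining useful knowledge from these biological data is particularly important for genome processing. In particular, predicting protein function by classifying DNA sequences is a core goal of molecular biology research. A feature extraction method is introduced and implemented, and applied to construct a neural network classifier for DNA sequences of unknown class; the classification results are compared and analyzed against those of the widely used BLAST algorithm.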

5.
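To promote the development of aptamer biosensors for rapid detection of Escherichia coli, the interaction between random RNA sequences and the intimin protein of enteropathogenic E. coli was simulated and predicted, based on an analysis of known RNA-protein interaction principles and complex structures and on a thorough survey of the relevant literature and of web resources built on molecular simulation techniques. The results show that the higher-order structure of RNA depends mainly on the sequence information of its primary structure. When NPDock was used to simulate the interaction of different random RNA sequences with intimin, RNA sequences of different lengths could all interact with intimin, but the interaction sites and relative positions differed somewhat; for RNA sequences of the same length but different arrangement, the differences in interaction were mainly related to the sequence arrangement. The feasibility of the molecular simulation approach to RNA-intimin interaction was verified with an online RNA-protein interaction site prediction method (PRIdictor). The predicted protein and RNA interaction sites were all located on the contact interface of the predicted interaction structure, indicating that simulation-based prediction of RNA-protein interaction is feasible to some extent. This will help improve aptamer screening through designed synthetic RNA, and support the dissemination and innovative application of related biotechnologies.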

6.
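Constructing a phylogenetic tree of cytochrome c across organisms is important for studying species differences in protein primary structure. In this paper, protein primary sequences are converted into time series by a one-dimensional mapping, and the DTW (dynamic time warping) algorithm is used to compute the DTW distance between two time series as a measure of sequence similarity. This gives a new algorithm for measuring protein sequence similarity, which is used to analyze the similarity of cytochrome c primary sequences from different species and to build a sequence phylogenetic tree, with good results. The method is simpler and faster than other methods and provides a new tool for studying evolutionary relationships among biological sequences.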
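
The two steps in this abstract (map each residue to a number, then compare the resulting series with DTW) can be sketched as below. The abstract does not say which one-dimensional mapping is used; the Kyte-Doolittle hydropathy scale here is an assumed stand-in, and `to_series` and `dtw_distance` are hypothetical names.

```python
# Kyte-Doolittle hydropathy values: an ASSUMED one-dimensional mapping,
# not necessarily the one used in the paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def to_series(seq):
    """Convert a protein sequence into a numeric time series."""
    return [KD[aa] for aa in seq]

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a, step in b, step in both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

A pairwise matrix of `dtw_distance(to_series(s), to_series(t))` values over the cytochrome c sequences could then be passed to any distance-based tree-building method.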

7.
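Atomic force microscopy (AFM) has important applications in studying DNA-protein interactions. This paper studies methods for analyzing AFM images of DNA-binding proteins. By analyzing AFM images of DNA and of DNA bound to protein, the length of DNA of known sequence and the position of an unknown DNA-binding protein on the DNA strand can be calculated, from which the sequence of the protein's DNA binding site can be estimated. The results demonstrate that analysis of AFM images can yield positional information for DNA-binding proteins. This work is significant for finding new DNA-binding proteins and for studying the molecular mechanisms of replication, transcription, repair, and recombination of genetic information in organisms.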

8.
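Studying the characteristics of proteins from the psychrophilic halophile H. lacusprofundi in different structural regions is important for understanding the structural basis of their stability and for designing novel psychrophilic halophilic proteins. Sequence and higher-order structure information for 991 homologous pairs from psychrophilic halophiles and mesophilic halophiles was compared statistically. The results show that, in amino acid composition, the two groups differ significantly in leucine, asparagine, sulfur-containing amino acids, and charged amino acids; very significantly in threonine; and extremely significantly in alanine, glycine, glutamine, and small amino acids. Phenylalanine, glycine, asparagine, and neutral amino acids are more abundant in psychrophilic halophile proteins, whereas leucine, threonine, glutamine, sulfur-containing amino acids, and small amino acids are more abundant in mesophilic halophiles. Psychrophilic halophilic proteins contain more neutral, small, and nonpolar amino acids, which reduces non-covalent interactions such as hydrogen bonds in the protein and increases the flexibility and variability of the protein structure.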

9.
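Based on the two-dimensional off-lattice HP model of proteins and an improved simulated annealing algorithm, the role of long- and short-range interactions in protein folding was studied. Folded conformations of sequences such as 1ECD, 2RNS, 1PHT, and 1WBC were obtained by experiment and, using the structural information for these sequences provided in the PDB, the influence of long-range interactions on protein conformation was discussed in detail. The study shows that long-range interactions rank first among the many factors influencing the formation and stability of tertiary structure.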

10.
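In protein sequence alignment studies, proteins with similar patterns often have similar functions. Known protein sequence patterns make it convenient to study and confirm the function and structure of new protein sequences. Pattern discovery in protein sequences has therefore become a meaningful topic. The pattern-driven Pratt algorithm is improved to raise its efficiency: a fuzzy query method is introduced on top of the original algorithm, making it possible to find the most representative protein patterns in a set of unrelated protein sequences more quickly.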

11.
It is still not very clear to what extent and how the amino acid sequences of proteins determine their tertiary structures. In this paper, we report our investigations of the sequence-structure relations of the proteins in the beta-propeller fold family, which adopt highly symmetrical tertiary structures while their sequences appear "random". We analyzed the amino acid sequences using a similarity matrix plus the Pearson correlation method and found that the sequences show the same symmetries as their tertiary structures only if we relax the conditions of sequence similarity. This suggests that some key residues may play an important role in the formation of the tertiary structures of these proteins.

12.
A lot of evidence suggests that many proteins with symmetric structures have evolved by internal duplication and fusion. Moreover, many internal sequence repeats correspond to functional and structural units. The internal structural symmetry of these proteins implies that their sequences should be made up of identical repeats. However, many of these repeat signals can so far be seen only at the structural level. We have developed a de novo algorithm, modified recurrence correlation analysis, to detect the symmetries in the primary sequences of immunoglobulin folds (Ig folds), which adopt highly symmetrical tertiary structures while their sequences appear nearly random. Using this method, we show that the internal repetitions of the immunoglobulin folds can be identified directly at the sequence level. These results may help in studying hypotheses about the origin of Ig folds by duplication of simpler fragments, and may also help in understanding the relationship between sequences and their tertiary structures.

13.
Computers & Chemistry, 1994, 18(3): 233-243
Many protein sequences contain motifs which display similarity. The similarities between the repeats are a result of gene duplication and/or gene fusion. The evolutionary role of repeats within protein sequences is considered, and some examples of repeats are given, ranging from tandem repeats to multiple types of repeats which are sequentially interspersed. Existing computer methods to delineate repeats in individual protein sequences are discussed, and a novel sensitive repeat recognition method is introduced.

14.
It has been noted that natural proteins adopt only a limited number of folds. Several researchers have investigated why and how nature has selected this small number of folds. Using simple models of protein folding, we demonstrate systematically that there is a "designability principle" behind nature's selection of protein folds. The designability of a structure (fold) is measured by the number of sequences that can design the structure, that is, sequences that possess the structure as their unique ground state. Structures differ drastically in terms of their designability. A small number of highly designable structures emerge with a number of associated sequences much larger than the average. These highly designable structures possess proteinlike secondary structures, motifs, and even tertiary symmetries. In addition, they are thermodynamically more stable and fold faster than other structures. These results suggest that protein structures are selected in nature because they are readily designed and stable against mutations, and that such a selection simultaneously leads to thermodynamic stability.

15.
A new approach to searching for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a 'basic' one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data. This approach allows one to search for similar segments which can differ in both substitutions and deletions/insertions. These segments can be situated at different positions in various sequences. No regions of complete or strong similarity within the segments are required. The other parts of the sequences can have no similarity at all. The only requirement is that the similar segments can be found in all the sequences (or in the majority of them, provided the common segments are present in the basic sequence). The running time of the algorithm presented is proportional to n·L^2 when n sequences of length L are analyzed. The proposed algorithm is implemented as programs for the IBM-PC and IBM/370. Its applications to the analysis of biopolymer primary structures, as well as the dependence of the results on the choice of basic sequence, are discussed.

16.
The number of proteins that fold into a given structure varies drastically from structure to structure. The designability of a protein structure, defined as the number of sequences that have that structure as their unique lowest-energy state, is studied in this paper using a simplified lattice model. The two-letter (HP) code and the pair-contact energy model are employed to formulate the relationship between protein sequences and compact structures. Because different dimensions are correlated, principal component analysis (PCA) is carried out to remove these correlations and to develop reliable approximations of the probability density functions of the protein sequences and the compact structures. An estimate of designability is derived from these probability density functions. Good correlation is achieved between the estimated designabilities and those obtained through enumerative calculations.
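
A pair-contact energy for a two-letter (HP) sequence on a lattice can be sketched as below. The abstract gives neither the lattice nor the contact energies, so this sketch assumes a 2D square lattice and the common convention of -1 per contact between H residues that are lattice neighbors but not chain neighbors; `hp_energy` and `e_hh` are hypothetical names.

```python
def hp_energy(sequence, coords, e_hh=-1.0):
    """Contact energy of an HP sequence placed on a 2D square lattice.

    sequence: string over {'H', 'P'}
    coords:   one (x, y) lattice site per residue, in chain order
    e_hh:     ASSUMED energy per non-bonded H-H contact
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    position = {tuple(c): i for i, c in enumerate(coords)}
    if len(position) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0.0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # look only right and up so that each contact is counted once
        for dx, dy in ((1, 0), (0, 1)):
            j = position.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy += e_hh
    return energy
```

Designability would then follow by enumeration: for every sequence, evaluate `hp_energy` over all compact structures and credit the structure that is the unique minimum, if there is one.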

17.
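AlphaFold, developed by DeepMind, has achieved an unprecedented breakthrough in protein structure prediction and has had a revolutionary impact on life science research. Large-scale structure prediction has made possible the AlphaFold structure prediction database, which contains more than 200 million proteins and covers the complete proteomes of dozens of species. This review presents some recent advances in using statistical physics methods to study protein evolution in the "post-AlphaFold era". Traditional studies of protein evolution tend to focus on the sequences or structures of proteins within one family (a microscopic view). With the vast number of protein structures predicted by AlphaFold, researchers can broaden the view to large collections of proteins, or even directly compare all the proteins of different species, and mine them for statistical trends (a macroscopic view). Using the AlphaFold database to compare proteins of similar chain length across more than 40 model organisms, researchers found statistical regularities in the molecular evolution of proteins. As organismal complexity increases, protein structures tend toward greater flexibility and modularity, protein sequences tend toward more pronounced segregation of hydrophilic and hydrophobic segments, and the functional specificity of proteins also increases. These AlphaFold-based statistical studies establish a link between molecular evolution and species evolution and help in understanding the evolution of biological complexity.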

18.
Grammatical inference in bioinformatics   Cited by: 1 (self-citations: 0, citations by others: 1)
Bioinformatics is an active research area aimed at developing intelligent systems for analyses of molecular biology. Many methods based on formal language theory, statistical theory, and learning theory have been developed for modeling and analyzing biological sequences such as DNA, RNA, and proteins. In particular, grammatical inference methods are expected to find grammatical structures hidden in biological sequences. In this article, we give an overview of a series of our grammatical approaches to biological sequence analysis and related research, focusing on learning stochastic grammars from biological sequences and predicting their functions based on the learned stochastic grammars.

19.
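Sequence alignment of amino acid sequences is a powerful tool for studying the relationship between protein structure and function. Using the macro facility of Visual Basic for Applications, this paper builds a model in Excel based on the Needleman-Wunsch algorithm to align the amino acid sequences of two human tumor antibodies and identify their homology. Compared with some commonly used sequence analysis software, this method is simple, intuitive, and easy to adopt.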
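
The Needleman-Wunsch global alignment named in this abstract fills a score table by dynamic programming and then traces back from the bottom-right cell. The sketch below is in Python rather than the VBA/Excel model the paper describes, and it assumes a plain match/mismatch/linear-gap scoring scheme; the paper's actual scoring (for example a substitution matrix) is not given in the abstract.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b).

    The scoring parameters are ASSUMED illustrative values.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

When several alignments share the best score, the traceback returns only one of them, preferring the diagonal move.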
