Similar literature
 Found 20 similar articles (search time: 93 ms)
1.
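徐占 (Xu Zhan), 董洪伟 (Dong Hongwei). 《计算机工程》 (Computer Engineering), 2010, 36(14): 233-234
Starting from the structural properties of proteins, the method uses a structural alphabet and the chaos game representation (CGR) walk to map three-dimensional protein structure information into a two-dimensional coordinate space. The resulting images are analyzed to identify the main structure of each protein molecule and to obtain the coordinates of each structural point in the CGR plot; the Hausdorff distance is then used to judge the similarity of the proteins being compared. The method turns protein similarity comparison from a structure problem into a sequence problem and exploits the strength of the Hausdorff distance in comparing two point sets, providing a simple and effective approach to protein similarity comparison.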
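The two ingredients named in this abstract, a CGR walk and the Hausdorff distance between point sets, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the structural alphabet or the vertex layout, so the `alphabet` argument and the placement of vertices on the unit circle are assumptions.

```python
import math

def cgr_points(sequence, alphabet):
    """Chaos game walk: each symbol moves the current point halfway
    toward the vertex assigned to that symbol."""
    n = len(alphabet)
    # Assumed layout: one vertex per symbol, evenly spaced on the unit circle.
    vertex = {
        s: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, s in enumerate(alphabet)
    }
    x, y = 0.0, 0.0  # the walk starts at the centre
    points = []
    for s in sequence:
        vx, vy = vertex[s]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

A smaller value of `hausdorff(cgr_points(s1, alphabet), cgr_points(s2, alphabet))` would indicate more similar point sets. The brute-force distance is quadratic in the number of points, which is acceptable for a sketch but not for long sequences.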

2.
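Mitochondrial protein sequence alignment based on protein CGR   Cited 1 time in total (self-citations: 0; citations by others: 1)
A new protein sequence alignment method is proposed using the protein chaos game representation (PCGR). All locally similar segments can be found by computing the PCGR point distances between two sequences. Amino acids are grouped into 4 or 7 classes according to their physicochemical properties, and alignments are carried out both with and without these groupings. To present the results more intuitively, the alignment data are shown as dot plots, which display all identical segments between the two sequences and also reflect their overall similarity.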

3.
Programs for the Sun-4 workstation permit the combined display of a table of aligned amino acid sequences of a family of proteins and a corresponding three-dimensional fold. Interactive facilities include the ability to scroll through the sequences, to rotate the structure, and to connect the examination of the sequences and the structure by selecting a portion of the sequences and automatically highlighting the corresponding region in the structure, and vice versa. These programs are well suited to support applications such as the investigation of the structural or functional significance of conserved patterns of amino acids in the sequences of a family of proteins.

4.
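Similarity analysis of protein sequences based on the matrix graphical representation   Cited 1 time in total (self-citations: 0; citations by others: 1)
Building on the chaos game representation (CGR) of DNA sequences and the four-line graphical representation (4-LGR) of DNA sequences, a new representation of DNA sequences, the matrix graphical representation (MGR), is proposed. The three DNA representations are then each extended to graphical representations of protein sequences based on the classical HP model, and are used to compare and verify the similarity of protein sequences. The results show that the matrix graphical representation not only captures the similarity between protein sequences but is also more flexible and adaptable than traditional methods.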

5.
An algorithmic method for drawing residue-based schematic diagrams of proteins on a 2D page is presented and illustrated. The method allows the creation of rendering engines dedicated to a given family of sequences, or fold. The initial implementation provides an engine that can produce a 2D diagram representing secondary structure for any transmembrane protein sequence. We present the details of the strategy for automating the drawing of these diagrams. The most important part of this strategy is the development of an algorithm for laying out residues of a loop that connects to arbitrary points of a 2D plane. As implemented, this algorithm is suitable for real-time modification of the loop layout. This work is of interest for the representation and analysis of data from (1) protein databases, (2) mutagenesis results, or (3) various kinds of protein context-dependent annotations or data.

6.
It is still not very clear to what extent, and how, the amino acid sequences of proteins determine their tertiary structures. In this paper, we report our investigations of the sequence-structure relations of the proteins in the beta-propeller fold family, which adopt highly symmetrical tertiary structures while their sequences appear "random". We analyzed the amino acid sequences by using a similarity matrix plus Pearson correlation method and found that the sequences can show the same symmetries as their tertiary structures only if we deduce the conditions of sequence similarity. This suggests that some key residues may play an important role in the formation of the tertiary structures of these proteins.

7.
The medical and pharmaceutical industries have shown high interest in the precise engineering of protein hormones and enzymes that perform existing functions under a wide range of conditions. Proteins are responsible for the execution of different functions in the cell: catalysis in chemical reactions, transport and storage, regulation and recognition control. Computational Protein Design (CPD) investigates the relationship between the 3-D structures of proteins and amino acid sequences and looks for all sequences that will fold into a given 3-D structure. Many computational methods and algorithms have been proposed in recent years, but the problem remains a challenge for mathematicians, computer scientists, bioinformaticians and structural biologists. In this article we present a new method for the protein design problem. Clustering techniques and a Dead-End-Elimination algorithm are combined with a SAT representation of the CPD problem in order to design the amino acid sequences. The obtained results illustrate the accuracy of the proposed method, suggesting that integrated Artificial Intelligence techniques are useful tools to solve such an intricate problem.

8.
Proteins can be grouped into families according to some features such as hydrophobicity, composition or structure, aiming to establish common biological functions. This paper presents MAHATMA (memetic algorithm-based highly adapted tool for motif ascertainment), a system that was conceived to discover features (particular sequences of amino acids, or motifs) that occur very often in proteins of a given family but rarely occur in proteins of other families. These features can be used for the classification of unknown proteins, that is, to predict their function by analyzing their primary structure. Experiments were done with a set of enzymes extracted from the Protein Data Bank. The heuristic method used was based on genetic programming using operators specially tailored for the target problem. The final performance was measured using sensitivity, specificity and hit rate. The best results obtained for the enzyme dataset suggest that the proposed evolutionary computation method is effective in finding predictive features (motifs) for protein classification.

9.
Proteins can be grouped into families according to some features such as hydrophobicity, composition or structure, aiming to establish common biological functions. This paper presents a system that was conceived to discover features (particular sequences of amino acids, or motifs) that occur very often in proteins of a given family but rarely occur in proteins of other families. These features can be used for the classification of unknown proteins, that is, to predict their function by analyzing the primary structure. Runs were performed with the enzyme subset extracted from the Protein Data Bank. The heuristic method used was based on a genetic algorithm using operators specially tailored for the problem. The motifs found were used to build a decision tree using the C4.5 algorithm. The results were compared with motifs found by MEME, a freely available web tool. Another comparison was made with the classification results of two other systems: a neural network-based tool and a hidden Markov model-based tool. The final performance was measured using sensitivity (Se) and specificity (Sp): similar results were obtained for the proposed tool (78.79 and 95.82) and the neural network-based tool (74.65 and 94.80, respectively), while MEME and HMMER resulted in inferior performance. The proposed system has the advantage of giving comprehensible rules when compared with the other approaches. These results obtained for the enzyme dataset suggest that the proposed evolutionary computation method is very efficient at finding patterns for protein classification.
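For readers unfamiliar with the two reported metrics, the sketch below computes sensitivity and specificity as percentages from confusion-matrix counts. It uses the common definitions Se = TP/(TP+FN) and Sp = TN/(TN+FP); the abstract does not state which definitions the authors used, so this is an assumption, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (Se, Sp) as percentages from confusion-matrix counts.

    Assumed definitions (not confirmed by the abstract):
      Se = TP / (TP + FN)  share of true family members that are recognised
      Sp = TN / (TN + FP)  share of non-members that are correctly rejected
    """
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    return se, sp
```

For example, 8 true positives, 2 false negatives, 9 true negatives and 1 false positive give `(80.0, 90.0)`.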

10.
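陈俣 (Chen Yu). 《数据采集与处理》 (Journal of Data Acquisition and Processing), 2019, 34(6): 1118-1124
A series of high-accuracy, robust methods in sparse-representation-based array direction finding rely on the on-grid assumption, namely that the directions of arrival of the incident signals fall exactly on the grid. This assumption contradicts the fact that real directions of arrival lie in a continuous angular domain; the resulting grid mismatch causes model mismatch and degrades estimation performance. To address this problem, the paper proposes an off-grid signal model based on Taylor expansion, which allows the directions of arrival to deviate from the grid, thereby eliminating the grid-mismatch effect and reducing the estimation error. The model is solved with an alternating iterative optimization scheme, and singular value decomposition and related techniques are used to reduce the computational cost. The method effectively reduces grid error and improves estimation accuracy. Simulation results confirm its effectiveness.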
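The abstract does not give the model itself. As a rough illustration only, a first-order Taylor off-grid model of the kind widely used in this literature can be written as below. All symbols are assumptions introduced here: a(θ) is the array steering vector, θ̃_n is the grid point nearest to the true direction θ_k, β collects the per-grid-point offsets, S is the sparse signal matrix and N is noise. The paper may use a different expansion order or notation.

```latex
% Steering vector expanded around the nearest grid point \tilde{\theta}_{n}
\mathbf{a}(\theta_k) \approx \mathbf{a}(\tilde{\theta}_{n})
  + \mathbf{b}(\tilde{\theta}_{n})\,(\theta_k - \tilde{\theta}_{n}),
\qquad
\mathbf{b}(\theta) = \frac{\partial \mathbf{a}(\theta)}{\partial \theta}

% Resulting off-grid array model, with A = [a(\tilde{\theta}_1), ..., a(\tilde{\theta}_N)]
% and B = [b(\tilde{\theta}_1), ..., b(\tilde{\theta}_N)]
\mathbf{Y} = \bigl(\mathbf{A} + \mathbf{B}\,\operatorname{diag}(\boldsymbol{\beta})\bigr)\,\mathbf{S} + \mathbf{N}
```

Under this form, alternating optimization would update S with β fixed and then β with S fixed, which is consistent with the alternating iterative scheme the abstract describes.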

11.
This paper is in the area of membrane proteins. Membrane proteins make up about 75% of possible targets for novel drug discovery. However, membrane proteins are one of the most understudied groups of proteins in biochemical research because of the technical difficulties of obtaining structural information about transmembrane (TM) regions or domains. Structural determination of TM regions is an important priority in the pharmaceutical industry, as it paves the way for structure-based drug design. This research presents a novel evolutionary support vector machine (SVM) based alpha-helix transmembrane region prediction algorithm to predict the membrane helices in amino acid sequences. The SVM-genetic algorithm (GA) methodology is based on the optimisation of sliding window size, evolutionary encoding selection and SVM parameter optimisation. In this research, average hydrophobicity and propensity based on skew statistics are used to encode the one-letter representation of amino acid sequence datasets. The computer simulation results demonstrate that the proposed SVM-GA methodology performs better than most conventional techniques, producing an accuracy of 86.71% for cross-validation and 86.43% for jack-knife for randomly selected proteins containing single and multiple transmembrane regions. Furthermore, for the amino acid sequence 3LVG, the proposed SVM-GA produces better alpha-helix region identification than PRED-TMR2, MEMSATSVM/MEMSAT3 and PSIPRED V3.0.

12.
The annotation of proteins can be achieved by classifying the protein of interest into a known protein family to infer its functional and structural features. This paper presents a new method for classifying protein sequences based upon the hydropathy blocks occurring in protein sequences. First, a fixed-dimensional feature vector is generated for each protein sequence using the frequency of the hydropathy blocks occurring in the sequence. Then, a support vector machine (SVM) classifier is used to classify the protein sequences into the known protein families. The experimental results show that proteins belonging to the same family or subfamily can be identified using features generated from the hydropathy blocks.

13.
Three families of proteinase inhibitors and the trypsin family were the subjects of the analysis of amino acid replacements at aligned positions. This approach concerned some specific types of replacement and the mechanisms that can be involved in their control. The usefulness of the Markovian model for interpretation of mutational replacement within homologous proteins was examined. The same sequences were also analyzed with the use of the non-Markovian algorithm of genetic semihomology. This study leads to the conclusion that the Markovian model is not suitable for the interpretation of protein mutational variability since: (1) The information about the history of a variable unit is included in its genetic code. (2) This information plays an important role in the probability of further possible changes of the unit.

14.
Predicting the fold, or approximate 3D structure, of a protein from its amino acid sequence is an important problem in biology. The homology modeling approach uses a protein database to identify fold-class relationships by sequence similarity. The main limitation of this method is that some proteins with similar structures appear to have very different sequences, which we call the hidden-homology problem. As in other real-world domains for machine learning, this difficulty may be caused by a low-level representation. Learning in such domains can be improved by using domain knowledge to search for representations that better match the inductive bias of a preferred algorithm. In this domain, knowledge of amino acid properties can be used to construct higher-level representations of protein sequences. In one experiment using a 179-protein data set, the accuracy of fold-class prediction was increased from 77.7% to 81.0%. The search results are analyzed to refine the grouping of small residues suggested by Dayhoff. Finally, an extension to the representation incorporates sequential context directly, which can express finer relationships among the amino acids. The methods developed in this domain are generalized into a framework that suggests several systematic roles for domain knowledge in machine learning. Knowledge may define both a space of alternative representations and a strategy for searching this space. The search results may be summarized to extract feedback for revising the domain knowledge.

15.
A new representation for parametric curves and surfaces is introduced here. It is in rational form and uses rational Gaussian bases. This representation allows design of 2-D and 3-D shapes, and makes recovery of shapes from noisy image data possible. The standard deviations of Gaussians in a curve or surface control the smoothness of a recovered shape. The control points of a surface in this representation are not required to form a regular grid and a scattered set of control points is sufficient to reconstruct a surface. Examples of shape design, shape recovery, and image segmentation using the proposed representation are given.
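To make the idea of a rational Gaussian basis concrete, the sketch below evaluates one point of a curve whose blending functions are Gaussians normalised to sum to one. The exact formulation in the paper is not given in the abstract; the form used here, the function name, and the parameters `nodes`, `sigmas` and `weights` are assumptions for illustration.

```python
import math

def rag_curve_point(u, control_points, nodes, sigmas, weights=None):
    """Evaluate a rational-Gaussian-style curve at parameter u.

    Assumed form: P(u) = sum_i g_i(u) * V_i, where
      g_i(u) = W_i * G_i(u) / sum_j W_j * G_j(u)
      G_i(u) = exp(-(u - u_i)^2 / (2 * sigma_i^2))
    """
    if weights is None:
        weights = [1.0] * len(control_points)
    # Unnormalised Gaussian response of every control point at u.
    g = [
        w * math.exp(-((u - ui) ** 2) / (2.0 * s ** 2))
        for w, ui, s in zip(weights, nodes, sigmas)
    ]
    # Dividing by the total makes the basis sum to one (the rational part).
    total = sum(g)  # can underflow to 0 if u is far from every node
    dim = len(control_points[0])
    return tuple(
        sum(gi * p[d] for gi, p in zip(g, control_points)) / total
        for d in range(dim)
    )
```

Larger `sigmas` spread each control point's influence over a wider parameter range and give a smoother curve, which matches the abstract's remark that the standard deviations control smoothness. Nothing in the evaluation requires the control points to lie on a regular grid.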

16.
Currently, Profile Hidden Markov Models (Profile HMMs) are the methodology of choice for probabilistic protein family modeling. Unfortunately, despite substantial progress, the general problem of remote homology analysis is still far from being solved. In this article we propose new approaches for robust protein family modeling by consistently exploiting general pattern recognition techniques. A new feature-based representation of amino acid sequences serves as the basis for semi-continuous protein family HMMs. Due to this paradigm shift in processing biological sequences, the complexity of family models can be reduced substantially, resulting in fewer parameters that need to be trained. This is especially favorable when little training data is available, as in most current tasks of molecular biology research. In various experiments we demonstrate the superior performance of advanced stochastic protein family modeling for remote homology analysis, which is especially relevant for applications such as drug discovery.

17.
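Building on the previously proposed concepts of a high-dimensional numerical representation of symbol sequences and the high-dimensional Fourier transform, a new method for comparing proteins, high-dimensional resonant recognition, is proposed. The amino acid sequences of two proteins are converted into vector sequences, and the discrete Fourier transform of each vector sequence is computed. From these, a cross-spectral function of the two protein sequences is defined, and its signal-to-noise ratio is examined to judge whether the two sequences are similar or different. Computational results show that this is another effective method for protein comparison and an extension of Cosic's one-dimensional resonant recognition.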
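The pipeline described here (vector encoding, DFT, cross-spectrum, signal-to-noise ratio) can be sketched as follows. The abstract defines neither the encoding nor the exact cross-spectral function, so everything specific below is an assumption: the `encoding` dictionary is hypothetical, the cross-spectrum is taken as the magnitude of the summed component-wise product of one transform with the conjugate of the other, and the signal-to-noise ratio is taken as peak over mean.

```python
import cmath

def dft_vectors(vectors):
    """Component-wise DFT of a sequence of equal-length real vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [
        [
            sum(vectors[t][d] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for d in range(dim)
        ]
        for k in range(n)
    ]

def cross_spectrum(seq_a, seq_b, encoding):
    """Cross-spectrum magnitudes for frequency indices 1 .. n//2."""
    n = min(len(seq_a), len(seq_b))  # assumption: truncate to the common length
    fa = dft_vectors([encoding[s] for s in seq_a[:n]])
    fb = dft_vectors([encoding[s] for s in seq_b[:n]])
    # k = 0 is skipped because it only reflects the mean of each signal.
    return [
        abs(sum(x * y.conjugate() for x, y in zip(fa[k], fb[k])))
        for k in range(1, n // 2 + 1)
    ]

def signal_to_noise(spectrum):
    """Peak of the spectrum divided by its mean (fails if the mean is 0)."""
    mean = sum(spectrum) / len(spectrum)
    return max(spectrum) / mean
```

A pronounced peak shared by both sequences gives a high ratio, which in this reading would indicate a common periodicity. The direct DFT is quadratic in sequence length; an FFT would replace it in practice.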

18.
A method is presented for predicting the secondary structure of globular proteins from their amino acid sequence. It is based on a rigorous statistical exploitation of the well-known biological fact that the amino acid compositions of each secondary structure are different. We also propose an evaluation process that allows us to estimate the capacity of a method to predict the secondary structure of a new protein which does not have any homologous proteins whose structure is already known. This evaluation process shows that our method has a prediction accuracy of 58.7% over three states for the 62 proteins of the Kabsch and Sander (1983a) data bank. This result is better than that obtained by the most widely used methods, namely Lim (1974), Chou and Fasman (1978) and Garnier et al. (1978), and also better than that obtained by a recent method based on local homologies (Levin et al., 1986). Our prediction method is very simple and may be implemented on any microcomputer and even on programmable pocket calculators. A simple Pascal implementation of the prediction algorithm is given. The interpretation of our results in terms of protein folding and directions for further work are discussed.

19.
A partial semi-coarsening multigrid method based on the high-order compact (HOC) difference scheme on nonuniform grids is developed to solve 2D convection–diffusion problems with boundary or internal layers. The significance of this study is that the multigrid method allows a different number of grid points along different coordinate directions on nonuniform grids. Numerical experiments on some convection–diffusion problems with boundary or internal layers are conducted. They demonstrate that the partial semi-coarsening multigrid method combined with the HOC scheme on nonuniform grids, without losing high-order accuracy, is efficient and effective at decreasing the computational cost by reducing the number of grid points along the direction that does not contain boundary or internal layers.

20.
In previous work, we have shown that a set of characteristics, defined as (code frequency) pairs, can be derived from a protein family by the use of a signal-processing method. This method enables the location and extraction of sequence patterns by taking into account each (code frequency) pair individually. In the present paper, we propose to extend this method in order to detect and visualize patterns by taking into account several pairs simultaneously. Two 'multifrequency' methods are described. The first one is based on a rewriting of the sequences with new symbols which summarize the frequency information. The second method is based on a clustering of the patterns associated with each pair. Both methods lead to the definition of significant consensus sequences. Some results obtained with calcium-binding proteins and serine proteases are also discussed.
