Similar documents
1.
In the TNC family of Ca-binding proteins (calmodulin, parvalbumin, intestinal calcium-binding protein and troponin C), ~70 well-conserved amino acid sequences and six crystal structures are known. We find a clear correlation between residue contacts in the structures and residue conservation in the sequences: residues with strong side chain–side chain contacts in the three-dimensional structure tend to be the more conserved in the sequence. This is one way to quantify the intuitive notion that side chain interactions are important for maintaining protein three-dimensional structure in evolution, and it may usefully be taken into account when planning point mutations in protein engineering.
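
The contact/conservation comparison can be illustrated with a minimal sketch. It assumes that conservation is scored as the Shannon entropy of each alignment column (lower entropy means more conserved) and that it is compared with per-residue contact counts by Spearman rank correlation. The abstract names neither the conservation measure nor the statistic, and the alignment and contact counts below are hypothetical toy values.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; 0.0 means fully conserved.

    Gap characters ('-') are ignored in this sketch.
    """
    residues = [r for r in column if r != "-"]
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy alignment (4 sequences x 5 positions) and contact counts.
alignment = ["DKDGD", "DKNGD", "DRDGT", "DKSGN"]
contacts = [6, 3, 1, 5, 2]

columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
conservation = [-column_entropy(col) for col in columns]  # higher = more conserved
rho = spearman(conservation, contacts)
print(f"Spearman rho = {rho:.3f}")  # Spearman rho = 0.949
```

A positive rho means that the positions with more contacts are also the more conserved ones, which is the direction of the trend reported above.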

2.
Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on α-helical representations of the sequences, in which the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied, such as those governing cluster size and limits, the hydrophobic residues constituting the clusters, and the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results support the α-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation that gives the best correspondence is that of an α-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.
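
The α-helical net underlying HCA can be sketched in a few lines. The sketch assumes a rotation of 100° per residue and treats V, I, L, F, M, Y and W as hydrophobic. Its connectivity rule, which links hydrophobic residues at offsets i±1, i±3 and i±4, is a simplification for illustration and not the published HCA cluster definition. Varying `degrees_per_residue` mirrors the pitch test described in point (iii).

```python
# Hypothetical sketch; the hydrophobic set and neighbour offsets are assumptions.
HYDROPHOBIC = set("VILFMYW")


def helical_net(seq, degrees_per_residue=100.0):
    """Place residues on an unrolled helix: (index along axis, angle in degrees).

    100 degrees per residue corresponds to an alpha-helix (3.6 residues/turn).
    """
    return [(i, (i * degrees_per_residue) % 360.0) for i in range(len(seq))]


def hydrophobic_clusters(seq, neighbour_offsets=(1, 3, 4)):
    """Group hydrophobic residues that are helical-face neighbours (i±1, i±3, i±4).

    Simplified rule: real HCA also uses cluster-breaking residues such as
    proline, which are ignored here.
    """
    hydro = [i for i, aa in enumerate(seq) if aa in HYDROPHOBIC]
    parent = {i: i for i in hydro}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in hydro:
        for d in neighbour_offsets:
            j = i + d
            if j in parent:
                parent[find(i)] = find(j)

    groups = {}
    for i in hydro:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(hydrophobic_clusters("LKKLLEEAKKGGGGGGWF"))  # [[0, 3, 4], [16, 17]]
```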

3.
An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biochemical properties of amino acids. As a follow-up to the previous study, we have increased the size of the database, which currently contains 402 published indices, and re-performed the single-linkage cluster analysis. The results basically confirmed the previous findings. Another important feature of amino acids that can be represented numerically is the similarity between them. Thus, a similarity matrix, also called a mutation matrix, is a set of 20×20 numerical values used for protein sequence alignments and similarity searches. We have collected 42 published matrices, performed hierarchical cluster analyses and identified several clusters corresponding to the nature of the data set and the method used for constructing the mutation matrix. Further, we have tried to reproduce each mutation matrix by a combination of amino acid indices in order to understand which properties of amino acids are reflected most. There was a relationship between the PAM units of Dayhoff's mutation matrix and the volume and hydrophobicity of amino acids. The database of 402 amino acid indices and 42 amino acid mutation matrices is publicly available on the Internet.
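
Single-linkage clustering of indices at a correlation cut-off reduces to finding connected components. The sketch below treats two indices as linked when the absolute Pearson correlation of their 20 values is at least a threshold. Using |r|, so that anti-correlated indices count as similar, and the 0.8 cut-off are both assumptions. The three "indices" are synthetic toy vectors, not real published scales.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def single_linkage_clusters(indices, min_abs_r=0.8):
    """Single-linkage clusters of amino acid indices at a correlation cut-off.

    Two indices are linked if |r| >= min_abs_r. The clusters are the connected
    components of that graph, which is what single linkage gives when the
    dendrogram is cut at distance 1 - min_abs_r (with distance = 1 - |r|).
    """
    names = sorted(indices)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(indices[a], indices[b])) >= min_abs_r:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Synthetic 20-value "indices" (toy data, not real amino acid scales).
toy = {
    "index_a": [float(v) for v in range(20)],
    "index_b": [2.0 * v + 1.0 for v in range(20)],
    "index_c": [float((7 * v) % 20) for v in range(20)],
}
print(single_linkage_clusters(toy))  # [['index_a', 'index_b'], ['index_c']]
```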

4.
5.
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. The procedure is applied to the PIR database, and a dictionary of sequence motifs specific to individual superfamilies is constructed. The motifs are of practical use: for 20% of newly determined sequences, they identify membership of a specific superfamily without the need to perform sequence database searches. The sequence motifs identified represent functionally important sites on protein molecules. When a single motif contains multiple blocks, the blocks are often close together in the 3-D structure. Furthermore, when the correlation with exon structure was examined, these motif blocks were occasionally found to be split by introns.
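
The motifs in this study were derived automatically from PIR. Motifs of this kind are often written in PROSITE-style notation, so the hypothetical helper below converts such a pattern into a Python regular expression for scanning a sequence. The P-loop pattern [AG]-x(4)-G-K-[ST] serves only as a familiar example, and the test fragment is illustrative. The converter covers the common syntax only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Handles x, [..], {..}, repeat counts (n) / (n,m) and the < / > anchors on
    whole elements; exotic forms such as '>' inside brackets are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            regex = core  # single residue or [..] class: same syntax in regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


motif = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
print(motif)                                              # [AG].{4}GK[ST]
print(re.search(motif, "MTEYKLVVVGAGGVGKSALT").group())   # GAGGVGKS
```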

6.
We describe a method based on neural networks for predicting contact maps of proteins using chemico-physical and evolutionary information as input. Neural networks are trained on a data set comprising the contact maps of 200 non-homologous proteins with well-resolved three-dimensional structures. The systems learn the association rules between the covalent structure of each protein and its corresponding contact map by means of a standard back-propagation algorithm. Validation of the predictor on the training set and on 408 proteins of known structure that are not homologous to those in the training set indicates that this method scores higher than previously described statistical approaches based on correlated mutations and sequence information.
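
The prediction target itself is easy to compute from a solved structure. The sketch below builds a binary contact map under two assumptions: contacts are defined between Cα atoms, and the distance threshold is 8 Å. The abstract does not state the definition actually used, and the coordinates are a hypothetical straight chain.

```python
import math


def contact_map(ca_coords, threshold=8.0, min_separation=1):
    """Binary residue contact map from C-alpha coordinates.

    Residues i and j are 'in contact' if |i - j| >= min_separation and their
    C-alpha atoms are closer than `threshold` angstroms (8 Å assumed here).
    """
    n = len(ca_coords)
    return [
        [
            int(abs(i - j) >= min_separation
                and math.dist(ca_coords[i], ca_coords[j]) < threshold)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy straight chain with 3.8 Å spacing between consecutive C-alpha atoms.
chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
for row in contact_map(chain):
    print(row)
```

Raising `min_separation` removes the trivial near-diagonal contacts that most contact-prediction evaluations exclude.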

7.
We present a novel method that predicts transmembrane domains in proteins using solely the information contained in the sequence itself. The PRED-TMR algorithm described here refines a standard hydrophobicity analysis with the detection of potential termini ('edges', i.e. starts and ends) of transmembrane regions. This allows one both to discard highly hydrophobic regions not delimited by clear start and end configurations and to confirm putative transmembrane segments not distinguishable by their hydrophobic composition. The accuracy obtained on a test set of 101 non-homologous transmembrane proteins with reliable topologies compares well with that of other popular existing methods. Only a slight decrease in prediction accuracy was observed when the algorithm was applied to all transmembrane proteins of the SwissProt database (release 35). A WWW server running the PRED-TMR algorithm is available at http://o2.db.uoa.gr/PRED-TMR/
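
PRED-TMR builds on a standard hydrophobicity analysis. The sketch below shows only that baseline step, a sliding-window Kyte-Doolittle hydropathy scan, and not PRED-TMR's edge detection. The window of 19 residues and the threshold of 1.6 are the values commonly quoted for this scale and are assumptions here.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers residues i..i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(seq) - window + 1)]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Half-open residue ranges [start, end) covered by windows scoring >= threshold."""
    segments, start = [], None
    profile = hydropathy_profile(seq, window)
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


# Toy sequence: a 20-residue leucine stretch flanked by lysines.
demo = "K" * 10 + "L" * 20 + "K" * 10
print(hydrophobic_segments(demo))  # [(5, 35)]
```

The reported range is wider than the leucine stretch itself because windows that are only partly hydrophobic still pass the threshold. Sharpening such boundaries is the problem that PRED-TMR's start/end detection addresses.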

8.
Before the structure of cAMP-dependent protein kinase had been solved, sequence alignments had already suggested that several highly conserved peptide motifs, described as kinase subdomains I through XI, might play some functional role in catalysis. Crystal structures of several members of the protein kinase superfamily have suggested that the nearly invariant aspartate residue within subdomain IX contributes to the conformational stability of the catalytic loop by forming hydrogen bonds with backbone amides within subdomain VI. However, substitutions of this aspartate with alanine or threonine in some protein kinases have indicated that these interactions are not essential for activity. In contrast, we show here that conversion of this aspartate to arginine abolished the catalytic activity of the Fer protein-tyrosine kinase when expressed either in mammalian cells or in bacteria. Structural modeling predicted that the catalytic loop of the FerD743R mutant was disrupted by van der Waals repulsion between the side chains of the substituted arginine residue in subdomain IX and histidine-683 in subdomain VI. The FerD743R mutant model predicted a shift in the peptide backbone of the catalytic loop and an outward rotation of the histidine-683 and arginine-684 side chains. However, the position and orientation of the presumptive catalytic base, aspartate-685, were not substantially changed. The proposed model explains how substitutions of some, but not all, residues could be tolerated at this nearly invariant aspartate in kinase subdomain IX.

9.
Twilight zone of protein sequence alignments (cited 8 times; self-citations: 0; citations by others: 8)
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20–35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25%, less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.
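
Because a percentage identity is only meaningful together with the number of aligned positions, a helper should report both values. The minimal sketch below counts identity over columns in which neither sequence has a gap. That denominator is an assumption, since others (such as the length of the shorter sequence) are also in use. The example pair is hypothetical and reproduces the 10-of-16 case from point (ii).

```python
def percent_identity(a: str, b: str):
    """Return (aligned_length, percent identity) for two aligned sequences.

    Identity is computed over columns where neither sequence has a gap ('-');
    this choice of denominator is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0, 0.0
    identical = sum(x == y for x, y in pairs)
    return len(pairs), 100.0 * identical / len(pairs)


# 10 matching residues over 16 aligned positions, the case quoted in point (ii).
length, pid = percent_identity("ACDEFGHIKLMNPQRS", "ACDEFGHIKLWWWWWW")
print(length, pid)  # 16 62.5
```

Any homology cut-off therefore has to be applied to the pair (length, identity) and not to the percentage alone.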

10.
We have developed a new method of protein structure comparison based on spatial arrangements of secondary structural elements (SSEs). Each SSE is represented by a single vector, and common spatial arrangements of vectors in a pair of proteins are detected. The method allows not only insertions and deletions of SSEs, but also topological permutations. It has a flexible target function that can be adjusted depending upon particular levels or definitions of structural similarity, and it is fast enough to allow structural comparisons for many pairs of proteins. The parameters for the target function are determined based on distributions of the geometrical variables for the spatial arrangements of the equivalent SSEs in well-known structural motifs. The obtained parameter set is tuned for detecting relatively strong structural similarity. We report several tests on examples, including comparisons of known structural similarity and database searches for a target structure, and examine the results when this parameter set is used for the comparison of distantly related structures.
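
Representing each SSE by one vector makes pairwise geometry cheap to compute. The sketch below assumes that an SSE is summarised by the vector from its first to its last Cα atom, placed at its midpoint. It describes a pair of SSEs by the angle between their vectors and the distance between their midpoints. The abstract does not list the geometrical variables the method actually uses, so these choices and the toy coordinates are illustrative only.

```python
import math


def sse_vector(ca_coords):
    """Represent an SSE by its midpoint and the vector from first to last C-alpha.

    A deliberate simplification; the published method may fit axes differently.
    """
    start, end = ca_coords[0], ca_coords[-1]
    midpoint = tuple((s + e) / 2 for s, e in zip(start, end))
    direction = tuple(e - s for s, e in zip(start, end))
    return midpoint, direction


def pair_geometry(sse_a, sse_b):
    """Angle (degrees) between two SSE vectors and distance between their midpoints."""
    (mid_a, dir_a), (mid_b, dir_b) = sse_a, sse_b
    dot = sum(p * q for p, q in zip(dir_a, dir_b))
    cos_angle = dot / (math.hypot(*dir_a) * math.hypot(*dir_b))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle, math.dist(mid_a, mid_b)


# Two hypothetical SSEs lying perpendicular to each other.
first = sse_vector([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)])
second = sse_vector([(0.0, 5.0, 0.0), (0.0, 15.0, 0.0)])
angle, distance = pair_geometry(first, second)
print(round(angle, 1), round(distance, 2))  # 90.0 11.18
```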

11.
Numerous mammalian proteins are constructed from a limited repertoire of module types. Proteins belonging to the regulators of complement activation family, which are crucial for ensuring that a complement-mediated immune response is targeted against infectious agents, are composed solely of complement control protein (CCP) modules. In the current study, CCP module sequences were grouped to allow selection of the most appropriate experimentally determined structures to serve as templates in an automated large-scale structure modelling procedure. The resulting 135 individual CCP module models, valuable in their own right, are available at the online database http://www.bru.ed.ac.uk/~dinesh/ccp-db.html. Comparisons of surface properties within a particular family of modules should be more informative than sequence alignments alone. A comparison of surface electrostatic features was undertaken for the first 28 CCP modules of complement receptor type 1 (CR1). Assignments to clusters based on surface properties differ from assignments to clusters based on sequences. This observation might reflect adaptive evolution of surface-exposed residues involved in protein–protein interactions. This illustrative multiple surface comparison was indeed able to pinpoint functional sites in CR1.

12.
Local protein sequence similarity does not imply a structural relationship (cited once; self-citations: 0; citations by others: 1)
A database search often finds a seemingly strong sequence similarity between two fragments of proteins that are not expected to have an evolutionary or functional relationship. It is tempting to suggest that the two fragments will adopt a similar conformation because of a common pattern of residues that dictates a particular substructure. To investigate the likelihood of such a structural similarity, local sequence similarities between proteins of known conformation were identified by a standard database search algorithm. Sequence similarity was considered significant when the chance probability of obtaining the relatedness score from a scan of the entire database was <1%. In this region both true and false homologies are detected. A total of 69 false homologies were located, with lengths between 20 and 262 aligned positions. Many of these alignments had 25% sequence identity and a further 25% conservative changes. However, the results show that in general these aligned fragments did not have a significant similarity in secondary or tertiary structure. Thus local sequence similarity does not indicate a structural similarity when there is neither an evolutionary nor a functional explanation to support it. Accordingly, structure predictions based on finding a local sequence similarity with an evolutionarily unrelated protein of known conformation are unlikely to be valid.

13.
A new algorithm, called convex constraint analysis, has been developed to deduce the chiral contribution of the common secondary structures directly from experimental CD curves of a large number of proteins. The analysis is based on CD data reported by Yang,J.T., Wu,C.-S.C. and Martinez,H.M. [Methods Enzymol., 130, 208–269 (1986)]. Application of the decomposition algorithm to simulated protein data sets resulted in component spectra [B(λ, i)] identical to the originals and weights [C(i, k)] with excellent Pearson correlation coefficients (R) [Chang,C.T., Wu,C.-S.C. and Yang,J.T. (1978) Anal. Biochem., 91, 12–31]. Test runs were performed on sets of simulated protein spectra created by the Monte Carlo technique using poly-L-lysine-based pure component spectra. The significant correlation coefficients (R > 0.9) demonstrated the high power of the algorithm. Applied to globular protein data independently of X-ray data, the algorithm revealed that the CD spectrum of a given protein is composed of at least four independent sources of chirality. Three of the computed component curves show remarkable resemblance to the CD spectra of known protein secondary structures. Judged against X-ray data, this approach yields a significant improvement in secondary structure evaluation compared with previous methods, and it yields a realistic set of pure component spectra. The new method is a useful tool for analyzing the CD spectra of globular proteins and also has potential for the analysis of integral membrane proteins.

14.
Huntington's disease is one of nine known neurodegenerative diseases in which a disease-specific protein contains an unusually long polyglutamine (polyQ) stretch. The proteins associated with each disease are unrelated in sequence, size, structure, function or location of the mutation. In all cases, there is an apparent critical number of glutamines below which individuals do not develop disease. Expansion of the polyQ domain is closely associated with misfolding and aggregation of the protein. It is not yet well understood how the length of the polyQ tract, and its location within a given protein, are related to misfolding and to disease. In this work we developed a strategy for generating length libraries of polyQ-containing proteins, with the polyQ inserted at an arbitrary location. This strategy facilitates systematic, detailed study of the relationship among polyQ length, context and misfolding.
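
The idea of a length library can be illustrated at the protein-sequence level with a short sketch. It generates variants with a polyQ run of each requested length inserted at a chosen position. The study's actual strategy is a cloning method that the abstract does not describe, and the host sequence and insertion site below are hypothetical.

```python
def insert_polyq(host: str, position: int, n_glutamines: int) -> str:
    """Insert a run of n glutamines (Q) before residue index `position` (0-based)."""
    if not 0 <= position <= len(host):
        raise ValueError("position must lie within the host sequence")
    return host[:position] + "Q" * n_glutamines + host[position:]


def polyq_length_library(host: str, position: int, lengths):
    """Map each requested polyQ length to the corresponding variant sequence."""
    return {n: insert_polyq(host, position, n) for n in lengths}


# Hypothetical host sequence and insertion site.
library = polyq_length_library("MAKEDLSG", 3, [0, 5, 10])
for n, seq in library.items():
    print(n, seq)
```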

15.
The role of intermediates in the folding reaction of single-domain proteins is a controversial issue. It was previously shown by different methods that an on-pathway intermediate is populated in the presence of sodium sulphate during the folding of the FF domain from HYPA/FBP11. Here we demonstrate, using analysis of the amplitudes of kinetic traces, that this burst-phase folding intermediate is present at different salt concentrations and at various pH values, and is also found in roughly 30 site-directed mutants. The intermediate appears robust to changing conditions and thus fulfils an important criterion for a productive molecular species on the folding reaction pathway.

16.
Increasing the potency of a cytotoxin with an arginine graft (cited once; self-citations: 0; citations by others: 1)
Variants and homologs of bovine pancreatic ribonuclease (RNase A) can exhibit cytotoxic activity. This toxicity relies on cellular internalization of the enzyme. Residues Glu49 and Asp53 form an anionic patch on the surface of RNase A. We find that replacing these two residues with arginine does not affect catalytic activity or affinity for the cytosolic ribonuclease inhibitor (RI) protein. This 'arginine graft' does, however, increase toxicity towards human cancer cells. Appending a nonaarginine domain to this cationic variant results in a further increase in cytotoxicity, providing one of the most cytotoxic known variants of RNase A. These findings correlate the potency of a ribonuclease with its delivery of ribonucleolytic activity to the cytosol, and indicate a rational means to enhance the efficacy of ribonucleases and other cytotoxic proteins.

17.
We have compared a novel sequence–structure matching technique for detecting remote homologs, FORESST, with three existing sequence-based methods: local amino acid sequence similarity by BLASTP, hidden Markov models (HMMs) of sequences of protein families using SAM, and HMMs based on sequence motifs identified using meta-MEME. FORESST compares predicted secondary structures to a library of structural families of proteins, using HMMs. Altogether, 45 proteins from nine structural families in the CATH database were used in a cross-validated test of the fold assignment accuracy of each method. Local sequence similarity of a query sequence to a protein family is measured by the high-scoring segment pair (HSP) score. Each of the HMM-based approaches (FORESST, MEME and the amino acid sequence-based HMM) yielded a log-odds score for the query sequence. To make a fair comparison among these methods, the scores for each method were converted to Z-scores in a uniform way, by comparing the raw scores of a query protein with the corresponding scores for a set of unrelated proteins. Z-scores were analyzed as a function of the maximum pairwise sequence identity (MPSID) of the query sequence to sequences used in training the model. For MPSID above 20%, the Z-scores increase linearly with MPSID for the sequence-based methods but remain roughly constant for FORESST. Below 15%, average Z-scores are close to zero for the sequence-based methods, whereas the FORESST method yielded average Z-scores of 1.8 and 1.1 using observed and predicted secondary structures, respectively. This demonstrates the advantage of the sequence–structure method for detecting remote homologs.
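
The uniform score normalisation described above can be sketched as a standard Z-score: the raw score of the query minus the mean score against unrelated proteins, divided by their standard deviation. The abstract does not say whether a sample or a population standard deviation was used (the sample version is assumed), and the background scores are hypothetical.

```python
import statistics


def z_score(raw_score, unrelated_scores):
    """Z-score of a raw score against scores of the same query vs unrelated proteins.

    Uses the sample standard deviation (an assumption).
    """
    mean = statistics.mean(unrelated_scores)
    sd = statistics.stdev(unrelated_scores)
    return (raw_score - mean) / sd


# Hypothetical background scores for one method.
background = [10.0, 12.0, 9.0, 11.0, 13.0]
print(round(z_score(18.0, background), 2))  # 4.43
```

Because every method's raw scores pass through the same transformation, HSP scores and HMM log-odds scores become comparable on a common scale.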

18.
The relationship between the effective dielectric constant, which models the electrostatic effect of a charged side chain in a protein, and the distance from that charge was evaluated both experimentally and theoretically. Experimental values were obtained from the shifts in pKa that resulted from point mutations of side chains in subtilisin. Theoretical values were obtained from an iterative solution to Poisson's equation that considers the dielectric response of the protein and the solvent together with the charge positions. There is no simple relationship between the effective dielectric constant and the distance from the charge responsible for the interactions. For some charge positions, a linear but not directly proportional relationship between the effective dielectric and the distance of separation was observed. Thus, simple models such as a linear distance dependence of the dielectric response are not suitable for evaluating electrostatic effects in proteins.
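
An experimental effective dielectric follows from a pKa shift through a textbook relation. Assume two unit charges that interact by Coulomb's law screened by a single effective dielectric. Then |ΔG| = 332 |q₁q₂|/(ε_eff r) kcal/mol with r in Å, and |ΔpKa| = |ΔG|/(2.303RT); solving the two for ε_eff gives the function below. The sketch is for illustration only. The ΔpKa and distance are hypothetical, and the study's Poisson-equation calculations are not reproduced.

```python
import math

COULOMB_KCAL_A = 332.06      # kcal·Å/mol for two unit charges in vacuum
GAS_CONSTANT = 1.987204e-3   # kcal/(mol·K)


def effective_dielectric(delta_pka, distance_a, temperature_k=298.15):
    """Effective dielectric constant implied by a pKa shift between unit charges.

    |dG| = ln(10) * R * T * |dpKa|   and   |dG| = 332.06 / (eps_eff * r)
    =>  eps_eff = 332.06 / (r * ln(10) * R * T * |dpKa|)
    """
    interaction_kcal = math.log(10) * GAS_CONSTANT * temperature_k * abs(delta_pka)
    return COULOMB_KCAL_A / (distance_a * interaction_kcal)


# Hypothetical example: a 0.3 unit pKa shift caused by a charge 10 Å away.
print(round(effective_dielectric(0.3, 10.0), 1))
```

Computing ε_eff this way for several charge positions and plotting it against r is the kind of comparison that shows whether a simple distance-dependent dielectric holds.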

19.
De novo protein structure prediction plays an important role in studies of helical membrane proteins as well as in structure-based drug design efforts. Developing an accurate scoring function for protein structure discrimination and validation remains a challenge. Network approaches based on overall network patterns of residue packing have proven useful in soluble protein structure discrimination, so it is of interest to apply similar approaches to the study of residue packing in membrane proteins. In this work, we first carried out such an analysis on a set of diverse, non-redundant, high-resolution membrane protein structures. Next, we applied the same approach to three test sets. The first set includes nine membrane protein structures with resolution worse than 2.5 Å; the other two sets include a total of 101 G-protein coupled receptor models, constructed using either de novo or homology modeling techniques. The results indicate that the two criteria derived from studying high-resolution membrane protein structures are good indicators of a high-quality native fold, and that the approach is very effective for discriminating native membrane protein folds from less native-like ones. These findings should help in investigating the fundamental problem of membrane protein structure prediction.
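
A residue-packing network of the kind described can be sketched as follows. Residues are nodes, and two residues are linked when their representative atoms lie within a cutoff. Two typical network measures are then computed, average degree and average clustering coefficient. The abstract does not name its two criteria, so these measures, the 7 Å Cα cutoff and the toy coordinates are all assumptions for illustration.

```python
import math
from itertools import combinations


def contact_network(coords, cutoff=7.0):
    """Adjacency sets: residues i and j are linked if within `cutoff` Å (assumed)."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def average_degree(adjacency):
    """Mean number of neighbours per residue."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adjacency.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# Toy coordinates: a tightly packed triangle plus one isolated residue.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.5, 4.33, 0.0), (100.0, 0.0, 0.0)]
network = contact_network(coords)
print(average_degree(network), average_clustering(network))  # 1.5 0.75
```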

20.
The Fab region of an IgG2b antibody (AM7B2.1) reactive to the herbicide atrazine was cloned into a plasmid vector using the polymerase chain reaction and two sets of degenerate oligonucleotide primers designed to mimic the amino acid variation at the N-termini of L-chains and H-chains. These primers also provide a secretion signal fused precisely to the antibody gene sequence for secretion of the mature antibody. A further set of universal oligonucleotide primers was developed for the direct sequencing of the VH and CH regions of H-chains and the VL and CL regions of L-chains without subcloning, and these primers were used to determine the sequence of this antibody. The L-chain was found to lack a conserved Cys residue at position 23, and the implications of this observation are discussed. The cloned genes were expressed in Escherichia coli using a commercially available T7 RNA polymerase-based plasmid. The clones were also expressed in a T7 RNA polymerase-based system containing an attenuated version of the T7 RNA polymerase promoter, plus a lac promoter placed in an antisense orientation, to enhance plasmid stability. The expressed products were confirmed as atrazine-reactive by binding to an atrazine derivative conjugated with alkaline phosphatase.
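
Degenerate primers of the kind mentioned above are written with IUPAC nucleotide ambiguity codes. The small sketch below computes how many distinct sequences a degenerate primer encodes and expands a short one. The primer strings are hypothetical and are not the primers used in the study.

```python
import math
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return math.prod(len(IUPAC[base]) for base in primer.upper())


def expand_primer(primer: str):
    """All concrete sequences encoded by a degenerate primer (small degeneracy only)."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


# Hypothetical primer fragments.
print(degeneracy("GAYGTNCAR"))  # 16
print(expand_primer("ARY"))      # ['AAC', 'AAT', 'AGC', 'AGT']
```

Degeneracy grows multiplicatively with each ambiguous position, which is why primers that mimic amino acid variation are usually kept short or restricted to the most variable codons.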


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417