Similar articles
20 similar articles found.
1.
2.
Relatively little has been known about the structure of alpha-helical membrane proteins, since until recently few structures had been crystallized. These limited data have restricted structural analyses to the prediction of secondary structure, rather than tertiary folds. In order to address this, this paper describes an analysis of the 23 available membrane protein structures. A number of findings are made that are of particular relevance to transmembrane helix packing: (1) on average lipid-tail-accessible transmembrane residues are significantly more hydrophobic, less conserved and contain different residue types to buried residues; (2) charged residues are not always buried and, when accessible to membrane lipid tails, few are paired with another charge and instead they often interact with phospholipid head-groups or with other residue types; (3) a significant proportion of lipid-tail-accessible charged and polar residues form hydrogen bonds only with residues one turn away in the same helix (intra-helix); (4) pore-lining residues are usually hydrophobic and it is difficult to distinguish them from buried residues in terms of either residue type or conservation; and (5) information was gained about the proportion of helices that tend to contribute to lining a pore and the resulting pore diameter. These findings are discussed with relevance to the prediction of membrane protein 3D structure.

3.
Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein's function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward the soluble proteins, and limited applicability for various classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. This model captures the frequently occurring amino acid k-mers and their local and global interactions and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve improved accuracy, using only protein sequence as input. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in terms of accuracy when evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein's solubility to guide experimental design.

4.
5.
We have compared the accuracy of the individual protein secondary structure prediction methods PHD, DSC, NNSSP and Predator against the accuracy obtained by combining the predictions of the methods. A range of ways of combining predictions were tested: voting, biased voting, linear discrimination, neural networks and decision trees. The combined methods that involve 'learning' (the non-voting methods) were trained using a set of 496 non-homologous domains; this dataset was biased as some of the secondary structure prediction methods had used them for training. We used two independent test sets to compare predictions: the first consisted of 17 non-homologous domains from CASP3 (Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction); the second set consisted of 405 domains that were selected in the same way as the training set, and were non-homologous to each other and the training set. On both test datasets the most accurate individual method was NNSSP, then PHD, DSC and the least accurate was Predator; however, it was not possible to conclusively show a significant difference between the individual methods. Comparing the accuracy of the single methods with that obtained by combining predictions it was found that it was better to use a combination of predictions. On both test datasets it was possible to obtain a ~3% improvement in accuracy by combining predictions. In most cases the combined methods were statistically significantly better (at P = 0.05 on the CASP3 test set, and P = 0.01 on the EBI test set). On the CASP3 test dataset there was no significant difference in accuracy between any of the combined methods of prediction; on the EBI test dataset, linear discrimination and neural networks significantly outperformed voting techniques. We conclude that it is better to combine predictions.
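
The simplest of the combination schemes named in this abstract, plain voting, can be illustrated with a minimal sketch. It is not the authors' implementation: the three-state H/E/C alphabet, the function name `majority_vote`, and the rule of falling back to coil on a tie are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, tie_break="C"):
    """Combine equal-length secondary structure strings by per-residue voting.

    `predictions` holds one string per method (for example one each from
    four predictors). The H/E/C alphabet and the coil tie-break are
    illustrative assumptions, not taken from the abstract.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for states in zip(*predictions):
        counts = Counter(states).most_common()
        top_state, top_count = counts[0]
        # If two states share first place, fall back to the tie-break state.
        if len(counts) > 1 and counts[1][1] == top_count:
            top_state = tie_break
        consensus.append(top_state)
    return "".join(consensus)
```

A biased vote would weight each method's ballot (for instance by its standalone accuracy) instead of counting every method equally; the abstract does not give the weights used, so none are shown here.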

6.
The three‐dimensional (3‐D) structure prediction of proteins, given their amino acid sequence, is addressed using the first principles–based approach ASTRO‐FOLD 2.0. The key features presented are: (1) Secondary structure prediction using a novel optimization‐based consensus approach, (2) β‐sheet topology prediction using mixed‐integer linear optimization (MILP), (3) Residue‐to‐residue contact prediction using a high‐resolution distance‐dependent force field and MILP formulation, (4) Tight dihedral angle and distance bound generation for loop residues using dihedral angle clustering and non‐linear optimization (NLP), (5) 3‐D structure prediction using deterministic global optimization, stochastic conformational space annealing, and the full‐atomistic ECEPP/3 potential, (6) Near‐native structure selection using a traveling salesman problem‐based clustering approach, ICON, and (7) Improved bound generation using chemical shifts of subsets of heavy atoms, generated by SPARTA and CS23D. Computational results of ASTRO‐FOLD 2.0 on 47 blind targets of the recently concluded CASP9 experiment are presented. © 2011 American Institute of Chemical Engineers AIChE J, 2012

7.
An improved prediction of catalytic residues in enzyme structures (cited by: 1; self-citations: 0; citations by others: 1)
The protein databases contain a huge number of proteins of unknown function, including many proteins with newly determined 3D structures resulting from the Structural Genomics Projects. To accelerate experiment-based assignment of function, de novo prediction of protein functional sites, like active sites in enzymes, becomes increasingly important. Here, we attempted to improve the prediction of catalytic residues in enzyme structures by seeking and refining different encodings (i.e. residue properties) as well as employing new machine learning algorithms. In particular, considering that catalytic residues can often reveal specific network centrality when representing enzyme structure as a residue contact network, the corresponding measurement (i.e. closeness centrality) was used as one of the most important encodings in our new predictor. Meanwhile, a genetic algorithm integrated neural network (GANN) was also employed. Thanks to the above strategies, our GANN predictor demonstrated a high accuracy of 91.2% in the prediction of catalytic residues based on balanced datasets (i.e. the 1:1 ratio of catalytic to non-catalytic residues). When the GANN method was optimally applied to real enzyme structures, 73.9% of the tested structures had the active site correctly located. Compared with two existing methods, the proposed GANN method also demonstrated a better performance.
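
The closeness centrality encoding mentioned in this abstract can be sketched as follows. This is an illustrative computation only: the abstract does not say how residue contacts were defined, so the contact list is taken as given, and the function name `closeness_centrality` and the handling of disconnected residues are assumptions.

```python
from collections import deque


def closeness_centrality(n_residues, contacts):
    """Closeness centrality of each residue in an unweighted contact network.

    `contacts` is a list of (i, j) residue index pairs considered to be in
    contact. How contacts are derived (for example a distance cutoff) is
    left to the caller and is an assumption of this sketch.
    """
    adjacency = [[] for _ in range(n_residues)]
    for i, j in contacts:
        adjacency[i].append(j)
        adjacency[j].append(i)

    scores = []
    for source in range(n_residues):
        # Breadth-first search gives shortest path lengths in contact steps.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total = sum(distance.values())
        reachable = len(distance) - 1
        # Reachable residues divided by summed distances; 0.0 if isolated.
        scores.append(reachable / total if total > 0 else 0.0)
    return scores
```

For a connected network this equals the usual (n - 1) / (sum of shortest path lengths); residues near the geometric core of the structure score higher, which is the property the abstract associates with catalytic residues.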

8.
Ceramics International, 2022, 48(1): 665-673
Wettability has a major effect on the corrosion performance of ceramic refractories under normal operating conditions. Contact angle measurement can be used to characterize the wettability between liquid metals and oxide ceramics. Therefore, it is necessary to develop a contact angle prediction model with generalizability. This work focuses on developing a model for predicting the contact angle of a liquid metal with a solid oxide and analyzes the influence of the factors affecting the predicted contact angle. In this paper, six contact angle prediction models are developed based on machine learning methods and contact angle data from the previous literature. A comparison of the six models shows that the Gaussian process regression (GPR) model has the best prediction accuracy, reaching 96%. Furthermore, the comparative results indicate that when the surface energy of the metal, the surface energy of the oxide, the formation free energy of the oxide, and the bandgap energy of the oxide are ignored in turn, the prediction accuracy of the model decreases by 4%, 3%, 1% and 1%, respectively.

9.
De novo protein structure prediction plays an important role in studies of helical membrane proteins as well as structure-based drug design efforts. Developing an accurate scoring function for protein structure discrimination and validation remains a current challenge. Network approaches based on overall network patterns of residue packing have proven useful in soluble protein structure discrimination. It is thus of interest to apply similar approaches to the studies of residue packing in membrane proteins. In this work, we first carried out such analysis on a set of diverse, non-redundant and high-resolution membrane protein structures. Next, we applied the same approach to three test sets. The first set includes nine structures of membrane proteins with resolution worse than 2.5 Å; the other two sets include a total of 101 G-protein coupled receptor models, constructed using either de novo or homology modeling techniques. Results of analyses indicate the two criteria derived from studying high-resolution membrane protein structures are good indicators of a high-quality native fold and the approach is very effective for discriminating native membrane protein folds from less-native ones. These findings should be of help for the investigation of the fundamental problem of membrane protein structure prediction.

10.
The prediction of a protein's structure from its amino acid sequence has been a long-standing goal of molecular biology. In this work, a new set of conformational parameters for membrane spanning helices was developed using the information from the topology of 70 membrane proteins. Based on these conformational parameters, a simple algorithm has been formulated to predict the transmembrane helices in membrane proteins. A FORTRAN program has been developed which takes the amino acid sequence as input and gives the predicted transmembrane α-helices as output. The present method correctly identifies 295 transmembrane helical segments in 70 membrane proteins with only two overpredictions. Furthermore, this method predicts all 45 transmembrane helices in the photosynthetic reaction center, bacteriorhodopsin and cytochrome c oxidase to an 86% level of accuracy and so is better than all other methods published to date.

11.
The 'H5' segment located between the putative fifth and sixth transmembrane helices is the most highly conserved region in voltage-gated potassium channels and it is believed to constitute a major part of the ion conduction path (pore). Here we present a two-step procedure, comprising secondary structure prediction and hydrophobic moment profiling, to predict the structure of this important region. Combined results from the application of the procedure to the H5 region of four voltage-gated and five other K+ channel sequences lead to the prediction of a β-strand-turn-β-strand structure for H5. The reasons for the application of these soluble protein methods to parts of membrane proteins are: (i) that pore-lining residues are accessible to water and (ii) that a large enough database of high-resolution membrane protein structures does not yet exist. The results are compared with experimental results, in particular spectroscopic studies of two peptides based on the H5 sequence of SHAKER potassium channel. The procedure developed here may be applicable to water-accessible regions of other membrane proteins.

12.
We present a novel method that predicts transmembrane domains in proteins using solely information contained in the sequence itself. The PRED-TMR algorithm described refines a standard hydrophobicity analysis with a detection of potential termini ('edges', starts and ends) of transmembrane regions. This allows one both to discard highly hydrophobic regions not delimited by clear start and end configurations and to confirm putative transmembrane segments not distinguishable by their hydrophobic composition. The accuracy obtained on a test set of 101 non-homologous transmembrane proteins with reliable topologies compares well with that of other popular existing methods. Only a slight decrease in prediction accuracy was observed when the algorithm was applied to all transmembrane proteins of the SwissProt database (release 35). A WWW server running the PRED-TMR algorithm is available at http://o2.db.uoa.gr/PRED-TMR/
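
The "standard hydrophobicity analysis" that this abstract takes as its starting point can be illustrated with a sliding-window hydropathy scan. The sketch below is not the PRED-TMR algorithm and omits its edge detection step entirely; the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold, and the function names are stand-in assumptions.

```python
# Kyte-Doolittle hydropathy values, used here as a stand-in scale; the
# abstract does not state which scale PRED-TMR uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in sequence]
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new value, drop the old one.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_segments(sequence, window=19, threshold=1.6):
    """Merge windows scoring at or above `threshold` into (start, end) spans.

    Indices are 0-based and `end` is exclusive. The window length and the
    threshold are illustrative assumptions.
    """
    segments = []
    for start, score in enumerate(hydropathy_profile(sequence, window)):
        if score < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlapping or touching windows extend the current span.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

A scan like this only flags hydrophobic stretches; per the abstract, PRED-TMR additionally scores likely start and end positions so that hydrophobic regions without clear termini can be rejected.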

13.
A hybrid system (hidden neural network) based on a hidden Markov model (HMM) and neural networks (NN) was trained to predict the bonding states of cysteines in proteins starting from the residue chains. Training was performed using 4136 cysteine-containing segments extracted from 969 non-homologous proteins of well-resolved 3D structure and without chain-breaks. After a 20-fold cross-validation procedure, the efficiency of the prediction scores as high as 80% using neural networks based on evolutionary information. When the whole protein is taken into account by means of an HMM, a hybrid system is generated, whose emission probabilities are computed using the NN output (hidden neural networks). In this case, the predictor accuracy increases up to 88%. Further, when tested on a protein basis, the hybrid system can correctly predict 84% of the chains in the data set, with a gain of at least 27% over the NN predictor.

14.
The stability of proteins is an essential property that has several biological implications. Knowledge about protein stability is important in many ways, ranging from protein purification and structure determination to stability in cells and biotechnological applications. Experimental determination of thermal stabilities has been tedious and available data have been limited. The introduction of limited proteolysis and mass spectrometry approaches has facilitated more extensive cellular protein stability data production. We collected melting temperature information for 34,913 proteins and developed a machine learning predictor, ProTstab2, by utilizing a gradient boosting algorithm after testing seven algorithms. The method performance was assessed on a blind test data set and showed a Pearson correlation coefficient of 0.753 and root mean square error of 7.005. Comparison to previous methods indicated that ProTstab2 had superior performance. The method is fast, so it was applied to predict and compare the stabilities of all proteins in human, mouse, and zebrafish proteomes for which experimental data were not determined. The tool is freely available.

15.
Flow characteristics of bidisperse mixtures of particles fluidized by a gas predicted by the mixture based kinetic theory of Garzó et al. (2007a, 2007b) and the species based kinetic theory model of Iddir and Arastoopour (2005) are compared. Simulations were carried out in two- and three-dimensional periodic domains. Direct comparison of the meso-scale gas-particle flow structures, and the domain-averaged slip velocities and meso-scale stresses reveals that both mixture and species based kinetic theory models manifest similar predictions for all the size ratios examined in this study. A detailed analysis is presented in which we demonstrate when the species based theory of Iddir and Arastoopour (2005) will reduce to a mathematical form similar to the mixture framework of Garzó et al. (2007a, 2007b). We also find that the flow characteristics obtained for bidisperse mixtures are very similar to those obtained for monodisperse systems having the same Sauter mean diameter for the cases examined; however, the domain-averaged properties of monodisperse and bidisperse gas-particle flows do demonstrate quantitative differences. The use of filtered two-fluid models that average over meso-scale flow structures has already been described in the literature; it is clear from the present study that such filtered models are needed for coarse-grid simulations of polydisperse systems as well.

16.
Identifying secretory proteins from blood, saliva or other body fluids has become an effective method of diagnosing diseases. Existing secretory protein prediction methods are mainly based on conventional machine learning algorithms and are highly dependent on the feature set from the protein. In this article, we propose a deep learning model based on the capsule network and transformer architecture, SecProCT, to predict secretory proteins using only amino acid sequences. The proposed model was validated using cross-validation and achieved 0.921 and 0.892 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively. Meanwhile, the proposed model was validated on an independent test set and achieved 0.917 and 0.905 accuracy for predicting blood-secretory proteins and saliva-secretory proteins, respectively, which are better than conventional machine learning methods and other deep learning methods for biological sequence analysis. The main contributions of this article are as follows: (1) a deep learning model based on a capsule network and transformer architecture is proposed for predicting secretory proteins. The results of this model are better than those of existing conventional machine learning methods and deep learning methods for biological sequence analysis; (2) only amino acid sequences are used in the proposed model, which overcomes the high dependence of existing methods on the annotated protein features; (3) the proposed model can accurately predict most experimentally verified secretory proteins and cancer protein biomarkers in blood and saliva.

17.
Disulfide bonds stabilize protein structures and play an important role in protein folding. Predicting disulfide connectivity precisely is an important task for determining the structural/functional relationships of proteins. The accuracy obtained by conventional disulfide connectivity predictions using sequence information only is limited. In this study, we aimed to develop a new method to improve the prediction accuracy of disulfide connectivity using a support vector machine (SVM) with prior knowledge of disulfide bonding states and evolutionary information. The separations among the oxidized cysteine residues on a protein sequence have been encoded into vectors named cysteine separation profiles (CSPs). Our previous prediction of disulfide connectivity for non-redundant proteins in SwissProt release no. 39 (SP39) sharing less than 30% sequence identity yielded an accuracy of 49% using the CSP method alone. In this study, for proteins from the same dataset, an even better fourfold cross-validation accuracy of 62% was achieved using SVM with CSP as a feature.
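
The cysteine separation profile described in this abstract can be sketched as a vector of gaps between oxidized cysteines. The abstract does not define the encoding precisely, so taking the gaps between consecutive cysteines in sequence order is an assumption, as are the function names and the sum-of-absolute-differences comparison.

```python
def cysteine_separation_profile(cys_positions):
    """Encode oxidized cysteine positions as gaps between consecutive ones.

    `cys_positions` are sequence positions of the bonded cysteines. Using
    consecutive separations is an assumption of this sketch; the abstract
    only says that separations are encoded into a vector.
    """
    ordered = sorted(cys_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def profile_divergence(csp_a, csp_b):
    """Sum of absolute differences between two equal-length profiles.

    This distance is a hypothetical choice for comparing profiles, for
    example to look up the most similar protein with known connectivity.
    """
    if len(csp_a) != len(csp_b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(csp_a, csp_b))
```

A protein with 2n bonded cysteines yields a profile of length 2n - 1 under this encoding, so only proteins with the same number of disulfide bonds are directly comparable.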

18.
Identification of membrane spanning beta strands in bacterial porins (cited by: 3; self-citations: 0; citations by others: 3)
The membrane assembly of outer membrane proteins is more complex than that of transmembrane helical proteins owing to the intervention of many charged and polar residues in the membrane. Accordingly, the predictive accuracy of transmembrane beta strands is considerably lower than that of transmembrane alpha helices. In this paper we develop a set of conformational parameters for membrane spanning beta strands. We formulate an algorithm to predict the transmembrane beta strands in the family of bacterial porins based on the conformational parameters and surrounding hydrophobicities of amino acid residues. A Fortran program has been developed which takes the amino acid sequence as the input file and gives the predicted transmembrane beta strand as output. The present method predicts at an accuracy level of 82% for all the bacterial porins considered.

19.
20.
Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on the protein solubility. The method is a machine learning tool utilizing a gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.
