Similar literature
 Found 20 similar articles; search took 46 ms
1.
2.
Peptide vaccination for cancer immunotherapy requires identification of peptide epitopes derived from antigenic proteins associated with the tumor. Such peptides can bind to MHC proteins (MHC molecules) on the tumor-cell surface, with the potential to initiate a host immune response against the tumor. Computer prediction of peptide epitopes can be based on known motifs for peptide sequences that bind to a certain MHC molecule, on algorithms using experimental data as a training set, or on structure-based approaches. We have developed an algorithm, which we refer to as PePSSI, for flexible structural prediction of peptide binding to MHC molecules. Here, we have applied this algorithm to identify peptide epitopes (of nine amino acids, the common length) from the sequence of the cancer-testis antigen KU-CT-1, based on the potential of these peptides to bind to the human MHC molecule HLA-A2. We compared the PePSSI predictions with those of other algorithms and found that several peptides predicted to be strong HLA-A2 binders by PePSSI were similarly predicted by another structure-based algorithm, PREDEP. The results show how structure-based prediction can identify potential peptide epitopes without known binding motifs and suggest that side chain orientation in binding peptides may be obtained using PePSSI.

3.
Peptides that induce and recall T-cell responses are called T-cell epitopes. T-cell epitopes may be useful in a subunit vaccine against malaria. Computer models that simulate peptide binding to MHC are useful for selecting candidate T-cell epitopes, since they minimize the number of experiments required for their identification. We applied a combination of computational and immunological strategies to select candidate T-cell epitopes. A total of 86 experimental binding assays were performed in three rounds of identification of HLA-A11 binding peptides from the six pre-erythrocytic malaria antigens. Thirty-six peptides were experimentally confirmed as binders. We show that cyclical refinement of the artificial neural network (ANN) models significantly improves the efficiency of identifying potential T-cell epitopes.

4.
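A classifier ensemble method for T-cell epitope prediction*   Cited by: 1 (self-citations: 1, citations by others: 0)
T-cell epitope prediction is important for reducing the experimental synthesis of overlapping peptides, for understanding the specificity of T-cell-mediated immunity, and for developing subunit peptide vaccines and gene vaccines. To address the limited interpretability of existing machine-learning-based T-cell epitope prediction models and to further improve prediction accuracy, a decision table storing peptide segments of equal length is first constructed through peptide preprocessing, and a rough-set-based classifier ensemble algorithm is then proposed. The algorithm not only combines the strengths of a complete attribute reduction algorithm based on information entropy with those of other attribute reduction algorithms, but also incorporates anchor-position knowledge from the T-cell epitope prediction domain into the attribute-value reduction process. Finally, the algorithm is used to predict peptides that bind the MHC class II molecule HLA-DR4 (B1*0401). For the first time, production rules are extracted that achieve high prediction accuracy and help experts understand the binding mechanism between MHC molecules and antigenic peptides, laying a foundation for subsequent molecular modeling work.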

5.
The non-covalent interaction between single-walled carbon nanotubes and surfactant peptides makes the nanotubes soluble in biological media, enabling their use in nanomedicine, drug delivery, and gene therapy. A previous study has shown that two important parameters in binding peptides to nanotubes are the hydrophobic effect and the number of aromatic amino acids. Ten surfactant peptides, each eight residues long and composed of Lys, Trp, Tyr, Phe, and Val, were designed to investigate the important parameters in binding peptides to a (6, 6) carbon nanotube. A 500 ns MD simulation was performed for the free surfactant peptides in water or near a nanotube. Our results indicate that the binding affinity of peptides to the nanotube increases with aromatic residue content. Among aromatic residues, peptides containing Trp have a higher binding affinity to the nanotube than peptides containing Phe or Tyr. Steric hindrance between bulky aromatic residues in the peptide sequence has a negative influence on peptide binding to the nanotube; in designing a surfactant peptide, the number, spacing, and polarity of aromatic residues should therefore be taken into account. Our results also show that, in docking peptides to the nanotube, fully flexible docking leads to incorrect results.

6.
In this paper, we present a variational Bayes (VB) framework for learning continuous hidden Markov models (CHMMs), and we examine the VB framework within active learning. Unlike a maximum likelihood or maximum a posteriori training procedure, which yields a point estimate of the CHMM parameters, VB-based training yields an estimate of the full posterior of the model parameters. This is particularly important for small training sets, since it gives a measure of confidence in the accuracy of the learned model. This is utilized within the context of active learning, in which we acquire labels for those feature vectors whose labels would be most informative for reducing model-parameter uncertainty. Three active learning algorithms are considered in this paper: 1) query by committee (QBC), with the goal of selecting data for labeling that minimize the classification variance, 2) a maximum expected information gain method that seeks to label data with the goal of reducing the entropy of the model parameters, and 3) an error-reduction-based procedure that attempts to minimize classification error over the test data. Experimental results are presented for synthetic and measured data. We demonstrate that all of these active learning methods can significantly reduce the amount of required labeling, compared to random selection of samples for labeling.
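
The committee idea behind query by committee can be illustrated with a minimal sketch. It uses vote entropy, a common disagreement measure, rather than the classification-variance criterion or the CHMM models of the abstract above; the function names and the toy votes are assumptions made for illustration only.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the label votes cast by the committee for one sample.

    Zero when all members agree; larger when the votes are split.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def most_disputed(committee_votes):
    """Return the index of the sample the committee disagrees on most.

    committee_votes[i] is the list of labels predicted for sample i,
    one label per committee member (hypothetical input format).
    """
    return max(range(len(committee_votes)),
               key=lambda i: vote_entropy(committee_votes[i]))

# Toy example: three unlabeled samples, three committee members each.
votes = [["a", "a", "a"], ["a", "b", "a"], ["a", "b", "c"]]
query_index = most_disputed(votes)  # the sample to send for labeling
```

In a full active learning loop, the selected sample would be labeled, added to the training set, and the committee retrained before the next query.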

7.
In classification tasks, active learning is often used to select a set of informative examples from a large unlabeled dataset. The objective is to learn a classification pattern that can accurately predict the labels of new examples from a selection that contains as few examples as possible. Selecting informative examples also reduces the manual labeling effort, data complexity, and data redundancy, and thus improves learning efficiency. In this paper, a new pool-based active learning strategy, called inconsistency-based active learning, is proposed. The strategy is built on two classical works: (1) the learning philosophy of the query-by-committee (QBC) algorithm; and (2) the structure of the traditional concept learning model, from-general-to-specific (GS) ordering. By constructing two extreme hypotheses of the current version space, the strategy evaluates unlabeled examples by a new sample selection criterion, the inconsistency value, and the whole learning process can be implemented without any additional knowledge. Moreover, since active learning is widely applied to the support vector machine (SVM) and its related applications, the strategy is further specialized into an algorithm called inconsistency-based active learning for SVM (I-ALSVM). By building up a GS structure, the sample selection process in our strategy is formed by searching through the initial version space. We compare the proposed I-ALSVM with several other pool-based methods for SVM on selected datasets. The experimental results show that, in terms of generalization capability, our model exhibits good feasibility and competitiveness.

8.
An extension of cellular genetic programming for data classification (CGPC) to induce an ensemble of predictors is presented. Two algorithms implementing the bagging and boosting techniques are described and compared with CGPC. The approach is able to deal with large data sets that do not fit in main memory, since each classifier is trained on a subset of the overall training data. The predictors are then combined to classify new tuples. Experiments on several data sets show that, by using a training set of reduced size, better classification accuracy can be obtained at a much lower computational cost.

9.
Peptide-major histocompatibility complex (MHC) binding is an important prerequisite event and has immediate consequences for the immune response. Peptides that bind to MHC molecules can activate T-cell immunity, and they are useful for understanding the immune mechanism and developing vaccines for diseases. Recently, researchers have become interested in predicting binding affinity rather than classifying peptides as binders or non-binders. In this paper, we use the sparse Bayesian regression algorithm proposed by Tipping [M.E. Tipping, Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. (2001)] to derive position-specific scoring matrices from allele-related peptides, and develop models for predicting MHC-II binding affinity. We explore the impact of peptide length and peptide flanking residue length on binding affinity, and incorporate these factors into our models to enhance prediction performance. When applied to datasets from the AntiJen database and the IEDB database, our method performs better than several popular quantitative methods.
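
A position-specific scoring matrix of the kind mentioned above can be applied as in the following minimal sketch. The matrix weights are invented for the example and are not taken from the paper; the sparse Bayesian regression used to fit them and the flanking-residue terms are not modeled here, and the function names are hypothetical.

```python
def score_core(core, pssm, default=0.0):
    """Sum the per-position weights for one candidate binding core."""
    return sum(pssm[i].get(aa, default) for i, aa in enumerate(core))

def best_core(peptide, pssm):
    """Slide a window of len(pssm) over the peptide.

    Returns (score, start, core) for the highest-scoring window.
    """
    width = len(pssm)
    if len(peptide) < width:
        raise ValueError("peptide is shorter than the scoring matrix")
    candidates = [
        (score_core(peptide[s:s + width], pssm), s, peptide[s:s + width])
        for s in range(len(peptide) - width + 1)
    ]
    return max(candidates, key=lambda c: c[0])

# Toy 3-position matrix with made-up weights; MHC-II binding cores are
# commonly modeled as 9 residues, so a real matrix would have 9 rows.
TOY_PSSM = [
    {"F": 1.0, "Y": 0.8},
    {"A": 0.2},
    {"L": 0.9, "V": 0.5},
]

score, start, core = best_core("GGFALKK", TOY_PSSM)
```

Scanning every window and keeping the maximum is one simple way to handle MHC-II peptides of variable length, where the binding core position is not known in advance.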

10.
Active Sampling for Class Probability Estimation and Ranking   Cited by: 1 (self-citations: 0, citations by others: 1)
In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given estimation accuracy and provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing on both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
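
The two ingredients named above, variance of probability estimates and weighted sampling, can be sketched as follows. This is a simplified illustration and not the published BOOTSTRAP-LV procedure: the probability estimates are assumed to come from models trained on bootstrap resamples (the training step is omitted), and the function names and input layout are hypothetical.

```python
import random
from statistics import pvariance

def variance_scores(estimates):
    """Variance of the class-probability estimate for each candidate.

    estimates[m][i] is the estimate P(class = 1) for candidate i
    produced by bootstrap model m (assumed input format).
    """
    n_candidates = len(estimates[0])
    return [pvariance([model[i] for model in estimates])
            for i in range(n_candidates)]

def weighted_pick(scores, k, rng):
    """Draw up to k distinct indices with probability proportional to score."""
    remaining = list(range(len(scores)))
    weights = [scores[i] for i in remaining]
    chosen = []
    while remaining and len(chosen) < k:
        if sum(weights) <= 0:
            # All remaining candidates have zero variance: pick uniformly.
            j = rng.randrange(len(remaining))
        else:
            j = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(j))
        weights.pop(j)
    return chosen

# Three bootstrap models, three candidates; only candidate 1 is disputed.
estimates = [[0.5, 0.1, 0.9], [0.5, 0.9, 0.9], [0.5, 0.5, 0.9]]
scores = variance_scores(estimates)
to_label = weighted_pick(scores, 1, random.Random(0))
```

Sampling in proportion to the score, instead of always taking the top-scoring candidate, keeps some spread over the input space, which is the motivation the abstract gives for weighted sampling.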

11.
Non-covalently functionalized single-walled carbon nanotubes (SWCNTs) with improved solubility and biocompatibility can successfully transfer drugs, DNA, RNA, and proteins into target cells. Theoretical studies, namely molecular docking and molecular dynamics simulations at fully atomistic scale, were used to investigate the hydrophobic and aromatic π–π stacking interactions of four newly designed surfactant peptides for non-covalent functionalization of SWCNTs. The results indicated that the designed peptides have binding affinity towards the SWCNT, with constant interactions throughout the MD simulation time, and that the affinity can be further improved by increasing the number of tryptophan residues. The aromatic content of the peptides plays a significant role in their adsorption on the SWCNT wall. The data suggest that the π–π stacking interaction between the aromatic rings of tryptophan and the π electrons of SWCNTs is more important than hydrophobic effects for dispersing carbon nanotubes; nevertheless, SWCNTs are strongly hydrophobic in front of smooth surfaces. Using the aromatic content of peptides to form SWCNT/peptide complexes proved successful, providing new insight into peptide design strategies for future nano-biomedical applications.

12.
Peptide-MHC binding is an important prerequisite event and has immediate consequences for the immune response. Peptides that bind to MHC molecules can activate T-cell immunity, and they are useful for understanding the immune mechanism and developing vaccines for diseases. Accurate prediction of the binding between peptides and MHC-II molecules has long been a challenge in bioinformatics. Recently, instead of classifying peptides as binders or non-binders, researchers have become more interested in predicting peptide binding affinities directly. In this paper, we investigate the use of the relevance vector machine to quantitatively predict the binding affinities between MHC-II molecules and peptides. In our approach, a new encoding scheme is used to generate the input vectors, and the relevance vector machine is then used to develop prediction models on the basis of binding cores, which are recognized in an iterative self-consistent way. When applied to three MHC-II molecules, DRB1*0101, DRB1*0401 and DRB1*1501, our method consistently performs better than several popular quantitative methods in terms of cross-validated squared error, cross-validated correlation coefficient, and area under the ROC curve. All the evidence indicates that our method is an effective tool for MHC-II binding affinity prediction.

13.
14.
Research on peptide classification problems has focused mainly on the study of different encodings and the application of several classification algorithms to achieve improved prediction accuracy. The main drawback of the literature is the lack of an extensive comparison among the available encoding methods on a wide range of classification problems. This paper addresses the fundamental issue of which peptide encoding promises the best results for machine learning classifiers. Two novel encoding methods based on physicochemical properties of the amino acids are proposed, and an extensive comparison with several standard encoding methods is performed on three different classification problems (HIV-protease, recognition of T-cell epitopes, and prediction of peptides that bind human leukocyte antigens). The experimental results demonstrate the effectiveness of the new encodings and show that the frequently used orthonormal encoding is inferior to other methods.
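
For reference, the orthonormal encoding used as a baseline above is usually understood as a one-hot scheme: each residue becomes a 20-dimensional unit vector, so a peptide of length n becomes a vector of 20n entries. The sketch below assumes that reading; the alphabet order and the function name are arbitrary choices made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def orthonormal_encode(peptide):
    """Concatenate one 20-dimensional unit vector per residue."""
    vector = []
    for aa in peptide:
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {aa!r}")
        unit = [0] * len(AMINO_ACIDS)
        unit[AMINO_ACIDS.index(aa)] = 1
        vector.extend(unit)
    return vector

encoded = orthonormal_encode("ACD")  # 3 residues -> 60 entries, three of them 1
```

Because every pair of distinct residues is equally far apart under this encoding, it carries no physicochemical similarity information, which is the gap that property-based encodings aim to fill.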

15.
Recent approaches for classifying data streams are mostly based on supervised learning algorithms, which can only be trained with labeled data. Manual labeling of data is both costly and time consuming. Therefore, in a real streaming environment where large volumes of data appear at a high speed, only a small fraction of the data can be labeled. Thus, only a limited number of instances will be available for training and updating the classification models, leading to poorly trained classifiers. We apply a novel technique to overcome this problem by utilizing both unlabeled and labeled instances to train and update the classification model. Each classification model is built as a collection of micro-clusters using semi-supervised clustering, and an ensemble of these models is used to classify unlabeled data. Empirical evaluation of both synthetic and real data reveals that our approach outperforms state-of-the-art stream classification algorithms that use ten times more labeled data than our approach.

16.
Human TGF-β/SMAD7 signaling has been recognized as an attractive target for heterotopic ossification (HO). Here, we report a successful rational design of cyclic peptides that disrupt the signaling pathway by targeting the TGF-β–receptor complex. The intermolecular interaction between TGF-β and its cognate receptor is characterized in detail using molecular dynamics simulation, binding energetic analysis, and alanine scanning. The computational analysis identifies a binding loop of the receptor protein that plays an essential role in the peptide-mediated TGF-β–receptor interaction. Subsequently, the loop is stripped from the protein context to generate a linear peptide segment, which possesses considerable flexibility and intrinsic disorder and would thus incur a large entropy penalty upon binding to TGF-β. To minimize this unfavorable entropic effect, the linear peptide is cyclized by adding a disulfide bond between its N- and C-terminal cysteine residues, resulting in a cyclic peptide. In vitro fluorescence anisotropy assays substantiate that the cyclic peptide can bind tightly to TGF-β, with a measured Kd of 54 μM. We also demonstrate that structural optimization can further improve the peptide affinity through site-directed mutagenesis of selected residues, based on the computationally modeled complex structure of TGF-β with the cyclic peptide.

17.
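陈晓琪  谢振平  刘渊  詹千熠. Journal of Software (《软件学报》), 2021, 32(12): 3884-3900
Data sampling is an important means of quickly extracting useful information from large-scale datasets. To better meet the demand for efficient processing of ever larger data, a new method is developed that builds on the strong performance of the affinity propagation algorithm and introduces hierarchical incremental processing and dynamic sample weighting, achieving a very effective balance between processing efficiency and sampling quality. The hierarchical incremental strategy processes the original large-scale dataset in batches and then combines the results; the dynamic weighting strategy assigns reasonable dynamic weights to sample points during affinity propagation, so as to obtain better global consistency over the sampled data space. In the experiments, performance is analyzed on synthetic datasets, UCI benchmark datasets, and image datasets. The results show that the new method matches existing related methods in sampling partition quality while greatly improving computational efficiency. The new method is further applied to data augmentation for deep learning; the corresponding experiments show that combining efficient incremental sampling with the original data augmentation method yields a significant improvement in model performance while keeping the total training set size unchanged.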

18.
Consideration of the binding competitiveness of a drug candidate against natural ligands and other drugs that bind to the same receptor site may facilitate the rational development of a candidate into a potent drug. A strategy that can be applied to computer-aided drug design is to compare the ligand-receptor interaction energy, or other scoring functions, of a designed drug with that of the relevant ligands known to bind to the same binding site. As a tool to facilitate such a strategy, a database of ligand-receptor interaction energies is developed from known ligand-receptor 3D structural entries in the Protein Data Bank (PDB). The energy is computed with a molecular mechanics force field that has been used in the prediction of therapeutic and toxicity targets of drugs. The database also contains information about ligand function and other properties, and it can be accessed at http://xin.cz3.nus.edu.sg/group/CLiBE.asp. The computed energy components may facilitate the probing of the mode of action and other profiles of binding. The computed energies of a number of PDB ligand-receptor complexes in this database are studied and compared to experimental binding affinity. A certain degree of correlation between the computed energy and experimental binding affinity is found, which suggests that the computed energy may be useful in facilitating a qualitative analysis of drug binding competitiveness.

19.
The storage and labeling of industrial data incur significant costs during the development of defect detection algorithms. Active learning addresses these problems by selecting the most informative data among the given unlabeled data. Existing active learning methods for image segmentation focus on natural images and medical images, with less attention given to industrial images, and little research has been performed on imbalanced data. To solve these problems, we propose an active learning framework for selecting informative data for defect segmentation under imbalanced data. In the initialization stage, the framework uses self-supervised learning to initialize the data so that the initialization data contain more defect data, thereby solving the cold-start problem. For the iterative stage, we design the main body of the active learning framework, which is composed of a segmentation learner and a reconstruction learner. These learners use supervised learning to further improve the framework's ability to select informative data. The experimental results obtained on public and self-owned datasets show that the framework can save 70% of the required storage space and greatly reduce the cost of labeling. The intersection over union values show that the framework, by labeling only part of the data, can achieve an effect equivalent to labeling the whole dataset.

20.
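Existing isometric mapping (Isomap) algorithms are sensitive to the choice of the neighborhood parameter and lack sufficient robustness to noise. Based on the relationship between the average shortest path and the neighborhood parameter, and on the gradient of the average shortest path, a method for constructing an optimal neighborhood graph is proposed. The neighborhood graph built by this method contains almost no short-circuit edges; it can use a variable neighborhood parameter according to the characteristics of each data point; and it better approximates the geodesic distances between data points. Experiments show that the algorithm not only achieves better dimensionality reduction on uniformly sampled, noise-free datasets, but is also robust and topologically stable on noisy datasets.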
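
The quantity this abstract builds on, the average shortest path of the neighborhood graph as a function of the neighborhood parameter k, can be computed as in the sketch below. It only illustrates how a larger k can introduce short-circuit edges that shorten graph paths; the paper's actual selection rule and its per-point variable neighborhoods are not reproduced, and the function names and toy points are assumptions.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a dense distance matrix."""
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist

def average_shortest_path(points, k):
    """Mean graph (approximate geodesic) distance over all point pairs.

    Uses Floyd-Warshall; returns math.inf if the graph is disconnected.
    """
    dist = knn_graph(points, k)
    n = len(points)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(pairs) / len(pairs)

# Four corners of a rectangle, thought of as samples from a closed curve.
corners = [(0.0, 2.0), (0.0, 0.0), (1.5, 0.0), (1.5, 2.0)]
by_k = {k: average_shortest_path(corners, k) for k in (1, 2, 3)}
```

On these points, k = 1 leaves the graph disconnected, k = 2 follows the rectangle's edges, and k = 3 adds the diagonals, which act as short-circuit edges and lower the average shortest path. A drop of this kind as k grows is the sort of signal a gradient-based choice of k could look for.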
