Similar literature
20 similar articles found.
1.
Correspondence discriminant analysis (CDA) is a multivariate statistical method derived from discriminant analysis that can be applied to contingency tables. We used CDA to separate the proteins of Gram-negative bacteria according to their subcellular location. The high resolution of the resulting discrimination makes the method a good tool for predicting subcellular location when this information is not known. The main advantage of the technique is its simplicity: by computing two linear formulae on amino acid composition, a protein can be classified into one of the three classes of subcellular location we have defined. The CDA itself can be computed with the ADE-4 software package, which can be downloaded, together with the data set used in this study, from the Pôle Bio-Informatique Lyonnais (PBIL) server at http://pbil.univ-lyon1.fr.

2.
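Predicting the subcellular localization of proteins is not only an important foundation for studying protein structure and function, but is also significant for understanding the pathogenesis of certain diseases and for drug design and discovery. However, using machine learning to predict protein subcellular location accurately remains a challenging scientific problem. To address it, a protein subcellular localization method based on clustering and feature fusion is proposed. First, the autocorrelation coefficient method and the entropy density method are introduced into the construction of the protein feature representation model,...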

3.
As an important attribute of proteins, subcellular location can provide valuable information about protein function. Determining protein subcellular locations experimentally is usually expensive and time-consuming. Over the years, a variety of computational approaches have been developed to predict subcellular locations from proteins whose locations are already known. However, the problem is inherently hard, especially for proteins that can exist at multiple subcellular locations, and further study is still greatly needed. In this paper, we propose an ensemble learning framework that uses a modified Weighted K-Nearest Neighbors (WKNN) as the basic learning algorithm. Two types of features are extracted from the training data, based on protein amino acid composition (Amphiphilic Pseudo Amino Acid Composition, or AmPseAAC) and on protein sequence similarity (Protein Similarity Measure, or PSM), respectively. Two individual classifiers are trained separately on these two feature types, and each assigns a query protein a probability distribution over the locations. Based on the outputs of the two base classifiers, a novel ensemble strategy named Maximized Probability on Label (MPoL) is proposed. The strategy produces a final set of locations for each protein by integrating the predictions of the base classifiers through an optimization procedure. To measure prediction quality, two types of evaluation metrics are used: example-based metrics and label-based metrics. To evaluate the approach objectively, we compare its results with those of another popular method, iLoc-Animal, on a benchmark dataset through cross-validation. The results show that, in terms of absolute true success rate on multi-location prediction, MPoL achieves much better results than iLoc-Animal. This suggests that the proposed method has some potential for solving a diverse set of multi-label learning problems.

4.
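A method is proposed for predicting protein subcellular location types based on the LZ-complexity similarity of symbol sequences and the K-nearest-neighbor rule. Compared with many other feature parameters, computing the LZ-complexity similarity of protein sequences requires neither in-depth biological domain knowledge nor auxiliary data beyond the sequences themselves. In addition, the lazy-learning nature of the K-nearest-neighbor rule suits the dynamic growth of protein data with known subcellular location types. In 10-fold cross-validation on the standard RH dataset, the overall prediction accuracy of the method exceeds that of four reference prediction methods.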
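
The abstract above does not give its similarity formula, so the following is only a minimal sketch of the general idea: count the phrases in a Lempel-Ziv (1976) style parsing of a sequence, derive a distance from the complexities of two sequences and their concatenations, and vote among the K nearest training sequences. The function names, the particular normalization in `lz_distance`, and the toy sequences in the usage note are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter


def lz_complexity(s: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) style exhaustive parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from the text before
        # its last character (overlapping copies are allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(x: str, y: str) -> float:
    """Hypothetical normalized distance (assumption); x and y must be non-empty."""
    cx, cy = lz_complexity(x), lz_complexity(y)
    cxy, cyx = lz_complexity(x + y), lz_complexity(y + x)
    # How many new phrases one sequence needs after the other has been seen.
    return max(cxy - cx, cyx - cy) / max(cx, cy)


def knn_predict(query: str, training: list, k: int = 3) -> str:
    """Majority label among the k training sequences closest to the query."""
    nearest = sorted(training, key=lambda item: lz_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A call such as `knn_predict("MKKLLLAAVL", [("MKKLLLAAAL", "membrane"), ("MSDEEDRKRK", "nucleus")], k=1)` returns the label of the closest training sequence. Because K-nearest-neighbor keeps no fitted model, newly annotated proteins can simply be appended to the training list, which is the lazy-learning property the abstract highlights.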

5.
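Obtaining information on the subcellular localization of apoptosis proteins is very important both for revealing the mechanism of programmed cell death and for annotating protein function. Because determining subcellular localization experimentally is time-consuming, laborious, and costly, developing fast and effective computational methods for predicting it has become an important research topic in bioinformatics. First, features such as amino acid composition, dipeptide composition, and auto-covariance variables are extracted from the position-specific scoring matrix to build a feature representation model of the protein sequence; then recursive feature elimination is used for feature selection; finally, a support vector machine classifier is evaluated with the jackknife test on two commonly used datasets. The experimental results show that the method outperforms most previously reported prediction methods, which demonstrates its effectiveness.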

6.
The chloroplast is a subcellular organelle found in green plants and algae, and it is the main organelle in which photosynthesis takes place. The proteins localized within the chloroplast are responsible for photosynthesis at the molecular level. The chloroplast can be further divided into several compartments, and proteins in different compartments are involved in different steps of photosynthesis. Since the molecular function of a protein is highly correlated with its exact cellular localization, pinpointing the subchloroplast location of a chloroplast protein is an important step towards understanding its role in photosynthesis. Experimental determination of protein subchloroplast location is costly and time-consuming. Computational approaches have therefore been developed to predict subchloroplast locations from primary sequences. Over the last decades, more than a dozen studies have tried to predict protein subchloroplast locations with machine learning methods, introducing a variety of sequence features and learning algorithms. In this review, we collect comprehensive information on all existing studies of protein subchloroplast location prediction. We compare these studies in terms of benchmarking datasets, sequence features, machine learning algorithms, predictive performance, and availability of implementations. We summarize the progress and current status of this research topic, and we try to identify the most likely directions for future work. We hope this review not only lists all existing works but also serves readers as a useful resource for quickly grasping the big picture of the topic. We also hope it can be a starting point for future methodological studies on the prediction of protein subchloroplast locations.

7.
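Qiao Shanping, Yan Baoqiang. Journal of Computer Applications, 2016, 36(8): 2150-2156
Because multi-label learning and ensemble learning are not yet mature in their application to predicting multiple subcellular localizations of proteins, a prediction method based on ensemble multi-label learning is studied. First, from the perspective of combining multi-label learning with ensemble learning, a three-layer framework for an ensemble multi-label learning system is proposed. The framework classifies learning algorithms and classifiers hierarchically and integrates binary classification, multi-class classification, multi-label learning, and ensemble learning into a general-purpose three-layer ensemble multi-label learning model. Second, the system model is designed using object-oriented techniques and the Unified Modeling Language (UML), which gives the system good extensibility, so that its functionality and performance can be improved through extension. Finally, the model is extended in Java to implement a learning system, which is successfully applied to the prediction of multiple subcellular localizations of proteins. Tests on a Gram-positive bacteria dataset verify the operability of the system's functions and its good predictive performance. The system can serve as an effective tool for predicting multiple subcellular localizations of proteins.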

8.
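Protein subcellular localization is one of the basic problems of proteomics. Some types of proteins may be present at two or more subcellular locations, and localizing such proteins is more complicated. Using Gene Ontology and the pseudo amino acid composition method respectively, each protein is represented as a real-valued vector. Adopting the ranking idea from multi-label learning, a score vector V is computed, each component of which gives the probability that the query protein belongs to a particular subcellular location. A nearest-neighbor algorithm is used to predict the number n of subcellular locations to which the protein belongs, and the locations corresponding to the n highest-scoring components of V are the predicted locations.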
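
The last two steps of the abstract above (estimate the label count n from the nearest neighbor, then keep the n best-scoring locations) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the Euclidean nearest-neighbor rule, and the example scores are hypothetical, and the computation of the score vector V itself is not shown because the abstract does not specify it.

```python
import math


def estimate_label_count(query, training):
    """Return the number of locations of the training protein nearest to query.

    training is a list of (feature_vector, set_of_locations) pairs; Euclidean
    distance is an assumption made for this sketch.
    """
    _, labels = min(training, key=lambda item: math.dist(query, item[0]))
    return len(labels)


def predict_locations(scores, n):
    """Return the n locations with the highest scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Hypothetical toy data: two training proteins and one score vector V.
training = [
    ((0.0, 0.0), {"nucleus"}),
    ((1.0, 1.0), {"nucleus", "cytoplasm"}),
]
scores = {"nucleus": 0.6, "cytoplasm": 0.3, "membrane": 0.1}

n = estimate_label_count((0.9, 0.8), training)
predicted = predict_locations(scores, n)  # ['nucleus', 'cytoplasm']
```

Separating the two steps keeps the ranking reusable: any other estimator of n can be substituted without touching `predict_locations`.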

9.
10.
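Existing essential-protein identification algorithms do not take biological information fully into account, and their accuracy leaves room for improvement. To address this, an efficient essential-protein identification algorithm, PDWS, is proposed. First, a weighted network is built by combining protein locations obtained from subcellular localization information with the edge clustering coefficient of the protein-protein interaction network. Second, based on the subcellular location of each protein, a subcellular-compartment subnetwork participation index is proposed. Finally, this index is fused with a protein-complex subnetwork participation index to measure protein essentiality along multiple dimensions. Experiments on the two standard datasets DIP and Krogan show that PDWS outperforms existing algorithms such as PeC and PCSD, identifies more essential proteins with specific structures, and reaches identification precisions of 0.76 and 0.73, respectively.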

11.
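Predicting protein subcellular localization greatly helps in determining protein function, revealing mechanisms of molecular interaction, understanding complex physiological processes, and designing drug targets. With the exponential growth of protein sequence data in the post-genomic era, research on computational, machine-learning-based methods for predicting protein subcellular localization is becoming increasingly important. To capture the state of research on this problem, the literature is reviewed from five aspects: dataset construction, protein feature extraction and representation, prediction algorithm design, algorithm testing, and the establishment of Web services. The core and difficult problems that the field currently needs to solve are pointed out, some new developments in current research are analyzed, and future research directions and priorities are discussed.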

12.
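Based on the discrete wavelet transform, this paper proposes a new method for predicting the number and positions of the transmembrane regions of membrane proteins. Taking the membrane protein with code 1Q16 as an example, denoising its hydrophobicity signal with a suitably chosen wavelet function and decomposition level allows the number of transmembrane regions to be predicted accurately. Using the membrane-protein family threshold table proposed in the paper to select a suitable threshold for the membrane protein data, the specific positions of the transmembrane regions in 1Q16 can be predicted accurately. With 90 membrane proteins randomly drawn from the membrane protein database MPtopo as a test set (containing 335 transmembrane regions), the prediction accuracy reaches 94.1% for transmembrane regions and 88.9% for membrane protein sequences.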

13.
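Abstract: In protein structure prediction research, an important problem is the correct prediction of disulfide bond connectivity. Accurate prediction of disulfide bonds reduces the search space of protein conformations and helps in predicting the 3D structure of a protein. This paper treats the prediction of the disulfide bonds in a protein structure as equivalent to finding a maximum-weight matching in a graph. The vertices of the graph represent the cysteine residues in the sequence; edges connect every pair of vertices, each representing one possible bond; and the edge weights are assigned by a weight function. The EJ algorithm is used to find the matching of maximum weight, and this matching is taken to correspond to the correct disulfide bond connectivity. Applying the method to predict disulfide bonds in protein structures gave good results.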
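
The matching formulation in the abstract above can be illustrated on a very small graph. The sketch below is not the EJ algorithm the authors use; it substitutes a plain exhaustive search over all perfect pairings, which is only practical for a handful of cysteines. The function name and the weight matrix are hypothetical, and the weights stand in for whatever weight function the paper defines.

```python
def best_pairing(weights):
    """Maximum-weight perfect matching by exhaustive search (brute force).

    weights is a symmetric n x n matrix (list of lists) where weights[i][j]
    scores a bond between cysteine i and cysteine j; n must be even.
    Returns (total_weight, list_of_pairs).
    """
    n = len(weights)
    if n % 2:
        raise ValueError("an even number of cysteines is required")

    def solve(remaining):
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_total, best_pairs = float("-inf"), []
        # Pair the first free cysteine with every other free one in turn.
        for k, partner in enumerate(rest):
            sub_total, sub_pairs = solve(rest[:k] + rest[k + 1:])
            total = weights[first][partner] + sub_total
            if total > best_total:
                best_total, best_pairs = total, [(first, partner)] + sub_pairs
        return best_total, best_pairs

    return solve(tuple(range(n)))


# Hypothetical weights for four cysteines (illustrative values only).
w = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.4, 0.3],
    [0.2, 0.4, 0.0, 0.8],
    [0.1, 0.3, 0.8, 0.0],
]
total, pairs = best_pairing(w)  # pairs == [(0, 1), (2, 3)]
```

The number of perfect pairings grows as (n-1)(n-3)...1, so a polynomial-time matching algorithm is the appropriate choice for proteins with many cysteines.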

14.
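Using the similarity rule, the complementarity rule, and molecular recognition theory, a digital coding model for amino acids is established for studying sequence features and predicting function. A new method for generating protein sequence images based on cellular automata is presented. Its advantages are that it takes into account the interactions between neighboring amino acids and that the generated images correspond one-to-one with the gene sequences, so that many important properties hidden in protein sequences can be revealed through the cellular automaton images. Using the pseudo amino acid composition derived from the protein cellular automaton images, the success rate of protein subcellular localization prediction can reach 86.4%.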

15.
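Xia Ying, Mao Hongrui, Zhang Xu, Pei Haiying. Computer Science, 2017, 44(12): 38-41, 57
Location recommendation services make it easier for users to obtain information about nearby points of interest, but they also bring the risk of leaking users' location privacy. To avoid the adverse effects of such leaks, a differential privacy protection method for location recommendation services is proposed. While preserving the characteristics of users' location trajectories and check-in frequencies, the privacy budget is allocated in two ways, uniform allocation and geometric allocation, based on a path prefix tree and its degree of balance. Laplace noise satisfying differential privacy is then added according to the result of the budget allocation. Experimental results show that the method effectively protects users' location privacy, and that a reasonable allocation of the privacy budget reduces the impact of differential privacy noise on recommendation quality.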
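
To make the two allocation schemes in the abstract above concrete, the sketch below splits a total privacy budget across the levels of a prefix tree and perturbs a count with Laplace noise. It is an illustration under assumptions: the abstract does not say how the tree's balance selects between the schemes, so that step is omitted, and the function names, the geometric ratio of 2, and the sensitivity of 1 are all hypothetical.

```python
import random


def uniform_budget(epsilon, levels):
    """Give every tree level the same share of the total budget."""
    return [epsilon / levels] * levels


def geometric_budget(epsilon, levels, ratio=2.0):
    """Give deeper levels geometrically larger shares (ratio is an assumption)."""
    raw = [ratio ** i for i in range(levels)]
    total = sum(raw)
    return [epsilon * r / total for r in raw]


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count, level_epsilon, sensitivity=1.0, rng=random):
    """Perturb a node count using the budget assigned to its level."""
    return count + laplace_noise(sensitivity / level_epsilon, rng)


budgets = geometric_budget(1.0, levels=3)  # shares 1/7, 2/7, 4/7 of the budget
released = [noisy_count(c, eps) for c, eps in zip([120, 45, 9], budgets)]
```

A smaller per-level budget means a larger noise scale, which is why the choice of allocation affects how much the released counts, and therefore the recommendations built on them, are distorted.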

16.
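A reversible watermarking algorithm based on a location map is proposed. The low bit planes of the carrier image are chosen as the watermark embedding positions. The carrier image is divided into blocks, and the blocks are sorted into three types of groups according to the relative magnitudes of the low-bit-plane pixel values within each block. The group types are arithmetically coded with variable-length codes, and judging the groups one by one in sequence yields the location map code. According to the location map, the location map code, the watermark information, and the LSBs of the pixels occupied by the location map are all embedded into the low bits of the carrier image, forming the encrypted image. During extraction, the location map is obtained first, then the watermark is extracted losslessly according to it and the carrier image is restored losslessly. The algorithm is reversible and supports multiple embedding, with high security, good imperceptibility, and broad applicability.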

17.
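To address the low service efficiency caused by the inability to foresee the communication situation in mobile communications, a regional communication situation estimation method based on the hidden Markov model is presented. Because the characteristics of communication behavior differ from one point in time to another, communication behavior is divided into time periods, and a specific division algorithm, either a genetic method or an exhaustive traversal method, is chosen adaptively. The intrinsic relationships among the time and place at which terminal behavior occurs and the communication behavior itself are mined to build a hidden Markov model, and the Viterbi decoding algorithm is used to estimate terminal locations and communication behavior within the region. Simulation results show that when the pattern feature value is set to 0.8, the success rate of the method is about 73% for terminal location prediction and about 75% for communication behavior prediction.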
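
The Viterbi decoding step mentioned in the abstract above can be sketched with a generic implementation. Everything specific here is an assumption for illustration: hidden states are taken to be hypothetical cells in which a terminal may be located, observations are hypothetical communication behaviors, and the probabilities are made-up values, not parameters from the paper.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical model: two cells, three kinds of observed behavior.
states = ["cell_A", "cell_B"]
start_p = {"cell_A": 0.6, "cell_B": 0.4}
trans_p = {
    "cell_A": {"cell_A": 0.7, "cell_B": 0.3},
    "cell_B": {"cell_A": 0.4, "cell_B": 0.6},
}
emit_p = {
    "cell_A": {"idle": 0.5, "sms": 0.4, "call": 0.1},
    "cell_B": {"idle": 0.1, "sms": 0.3, "call": 0.6},
}

path = viterbi(["idle", "sms", "call"], states, start_p, trans_p, emit_p)
# path == ['cell_A', 'cell_A', 'cell_B']
```

For long observation sequences the products of probabilities underflow, so a practical version would add log-probabilities instead of multiplying.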

18.
Carbohydrate binding sites are considered important for cellular recognition and adhesion and are important targets for drug design. In this paper we present a new method, InCa-SiteFinder, for predicting non-covalent inositol and carbohydrate binding sites on the surface of protein structures. It uses the van der Waals energy of a protein–probe interaction together with amino acid propensities to locate and predict carbohydrate binding sites. The protein surface is searched for continuous volume envelopes that correspond to a favorable protein–probe interaction. These volumes are then analyzed to demarcate regions of high cumulative propensity for binding a carbohydrate moiety, based on calculated amino acid propensity scores. InCa-SiteFinder was tested on an independent test set of 80 protein–ligand complexes, where it identified carbohydrate binding sites with high specificity and sensitivity. It was also tested on a second set of 80 protein–ligand complexes, comprising 40 known carbohydrate binders (with 40 carbohydrate binding sites) and 40 known binders of drug-like compounds (with 58 known drug-like compound binding sites), both to predict the location of the carbohydrate binding sites and to distinguish them from the drug-like compound binding sites. At 73% sensitivity the method showed 98% specificity. Almost all of the carbohydrate and drug-like compound binding sites were correctly identified, with an overall error rate of 12%.

19.
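To predict accurately the location of the acoustic emission released by rock before coal-rock fracture and instability in the goaf of a coal mine, acoustic emission event data were obtained from the 16403 working face of the Guandi Mine of Shanxi Coking Coal. Because these data pose a nonlinear, high-dimensional problem, a method combining the PSO and SVM algorithms is proposed and its application to acoustic emission localization in coal-mine coal-rock is studied. Previous methods simply collected acoustic emission information from either the coal-rock or the rock alone, so the localization suffered from inaccuracy, low precision, and large errors. This paper proposes a "1+1=1" localization method: the acoustic emission signals of the rock and of the coal-rock mass at the same position are both collected, then analyzed and processed to obtain the location. Both emit strong signals before the coal-rock becomes unstable. Simulation results show that predicting acoustic emission locations in coal-mine coal-rock with the combined PSO and SVM method improves accuracy and precision, greatly improves generalization ability, and greatly reduces localization error.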

20.
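License plate location and tilt correction method   Total citations: 2, self-citations: 0, citations by others: 2
A method for locating and correcting license plates based on license plate character information is proposed. First, the top-hat operation of grayscale morphology is used to enhance the license plate region so that it stands out in the binary image. The exact position of the plate is then determined from the number and arrangement of the connected components of the plate characters in the binary image. Finally, the orientation of the located plate is corrected in both the horizontal and vertical directions. Horizontal correction determines the horizontal tilt angle of the plate from the centers of the character connected components and corrects it with a rotation transform. Vertical correction uses projection analysis to find the remaining tilt angle in the vertical direction after horizontal correction and then shifts the image pixels accordingly. Experimental results show that the method locates license plates quickly and accurately against complex backgrounds and that the tilt correction works very well.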
