Similar Documents
19 similar documents found
1.
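The conditional hidden Markov property of protein secondary structure and its prediction   Total citations: 5, self-citations: 0, citations by others: 5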
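More than 40 years have passed since the problem of protein secondary structure prediction was first posed in 1957. The known literature supports the following observations. In a statistical sense, interactions between amino acids in a protein sequence are weak, so the independence assumptions that statistical methods rely on, although not derived from the physical background, are reasonable and convenient. The mutual information criterion outperforms the mean squared error criterion. The contribution of information-theoretic and statistical ideas and methods to secondary structure prediction should not be underestimated. Adding information beyond the protein's primary structure helps improve prediction accuracy. When secondary structure is predicted directly from the primary structure with no additional information, existing methods have not achieved a major breakthrough in accuracy. Prediction accuracy, together with the coverage of the sample sequences used relative to the whole population of proteins, are the two key indicators for evaluating prediction methods. The authors build a joint structural model that integrates protein primary and secondary structure and incorporates all of the above information. The model is first used to obtain the information-theoretic and statistical properties of protein primary and secondary structures; these properties are then used to analyze quantitatively, and describe precisely in statistical terms, the information-transfer relationships among the variables of the primary and secondary structures and their hidden Markov property. Finally, several principles for predicting secondary structure directly from primary structure are given.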
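The abstract favours a mutual information criterion over mean squared error. As a hedged illustration only (not the authors' joint model), the plug-in estimate of the mutual information between residue identity and secondary-structure label can be computed from paired observations as below; the function name and the toy sequence and labels are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations.

    Frequencies are used as probabilities, so the estimate is biased upward
    for small samples.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy data: residues paired with labels (H helix, E strand, C coil).
residues = "AELKGVVPAE"
labels = "HHHCCEECHH"
print(round(mutual_information(list(zip(residues, labels))), 3))
```

A value of 0 means the labels carry no information about the residues; larger values mean stronger statistical dependence.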

2.
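A power quality monitoring system (PQMS) records important information on historical grid faults, how they vary, and their trends. Using this information to predict future voltage sags helps users and grid companies plan production sensibly and avoid economic losses. This paper proposes a method, based on a hidden Markov model, for predicting the occurrence time of voltage sag (OTVS). First, the variable predictability, data redundancy, and event chaoticity of voltage sag occurrence times are analyzed to reveal the characteristics of voltage sag monitoring data. Then, to address these three characteristics, a method for identifying and partitioning historical voltage sag states is proposed based on the fuzzy C-means algorithm (FCMA) and the Akaike information criterion (AIC), with interval-valued variables describing the historical variation information in the monitoring data. A hidden Markov model that accounts for both the historical variation of sags and the variation of grid disturbances is then built to predict future voltage sags. Finally, validation on historical data from 10 monitoring points in a province in central China shows that the method reaches a prediction accuracy of up to 92.85% and outperforms other typical prediction methods by roughly 5% to 30%.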
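The state-partitioning step relies on fuzzy C-means. The following is a minimal one-dimensional sketch of the standard FCM iteration (membership update, then membership-weighted centre update), assuming scalar data, a fixed fuzzifier m = 2, and user-supplied initial centres; the AIC-based choice of the number of states, the interval-valued variables, and the HMM itself are not shown, and all names and data are hypothetical.

```python
def fcm_memberships(points, centers, m=2.0):
    """u[i][j]: degree to which points[i] belongs to cluster j (rows sum to 1)."""
    u = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:  # point sits exactly on a centre: crisp membership
            j = d.index(0.0)
            u.append([1.0 if k == j else 0.0 for k in range(len(centers))])
            continue
        u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(centers)))
                  for j in range(len(centers))])
    return u


def fcm_centers(points, u, m=2.0):
    """Each centre is the mean of the points weighted by membership**m."""
    return [sum(u[i][j] ** m * points[i] for i in range(len(points)))
            / sum(u[i][j] ** m for i in range(len(points)))
            for j in range(len(u[0]))]


def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Alternate the two updates until the centres stop moving."""
    for _ in range(iters):
        new = fcm_centers(points, fcm_memberships(points, centers, m), m)
        converged = max(abs(a - b) for a, b in zip(new, centers)) < tol
        centers = new
        if converged:
            break
    return centers, fcm_memberships(points, centers, m)


# Hypothetical 1-D monitoring values forming two groups.
data = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
centers, u = fuzzy_c_means(data, [0.0, 6.0])
print([round(c, 2) for c in centers])
```

Unlike k-means, every point keeps a graded membership in every cluster, which is what lets a later model treat state boundaries softly.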

3.
Optimization models and methods for protein structure prediction   Total citations: 4, self-citations: 0, citations by others: 4
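Ab initio prediction is one of the main approaches to predicting the three-dimensional structure of proteins. Its core task is to formulate and solve an appropriate, complex global optimization problem. Although many results have been obtained over the past 40 years, research on this problem has never overcome two difficulties: finding a potential energy function that characterizes the relationship between protein structure and energy, and finding an effective global optimization method. This paper mainly reviews optimization models and methods for protein structure prediction.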

4.
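A new protein secondary structure prediction method based on a Bayesian neural network (BNN) is proposed. Computational results show that the BNN outperforms a back-propagation neural network (BPNN): the average Q3 accuracy improves by 3.65% on four cross-validation data sets and by 4.01% on the test data set. An effective cross-validation-based method for choosing initial values is also proposed, which shortens the "burn-in" phase of Markov chain Monte Carlo (MCMC) simulation.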
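Q3 is the usual accuracy measure for three-state (helix H, strand E, coil C) secondary structure prediction. A minimal per-residue sketch follows; the function name and the example strings are hypothetical, and the paper's "average Q3" may instead average per-protein or per-fold scores.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("HHHCCEEC", "HHHCCCEC"))  # 7 of 8 residues correct -> 87.5
```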

5.
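Predicting equipment performance degradation from missing data is harder, and also more meaningful, than predicting it from complete data. However, existing bearing performance degradation prediction methods do not consider missing data. To address this, a bearing degradation prediction method for missing data based on the infinite hidden Markov model is proposed. The method builds an infinite hidden Markov prediction model to predict the data points missing from rolling bearing sample data in the oscillation stage, forming a new complete data set, and then uses the same model to make one-step-ahead predictions on the new complete data. Experimental results show that the predicted data have a small mean error relative to the true values. Comparing the true values, the predictions from the complete data, and the predictions from the new complete data verifies the effectiveness of the method and shows that it reflects the trend of rolling bearing degradation. The method offers an approach to predicting rolling bearing degradation trends when data are missing and has significant theoretical and engineering value.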

6.
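Construction of a SQL Server-based database of sample sets for protein secondary structure prediction   Total citations: 1, self-citations: 1, citations by others: 0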
Zhang Ning, Wu Jie, Song Zhuo, Zhang Tao. Chinese High Technology Letters (《高技术通讯》), 2006, 16(6): 619-623
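Using the SQL Server database management system, the protein secondary structure prediction sample sets CB513, CB396, and RS126 were organized into a database, DataSet, and an IIS server was configured to support queries over the network. The database places the sample sets under standardized, structured, unified management, making the data easy to store, retrieve, and analyze and reducing errors. It can be used to extract samples for secondary structure prediction research, convert sequences, transform encodings, and analyze and evaluate prediction results, replacing much of the tedious traditional programming needed to process text files, which greatly improves efficiency and facilitates the work.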

7.
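A new active contour model based on fuzzy sets and the hidden Markov model (HMM) is proposed. The model applies HMM theory to the active contour model, using the HMM's ability to incorporate multiple kinds of measurement information to optimize the model parameters and overcome the shortcomings of the traditional active contour model. Fuzzy set theory is also used to establish a fuzzy membership relation between the HMM states and the observation vectors, so that the snake points converge accurately to the image edges. Simulation experiments on dynamic images show that the improved active contour model converges well to concave object boundaries and is robust to noise.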

8.
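Time-frequency analysis of bearing vibration signals yields a full feature set, and a distance-compensation method is used to extract fault-sensitive features, yielding a sensitive feature set. When used to train and test bearing-state classifiers, the two feature sets differ not only in diagnosis rate but also in which samples they misclassify. On this basis, a bearing fault diagnosis method based on ensemble hidden Markov models is proposed. Two independent hidden Markov models are built, one from each feature set; their classification results are combined using an averaging rule and a maximum-likelihood-probability method, and the combined classifier is used to diagnose bearing signals. Experimental results show that the ensemble classifier achieves higher recognition accuracy in bearing fault diagnosis than classifiers based on either the sensitive feature set or the full feature set alone.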
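The abstract names an averaging rule and a maximum-likelihood-probability rule for fusing two HMM classifiers but does not give formulas. The sketch below is one plausible reading, offered as an assumption: each model's per-class log-likelihoods are first normalized into class posteriors (uniform prior, so models built on different feature sets are on a comparable scale), then either averaged or combined by taking the maximum. All names and scores are hypothetical, and both models must cover the same classes.

```python
from math import exp


def posteriors(loglik):
    """Convert per-class log-likelihoods into class posteriors (uniform prior)."""
    top = max(loglik.values())  # subtract the maximum to avoid overflow
    weights = {c: exp(v - top) for c, v in loglik.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def ensemble_classify(loglik_a, loglik_b, rule="average"):
    """Fuse two HMM classifiers; both dicts map class label -> log P(obs | HMM_class)."""
    pa, pb = posteriors(loglik_a), posteriors(loglik_b)
    if rule == "average":
        score = {c: (pa[c] + pb[c]) / 2 for c in pa}
    elif rule == "max":
        score = {c: max(pa[c], pb[c]) for c in pa}
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(score, key=score.get)


# Hypothetical scores from the full-feature HMMs (a) and the sensitive-feature HMMs (b).
a = {"normal": -120.0, "inner": -118.5, "outer": -125.0}
b = {"normal": -80.0, "inner": -83.0, "outer": -79.5}
print(ensemble_classify(a, b), ensemble_classify(a, b, rule="max"))
```

Normalizing before fusing matters because raw log-likelihoods from models with different feature dimensions can differ by orders of magnitude, letting one model dominate.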

9.
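Because multichannel data contain rich information, fusing them effectively can yield more accurate and reliable diagnoses. Accordingly, a multichannel fusion fault diagnosis method for rolling bearings based on the coupled hidden Markov model is proposed. The method uses a two-chain coupled hidden Markov model to fuse the horizontal and vertical vibration signals of a bearing for fault diagnosis. Diagnostic analysis of common rolling bearing faults shows that the method diagnoses bearing faults more accurately than the commonly used HMM-based fault diagnosis method.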

10.
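Hidden Markov models (HMMs) are widely used in many fields. This paper improves existing HMMs in several respects and applies them to financial forecasting. First, we fix the initial points of K-means so that the clustering results are more stable, which gives the Baum-Welch algorithm better initial values. Second, to further improve the forecasts, and unlike existing methods, we feed the model parameters obtained by the Baum-Welch algorithm into the Viterbi algorithm to determine the optimal hidden-state sequence; we then re-partition the observation vectors accordingly, obtaining the set of observation vectors corresponding to each hidden state. From the Viterbi output we recompute the mean and variance of each class of observation vectors and use the new mean vectors and covariance matrices as initial values for a further run of Baum-Welch, which determines the final HMM parameters. Finally, instead of simply searching the historical window for similar price movements as existing methods do, we derive an explicit expression for the multi-step conditional probability of the predicted value and obtain better predictions by maximizing this conditional probability. Empirical results on data from the Chinese securities market demonstrate the superiority of the improved HMM.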
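The Viterbi step described above finds the single most likely hidden-state path given fitted parameters. A minimal log-space sketch follows; for brevity it assumes discrete emissions, whereas the paper's model uses mean vectors and covariance matrices (Gaussian emissions). The two-state "bull"/"bear" model and all its probabilities are hypothetical.

```python
from math import log


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-emission HMM.

    Works in log space to avoid underflow, so every probability must be > 0.
    """
    # score[t][s]: best log-probability of any path that ends in state s at time t
    score = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: score[t - 1][r] + log(trans_p[r][s]))
            score[t][s] = score[t - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ("bull", "bear")
start = {"bull": 0.5, "bear": 0.5}
trans = {"bull": {"bull": 0.8, "bear": 0.2}, "bear": {"bull": 0.2, "bear": 0.8}}
emit = {"bull": {"up": 0.7, "down": 0.3}, "bear": {"up": 0.2, "down": 0.8}}
print(viterbi(["up", "up", "down", "down", "down"], states, start, trans, emit))
# ['bull', 'bull', 'bear', 'bear', 'bear']
```

Grouping the observations by the decoded states and recomputing per-state statistics is then what supplies the fresh Baum-Welch starting values the abstract describes.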

11.
We propose two finite difference two-timescale Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for simulation optimization of hidden Markov models. Stability and convergence of both algorithms are proved. Numerical experiments on a queueing model with high-dimensional parameter vectors show that these algorithms converge orders of magnitude faster than related (N+1)-Simulation finite difference analogues and another Two-Simulation finite difference algorithm that updates in cycles.
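The key idea of SPSA is that a full gradient estimate needs only two loss evaluations, whatever the dimension, because all coordinates are perturbed at once by a random ±1 vector. The sketch below is plain single-timescale SPSA with the commonly recommended gain exponents 0.602 and 0.101, applied to a deterministic quadratic; it is not the paper's two-timescale HMM algorithm, and the function names, gains, and test problem are assumptions for illustration.

```python
import random


def spsa_minimize(loss, theta, iters=2000, a=0.1, c=0.1, alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA: two loss evaluations per step give a full gradient estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iters):
        ak = a / (k + 1) ** alpha  # step-size gain, decays slowly
        ck = c / (k + 1) ** gamma  # perturbation size, decays more slowly still
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # Rademacher perturbation
        diff = (loss([t + ck * d for t, d in zip(theta, delta)])
                - loss([t - ck * d for t, d in zip(theta, delta)]))
        # Every coordinate shares the same two evaluations.
        theta = [t - ak * diff / (2 * ck * d) for t, d in zip(theta, delta)]
    return theta


def quadratic(x):
    """Hypothetical test loss with its minimum at (3, 3, 3, 3, 3)."""
    return sum((xi - 3.0) ** 2 for xi in x)


print([round(t, 3) for t in spsa_minimize(quadratic, [0.0] * 5)])
```

A one-sided finite-difference scheme would need N+1 evaluations per step for an N-dimensional parameter, which is the cost the abstract's comparison refers to.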

12.
BACKGROUND: Antibiotic-resistant nosocomial pathogens can arise in epidemic clusters or sporadically. Genotyping is commonly used to distinguish epidemic from sporadic vancomycin-resistant enterococci (VRE). We compared genotyping with a statistical method for determining the transmission characteristics of VRE. METHODS AND FINDINGS: A structured continuous-time hidden Markov model (HMM) was developed. The hidden states were the number of VRE-colonized patients (both detected and undetected). The input was weekly point-prevalence data covering 157 weeks. We estimated two parameters: one quantifying cross-transmission of VRE and the other quantifying the level of VRE colonization from sporadic sources. We compared the results with those obtained by concomitant genotyping and phenotyping. We estimated that 89% of transmissions were due to ward cross-transmission and 11% were sporadic. Genotyping found that 90% had identical glycopeptide resistance genes and 84% were identical or nearly identical on pulsed-field gel electrophoresis (PFGE). There was some evidence, based on model selection criteria, that the cross-transmission parameter changed during the study period. The model that allowed transmission to change just before the outbreak and again at its peak was superior to the other models. This model estimated that cross-transmission increased at week 120 and declined after week 135, coinciding with environmental decontamination. SIGNIFICANCE: HMMs can be applied to serial prevalence data to estimate the characteristics of acquisition of nosocomial pathogens and to distinguish between epidemic and sporadic acquisition. The model was able to estimate transmission parameters despite imperfect detection of the organism. Its results were validated against PFGE and glycopeptide resistance genotype data and were very similar.
Additionally, HMMs can provide information about unobserved events such as undetected colonization.

13.
Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, their poor comprehensibility hinders their success in protein structure prediction. Explaining how a decision is made is important for the acceptance of machine learning technology, especially in applications such as bioinformatics. A reasonable interpretation is not only useful for guiding "wet" experiments; the extracted rules also help integrate computational intelligence with symbolic AI systems for advanced deduction. A decision tree, on the other hand, is highly comprehensible. This paper presents a novel approach to rule generation for protein secondary structure prediction that integrates the merits of both the SVM and the decision tree. The approach combines the two into a new algorithm called SVM_DT, which proceeds in three steps. First, the algorithm trains an SVM. Then, a new training set is generated by careful selection from the output of the SVM. Finally, this training set is used to train a decision tree learning system and to extract the corresponding rule sets. Experiments on protein secondary structure prediction on the RS126 data set show that the comprehensibility of SVM_DT is much better than that of the SVM. Moreover, the generalization ability of SVM_DT is better than that of C4.5 decision trees and similar to that of the SVM. Hence, SVM_DT can be used not only for prediction but also for guiding biological experiments.

14.
Principal component regression (PCR) was applied to a spectral library of proteins in H2O solution acquired by single-pass attenuated total reflectance (ATR) Fourier transform infrared (FT-IR) spectroscopy. PCR was used to predict the secondary structure content, principally the alpha-helix and beta-sheet content, of proteins within the spectral library. Quantitation of protein secondary structure content served as a proof of principle that single-pass ATR-FT-IR is an appropriate method for protein secondary structure analysis. The ATR-FT-IR method permits acquisition of the entire spectral range from 700 to 3900 cm⁻¹ without significant interference from water bands. An "inside model space" bootstrap and a genetic algorithm (GA) were used to improve prediction results. Specifically, the bootstrap was used to increase the number of replicates for adequate training and validation of the PCR model, and the GA was used to optimize PCR parameters, particularly wavenumber selection. The bootstrap allowed adequate representation of variability in the amide A, amide B, and C-H stretching regions due to differing levels of sample hydration. The bootstrap improved the robustness of the PCR models significantly, whereas the GA only slightly improved prediction results. Two spectral libraries are presented: one better suited to beta-sheet content prediction and the other to alpha-helix content prediction. The GA-optimized PCR method for alpha-helix content prediction used 120 wavenumbers within the amide I, II, A, B, and IV and the C-H stretching regions, and 18 factors. For beta-sheet content prediction, 580 wavenumbers within the amide I, II, A, and B and the C-H stretching regions, and 18 factors, were used. Validation of these two methods yielded an average absolute error of 1.7% for alpha-helix content prediction and 2.3% for beta-sheet content prediction.
After the PCR models were developed and validated, they were used to predict the alpha-helix and beta-sheet content of two unknowns, casein and immunoglobulin G.

15.
Driven by market demand, the classification of highly reliable products is becoming increasingly important. Degradation data provide information about degradation states and can be used to classify products into classes according to their reliability attributes. In this paper, a temporal probabilistic approach, the segmental continuous hidden Markov model (SCHMM), is proposed to tackle degradation modeling and classification for mixed populations. A separate SCHMM is built for each class of the mixed population. The SCHMMs directly depict the correspondence between actual degradation and the hidden states. A novel self‐training algorithm is proposed for preprocessing the original data from the mixed populations. The unknown parameters of the SCHMMs are then estimated by maximum likelihood from the complete degradation data. The root mean square error between the estimated and the actual physical degradation values, together with the Akaike information criterion and the Bayesian information criterion, is used to evaluate fitting accuracy and to select model topologies and discretization methods. Maximum posterior probability‐based classification criteria are then developed. Degradation tests are designed for data collection. To obtain optimal classification policies, a cost function consisting of the degradation test cost and the misclassification cost is constructed. A numerical example illustrates the proposed method and demonstrates its advantages over other classification methods.
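AIC and BIC trade goodness of fit against model size: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of observations, and lower is better. The sketch below compares hypothetical candidate models with different numbers of hidden states. The parameter-counting helper assumes a fully connected discrete-emission HMM, which is a simplification relative to the paper's continuous SCHMM, and the log-likelihood values are invented for illustration.

```python
from math import log


def aic(log_likelihood, k):
    """Akaike information criterion (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion (lower is better)."""
    return k * log(n) - 2 * log_likelihood


def hmm_free_parameters(n_states, n_symbols):
    """Fully connected discrete HMM: initial (N-1) + transitions N(N-1) + emissions N(M-1)."""
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)


# Hypothetical fitted log-likelihoods for 3, 4 and 5 hidden states,
# 8 observation symbols, and n = 1000 observations.
fits = {3: -1520.4, 4: -1498.7, 5: -1495.9}
n_obs, n_symbols = 1000, 8
best_aic = min(fits, key=lambda N: aic(fits[N], hmm_free_parameters(N, n_symbols)))
best_bic = min(fits, key=lambda N: bic(fits[N], hmm_free_parameters(N, n_symbols), n_obs))
print(best_aic, best_bic)
```

With these numbers the two criteria disagree: the ln n penalty in BIC is heavier than AIC's factor of 2 once n > e² (about 7.4), so BIC tends to favour smaller topologies.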

16.
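The Markov forecasting model has the "no after-effect" (memoryless) property: future sales are predicted from current sales data only, independently of past sales data. In reality, different past time points influence current sales to different degrees. Exponential smoothing compensates for this shortcoming of the Markov model, since it assumes that the most recent past sales data persist to some extent into the future. This paper therefore uses the double (second-order) exponential smoothing coefficient method to optimize the Markov forecasting model and tests it on the sales of a brand of electric vehicle. The absolute errors of the optimized model are all smaller than those of the plain Markov model, from which the paper concludes that the Markov forecasting model optimized by double exponential smoothing is feasible.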
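The abstract does not state exactly how the smoothing coefficients enter the Markov model, so the sketch below shows only the two standard ingredients side by side: Brown's double exponential smoothing (level a = 2S1 - S2, trend b = α/(1-α)(S1 - S2)) and a one-step Markov-chain forecast of the sales-state distribution. Initializing both smoothed series with the first observation, and all names and numbers, are assumptions.

```python
def brown_double_smoothing(series, alpha, horizon=1):
    """Brown's double exponential smoothing forecast `horizon` steps ahead."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    s1 = s2 = series[0]  # assumed initialization: first observation
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1   # single smoothing
        s2 = alpha * s1 + (1 - alpha) * s2  # smoothing of the smoothed series
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level + trend * horizon


def markov_step(dist, P):
    """One-step Markov forecast: next-state distribution = dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical monthly sales and a two-state (low/high sales) transition matrix.
print(round(brown_double_smoothing([120, 132, 141, 155, 163, 178], alpha=0.5), 1))
print(markov_step([1.0, 0.0], [[0.7, 0.3], [0.4, 0.6]]))
```

Because recent observations receive geometrically larger weights, the smoothed forecast retains some memory of past periods, which is the property the abstract contrasts with the memoryless Markov forecast.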

17.
Monitoring stochastic processes with control charts is the main field of application in statistical process control. For a Poisson hidden Markov model (HMM) as the underlying process, we investigate a Shewhart individuals chart, an ordinary Cumulative Sum (CUSUM) chart, and two different types of log-likelihood ratio (log-LR) CUSUM charts. We evaluate and compare the charts' performance by their average run length, computed either with the Markov chain approach or by simulation. Our evaluation covers various out-of-control scenarios as well as different levels of dependence within the HMM. The ordinary CUSUM chart shows the best overall performance, whereas the performance of the other charts depends strongly on the particular out-of-control scenario and on the autocorrelation level. For illustration, we apply the HMM and the considered charts to a data set of weekly sales counts.
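An ordinary upper CUSUM for counts accumulates C_t = max(0, C_{t-1} + x_t - k) and signals once C_t exceeds a limit h. A minimal sketch follows; it ignores the HMM's serial dependence, which in the study is handled when computing the average run length. The reference value k = 6 is an assumption chosen near (λ1 - λ0)/ln(λ1/λ0) ≈ 5.77 for a hypothetical shift from λ0 = 4 to λ1 = 8, and h = 5 and the weekly counts are likewise hypothetical.

```python
def cusum_alarm(counts, k, h):
    """Upper one-sided CUSUM; returns the 1-based time of the first alarm, or None."""
    c = 0.0
    for t, x in enumerate(counts, start=1):
        c = max(0.0, c + x - k)  # accumulate only exceedances over k
        if c > h:
            return t
    return None


# Hypothetical weekly counts: in control around 4, then shifted upward.
weekly = [3, 5, 4, 4, 6, 3, 8, 9, 7, 10]
print(cusum_alarm(weekly, k=6, h=5))  # 9
```

The run length of such a chart is the time until `cusum_alarm` first returns a value; averaging it over many simulated HMM paths is one way to estimate the average run length.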

18.
We present an integrated Grid system for the prediction of protein secondary structures, based on frequent automatic updates of the proteins in the training set. The predictor is a feed-forward multilayer perceptron (MLP) neural network trained with the back-propagation algorithm; the design reuses existing legacy software and exploits novel Grid components. The predictor takes into account the evolutionary information found in multiple sequence alignments (MSA), obtained by running an optimized parallel version of the PSI-BLAST tool based on the MPI Master-Worker paradigm. The training set contains proteins of known structure. Using Grid technologies and efficient mechanisms for running the tools and extracting the data, the time needed to train the neural network is dramatically reduced, while the results remain comparable to those of a set of well-known predictors.


Copyright © Beijing Qinyun Technology Development Co., Ltd.   Beijing ICP filing No. 09084417