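RNA Secondary Structure Prediction Based on a Lexicalized Stochastic Grammar Model
Classical stochastic grammar models predict RNA secondary structure with limited accuracy. To address this, this paper presents a method that predicts RNA secondary structure with a lexicalized stochastic grammar model. First, a maximum entropy model extracts lexical (token) information from the RNA sequence, and the Viterbi algorithm finds the maximum probability with which each token is labeled as a given secondary structure type. This lexical information is then introduced as prior information when training the stochastic grammar model, which speeds up the search over secondary structures and improves accuracy.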
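The Viterbi labeling step can be sketched as follows. The paper scores labels with a maximum entropy model; this minimal sketch swaps in a plain HMM-style tagger with made-up probabilities, and the labels "S" (stem) and "L" (loop) are hypothetical stand-ins for the paper's structure types.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p):
    """Most probable label sequence for `tokens` under an HMM-style tagger."""
    # score[t][s]: best log-probability of any label path ending in state s at position t
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][tokens[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][tokens[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], score[-1][last]

# Hypothetical labels and toy probabilities: "S" = stem (paired), "L" = loop (unpaired).
STATES = ["S", "L"]
START = {"S": 0.5, "L": 0.5}
TRANS = {"S": {"S": 0.8, "L": 0.2}, "L": {"S": 0.2, "L": 0.8}}
EMIT = {"S": {"G": 0.4, "C": 0.4, "A": 0.1, "U": 0.1},
        "L": {"G": 0.1, "C": 0.1, "A": 0.4, "U": 0.4}}

path, log_prob = viterbi("GGGAAA", STATES, START, TRANS, EMIT)
# path == ["S", "S", "S", "L", "L", "L"]
```

In the paper's setting, the per-token label probabilities would then be passed to grammar training as priors; that step is not shown here.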

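RNA secondary structure prediction is a major research topic in bioinformatics. This paper applies syntactic parsing methods from natural language understanding to RNA secondary structure prediction, using a parser built on a role-inversion algorithm that analyzes sequences with a probabilistic context-free grammar (PCFG). The role-inversion parser combines the strengths of the traditional chart parser and the generalized LR parser; a PCFG is constructed according to how RNA secondary structures are built up; and a concrete example of RNA secondary structure prediction is given.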
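The role-inversion parser itself is not described in enough detail to reproduce. As a hedged illustration of how a PCFG assigns a most probable structure, the sketch below uses a hypothetical Nussinov-like toy grammar (S → x S y | x S | S S | ε, with invented rule probabilities) and a memoized max-probability chart over substrings.

```python
from functools import lru_cache

# Hypothetical rule probabilities for a toy RNA grammar (they sum to 1):
#   S -> x S y (pair)   S -> x S (unpaired)   S -> S S (split)   S -> eps (end)
RULES = {"pair": 0.3, "unpaired": 0.3, "split": 0.1, "end": 0.3}
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def viterbi_parse(seq):
    """Return (probability, dot-bracket) of the most probable derivation of seq."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Most probable derivation of seq[i:j] from S, as (prob, structure).
        if i == j:
            return RULES["end"], ""
        p, s = best(i + 1, j)
        # Unpaired base: emitted uniformly over 4 nucleotides.
        cands = [(RULES["unpaired"] * 0.25 * p, "." + s)]
        if j - i >= 2 and (seq[i], seq[j - 1]) in CANONICAL:
            p, s = best(i + 1, j - 1)
            # Pair: emitted uniformly over the 6 allowed pairs.
            cands.append((RULES["pair"] * (1 / 6) * p, "(" + s + ")"))
        for k in range(i + 1, j):
            (p1, s1), (p2, s2) = best(i, k), best(k, j)
            cands.append((RULES["split"] * p1 * p2, s1 + s2))
        return max(cands, key=lambda c: c[0])

    return best(0, len(seq))

prob, structure = viterbi_parse("GGGAAACCC")
```

With these toy numbers, pairing two bases (0.3/6 = 0.05) beats leaving both unpaired (0.075² ≈ 0.0056), so the parse favors stems; real grammars would use trained probabilities and a minimum hairpin loop length.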

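Predicting RNA secondary structure with stochastic context-free grammar (SCFG) models is currently an important computational approach to studying RNA secondary structure. Because the standard structure prediction algorithm for SCFG models, Cocke-Younger-Kasami (CYK), has very high time and space complexity, accelerating CYK has become a challenging and active problem in computational biology. The parallel performance of CYK is limited by the algorithm's multi-dimensional, non-uniform data dependencies and its low computation-to-communication ratio. Existing large-scale parallel solutions based on general-purpose microprocessors do not achieve satisfactory speedups, and large-scale parallel computers are expensive to purchase, operate, and maintain, which limits their applicability. Building on a detailed analysis of the computational characteristics of CYK, this paper proposes and implements a fine-grained parallel CYK algorithm on an FPGA platform. The design partitions the three-dimensional dynamic programming matrix into regions and processes it layer by layer, column by column, in parallel, balancing the load across multiple processing units; it uses data prefetching, sliding windows, and a data-transfer pipeline to reuse data between processing units, balancing computation against communication; and it designs a master-slave multi-PE parallel computing array with a systolic-like structure, integrating, on the largest FPGA chip currently available (Xilinx XC5VLX330), 16 processing units (process...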
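A minimal software sketch of Viterbi CYK for an SCFG in Chomsky normal form shows the three-dimensional DP matrix V[A][i][j] and its dependency pattern: every cell depends only on cells of strictly shorter spans, so all cells with the same span length are mutually independent. This is one common way to expose parallelism, not the paper's FPGA partitioning scheme; the toy grammar (generating GⁿCⁿ) is hypothetical.

```python
import math

def cyk(seq, nonterminals, binary, unary, start="S"):
    """Viterbi CYK for a CNF SCFG: best log-probability that `start` derives seq.

    binary: dict A -> list of (B, C, prob) for rules A -> B C
    unary:  dict A -> dict terminal -> prob for rules A -> a
    """
    n = len(seq)
    if n == 0:
        return -math.inf
    neg = -math.inf
    # 3-D DP matrix: V[A][i][j] = best log-prob that A derives seq[i..j] (inclusive).
    V = {A: [[neg] * n for _ in range(n)] for A in nonterminals}
    for i, a in enumerate(seq):
        for A in nonterminals:
            p = unary.get(A, {}).get(a, 0.0)
            if p > 0:
                V[A][i][i] = math.log(p)
    # Fill by increasing span length. Cells on one diagonal (same span) read only
    # shorter spans, so each diagonal could be computed in parallel.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            for A in nonterminals:
                best = neg
                for B, C, p in binary.get(A, []):
                    for k in range(i, j):
                        s = V[B][i][k] + V[C][k + 1][j]
                        if s > neg:
                            best = max(best, math.log(p) + s)
                V[A][i][j] = best
    return V[start][0][n - 1]

# Hypothetical toy grammar for G^n C^n (n >= 1): S -> G_ T | G_ C_, T -> S C_.
NONTERMINALS = ["S", "T", "G_", "C_"]
BINARY = {"S": [("G_", "T", 0.5), ("G_", "C_", 0.5)], "T": [("S", "C_", 1.0)]}
UNARY = {"G_": {"G": 1.0}, "C_": {"C": 1.0}}
```

The O(n³·|rules|) time and O(n²·|nonterminals|) space visible in the loops are what make hardware acceleration attractive for long sequences.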

Abe Naoki, Mamitsuka Hiroshi. Machine Learning, 1997, 29(2-3): 275-301
We propose a new method for predicting protein secondary structure of a given amino acid sequence, based on a training algorithm for the probability parameters of a stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars, which are powerful enough to capture the type of dependencies exhibited by the sequences of β-sheet regions, such as the parallel and anti-parallel dependencies and their combinations. The training algorithm we use is an extension of the inside-outside algorithm for stochastic context-free grammars, but with a number of significant modifications. We applied our method to real data obtained from the HSSP database (Homology-derived Secondary Structure of Proteins Ver 1.0) and the results were encouraging: our method was able to predict roughly 75 percent of the β-strands correctly in a systematic evaluation experiment, in which the test sequences not only have less than 25 percent identity to the training sequences, but are totally unrelated to them. This figure compares favorably to the predictive accuracy of the state-of-the-art prediction methods in the field, even though our experiment was on a restricted type of β-sheet structures and the test was done on a relatively small data set. We also stress that our method can predict the structure as well as the location of β-sheet regions, which was not possible by conventional methods for secondary structure prediction. Extended abstracts of parts of the work presented in this paper have appeared in (Abe & Mamitsuka, 1994) and (Mamitsuka & Abe, 1994).
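Why parallel and anti-parallel sheets differ can be made concrete with residue-pair lists. In an idealized anti-parallel ladder, pairs nest like the base pairs of an RNA hairpin, which a context-free grammar can generate; in a parallel ladder, pairs cross, a cross-serial pattern beyond context-free grammars. The ideal one-to-one ladder below is a simplifying assumption.

```python
def strand_pairs(a_start, b_start, length, antiparallel):
    """Residue pairs (i, j) with i < j for an idealized two-strand ladder."""
    if antiparallel:
        # First residue of strand A faces the last residue of strand B.
        return [(a_start + k, b_start + length - 1 - k) for k in range(length)]
    # Parallel: residues face each other in the same order.
    return [(a_start + k, b_start + k) for k in range(length)]

def is_nested(pairs):
    """True if no two pairs cross (i < k < j < l), i.e. a context-free pattern."""
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True

anti = strand_pairs(0, 10, 4, antiparallel=True)   # [(0, 13), (1, 12), (2, 11), (3, 10)]
para = strand_pairs(0, 10, 4, antiparallel=False)  # [(0, 10), (1, 11), (2, 12), (3, 13)]
```

Here `is_nested(anti)` is True and `is_nested(para)` is False, which is why a formalism more powerful than SCFGs is needed to model both kinds of sheet together.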

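Han Zhaowei, Li Yongming. Journal of Software, 2010, 21(9): 2107-2117
This paper introduces pushdown automata based on quantum logic (e-VPDA) and proposes a generalized subset construction, with which it proves that a general e-VPDA is equivalent to an e-VPDA whose state transitions are crisp functions and which has quantum final states. Using this equivalence, it gives algebraic and hierarchical characterizations of quantum context-free languages and, on that basis, proves that quantum context-free languages are closed under regular operations. Finally, it shows that quantum pushdown automata are equivalent to quantum context-free grammars (e-VCFG).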
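The paper's generalized subset construction for quantum-logic automata is not reproduced here. For orientation only, this is a sketch of the classical crisp subset construction (NFA to DFA, no ε-moves) that such constructions extend; the example NFA, which accepts strings over {a, b} ending in "ab", is hypothetical.

```python
from collections import deque

def subset_construction(nfa, start, alphabet):
    """Classical NFA -> DFA subset construction (no epsilon moves).

    nfa: dict (state, symbol) -> set of next states.
    Returns (dfa_transitions, dfa_start); DFA states are frozensets of NFA states.
    """
    dfa_start = frozenset([start])
    dfa = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            # The DFA successor is the union of all NFA successors.
            nxt = frozenset(q for s in current for q in nfa.get((s, a), ()))
            dfa[(current, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa, dfa_start

def dfa_accepts(dfa, dfa_start, nfa_finals, word):
    """A DFA state accepts if it contains at least one NFA final state."""
    state = dfa_start
    for a in word:
        state = dfa[(state, a)]
    return bool(state & nfa_finals)

# Hypothetical NFA for strings over {a, b} that end in "ab".
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa, dfa_start = subset_construction(NFA, 0, "ab")
```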

An increasing number of structural homology search tools, mostly based on profile stochastic context-free grammars (SCFGs), have recently been developed for non-coding RNA gene identification. SCFGs can capture the statistical biases that often occur in RNA sequences, which is necessary for profiling specific RNA structures in structural homology search. In this paper, a succinct stochastic grammar model for RNA is introduced that has competitive search effectiveness. More importantly, the profiling model can easily be extended to include pseudoknots, structures that are beyond the capability of profile SCFGs. In addition, the model allows heuristics to be exploited, resulting in a significant speed-up of CYK-based search.

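Equivalence of Lattice-Valued Tree Automata and Lattice-Valued Context-Free Tree Grammars
This paper generalizes the concepts of fuzzy tree automata and fuzzy context-free tree grammars to lattice semigroups. It proves that tree automata and context-free tree grammars are equivalent in the sense of accepted and generated languages, and gives a method for constructing an equivalent grammar in normal form.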

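Predicting RNA Secondary Structure with a Chaotic Differential Evolution Algorithm
Hu Guiwu, Peng Hong. Computer Science, 2007, 34(9): 163-166
RNA secondary structure prediction is of great importance in bioinformatics. This paper proposes a chaotic differential evolution algorithm for RNA secondary structure prediction. The algorithm initializes the population chaotically and uses chaotic perturbation to generate new individuals, narrowing the search space; it adaptively applies chaotic updates to individuals according to their fitness and the population density, which improves population diversity. The algorithm combines the speed of differential evolution with the ergodicity, randomness, and regularity of chaos, effectively overcoming premature convergence and improving global search ability. Experiments demonstrate the effectiveness of the algorithm.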
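A minimal sketch of the two ingredients named above, assuming the logistic map (μ = 4) as the chaotic generator and standard DE/rand/1/bin as the evolution step. Only the initialization is chaotic here; the paper's chaotic perturbation and density-adaptive updates are not modeled, and the sphere function stands in for an RNA energy objective.

```python
import random

def logistic_sequence(x0, n, mu=4.0):
    """Chaotic logistic-map values x_{k+1} = mu * x_k * (1 - x_k), in (0, 1) for typical x0."""
    values, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values

def chaotic_de(f, dim, lo, hi, pop_size=20, generations=200, F=0.5, CR=0.9, seed=1):
    """Minimize f over [lo, hi]^dim with DE/rand/1/bin and chaotic initialization."""
    rng = random.Random(seed)
    # Chaotic initialization: map logistic-map values into the search box.
    chaos = logistic_sequence(0.37, pop_size * dim)
    pop = [[lo + (hi - lo) * chaos[i * dim + d] for d in range(dim)]
           for i in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == j_rand or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            f_trial = f(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]

def sphere(x):
    """Toy objective standing in for a structure energy function."""
    return sum(v * v for v in x)

best_x, best_f = chaotic_de(sphere, dim=3, lo=-5.0, hi=5.0)
```

For RNA, each individual would encode a candidate structure (for example, a selection of stems) and f would be its free energy; that encoding is not specified in the abstract.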

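Based on the principles of symbolic dynamics, a new graphical representation of RNA secondary structure sequences is proposed. Using two kinds of information, biological information and free energy, the representation maps the free (unpaired) bases and the base pairs of an RNA secondary structure sequence onto two kinds of time series. The mapping loses no information during conversion, and the regions containing paired bases can be clearly identified in the two-dimensional plot. A feature matrix is built from the resulting representation of each secondary structure, and the largest eigenvalues of these feature matrices form a vector used for similarity analysis. With a new similarity analysis method, qualitative and quantitative similarity analyses are carried out in both the time and frequency domains on sets of RNA secondary structure sequences from the 3′ ends of different viruses. Simulation results show that the method effectively analyzes the similarity of RNA secondary structure sequences. Compared with other methods, the new method produces larger numerical differences, which helps distinguish different species.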
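The abstract does not define the feature matrix, so the sketch below takes the matrices as given and shows only the eigenvalue step: power iteration for the dominant eigenvalue, assuming symmetric nonnegative matrices (for which the dominant eigenvalue is the largest), followed by Euclidean distance between the resulting vectors as a hypothetical similarity measure.

```python
import math

def largest_eigenvalue(m, iters=200):
    """Power iteration for the dominant eigenvalue of a symmetric nonnegative matrix."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient v^T M v (v has unit length).
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

def eigen_feature_vector(matrices):
    """One dominant eigenvalue per feature matrix (matrix construction not shown)."""
    return [largest_eigenvalue(m) for m in matrices]

def distance(u, v):
    """Euclidean distance between two feature vectors; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```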

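RNA secondary structure prediction is an important research area in bioinformatics. This paper proposes a new RNA secondary structure prediction method based on a hybrid ant colony and genetic algorithm. It makes full use of the relationships between stems and of accumulated information, using the ant colony algorithm to generate the initial population and new individuals in place of the genetic algorithm's mutation operator. It constructs the heuristic information, the initial pheromone matrix, the rule for selecting the next stem, and the pheromone update mechanism of the ant colony algorithm, and gives the crossover...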

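This paper proposes a computational model and a dynamic programming algorithm for predicting RNA secondary structure. Using a subsequence combination strategy and intrinsic properties of RNA secondary structure, the algorithm computes structures containing multiple planar pseudoknots and one non-planar pseudoknot. Compared with the Rivas algorithm, it saves 2n^4 space and reduces the time complexity from O(n^6) to O(n^5). Experimental results confirm the effectiveness of the algorithm.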

This paper presents a new model and a corresponding dynamic programming algorithm to predict RNA secondary structure including pseudoknots. The algorithm can compute arbitrary planar pseudoknots and one non-planar pseudoknot in O(n^5) time and O(n^4) space.

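Effective prediction of RNA secondary structure is an important research area in bioinformatics. A new method based on hidden Markov models is proposed. First, a prefix-suffix matching algorithm quickly finds all possible stems (including those that form pseudoknots); an RNA-HMM is then built to find the optimal combination of stems, yielding an RNA secondary structure that may contain pseudoknots. Experimental results show that the method reduces computational complexity, improves the specificity and sensitivity of prediction, achieves high accuracy, and can predict RNA pseudoknots.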
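The stem-finding step can be sketched with a brute-force search instead of the paper's prefix-suffix matching algorithm. This minimal version assumes Watson-Crick plus G-U wobble pairs, a minimum stem length of 3, and a minimum hairpin loop of 3 bases (all hypothetical defaults). Stems are reported as (i, j, length), meaning seq[i + k] pairs with seq[j - k], and only maximal stems (not extendable outward) are kept.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def find_stems(seq, min_len=3, min_loop=3):
    """Enumerate maximal stems (i, j, length): seq[i + k] pairs with seq[j - k]."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Start only at the outer end of a helix (cannot be extended outward).
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while bases pair and enough loop bases remain between them.
            while ((j - length) - (i + length) - 1 >= min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems
```

Because stems are enumerated independently, crossing (pseudoknotted) stems are included; choosing a consistent combination is left to a later step such as the paper's RNA-HMM.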

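RNA secondary structure prediction is a major research topic in bioinformatics. This paper studies it with a support vector machine (SVM) model. By extending the NSSEL labels [4], it obtains E-NSSEL labels that can represent planar pseudoknots. These labels serve as the class labels at the SVM output, so predicting a test sequence with the SVM yields a corresponding E-NSSEL sequence, which can be converted back into a secondary structure. The algorithm effectively addresses the time complexity of traditional algorithms and the difficulty of predicting long-chain molecules.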
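The E-NSSEL label set is not defined in the abstract, so this sketch assumes a hypothetical stand-in: extended dot-bracket labels, where "(" ")" mark one layer of pairs, "[" "]" (and further bracket types) mark crossing layers for planar pseudoknots, and "." marks an unpaired base. It shows how a per-base label sequence predicted by a classifier can be converted back into base pairs, with one stack per bracket type.

```python
def labels_to_pairs(labels):
    """Recover base pairs from a per-base label string (one stack per bracket type)."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {o: [] for o in openers}
    pairs = []
    for pos, ch in enumerate(labels):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unknown label {ch!r} at position {pos}")
    leftover = sorted(p for s in stacks.values() for p in s)
    if leftover:
        raise ValueError(f"unmatched opening labels at positions {leftover}")
    return sorted(pairs)

# A planar pseudoknot: the "[]" layer crosses the "()" layer.
pairs = labels_to_pairs("((..[[..))..]]")
```

Since a classifier predicts each label independently, its output may be unbalanced; raising an error is one option, and a repair heuristic would be another.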

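This paper describes the application of a constructive machine learning method, the covering algorithm, to protein secondary structure prediction. Compared with ordinary neural networks, the method is intuitive and computationally simple, and it recognizes 100% of the training samples. In addition, because the structure of a homologous family should be predicted more accurately than that of a single sequence, a probability-based profile encoding is adopted; compared with previous prediction methods, this gives better stability and accuracy.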
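A probability-based profile encoding can be sketched as per-column residue frequencies over aligned homologous sequences. The paper's exact scheme is not given; this minimal version assumes equal-length aligned sequences, ignores gap characters ("-"), and uses no pseudocounts or sequence weighting.

```python
def column_profile(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Per-column residue frequencies from equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != "-"]  # skip gaps
        total = len(residues)
        profile.append([residues.count(a) / total if total else 0.0 for a in alphabet])
    return profile

profile = column_profile(["AC", "AD", "A-"])
# Column 0: all A. Column 1: half C, half D (the gap is ignored).
```

Each row (20 numbers per position) would then replace a one-hot vector as the classifier input for that residue.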

Recent research on structure and motion recovery has focused on issues related to sensitivity and robustness of existing techniques. One possible reason is that in practical applications, the underlying assumptions made by existing algorithms are often violated. In this paper, we propose a framework for 3D reconstruction from short monocular video sequences taking into account the statistical errors in reconstruction algorithms. Detailed error analysis is especially important for this problem because the motion between pairs of frames is small and slight perturbations in its estimates can lead to large errors in 3D reconstruction. We focus on the following issues: physical sources of errors, their experimental and theoretical analysis, robust estimation techniques and measures for characterizing the quality of the final reconstruction. We derive a precise relationship between the error in the reconstruction and the error in the image correspondences. The error analysis is used to design a robust, recursive multi-frame fusion algorithm using stochastic approximation as the framework since it is capable of dealing with incomplete information about errors in observations. Rate-distortion analysis is proposed for evaluating the quality of the final reconstruction as a function of the number of frames and the error in the image correspondences. Finally, to demonstrate the effectiveness of the algorithm, examples of depth reconstruction are shown for different video sequences.
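A recursive multi-frame fusion built on stochastic approximation can be sketched with a Robbins-Monro style update, x_k = x_{k-1} + a_k (y_k - x_{k-1}), applied to successive noisy depth estimates y_k for one scene point. The gain a_k = 1/k (which reduces to the running mean) is an assumption; the paper derives its own weighting from the error analysis.

```python
def fuse_depths(observations, gain=None):
    """Recursively fuse noisy depth estimates with a Robbins-Monro style update.

    x_k = x_{k-1} + a_k * (y_k - x_{k-1}); the default a_k = 1/k gives the running mean.
    `gain` is an optional function k -> a_k (hypothetical hook for other schedules).
    """
    estimate = None
    for k, y in enumerate(observations, start=1):
        a_k = gain(k) if gain else 1.0 / k
        estimate = y if estimate is None else estimate + a_k * (y - estimate)
    return estimate

depth = fuse_depths([1.0, 2.0, 3.0, 4.0])  # running mean of the four estimates
```

The recursive form needs only the current estimate per point, which suits frame-by-frame processing of a video sequence.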

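This paper proposes a heuristic algorithm that uses a discrete Hopfield network to find maximal independent sets of a graph, and applies it to stem selection and RNA secondary structure prediction. The algorithm maps the stems of an RNA sequence to the vertices of an undirected graph, which turns secondary structure prediction into finding a maximal independent set of the graph. A suitable energy change function is defined, and the discrete Hopfield network is iterated to obtain the energy-optimal predicted structure. The algorithm's running time is compared with that of the traditional maximum base-pair matching algorithm and the minimum free energy algorithm, and its accuracy is tested on selected sequences at the stem and base-pair levels; the results show that the algorithm has certain advantages in efficiency and accuracy. Its time complexity is max{O(n^2), O(N^2)} and its space complexity is O(N^2), where n is the length of the RNA sequence and N is the number of stem segments.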
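A minimal sketch of the stem-graph formulation with a discrete Hopfield-style network, under stated assumptions: stems are (i, j, length) triples, two stems conflict (share an edge) if they use a common base, each neuron's weight is the stem length (a hypothetical stand-in for the paper's energy terms), and the energy is E(x) = -Σ wᵢxᵢ + A·Σ₍ᵢ,ⱼ₎∈edges xᵢxⱼ.

```python
def stem_bases(stem):
    """Set of base positions used by stem (i, j, length)."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))

def hopfield_mis(stems, penalty=None, max_sweeps=100):
    """Select non-conflicting stems with asynchronous discrete Hopfield updates.

    Energy E(x) = -sum(w_i * x_i) + penalty * sum over conflicting pairs of x_i * x_j,
    with w_i = stem length (assumed). Each update never increases E.
    """
    n = len(stems)
    w = [s[2] for s in stems]
    bases = [stem_bases(s) for s in stems]
    nbrs = [[j for j in range(n) if j != i and bases[i] & bases[j]] for i in range(n)]
    # A penalty above every weight makes each stable state a maximal independent set.
    penalty = penalty if penalty is not None else max(w, default=0) + 1
    x = [0] * n
    order = sorted(range(n), key=lambda i: -w[i])  # update longer stems first
    for _ in range(max_sweeps):
        changed = False
        for i in order:
            # Neuron i fires if its net input (weight minus conflict penalty) is positive.
            new = 1 if w[i] - penalty * sum(x[j] for j in nbrs[i]) > 0 else 0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            break
    return [stems[i] for i in range(n) if x[i]]

selected = hopfield_mis([(0, 20, 4), (2, 15, 3), (25, 40, 3)])
```

Turning neuron i on changes the energy by -wᵢ + A·(active neighbors), so it fires exactly when that change is negative. With A greater than every weight, no stable state keeps two conflicting stems or leaves a stem off that has no active neighbor.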

The explosive accumulation of protein sequences in the wake of large-scale sequencing projects is in stark contrast to the much slower experimental determination of protein structures. Neural networks have been successfully applied to the prediction of protein structures, and prediction accuracy continues to rise. This paper introduces the basic methods and techniques for predicting protein secondary structure with neural networks. In particular, it discusses the two developments that have driven the rise in prediction accuracy: improvements to the neural network architecture and the addition of "evolutionary" information.
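A common input scheme for such networks is a sliding window of one-hot encoded residues centered on the residue being classified. In this sketch, the half-width of 6 (a 13-residue window) and the extra "outside the chain" flag per position are assumptions. Residues outside the 20 standard amino acids raise a ValueError.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_encode(seq, center, half_width=6):
    """One-hot encode the window of residues around `center`.

    Each window position contributes 21 inputs: 20 amino acids plus one flag
    marking positions that fall outside the chain.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        vec = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq):
            vec[AMINO_ACIDS.index(seq[pos])] = 1  # ValueError for non-standard residues
        else:
            vec[-1] = 1  # padding beyond either end of the chain
        features.extend(vec)
    return features

x = window_encode("ACD", center=0, half_width=1)  # 3 positions x 21 inputs = 63 values
```

Adding "evolutionary" information typically means replacing each one-hot vector with a profile row from aligned homologs, keeping the same window layout.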

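Identifying protein secondary structure is important for studying protein features and properties. The three-dimensional coordinates of the Cα atoms are used to map a protein sequence to a distance matrix. To capture the texture information implicit in this matrix, a dual-tree complex wavelet transform decomposes it to 4 levels, and the energy and standard deviation of the subbands in different orientations are extracted, giving a 48-dimensional feature vector that represents the protein's secondary structure features. The extracted features are then fed to KNN and SVM classifiers. Experiments verify that the dual-tree...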
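The feature pipeline can be partly sketched in plain Python: building the Cα distance matrix, computing energy and standard deviation for one subband, and checking the feature count. The 2-D dual-tree complex wavelet transform produces six oriented subbands per level, so 4 levels × 6 orientations × 2 statistics = 48. The transform itself is not implemented here, and defining energy as the mean squared magnitude is an assumption.

```python
import math

def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha atoms given as (x, y, z) tuples."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def subband_stats(band):
    """Energy (mean squared magnitude, assumed definition) and std of one 2-D subband."""
    vals = [abs(v) for row in band for v in row]  # abs() also handles complex coefficients
    n = len(vals)
    energy = sum(v * v for v in vals) / n
    mean = sum(vals) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
    return energy, std

# Feature count for a 4-level 2-D dual-tree complex wavelet transform:
# 6 oriented subbands per level x 2 statistics (energy, std) per subband.
LEVELS, ORIENTATIONS, STATS = 4, 6, 2
FEATURE_DIM = LEVELS * ORIENTATIONS * STATS  # 48
```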

