Similar literature
 Found 20 similar documents; search took 0 ms
1.
Stochastic context-free grammars (SCFGs) have been applied to predicting RNA secondary structure. Prediction can be improved by incorporating comparative sequence analysis. However, most existing SCFG-based methods lack explicit phylogenetic analysis of homologous RNA sequences, which is probably why they fall short in practical applications. Hence, we present a new SCFG-based method that integrates phylogenetic analysis with a newly defined profile SCFG. The method can be summarized as follows: 1) we define a new profile SCFG, M, to describe the consensus secondary structure of a multiple RNA sequence alignment; 2) we introduce two distinct hidden Markov models, λ and λ', to perform phylogenetic analysis of homologous RNA sequences, where λ is for non-structural regions of the sequence and λ' is for structural regions; 3) we merge λ and λ' into M to obtain a combined model for predicting RNA secondary structure. We tested our method on data sets constructed from the Rfam database. The sensitivity and specificity of our method are higher than those of the predictions by Pfold.

2.
The equivalence of leaf languages of tree adjoining grammars and monadic linear context-free grammars was shown about a decade ago. This paper presents a proof of the strong equivalence of these grammar formalisms. Non-strict tree adjoining grammars and monadic linear context-free grammars define the same class of tree languages. We also present a logical characterisation of this tree language class showing that a tree language is a member of this class iff it is the two-dimensional yield of an MSO-definable three-dimensional tree language.

3.
Spinal-Formed Context-Free Tree Grammars   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper we introduce a restricted model of context-free tree grammars called spine grammars, and study their formal properties, including considerably simple normal forms. Recent research on natural languages has suggested that formalisms for natural languages need to generate a slightly larger class of languages than context-free grammars, and for that reason tree adjoining grammars have been widely studied in relation to natural languages. It is shown that the class of string languages generated by spine grammars coincides with that of tree adjoining grammars. We also introduce acceptors called linear pushdown tree automata, and show that linear pushdown tree automata accept exactly the class of tree languages generated by spine grammars. Linear pushdown tree automata are obtained from pushdown tree automata with a restriction on duplicability for the pushdown stacks. Received May 29, 1998, and in revised form April 27, 1999, and in final form May 10, 1999.

4.
5.
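Identifying protein secondary structure plays an important role in studying the characteristics and properties of proteins. The protein sequence is mapped to a distance matrix using the three-dimensional coordinates of its Cα atoms. To capture the texture information implicit in the distance matrix, a dual-tree complex wavelet transform decomposes the matrix to four levels, and the sub-band energies and standard deviations in different directions are extracted, giving a 48-dimensional feature vector that represents the protein's secondary structure characteristics. The extracted features are then fed to KNN and SVM classifiers. Experiments verify that the dual-tree...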
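
The first step described above, mapping Cα coordinates to a distance matrix, can be sketched as follows. This is a minimal illustration only: the function name `ca_distance_matrix` and the sample coordinates are hypothetical, and the abstract does not say how the authors read coordinates or which distance units they use. The wavelet decomposition and classification steps are not shown.

```python
import math


def ca_distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances
    between C-alpha atoms given as (x, y, z) tuples."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dist[i][j] = d
            dist[j][i] = d  # the matrix is symmetric by construction
    return dist


# Hypothetical coordinates for three residues, for illustration only.
example_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
matrix = ca_distance_matrix(example_coords)
```

The diagonal is zero and the matrix is symmetric, so it can be treated as a grayscale image whose texture is then analyzed, which is the role the abstract assigns to the wavelet transform.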

6.
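This article studies hidden semi-Markov models (HSMMs) [1] and proposes a probability function based on the hydrophobic interactions of residue segments. The parametric model can be regarded as an extension of a simplified segment network, and it is built by capturing the dependencies among residues in the sequence. The model overcomes the difficulty that traditional prediction methods face when no homologous protein family is available, and it places only modest requirements on the homology of the target protein.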

7.
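This article studies hidden semi-Markov models (HSMMs) [1] and proposes a probability function based on the hydrophobic interactions of residue segments. The parametric model can be regarded as an extension of a simplified segment network, and it is built by capturing the dependencies among residues in the sequence. The model overcomes the difficulty that traditional prediction methods face when no homologous protein family is available, and it places only modest requirements on the homology of the target protein.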

8.
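Information extraction means retrieving useful information from large amounts of data, but general Web information extraction techniques are all based on analyzing HTML documents on the Web. This paper proposes a method that first converts HTML into XML and then extracts the information. XML is a language standard that describes the format of data documents used for data exchange on the Internet; it separates structure, content, and presentation. Data can be uniquely identified by XML, which helps users organize and retrieve it. The method achieves a high accuracy rate, and its time complexity remains linear as documents grow.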

9.
This paper presents a new model and a corresponding dynamic programming algorithm for predicting RNA secondary structure, including pseudoknots. The algorithm can compute arbitrary planar pseudoknots and one non-planar pseudoknot in O(n^5) time and O(n^4) space.

10.
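A Tree-Structure-Based Method for Automatic Web Data Extraction   Cited by: 10 (self-citations: 2, citations by others: 8)
This paper presents a tree-structure-based method for automatically extracting data from HTML pages. On top of the tree structure of an HTML page, it proposes a page structure model based on semantic blocks: the data values in an HTML page reside mainly in semantic blocks, and different HTML pages differ mainly in their semantic blocks. Based on this model, automatic extraction proceeds in four steps: discovering semantic blocks by comparing HTML pages, distinguishing the roles of the data values within semantic blocks, inferring the data schema, and inferring the extraction rules. Experiments on real HTML pages show that the method achieves a high accuracy rate, and its time complexity remains linear as documents grow.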

11.
This paper describes a parsing algorithm for Tree Adjoining Grammar (TAG) and its parallel implementation on the Connection Machine. TAG is a formalism for natural language that employs trees as the basic grammar structures. Parsing involves the application of two operations, called adjunction and substitution, to produce derived tree structures. Sequential parsing algorithms for TAGs run in time quadratic in the grammar size, which is impractical for the very large grammars currently being developed for natural language. This paper presents two parallel algorithms, one running in time nearly linear in the grammar size, and the other running in time logarithmic in the grammar size. Both parallel algorithms were implemented on a Connection Machine CM-2, and performance measurements were obtained for varying grammar sizes. This research was supported in part by NSF Grant BNS-9022010, by the ARO Center for Excellence in Artificial Intelligence, University of Pennsylvania, and by the Army High Performance Computing Research Center (AHPCRC), University of Minnesota.

12.
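Predicting RNA Secondary Structure with a Chaotic Differential Evolution Algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
Hu Guiwu (胡桂武), Peng Hong (彭宏). Computer Science (《计算机科学》), 2007, 34(9): 163-166
RNA secondary structure prediction is of great importance in bioinformatics. This paper proposes a chaotic differential evolution algorithm for RNA secondary structure prediction. The algorithm initializes the population chaotically and uses chaotic perturbation to generate new individuals, narrowing the search space; it adaptively applies chaotic updates to individuals according to their fitness and the population density, which improves population diversity. The algorithm takes full advantage of the speed of differential evolution and of the ergodicity, randomness, and regularity of chaos, effectively overcoming premature convergence and improving global search ability. Experiments demonstrate the algorithm's effectiveness.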
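
Chaotic population initialization, as mentioned in the abstract above, might look like the following sketch. Several things here are assumptions rather than details from the paper: the abstract does not name the chaotic map, so the logistic map with mu = 4 is used as a common choice; individuals are assumed to be real-valued vectors; and the names `logistic_sequence` and `chaotic_population` are hypothetical.

```python
def logistic_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x <- mu * x * (1 - x) and return the
    generated values. With mu = 4 and 0 < x0 < 1 the values stay in [0, 1]."""
    values = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(pop_size, dim, lower, upper, x0=0.37):
    """Build pop_size individuals of dimension dim by scaling one chaotic
    sequence from [0, 1] into [lower, upper]. The seed x0 is an assumption."""
    chaos = logistic_sequence(x0, pop_size * dim)
    population = []
    for i in range(pop_size):
        row = chaos[i * dim:(i + 1) * dim]
        population.append([lower + c * (upper - lower) for c in row])
    return population


population = chaotic_population(pop_size=5, dim=3, lower=-1.0, upper=1.0)
```

The same sequence generator could also supply the chaotic perturbations the abstract mentions, though how the authors apply them is not stated.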

13.
The explosive accumulation of protein sequences in the wake of large-scale sequencing projects is in sharp contrast to the much slower experimental determination of protein structures. Neural networks have been successfully applied to the prediction of protein structures, and prediction accuracy continues to rise. This paper introduces the basic methods and technologies for predicting protein secondary structure with neural networks, and in particular discusses two developments that have driven the rise in prediction accuracy: improvements to neural network architecture and the addition of "evolutionary" information.

14.
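Equivalence of Lattice-Valued Tree Automata and Lattice-Valued Context-Free Tree Grammars   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper generalizes the concepts of fuzzy tree automata and fuzzy context-free tree grammars to lattice-ordered semigroups. It proves that tree automata and context-free tree grammars are equivalent in the sense of the languages they accept and generate, and it gives a method for constructing an equivalent grammar in normal form.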

15.
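Constructing the feature vector is a key problem in protein secondary structure prediction. Existing methods usually generate the PSSM matrix from the BLOSUM62 evolutionary matrix alone and give little consideration to the amino acid residue mutations that occur during protein evolution. This paper proposes constructing protein feature vectors from multiple evolutionary matrices, fusing PSSM matrices for different evolutionary times, so that the vectors reflect not only the positional information of amino acids in the sequence but also the effect of mutations at amino acid sites during sequence evolution. Feature vectors are built by combining matrices of different evolutionary degrees; logistic regression, random forest, and multi-class support vector machines serve as the prediction tools; parameters are optimized with grid search and cross-validation; and several groups of experiments are run on the public data sets RS126, CB513, and 25PDB. Comparative results show that the proposed feature vector construction method based on multiple evolutionary matrices effectively improves the accuracy of protein secondary structure prediction.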

16.
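Nan Yuhong (南雨宏), Chen Qi (陈绮). Microcomputer Development (《微机发展》), 2011, (10): 168-170, 175
This paper proposes an easily modified algorithm for protein secondary structure prediction. Using PDB text data from the Protein Data Bank as the data source, all protein amino acid sequences are extracted to build a sample database. Then, for α-helices and β-sheets separately, different improved methods based on a hash dictionary are implemented to predict secondary-structure sequence fragments. During prediction, a subset of samples drawn at random from 68,421 proteins serves as the test set, and unknown sequences are segmented and compared against the fragments in the hash dictionary using forward maximum matching segmentation. In the experiments, prediction accuracy for unknown sequence fragments reached 83.9%, and the method reflects the connection order between fragments fairly well.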
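
Forward maximum matching against a hash dictionary, the segmentation technique named in the abstract above, can be sketched as follows. The dictionary contents, the H/E/C labels, the fragment length limit, and the function name `forward_max_match` are all hypothetical; the abstract does not describe the authors' dictionary format or their separate treatments of α-helices and β-sheets.

```python
def forward_max_match(sequence, dictionary, max_len):
    """Segment sequence left to right, always taking the longest fragment
    (up to max_len residues) found in dictionary. A single residue that
    matches nothing is emitted on its own with label None."""
    segments = []
    pos = 0
    while pos < len(sequence):
        # Try the longest candidate first, then shrink one residue at a time.
        for size in range(min(max_len, len(sequence) - pos), 0, -1):
            piece = sequence[pos:pos + size]
            if size == 1 or piece in dictionary:
                segments.append((piece, dictionary.get(piece)))
                pos += size
                break
    return segments


# Hypothetical fragment dictionary (a Python dict is a hash table):
# fragment -> assumed structure label.
fragment_labels = {"AEELL": "H", "KKA": "H", "VTV": "E", "GS": "C"}
result = forward_max_match("AEELLKKAGSVTVP", fragment_labels, max_len=5)
```

Because the segments are produced in sequence order, the output preserves how fragments connect to one another, which matches the property the abstract highlights.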

17.
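Predicting protein secondary structure is a difficult open problem in bioinformatics today. Because the accuracy of secondary structure prediction plays a very important role in protein structure research, this paper proposes, on the basis of KDTICM theory, a protein secondary structure prediction algorithm based on a hybrid SVM method. The algorithm uses the physicochemical properties of proteins together with the position-specific scoring matrix generated by PSI-SEARCH as input to a two-layer SVM, which greatly improves prediction accuracy. Comparative experiments show that the new algorithm's prediction accuracy and generality are clearly better than those of other typical current prediction methods.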

18.
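Protein Secondary Structure Prediction Based on Cascaded Neural Networks   Cited by: 3 (self-citations: 1, citations by others: 3)
To improve the accuracy of protein secondary structure prediction, a cascaded neural network model composed of two network layers is proposed. The first layer uses a network model consisting of five sub-networks that differ from one another, and the input encoding of the second layer is improved. Tests on 36 proteins from PDBSelect25, 6,122 residues in total, show that the model predicts protein secondary structure effectively: its accuracy is 5.31%, 1.21%, and 0.92% higher than that of the SNN, DSC, and PREDATOR methods respectively, and the average prediction accuracy rises to 69.61%.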

19.
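Predicting protein secondary structure is a difficult open problem in bioinformatics today. Because the accuracy of secondary structure prediction plays a very important role in protein structure research, this paper proposes, on the basis of KDTICM theory, a protein secondary structure prediction algorithm based on a hybrid SVM method. The algorithm uses the physicochemical properties of proteins together with the position-specific scoring matrix generated by PSI-SEARCH as input to a two-layer SVM, which greatly improves prediction accuracy. Comparative experiments show that the new algorithm's prediction accuracy and generality are clearly better than those of other typical current prediction methods.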

20.
Programming and Computer Software - Graph data models are widely employed in different areas of computer science, e.g., graph databases, bioinformatics, social network analysis, and static code...
