首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 343 毫秒
1.
In this article we consider deterministic and strongly deterministic top-down tree transducers with regular look-ahead, with regular check, with deterministic top-down look-ahead, and with deterministic top-down check. We compare the transformational power of these tree transducer classes by giving a correct inclusion diagram of the tree transformation classes induced by them. Along with the comparison we decompose some of the examined classes into simpler classes and we introduce the concept of the deterministic top-down tree automata with deterministic top-down look-ahead. We show that these recognizers recognize a tree language class which is strictly between the class of regular tree languages and the class of tree languages recognizable by deterministic top-down tree automata. We also study the closure properties of the examined tree transformation classes. We show that some classes are closed under composition while others, for example the class of tree transformations induced by deterministic top-down tree transducers with deterministic top-down look-ahead, are not.  相似文献   

2.
To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms to generate uniformly at random a corpus of XSDs as well as a similarity measure to compare how close the generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The latter class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e. taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of single-type EDTDs, i.e., the formal abstraction of XSDs. To this end, we provide an algorithm to generate k-occurrence automata (k-OAs) uniformly at random and show how this leads to the uniform generation of single-type EDTDs.  相似文献   

3.
The theory of formal string languages and of formal tree languages are both important parts of the theory of formal languages. Regular tree languages are recognized by finite tree automata. Trees in their postfix notation can be seen as strings. This paper presents a simple transformation from any given (bottom-up) finite tree automaton recognizing a regular tree language to a deterministic pushdown automaton accepting the same tree language in postfix notation. The resulting deterministic pushdown automaton can be implemented easily by an existing parser generator because it is constructed for an LR(0) grammar, and its size directly corresponds to the size of the deterministic finite tree automaton. The class of regular tree languages in postfix notation is a proper subclass of deterministic context-free string languages. Moreover, the class of tree languages which are in their postfix notation deterministic context-free string languages is a proper superclass of the class of regular tree languages.  相似文献   

4.
We define topdown pushdown tree automata (PDTA's) which extend the usual string pushdown automata by allowing trees instead of strings in both the input and the stack. We prove that PDTA's recognize the class of context-free tree languages. (Quasi)realtime and deterministic PDTA's accept the classes of Greibach and deterministic tree languages, respectively. Finally, PDTA's are shown to be equivalent to restricted PDTA's, whose stack is linear: this both yields a more operational way of recognizing context-free tree languages and connects them with the class of indexed languages.  相似文献   

5.
We improve on an existing [P.A. Abdulla, J. Högberg, L. Kaati, Bisimulation minimization of tree automata, International Journal of Foundations of Computer Science 18(4) (2007) 699–713] bisimulation minimization algorithm for finite-state tree automata by introducing backward and forward bisimulation and developing minimization algorithms for them. Minimization via forward bisimulation is also effective on deterministic tree automata, faster than the previous algorithm, and yields the minimal equivalent deterministic tree automaton. Minimization via backward bisimulation generalizes the previous algorithm and can yield smaller automata but is just as fast. We demonstrate implementations of these algorithms on a typical task in natural language processing.  相似文献   

6.
We analyze the properties of probabilistic reversible decide-and-halt automata (DH-PRA) and show that there is a strong relationship between DH-PRA and 1-way quantum automata. We show that a general class of regular languages is not recognizable by DH-PRA by proving that two “forbidden” constructions in minimal deterministic automata correspond to languages not recognizable by DH-PRA. The shown class is identical to a class known to be not recognizable by 1-way quantum automata. We also prove that the class of languages recognizable by DH-PRA is not closed under union and other non-trivial Boolean operations.  相似文献   

7.
We define a weighted monadic second order logic for unranked trees and the concept of weighted unranked tree automata, and we investigate the expressive power of these two concepts. We show that weighted tree automata and a syntactically restricted weighted MSO-logic have the same expressive power in case the semiring is commutative or in case we deal only with ranked trees, but, surprisingly, not in general. This demonstrates a crucial difference between the theories of ranked trees and unranked trees in the weighted case.  相似文献   

8.
We show that the absolute worst case time complexity for Hopcroft’s minimization algorithm applied to unary languages is reached only for deterministic automata or cover automata following the structure of the de Bruijn words. A previous paper by Berstel and Carton gave the example of de Bruijn words as a language that requires O(nlogn) steps in the case of deterministic automata by carefully choosing the splitting sets and processing these sets in a FIFO mode for the list of the splitting sets in the algorithm. We refine the previous result by showing that the Berstel/Carton example is actually the absolute worst case time complexity in the case of unary languages for deterministic automata. We show that the same result is valid also for the case of cover automata and an algorithm based on the Hopcroft’s method used for minimization of cover automata. We also show that a LIFO implementation for the splitting list will not achieve the same absolute worst time complexity for the case of unary languages both in the case of regular deterministic finite automata or in the case of the deterministic finite cover automata as defined by S. Yu.  相似文献   

9.
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial time algorithm in RPNI-style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished. In practice, users prefer to provide partial annotations. We propose to account for partial annotations by intelligent tree pruning heuristics. We introduce pruning NSTTs—a formalism that shares many advantages of NSTTs. This leads us to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987). We have implemented our interactive learning algorithm integrated it into a visually interactive Web information extraction system—called SQUIRREL—by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent quality with very few user interactions during wrapper induction. Editor: Georgios Paliouras and Yasubumi Sakakibara  相似文献   

10.
Spinal-Formed Context-Free Tree Grammars   总被引:1,自引:0,他引:1  
In this paper we introduce a restricted model of context-free tree grammars called spine grammars, and study their formal properties including considerably simple normal forms. Recent research on natural languages has suggested that formalisms for natural languages need to generate a slightly larger class of languages than context-free grammars, and for that reason tree adjoining grammars have been widely studied relating them to natural languages. It is shown that the class of string languages generated by spine grammars coincides with that of tree adjoining grammars. We also introduce acceptors called linear pushdown tree automata, and show that linear pushdown tree automata accept exactly the class of tree languages generated by spine grammars. Linear pushdown tree automata are obtained from pushdown tree automata with a restriction on duplicability for the pushdown stacks. Received May 29, 1998, and in revised form April 27, 1999, and in final form May 10, 1999.  相似文献   

11.
主要研究确定型模糊多重集有限自动机的状态极小化问题。给出了模糊多重集有限自动机的同余和同态概念,并利用同余和同态关系研究了确定型模糊多重集有限自动机的极小化问题。进一步从确定型模糊多重集有限自动机自身出发,构造出极小模糊多重集有限自动机,并给出了极小化的算法。  相似文献   

12.
采用树自动机推理技术的信息抽取方法   总被引:1,自引:1,他引:0       下载免费PDF全文
提出了一种利用改进的k-contextual树自动机推理算法的信息抽取技术。其核心思想是将结构化(半结构化)文档转换成树,然后利用一种改进的k-contextual树(KLH树)来构造出能够接受样本的无秩树自动机,依据该自动机接收和拒绝状态来确定是否抽取网页信息。该方法充分利用了网页文档的树状结构,依托树自动机将传统的以单一结构途径的信息抽取方法与文法推理原则相结合,得到信息抽取规则。实验证明,该方法与同类抽取方法相比,样本学习时间以及抽取所需时间上均有所缩短。  相似文献   

13.
This paper presents a study on lookahead hierarchies for restarting automata with auxiliary symbols. We show that the class of languages for deterministic monotone or monotone restarting automaton, whose restart step and rewrite step are separated, coincides with that of the same type of restarting automaton whose restart and rewrite steps are not separated, for any fixed lookahead size. For the non-monotone deterministic case, the lookahead length must be approximately doubled. We then turn our attention to restarting automata with small lookahead. For the general restarting automaton model, we show that there are just two different classes of languages recognized, through the restriction of lookahead size: those with lookahead size 1 and those with lookahead size 2. We also show that the respective (left-) monotone restarting automaton models characterize the context-free languages and that the respective right–left-monotone restarting automata characterize the linear languages both with just lookahead length 2.  相似文献   

14.
We consider symbolic tree automata (sta) and symbolic regular tree grammars and their corresponding classes of tree languages: s-recognizable tree languages and s-regular tree languages. We prove that the following three classes are equal: the class of s-recognizable tree languages, the class of s-regular tree languages, and the class of images of classical recognizable tree languages under tree relabelings. Moreover, the sta and the recently introduced variable tree automata are incomparable with respect to their recognition power. Also, we consider symbolic tree transducers (stt) and prove the following theorems. The syntactic composition of two stt computes the composition of the tree transformations computed by each stt, provided that (1) the first one is deterministic or the second one is linear and (2) the first one is total or the second one is nondeleting. Backward application of an stt to any s-recognizable tree language yields an s-recognizable tree language. There is a linear stt of which the range is not an s-recognizable tree language. Forward application of simple and linear stt preserves s-recognizability. A restricted version of both the type checking problem of simple and linear stt, and the inverse type checking problem of arbitrary stt is decidable. Since we deal with trees over infinite alphabets, we require appropriate conditions on sta and stt such that all the proofs are constructive.  相似文献   

15.
Summary We discuss the technique for testing the equivalence of two deterministic automata by constructing a language that matches the computations of two equivalent automata on the same input word. Specifically, we propose to use HDTOL languages that are powerful enough to match computations of many equivalent deterministic multitape automata, and at the same time, have nice decidable properties. Using this new technique of HDTOL matching, we show that the inclusion problem between an arbitrary deterministic multitape automaton and a simple one is decidable in both directions. Further, we show that the equivalence for a restricted class of transducers based on deterministic multitape automata is decidable.Reported research was supported by the National Sciences Foundation under Grant No. CCR-8702752 and by the Natural Sciences and Engineering Research Council of Canada under Grant No. A-7403  相似文献   

16.
XML tree structures can conveniently be represented using ordered unranked trees. Due to the repetitiveness of XML markup these trees can be compressed effectively using dictionary-based methods, such as minimal directed acyclic graphs (DAGs) or straight-line context-free (SLCF) tree grammars. While minimal SLCF tree grammars are in general smaller than minimal DAGs, they cannot be computed in polynomial time unless P=NPP=NP. Here, we present a new linear time algorithm for computing small SLCF tree grammars, called TreeRePair, and show that it greatly outperforms the best known previous algorithm BPLEX. TreeRePair is a generalization to trees of Larsson and Moffat's RePair string compression algorithm. SLCF tree grammars can be used as efficient memory representations of trees. Using TreeRePair, we are able to produce the smallest queryable memory representation of ordered trees that we are aware of. Our investigations over a large corpus of commonly used XML documents show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while memory consumption is only 1/43 and 1/6, respectively. With respect to file compression we are able to show that a Huffman-based coding of TreeRePair grammars gives compression ratios comparable to the best known XML file compressors.  相似文献   

17.
We study a special class of finite automata called reversible and defined by the following property: there exists a state such that, starting from it, both the automation and its (deterministic) reversal are strongly connected.The main tool to handle these automata is the ergodic representation which is mainly the Schützenberger matrix representation of the semigroup of the automaton over its minimal ideal. This representation allows the construction of a class of reversible automata: the so called homogeneous ones. This class is precisely the one obtained by considering a class of finite languages that are both prefix and suffix. We obtain here, by means of their automaton, a construction of these languages.  相似文献   

18.
In this note, we reduce the deterministic finite-state automata intersection problem to the problem of deciding co-observability for regular languages using a polynomial-time many-one mapping. This demonstrates that the problem of deciding co-observability for languages marked by deterministic finite-state automata is PSPACE-complete. We use a similar reduction to reduce the deterministic finite-state automata intersection problem to deciding other versions of co-observability introduced in a previous paper. These results imply that the co-observability of regular languages most likely cannot be decided in polynomial time unless we make further restrictions on the languages. These results also show that deciding decentralized supervisor existence is PSPACE-complete and therefore probably intractable.  相似文献   

19.
Caterpillar expressions have been introduced by Brüggemann-Klein and Wood for applications in markup languages. Caterpillar expressions provide a convenient formalism for specifying the operation of tree-walking automata on unranked trees. Here we give a formal definition of determinism of caterpillar expressions that is based on the language of instruction sequences defined by the expression. We show that determinism of caterpillar expressions can be decided in polynomial time.  相似文献   

20.
Tree pattern query minimization   总被引:2,自引:0,他引:2  
Tree patterns form a natural basis to query tree-structured data such as XML and LDAP. To improve the efficiency of tree pattern matching, it is essential to quickly identify and eliminate redundant nodes in the pattern. In this paper, we study tree pattern minimization both in the absence and in the presence of integrity constraints (ICs) on the underlying tree-structured database. In the absence of ICs, we develop a polynomial-time query minimization algorithm called CIM, whose efficiency stems from two key properties: (i) a node cannot be redundant unless its children are; and (ii) the order of elimination of redundant nodes is immaterial. When ICs are considered for minimization, we develop a technique for query minimization based on three fundamental operations: augmentation (an adaptation of the well-known chase procedure), minimization (based on homomorphism techniques), and reduction. We show the surprising result that the algorithm, referred to as ACIM, obtained by first augmenting the tree pattern using ICs, and then applying CIM, always finds the unique minimal equivalent query. While ACIM is polynomial time, it can be expensive in practice because of its inherent non-locality. We then present a fast algorithm, CDM, that identifies and eliminates local redundancies due to ICs, based on propagating "information labels" up the tree pattern. CDM can be applied prior to ACIM for improving the minimization efficiency. We complement our analytical results with an experimental study that shows the effectiveness of our tree pattern minimization techniques.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号