Similar literature
20 similar articles found (search time: 15 ms)
1.
We propose a new rebalancing method for binary search trees that allows rebalancing and updating to be uncoupled. In this way we obtain fast updates and, whenever the search tree is accessed by multiple users, a high degree of concurrency. The trees we use are obtained by relaxing the balance conditions of red-black trees. The relaxed red-black trees, called chromatic trees, contain information of possible imbalance such that the rebalancing can be done gradually as a shadow process, or it can be performed separately when no urgent operations are present.

2.
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial time algorithm in RPNI-style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished. In practice, users prefer to provide partial annotations. We propose to account for partial annotations by intelligent tree pruning heuristics. We introduce pruning NSTTs, a formalism that shares many advantages of NSTTs. This leads us to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987). We have implemented our interactive learning algorithm and integrated it into a visually interactive Web information extraction system, called SQUIRREL, by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent quality with very few user interactions during wrapper induction. Editor: Georgios Paliouras and Yasubumi Sakakibara

3.
A standard approach to determining decision trees is to learn them from examples. A disadvantage of this approach is that once a decision tree is learned, it is difficult to modify it to suit different decision making situations. Such problems arise, for example, when an attribute assigned to some node cannot be measured, or there is a significant change in the costs of measuring attributes or in the frequency distribution of events from different decision classes. An attractive approach to resolving this problem is to learn and store knowledge in the form of decision rules, and to generate from them, whenever needed, a decision tree that is most suitable in a given situation. An additional advantage of such an approach is that it facilitates building compact decision trees, which can be much simpler than the logically equivalent conventional decision trees (by compact trees are meant decision trees that may contain branches assigned a set of values, and nodes assigned derived attributes, i.e., attributes that are logical or mathematical functions of the original ones). The paper describes an efficient method, AQDT-1, that takes decision rules generated by an AQ-type learning system (AQ15 or AQ17), and builds from them a decision tree optimizing a given optimality criterion. The method can work in two modes: the standard mode, which produces conventional decision trees, and compact mode, which produces compact decision trees. The preliminary experiments with AQDT-1 have shown that the decision trees generated by it from decision rules (conventional and compact) have outperformed those generated from examples by the well-known C4.5 program both in terms of their simplicity and their predictive accuracy.

4.
A technique for emulating multicomputer interconnection networks that are based on separable graphs (graphs having bounded degree and sublinear multicolor recursive bisectors) is presented. Efficient emulations among interconnection networks are necessary for porting programs designed for one network to another. Emulations are formalized as graph embeddings, where the nodes (processors) of the guest graph (emulated network) are assigned to nodes of the host graph (emulator), while the edges (communication links) of the guest are routed via paths in the host. The communication slowdown in an emulation depends on the dilation (length of the longest routing path) and the congestion (number of paths that contend for a host edge) of the embedding. The expansion of the embedding (the ratio of the sizes of the host to guest) determines the inefficiency of processor utilization. Cell trees are introduced as interconnection networks whose special communication properties enable them to serve as intermediate devices in these emulations. Nodes in cell trees are organized into equinumerous parts called cells; the cells are labeled by nodes of a complete binary tree. Communication in cell trees is restricted to two specific and distinct primitives: cell communication is confined within cells, while transfer communication occurs between adjacent cells. Rather than being solved directly, the emulation problem for the original guest-host pair is decomposed into two independent parts: emulating the guest by the cell tree, and emulating the cell tree by the host. In emulations of separable graphs by cell trees, the node assignment that ensures small dilation is derived from the separator-based decomposition of guest graphs. The congestion-free edge routing is achieved by coordinating global and local phases, which are based on two characteristic cell-tree communication primitives. The technique is instantiated by emulating cell trees on specific host graphs. With shuffle-like hypercube-derivative networks as hosts, new constant-expansion emulations are obtained that have both dilation and congestion logarithmic in the size of the multicolor bisector of guest graphs. These emulations are the first such to have optimal (up to constants) congestion; they provide the first optimal algorithm for emulating arbitrary separable graphs on shuffle-like networks. The application of the technique to hypercubes as hosts also produces optimal emulations that differ from those previously known by having smaller expansion constants. This research was supported in part by NSF Grants CCR-88-12567 and CCR-90-13184, and by the University of Massachusetts Graduate School Fellowship for the academic year 1991-92. A preliminary version of this paper was presented at the 3rd ACM Symposium on Parallel Algorithms and Architectures, July 22–24, 1991, in Hilton Head, South Carolina, USA.
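
The dilation and congestion defined in the abstract above can be made concrete with a small sketch. It is not the paper's emulation technique; it only measures a given embedding. The function name `embedding_costs` and the dictionary-based inputs are assumptions made for illustration.

```python
from collections import Counter


def embedding_costs(guest_edges, node_map, routes):
    """Return (dilation, congestion) of a graph embedding.

    guest_edges: list of (u, v) edges of the guest graph
    node_map:    dict mapping each guest node to a host node
    routes:      dict mapping each guest edge to its routing path,
                 given as a list of host nodes
    """
    load = Counter()  # how many routing paths use each host edge
    dilation = 0
    for u, v in guest_edges:
        path = routes[(u, v)]
        if path[0] != node_map[u] or path[-1] != node_map[v]:
            raise ValueError(f"route for {(u, v)} does not join the node images")
        # Dilation: length (in edges) of the longest routing path.
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):
            load[frozenset((a, b))] += 1  # host edges are undirected
    # Congestion: largest number of paths contending for one host edge.
    congestion = max(load.values(), default=0)
    return dilation, congestion


# Hypothetical example: embed a triangle (guest) into a 3-node path a-b-c (host).
guest = [(0, 1), (1, 2), (0, 2)]
placement = {0: "a", 1: "b", 2: "c"}
paths = {(0, 1): ["a", "b"], (1, 2): ["b", "c"], (0, 2): ["a", "b", "c"]}
print(embedding_costs(guest, placement, paths))  # (2, 2)
```

Here the guest edge (0, 2) must be routed through b, so the dilation is 2, and each host edge carries two routing paths, so the congestion is 2. The expansion is 3/3 = 1 because host and guest have the same number of nodes.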

5.
In this paper we present a new inductive inference algorithm for a class of logic programs, called linear monadic logic programs. It has several unique features not found in Shapiro’s Model Inference System. It has been proved that a set of trees is rational if and only if it is computed by a linear monadic logic program, and that the rational set of trees is recognized by a tree automaton. Based on these facts, we can reduce the problem of inductive inference of linear monadic logic programs to the problem of inductive inference of tree automata. Furthermore, several efficient inference algorithms for finite automata have been developed. We extend them to an inference algorithm for tree automata and use it to get an efficient inductive inference algorithm for linear monadic logic programs. The correctness, time complexity and several comparisons of our algorithm with Model Inference System are shown.

6.
We present approximation algorithms for the bandwidth minimization problem (BMP) for a large class of trees. The BMP is NP-hard, even for trees of maximum node degree 3. The problem finds applications in many areas, including VLSI layout, multiprocessor scheduling, and matrix processing, and has been studied for both graphs and matrices. We study the problem on trees having the following property: given any tree node v, the depth difference of any two nonempty subtrees rooted at v is bounded by a constant k. We call such trees h(k) trees or generalized height-balanced (GHB) trees. The above definition extends the class of balanced trees to trees with depth d = Θ(\N\). For any tree in the above defined class, an O(log d) times optimal algorithm is presented. Furthermore, we extend the application of the algorithm to trees that simulate the h(k) property, which we call h(k)-like trees, and also provide intuitive ideas for an approximation algorithm for general trees. This work has been supported in part by the Computer Learning Research (CLEAR) Center at the University of Texas at Dallas.
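
To make the objective of the bandwidth minimization problem concrete, here is a minimal sketch that evaluates the bandwidth of one labeling and finds the optimum by exhaustive search. This is not the approximation algorithm of the abstract above; the brute-force search is exponential and only usable on tiny inputs. The function names are assumptions made for illustration.

```python
from itertools import permutations


def labeling_bandwidth(edges, label):
    """Bandwidth of one labeling: the largest label gap across any edge."""
    return max(abs(label[u] - label[v]) for u, v in edges)


def exact_bandwidth(nodes, edges):
    """Minimum bandwidth over all labelings with 1..n (brute force, O(n!))."""
    best = None
    for perm in permutations(range(1, len(nodes) + 1)):
        label = dict(zip(nodes, perm))
        bw = labeling_bandwidth(edges, label)
        if best is None or bw < best:
            best = bw
    return best


# A star with four leaves: the center has four neighbors, but only two
# labels lie at distance 1 from any label, so bandwidth 1 is impossible.
star_nodes = ["c", "a", "b", "d", "e"]
star_edges = [("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")]
print(exact_bandwidth(star_nodes, star_edges))  # 2
```

A path, by contrast, has bandwidth 1 because its nodes can be numbered consecutively along the path.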

7.
What is failure? An approach to constructive negation (cited by: 3 in total; self-citations: 0; citations by others: 3)
A standard approach to negation in logic programming is negation as failure. Its major drawback is that it cannot produce answer substitutions to negated queries. Approaches to overcoming this limitation are termed constructive negation. This work proposes an approach based on construction of failed trees for some instances of a negated query. For this purpose a generalization of the standard notion of a failed tree is needed. We show that a straightforward generalization leads to unsoundness and present a correct one. The method is applicable to arbitrary normal programs. If finitely failed trees are concerned then its semantics is given by Clark completion in 3-valued logic (and our approach is a proper extension of SLDNF-resolution). If infinite failed trees are allowed then we obtain a method for the well-founded semantics. In both cases soundness and completeness are proved.

8.
We propose a new rebalancing method for binary search trees that allows rebalancing and updating to be uncoupled. In this way we obtain fast updates and, whenever the search tree is accessed by multiple users, a high degree of concurrency. The trees we use are obtained by relaxing the balance conditions of red-black trees. The relaxed red-black trees, called chromatic trees, contain information of possible imbalance such that the rebalancing can be done gradually as a shadow process, or it can be performed separately when no urgent operations are present. Received December 5, 1991 / May 2, 1995

9.
Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs) or their variants of XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answer. In this paper we study how to use the rich structural relationships embedded in XML documents to facilitate the processing of keyword queries. We develop a novel method, called SAIL, to index such structural relationships for efficient XML keyword search. We propose the concept of minimal-cost trees to answer keyword queries and devise structure-aware indices to maintain the structural relationships for efficiently identifying the minimal-cost trees. For effectively and progressively identifying the top-k answers, we develop techniques using link-based relevance ranking and keyword-pair-based ranking. To reduce the index size, we incorporate a numbering scheme, namely schema-aware Dewey code, into our structure-aware indices. Experimental results on real data sets show that our method outperforms state-of-the-art approaches significantly, in both answer quality and search efficiency.
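
The link between Dewey-style numbering and lowest common ancestors mentioned in the abstract above can be sketched briefly. The sketch assumes plain Dewey codes, where each node is labeled by the sequence of child positions on the path from the root. It does not model the schema-aware variant or the SAIL indices, and the function names are hypothetical.

```python
from functools import reduce


def dewey_lca(a, b):
    """LCA of two nodes given as Dewey codes: their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def lca_of_all(codes):
    """LCA of several nodes, folding the pairwise LCA over the list."""
    return reduce(dewey_lca, codes)


# Nodes 1.2.1 and 1.2.3.4 share the ancestor 1.2.
print(dewey_lca((1, 2, 1), (1, 2, 3, 4)))                # (1, 2)
# Adding a node under 1.3 pushes the LCA up to the root's child 1.
print(lca_of_all([(1, 2, 1), (1, 2, 3, 4), (1, 3)]))     # (1,)
```

Because the ancestor relationship is encoded in the label itself, the LCA can be computed from the codes alone, without traversing the document tree.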

10.
Summary. In this paper we present a new algorithm for the on-line construction of position trees. Reading a given input string from left to right, we generate its position tree with the aid of the general concept of infix trees. An additional chain structure within the trees, called tail node connection, enables us to construct the tree within the best possible time (proportional to the number of nodes).

11.
Model trees were conceived as a structure-sharing approach to represent information in disjunctive deductive databases. In this paper we introduce the concept of ordered minimal model trees as a normal form for disjunctive deductive databases. These are model trees in which an order is imposed on the elements of the Herbrand base. The properties of ordered minimal model trees are investigated as well as their possible utilization for efficient manipulation of disjunctive deductive databases. Algorithms are presented for constructing and performing operations on ordered model trees. The complexity of ordered model tree processing is addressed. Model forests are presented as an approach to reduce the complexity of ordered model tree construction and processing. This research was supported by the National Science Foundation, under the grant Nr. IRI-89-16059, the Air Force Office of Scientific Research, under the grant Nr. AFOSR-91-0350, and the Fulbright Scholar Program. This work was done while visiting at the University of Maryland Institute for Advanced Computer Studies.

12.
This paper focuses on space efficient representations of rooted trees that permit basic navigation in constant time. While most of the previous work has focused on binary trees, we turn our attention to trees of higher degree. We consider both cardinal trees (or k-ary tries), where each node has k slots, labelled {1, ..., k}, each of which may have a reference to a child, and ordinal trees, where the children of each node are simply ordered. Our representations use a number of bits close to the information theoretic lower bound and support operations in constant time. For ordinal trees we support the operations of finding the degree, parent, ith child, and subtree size. For cardinal trees the structure also supports finding the child labelled i of a given node apart from the ordinal tree operations. These representations also provide a mapping from the n nodes of the tree onto the integers {1, ..., n}, giving unique labels to the nodes of the tree. This labelling can be used to store satellite information with the nodes efficiently.
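
As a rough illustration of how an ordinal tree can be stored in about two bits per node, the sketch below uses the well-known balanced-parentheses encoding. This is an assumption chosen for illustration and is not necessarily the representation of the abstract above. The queries here use linear scans, whereas succinct structures add small auxiliary indexes to answer them in constant time.

```python
def to_parens(children, root):
    """Encode an ordinal tree: '(' when a node is entered, ')' when left."""
    out = []

    def visit(v):
        out.append("(")
        for c in children.get(v, []):  # children are visited in their order
            visit(c)
        out.append(")")

    visit(root)
    return "".join(out)


def subtree_size(bp, i):
    """Number of nodes in the subtree whose opening parenthesis is at i."""
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            # The span i..j holds one '(' and one ')' per node.
            return (j - i + 1) // 2
    raise ValueError("unbalanced parenthesis sequence")


# Hypothetical tree: r has children a and b; a has one child c.
tree = {"r": ["a", "b"], "a": ["c"]}
bp = to_parens(tree, "r")
print(bp)                   # ((())())
print(subtree_size(bp, 0))  # 4 (whole tree)
print(subtree_size(bp, 1))  # 2 (a and c)
```

The encoding uses 2n parentheses for n nodes, and the position of a node's opening parenthesis in preorder gives each node a unique integer label.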

13.
The Matrix Framework is a recent proposal by Information Retrieval (IR) researchers to flexibly represent information retrieval models and concepts in a single multi-dimensional array framework. We provide computational support for exactly this framework with the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules. To demonstrate their effect on text retrieval, we apply them in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
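
For readers unfamiliar with the Okapi BM25 model used as the example in the abstract above, here is a minimal scoring sketch. It does not use SRAM's array query language or any relational backend. The IDF variant (with +1 inside the logarithm, which keeps it non-negative) and the defaults `k1=1.2`, `b=0.75` are common choices assumed here, not values taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: long documents are penalized via b.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores


# Hypothetical toy collection of tokenized documents.
docs = [["tree", "index"], ["array", "query", "array"], ["tree"]]
scores = bm25_scores(["array"], docs)
print(scores.index(max(scores)))  # 1: only the second document contains "array"
```

The per-term sum over documents containing the term is what an inverted-list evaluation computes, which is why the model maps naturally onto the processing strategy the abstract describes.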

14.
Partial deduction strategies for logic programs often use an abstraction operator to guarantee the finiteness of the set of goals for which partial deductions are produced. Finding an abstraction operator which guarantees finiteness and does not lose relevant information is a difficult problem. In earlier work Gallagher and Bruynooghe proposed to base the abstraction operator on characteristic paths and trees, which capture the structure of the generated incomplete SLDNF-tree for a given goal. In this paper we exhibit the advantages of characteristic trees over purely syntactical measures: if characteristic trees can be preserved upon generalisation, then we obtain an almost perfect abstraction operator, providing just enough polyvariance to avoid any loss of local specialisation. Unfortunately, the abstraction operators proposed in earlier work do not always preserve the characteristic trees upon generalisation. We show that this can lead to important specialisation losses as well as to non-termination of the partial deduction algorithm. Furthermore, this problem cannot be adequately solved in the ordinary partial deduction setting. We therefore extend the expressivity and precision of the Lloyd and Shepherdson partial deduction framework by integrating constraints. We provide formal correctness results for the so obtained generic framework of constrained partial deduction. Within this new framework we are, among others, able to overcome the above mentioned problems by introducing an alternative abstraction operator, based on so called pruning constraints. We thus present a terminating partial deduction strategy which, for purely determinate unfolding rules, induces no loss of local specialisation due to the abstraction while ensuring correctness of the specialised programs.
Michael Leuschel, Ph.D.: He currently works as a postdoctoral researcher at the Department of Computer Science of the Katholieke Universiteit Leuven. His present research focuses on program transformation and specialisation for declarative programming languages. Other research interests include abstract interpretation, optimised integrity checking and meta-programming. He received his degree (“Licence”) in Computer Science from the Université Libre de Bruxelles in 1990 and a Master of Artificial Intelligence from the Katholieke Universiteit Leuven in 1993, where he also received his Ph.D. in 1997. Danny De Schreye, Ph.D.: He is a professor at the Department of Computer Science of the Katholieke Universiteit Leuven and a senior research associate of the Belgian National Fund for Scientific Research. He obtained his Ph.D. from K.U. Leuven in 1983, on the topic of operator algebras. His research interests are in the field of Logic Programming, and include program transformation and termination, knowledge representation and reasoning, and constraint programming.

15.
In this paper, we study the problem of keyword proximity search in XML documents. We take the disjunctive semantics among the keywords into consideration and find top-k relevant compact connected trees (CCTrees) as the answers of keyword proximity queries. We first introduce the notions of compact lowest common ancestor (CLCA) and maximal CLCA (MCLCA), and then propose compact connected trees and maximal CCTrees (MCCTrees) to efficiently and effectively answer keyword proximity queries. We give the theoretical upper bounds of the numbers of CLCAs, MCLCAs, CCTrees and MCCTrees, respectively. We devise an efficient algorithm to generate all MCCTrees, and propose a ranking mechanism to rank MCCTrees. Our extensive experimental study shows that our method achieves both high efficiency and effectiveness, and outperforms existing state-of-the-art approaches significantly.

16.
Functional Trees (cited by: 1 in total; self-citations: 0; citations by others: 1)
In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on a combination of attributes. In the regression setting, model tree algorithms explore multiple representation languages but using linear models at leaf nodes. In this work we study the effects of using combinations of attributes at decision nodes, leaf nodes, or both nodes and leaves in regression and classification tree learning. In order to study the use of functional nodes at different places and for different types of modeling, we introduce a simple unifying framework for multivariate tree learning. This framework combines a univariate decision tree with a linear function by means of constructive induction. Decision trees derived from the framework are able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. We experimentally evaluate a univariate tree, a multivariate tree using linear combinations at inner and leaf nodes, and two simplified versions restricting linear combinations to inner nodes and leaves. The experimental evaluation shows that all functional tree variants exhibit similar performance, with advantages in different datasets. In this study there is a marginal advantage of the full model. These results lead us to study the role of functional leaves and nodes. We use the bias-variance decomposition of the error, cluster analysis, and learning curves as tools for analysis. We observe that in the datasets under study and for classification and regression, the use of multivariate decision nodes has more impact in the bias component of the error, while the use of multivariate decision leaves has more impact in the variance component.

17.
Compressed representations have become effective to store and access large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph” patterns and combining them with techniques such as node orderings and compact data structures. On those representations, we support out-neighbor and out/in-neighbor queries, as well as mining queries based on the dense subgraphs. First, we propose a compression scheme for Web graphs that reduces edges by representing dense subgraphs with “virtual nodes”; over this scheme, we apply node orderings and other compression techniques. With this approach, we match the best current compression ratios that support out-neighbor queries (i.e., nodes pointed from a given node), using 1.0–1.8 bits per edge (bpe) on large Web graphs, and retrieving each neighbor of a node in 0.6–1.0 microseconds (μs). When supporting both out- and in-neighbor queries, instead, our technique generally offers the best time when using little space. If the reduced graph, instead, is represented with a compact data structure that supports bidirectional navigation, we obtain the most compact Web graph representations (0.9–1.5 bpe) that support out/in-neighbor navigation; yet, the time per neighbor extracted rises to around 5–20 μs. We also propose a compact data structure that represents dense subgraphs without using virtual nodes. It allows us to recover out/in-neighbors and answer other more complex queries on the dense subgraphs identified. This structure is not competitive on Web graphs, but on social networks, it achieves 4–13 bpe and 8–12 μs per out/in-neighbor retrieved, which improves upon all existing representations.

18.
We present approximation algorithms for the bandwidth minimization problem (BMP) for a large class of trees. The BMP is NP-hard, even for trees of maximum node degree 3. The problem finds applications in many areas, including VLSI layout, multiprocessor scheduling, and matrix processing, and has been studied for both graphs and matrices. We study the problem on trees having the following property: given any tree node v, the depth difference of any two nonempty subtrees rooted at v is bounded by a constant k. We call such trees h(k) trees or generalized height-balanced (GHB) trees. The above definition extends the class of balanced trees to trees with depth d = Θ(\N\). For any tree in the above defined class, an O(log d) times optimal algorithm is presented. Furthermore, we extend the application of the algorithm to trees that simulate the h(k) property, which we call h(k)-like trees, and also provide intuitive ideas for an approximation algorithm for general trees.

19.
Many classification tasks can be viewed as ordinal. Use of numeric information usually provides possibilities for more powerful analysis than ordinal data. On the other hand, ordinal data allows more powerful analysis when compared to nominal data. It is therefore important not to overlook knowledge about ordinal dependencies in data sets used in data mining. This paper investigates data mining support available from ordinal data. The effect of considering ordinal dependencies in the data set on the overall results of constructing decision trees and induction rules is illustrated. The degree of improved prediction of ordinal over nominal data is demonstrated. When data was very representative and consistent, use of ordinal information reduced the number of final rules with a lower error rate. Data treatment alternatives are presented to deal with data sets having greater imperfections.

20.
Huang Jinjing, Chen Wei, Liu An, Wang Weiqing, Yin Hongzhi, Zhao Lei. World Wide Web, 2020, 23(2): 755-779

A temporal knowledge graph (TKG) is theoretically a temporal graph. Recently, systems have been developed to support snapshot queries over temporal graphs. However, snapshot queries can only give separate answers. To retrieve forward-backward correlation facts from a temporal knowledge graph, the cluster query is proposed in this paper. To deal with the query, the logical view and physical model are presented. Subsequently, five corresponding basic query patterns of unit matching are studied, and then the complete matchings are also addressed. To improve the query performance, index-based methods and pruning strategies are adopted. Experiments are conducted to evaluate cluster queries on three real datasets. The experimental results show the effectiveness and efficiency of cluster queries on temporal knowledge graphs.
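
The snapshot queries that the abstract above contrasts with cluster queries can be sketched as a filter over time-stamped facts. The `Fact` record, the half-open validity interval, and the sample data are assumptions made for illustration; the sketch does not implement the paper's cluster query, its indexes, or its pruning strategies.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    predicate: str
    obj: str
    start: int  # first time point at which the fact holds (inclusive)
    end: int    # first time point at which it no longer holds (exclusive)


def snapshot(facts, t):
    """Return the facts valid at time t, i.e. the graph's state at t."""
    return [f for f in facts if f.start <= t < f.end]


# Hypothetical facts: a is related to b during [0, 5) and to c during [5, 9).
facts = [Fact("a", "r", "b", 0, 5), Fact("a", "r", "c", 5, 9)]
print(snapshot(facts, 3))  # only the fact about b
print(snapshot(facts, 5))  # only the fact about c
```

Each call answers for a single time point in isolation, so the succession from b to c is visible only by comparing separate snapshots, which is the limitation the abstract points out.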

