Similar Documents
20 similar documents retrieved.
1.
Database applications often require a sophisticated class of storage structures in order to answer different types of queries efficiently. This often dictates that the file should be organized on multiple keys. Several storage structures have been proposed to satisfy these needs. Most are generalizations of the storage structures used for managing one-dimensional data. Recently, a new storage structure, called the BD tree, was proposed to manage multidimensional data. This structure has good dynamic characteristics. This paper presents algorithms for the BD tree for insertion and deletion and for answering exact-match, partial-match, and range queries. In addition, experimental evidence is presented suggesting that BD trees have good dynamic characteristics.
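For readers unfamiliar with the query types named above, a minimal sketch of what they return, written as a plain linear scan over multi-key records (illustrative Python only; a BD tree answers the same queries without examining every record):

```python
# Illustrative only: the three query types discussed above, evaluated by a
# linear scan over multi-key records. A BD tree answers the same queries
# without scanning every record.

def exact_match(records, key):
    """All records whose key tuple equals `key` in every dimension."""
    return [r for r in records if r == key]

def partial_match(records, pattern):
    """`pattern` fixes some dimensions and leaves others as None (wildcards)."""
    return [r for r in records
            if all(p is None or p == v for p, v in zip(pattern, r))]

def range_query(records, lo, hi):
    """All records inside the axis-aligned box [lo, hi] in every dimension."""
    return [r for r in records
            if all(l <= v <= h for l, v, h in zip(lo, r, hi))]

points = [(3, 7), (5, 2), (3, 9)]
print(exact_match(points, (5, 2)))           # [(5, 2)]
print(partial_match(points, (3, None)))      # [(3, 7), (3, 9)]
print(range_query(points, (2, 1), (5, 8)))   # [(3, 7), (5, 2)]
```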

2.
The multidimensional binary search tree (abbreviated k-d tree) is a popular data structure for the organization and manipulation of spatial data. The data structure is useful in several applications including graph partitioning, hierarchical applications such as molecular dynamics and n-body simulations, and databases. In this paper, we study efficient parallel construction of k-d trees on coarse-grained distributed-memory parallel computers. We consider several algorithms for parallel k-d tree construction and analyze them theoretically and experimentally, with a view towards identifying the algorithms that are practically efficient. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on experimental results.
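As background, a sequential sketch of the standard k-d tree construction that such parallel algorithms distribute across processors (recursive median split, cycling through the dimensions; this is the textbook procedure, not the CM-5 implementations evaluated above):

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kd_tree(points, depth=0):
    """Build a k-d tree by splitting at the median along a cycling axis."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(point=points[mid],
                axis=axis,
                left=build_kd_tree(points[:mid], depth + 1),
                right=build_kd_tree(points[mid + 1:], depth + 1))

tree = build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point)  # (7, 2): the median along axis 0
```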

3.
The problem of multidimensional file partitioning (MDFP) arises in large databases that are subject to frequent range queries on one or more attributes. In an MDFP scheme, the search attribute space is partitioned into cells, which are mapped to physical disk locations. This mapping preserves the order of the search attribute values so that range queries can be answered most efficiently, while maintaining good performance for other types of queries. MDFP schemes have recently been proposed for both dynamic and static file organizations. Optimal and heuristic MDFP algorithms are developed here for the static case. The results of extensive computational experiments show that the proposed heuristics perform better than previously known static algorithms. It is also shown that incorporating a static algorithm into a dynamic MDFP such as a grid file, at conversion and/or periodic reorganization points, significantly improves the resulting storage utilization of the data file and decreases the size of the directory file.
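A toy sketch of the order-preserving cell mapping that such partitioning schemes rely on (fixed, regular partition boundaries per attribute; a real MDFP or grid file chooses the boundaries adaptively and maps cells to disk pages, which this fragment does not attempt):

```python
import bisect

# Per-attribute partition points (illustrative). Because boundaries are sorted,
# the cell mapping preserves attribute order, so a range query touches a
# contiguous block of cells along each attribute.
boundaries = [
    [10, 20, 30],   # attribute 0 split into 4 intervals
    [100, 200],     # attribute 1 split into 3 intervals
]

def cell_of(record):
    """Map a record to its grid cell by locating each attribute's interval."""
    return tuple(bisect.bisect_right(b, v) for b, v in zip(boundaries, record))

def cells_for_range(lo, hi):
    """Cells that a range query [lo, hi] must read (a contiguous block)."""
    lo_cell, hi_cell = cell_of(lo), cell_of(hi)
    return [(i, j)
            for i in range(lo_cell[0], hi_cell[0] + 1)
            for j in range(lo_cell[1], hi_cell[1] + 1)]

print(cell_of((25, 150)))                    # (2, 1)
print(cells_for_range((15, 90), (25, 150)))  # [(1, 0), (1, 1), (2, 0), (2, 1)]
```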

4.
Splay and randomized search trees (RSTs) are self-balancing binary tree structures with little or no space overhead compared to a standard binary search tree (BST). Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such vocabulary accumulation. Surprisingly, unmodified splaying and RSTs are on average around 25% slower than using a standard binary tree. We investigate heuristics to limit splay tree reorganization costs and show their effectiveness in practice. In particular, a periodic rotation scheme improves the speed of splaying by 27%, while other proposed heuristics are less effective. We also report the performance of efficient bit-wise hashing and red–black trees for comparison.
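The paper's periodic rotation scheme is not spelled out in this abstract; as one hedged reading of the idea, here is a sketch of a counting BST that moves the accessed node a single level toward the root only on every k-th access, so reorganization cost stays bounded (the class names, period, and rotation policy are illustrative assumptions, not the authors' heuristic):

```python
class Node:
    __slots__ = ("key", "count", "left", "right", "parent")
    def __init__(self, key, parent=None):
        self.key, self.count = key, 1
        self.left = self.right = None
        self.parent = parent

class PeriodicBST:
    """BST that rotates the accessed node one level toward the root only on
    every k-th access, limiting reorganization cost (illustrative heuristic)."""
    def __init__(self, period=4):
        self.root, self.period, self.accesses = None, period, 0

    def _rotate_up(self, x):
        """Single rotation lifting x above its parent."""
        p = x.parent
        if p is None:
            return
        g = p.parent
        if p.left is x:                      # right rotation
            p.left, x.right = x.right, p
            if p.left: p.left.parent = p
        else:                                # left rotation
            p.right, x.left = x.left, p
            if p.right: p.right.parent = p
        x.parent, p.parent = g, x
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def insert_or_count(self, key):
        """Insert `key` or bump its count; occasionally rotate it upward."""
        if self.root is None:
            self.root = Node(key)
            return
        node = self.root
        while True:
            if key == node.key:
                node.count += 1
                break
            nxt = node.left if key < node.key else node.right
            if nxt is None:
                nxt = Node(key, parent=node)
                if key < node.key: node.left = nxt
                else: node.right = nxt
                node = nxt
                break
            node = nxt
        self.accesses += 1
        if self.accesses % self.period == 0:
            self._rotate_up(node)

t = PeriodicBST(period=2)
for w in "to be or not to be".split():
    t.insert_or_count(w)
print(t.root.key)  # "to"
```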

5.
Computer Networks, 2007, 51(8): 1861-1881
The success of a P2P file-sharing network highly depends on the scalability and versatility of its search mechanism. Two particularly desirable search features are scope (ability to find infrequent items) and support for partial-match queries (queries that contain typos or include a subset of keywords). While centralized-index architectures (such as Napster) can support both these features, existing decentralized architectures seem to support at most one: prevailing unstructured P2P protocols (such as Gnutella and FastTrack) deploy a “blind” search mechanism where the set of peers probed is unrelated to the query; thus they support partial-match queries but have limited scope. On the other extreme, the recently proposed distributed hash tables (DHTs) such as CAN and CHORD couple index location with the item’s hash value, and thus have good scope but cannot effectively support partial-match queries. Another hurdle to DHT deployment is their tight control of the overlay structure and the information (part of the index) each peer maintains, which makes them more sensitive to failures and frequent joins and disconnects. We develop a new class of decentralized P2P architectures. Our design is based on unstructured architectures such as Gnutella and FastTrack, and retains many of their appealing properties, including support for partial-match queries and relative resilience to peer failures. Yet we obtain orders-of-magnitude improvement in the efficiency of locating rare items. Our approach exploits associations inherent in human selections to steer the search process to peers that are more likely to have an answer to the query. We demonstrate the potential of associative search using models, analysis, and simulations.

6.
Wireless data broadcasting is a popular data delivery approach in mobile computing environments, where broadcasting servers usually adopt indexing schemes so that mobile clients can access data on a wireless broadcast stream in an energy-efficient way. However, conventional indexing schemes use primary key attribute values to construct tree structures. Therefore, these schemes do not support content-based retrieval queries such as partial-match queries and range queries. This paper proposes an indexing method that supports content-based retrieval queries on a wireless data stream. The method uses a tree-structured index, called the B2V-Tree, which is composed of bit-vectors that are generated from data records through multi-attribute hashing. The effectiveness of the proposed method is shown through analysis and experiments.
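A minimal sketch of the general signature idea behind bit-vector indexes of this kind: hash each attribute value to a few bit positions, OR them into a per-record bit-vector, and answer a partial-match query by testing whether the query's bits are a subset of the record's bits. The hash function, vector width, and bits-per-attribute below are illustrative assumptions, not the B2V-Tree's actual construction:

```python
BITS = 32

def attr_bits(attr_index, value, k=2):
    """Hash one attribute value to k bit positions (illustrative hashing)."""
    sig = 0
    for i in range(k):
        sig |= 1 << (hash((attr_index, value, i)) % BITS)
    return sig

def record_signature(record):
    """OR together the bit-vectors of all attributes of the record."""
    sig = 0
    for idx, value in enumerate(record):
        sig |= attr_bits(idx, value)
    return sig

def may_match(record_sig, query):
    """Partial-match filter: every bit set by the specified query attributes
    must also be set in the record's signature (false positives possible)."""
    qsig = 0
    for idx, value in enumerate(query):
        if value is not None:
            qsig |= attr_bits(idx, value)
    return record_sig & qsig == qsig

records = [("cs", "db", 2007), ("cs", "net", 2005)]
sigs = [record_signature(r) for r in records]
query = ("cs", None, 2007)   # partial-match query: second attribute unspecified
print([r for r, s in zip(records, sigs) if may_match(s, query)])
```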

7.
Many recent database applications need to deal with similarity queries. For such applications, it is important to measure the similarity between two objects using the distance between them. Focusing on this problem, this paper proposes the slim-tree, a new dynamic tree for organizing metric data sets in pages of fixed size. The slim-tree uses the triangle inequality to prune the distance calculations that are needed to answer similarity queries over objects in metric spaces. The proposed insertion algorithm uses new policies to select the nodes where incoming objects are stored. When a node overflows, the slim-tree uses a minimal spanning tree to help with the splitting. The new insertion algorithm leads to a tree with high storage utilization and improved query performance. The slim-tree is a metric access method that tackles the problem of overlaps between nodes in metric spaces and that allows one to minimize the overlap. The proposed "fat-factor" is a way to quantify whether a given tree can be improved and also to compare two trees. We show how to use the fat-factor to achieve accurate estimates of the search performance and also how to improve the performance of a metric tree through the proposed "slim-down" algorithm. This paper also presents a new tool in the slim-tree's arsenal of resources, aimed at visualizing it. Visualization is a powerful tool for interactive data mining and for the visual tracking of the behavior of a tree under updates. Finally, we present a formula to estimate the number of disk accesses in range queries. Results from experiments with real and synthetic data sets show that the new slim-tree algorithms lead to performance improvements. These results show that the slim-tree outperforms the M-tree by up to 200% for range queries. For insertion and splitting, the minimal-spanning-tree-based algorithm achieves up to 40 times faster insertions. We observed improvements of up to 40% in range queries after applying the slim-down algorithm.
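The triangle-inequality pruning mentioned above can be stated compactly: if the distances d(p, o) from a pivot p to the stored objects o are precomputed, then any object with |d(q, p) - d(p, o)| > r cannot lie within radius r of the query q, so its exact distance never needs to be computed. A toy sketch over a flat pivot table rather than the slim-tree itself:

```python
def range_query(objects, dist, pivot, pivot_dists, q, r):
    """Return objects within distance r of q, skipping distance computations
    that the triangle inequality proves unnecessary."""
    d_qp = dist(q, pivot)
    hits, computed = [], 0
    for o, d_po in zip(objects, pivot_dists):
        if abs(d_qp - d_po) > r:      # triangle inequality: o cannot qualify
            continue
        computed += 1
        if dist(q, o) <= r:
            hits.append(o)
    return hits, computed

# 1-D example with absolute difference as the metric.
dist = lambda a, b: abs(a - b)
objects = [1, 4, 8, 15, 23]
pivot = 0
pivot_dists = [dist(pivot, o) for o in objects]
hits, computed = range_query(objects, dist, pivot, pivot_dists, q=5, r=2)
print(hits, computed)   # [4] 1 — only one exact distance had to be evaluated
```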

8.
9.
Semantic Web search is a new application of recent advances in information retrieval (IR), natural language processing, artificial intelligence, and other fields. The Powerset group in Microsoft develops a semantic search engine that aims to answer queries not only by matching keywords, but by actually matching meaning in queries to meaning in Web documents. Compared to typical keyword search, semantic search can pose additional engineering challenges for the back-end and infrastructure designs. Of these, the main challenge addressed in this paper is how to lower query latencies to acceptable, interactive levels. Index-based semantic search requires processing more data, such as numerous synonyms, hypernyms, multiple linguistic readings, and other semantic information, both on queries and in the index. In addition, some of the algorithms can be super-linear, such as matching co-references across a document. Consequently, many semantic queries can run significantly slower than the same keyword query. Users, however, have grown to expect Web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. It is therefore imperative for any search engine to meet its users’ interactivity expectations, or risk losing them. Our approach to tackle this challenge is to exploit data parallelism in slow search queries to reduce their latency in multi-core systems. Although all search engines are designed to exploit parallelism, at the single-node level this usually translates to throughput-oriented task parallelism. This paper focuses on the engineering of two latency-oriented approaches (coarse- and fine-grained) and compares them to the task-parallel approach. We use Powerset’s deployed search engine to evaluate the various factors that affect parallel performance: workload, overhead, load balancing, and resource contention. We also discuss heuristics to selectively control the degree of parallelism and consequent overhead on a query-by-query level. Our experimental results show that using fine-grained parallelism with these dynamic heuristics can significantly reduce query latencies compared to fixed, coarse-granularity parallelization schemes. Although these results were obtained on, and optimized for, Powerset’s semantic search, they can be readily generalized to a wide class of inverted-index search engines.
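A schematic of the fine-grained (intra-query) data parallelism contrasted above with task parallelism: one query's work is sharded across worker threads and the partial results merged, so that single query finishes sooner. The scoring function, chunking, and thread pool below are placeholders, not Powerset's engine:

```python
from concurrent.futures import ThreadPoolExecutor

def score_chunk(postings, query_terms):
    """Placeholder scorer: count query-term occurrences per document."""
    return {doc: sum(terms.count(t) for t in query_terms)
            for doc, terms in postings}

def parallel_query(postings, query_terms, workers=4):
    """Fine-grained (intra-query) parallelism: shard one query's postings
    across threads and merge partial results, reducing that query's latency."""
    chunk = max(1, len(postings) // workers)
    shards = [postings[i:i + chunk] for i in range(0, len(postings), chunk)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(score_chunk, shards, [query_terms] * len(shards)):
            merged.update(part)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

postings = [(f"doc{i}", ["tree", "search"] * (i % 3)) for i in range(8)]
print(parallel_query(postings, ["search"])[:3])
```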

10.
Randomized search heuristics, among them randomized local search and evolutionary algorithms, are applied to problems whose structure is not well understood, as well as to problems in combinatorial optimization. The analysis of these randomized search heuristics has been started for some well-known problems, and this approach is followed here for the minimum spanning tree problem. After motivating this line of research, it is shown that randomized search heuristics find minimum spanning trees in expected polynomial time without employing the global technique of greedy algorithms.
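As a concrete instance of the kind of heuristic analyzed, a hedged sketch of a (1+1) evolutionary algorithm searching over edge bitstrings, with a fitness that first penalizes disconnected selections, then extra edges, then total weight (the penalty constants, mutation rate, and step budget are illustrative choices, not the exact model analyzed in the paper):

```python
import random

edges = [(0, 1, 4), (1, 2, 2), (0, 2, 5), (2, 3, 3), (1, 3, 7)]  # (u, v, weight)
n = 4  # number of vertices

def components(selected):
    """Number of connected components induced by the selected edges."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for bit, (u, v, _) in zip(selected, edges):
        if bit:
            parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

def fitness(selected):
    """Lower is better: heavily penalize disconnection, then extra edges."""
    w = sum(wt for bit, (_, _, wt) in zip(selected, edges) if bit)
    big = sum(wt for _, _, wt in edges) + 1
    return (components(selected) - 1) * big * big + \
           (sum(selected) - (n - 1)) * big + w

def one_plus_one_ea(steps=5000, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in edges]
    for _ in range(steps):
        # flip each bit independently with probability 1/m, accept if not worse
        y = [b ^ (rng.random() < 1 / len(edges)) for b in x]
        if fitness(y) <= fitness(x):
            x = y
    return x

best = one_plus_one_ea()
# With enough steps this converges to the MST: edges (0,1), (1,2), (2,3).
print([e for bit, e in zip(best, edges) if bit])
```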

11.
Explicit model predictive control (EMPC) moves the online computational burden of linear model predictive control (MPC) to offline computation by using multi-parametric programming, which produces control laws defined over a set of polyhedral regions in the state space. The online computation in EMPC is to find the control law corresponding to a given state; this is called the point location problem. This paper deals with efficient point location in larger polyhedral data sets. The authors propose a hybrid data structure, the grid k-d tree (GKDT), which is constructed from the k-dimensional tree (k-d tree), a hash table, and a binary search tree (BST). The main part of GKDT is a multi-branch tree that builds subtrees by splitting the polyhedral regions into several equal grid cells in the manner of a k-d tree, and that is traversed via a hash function at each level. GKDT achieves high search efficiency, although it needs considerably more storage memory. A complexity analysis of the runtime and storage requirements of the approach is provided. The advantages of the method are supported by two examples in the paper.
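The point location problem itself is simple to state: given the current state x, find the polyhedral region {z : Az <= b} containing it and apply that region's affine law u = Kx + k. A brute-force sketch of exactly that scan, which structures such as GKDT exist to accelerate (the regions and gains below are made-up one-dimensional examples):

```python
import numpy as np

# Each region is (A, b, K, k): the state x belongs to the region if A @ x <= b,
# and the explicit control law there is u = K @ x + k. Values are illustrative.
regions = [
    (np.array([[ 1.0]]), np.array([0.0]), np.array([[-0.5]]), np.array([0.0])),  # x <= 0
    (np.array([[-1.0]]), np.array([0.0]), np.array([[-1.5]]), np.array([0.0])),  # x >= 0
]

def point_location(x, tol=1e-9):
    """Sequential scan over polyhedral regions; GKDT-style structures replace
    this O(#regions) scan with a tree/hash descent."""
    for A, b, K, k in regions:
        if np.all(A @ x <= b + tol):
            return K @ x + k
    raise ValueError("state outside the partition (infeasible for the EMPC law)")

print(point_location(np.array([0.4])))   # region x >= 0 applies: u = -1.5 * 0.4
```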

12.
Database applications very often require a sophisticated class of storage structures in order to answer different types of queries efficiently. This often dictates that the file should be organized on multiple keys. Several storage structures have been proposed to satisfy these needs. Most of these are a generalization of the storage structures used for managing one-dimensional data. The k-d tree is one such example and it is a natural generalization of the standard one-dimensional binary search tree. Recently, a new storage structure, called the BD tree, was proposed to manage multidimensional data. This structure has good dynamic characteristics. Several variations are possible on the basic k-d tree structure. This paper studies the performance implications of three variations. Further, it provides an empirical performance comparison of the k-d tree and BD tree in database applications.

13.
This paper studies the processing of continuous multiple range queries over moving objects on road networks using network distance. A data model for storing the road network, the moving objects, and the query data in memory is designed. Based on this data model, two algorithms for processing continuous multiple range queries over moving objects on road networks are proposed. The incremental range query algorithm (IRQA) reduces query recomputation by using an expansion tree and an influence-list structure, while the group range query algorithm (GRQA) reduces recomputation by exploiting the fact that the results of multiple queries on the same path are correlated. Experimental results show that GRQA performs better when the queries are densely clustered, that IRQA performs better when the queries are uniformly distributed, and that both algorithms outperform the naive algorithm that recomputes all query results.
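A minimal sketch of the primitive both algorithms refine: a network-distance range query answered by a Dijkstra expansion from the query point that stops once the radius is exceeded (the graph, object placement, and radius are illustrative; the incremental IRQA/GRQA bookkeeping is not reproduced here):

```python
import heapq

# Road network as an undirected adjacency list: node -> [(neighbor, edge_length)].
graph = {
    0: [(1, 2.0), (2, 5.0)],
    1: [(0, 2.0), (2, 1.0), (3, 4.0)],
    2: [(0, 5.0), (1, 1.0), (3, 2.0)],
    3: [(1, 4.0), (2, 2.0)],
}
objects_at = {1: ["car_a"], 3: ["car_b"]}   # moving objects snapped to nodes

def network_range_query(source, radius):
    """Objects whose network distance from `source` is at most `radius`,
    found by a Dijkstra expansion that never goes beyond the radius."""
    dist, heap, found = {source: 0.0}, [(0.0, source)], []
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")) or d > radius:
            continue
        found.extend(objects_at.get(u, []))
        for v, w in graph[u]:
            nd = d + w
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found

print(network_range_query(source=0, radius=3.5))  # ['car_a'] — node 3 is 5.0 away
```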

14.
Randomized search heuristics (e.g., evolutionary algorithms, simulated annealing, etc.) are very appealing to practitioners: they are easy to implement and usually provide good performance. The theoretical analysis of these algorithms usually focuses on convergence rates. This paper presents a mathematical study of randomized search heuristics that use comparison-based selection mechanisms. The two main results are that comparison-based algorithms are the best algorithms for some robustness criteria and that introducing randomness in the choice of offspring improves the anytime behavior of the algorithm. An original Estimation of Distribution Algorithm combining both results is proposed and evaluated experimentally with success.

15.
王碧, 霍红卫. 计算机工程 (Computer Engineering), 2003, 29(3): 105-107
This paper systematically presents methods for distributing multidimensional data based on parallelized k-d trees. Several strategies for constructing k-d trees, together with the corresponding algorithms, are given, and the communication costs and applicable scope of the various strategies are analyzed and compared theoretically.

16.
A balanced binary search tree can be characterized by two orthogonal issues: its search strategy and its balancing strategy. In this paper, we show how to decouple search and balancing strategies so that they can be expressed independently of each other, communicating only by basic operations such as rotations. Different balancing strategies, such as red–black trees and splay trees, and different search applications, such as key search and rank search, can be combined arbitrarily. As a new result, we show how optimal string search can be expressed as a search application on any balanced binary tree. We implement our framework in C++, with heavy use of templates and inlining. It keeps balancing and searching separate, while still delivering performance that compares favorably to widely used binary tree implementations. Common services, such as correctness checks and timing measurements, do not have to be rewritten for each tree implementation. The common framework simplifies experimentation with trees and search algorithms; as a demonstration, we present some simple comparisons of red–black trees, splay trees and treaps.
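A schematic of the decoupling idea in the same spirit: the tree owns the search logic, the balancing strategy sees only the access path and responds with rotations, and the two communicate through nothing else. This is a Python strategy-pattern sketch with invented names (the actual framework is a C++ template library, and these are not its interfaces):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def rotate_up(parent, child):
    """Single rotation lifting `child` above `parent`; returns the new subtree root."""
    if parent.left is child:
        parent.left, child.right = child.right, parent
    else:
        parent.right, child.left = child.left, parent
    return child

class NoBalance:
    """Balancing strategy: leave the access path untouched (plain BST)."""
    def restructure(self, path):
        return path[0]

class MoveToRoot:
    """Self-adjusting strategy: rotate the accessed node all the way to the root."""
    def restructure(self, path):
        node = path[-1]
        for i in range(len(path) - 2, -1, -1):
            parent = path[i]
            rotate_up(parent, node)
            if i > 0:                      # relink the grandparent to the lifted node
                grand = path[i - 1]
                if grand.left is parent:
                    grand.left = node
                else:
                    grand.right = node
        return node

class Tree:
    """Search logic is fixed; the balancing strategy is plugged in separately and
    talks to the tree only through rotations on the access path."""
    def __init__(self, strategy):
        self.root, self.strategy = None, strategy

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        path, node = [], self.root
        while node is not None:
            path.append(node)
            node = node.left if key < node.key else node.right
        leaf = Node(key)
        if key < path[-1].key:
            path[-1].left = leaf
        else:
            path[-1].right = leaf
        path.append(leaf)
        self.root = self.strategy.restructure(path)

t = Tree(MoveToRoot())
for k in [5, 3, 8, 1]:
    t.insert(k)
print(t.root.key)   # 1 — the most recently inserted key sits at the root
```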

17.
Jim Bell, Gopal Gupta. Software, 1993, 23(4): 369-382
Much has been said in praise of self-adjusting data structures, particularly self-adjusting binary search trees. Self-adjusting trees are most suited to skewed key-access distributions as the techniques attempt to place the most commonly accessed keys near the root of the tree. Theoretical bounds on worst-case and amortized performance (i.e. performance over a sequence of operations) have been derived which compare well with those for optimal binary search trees. In this paper, we compare the performance of three different techniques for self-adjusting trees with that of AVL and random binary search trees. Comparisons are made for various tree sizes, levels of key-access-frequency skewness and ratios of insertions and deletions to searches. The results show that, because of the high cost of maintaining self-adjusting trees, in almost all cases the AVL tree outperforms all the self-adjusting trees and in many cases even a random binary search tree has better performance, in terms of CPU time, than any of the self-adjusting trees. Self-adjusting trees seem to perform best in a highly dynamic environment, contrary to intuition.

18.
Free lunches on the discrete Lipschitz class
The No-Free-Lunch theorem states that there does not exist a genuine general-purpose optimizer because all algorithms have identical performance on average over all functions. However, such a result does not imply that search heuristics or optimization algorithms are futile if we are more cautious with the applicability of these methods and the search space. In this paper, within the No-Free-Lunch framework, we first introduce the discrete Lipschitz class by carrying over the idea of Lipschitz functions, i.e., functions with bounded slope, to capture the notion of continuity in discrete functions. We then investigate the properties of the discrete Lipschitz class, generalize an algorithm called the subthreshold-seeker for optimization, and show that the generalized subthreshold-seeker outperforms random search on this class. Finally, we propose a tractable sampling-test scheme to empirically demonstrate the superiority of the generalized subthreshold-seeker under practical configurations. This study concludes that there exist algorithms outperforming random search on the discrete Lipschitz class in both theoretical and practical aspects, and indicates that the effectiveness of search heuristics may not be universal but can still be general in some broad sense.

19.
Artificial Intelligence, 1987, 31(2): 185-199
Of the many minimax algorithms, SSS* is noteworthy because it usually searches the smallest game trees. Its success can be attributed to the accumulation and use of information acquired while traversing the tree. The main disadvantages of SSS* are its high storage needs and management costs. This paper describes a class of methods, based on the popular alpha-beta algorithm, that acquire and use information to guide a tree search. They retain a given search direction and yet are as good as SSS*, even while searching random trees. Further, although some of these new algorithms also require substantial storage, they are more flexible and can be programmed to use only the space available, at the cost of some degradation in performance.
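For reference, a sketch of the plain alpha-beta procedure these methods build on (fail-hard minimax pruning over an explicit game tree; the informed variants described above additionally carry information gathered earlier in the traversal, which this sketch does not):

```python
def alphabeta(node, alpha, beta, maximizing):
    """Plain fail-hard alpha-beta; `node` is a leaf value or a list of children."""
    if not isinstance(node, list):      # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # cutoff: the opponent avoids this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Depth-2 example: MAX to move, three MIN nodes below.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3
```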

20.
To address the problem that conventional group nearest-neighbor queries in road-network environments cannot support uncertain user searches, this work introduces a "fuzziness" factor on top of the group nearest-neighbor query to describe the uncertainty of user queries and proposes four different algorithms. The naive global search algorithm exploits the properties of Dijkstra's algorithm to handle the uncertainty; the multidimensional vector algorithm and the V-Tree algorithm build on it and further optimize by shrinking the search space; and the final approximate algorithm further improves query efficiency at the cost of some accuracy. Extensive experiments on real road-network data sets summarize the strengths of the different algorithms and fully validate the soundness and practicality of each of them.
