首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Horizontal partitioning is a logical database design technique which facilitates efficient execution of queries by reducing the irrelevant objects accessed. Given a set of most frequently executed queries on a class, the horizontal partitioning generates horizontal class fragments (each of which is a subset of object instances of the class), that meet the queries requirements. There are two types of horizontal class partitioning, namely, primary and derived. Primary horizontal partitioning of a class is performed using predicates of queries accessing the class. Derived horizontal partitioning of a class is the partitioning of a class based on the horizontal partitioning of another class. We present algorithms for both primary and derived horizontal partitioning and discuss some issues in derived horizontal partitioning and present their solutions. There are two important aspects for supporting database operations on a partitioned database, namely, fragment localization for queries and object migration for updates. Fragment localization deals with identifying the horizontal fragments that contribute to the result of the query, and object migration deals with migrating objects from one class fragment to another due to updates. We provide novel solutions to these two problems, and finally we show the utility of horizontal partitioning for query processing.  相似文献   

2.
The index selection problem (ISP) is an important optimization problem in the physical design of databases. The aim of this paper is to show that ISP, although NP-hard, can in practice be solved effectively through well-designed algorithms. We formulate ISP as a 0-1 integer linear program and describe an exact branch-and-bound algorithm based on the linear programming relaxation of the model. The performance of the algorithm is enhanced by means of procedures to reduce the size of the candidate index set. We also describe heuristic algorithms based on the solution of a suitably defined knapsack subproblem and on Lagrangian decomposition. Finally, computational results on several classes of test problems are given. We report the exact solution of large-scale ISP instances involving several hundred indexes and queries. We also evaluate one of the heuristic algorithms we propose on very large-scale instances involving several thousand indexes and queries and show that it consistently produces very tight approximate (and sometimes provably optimal) solutions. Finally, we discuss possible extensions and future directions of research  相似文献   

3.
It is desirable to design partitioning methods that minimize the I/O time incurred during query execution in spatial databases. This paper explores optimal partitioning for two-dimensional data for a class of queries and develops multi-disk allocation techniques that maximize the degree of I/O parallelism obtained in each case. We show that hexagonal partitioning has optimal I/O performance for circular queries among all partitioning methods that use convex non-overlapping regions. An analysis and extension of this result to all possible partitioning techniques is also given. For rectangular queries, we show that hexagonal partitioning has overall better I/O performance for a general class of range queries, except for rectilinear queries, in which case rectangular grid partitioning is superior. By using current algorithms for rectangular grid partitioning, parallel storage and retrieval algorithms for hexagonal partitioning can be constructed. Some of these results carry over to circular partitioning of the data—which is an example of a non-convex region.  相似文献   

4.
An efficient technique for nearest-neighbor query processing on the SPY-TEC   总被引:1,自引:0,他引:1  
The SPY-TEC (spherical pyramid-technique) was proposed as a new indexing method for high-dimensional data spaces using a special partitioning strategy that divides a d-dimensional data space into 2d spherical pyramids. In the SPY-TEC, an efficient algorithm for processing hyperspherical range queries was introduced with a special partitioning strategy. However, the technique for processing k-nearest-neighbor queries, which are frequently used in similarity search, was not proposed. In this paper, we propose an efficient algorithm for processing nearest-neighbor queries on the SPY-TEC by extending the incremental nearest-neighbor algorithm. We also introduce a metric that can be used to guide an ordered best-first traversal when finding nearest neighbors on the SPY-TEC. Finally, we show that our technique significantly outperforms the related techniques in processing k-nearest-neighbor queries by comparing it to the R*-tree, the X-tree, and the sequential scan through extensive experiments.  相似文献   

5.
Massive XML data are increasingly generated for the representation, storage and exchange of web information. Twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some of the existing distributed algorithms generate a lot of useless intermediate results and execute many join operations of partial results in most cases; others require the priori knowledge of query pattern before XML partition, storage and query processing, which is impractical in the cases of large-scale data or frequent incoming new queries. To improve efficiency and scalability, in this paper, we propose a 3-phase distributed algorithm DisT3 based on node distribution mechanism to avoid unnecessary intermediate results. Furthermore, we propose a lightweight local index ReP with an enhanced XML partitioning approach using arbitrary partitioning strategy, and based on ReP we propose an improved 2-phase distributed algorithm DisT2ReP to further reduce the communication cost. After the performance guarantees are analyzed, extensive experiments are conducted to verify the efficiency and scalability of our proposed algorithms in distributed twig query applications.  相似文献   

6.
Advanced application domains such as computer-aided design, computer-aided software engineering, and office automation are characterized by their need to store, retrieve, and manage large quantities of data having complex structures. A number of object-oriented database management systems (OODBMS) are currently available that can effectively capture and process the complex data. The existing implementations of OODBMS outperform relational systems by maintaining and querying cross-references among related objects. However, the existing OODBMS still do not meet the efficiency requirements of advanced applications that require the execution of complex queries involving the retrieval of a large number of data objects and relationships among them. Parallel execution can significantly improve the performance of complex OO queries. In this paper, we analyze the performance of parallel OO query processing algorithms for various benchmark application domains. The application domains are characterized by specific mixes of queries of different semantic complexities. The performance of the application domains has been analyzed for various system and data parameters by running parallel programs on a 32-node transputer based parallel machine developed at the IBM Research Center at Yorktown Heights. The parallel processing algorithms, data routing techniques, and query management and control strategies have been implemented to obtain accurate estimation of controlling and processing overheads. However, generation of large complex databases for the study was impractical. Hence, the data used in the simulation have been parameterized. The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach. Using the query graph approach, a query is processed by simultaneously initiating the execution at several object classes, thereby, improving the parallelism. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the object references among the objects. Further, the algorithms do not generate any temporary data, thereby, reducing disk accesses. This is accomplished by marking the selected objects and by employing a two-phase query processing strategy.  相似文献   

7.
Object-oriented database management systems (OODBMSs) provide rich facilities for the modeling and processing of structural as well as behavioral properties of complex application objects. However, due to their inherent generality and continuously evolving functionalities, efficient implementations are important for these OODBMSs to support the present and future applications, particularly when the databases are very large. In this paper, we present several parallel, multi-wavefront algorithms based on two processing approaches, i.e., identification and elimination approaches, to verify association patterns specified in queries. Both approaches allow more processors to operate concurrently on a query than the traditional tree-structured query processing approach, thus introducing a higher degree of parallelism in query processing. A heuristic method is presented for partitioning an object-oriented database (OODB). The main consideration for partitioning the database is load balancing. This method also tries to reduce the communication time by reducing the length of the path that wavefronts need to be propagated. Multiple wavefront algorithms based on the two approaches for tree-structured queries have been implemented on an nCUBE 2 parallel computer. The implementation of the query processor allows multiple queries to be executed simultaneously. This implementation provides an environment for evaluating the algorithms and the heuristic method for partitioning the database. The evaluation results are presented in this paper.Recommended by: Patrick Valduriez  相似文献   

8.
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Also, for some binary classification problems, positive examples which are elements of the target concept are available. Can these additional data be used to improve accuracy of supervised learning algorithms? We investigate in this paper the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the statistical query learning model in order to describe these algorithms. Here, we design an algorithm scheme which transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimate for probabilities over the set of positive instances) and instance statistical queries (estimate for probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries only if a lower bound on the weight of any target concept f can be estimated in polynomial time. Then, we design a decision tree induction algorithm POSC4.5, based on C4.5, that uses only positive and unlabeled examples and we give experimental results for this algorithm. In the case of imbalanced classes in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other class, the learning problem remains open. This problem is challenging because it is encountered in many real-world applications.  相似文献   

9.
The authors discuss various performance issues in distributed query processing. They validate and evaluate the performance of the local reduction (LR) the fragment and replicate strategy (FRS) and the partition and replicate strategy (PRS) optimization algorithms. The experimental results reveal that the choices made by these algorithms concerning which local operations should be performed, which relation should remain fragmented or which relation should be partitioned are valid. It is shown using experimental results that various parameters, such as the number of processing sites, partitioning speed relative to join speed, and sizes of the join relations, affect the performance of PRS significantly. It is also shown that the response times of query execution are affected significantly by the degree of site autonomy, interferences among processes, interface with the local database management systems (DBMSs) and communications facilities. Pipeline strategies for processing queries in an environment where relations are fragmented are studied  相似文献   

10.
Wrappers export the schema and data of existing heterogeneous databases and support queries on them. In the context of cooperative information systems, we present a flexible approach to specify the derivation of object-oriented (OO) export databases from local relational databases. Our export database derivation consists of a set of extent derivation structures (EDS) which defines the extent and deep extent of export classes. Having well-defined semantics, the EDS can be readily used in transforming wrapper queries to local queries. Based on the EDS, we developed a wrapper query evaluation strategy which handles OO queries on the export databases. The strategy is unique in that it considers the limited query processing capabilities of local database systems as well as the language constraints on the local query languages.  相似文献   

11.
Array partitioning is an important research problem in array management area, since the partitioning strategies have important influence on storage, query evaluation, and other components in array management systems. Meanwhile, compression is highly needed for the array data due to its growing volume. Observing that the array partitioning can affect the compression performance significantly, this paper aims to design the efficient partitioning method for array data to optimize the compression performance. As far as we know, there still lacks research efforts on this problem. In this paper, the problem of array partitioning for optimizing the compression performance (PPCP for short) is firstly proposed. We adopt a popular compression technique which allows to process queries on the compressed data without decompression. Secondly, because the above problem is NP-hard, two essential principles for exploring the partitioning solution are introduced, which can explain the core idea of the partitioning algorithms proposed by us. The first principle shows that the compression performance can be improved if an array can be partitioned into two parts with different sparsities. The second principle introduces a greedy strategy which can well support the selection of the partitioning positions heuristically. Supported by the two principles, two greedy strategy based array partitioning algorithms are designed for the independent case and the dependent case respectively. Observing the expensive cost of the algorithm for the dependent case, a further optimization based on random sampling and dimension grouping is proposed to achieve linear time cost. Finally, the experiments are conducted on both synthetic and real-life data, and the results show that the two proposed partitioning algorithms achieve better performance on both compression and query evaluation.  相似文献   

12.
Decision support queries typically involve several joins, a grouping with aggregation, and/or sorting of the result tuples. We propose two new classes of query evaluation algorithms that can be used to speed up the execution of such queries. The algorithms are based on (1) early sorting and (2) early partitioning– or a combination of both. The idea is to push the sorting and/or the partitioning to the leaves, i.e., the base relations, of the query evaluation plans (QEPs) and thereby avoid sorting or partitioning large intermediate results generated by the joins. Both early sorting and early partitioning are used in combination with hash-based algorithms for evaluating the join(s) and the grouping. To enable early sorting, the sort order generated at an early stage of the QEP is retained through an arbitrary number of so-called order-preserving hash joins. To make early partitioning applicable to a large class of decision support queries, we generalize the so-called hash teams proposed by Graefe et al. [GBC98]. Hash teams allow to perform several hash-based operations (join and grouping) on the same attribute in one pass without repartitioning intermediate results. Our generalization consists of indirectly partitioning the input data. Indirect partitioning means partitioning the input data on an attribute that is not directly needed for the next hash-based operation, and it involves the construction of bitmaps to approximate the partitioning for the attribute that is needed in the next hash-based operation. Our performance experiments show that such QEPs based on early sorting, early partitioning, or both in combination perform significantly better than conventional strategies for many common classes of decision support queries. Received April 4, 2000 / Accepted June 23, 2000  相似文献   

13.
The interest for multimedia database management systems has grown rapidly due to the need for the storage of huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded to the query processor is needed to answer user queries efficiently. Query optimization problem has been widely studied for conventional database systems; however it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied on query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented.  相似文献   

14.
The problem of multidimensional file partitioning (MDFP) arises in large databases that are subject to frequent range queries on one or more attributes. In an MDFP scheme, the search attribute space is partitioned into cells, which are mapped to physical disk locations. This mapping preserves the order of the search attribute values so that range queries can be answered most efficiently, while maintaining good performance for other types of queries. Recently, MDFP schemes have been suggested to include both dynamic and static file organizations. Optimal and heuristic MDFP algorithms are developed for the static case. The results of extensive computational experiments show that the proposed heuristics perform better than known static ones. It is also shown that incorporating a static algorithm into a dynamic MDFP such as a grid file at conversion and/or periodical reorganization points significantly improves the resulting storage utilization of the data file and decreases the size of the directory file  相似文献   

15.
The performance optimization of query processing in spatial networks focuses on minimizing network data accesses and the cost of network distance calculations. This paper proposes algorithms for network k-NN queries, range queries, closest-pair queries and multi-source skyline queries based on a novel processing framework, namely, incremental lower bound constraint. By giving high processing priority to the query associated data points and utilizing the incremental nature of the lower bound, the performance of our algorithms is better optimized in contrast to the corresponding algorithms based on known framework incremental Euclidean restriction and incremental network expansion. More importantly, the proposed algorithms are proven to be instance optimal among classes of algorithms. Through experiments on real road network datasets, the superiority of the proposed algorithms is demonstrated.  相似文献   

16.
Management of large quantities of complex data is essential in many advanced application areas. Object-oriented (OO) database management system have been developed to effectively model and process the complex domain knowledge. They have been shown to outperform some existing relational systems. The existing implementations of OO database management systems attempt to improve the efficiency of OO queries by explicitly capturing the relationships among objects. However, the execution of complex queries involving the retrieval of objects from many classes and relationships among them causes the existing system to operate inefficiently. In this paper, we present parallel algorithms for the processing of queries against a large OO database. The algorithms are based on a closed model of query processing pattern-based access instead of the conventional value-based access. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the explicitly stored object associations. Generation of large quantities of temporary data is avoided by marking objects using their identifiers and by employing a two-phase query processing strategy. A query is processed by concurrent multiple waves, thereby improving parallelism avoiding the complexities introduced in their sequential implementation. The correctness and the performance of the parallel algorithms have been tested and analyzed by running parallel programs on a 32-node transputer based parallel machine designed and developed at the IBM Research Center at Yorktown Heights, New York. Benchmark queries of different semantic complexities are generated, and their performance is analyzed for various data and query parameters  相似文献   

17.
Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest-neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively asking similarity queries for answers of previous similarity queries. We introduce a generic scheme for such data mining algorithms and we investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed-up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to an implementation based on an index or using a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach  相似文献   

18.
The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time.  相似文献   

19.
空间数据库中约束K最接近对查询   总被引:1,自引:0,他引:1  
定义了满足空间约束的K最接近对查询,该查询检索两个数据集在给定约束区域中的K最接近对。在空间数据库中,对采用R树类型索引存储的数据集给出了三个查询处理算法。其中两阶段的RJ和JR算法采用了变换范围查询和最接近对查询执行顺序的策略。单阶段基于堆的SPH算法采用了最好优先的策略,并利用给出的裁减规则、更新规则和访问顺序规则来提高查询处理效率。实验表明SPH具有较好的适用性和性能。  相似文献   

20.
《Information Fusion》2008,9(3):412-424
Data processing applications for sensor streams have to deal with multiple continuous data streams with inputs arriving at highly variable and unpredictable rates from various sources. These applications perform various operations (e.g. filter, aggregate, join, etc.) on incoming data streams in real-time according to predefined queries or rules. Since the data rate and data distribution fluctuate over time, an appropriate join tree for processing join queries must be adaptively maintained in response to dynamic changes to prevent rapid degradation of the system performance. In this paper, we address the problem of finding an optimal join tree that maximizes throughput for sliding window based multi-join queries over continuous data streams and prove its NP-Hardness. We present a dynamic programming algorithm, OptDP, which produces the optimal tree but runs in an exponential time in the number of input streams. We then present a polynomial time greedy algorithm, XGreedyJoin. We tested these algorithms in ARES, an adaptively re-optimizing engine for stream queries, which we developed by extending Jess (Jess is a popular RETE-based, forward chaining rule engine written in java). For almost all instances, trees from XGreedyJoin perform close to the optimal trees from OptDP, and significantly better than common heuristics-based XJoin algorithms.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号