首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 750 毫秒
1.
An approach to vertical partitioning in relational databases in which the attributes of a relation are partitioned according to a set of transactions is proposed. The objective of vertical partitioning is to minimize the number of disk accesses in the system. Since transactions have more semantic meanings than attributes, this approach allows the optimization of the partitioning based on a selected set of important transactions. An optimal binary partitioning (OBP) algorithm based on the branch and bound method is presented, with the worst case complexity of O(2n), where n is the number of transactions. To handle systems with a large number of transactions, an algorithm BPi with complexity varying from O(n) to O(2n) is also developed. The experimental results reveal that the performance of vertical partitioning is sensitive to the skewness of transaction accesses. Further, BPi converges rather rapidly to OBP. Both OBP and BPi yield results comparable with that of global optimum obtained from an exhaustive search  相似文献   

2.
The effect of buffering on the performance of R-trees   总被引:3,自引:0,他引:3  
Past R-tree studies have focused on the number of nodes visited as a metric of query performance. Since database systems usually include a buffering mechanism, we propose that the number of disk accesses is a more realistic measure of performance. We develop a buffer model to analyze the number of disk accesses required for spatial queries using R-trees. The model can be used to evaluate the quality of R-tree update operations, such as various node splitting and tree restructuring policies, as measured by query performance on the resulting tree. We use our model to study the performance of three well-known R-tree loading algorithms. We show that ignoring buffer behavior and using number of nodes accessed as a performance metric can lead to incorrect conclusions, not only quantitatively, but also qualitatively. In addition, we consider the problem of how many levels of the R-tree should be pinned in the buffer  相似文献   

3.
Two graph models are developed to determine the minimum required buffer size for achieving the theoretical lower bound on the number of disk accesses for performing relational joins. Here, the lower bound implies only one disk access per joining block or page. The first graph model is based on the block connectivity of the joining relations. Using this model, the problem of determining an ordered list of joining blocks that requires the smallest buffer is considered. It is shown that this problem as well as the problem of computing the least upper bound on the buffer size is NP-hard. The second graph model represents the page connectivity of the joining relations. It is shown that the problem of computing the least upper bound on the buffer size for the page connectivity model is also NP-hard. Heuristic procedures are presented for the page connectivity model and it is shown that the sequence obtained using the heuristics requires a near-optimal buffer size The authors also show the performance improvement of the proposed heuristics over the hybrid-has join algorithm for a wide range of join factors  相似文献   

4.
The author examines join processing when the access paths available are nonclustered indexes on the joining attribute(s) for both relations involved in the join. He uses a bipartite graph model to represent the pages from the two relations that contain tuples to be joined. The minimization of the number of page accesses needed to compute a join in the author's database environment is explored from two perspectives. The first is to reduce the maximum buffer size so that no page is accessed more than once, and the second is to reduce the number of page accesses for a fixed buffer size. The author has developed heuristics for these problems. He gives performance comparisons of these heuristics and another method that recently appeared in the literature. Results show that one particular heuristic performs very well for addressing the problem from either perspective  相似文献   

5.
In object-oriented databases (OODBs), a method encapsulated in a class typically accesses a few, but not all the instance variables defined in the class. It may thus be preferable to vertically partition the class for reducing irrelevant data (instance variables) accessed by the methods. Our prior work has shown that vertical class partitioning can result in a substantial decrease in the total number of disk accesses incurred for executing a set of applications, but coming up with an optimal vertical class partitioning scheme is a hard problem. In this paper, we present two algorithms for deriving optimal and near-optimal vertical class partitioning schemes. The cost-driven algorithm provides the optimal vertical class partitioning schemes by enumerating, exhaustively, all the schemes and calculating the number of disk accesses required to execute a given set of applications. For this, a cost model for executing a set of methods in an OODB system is developed. Since exhaustive enumeration is costly and only works for classes with a small number of instance variables, a hill-climbing heuristic algorithm (HCHA) is developed, which takes the solution provided by the affinity-based algorithm and improves it, thereby further reducing the total number of disk accesses incurred. We show that the HCHA algorithm provides a reasonable near-optimal vertical class partitioning scheme for executing a given set of applications.Received: 29 March 1999, Accepted: 11 March 2002, Published online: 3 April 2003This research has been supported, in part, by Hong Kong UGC Research Grants Council under grant CityU 733/96E and CityU 1119/99E.  相似文献   

6.
The authors discuss various performance issues in distributed query processing. They validate and evaluate the performance of the local reduction (LR) the fragment and replicate strategy (FRS) and the partition and replicate strategy (PRS) optimization algorithms. The experimental results reveal that the choices made by these algorithms concerning which local operations should be performed, which relation should remain fragmented or which relation should be partitioned are valid. It is shown using experimental results that various parameters, such as the number of processing sites, partitioning speed relative to join speed, and sizes of the join relations, affect the performance of PRS significantly. It is also shown that the response times of query execution are affected significantly by the degree of site autonomy, interferences among processes, interface with the local database management systems (DBMSs) and communications facilities. Pipeline strategies for processing queries in an environment where relations are fragmented are studied  相似文献   

7.
Vertical partitioning is a design technique for reducing the number of disk accesses to execute a given set of queries by minimizing the number of irrelevant instance variables accessed. This is accomplished by grouping the frequently accessed instance variables as vertical class fragments. The complexity of object-oriented database models due to subclass hierarchy and class composition hierarchy complicates the definition and representation of vertical partitioning of the classes, which makes the problem of vertical partitioning in OODBs very challenging. In this paper, we develop a comprehensive analytical cost model for processing of queries on vertically partitioned OODB classes. A set of analytical evaluation results is presented to show the effect of vertical partitioning, and to study the trade-off between the projection ratio versus selectivity factor vis-a-vis sequential versus index access. Furthermore, an empirical experimental prototype supporting vertical class partitioning has been implemented on a commercial OODB tool kit to validate our analytical cost model.  相似文献   

8.
Efficient scheduling of page access in index-based join processing   总被引:1,自引:0,他引:1  
The paper examines the issue of scheduling page accesses in join processing, and proposes new heuristics for the following scheduling problems: 1) an optimal page access sequence for a join such that there are no page reaccesses using the minimum number of buffer pages, and 2) an optimal page access sequence for a join such that the number of page reaccesses for a given number of buffer pages is minimum. The experimental performance results show that the new heuristics perform better than existing heuristics for the first problem and also perform better for the second problem, provided that the number of available buffer pages is not much less than the optimal buffer size  相似文献   

9.
This paper studies workfile disk management for concurrent mergesorts ina multiprocessor database system. Specifically, we examine the impacts of workfile disk allocation and data striping on the average mergesort response time. Concurrent mergesorts in a multiprocessor system can creat severe I/O interference in which a large number of sequential write requests are continuously issued to the same workfile disk and block other read requests for a long period of time. We examine through detailed simulations a logical partitioning approach to workfile disk management and evaluate the effectiveness of datastriping. The results show that (1) without data striping, the best performance is achieved by using the entire workfile disks as a single partition if there are abundant workfile disks (or system workload is light); (2) however, if there are limited workfile disks (or system workload is heavy), the workfile disks should be partitioned into multiple groups and the optimal partition size is workload dependent; (3) data striping is beneficial only if the striping unit size is properly chosen; and (4) with a proper striping size, the best performance is generally achieved by using the entire disks as a single logical partition.  相似文献   

10.
Anna Haé 《Acta Informatica》1993,30(2):131-146
This paper proposes performance and reliability improvement by using new algorithms for asynchronous operations in disk buffer cache memory. These algorithms allow for writing the files into the buffer cache by the processes and consider the number of active processes in the system and the length of the queue to the disk buffer cache. Writing the contents of the buffer cache to the disk depends on the system load and the write activity. Performance and reliability measures including the elapsed time of writing a file into the buffer cache, the waiting time to start writing a file, and the mean number of blocks written to the disk between system failures are used to show performance and reliability improvement by using the algorithms. Sensitivity analysis is used to influence the algorithms' design. Examples of real systems are used to show the numerical results of performance and reliability improvement in different systems with various disk cache parameters and file sizes.  相似文献   

11.
The analytic prediction of buffer hit probability, based on the characterization of database accesses from real reference traces, is extremely useful for workload management and system capacity planning. The knowledge can be helpful for proper allocation of buffer space to various database relations, as well as for the management of buffer space for a mixed transaction and query environment. Access characterization can also be used to predict the buffer invalidation effect in a multi-node environment which, in turn, can influence transaction routing strategies. However, it is a challenge to characterize the database access pattern of a real workload reference trace in a simple manner that can easily be used to compute buffer hit probability. In this article, we use a characterization method that distinguishes three types of access patterns from a trace: (1) locality within a transaction, (2) random accesses by transactions, and (3) sequential accesses by long queries. We then propose a concise way to characterize the access skew across randomly accessed pages by logically grouping the large number of data pages into a small number of partitions such that the frequency of accessing each page within a partition can be treated as equal. Based on this approach, we present a recursive binary partitioning algorithm that can infer the access skew characterization from the buffer hit probabilities for a subset of the buffer sizes. We validate the buffer hit predictions for single and multiple node systems using production database traces. We further show that the proposed approach can predict the buffer hit probability of a composite workload from those of its component files.  相似文献   

12.
This study is concerned with a parallel join operation where the subject relations are partitioned according to an interpolation based grid file (IBGF) scheme. The partitioned relations and directories are distributed over a set of independently accessible external storage units, together with the partitioning control data. The join algorithms executed by a mesh type parallel computing system allow handling of uniform as well as nonuniformly partitioned relations. Each processor locates and retrieves the data partitions it is to join at each step of the join process, in synchronisation with other processors. The approach is found to be feasible as the speedup and efficiency results found by simulation are consistent with theoretical bounds. The algorithms are tuned to join-key distributions, so that effective load balancing is achieved during the actual join. © 1997 John Wiley & Sons, Ltd.  相似文献   

13.
Existing methods for spatial joins require pre-existing spatial indices or other precomputation, but such approaches are inefficient and limited in generality. Operand data sets of spatial joins may not all have precomputed indices, particularly when they are dynamically generated by other selection or join operations. Also, existing spatial indices are mostly designed for spatial selections, and are not always efficient for joins. This paper explores the design and implementation of seeded trees, which are effective for spatial joins and efficient to construct at join time. Seeded trees are R-tree-like structures, but divided into seed levels and grown levels. This structure facilitates using information regarding the join to accelerate the join process, and allows efficient buffer management. In addition to the basic structure and behavior of seeded trees we present techniques for efficient seeded tree construction, a new buffer management strategy to lower I/O costs, and theoretical analysis for choosing algorithmic parameters. We also present methods for reducing space requirements and improving the stability of seeded tree performance with no additional I/O costs. Our performance studies show that the seeded tree method outperforms other tree-based methods by far both in terms of the number disk pages accessed and weighted I/O costs. Further, its performance gain is stable across different input data, and its incurred CPU penalties are also lower  相似文献   

14.
In this paper, we propose data space mapping techniques for storage and retrieval in multi-dimensional databases on multi-disk architectures. We identify the important factors for an efficient multi-disk searching of multi-dimensional data and develop secondary storage organization and retrieval techniques that directly address these factors. We especially focus on high dimensional data, where none of the current approaches are effective. In contrast to the current declustering techniques, storage techniques in this paper consider both inter- and intra-disk organization of the data. The data space is first partitioned into buckets, then the buckets are declustered to multiple disks while they are clustered in each disk. The queries are executed through bucket identification techniques that locate the pages. One of the partitioning techniques we discuss is especially practical for high dimensional data, and our disk and page allocation techniques are optimal with respect to number of I/O accesses and seek times. We provide experimental results that support our claims on two real high dimensional datasets.  相似文献   

15.
A spatial join is a query that searches for a set of object pairs satisfying a given spatial relationship from a database. It is one of the most costly queries, and thus requires an efficient processing algorithm that fully exploits the features of the underlying spatial indexes. In our earlier work, we devised a fairly effective algorithm for processing spatial joins with double transformation (DOT) indexing, which is one of several spatial indexing schemes. However, the algorithm is restricted to only the one-dimensional cases. In this paper, we extend the algorithm for the two-dimensional cases, which are general in Geographic Information Systems (GIS) applications. We first extend DOT to two-dimensional original space. Next, we propose an efficient algorithm for processing range queries using extended DOT. This algorithm employs the quarter division technique and the tri-quarter division technique devised by analyzing the regularity of the space-filling curve used in DOT. This greatly reduces the number of space transformation operations. We then propose a novel spatial join algorithm based on this range query processing algorithm. In processing a spatial join, we determine the access order of disk pages so that we can minimize the number of disk accesses. We show the superiority of the proposed method by extensive experiments using data sets of various distributions and sizes. The experimental results reveal that the proposed method improves the performance of spatial join processing up to three times in comparison with the widely-used R-tree-based spatial join method.  相似文献   

16.
The current major theme in contrast enhancement is to partition the input histogram into multiple sub-histograms before final equalization of each sub-histogram is performed. This paper presents a novel contrast enhancement method based on Gaussian mixture modeling of image histograms, which provides a sound theoretical underpinning of the partitioning process. Our method comprises five major steps. First, the number of Gaussian functions to be used in the model is determined using a cost function of input histogram partitioning. Then the parameters of a Gaussian mixture model are estimated to find the best fit to the input histogram under a threshold. A binary search strategy is then applied to find the intersection points between the Gaussian functions. The intersection points thus found are used to partition the input histogram into a new set of sub-histograms, on which the classical histogram equalization (HE) is performed. Finally, a brightness preservation operation is performed to adjust the histogram produced in the previous step into a final one. Based on three representative test images, the experimental results demonstrate the contrast enhancement advantage of the proposed method when compared to twelve state-of-the-art methods in the literature.  相似文献   

17.
The join relational operation is one of the most expensive among database operations. In this study, we consider the problem of scheduling page accesses in join processing. This raises two interesting problems: (1) determining a page access sequence that uses the minimum number of buffer pages without any page reaccesses, and (2) determining a page access sequence that minimizes the number of page reaccesses for a given buffer size. We use a graph model to represent the pages from the relations that contain tuples to be joined, and present new heuristics for the two problems. Our experimental results show that the new heuristic performs well.  相似文献   

18.
Buffer queries   总被引:2,自引:0,他引:2  
A class of commonly asked queries in a spatial database is known as buffer queries. An example of such a query is to "find house-power line pairs that are within 50 meters of each other." A buffer query involves two spatial data sets and a distance d. The answer to this query are pairs of objects, one from each input set, that are within distance d of each other. Given nonpoint spatial objects, evaluation of buffer queries could be a costly operation, even when the numbers of objects in the input data sets are relatively small. This paper addresses the problem of how to evaluate this class of queries efficiently. A fundamental problem with buffer query evaluation is to find an efficient algorithm for solving the minimum distance (miniDist) problem for lines and regions. An efficient minDist algorithm, which only requires a subsequence of segments from each object to be examined, is derived. Finding a fast minDist algorithm is the first step in evaluating a buffer query efficiently. It is observed that many, and sometimes even most, candidates can be proven in the answer without resorting to the relatively expensive minDist operation. A candidate is first evaluated with a least expensive technique-called O-object filtering. If it fails, a more costly operation, called 1-object filtering, is applied. Finally, if both filterings fail, the most expensive minDist algorithm is invoked. To show the effectiveness of the these techniques, they are incorporated into the well-known tree join algorithm and tested with real-life as well as artificial data sets. Extensive experiments show that the proposed algorithm outperforms existing techniques by a wide margin in both execution time as well as IO accesses. More importantly, the performance gain improves drastically with the increase of distance values.  相似文献   

19.
陈刚  顾进广  李思川 《计算机科学》2010,37(12):143-144
数据流上的关系查询处理技术是数据库研究领域的一大热点。优化无阻塞连接算法的关键在于提高内存连接阶段的效率。当内存空间满时,需要将内存数据刷新到外存相应分区,良好的刷新策略对于改进算法的性能至关重要。利用数据分布的特征,对关系连接的输出流,使用基于统计的方法,查找使用频率最低的元组,将使用频率较低的元组刷新到外存,以提高内存数据的效率。基于统计分析策略提高了刷新策略的准确性和效率及算法的适用范围。  相似文献   

20.
Computing global illumination for complex environments in moderate time and walking through them is one of the challenges in computer graphics. To meet this goal, preprocessing is necessary. This preprocessing consists in partitioning the environment into cells and determining visibility between these cells. Most of the existing partitioning methods rely on the binary space partitioning (BSP) technique which can be easily applied to axial environments. However, for non-axial scenes, BSP has a high complexity of O (n3) in time to construct a tree of size at worst O(n2), n being the total number of input polygons. Moreover, this technique entails a large number of cells that do not necessarily fit with the topology of the environment. We propose in this paper a partitioning method which can be applied to non-axial buildings with several floors. It consists of two steps. In the first step each floor is extracted by applying a BSP technique using the most occlusive horizontal polygons for splitting. In the second step each floor is in turn partitioned with a model-based method operating in a dual 2D space. The result is a low number of cells fitting at best with the environment topology. © 1998 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号