共查询到20条相似文献,搜索用时 31 毫秒
1.
Disk input/output (I/O) efficient query execution is an important topic with respect to DBMS performance. In this context, we elaborate on the construction of disk access plans for sort order queries in balanced and nested grid files. The key idea is to use the order information contained in the directory of the multiattribute search structure. The presented algorithms are shown to yield a significant decrease in the number of disk I/O operations by appropriate use of the order information. Two algorithms for the construction of appropriate disk access plans are proposed, namely a greedy approach and a heuristic divide-and-conquer approach. Both approaches yield considerable I/O savings compared to straightforward query processing without consideration of any directory order information. The former performs well for small buffer page allocations, i.e., for a small number of buffer pages relative to the number of data buckets processed in the query. The latter is superior to the greedy algorithm with respect to the total number of I/O operations and with respect to the overall maximum of buffer pages needed to achieve the minimal number of disk I/O operations. Both approaches rely on a binary trie as a temporary data structure. This trie is used as an explicit representation of the order information. The storage consumption of the temporary data structure is shown to be negligible in realistic cases, Even for pathological cases with respect to degenerated balanced and nested grid files, reasonable upper bounds can be given 相似文献
2.
3.
Ishfaq Ahmad Kamalakar Karlapalem Yu-Kwong Kwok Siu-Kai So 《Distributed and Parallel Databases》2002,11(1):5-32
A major cost in executing queries in a distributed database system is the data transfer cost incurred in transferring relations (fragments) accessed by a query from different sites to the site where the query is initiated. The objective of a data allocation algorithm is to determine an assignment of fragments at different sites so as to minimize the total data transfer cost incurred in executing a set of queries. This is equivalent to minimizing the average query execution time, which is of primary importance in a wide class of distributed conventional as well as multimedia database systems. The data allocation problem, however, is NP-complete, and thus requires fast heuristics to generate efficient solutions. Furthermore, the optimal allocation of database objects highly depends on the query execution strategy employed by a distributed database system, and the given query execution strategy usually assumes an allocation of the fragments. We develop a site-independent fragment dependency graph representation to model the dependencies among the fragments accessed by a query, and use it to formulate and tackle data allocation problems for distributed database systems based on query-site and move-small query execution strategies. We have designed and evaluated evolutionary algorithms for data allocation for distributed database systems. 相似文献
4.
Chen C.Y. Lin H.F. Chang C.C. Lee R.C.T. 《Knowledge and Data Engineering, IEEE Transactions on》1997,9(1):148-160
The paper first shows that the bucket allocation problem of an MKH (multiple key hashing) file for partial match retrieval can be reduced to that of a smaller sized subfile, called the remainder of the file. And it is pointed out that the remainder type MKH file is the hardest MKH file for which to design an optimal allocation scheme. The authors then particularly concentrate on the allocation of an important remainder type MKH file; namely, the k-ary MKH file. They present various sufficient conditions on the number of available disks and the number of attributes for a k-ary MKH file to have a perfectly optimal allocation among the disks for partial match queries. Based upon these perfectly optimal allocations, they further present a heuristic method, called the CH (cyclic hashing) method, to produce near optimal allocations for the general k-ary MKH files. Finally, a comparison, by experiment, between the performances of the proposed method and an “ideal” perfectly optimal method, shows that the CH method is indeed satisfactorily good for the general k-ary MKH files 相似文献
5.
Characterization of database access pattern for analytic prediction of buffer hit probability 总被引:4,自引:0,他引:4
Asit Dan Ph.D. Philip S. Yu Ph.D. Jen-Yao Chung Ph.D. 《The VLDB Journal The International Journal on Very Large Data Bases》1995,4(1):127-154
The analytic prediction of buffer hit probability, based on the characterization of database accesses from real reference traces, is extremely useful for workload management and system capacity planning. The knowledge can be helpful for proper allocation of buffer space to various database relations, as well as for the management of buffer space for a mixed transaction and query environment. Access characterization can also be used to predict the buffer invalidation effect in a multi-node environment which, in turn, can influence transaction routing strategies. However, it is a challenge to characterize the database access pattern of a real workload reference trace in a simple manner that can easily be used to compute buffer hit probability. In this article, we use a characterization method that distinguishes three types of access patterns from a trace: (1) locality within a transaction, (2) random accesses by transactions, and (3) sequential accesses by long queries. We then propose a concise way to characterize the access skew across randomly accessed pages by logically grouping the large number of data pages into a small number of partitions such that the frequency of accessing each page within a partition can be treated as equal. Based on this approach, we present a recursive binary partitioning algorithm that can infer the access skew characterization from the buffer hit probabilities for a subset of the buffer sizes. We validate the buffer hit predictions for single and multiple node systems using production database traces. We further show that the proposed approach can predict the buffer hit probability of a composite workload from those of its component files. 相似文献
6.
空间数据库中基于层次化索引结构的全局最近邻(all-nearest-neighbor,All-NN)计算采用单节点展开策略的嵌套循环技术来降低计算开销.在同一数据集合的全局最近邻计算中,基于索引结构带来的对象空间位置临近性特点,抛弃传统理论距离裁剪规则和嵌套循环技术来减少计算和索引节点访问开销.提出了采用局部计算和完备计算两阶段的计算模型来获得全局最近邻结果.首先以叶节点为单位,采用扫描线算法获得节点内部所有对象的局部最近邻结果,然后根据计算结果得到启发式裁剪距离.在第2阶段采用层次化过滤的范围查询算法来获取外部的(可能的)最近邻对象.实验与分析表明该方法可以很好地支持不同种类、大小、分布的数据集合All-NN查询处理,具有良好的实用价值. 相似文献
7.
学习型索引通过学习数据分布可以准确地预测数据存取的位置,在保持高效稳定的查询下,显著降低索引的内存占用.现有的学习型索引主要针对只读查询进行优化,而对插入和更新支持不足.针对上述挑战,设计了一种基于Radix Tree的工作负载自适应学习型索引ALERT. ALERT使用Radix Tree来管理不定长的分段,段内采用具有最大误差界的线性插值模型进行预测.同时,ALERT使用一种高效的插入缓冲来降低数据插入更新的代价.针对点查询和范围查询提出两种自适应重组优化方法,通过对工作负载进行感知,动态地调整插入缓冲的组织结构.经实验验证,ALERT与业界流行的学习型索引相比,构建时间平均降低了81%,内存占用平均降低了75%,在保持了优秀读性能的同时,使插入延迟平均降低了50%;此外, ALERT使用自适应重组优化能有效感知查询工作负载特征,与不使用自适应重组优化相比,查询延迟平均降低了15%. 相似文献
8.
查询结果缓存可以对查询结果的文档标识符集合或者实际的返回页面进行缓存,以提高用户查询的响应速度,相应的缓存形式可以分别称之为标识符缓存或页面缓存。对于固定大小的内存,标识符缓存可以获得更高的命中率,而页面缓存可以达到更高的响应速度。该文根据用户查询访问的时间局部性和空间局部性,提出了一种新颖的基于时空局部性的层次化结果缓存机制。首先,该机制将固定大小的结果缓存划分为两层:页面缓存和标识符缓存。对于用户提交的查询,该机制会首先使用第一层的页面缓存进行应答,如果未能命中,则继续尝试使用第二层的标识符缓存。实验显示这种层次化的缓存机制较传统的仅依赖于单一缓存形式的机制,在平均查询响应时间上,取得了可观的性能提升:例如,相对单纯的页面缓存,平均达到9%,最好情况下达到11%。其次,该机制在标识符缓存的基础上,设计了一种启发式的预取策略,对用户查询检索的空间局部性进行挖掘。实验显示,这种预取策略的融合,能进一步促进检索系统性能的有效提升,从而最终建立起一套时空完备的、有效的结果缓存机制。 相似文献
9.
In this research, we address the query clustering problem which involves determining globally optimal execution strategies for a set of queries. The need to process a set of queries together often arises in deductive database systems, scientific database systems, large bibliographic retrieval systems and several other database applications. We address the optimization problem from the perspective of overlaps in data requirements, and model the batched operations using a set-partitioning approach. In this model, we first consider the case of m queries each involving a two-way join operation. We develop a recursive methodology to determine all the processing strategies in this case. Next, we establish certain dominance properties among the strategies, and develop exact as well as heuristic algorithms for selecting an appropriate strategy. We extend this analysis to a clustering approach, and outline a framework for optimizing multiway joins. The results show that the proposed approach is viable and efficient, and can easily be incorporated into the query processing component of most database systems 相似文献
10.
Compared with the conventional dynamic random access memory (DRAM), emerging non-volatile memory technologies provide better density and energy efficiency. However, current NVM devices typically suffer from high write power, long write latency and low write endurance. In this paper, we study the task allocation problem for the hybrid main memory architecture with both DRAM and PRAM, in order to leverage system performance and the energy consumption of the memory subsystem via assigning different memory devices for each individual task. For an embedded system with a static set of periodical tasks, we design an integer linear programming (ILP) based offline adaptive space allocation (offline-ASA) algorithm to obtain the optimal task allocation. Furthermore, we propose an online adaptive space allocation (online-ASA) algorithm for dynamic task set where arrivals of tasks are not known in advance. Experimental results show that our proposed schemes achieve 27.01% energy saving on average, with additional performance cost of 13.6%. 相似文献
11.
Jianhui Yue Yifeng Zhu Zhao Cai 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(11):1565-1578
Power consumption is an important issue for cluster supercomputers as it directly affects running cost and cooling requirements. This paper investigates the memory energy efficiency of high-end data servers used for supercomputers. Emerging memory technologies allow memory devices to dynamically adjust their power states and enable free rides by overlapping multiple DMA transfers from different I/O buses to the same memory device. To achieve maximum energy saving, the memory management on data servers needs to judiciously utilize these energy-aware devices. As we explore different management schemes under five real-world parallel I/O workloads, we find that the memory energy behavior is determined by a complex interaction among four important factors: (1) cache hit rates that may directly translate performance gain into energy saving, (2) cache populating schemes that perform buffer allocation and affect access locality at the chip level, (3) request clustering that aims to temporally align memory transfers from different buses into the same memory chips, and (4) access patterns in workloads that affect the first three factors. 相似文献
12.
The GMAP: a versatile tool for physical data independence 总被引:1,自引:0,他引:1
Odysseas G. Tsatalos Marvin H. Solomon Yannis E. Ioannidis 《The VLDB Journal The International Journal on Very Large Data Bases》1996,5(2):101-118
Physical data independence is touted as a central feature of modern
database systems. It allows users to frame queries in terms of the logical
structure of the data, letting a query processor automatically translate
them into optimal plans that access physical storage structures. Both
relational and object-oriented systems, however, force users to frame their
queries in terms of a logical schema that is directly tied to physical
structures. We present an approach that eliminates this dependence. All
storage structures are defined in a declarative language based on
relational algebra as functions of a logical schema. We present an
algorithm, integrated with a conventional query optimizer, that translates
queries over this logical schema into plans that access the storage
structures. We also show how to compile update requests into plans that
update all relevant storage structures consistently and optimally.
Finally, we report on experiments with a prototype implementation of our
approach that demonstrate how it allows storage structures to be tuned to
the expected or observed workload to achieve significantly better
performance than is possible with conventional techniques.
Edited by
Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received
September 15, 1994 / Accepted September 1, 1995 相似文献
13.
The collective processing of multiple queries in a database system has recently received renewed attention due to its capability of improving the overall performance of a database system and its applicability to the design of knowledge-based expert systems and extensible database systems. A new multiple query processing strategy is presented which utilizes semantic knowledge on data integrity and information on predicate conditions of the access paths (plans) of queries. The processing of multiple queries is accomplished by the utilization of subset relationships between intermediate results of query executions, which are inferred employing both semantic and logical information. Given a set of fixed order access plans, the A* algorithm is used to find the set of reformulated access plans which is optimal for a given collection of semantic knowledge. 相似文献
14.
间隔查询作为重要的查询类型,广泛应用在社交网络、信息检索和数据库领域.为了支持高效的间隔查询,涌现出多种优化技术.尽管已有方法能够快速响应单个间隔查询,然而当查询负载超过服务器的处理能力时,70%的查询均不能在期望时间内得到响应.针对这一问题,提出采用共享执行策略优化间隔查询的方法SESIQ(shared execution strategy for interval queries).SESIQ对间隔查询进行批处理,分析一组间隔查询间可共享的操作,减少重复数据的访问,从而降低磁盘I/O和网络传输代价,提高检索性能.理论分析并实验验证了SESIQ的可行性,基于两种真实数据集的大量实验结果表明,SESIQ是有效的,间隔查询的检索性能可提升数十倍. 相似文献
15.
移动查询缓存处理的研究 总被引:5,自引:0,他引:5
客户缓存为提高客户/服务器数据库系统整体性能以及客户方数据可用性提供了有效途径。移动环境下网络资源的贫乏使客户缓存的作用更为重要,语义缓存是基于客户查询语义相关建立的一类缓存,提出一个基于语义缓存的客户缓存机制,给出缓存的内容组织,提出缓存项合并策略;然后讨论了基于语义缓存的查询处理策略;最后,模拟结果表明该客户缓存机制能够提高分布式、特别是移动环境下客户服务器数据库系统的性能。 相似文献
16.
Chih-Chieh HungAuthor VitaeWen-Chih PengAuthor Vitae 《Data & Knowledge Engineering》2011,70(7):617-641
This study proposes a method of in-network aggregate query processing to reduce the number of messages incurred in a wireless sensor network. When aggregate queries are issued to the resource-constrained wireless sensor network, it is important to efficiently perform these queries. Given a set of multiple aggregate queries, the proposed approach shares intermediate results among queries to reduce the number of messages. When the sink receives multiple queries, it should be propagated these queries to a wireless sensor network via existing routing protocols. The sink could obtain the corresponding topology of queries and views each query as a query tree. With a set of query trees collected at the sink, it is necessary to determine a set of backbones that share intermediate results with other query trees (called non-backbones). First, it is necessary to formulate the objective cost function for backbones and non-backbones. Using this objective cost function, it is possible to derive a reduction graph that reveals possible cases of sharing intermediate results among query trees. Using the reduction graph, this study first proposes a heuristic algorithm BM (standing for Backbone Mapping). This study also develops algorithm OOB (standing for Obtaining Optimal Backbones) that exploits a branch-and-bound strategy to obtain the optimal solution efficiently. This study tests the performance of these algorithms on both synthesis and real datasets. Experimental results show that by sharing the intermediate results, the BM and OOB algorithms significantly reduce the total number of messages incurred by multiple aggregate queries, thereby extending the lifetime of sensor networks. 相似文献
17.
Mohamed F. Mokbel Walid G. Aref 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(5):971-995
This paper presents the scalable on-line execution (SOLE) algorithm for continuous and on-line evaluation of concurrent continuous spatio-temporal queries over data streams.
Incoming spatio-temporal data streams are processed in-memory against a set of outstanding continuous queries. The SOLE algorithm
utilizes the scarce memory resource efficiently by keeping track of only the significant objects. In-memory stored objects are expired (i.e., dropped) from memory once they become insignificant. SOLE is a scalable algorithm where all the continuous outstanding queries share the same buffer pool. In addition, SOLE
is presented as a spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal
queries. To cope with intervals of high arrival rates of objects and/or queries, SOLE utilizes a load-shedding approach where some of the stored objects are dropped from memory. SOLE is implemented as a pipelined query operator that
can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries.
Performance experiments based on a real implementation of SOLE inside a prototype of a data stream management system show
the scalability and efficiency of SOLE in highly dynamic environments.
This work was supported in part by the National Science Foundation under Grants IIS-0093116, IIS-0209120, and 0010044-CCR. 相似文献
18.
An efficient multiversion access structure 总被引:1,自引:0,他引:1
An efficient multiversion access structure for a transaction-time database is presented. Our method requires optimal storage and query times for several important queries and logarithmic update times. Three version operations-inserts, updates, and deletes-are allowed on the current database, while queries are allowed on any version, present or past. The following query operations are performed in optimal query time: key range search, key history search, and time range view. The key-range query retrieves all records having keys in a specified key range at a specified time; the key history query retrieves all records with a given key in a specified time range; and the time range view query retrieves all records that were current during a specified time interval. Special cases of these queries include the key search query, which retrieves a particular version of a record, and the snapshot query which reconstructs the database at some past time. To the best of our knowledge no previous multiversion access structure simultaneously supports all these query and version operations within these time and space bounds. The bounds on query operations are worst case per operation, while those for storage space and version operations are (worst-case) amortized over a sequence of version operations. Simulation results show that good storage utilization and query performance is obtained 相似文献
19.
Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relations, compute the same tasks (join, sort, etc.), and/or ship the same data among virtual computers. The total time spent for the queries can be reduced by executing these common tasks only once. In this study, we build and use efficient sets of query execution plans to reduce the total execution time. This is an NP-Hard problem therefore, a set of robust heuristic algorithms, Branch-and-Bound, Genetic, Hill Climbing, and Hybrid Genetic-Hill Climbing, are proposed to find (near-) optimal query execution plans and maximize the benefits. The optimization time of each algorithm for identifying the query execution plans and the quality of these plans are analyzed by extensive experiments. 相似文献