1.

Active semantic caching to optimize multidimensional data analysis in parallel and distributed environments

《Parallel Computing》2007,33(7-8):497-520

In this paper, we present a multi-query optimization framework based on the concept of active semantic caching. The framework permits the identification and transparent reuse of data and computation in the presence of multiple queries (or query batches) that specify user-defined operators and aggregations originating from scientific data-analysis applications. We show how query scheduling techniques, coupled with intelligent cache replacement policies, can further improve the performance of query processing by leveraging the active semantic caching operators. We also propose a methodology for functionally decomposing complex queries in terms of primitives so that multiple reuse sites are exposed to the query optimizer, to increase the amount of reuse. The optimization framework and the database system implemented with it are designed to be efficient irrespective of the underlying parallel and/or distributed machine configuration. We present experimental results highlighting the performance improvements obtained by our methods using real scientific data-analysis applications on multiple parallel and distributed processing configurations (e.g., single symmetric multiprocessor (SMP) machine, cluster of SMP nodes, and a Grid computing configuration).

2.

Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index

Xiaoyong Li Yijie Wang Xiaoling Li Yuan Wang 《Knowledge and Information Systems》2014,41(2):277-309

Skyline query processing over uncertain data streams has recently attracted considerable attention in the database community, owing to its importance in helping users make intelligent decisions over complex data in many real applications. Although many recent efforts have been devoted to skyline computation over data streams in a centralized environment, typically with one processor, they cannot be well adapted to skyline queries over complex uncertain streaming data, because of the computational complexity of the query and the limited processing capability. Furthermore, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, as they are all developed to address the problem of parallel skyline queries over static certain data sets. In this paper, we formally define the parallel query problem over uncertain data streams with the sliding window streaming model. In particular, for the first time, we propose an effective framework, named distributed parallel framework, to address the problem based on sliding window partitioning. Furthermore, we propose an efficient approach (parallel streaming skyline) to further optimize the parallel skyline computation with an optimized streaming item mapping strategy and the grid index. Extensive experiments with real deployment over synthetic and real data are conducted to demonstrate the effectiveness and efficiency of the proposed techniques.

3.

Incremental join view maintenance on distributed log-structured storage

Huichao DUAN Huiqi HU Weining QIAN Aoying ZHOU 《Frontiers of Computer Science》2021,15(4):154607

Modern database systems are desperate for the ability to support highly scalable transactions and efficient queries simultaneously for real-time applications. One solution is to utilize query optimization techniques on on-line transaction processing (OLTP) systems. The materialized view is considered a panacea for decreasing query latency. However, it also involves a significant maintenance cost, which trades away transaction performance. In this paper, we examine the design space and identify several design features for the implementation of a view on a distributed log-structured merge-tree (LSM-tree), which is a well-known structure for improving data write performance. As a result, we develop two incremental view maintenance (IVM) approaches on the LSM-tree. One avoids join computation in view maintenance transactions. The other, with two optimizations, is proposed to decouple view maintenance from the transaction process. Under asynchronous update, we also provide consistency queries for views. Experiments on the TPC-H benchmark show that our methods achieve better performance than straightforward methods on different workloads.
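
As a rough illustration of the incremental idea behind join view maintenance (and not the authors' LSM-tree design), the Python sketch below keeps a materialized join of two in-memory tables up to date by joining each newly inserted row against the other side only. The `JoinView` class, its method names, and the `(key, payload)` row layout are all hypothetical.

```python
from collections import defaultdict


class JoinView:
    """Materialized view of R JOIN S on a shared key, maintained incrementally.

    Hypothetical illustration: rows are (key, payload) pairs held in memory.
    """

    def __init__(self):
        self.r_index = defaultdict(list)  # key -> payloads from R
        self.s_index = defaultdict(list)  # key -> payloads from S
        self.view = []                    # materialized (key, r_payload, s_payload)

    def insert_r(self, key, payload):
        # Delta rule: new view rows = {new R row} JOIN S.
        self.r_index[key].append(payload)
        self.view.extend((key, payload, s) for s in self.s_index[key])

    def insert_s(self, key, payload):
        # Symmetric delta rule: new view rows = R JOIN {new S row}.
        self.s_index[key].append(payload)
        self.view.extend((key, r, payload) for r in self.r_index[key])


def full_join(r_rows, s_rows):
    """Recompute the join from scratch, for comparison with the view."""
    return [(rk, rp, sp) for rk, rp in r_rows for sk, sp in s_rows if rk == sk]
```

After any sequence of inserts, the contents of `view` should match what `full_join` returns for the same rows, up to ordering; the incremental path simply avoids rescanning both tables on every update.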

4.

Answering queries using materialized views with minimum size

Rada Chirkova Chen Li Jia Li 《The VLDB Journal The International Journal on Very Large Data Bases》2006,15(3):191-210

In this paper, we study the following problem. Given a database and a set of queries, we want to find a set of views that can compute the answers to the queries, such that the amount of space, in bytes, required to store the viewset is minimum on the given database. (We also handle problem instances where the input has a set of database instances, as described by an oracle that returns the sizes of view relations for given view definitions.) This problem is important for applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. Further, for queries without self-joins we describe a very compact search space of views, which contains all views in at least one optimal viewset. We present techniques for finding a minimum-size viewset for a single query without self-joins by using the shape of the query and its constraints, and validate the approach by extensive experiments. Part of this article was published elsewhere [Chirkova, R., Li, C.: Materializing views with minimal size to answer queries. PODS (2003)]. In addition to the prior materials, this article contains new theoretical results, as well as new results on how to efficiently implement the proposed techniques (Sects. 5 and 5.4)

5.

Efficient Distributed Skyline Queries for Mobile Applications (total citations: 3; self-citations: 0; citations by others: 3)

Ying-Yuan Xiao 《计算机科学技术学报》2010,25(3):523-536

In this paper, we consider skyline queries in a mobile and distributed environment, where data objects are distributed across several sites (database servers) that are interconnected through a high-speed wired network, and queries are issued by mobile units (laptop, cell phone, etc.) that access the data objects of the database servers over wireless channels. The inherent properties of the mobile computing environment, such as mobility, limited wireless bandwidth, and frequent disconnection, make skyline queries more complicated. We show how to efficiently perform distributed skyline queries in a mobile environment and propose a skyline query processing approach, called efficient distributed skyline based on mobile computing (EDS-MC). In EDS-MC, a distributed skyline query is decomposed into five processing phases, and each phase is carefully designed to reduce the network communication, network delay and query response time. We conduct extensive experiments in a simulated mobile database system, and the experimental results demonstrate the superiority of EDS-MC over other skyline query processing techniques on mobile computing.
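
For readers unfamiliar with skyline queries, the Python sketch below shows the basic dominance test and a simple two-step distributed evaluation: each site computes its local skyline, and the final answer is the skyline of the union of those local results. This is a generic illustration, not the five-phase EDS-MC algorithm; the function names are hypothetical, and smaller values are assumed to be better in every dimension.

```python
def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better
    in at least one (smaller values are assumed to be better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def distributed_skyline(sites):
    """Compute a local skyline per site, then the skyline of their union."""
    candidates = [p for site in sites for p in skyline(site)]
    return skyline(candidates)
```

Only local skyline points need to travel over the network, because a point dominated at its own site cannot appear in the global skyline. For the two sites `[(1, 9), (3, 3), (4, 4)]` and `[(2, 8), (9, 1), (5, 5)]`, the point `(4, 4)` is pruned locally and `(5, 5)` is pruned at the merge step.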

6.

Efficient Approximate Query Processing in Peer-to-Peer Networks

Arai B. Das G. Gunopulos D. Kalogeraki V. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(7):919-933

Peer-to-peer (P2P) databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale ad hoc analysis queries, for example, aggregation queries, on these databases poses unique challenges. Exact solutions can be time consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated due to several factors: the data is distributed (usually in uneven quantities) across many peers, within each peer, the data is often highly correlated, and, moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph, as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.

7.

Performance Evaluation of Range Queries in Key Value Stores

Pouria Pirzadeh Junichi Tatemura Oliver Po Hakan Hacıgümüş 《Journal of Grid Computing》2012,10(1):109-132

Recently there has been a considerable increase in the number of different Key-Value stores for supporting data storage and applications in the cloud environment. While all these solutions try to offer highly available and scalable services in the cloud, they differ significantly from each other in architecture and in the types of applications they try to support. In this paper we consider three such widely used systems, Cassandra, HBase and Voldemort, and compare them in terms of their support for different types of query workloads. We focus mainly on range queries. Unlike HBase and Cassandra, which have built-in support for range queries, Voldemort does not support this type of query via its available API. To this end, practical techniques are presented on top of Voldemort to support range queries. Our performance evaluation is based on mixed query workloads, in the sense that they contain a combination of short and long range queries, besides other types of typical queries on key-value stores such as lookup and update. We show that there are trade-offs in the performance of the selected system and scheme, and the types of the query workloads that can be processed efficiently.
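
One generic way to layer range queries over a store that offers only point lookups is to keep a sorted index of keys next to it. The Python sketch below illustrates that idea under simplifying assumptions (a single client, everything in memory). It is not the technique evaluated in the paper and does not use the Voldemort API; `RangeIndexedStore` and its methods are hypothetical names.

```python
import bisect


class RangeIndexedStore:
    """Hash-style key-value store plus a client-side sorted key index.

    Hypothetical sketch: `put`, `get`, and `range` are illustrative names,
    not the API of any real store.
    """

    def __init__(self):
        self._data = {}   # stands in for a store that only supports get/put
        self._keys = []   # sorted list of keys, maintained on insert

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep the key index sorted
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, low, high):
        """Return (key, value) pairs with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return [(k, self._data[k]) for k in self._keys[start:end]]
```

The trade-off is visible even in this toy version: every insert of a new key pays for index maintenance, while a range query becomes two binary searches followed by one point lookup per matching key.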

8.

A Parallel Computing Framework Based on Distributed Objects

李国东张德富《软件学报》2002,13(3):342-353

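When building parallel software for workstation clusters, computational characteristics and compositional characteristics are both very important. However, for lack of an effective supporting environment, today's distributed parallel computing software systems are inefficient, most noticeably with respect to computational characteristics. This paper proposes a parallel computing framework based on distributed objects that aims to support efficient development of parallel computations, to provide mechanisms for encapsulating and reusing parallel programs, and to ensure dynamic load balancing and fault tolerance. The framework is a four-layer model that includes an object-group layer and a mobile-object layer. Experimental results demonstrate the effectiveness of the approach.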

9.

A Novel Compilation Framework Based on SLP

张素平王冬丁丽丽王鹏翔宫一于海宁《计算机应用研究》2017,34(1)

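To address the inability of the SLP algorithm to efficiently handle large applications in which parallel code makes up only a small share, this paper proposes and evaluates a novel compilation framework based on an improved SLP (superword level parallelism) algorithm. It consists of three main stages. First, structurally similar non-isomorphic statements in the code are converted into isomorphic statements wherever possible by the improved SLP algorithm. Next, from a global perspective, data pattern reuse is identified before the target code is optimized. Finally, data layout optimization is applied jointly for a further performance gain. Extensive experiments on the framework show that it outperforms the SLP algorithm by about 15.3%.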

10.

Parallel Data Reuse Theory for OpenMP and OpenTM Applications

吴俊杰杨学军刘光辉唐玉华《软件学报》2010,21(12):3011-3028

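Classical data reuse theory is extended to the parallel domain, and parallel data reuse theories are proposed for OpenMP and OpenTM applications respectively. Based on how reuse relates to threads and transactions, the classification, detection, and solution methods for reuse in parallel applications are discussed systematically. The theory is also applied to study optimization techniques for OpenTM loops, so as to reduce the risk of transaction rollback. Finally, the parallel data reuse theory is used to analyze and tally the data reuse in SPEComp2001. The theory can guide research on parallel program analysis and compiler optimization for multi-core shared-memory architectures.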

11.

Efficient parallel processing of range queries through replicated declustering

Hakan Ferhatosmanoglu Ali Şaman Tosun Guadalupe Canahuate Aravind Ramachandran 《Distributed and Parallel Databases》2006,20(2):117-147

A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets. Recommended by: Ahmed Elmagarmid. Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.
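
To make the notion of an optimal declustering concrete, the Python sketch below assigns the buckets of a two-dimensional grid to disks with the classic disk-modulo rule, `(i + j) mod M`, and compares the parallel cost of a range query against the ideal of `ceil(buckets / M)`. Disk modulo is used here only as a familiar single-copy scheme; it is not the replicated periodic allocation proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter
from math import ceil


def disk_modulo(i, j, num_disks):
    """Assign grid bucket (i, j) to a disk using the disk-modulo rule."""
    return (i + j) % num_disks


def query_cost(rows, cols, num_disks, allocate=disk_modulo):
    """Parallel cost of a range query: the most buckets read from any one disk.

    `rows` and `cols` are the ranges of bucket indices covered by the query.
    """
    load = Counter(allocate(i, j, num_disks) for i in rows for j in cols)
    return max(load.values())


def optimal_cost(rows, cols, num_disks):
    """Lower bound: buckets spread perfectly evenly over the disks."""
    return ceil(len(rows) * len(cols) / num_disks)
```

With four disks, a 1 x 4 query such as `query_cost(range(0, 1), range(0, 4), 4)` touches each disk once and meets the bound, while a 2 x 2 query such as `query_cost(range(0, 2), range(0, 2), 4)` places two buckets on the same disk and costs 2 against an optimum of 1. That gap for a single-copy scheme is the kind of imbalance that replication is meant to remove.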

12.

Application of Multi-Query Optimization in Mobile Databases

杨晓宇岳丽华柳建平《小型微型计算机系统》2004,25(8):1538-1541

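A key characteristic of mobile computing is that data must be disseminated over limited network bandwidth. Multi-query optimization is applied to data dissemination in mobile databases: the central server optimizes the requests issued by mobile clients within a given time window and then broadcasts the resulting data set according to a chosen strategy. Simulation experiments show that, compared with a simple pull-based strategy, this dissemination strategy greatly reduces network bandwidth usage and shortens the average waiting time of mobile clients.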

13.

MyBenchmark: generating databases for query workloads

Eric Lo Nick Cheng Wilfred W. K. Lin Wing-Kai Hon Byron Choi 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(6):895-913

To evaluate the performance of database applications and database management systems (DBMSs), we usually execute workloads of queries on generated databases of different sizes and then benchmark various measures such as response time and throughput. This paper introduces MyBenchmark, a parallel data generation tool that takes a set of queries as input and generates database instances. Users of MyBenchmark can control the characteristics of the generated data as well as the characteristics of the resulting workload. Applications of MyBenchmark include DBMS testing, database application testing, and application-driven benchmarking. In this paper, we present the architecture and the implementation algorithms of MyBenchmark. Experimental results show that MyBenchmark is able to generate workload-aware databases for a variety of workloads including query workloads extracted from TPC-C, TPC-E, TPC-H, and TPC-W benchmarks.

14.

A Distributed Stream Query Optimization Framework through Integrated Planning and Deployment

Seshadri Sangeetha Kumar Vibhore Cooper Brian Liu Ling 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(10):1439-1453

This paper addresses the problem of optimizing multiple distributed stream queries that are executing simultaneously in distributed data stream systems. We argue that the static query optimization approach of "plan, then deployment" is inadequate for handling distributed queries involving multiple streams and node dynamics faced in distributed data stream systems and applications. Thus, the selection of an optimal execution plan in such dynamic and networked computing systems must consider operator ordering, reuse, network placement, and search space reduction. We propose to use hierarchical network partitions to exploit various opportunities for operator-level reuse while utilizing network characteristics to maintain a manageable search space during query planning and deployment. We develop top-down, bottom-up, and hybrid algorithms for exploiting operator-level reuse through hierarchical network partitions. Formal analysis is presented to establish the bounds on the search space and suboptimality of our algorithms. We have implemented our algorithms in the IFLOW system, an adaptive distributed stream management system. Through simulations and experiments using a prototype deployed on Emulab, we demonstrate the effectiveness of our framework and our algorithms.

15.

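Real-Time Query Processing Based on the Ant Colony Algorithm in Sensor Networks (total citations: 1; self-citations: 0; citations by others: 1)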

余建平林亚平《软件学报》2010,21(3):473-489

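Wireless sensor networks are widely deployed in many settings for different applications and are commonly viewed as distributed databases. Event-related responses can be obtained by issuing query requests to such a database. Some applications with real-time requirements place strict demands on query latency, yet existing query algorithms usually cannot satisfy the needs of real-time query applications. For this class of applications, a real-time query processing algorithm based on ant colony optimization is proposed. The algorithm adopts a ring-based storage strategy driven by event importance and a distributed search mechanism based on the ant colony algorithm. It makes full use of the self-organization and positive feedback of ant colony optimization to jointly improve the energy efficiency, timeliness, and query acceptance rate of query processing, offering a new approach for distributed, dynamic, parallel real-time query applications. The execution process requires only…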

16.

Bitwise dimensional co-clustering for analytical workloads

Stephan Baumann Peter Boncz Kai-Uwe Sattler 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(3):291-316

Analytical workloads in data warehouses often include heavy joins where queries involve multiple fact tables in addition to the typical star-patterns, dimensional grouping and selections. In this paper we propose a new processing and storage framework called bitwise dimensional co-clustering (BDCC) that avoids replication and thus keeps updates fast, yet is able to accelerate all these foreign key joins, efficiently support grouping, and push down most dimensional selections. The core idea of BDCC is to cluster each table on a mix of dimensions, each possibly derived from attributes imported over an incoming foreign key, thereby creating foreign key connected tables with partially shared clusterings. These are later used to accelerate any join between two tables that have some dimension in common, and additionally make it possible to push down and propagate selections (reducing I/O) and to accelerate aggregation and ordering operations. Besides the general framework, we describe an algorithm to derive such a physical co-clustering database automatically and describe query processing and query optimization techniques that can easily be fitted into existing relational engines. We present an experimental evaluation on the TPC-H benchmark in the Vectorwise system, showing that co-clustering can significantly enhance its already high performance and at the same time significantly reduce the memory consumption of the system.

17.

Evaluating recursive queries in distributed databases

Nejdl W. Ceri S. Wiederhold G. 《Knowledge and Data Engineering, IEEE Transactions on》1993,5(1):104-121

The execution of logic queries in a distributed database environment is studied. Conventional optimization strategies, such as the early evaluation of selection conditions and the clustering of processing to manipulate and exchange large sets of tuples, are redefined in view of the additional difficulties due to logic queries, in particular to recursive rules. In order to allow efficient processing of these logic queries, several program transformation techniques that attempt to minimize distribution costs based on the idea of semijoins and generalized semijoins in conventional databases are presented. Although local computation of semijoins is not possible for the general case, classes of programs are indicated for which these transformations succeed in producing set-oriented computation. Processes evaluating the recursive program in a distributed network are described, and an efficient method for testing the termination of the computation is developed. The approach is compared with sequential as well as dataflow-oriented evaluation
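
The semijoin idea from conventional distributed databases, on which the transformations above build, can be sketched in a few lines of Python. One site sends only its join-key values to the other, the remote site returns just the matching tuples, and the join is finished locally. This shows the plain, non-recursive case only, not the paper's program transformations for logic queries; the function names and the tuple layout (join key in the first column) are hypothetical.

```python
def semijoin_reduce(s_rows, r_keys):
    """Keep only the S tuples whose join key appears in R (S semijoin R)."""
    return [row for row in s_rows if row[0] in r_keys]


def distributed_join(r_rows, s_rows):
    """Join R (local site) with S (remote site) on the first column.

    Returns the join result and the number of S tuples shipped between sites.
    """
    # Step 1: project R on the join key; only these values are sent to S's site.
    r_keys = {row[0] for row in r_rows}
    # Step 2: the remote site filters S and ships back the matching tuples.
    shipped = semijoin_reduce(s_rows, r_keys)
    # Step 3: finish the join locally on the reduced relation.
    result = [r + s[1:] for r in r_rows for s in shipped if r[0] == s[0]]
    return result, len(shipped)
```

The saving comes from step 2: tuples of S that cannot contribute to the join never cross the network, at the price of first shipping the (usually much smaller) set of key values.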

18.

Designing data warehouses (total citations: 9; self-citations: 0; citations by others: 9)

Dimitri Timos 《Data & Knowledge Engineering》1999,31(3):279-301

A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.

19.

A Probe-Based Technique to Optimize Join Queries in Distributed Internet Databases (total citations: 1; self-citations: 0; citations by others: 1)

Cyrus Shahabi Latifur Khan Dennis McLeod 《Knowledge and Information Systems》2000,2(3):373-385

An adaptive probe-based optimization technique is developed and demonstrated in the context of an Internet-based distributed database environment. More and more common are database systems which are distributed across servers communicating via the Internet where a query at a given site might require data from remote sites. Optimizing the response time of such queries is a challenging task due to the unpredictability of server performance and network traffic at the time of data shipment; this may result in the selection of an expensive query plan using a static query optimizer. We constructed an experimental setup consisting of two servers running the same database management system connected via the Internet. Concentrating on join queries, we demonstrate how a static query optimizer might choose an expensive plan by mistake. This is due to the lack of a priori knowledge of the run-time environment, inaccurate statistical assumptions in size estimation, and neglecting the cost of remote method invocation. These shortcomings are addressed collectively by proposing a probing mechanism. An implementation of our run-time optimization technique for join queries was constructed in the Java language and incorporated into an experimental setup. The results demonstrate the superiority of our probe-based optimization over a static optimization. Received 6 February 1999 / Revised 15 February 2000 / Accepted 10 May 2000

20.

Building with ParadisEO reusable parallel and distributed evolutionary algorithms

《Parallel Computing》2004,30(5-6):677-697

Numerous parallel and distributed evolutionary algorithms (PDEAs) and their implementations have been proposed and are available on the Web. A robust way to ease the reuse of their code and design is the framework approach. In this paper, we present some existing frameworks for PDEAs and their development requirements, and propose a new C++ open source framework, named Parallel and distributed Evolving Objects (ParadisEO). ParadisEO is basically devoted to the reusable and flexible design of parallel and distributed metaheuristics, but we focus here only on PDEAs. Compared to other related frameworks, ParadisEO allows more reuse flexibility, and provides more implemented parallel and distributed models. Furthermore, these models can be exploited by the user in a transparent way, and deployed on shared memory multi-processors as well as on distributed memory machines. The architecture has been evaluated on two real-world applications: radio network design and spectroscopic data mining. The experimental results demonstrate the efficiency and robustness of the different models.