Similar Literature
A total of 20 similar documents were found (search time: 31 ms).
1.
潘立强  李建中  骆吉洲 《软件学报》2010,21(4):1020-1030
Because energy in wireless sensor networks is limited, and because in many applications a partial result of a Skyline query is enough to satisfy user needs, an approximate Skyline query processing algorithm is proposed that maximizes energy savings while still meeting the user's query requirements. The algorithm computes an approximate result set of the Skyline query with only a subset of the sensor nodes in the network reporting their sensed data. When processing a query, each sensor node can decide whether to report its sensed data by examining only its own data, without comparing against the sensed data of other nodes; this avoids a large amount of in-network communication and thus saves network energy. Extensive experiments in a simulated environment show that the algorithm processes approximate Skyline queries in sensor networks in an energy-efficient manner, in accordance with the user's application requirements.

2.
Unstructured peer-to-peer infrastructure has been widely employed to support large-scale distributed applications. Many of these applications, such as location-based services and multimedia content distribution, require the support of range selection queries. Under the widely-adopted query shipping protocols, the cost of query processing is affected by the number of result copies or replicas in the system. Since range queries can return results that include poorly-replicated data items, the cost of these queries is usually dominated by the retrieval cost of these data items. In this work, we propose a popularity-aware prefetch-based approach that can effectively facilitate the caching of poorly-replicated data items that are potentially requested in subsequent range queries, resulting in substantial cost savings. We prove that the performance of retrieving poorly-replicated data items is guaranteed to improve under an increasing query load. Extensive experiments show that the overall range query processing cost decreases significantly under various query load settings.  相似文献   

3.
In distributed query processing systems, load balancing plays an important role in maximizing system throughput. When queries can leverage cached intermediate results, improving the cache hit ratio becomes as important as load balancing in query scheduling, especially when dealing with computationally expensive queries. The scheduling policies must be designed to take into consideration the dynamic contents of the distributed caching infrastructure. In this paper, we propose and discuss several distributed query scheduling policies that directly consider the available cache contents by employing distributed multidimensional indexing structures and an exponential moving average approach to predicting cache contents. These approaches are shown to produce better query plans and faster query response times than traditional scheduling policies that do not predict dynamic contents in distributed caches. We experimentally demonstrate the utility of the scheduling policies using MQO, which is a distributed, Grid-enabled, multiple query processing middleware system we developed to optimize query processing for data analysis and visualization applications.  相似文献   
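A minimal sketch of the exponential-moving-average idea behind such cache-content prediction (the class name, smoothing factor, and load penalty below are illustrative, not the paper's actual predictor or index structures):

```python
# Sketch: predicting distributed cache contents with an exponential moving average (EMA).
# All names and constants are illustrative, not taken from the paper.

class CachePredictor:
    def __init__(self, alpha=0.3):
        self.alpha = alpha          # smoothing factor in (0, 1]
        self.scores = {}            # (server, object_id) -> EMA "residency" score

    def observe(self, server, object_id, was_hit):
        """Update the EMA after each query: 1.0 for a cache hit, 0.0 for a miss."""
        key = (server, object_id)
        prev = self.scores.get(key, 0.0)
        sample = 1.0 if was_hit else 0.0
        self.scores[key] = self.alpha * sample + (1 - self.alpha) * prev

    def best_server(self, object_id, servers, loads):
        """Pick a server by trading predicted cache hit against current load."""
        def utility(s):
            return self.scores.get((s, object_id), 0.0) - 0.1 * loads[s]
        return max(servers, key=utility)

predictor = CachePredictor()
predictor.observe("server-1", "tile_42", was_hit=True)
print(predictor.best_server("tile_42", ["server-1", "server-2"],
                            {"server-1": 3, "server-2": 1}))
```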

4.
Modern search engines employ advanced techniques that go beyond the structures that strictly satisfy the query conditions in an effort to better capture the user intentions. In this work, we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested. We call these queries exemplar queries. We provide a formal specification of their semantics and show that they are fundamentally different from notions like queries by example, approximate queries and related queries. We provide an implementation of these semantics for knowledge graphs and present an exact solution with a number of optimizations that improve performance without compromising the result quality. We study two different congruence relations, isomorphism and strong simulation, for identifying the answers to an exemplar query. We also provide an approximate solution that prunes the search space and achieves considerably better time performance with minimal or no impact on effectiveness. The effectiveness and efficiency of these solutions with synthetic and real datasets are experimentally evaluated, and the importance of exemplar queries in practice is illustrated.  相似文献   
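To make the isomorphism-based congruence concrete, the toy sketch below matches an exemplar structure against a tiny, made-up knowledge graph using networkx's VF2 matcher; it omits the paper's optimizations, its approximate solution, and the strong-simulation variant.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Tiny labelled knowledge graph (edges carry relation names) -- illustrative data only.
kg = nx.DiGraph()
kg.add_edge("Pixar", "Toy Story", rel="produced")
kg.add_edge("Toy Story", "Animation", rel="genre")
kg.add_edge("Ghibli", "Spirited Away", rel="produced")
kg.add_edge("Spirited Away", "Animation", rel="genre")

# The user's exemplar: one concrete structure they are interested in.
example = nx.DiGraph()
example.add_edge("Pixar", "Toy Story", rel="produced")
example.add_edge("Toy Story", "Animation", rel="genre")

# Answers are subgraphs of the knowledge graph with the same edge-labelled shape.
matcher = isomorphism.DiGraphMatcher(
    kg, example, edge_match=lambda e1, e2: e1["rel"] == e2["rel"])
for mapping in matcher.subgraph_isomorphisms_iter():
    print(mapping)   # includes the Ghibli/Spirited Away structure as well as the exemplar itself
```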

5.
Approximating query answering on RDF databases
Database users may be frustrated when a query they pose against a database returns no answers. In this paper, we study the problem of relaxing queries on RDF databases in order to obtain approximate answers. We address two problems in efficient query relaxation. First, to ensure the quality of answers, we compute the similarities of relaxed queries with regard to the original user query and use them to score the potentially relevant answers. Second, to obtain top-k answers, we develop two algorithms. The first is based on a best-first strategy, in which relaxed queries are executed in ranking order; the second, batch-based algorithm executes the relaxed queries as a batch and avoids unnecessary execution cost. Finally, we implement and experimentally evaluate our approaches.
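A schematic of the best-first strategy, assuming hypothetical `relax`, `similarity`, and `execute` functions that stand in for the RDF-specific machinery (this is not the paper's algorithm verbatim):

```python
import heapq, itertools

def top_k_with_relaxation(query, relax, similarity, execute, k):
    """Best-first sketch: run the most similar relaxed query first and stop
    once k answers are collected. `relax`, `similarity` and `execute` are
    placeholders for the RDF-specific machinery."""
    answers, seen_rows, seen_queries = [], set(), {query}
    tie = itertools.count()                        # tie-breaker for the heap
    heap = [(-1.0, next(tie), query)]              # the original query has similarity 1.0
    while heap and len(answers) < k:
        neg_sim, _, q = heapq.heappop(heap)
        for row in execute(q):
            if row not in seen_rows and len(answers) < k:
                seen_rows.add(row)
                answers.append((row, -neg_sim))    # score answers by query similarity
        for rq in relax(q):
            if rq not in seen_queries:
                seen_queries.add(rq)
                heapq.heappush(heap, (-similarity(rq, query), next(tie), rq))
    return answers
```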

6.
Offloading cellular traffic through mobile social networks has arisen as a promising way for relieving cellular networks. Prior studies mainly focused on caching data in a number of pre-selected helpers. However, such a strategy would fail when mobile users enter and leave the target area over time. In this paper, we examine the research decisions and design tradeoffs that arise when offloading cellular traffic in such a dynamic area of interest, referred to as a MobiArea, and we design an offloading framework, MobiCache, for maximizing cellular operators’ revenues and minimizing the overhead imposed on mobile devices. On the user side, we propose a content floating-based cooperative caching strategy that caches data in geographical floating circles, instead of selected helpers in previous studies, to cope with the dynamics. A geographical routing scheme is designed for delivering data and queries towards floating circles. We also develop a cache replacement scheme to improve caching cost-effectiveness inside floating circles. On the operator side, query history and feedback are maintained for cellular operators to optimize framework parameters that maximize their revenues. Extensive trace-driven simulations show that, compared with a state-of-the-art scheme, MobiCache offloads up to 52% more traffic with 15% shorter delay and 6% less forwarding cost.  相似文献   

7.
There is a growing interest in applications that utilize continuous sensing of individual activity or context, via sensors embedded or associated with personal mobile devices (e.g., smartphones). Reducing the energy overheads of sensor data acquisition and processing is essential to ensure the successful continuous operation of such applications, especially on battery-limited mobile devices. To achieve this goal, this paper presents a framework, called ACQUA, for ‘acquisition-cost’ aware continuous query processing. ACQUA replaces the current paradigm, where the data is typically streamed (pushed) from the sensors to the one or more smartphones, with a pull-based asynchronous model, where a smartphone retrieves appropriate blocks of relevant sensor data from individual sensors, as an integral part of the query evaluation process. We describe algorithms that dynamically optimize the sequence (for complex stream queries with conjunctive and disjunctive predicates) in which such sensor data streams are retrieved by the query evaluation component, based on a combination of (a) the communication cost & selectivity properties of individual sensor streams, and (b) the occurrence of the stream predicates in multiple concurrently executing queries. We also show how a transformation of a group of stream queries into a disjunctive normal form provides us with significantly greater degrees of freedom in choosing this sequence, in which individual sensor streams are retrieved and evaluated. While the algorithms can apply to a broad category of sensor-based applications, we specifically demonstrate their application to a scenario where multiple stream processing queries execute on a single smartphone, with the sensors transferring their data over an appropriate PAN technology, such as Bluetooth or IEEE 802.11. Extensive simulation experiments indicate that ACQUA’s intelligent batch-oriented data acquisition process can result in as much as 80 % reduction in the energy overhead of continuous query processing, without any loss in the fidelity of the processing logic.  相似文献   
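The acquisition-ordering intuition can be sketched with the classic cost/selectivity heuristic for conjunctive predicates: pull the cheap, highly selective sensor stream first so that later, more expensive streams often never need to be fetched. The ranking rule and field names below are a textbook heuristic, not necessarily ACQUA's exact cost model.

```python
def acquisition_order(predicates):
    """Order conjunctive predicates so cheap, highly selective streams are pulled first.
    Each predicate dict carries an estimated acquisition cost and the probability
    that it evaluates to True (its 'pass rate'). Illustrative heuristic only."""
    # Classic rank: cost / (1 - pass_probability); lower rank = evaluate earlier.
    return sorted(predicates,
                  key=lambda p: p["cost"] / max(1e-9, 1.0 - p["pass_prob"]))

def evaluate_conjunction(predicates, fetch_block, check):
    """Pull sensor data blocks lazily in the chosen order; stop at the first failure."""
    for p in acquisition_order(predicates):
        block = fetch_block(p["sensor"])     # e.g., over Bluetooth, only when needed
        if not check(p, block):
            return False                     # remaining sensors are never contacted
    return True
```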

8.
In meta-searchers accessing distributed Web-based information repositories, performance is a major issue. Efficient query processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve data. In this work, we study the problem of semantic caching of Web queries and develop a caching mechanism for conjunctive Web queries based on signature files. Our algorithms handle both semantic containment and intersection relations between a query and the corresponding cache items. We also develop a cache replacement strategy for situations in which cached items differ in size and in their contribution to partial query answers. We report results of experiments and show how the caching mechanism is realized in the Knowledge Broker system.
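A rough sketch of the signature-file test for conjunctive keyword queries: each keyword is hashed to a few bit positions, a query's signature ORs them together, and a cheap bitwise check filters cache items whose keyword set may be contained in the new query (false positives are verified exactly afterwards). The signature width and bits-per-word below are illustrative.

```python
import hashlib

SIG_BITS = 64
BITS_PER_WORD = 3   # illustrative superimposed-coding parameters

def word_signature(word):
    sig = 0
    for i in range(BITS_PER_WORD):
        h = hashlib.md5(f"{word}:{i}".encode()).digest()
        sig |= 1 << (int.from_bytes(h[:4], "big") % SIG_BITS)
    return sig

def query_signature(keywords):
    sig = 0
    for w in keywords:
        sig |= word_signature(w)
    return sig

def may_answer(cached_keywords, new_keywords):
    """A cached conjunctive query can answer the new one only if its keywords are a
    subset of the new query's keywords. The bit test is a necessary condition;
    surviving candidates are then verified exactly."""
    c_sig, q_sig = query_signature(cached_keywords), query_signature(new_keywords)
    return (c_sig & q_sig) == c_sig and set(cached_keywords) <= set(new_keywords)

print(may_answer({"caching", "web"}, {"caching", "web", "queries"}))   # True
```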

9.
In distributed scientific query processing systems, leveraging distributed cached data is becoming more important. In such systems, a front-end query scheduler distributes queries among many application servers rather than processing queries in a few high-performance workstations. Although many query scheduling policies exist such as round-robin and load-monitoring, they are not sophisticated enough to exploit cached results as well as balance the workload. Efforts were made to improve the query processing performance using statistical methods such as exponential moving average. However, existing methods have limitations for certain query patterns: queries with hotspots, or dynamic query distributions. In this paper, we propose novel query scheduling policies that take into account both the contents of distributed caching infrastructure and the load balance among the servers. Our experiments show that the proposed query scheduling policies outperform existing policies by producing better query plans in terms of load balance and cache-hit ratio.  相似文献   

10.
The in-network aggregation paradigm in sensor networks provides a versatile approach for evaluating aggregate queries. Traditional approaches need a separate aggregate to be computed and communicated for each query and hence do not scale well with the number of queries. Since approximate query results are sufficient for many applications, we use an alternate approach based on summary data structures. We consider two kinds of aggregate queries: location range queries that compute the sum of values reported by sensors in a given location range, and value range queries that compute the number of sensors that report values in a given range. We construct summary data structures called linear sketches over the sensor data using in-network aggregation and use them to answer aggregate queries in an approximate manner at the base station. There is a trade-off between accuracy of the query results and lifetime of the sensor network that can be exploited to achieve increased lifetimes for a small loss in accuracy. Most commonly occurring sets of range queries are highly correlated and display rich algebraic structure. Our approach takes full advantage of this by constructing linear sketches that depend on the queries. Experimental results show that linear sketching achieves significant improvements in lifetime of sensor networks for only a small loss in accuracy of the queries. Further, our approach achieves more accurate query results than other classical techniques using the Discrete Fourier Transform and Discrete Wavelet Transform.
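For intuition only, the following is a generic AMS-style linear sketch that estimates a location-range sum from a random ±1 projection; the paper's query-dependent sketch construction is different, and the accuracy of this toy version is deliberately coarse.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, sketch_rows = 256, 64

# A public random +/-1 projection defines the linear sketch: s = A @ x.
# Linearity is what lets the sketch be built by in-network aggregation:
# each node contributes A[:, i] * x_i, and intermediate nodes simply add vectors.
A = rng.choice([-1.0, 1.0], size=(sketch_rows, n_sensors))

readings = rng.uniform(10, 30, size=n_sensors)   # made-up sensor values
sketch = A @ readings                            # what the base station receives

def estimate_range_sum(sketch, lo, hi):
    """Coarse, unbiased estimate of sum(readings[lo:hi]) from the sketch alone."""
    q = np.zeros(n_sensors)
    q[lo:hi] = 1.0                               # indicator of the location range
    return float(sketch @ (A @ q)) / sketch_rows # inner-product estimator

print(estimate_range_sum(sketch, 32, 96), readings[32:96].sum())
```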

11.
The design of an OLAP system for supporting real-time queries is one of the major research issues. One approach is to use data cubes, which are materialized, precomputed multidimensional views of data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost of answering those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes will increase the query cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries should be sacrificed and left out of consideration. We have defined an optimization problem in data cube system design: given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, determine a set of data cubes such that the system can answer the largest possible subset of the queries without violating the two bounds. This is an NP-hard problem. We propose approximate greedy algorithms GR, 2GM and 2GMM, which are shown to be both effective and efficient by experiments on a census data set and a forest-cover-type data set.
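The flavor of such a greedy selection can be sketched as below: repeatedly add the cube that covers the most still-unanswered queries per unit of maintenance cost while both bounds hold. This generic sketch treats the query-cost bound as a per-query limit for simplicity and is not the GR/2GM/2GMM algorithms themselves.

```python
def greedy_cube_selection(cubes, queries, maint_bound, query_bound):
    """Pick data cubes to maximize the number of answerable queries.
    cubes: {cube: {"maint": maintenance_cost, "answers": {query: query_cost}}}
    A query counts as answered if some selected cube answers it within query_bound.
    Generic greedy sketch only."""
    selected, covered, maint_used = set(), set(), 0.0
    while True:
        best, best_new, best_gain = None, set(), 0.0
        for c, info in cubes.items():
            if c in selected or maint_used + info["maint"] > maint_bound:
                continue
            new = {q for q, qc in info["answers"].items()
                   if q in queries and q not in covered and qc <= query_bound}
            gain = len(new) / info["maint"] if info["maint"] > 0 else float("inf")
            if new and gain > best_gain:
                best, best_new, best_gain = c, new, gain
        if best is None:
            return selected, covered          # no affordable cube adds coverage
        selected.add(best)
        covered |= best_new
        maint_used += cubes[best]["maint"]
```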

12.
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level. We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation based on a large web crawl and real search engine query log shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance.
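A toy version of the intermediate caching level: keep the intersections of inverted lists for term pairs in a bounded store and fall back to intersecting the full lists on a miss. The LRU eviction and capacity below are placeholders for the weighted admission/eviction policies the paper actually studies.

```python
from collections import OrderedDict

class PairIntersectionCache:
    """Caches intersections of posting lists for term pairs (simple LRU sketch)."""
    def __init__(self, capacity_postings=1_000_000):
        self.capacity = capacity_postings
        self.used = 0
        self.cache = OrderedDict()             # (t1, t2) -> sorted list of doc ids

    def get(self, t1, t2, fetch_postings):
        key = tuple(sorted((t1, t2)))
        if key in self.cache:
            self.cache.move_to_end(key)        # LRU bookkeeping on a hit
            return self.cache[key]
        # Miss: intersect the two full inverted lists, then admit the result.
        inter = sorted(set(fetch_postings(t1)) & set(fetch_postings(t2)))
        self.cache[key] = inter
        self.used += len(inter)
        while self.used > self.capacity and self.cache:
            _, evicted = self.cache.popitem(last=False)
            self.used -= len(evicted)
        return inter
```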

13.
Privacy is a major concern when users query public online data services. The privacy of millions of people has been jeopardized in numerous user data leakage incidents in many popular online applications. To address the critical problem of personal data leakage through queries, we enable private querying on public data services so that the contents of user queries and any user data are hidden and therefore not revealed to the online service providers. We propose two protocols for private processing of database queries, namely BHE and HHE. The two protocols provide strong query privacy by using Paillier's homomorphic encryption, and support common database queries such as range and join queries by relying on the bucketization of public data. In contrast to traditional Private Information Retrieval proposals, BHE and HHE only incur one round of client-server communication for processing a single query. BHE is a basic private query processing protocol that provides complete query privacy but still incurs expensive computation and communication costs. Built upon BHE, HHE is a hybrid protocol that applies ciphertext computation and communication on a subset of the data, such that this subset not only covers the actual requested data but also resembles some frequent query patterns of common users, thus achieving practical query performance while ensuring adequate privacy levels. By using frequent query patterns and data-specific privacy protection, HHE is not vulnerable to the traditional attacks on k-Anonymity that exploit data similarity and skewness. Moreover, HHE consistently protects user query privacy for a sequence of queries in a single query session.
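The protocols rest on the additive homomorphism of Paillier encryption: multiplying two ciphertexts yields an encryption of the plaintext sum, so a server can combine encrypted values it cannot read. The toy implementation below uses tiny hard-coded primes purely to show that property; it is not the BHE/HHE protocol and is nowhere near secure parameters (requires Python 3.9+ for math.lcm).

```python
import math, random

# Toy Paillier keypair with tiny primes -- illustration only, not secure.
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)            # decryption constant; works because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

c1, c2 = encrypt(12), encrypt(30)
# Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to a + b (mod n).
assert decrypt((c1 * c2) % n2) == 42
print(decrypt((c1 * c2) % n2))
```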

14.
《Parallel Computing》2007,33(7-8):497-520
In this paper, we present a multi-query optimization framework based on the concept of active semantic caching. The framework permits the identification and transparent reuse of data and computation in the presence of multiple queries (or query batches) that specify user-defined operators and aggregations originating from scientific data-analysis applications. We show how query scheduling techniques, coupled with intelligent cache replacement policies, can further improve the performance of query processing by leveraging the active semantic caching operators. We also propose a methodology for functionally decomposing complex queries in terms of primitives so that multiple reuse sites are exposed to the query optimizer, to increase the amount of reuse. The optimization framework and the database system implemented with it are designed to be efficient irrespective of the underlying parallel and/or distributed machine configuration. We present experimental results highlighting the performance improvements obtained by our methods using real scientific data-analysis applications on multiple parallel and distributed processing configurations (e.g., single symmetric multiprocessor (SMP) machine, cluster of SMP nodes, and a Grid computing configuration).  相似文献   

15.
Towards Intelligent Semantic Caching for Web Sources
An intelligent semantic caching scheme suitable for web sources is presented. Since web sources typically have weaker querying capabilities than conventional databases, existing semantic caching schemes cannot be directly applied. Our proposal takes care of the difference between the query capabilities of an end user system and web sources. In addition, an analysis on the match types between a user's input query and cached queries is presented. Based on this analysis, we present an algorithm that finds the best matched query under different circumstances. Furthermore, a method to use semantic knowledge, acquired from the data, to avoid unnecessary access to web sources by transforming the cache miss to the cache hit is presented. To verify the effectiveness of the proposed semantic caching scheme, we first show how to generate synthetic queries exhibiting different levels of semantic localities. Then, using the test sets, we show that the proposed query matching technique is an efficient and effective way for semantic caching in web databases.  相似文献   
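For a single numeric range predicate, the match types between a new query and a cached query can be pictured as in the sketch below (exact, containing, contained, overlapping, disjoint); this is a generic illustration rather than the paper's full analysis.

```python
def match_type(query, cached):
    """Classify how a cached range [lo, hi) relates to a new query range."""
    q_lo, q_hi = query
    c_lo, c_hi = cached
    if (q_lo, q_hi) == (c_lo, c_hi):
        return "exact"
    if c_lo <= q_lo and q_hi <= c_hi:
        return "containing"     # cache answers the query completely
    if q_lo <= c_lo and c_hi <= q_hi:
        return "contained"      # cache gives a partial answer; fetch the rest
    if q_lo < c_hi and c_lo < q_hi:
        return "overlapping"    # partial answer plus a remainder query to the source
    return "disjoint"           # full miss: forward the query to the web source

print(match_type((10, 20), (5, 25)))   # containing
```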

16.
The Internet now offers more than just simple information to the users. Decision makers can now issue analytical, as opposed to transactional, queries that involve massive data (such as, aggregations of millions of rows in a relational database) in order to identify useful trends and patterns. Such queries are often referred to as On-Line-Analytical Processing (OLAP). Typically, pages carrying query results do not exhibit temporal locality and, therefore, are not considered for caching at Internet proxies. In OLAP processing, this is a major problem as the cost of these queries is significantly larger than that of the transactional queries. This paper proposes a technique to reduce the response time for OLAP queries originating from geographically distributed private LANs and issued through the Web toward a central data warehouse (DW) of an enterprise. An active caching scheme is introduced that enables the LAN proxies to cache some parts of the data, together with the semantics of the DW, in order to process queries and construct the resulting pages. OLAP queries arriving at the proxy are either satisfied locally or from the DW, depending on the relative access costs. We formulate a cost model for characterizing the respective latencies, taking into consideration the combined effects of both common Web access and query processing. We propose a cache admittance and replacement algorithm that operates on a hybrid Web-OLAP input, outperforming both pure-Web and pure-OLAP caching schemes.  相似文献   
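The proxy's serve-locally-or-forward decision boils down to a latency comparison; the cost terms below are illustrative placeholders for the paper's combined Web-access and query-processing cost model.

```python
def answer_locally(local_est, remote_est):
    """Serve an OLAP query from the LAN proxy only when it both can answer
    the query and is estimated to be cheaper than going to the central DW."""
    if not local_est["can_answer"]:
        return False
    local_cost = local_est["lookup"] + local_est["aggregation"]
    remote_cost = (remote_est["round_trip"]       # Web access to the DW
                   + remote_est["dw_processing"]  # query execution at the DW
                   + remote_est["transfer"])      # shipping the result page back
    return local_cost < remote_cost

print(answer_locally({"can_answer": True, "lookup": 5, "aggregation": 40},
                     {"round_trip": 80, "dw_processing": 30, "transfer": 25}))
```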

17.
Top-k Monitoring in Wireless Sensor Networks
Top-k monitoring is important to many wireless sensor applications. This paper exploits the semantics of top-k query and proposes an energy-efficient monitoring approach called FILA. The basic idea is to install a filter at each sensor node to suppress unnecessary sensor updates. Filter setting and query reevaluation upon updates are two fundamental issues to the correctness and efficiency of the FILA approach. We develop a query reevaluation algorithm that is capable of handling concurrent sensor updates. In particular, we present optimization techniques to reduce the probing cost. We design a skewed filter setting scheme, which aims to balance energy consumption and prolong network lifetime. Moreover, two filter update strategies, namely, eager and lazy, are proposed to favor different application scenarios. We also extend the algorithms to several variants of top-k query, that is, order-insensitive, approximate, and value monitoring. The performance of the proposed FILA approach is extensively evaluated using real data traces. The results show that FILA substantially outperforms the existing TAG-based approach and range caching approach in terms of both network lifetime and energy consumption under various network configurations.  相似文献   
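The filter idea can be pictured as follows: the base station assigns each node a value window, a node transmits only when a reading leaves its window, and only such violations trigger top-k re-evaluation. The equal-width window setting below is a deliberately naive stand-in for FILA's skewed setting and eager/lazy update strategies.

```python
class SensorFilter:
    """Per-node filter window: suppress updates that stay inside [low, high)."""
    def __init__(self, low, high):
        self.low, self.high = low, high

    def on_reading(self, value):
        if self.low <= value < self.high:
            return None                      # suppressed: no radio transmission
        return value                         # violation: report to the base station

def set_filters(readings, width=5.0):
    """Naive (non-skewed) setting: a fixed-width window around each last reading."""
    return {node: SensorFilter(v - width / 2, v + width / 2)
            for node, v in readings.items()}

def handle_update(current_values, node, value, k):
    """Base-station side: re-evaluate the top-k set only when a violation arrives."""
    current_values[node] = value
    return sorted(current_values.items(), key=lambda kv: kv[1], reverse=True)[:k]

filters = set_filters({"s1": 20.0, "s2": 26.0, "s3": 18.0})
print(filters["s2"].on_reading(27.0))        # None: inside the window, not sent
print(handle_update({"s1": 20.0, "s2": 26.0, "s3": 18.0}, "s3", 30.0, k=2))
```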

18.
Approximate query processing using wavelets
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data.
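A stripped-down, one-dimensional illustration of the synopsis idea: take the (orthonormal) Haar wavelet transform, keep only the largest-magnitude coefficients, and reconstruct approximately. The actual system builds multi-dimensional synopses of relational tables and answers queries directly in the coefficient domain, which this sketch does not show.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a length-2^k array."""
    coeffs, approx = [], np.asarray(x, dtype=float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))   # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)
    return approx, coeffs[::-1]                      # coarsest level first

def haar_inverse(approx, coeffs):
    x = approx
    for d in coeffs:
        even, odd = (x + d) / np.sqrt(2.0), (x - d) / np.sqrt(2.0)
        x = np.empty(2 * len(d))
        x[0::2], x[1::2] = even, odd
    return x

def wavelet_synopsis(x, budget):
    """Keep only the `budget` largest-magnitude detail coefficients (plus the average)."""
    approx, coeffs = haar_forward(x)
    flat = np.concatenate(coeffs)
    if budget < len(flat):
        cutoff = np.sort(np.abs(flat))[-budget]
        coeffs = [np.where(np.abs(d) >= cutoff, d, 0.0) for d in coeffs]
    return approx, coeffs

data = np.array([2., 2., 0., 2., 3., 5., 4., 4.])
approx, coeffs = wavelet_synopsis(data, budget=3)
print(np.round(haar_inverse(approx, coeffs), 2))     # approximate reconstruction
```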

19.
A Data-Ring Region Query Processing Algorithm for Wireless Sensor Networks
To address node energy efficiency in wireless sensor networks and location-attribute decision making in Skyline queries, a data-ring region query processing algorithm for wireless sensor networks is proposed. The algorithm partitions the network into data rings centered on the query location P; when retrieving the K Skyline values nearest to P, a pruning strategy only compares the other attribute values of nodes at smaller distances, which reduces the data volume and improves query efficiency. In addition, nodes within a ring are organized in a chain-cluster structure, and query processing within a ring combines serial and parallel data processing modes, thereby reducing the query energy consumption and node processing delay of K-Skyline. Simulation experiments show that the data-ring region query processing algorithm incurs lower data processing energy consumption and delay than the Flooding and TAG algorithms.

20.
Evaluating refined queries in top-k retrieval systems
In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries, i.e., which objects are relevant to the user and which are not depends on the perception of the user. Due to the subjective nature of top-k queries, the answers returned by the system to a user query often do not satisfy the user's need right away, either because the weights and the distance functions associated with the features do not accurately capture the user's perception, or because the specified target values do not fully capture her information need, or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, we are not aware of any work on supporting efficient refinement of top-k queries in a database system. Done naively, each "refined" query can be treated as a "starting" query and evaluated from scratch. We explore alternative approaches that significantly improve the cost of evaluating refined queries by exploiting the observation that refined queries do not change drastically from one iteration to the next. Our experiments over a real-life multimedia data set show that the proposed techniques save more than 80 percent of the execution cost of refined queries over the naive approach and are more than an order of magnitude faster than a simple sequential scan.
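One way to see why refined queries need not start from scratch: if the target values or weights change only slightly, objects buffered from the previous iteration remain good candidates and can simply be re-scored, with the index consulted only to confirm that nothing unexamined can enter the top k. The sketch below shows just the re-scoring step, under an assumed weighted Euclidean distance and made-up feature data.

```python
import heapq, math

def score(obj, target, weights):
    """Weighted Euclidean distance between an object's features and the target."""
    return math.sqrt(sum(w * (obj[f] - target[f]) ** 2
                         for f, w in weights.items()))

def rescore_candidates(candidates, target, weights, k):
    """Re-rank the previous iteration's candidate buffer under the refined
    target values / weights, instead of restarting the search from scratch."""
    return heapq.nsmallest(k, candidates,
                           key=lambda obj: score(obj, target, weights))

buffer = [{"color": 0.2, "texture": 0.7}, {"color": 0.9, "texture": 0.4},
          {"color": 0.5, "texture": 0.5}]
refined_target = {"color": 0.4, "texture": 0.6}
refined_weights = {"color": 2.0, "texture": 1.0}
print(rescore_candidates(buffer, refined_target, refined_weights, k=2))
```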
