首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The CQL continuous query language: semantic foundations and query execution   总被引:2,自引:0,他引:2  
CQL, a continuous query language, is supported by the STREAM prototype data stream management system (DSMS) at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and stored relations. We begin by presenting an abstract semantics that relies only on “black-box” mappings among streams and relations. From these mappings we define a precise and general interpretation for continuous queries. CQL is an instantiation of our abstract semantics using SQL to map from relations to relations, window specifications derived from SQL-99 to map from streams to relations, and three new operators to map from relations to streams. Most of the CQL language is operational in the STREAM system. We present the structure of CQL's query execution plans as well as details of the most important components: operators, interoperator queues, synopses, and sharing of components among multiple operators and queries. Examples throughout the paper are drawn from the Linear Road benchmark recently proposed for DSMSs. We also curate a public repository of data stream applications that includes a wide variety of queries expressed in CQL. The relative ease of capturing these applications in CQL is one indicator that the language contains an appropriate set of constructs for data stream processing. Edited by M. Franklin  相似文献   

2.
Skyline query processing over uncertain data streams has attracted considerable attention in database community recently, due to its importance in helping users make intelligent decisions over complex data in many real applications. Although lots of recent efforts have been conducted to the skyline computation over data streams in a centralized environment typically with one processor, they cannot be well adapted to the skyline queries over complex uncertain streaming data, due to the computational complexity of the query and the limited processing capability. Furthermore, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, as they are all developed to address the problem of parallel skyline queries over static certain data sets. In this paper, we formally define the parallel query problem over uncertain data streams with the sliding window streaming model. Particularly, for the first time, we propose an effective framework, named distributed parallel framework to address the problem based on the sliding window partitioning. Furthermore, we propose an efficient approach (parallel streaming skyline) to further optimize the parallel skyline computation with an optimized streaming item mapping strategy and the grid index. Extensive experiments with real deployment over synthetic and real data are conducted to demonstrate the effectiveness and efficiency of the proposed techniques.  相似文献   

3.
With the exponential growth of moving objects data to the Gigabyte range, it has become critical to develop effective techniques for indexing, updating, and querying these massive data sets. To meet the high update rate as well as low query response time requirements of moving object applications, this paper takes a novel approach in moving object indexing. In our approach, we do not require a sophisticated index structure that needs to be adjusted for each incoming update. Rather, we construct conceptually simple short-lived index images that we only keep for a very short period of time (sub-seconds) in main memory. As a consequence, the resulting technique MOVIES supports at the same time high query rates and high update rates, trading this property for query result staleness. Moreover, MOVIES is the first main memory method supporting time-parameterized predictive queries. To support this feature, we present two algorithms: non-predictive MOVIES and predictive MOVIES. We obtain the surprising result that a predictive indexing approach—considered state-of-the-art in an external-memory scenario—does not scale well in a main memory environment. In fact, our results show that MOVIES outperforms state-of-the-art moving object indexes such as a main-memory adapted B x -tree by orders of magnitude w.r.t. update rates and query rates. In our experimental evaluation, we index the complete road network of Germany consisting of 40,000,000 road segments and 38,000,000 nodes. We scale our workload up to 100,000,000 moving objects, 58,000,000 updates per second and 10,000 queries per second, a scenario at a scale unmatched by any previous work.  相似文献   

4.
Data streaming has established itself as a viable communication abstraction in data-intensive parallel and distributed computations, occurring in applications such as scientific visualization, performance monitoring, and large-scale data transfer. A known problem in large-scale event communication is tailoring the data received at the consumer. It is the general problem of extracting data of interest from a data source, a problem that the database community has successfully addressed with SOL queries, a time tested, user-friendly way for noncomputer scientists to access data. By leveraging the efficiency of query processing provided by relational queries, the dQUOB system provides a conceptual relational data model and SOL query access over streaming data. Queries can be used to extract data, combine streams, and create new streams. The language augments queries with an action to enable more complex data transformations such as Fourier transforms. The dQUOB system has been applied to two large-scale distributed applications: a safety critical autonomous robotics simulation and scientific software visualization for global atmospheric transport modeling. In this paper, we present the dQUOB system and the results of performance evaluation undertaken to assess its applicability in data-intensive wide-area computations, where the benefit of portable data transformation must be evaluated against the cost of continuous query evaluation.  相似文献   

5.
Continuous queries applied over nonterminating data streams usually specify windows in order to obtain an evolving–yet restricted–set of tuples and thus provide timely and incremental results. Although sliding windows get frequently employed in many user requests, additional types like partitioned or landmark windows are also available in stream processing engines. In this paper, we set out to study the existence of monotonic-related semantics for a rich set of windowing constructs in order to facilitate a more efficient maintenance of their changing contents. After laying out a formal foundation for expressing windowed queries, we investigate update patterns observed in most common window variants as well as their impact on adaptations of typical operators (like windowed join, union or aggregation), thus offering more insight towards design and implementation of stream processing mechanisms. Furthermore, we identify syntactic equivalences in algebraic expressions involving windows, to the potential benefit of query optimizations. Finally, this framework is validated for several windowed operations against streaming datasets with simulations at diverse arrival rates and window specifications, providing concrete evidence of its significance.  相似文献   

6.
Continuous ranking on uncertain streams   总被引:1,自引:1,他引:0  
Data uncertainty widely exists in many web applications, financial applications and sensor networks. Ranking queries that return a number of tuples with maximal ranking scores are important in the field of database management. Most existing work focuses on proposing static solutions for various ranking semantics over uncertain data. Our focus is to handle continuous ranking queries on uncertain data streams: testing each new tuple to output highly-ranked tuples. The main challenge comes from not only the fact that the possible world space will grow exponentially when new tuples arrive, but also the requirement for low space- and time-complexity to adapt to the streaming environments. This paper aims at handling continuous ranking queries on uncertain data streams. We first study how to handle this issue exactly, then we propose a novel method (exponential sampling) to estimate the expected rank of a tuple with high quality. Analysis in theory and detailed experimental reports evaluate the proposed methods.  相似文献   

7.
A large amount of the world's data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these imprecise streams are difficult to query efficiently because of their rich semantics and large volumes, forcing applications to sacrifice either performance or accuracy. There exists little work, however, that characterizes this trade-off space and helps applications make an appropriate choice.In this paper, we study the effects – on both efficiency and accuracy – of various stream approximations such as ignoring correlations, ignoring low-probability states, or retaining only the single most likely sequence of events. Through experiments on a real-world RFID data set, we identify conditions under which various approximations can improve performance by several orders of magnitude, with only minimal effects on query results. We also identify cases when the full rich semantics are necessary. This study is the first to evaluate the cost vs. quality trade-off of imprecise stream models.We perform this study using Lahar, a prototype Markovian stream warehouse. A secondary contribution of this paper is the development of query semantics and algorithms for processing aggregation queries on the output of pattern queries—we develop these queries in order to more fully understand the effects of approximation on a wider set of imprecise stream queries.  相似文献   

8.
Monitoring systems today often involve continuous queries over streaming data in a distributed collaborative fashion. The distribution of query operators over a network of processors, as well as their processing sequence, form a query configuration with inherent constraints on the throughput that it can support. In this paper, we discuss the implications of measuring and optimizing for output throughput, as well as its limitations. We propose to use instead the more granular input throughput and a version of throughput measure, the profiled input throughput, that is focused on matching the expected behavior of the input streams. We show how we can evaluate a query configuration based on profiled input throughput and that the problem of finding the optimal configuration is NP-hard. Furthermore, we describe how we can overcome the complexity limitation by adapting hill-climbing heuristics to reduce the search space of configurations. We show experimentally that the approach used is not only efficient but also effective.  相似文献   

9.
Recently, due to the imprecise nature of the data generated from a variety of streaming applications, such as sensor networks, query processing on uncertain data streams has become an important problem. However, all the existing works on uncertain data streams study unbounded streams. In this paper, we take the first step towards the important and challenging problem of answering sliding-window queries on uncertain data streams, with a focus on one of the most important types of queries—top-k queries. It is nontrivial to find an efficient solution for answering sliding-window top-k queries on uncertain data streams, because challenges not only stem from the strict space and time requirements of processing both arriving and expiring tuples in high-speed streams, but also rise from the exponential blowup in the number of possible worlds induced by the uncertain data model. In this paper, we design a unified framework for processing sliding-window top-k queries on uncertain streams. We show that all the existing top-k definitions in the literature can be plugged into our framework, resulting in several succinct synopses that use space much smaller than the window size, while they are also highly efficient in terms of processing time. We also extend our framework to answering multiple top-k queries. In addition to the theoretical space and time bounds that we prove for these synopses, we present a thorough experimental report to verify their practical efficiency on both synthetic and real data.  相似文献   

10.
Continuous Query Processing of Spatio-Temporal Data Streams in PLACE   总被引:1,自引:0,他引:1  
The tremendous increase in the use of cellular phones, GPS-like devices, and RFIDs results in highly dynamic environments where objects as well as queries are continuously moving. In this paper, we present a continuous query processor designed specifically for highly dynamic environments (e.g., location-aware environments). We implemented the proposed continuous query processor inside the PLACE server (Pervasive Location-Aware Computing Environments); a scalable location-aware database server developed at Purdue University. The PLACE server extends data streaming management systems to support location-aware environments. These environments are characterized by the wide variety of continuous spatio-temporal queries and the unbounded spatio-temporal streams. The proposed continuous query processor includes: (1) New incremental spatio-temporal operators to support a wide variety of continuous spatio-temporal queries, (2) Extended semantics of sliding window queries to deal with spatial sliding windows as well as temporal sliding windows, and (3) A shared-execution framework for scalable execution of a set of concurrent continuous spatio-temporal queries. Experimental evaluation shows promising performance of the continuous query processor of the PLACE server. This work was supported in part by the National Science Foundation under Grants IIS-0093116, IIS-0209120, and 0010044-CCR.  相似文献   

11.
According to the database outsourcing model, a data owner delegates database functionality to a third-party service provider, which answers queries received from clients. Authenticated query processing enables the clients to verify the correctness of query results. Despite the abundance of methods for authenticated processing in conventional databases, there is limited work on outsourced data streams. Stream environments pose new challenges such as the need for fast structure updating, support for continuous query processing and authentication, and provision for temporal completeness. Specifically, in addition to the correctness of individual results, the client must be able to verify that there are no missing results in between data updates. This paper presents a comprehensive set of methods covering relational streams. We first describe REF, a technique that achieves correctness and temporal completeness but incurs false transmissions, i.e., the provider has to inform the clients whenever there is a data update, even if their results are not affected. Then, we propose CADS, which minimizes the processing and transmission overhead through an elaborate indexing scheme and a virtual caching mechanism. In addition, we present an analytical study to determine the optimal indexing granularity, and extend CADS for the case that the data distribution changes over time. Finally, we evaluate the effectiveness of our techniques through extensive experiments.  相似文献   

12.
A sliding-window k-NN query (k-NN/w query) continuously monitors incoming data stream objects within a sliding window to identify k closest objects to a query. It enables effective filtering of data objects streaming in at high rates from potentially distributed sources, and offers means to control the rate of object insertions into result streams. Therefore k-NN/w processing systems may be regarded as one of the prospective solutions for the information overload problem in applications that require processing of structured data in real-time, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams, where data sources are scattered over the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for such setting, and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates the feasibility of the proposed solution because it balances the load among the peers, while the messaging overhead within the P2P network remains reasonable. Moreover, our results clearly show the solution is scalable for an increasing number of queries and peers.  相似文献   

13.
RFID middleware collects and filters RFID streaming data to process applications' requests called continuous queries, because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Comparing with existing query indexes, the performance of proposed index outperforms the others on various datasets.  相似文献   

14.
Semantics preserving SPARQL-to-SQL translation   总被引:2,自引:0,他引:2  
Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which further can be optimized and evaluated by the relational query engine and their results can be returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema dependent translations in terms of efficient query evaluation and/or ensured query result correctness.  相似文献   

15.
As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed based on the assumption that query processing is done for all attributes in a static dataset with deterministic attribute values, some advanced work has been done recently to remove part of such a strong assumption in order to process skyline queries for real-life applications, namely, to deal with data with multi-valued attributes (known as data uncertainty), to support skyline queries in a subspace which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams. That is, to retrieve all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm.  相似文献   

16.
Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of duplicate tuples is required. One of the major problems that arises as part of advanced features of query optimization, data integration, query reformulation and many other research topics is testing for containment of such queries. In this work, we investigate the complexity of query containment problem for CQs under bag semantics (i.e. duplicate tuples are allowed in both the database and the results of queries) and under bag-set semantics (i.e. duplicates are allowed in the result of the queries but not in the database). We derive complexity results for these problems for five major subclasses of CQs; and we also find necessary conditions for CQ query containment. The general case of these problems remains open.  相似文献   

17.
On Similarity Measures for Multimedia Database Applications   总被引:1,自引:1,他引:0  
A multimedia database query consists of a set of fuzzy and boolean (or crisp) predicates, constants, variables, and conjunction, disjunction, and negation operators. The fuzzy predicates are evaluated based on different media criteria, such as color, shape, layout, keyword. Since media-based evaluation yields similarity values, results to such a query is defined as an ordered set. Since many multimedia applications require partial matches, query results also include tuples which do not satisfy all predicates. Hence, any fuzzy semantics which extends the boolean semantics of conjunction in a straight forward manner may not be desirable for multimedia databases. In this paper, we focus on the problem of ‘given a multimedia query which consists of multiple fuzzy and crisp predicates, how to provide the user with a meaningful overall ranking.’ More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures the retrieval requirements in multimedia databases. Received 13 August 1999 / Revised 13 May 2000 / Accepted in revised form 26 July 2000  相似文献   

18.
As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to think how to support keyword queries on probabilistic XML data. With regards to keyword query on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular compared with other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in aspects of result probability, time and space efficiency, and scalability.  相似文献   

19.
In a moving-object database system that supports continuous queries (CQ), an important problem is to keep the location data consistent with the actual locations of the entities being monitored, in order to produce correct query results. This goal is often difficult to achieve due to limited network resources. However, if an object is not required by any query, its value need not be refreshed. Based on this observation, we redefine the notion of temporal consistency of data items with respect to the query result, where only data items that are relevant to the CQs need to be fresh. To exploit this correctness definition, we develop an adaptive time-based update technique called query-result update (QRU). The advantage of this technique is that it identifies objects with different levels of significance to the correctness of query results. Locations of objects that have more impact to the query result are acquired more frequently than the ones that do not.  相似文献   

20.
Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin which shows that it can achieve significant performance gains in terms of network traffic.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号