首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
With the wide spread of smartphones, a large number of user-generated videos are produced everyday. The embedded sensors, e.g., GPS and the digital compass, make it possible that videos are accessed based on their geo-properties. In our previous work, we have created a framework for integrated, sensor-rich video acquisition (with one instantiation implemented in the form of smartphone applications) which associates a continuous stream of location and viewing direction information with the collected videos, hence allowing them to be expressed and manipulated as spatio-temporal objects. These sensor meta-data are considerably smaller in size compared to the visual content and are helpful in effectively and efficiently searching for geo-tagged videos in large-scale repositories. In this study, we propose a novel three-level grid-based index structure and introduce a number of related query types, including typical spatial queries and ones based on bounded radius and viewing direction restriction. These two criteria are important in many video applications and we demonstrate the importance with a real-world dataset. Moreover, experimental results on a large-scale synthetic dataset show that our approach can provide a significant speed improvements of at least 30 %, considering a mix of queries, compared to a multi-dimensional R-tree implementation.  相似文献   

2.
The CQL continuous query language: semantic foundations and query execution   总被引:2,自引:0,他引:2  
CQL, a continuous query language, is supported by the STREAM prototype data stream management system (DSMS) at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and stored relations. We begin by presenting an abstract semantics that relies only on “black-box” mappings among streams and relations. From these mappings we define a precise and general interpretation for continuous queries. CQL is an instantiation of our abstract semantics using SQL to map from relations to relations, window specifications derived from SQL-99 to map from streams to relations, and three new operators to map from relations to streams. Most of the CQL language is operational in the STREAM system. We present the structure of CQL's query execution plans as well as details of the most important components: operators, interoperator queues, synopses, and sharing of components among multiple operators and queries. Examples throughout the paper are drawn from the Linear Road benchmark recently proposed for DSMSs. We also curate a public repository of data stream applications that includes a wide variety of queries expressed in CQL. The relative ease of capturing these applications in CQL is one indicator that the language contains an appropriate set of constructs for data stream processing. Edited by M. Franklin  相似文献   

3.
Due to the resource limitation in the data stream environments, it has been reported that answering user queries according to the wavelet synopsis of a stream is an essential ability of a data stream management system (DSMS). In the literature, recent research has been elaborated upon minimizing the local error metric of an individual stream. However, many emergent applications such as stock marketing and sensor detection also call for the need of recording multiple streams in a commercial DSMS. As shown in our thorough analysis and experimental studies, minimizing global error in multiple-stream environments leads to good reliability for DSMS to answer the queries. In contrast, only minimizing local error may lead to a significant loss of query accuracy. As such, we first study in this paper the problem of maintaining the wavelet coefficients of multiple streams within collective memory so that the predetermined global error metric is minimized. Moreover, we also examine a promising application in the multistream environment, that is, the queries for top-k range sum. We resolve the problem of efficient top-k query processing with minimized global error by developing a general framework. For the purposes of maintaining the wavelet coefficients and processing top-k queries, several well- designed algorithms are utilized to optimize the performance of each primary component of this general framework. We also evaluate the proposed algorithms empirically on real and simulated data streams and show that our framework can process top-k queries accurately and efficiently.  相似文献   

4.
Optimization and evaluation of disjunctive queries   总被引:2,自引:0,他引:2  
It is striking that the optimization of disjunctive queries-i.e. those which contain at least one OR-connective in the query predicate-has been vastly neglected in the literature, as well as in commercial systems. In this paper, we propose a novel technique, called bypass processing, for evaluating such disjunctive queries. The bypass processing technique is based on new selection and join operators that produce two output streams: the TRUE-stream with tuples satisfying the selection (join) predicate and the FALSE-stream with tuples not satisfying the corresponding predicate. Splitting the tuple streams in this way enables us to “bypass” costly predicates whenever the “fate” of the corresponding tuple (stream) can be determined without evaluating this predicate. In the paper, we show how to systematically generate bypass evaluation plans utilizing a bottom-up building-block approach. We show that our evaluation technique allows us to incorporate the standard SQL semantics of null values. For this, we devise two different approaches: one is based on explicitly incorporating three-valued logic into the evaluation plans; the other one relies on two-valued logic by “moving” all negations to atomic conditions of the selection predicate. We describe how to extend an iterator-based query engine to support bypass evaluation with little extra overhead. This query engine was used to quantitatively evaluate the bypass evaluation plans against the traditional evaluation techniques utilizing a CNFor DNF-based query predicate  相似文献   

5.
Due to the resource limitation in the data stream environments, it has been reported that answering user queries according to the wavelet synopsis of a stream is an essential ability of a Data Stream Management System (DSMS). In the literature, recent research has been elaborated upon minimizing the local error metric of an individual stream. However, many emergent applications, such as stock marketing and sensor detection, also call for the need of recording multiple streams in a commercial DSMS. As shown in our thorough analysis and experimental studies, minimizing global error in multiple-stream environments leads to good reliability for DSMS to answer the queries; in contrast, only minimizing local error may lead to significant loss of query accuracy. As such, we first study in this paper the problem of maintaining the wavelet coefficients of multiple streams within collective memory so that the predetermined global error metric is minimized. Moreover, we also examine a promising application in the multistream environment, i.e., the queries for top-k range sum. We resolve the problem of efficient top-k query processing with minimized global error by developing a general framework. For the purposes of maintaining the wavelet coefficients and processing top-k queries, several well-designed algorithms are utilized to optimize the performance of each primary component of this general framework. We also evaluate the proposed algorithms empirically on real and simulated data streams and show that our framework can process top-k queries accurately and efficiently.  相似文献   

6.
7.
Query processing for a data stream should also be continuous and rapid. This article proposes a novel approach for consistent collective evaluation of multiple continuous queries for filtering two different types of data streams: a relational stream and an XML stream. The proposed approach commonly provides region-based selection constructs: an attribute selection construct for relational queries and a path selection construct for XPath queries. Both collectively evaluate the selection predicates of the same attribute (path), based on the precomputed matching results of the queries in each of the disjoint regions divided by the selection predicates. The performance experiments show that the proposed approach is practically more efficient and stable than other approaches at run-time.  相似文献   

8.
Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin which shows that it can achieve significant performance gains in terms of network traffic.  相似文献   

9.
This paper presents the scalable on-line execution (SOLE) algorithm for continuous and on-line evaluation of concurrent continuous spatio-temporal queries over data streams. Incoming spatio-temporal data streams are processed in-memory against a set of outstanding continuous queries. The SOLE algorithm utilizes the scarce memory resource efficiently by keeping track of only the significant objects. In-memory stored objects are expired (i.e., dropped) from memory once they become insignificant. SOLE is a scalable algorithm where all the continuous outstanding queries share the same buffer pool. In addition, SOLE is presented as a spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal queries. To cope with intervals of high arrival rates of objects and/or queries, SOLE utilizes a load-shedding approach where some of the stored objects are dropped from memory. SOLE is implemented as a pipelined query operator that can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries. Performance experiments based on a real implementation of SOLE inside a prototype of a data stream management system show the scalability and efficiency of SOLE in highly dynamic environments. This work was supported in part by the National Science Foundation under Grants IIS-0093116, IIS-0209120, and 0010044-CCR.  相似文献   

10.
Uncertain data streams, where data are incomplete and imprecise, have been observed in many environments. Feeding such data streams to existing stream systems produces results of unknown quality, which is of paramount concern to monitoring applications. In this paper, we present the claro system that supports stream processing for uncertain data naturally captured using continuous random variables. claro employs a unique data model that is flexible and allows efficient computation. Built on this model, we develop evaluation techniques for relational operators by exploring statistical theory and approximation. We also consider query planning for complex queries given an accuracy requirement. Evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements and outperform state-of-the-art sampling methods.  相似文献   

11.
The filtering of incoming tuples of a data stream should be completed quickly and continuously, which requires strict time and space constraints. In order to guarantee these constraints, the selection predicates of continuous queries are grouped or indexed in most data stream management systems (DSMS). This paper proposes a new scheme called attribute selection construct (ASC). Given a set of continuous queries, an ASC divides the domain of an attribute of a data stream into a set of disjoint regions based on the selection predicates that are imposed on the attribute. Each region maintains the pre-computed matching results of the selection predicates. Consequently, an ASC can collectively evaluate all of its selection predicates at the same time. Furthermore, it can also monitor the overall evaluation statistics, such as its selectivity and tuple dropping ratio, dynamically. For those attributes that are employed to express the selection predicates of the queries, the processing order of their ASC’s can significantly influence the overall performance of a multiple query evaluation. The evaluation sequence can be optimized by periodically capturing the run-time tuple dropping ratio of its current evaluation sequence. The performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.  相似文献   

12.
Many systems and strategies have been proposed for processing nonterminating data streams. Each approach has advantages and disadvantages, including the kinds of queries that can be executed. We present a framework for characterizing the kinds of queries that can be executed over streams based on a notion of compact sets from topology. We first apply our framework to queries over punctuated data streams. Previous work on punctuations focused primarily on the behavior of individual query operators. We use our framework to determine if an entire query can benefit from punctuations available from stream sources. We then consider other common strategies proposed in the literature for executing queries over streams, and we discuss how our framework can characterize the kinds of queries each strategy can answer.  相似文献   

13.
数据流上的预测聚集查询处理算法   总被引:19,自引:3,他引:16  
实时数据流未来趋势的预测具有重要的实际应用意义.例如,在环境监测传感器网络中,通过对感知数据流进行预测聚集查询,观察者可以预测网络覆盖的区域在未来一段时间内的平均温度和湿度,以确定是否会发生异常事件.目前的研究工作多数集中在数据流上当前数据的查询,数据流上预测查询的研究工作还很少.采用多元线性回归方法,给出了数据流上的聚集值预测模型,提出了一种数据流预测聚集查询处理方法.当预测失败的次数大于预先给定的阈值时,给出了一种预测模型自动调整策略,以降低预测误差.还提出了滑动窗口的更新周期、数据流的流速对预测精度影响的数学模型.理论分析与实验结果表明,提出的预测聚集查询处理算法具有较高的性能,并且能够返回满足用户精度要求的预测查询结果.在实验中,采用TPC-H国际标准测试数据和TAO(tropical atmosphere ocean)测量的海洋表面空气温度数据来构造数据流.  相似文献   

14.
数据流处理中确定性QoS的保证方法   总被引:2,自引:0,他引:2  
武珊珊  于戈  吕雁飞  谷峪  李晓静 《软件学报》2008,19(8):2066-2079
与以往尽最大努力的查询服务提供方式不同,讨论了数据流处理中确定性QoS的保证问题以网络演算为理论基础,提出了一种数据流处理中的QoS建模和QoS保证方法.系统运行前验证所有查询在满足各自的QoS前提下的可调度性.在运行时为通过QoS可调度性验证的每个查询分配代表其QoS需求的服务曲线,从而保证各查询期望的QoS.为了提高查询处理效率,还讨论了保证QoS的批调度和查询共享.实验结果表明,该QoS保证方法能够有效地为数据流上的连续查询提供确定性的QoS保证.  相似文献   

15.
《Information Fusion》2008,9(3):412-424
Data processing applications for sensor streams have to deal with multiple continuous data streams with inputs arriving at highly variable and unpredictable rates from various sources. These applications perform various operations (e.g. filter, aggregate, join, etc.) on incoming data streams in real-time according to predefined queries or rules. Since the data rate and data distribution fluctuate over time, an appropriate join tree for processing join queries must be adaptively maintained in response to dynamic changes to prevent rapid degradation of the system performance. In this paper, we address the problem of finding an optimal join tree that maximizes throughput for sliding window based multi-join queries over continuous data streams and prove its NP-Hardness. We present a dynamic programming algorithm, OptDP, which produces the optimal tree but runs in an exponential time in the number of input streams. We then present a polynomial time greedy algorithm, XGreedyJoin. We tested these algorithms in ARES, an adaptively re-optimizing engine for stream queries, which we developed by extending Jess (Jess is a popular RETE-based, forward chaining rule engine written in java). For almost all instances, trees from XGreedyJoin perform close to the optimal trees from OptDP, and significantly better than common heuristics-based XJoin algorithms.  相似文献   

16.
A common approach to improve the reliability of query results based on error-prone sensors is to introduce redundant sensors. However, using multiple sensors to generate the value for a data item can be expensive, especially in wireless environments where continuous queries are executed. Moreover, some sensors may not be working properly and their readings need to be discarded. In this paper, we propose a statistical approach to decide which sensor nodes to be used to answer a query. In particular, we propose to solve the problem with the aid of continuous probabilistic query (CPQ), which is originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result. Based on the historical data values from the sensor nodes, the query type, and the requirement on the query, we present methods to select an appropriate set of sensors and provide reliable answers for several common aggregate queries. Our statistics-based sensor node selection algorithm is demonstrated in a number of simulation experiments, which shows that a small number of sensor nodes can provide accurate and robust query results.  相似文献   

17.
传感器采样数据流查询技术   总被引:3,自引:0,他引:3  
这里所讨论的数据不再是具有持久关系的数据集合,而是形成了瞬时的、多重的、持续的、迅速的、时间变化的数据流。由于具有了这些特性,数据流处理现状对数据管理的很多方面提出了新的研究方向。文章着重讨论数据流的查询技术和方法,特别提出了关于传感器采样数据流的查询。最后,介绍应用了数据流查询技术的管道煤气管网数据监测系统,进一步说明由传感器采样数据产生的数据流查询的设计思想和实现方案。  相似文献   

18.
Search engine query log mining has evolved over time to more like data stream mining due to the endless and continuous sequence of queries known as query stream. In this paper, we propose an online frequent sequence discovery (OFSD) algorithm to extract frequent phrases from within query streams, based on a new frequency rate metric, which is suitable for query stream mining. OFSD is an online, single pass, and real-time frequent sequence miner appropriate for data streams. The frequent phrases extracted by the OFSD algorithm are used to guide novice Web search engine users to complete their search queries more efficiently. YourEye, our online phrase recommender is then introduced. The advantages of YourEye compared with Google Suggest, a service powered by Google for phrase suggestion, is also described. Various characteristics of two specific Web search engine query logs are analyzed and then the query logs are used to evaluate YourEye. The experimental results confirm the significant benefit of monitoring frequent phrases within the queries instead of the whole queries because none-separable items. The number of the monitored elements substantially decreases, which results in smaller memory consumption as well as better performance. Re-ranking the retrieved pages based on past users clicks for each frequent phrase extracted by OFSD is also introduced. The preliminary results show the advantages of the proposed method compared to the similar work reported in Smyth et al.  相似文献   

19.
A sliding-window k-NN query (k-NN/w query) continuously monitors incoming data stream objects within a sliding window to identify k closest objects to a query. It enables effective filtering of data objects streaming in at high rates from potentially distributed sources, and offers means to control the rate of object insertions into result streams. Therefore k-NN/w processing systems may be regarded as one of the prospective solutions for the information overload problem in applications that require processing of structured data in real-time, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams, where data sources are scattered over the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for such setting, and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates the feasibility of the proposed solution because it balances the load among the peers, while the messaging overhead within the P2P network remains reasonable. Moreover, our results clearly show the solution is scalable for an increasing number of queries and peers.  相似文献   

20.
由于数据的动态性及不确定性等特征,使得不确定数据流上Skyline查询研究面临挑战.不确定对象一般采用多元概率密度函数(PDF)表示,现有的不确定数据流Skyline查询方法均采用离散型随机变量建模.然而不确定数据流中的对象可能是连续变化的,离散模型对连续性随机变量难以适用.针对连续PDF建模的不确定数据流Skyline查询进行了研究,提出了基于高斯模型的不确定数据流Skyline查询方法(SGMU),该方法包含2个过程:1)动态高斯建模算法(DGM):对滑动窗口采样并建立高斯模型,将原始的数据流转化为不确定对象PDF的参数流;2)提出了基于高斯树的查询算法(GTS)以建立空间索引结构和执行Skyline查询.实验结果表明,SGMU算法不仅能够对连续型不确定对象进行有效建模以辅助Skyline查询,而且能够有效地减少查询对象个数,提高Skyline查询效率.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号