Similar articles
Found 10 similar articles (search time: 46 ms)
1.
Efficient monitoring of skyline queries over distributed data streams   Total citations: 1, self-citations: 0, citations by others: 1
Data management and data mining over distributed data streams have recently received considerable attention within the database community. This paper is the first work to address skyline queries over distributed data streams, where streams derive from multiple horizontally split data sources. A skyline query returns the set of interesting objects that are not dominated by any other object in the base dataset. Previous work has concentrated on skyline computation over static data or centralized data streams. We present an efficient and effective algorithm called BOCS to handle this problem in the more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm, GridSky, and an associated communication protocol. Based on the strategy of progressive refinement in BOCS, the skyline is computed incrementally in two phases. In the first phase, local skylines on remote sites are maintained by GridSky; at each time step, only skyline increments on remote sites are sent to the coordinator. In the second phase, a global skyline is obtained by integrating the remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms that use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.
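
The dominance test and the two-phase idea described in this abstract can be illustrated with a minimal sketch. It is not BOCS or GridSky: it assumes smaller values are better in every dimension, uses a naive quadratic skyline routine, and the names `dominates`, `skyline`, and `merge_increment` are hypothetical helpers introduced only for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.

    Assumption: smaller values are preferred in all dimensions.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current skyline point
        # p survives, so drop the skyline points that p dominates
        result = [q for q in result if not dominates(p, q)]
        if p not in result:
            result.append(p)
    return result


def merge_increment(global_sky: List[Point], increment: List[Point]) -> List[Point]:
    """Coordinator step: fold a remote skyline increment into the global skyline."""
    return skyline(list(global_sky) + list(increment))


# Phase 1: each (hypothetical) remote site keeps only its local skyline.
site_a = skyline([(1, 9), (4, 4), (5, 5)])
site_b = skyline([(2, 8), (3, 3), (9, 1)])

# Phase 2: the coordinator integrates what the sites send.
global_sky = merge_increment([], site_a)
global_sky = merge_increment(global_sky, site_b)
print(sorted(global_sky))
```

In this toy run, site A never ships `(5, 5)` because `(4, 4)` dominates it locally, and the coordinator later discards `(4, 4)` once `(3, 3)` arrives from site B.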

2.
3.
4.
Sliding window-based frequent pattern mining over data streams   Total citations: 2, self-citations: 0, citations by others: 2
Finding frequent patterns in a continuous stream of transactions is critical for many applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. Even though numerous frequent pattern mining algorithms have been developed over the past decade, new solutions for handling stream data are still required because a data stream is a continuous, unbounded, and ordered sequence of data elements generated at a rapid rate. Extracting frequent patterns from the more recent data can therefore enhance the analysis of stream data. In this paper, we propose an efficient technique to discover the complete set of recent frequent patterns from a high-speed data stream over a sliding window. We develop a Compact Pattern Stream tree (CPS-tree) to capture the recent stream data content and efficiently remove the obsolete, old stream data content. We also introduce the concept of dynamic tree restructuring in our CPS-tree to produce a highly compact frequency-descending tree structure at runtime. The complete set of recent frequent patterns is obtained from the CPS-tree of the current window using an FP-growth mining technique. Extensive experimental analyses show that our CPS-tree is highly efficient in terms of memory and time complexity when finding recent frequent patterns from a high-speed data stream.
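
The sliding-window setting in this abstract can be sketched with a brute-force counter. This is not the CPS-tree and does not use FP-growth; it only shows how new transactions enter the window and obsolete ones leave it. The class `SlidingWindowMiner` and its parameters (`window_size`, `max_len`, `min_support`) are assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Brute-force frequent itemset counts over the most recent transactions."""

    def __init__(self, window_size: int, max_len: int = 2):
        self.window = deque()           # transactions currently in the window
        self.window_size = window_size  # number of transactions to keep
        self.max_len = max_len          # largest itemset size to enumerate
        self.counts = Counter()         # itemset -> support in the window

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Insert a new transaction and evict the oldest one if needed."""
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts.subtract(self._itemsets(old))

    def frequent(self, min_support: int):
        """Itemsets whose support in the current window reaches min_support."""
        return {s: c for s, c in self.counts.items() if c >= min_support}


miner = SlidingWindowMiner(window_size=3)
for t in (["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]):
    miner.add(t)
print(miner.frequent(min_support=2))
```

Enumerating every itemset per transaction grows exponentially with `max_len`, which is the kind of cost a compact tree structure is meant to avoid.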

5.
High utility pattern (HUP) mining over data streams has become a challenging research issue in data mining. As a data stream flows, old information may no longer be interesting in the current time period. Therefore, incremental HUP mining is necessary over data streams. Even though some methods have been proposed to discover recent HUPs by using a sliding window, they suffer from the level-wise candidate generation-and-test problem. Hence, they need a large amount of execution time and memory. Moreover, their data structures are not suitable for interactive mining. To solve these problems of the existing algorithms, we propose a novel tree structure, called HUS-tree (high utility stream tree), and a new algorithm, called HUPMS (high utility pattern mining over stream data), for incremental and interactive HUP mining over data streams with a sliding window. By capturing the important information of the stream data in an HUS-tree, our HUPMS algorithm can mine all the HUPs in the current window with a pattern growth approach. Furthermore, the HUS-tree is very efficient for interactive mining. Extensive performance analyses show that our algorithm is very efficient for incremental and interactive HUP mining over data streams and significantly outperforms the existing sliding window-based HUP mining algorithms.
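
To make the notion of a high utility pattern concrete, here is a minimal sketch under a commonly used definition that the abstract itself does not spell out: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, counted only when the transaction contains the whole itemset. It is a brute-force enumeration, not HUPMS or the HUS-tree, and all names and numbers below are hypothetical.

```python
from collections import deque
from itertools import combinations


def itemset_utility(itemset, transaction, profit):
    """Utility of itemset in one transaction (0 if not fully contained).

    transaction maps item -> purchased quantity; profit maps item -> unit profit.
    """
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * profit[item] for item in itemset)


def high_utility_itemsets(window, profit, min_utility, max_len=2):
    """Enumerate itemsets whose total utility in the window reaches min_utility."""
    items = sorted({item for t in window for item in t})
    result = {}
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            utility = sum(itemset_utility(itemset, t, profit) for t in window)
            if utility >= min_utility:
                result[itemset] = utility
    return result


profit = {"a": 5, "b": 1, "c": 2}   # illustrative unit profits
window = deque(maxlen=2)            # sliding window of the 2 latest transactions
for t in ({"a": 1, "b": 2}, {"b": 4, "c": 1}, {"a": 2, "c": 3}):
    window.append(t)                # deque drops the oldest transaction itself

hups = high_utility_itemsets(window, profit, min_utility=8)
print(hups)
```

Note that `("a", "c")` qualifies with a utility of 16 even though it appears in only one transaction of the window, which is what separates utility-based mining from frequency-based mining.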

6.
Skyline query processing over uncertain data streams has recently attracted considerable attention in the database community because it helps users make intelligent decisions over complex data in many real applications. Although many recent efforts have been devoted to skyline computation over data streams in a centralized environment, typically with one processor, they cannot be readily adapted to skyline queries over complex uncertain streaming data, due to the computational complexity of the query and the limited processing capability. Furthermore, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, as they were all developed for parallel skyline queries over static, certain data sets. In this paper, we formally define the parallel query problem over uncertain data streams with the sliding window streaming model. In particular, we propose for the first time an effective framework, named the distributed parallel framework, that addresses the problem based on sliding window partitioning. Furthermore, we propose an efficient approach (parallel streaming skyline) to further optimize the parallel skyline computation with an optimized streaming item mapping strategy and a grid index. Extensive experiments with real deployment over synthetic and real data are conducted to demonstrate the effectiveness and efficiency of the proposed techniques.
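
A small single-node sketch may help show what a skyline over uncertain streaming data can mean. It assumes a model the abstract does not state: each item exists independently with a given probability, smaller values are better, and an item's skyline probability is its own existence probability times the probability that none of its dominators in the window exists. It does not implement the distributed parallel framework, the item mapping strategy, or the grid index; all names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]
UncertainItem = Tuple[Point, float]  # (attribute vector, existence probability)


def dominates(a: Point, b: Point) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(window: Iterable[UncertainItem]) -> Dict[Point, float]:
    """P(p in skyline) = P(p exists) * product of (1 - P(q exists)) over dominators q.

    Assumes independent existence and distinct points within the window.
    """
    items = list(window)
    probs: Dict[Point, float] = {}
    for p, p_exist in items:
        prob = p_exist
        for q, q_exist in items:
            if dominates(q, p):
                prob *= 1.0 - q_exist
        probs[p] = prob
    return probs


def probabilistic_skyline(window: Iterable[UncertainItem], threshold: float) -> List[Point]:
    """Items whose skyline probability reaches the threshold."""
    return [p for p, pr in skyline_probabilities(window).items() if pr >= threshold]


window = deque(maxlen=3)  # count-based sliding window over the stream
stream = [((0, 0), 0.9), ((1, 1), 0.5), ((2, 2), 0.8), ((0, 3), 0.9)]
for item in stream:
    window.append(item)   # (0, 0) is evicted when the fourth item arrives

result = probabilistic_skyline(window, threshold=0.45)
print(result)
```

Once `(0, 0)` leaves the window, `(1, 1)` is no longer dominated, so its skyline probability rises to its existence probability; this dependence on expiry is what makes the streaming case harder than the static one.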

7.
A closed frequent pattern mining method for distributed data streams   Total citations: 1, self-citations: 0, citations by others: 1

8.
High utility sequential pattern (HUSP) mining has emerged as an important topic in data mining. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Streaming data are fast changing, continuously generated, and unbounded in quantity. Such data can easily exhaust computer resources (e.g., memory) unless proper resource-aware mining is performed. In this study, we explore the fundamental problem of how limited memory can be best utilized to produce high quality HUSPs over a data stream. We design an approximation algorithm, called MAHUSP, that employs memory adaptive mechanisms to use a bounded portion of memory in order to efficiently discover HUSPs over data streams. An efficient tree structure, called MAS-Tree, is proposed to store potential HUSPs over a data stream. MAHUSP guarantees that all HUSPs are discovered in certain circumstances. Our experimental study shows that our algorithm can not only discover HUSPs over data streams efficiently, but also adapt to memory allocation with limited sacrifices in the quality of discovered HUSPs. Furthermore, in order to show the effectiveness and efficiency of MAHUSP in real-life applications, we apply our proposed algorithm to a web clickstream dataset obtained from a Canadian news portal to showcase users’ reading behavior, and to a real biosequence database to identify disease-related gene regulation sequential patterns. The results show that MAHUSP effectively discovers useful and meaningful patterns in both cases.

9.
Skyline queries are important in circumstances that require decision-making support. Existing work on skyline queries is based mainly on the assumption that the datasets are static. Querying skylines over moving objects, however, is also important and requires more attention. In this paper, we propose a framework, namely PRISMO, for processing predictive skyline queries over moving objects that contain not only spatio-temporal information but also non-spatial dimensions, such as other dynamic and static attributes. We present two schemes, RBBS (branch-and-bound skyline with rescanning and repacking) and TPBBS (time-parameterized branch-and-bound skyline), each with two alternative methods, to handle predictive skyline computation. The basic TPBBS is further extended to TPBBSE (TPBBS with expansion) to reduce memory space consumption and CPU time. Our schemes are flexible and can thus process point, range, and subspace predictive skyline queries. Extensive experiments show that our proposed schemes can handle predictive skyline queries effectively, and that TPBBS significantly outperforms RBBS.

10.