期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient mining of skyline objects in subspaces over data streams

Zhenhua Huang Shengli Sun Wei Wang 《Knowledge and Information Systems》2010,22(2):159-183

Given a set of k-dimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of k dimensions in stream environments. This paper focuses on supporting concurrent and unpredictable subspace skyline queries over data streams. Simply to compute and store the skyline objects of every subspace in stream environments will incur expensive update cost. To balance the query cost and update cost, we only maintain the full space skyline in this paper. We first propose an efficient maintenance algorithm and several novel pruning techniques. Then, an efficient and scalable two-phase algorithm is proposed to process the skyline queries in different subspaces based on the full space skyline. Furthermore, we present the theoretical analyses and extensive experiments that demonstrate our method is both efficient and effective. 相似文献

2.

Continuous Skyline Queries for Moving Objects 总被引：3，自引：0，他引：3

Zhiyong Huang Hua Lu Beng Chin Ooi Tung A.K.H. 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(12):1645-1658

The literature on skyline algorithms has so far dealt mainly with queries of static query points over static data sets. With the increasing number of mobile service applications and users, however, the need for continuous skyline query processing has become more pressing. A continuous skyline query involves not only static dimensions, but also the dynamic one. In this paper, we examine the spatiotemporal coherence of the problem and propose a continuous skyline query processing strategy for moving query points. First, we distinguish the data points that are permanently in the skyline and use them to derive a search bound. Second, we investigate the connection between the spatial positions of data points and their dominance relationship, which provides an indication of where to find changes in the skyline and how to maintain the skyline continuously. Based on the analysis, we propose a kinetic-based data structure and an efficient skyline query processing algorithm. We concisely analyze the space and time costs of the proposed method and conduct an extensive experiment to evaluate the method. To the best of our knowledge, this is the first work on continuous skyline query processing 相似文献

3.

Continuous Probabilistic Subspace Skyline Query Processing Using Grid Projections

下载免费PDF全文

赵雷杨艳艳周晓方《计算机科学技术学报》2014,29(2):332-344

As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed based on the assumption that query processing is done for all attributes in a static dataset with deterministic attribute values, some advanced work has been done recently to remove part of such a strong assumption in order to process skyline queries for real-life applications, namely, to deal with data with multi-valued attributes （known as data uncertainty）, to support skyline queries in a subspace which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams. That is, to retrieve all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm. 相似文献

4.

Recommendations for two-way selections using skyline view queries

Jian Chen Jin Huang Bin Jiang Jian Pei Jian Yin 《Knowledge and Information Systems》2013,34(2):397-424

We study a practical and novel problem of making recommendations between two parties such as applicants and job positions. We model the competent choices of each party using skylines. In order to make recommendations in various scenarios, we propose a series of skyline view queries. To make recommendations, we often need to answer skyline view queries for many entries in one or two parties in batch, such as for many applicants versus many jobs. However, the existing skyline computation algorithms focus on answering a single skyline query at a time and do not consider sharing computation when answering skyline view queries for many members in one party or both parties. To tackle the batch recommendation problem, we develop several efficient algorithms to process skyline view queries in batch. The experiment results demonstrate that our algorithms significantly outperform the state-of-the-art methods. 相似文献

5.

Efficient Skyline and Top-k Retrieval in Subspaces

Yufei Tao Xiaokui Xiao Jian Pei 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(8):1072-1088

Skyline and top-k queries are two popular operations for preference retrieval. In practice, applications that require these operations usually provide numerous candidate attributes, whereas, depending on their interests, users may issue queries regarding different subsets of the dimensions. The existing algorithms are inadequate for subspace skyline/top-k search because they have at least one of the following defects: 1) they require scanning the entire database at least once, 2) they are optimized for one subspace but incur significant overhead for other subspaces, or 3) they demand expensive maintenance cost or space consumption. In this paper, we propose a technique SUBSKY, which settles both types of queries by using purely relational technologies. The core of SUBSKY is a transformation that converts multidimensional data to one-dimensional (1D) values. These values are indexed by a simple B-tree, which allows us to answer subspace queries by accessing a fraction of the database. SUBSKY entails low maintenance overhead, which equals the cost of updating a traditional B-tree. Extensive experiments with real data confirm that our technique outperforms alternative solutions significantly in both efficiency and scalability. 相似文献

6.

Toward efficient multidimensional subspace skyline computation

Jongwuk Lee Seung-won Hwang 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(1):129-145

Skyline queries have attracted considerable attention to assist multicriteria analysis of large-scale datasets. In this paper, we focus on multidimensional subspace skyline computation that has been actively studied for two approaches. First, to narrow down a full-space skyline, users may consider multiple subspace skylines reflecting their interest. For this purpose, we tackle the concept of a skycube, which consists of all possible non-empty subspace skylines in a given full space. Second, to understand diverse semantics of subspace skylines, we address skyline groups in which a skyline point (or a set of skyline points) is annotated with decisive subspaces. Our primary contributions are to identify common building blocks of the two approaches and to develop orthogonal optimization principles that benefit both approaches. Our experimental results show the efficiency of proposed algorithms by comparing them with state-of-the-art algorithms in both synthetic and real-life datasets. 相似文献

7.

Spatial skyline queries: exact and approximation algorithms

Mu-Woong Lee Wanbin Son Hee-Kap Ahn Seung-won Hwang 《GeoInformatica》2011,15(4):665-697

As more data-intensive applications emerge, advanced retrieval semantics, such as ranking and skylines, have attracted the attention of researchers. Geographic information systems are a good example of an application using a massive amount of spatial data. Our goal is to efficiently support exact and approximate skyline queries over massive spatial datasets. A spatial skyline query, consisting of multiple query points, retrieves data points that are not father than any other data points, from all query points. To achieve this goal, we present a simple and efficient algorithm that computes the correct results, also propose a fast approximation algorithm that returns a desirable subset of the skyline results. In addition, we propose a continuous query algorithm to trace changes of skyline points while a query point moves. To validate the effectiveness and efficiency of our algorithm, we provide an extensive empirical comparison between our algorithms and the best known spatial skyline algorithms from several perspectives. 相似文献

8.

PRISMO: predictive skyline query processing over moving objects

Nan CHEN Li-dan SHOU Gang CHEN Yun-jun GAO Jin-xiang DONG 《浙江大学学报:C卷英文版》2012,(2):99-117

Skyline query is important in the circumstances that require the support of decision making. The existing work on skyline queries is based mainly on the assumption that the datasets are static. Querying skylines over moving objects, however, is also important and requires more attention. In this paper, we propose a framework, namely PRISMO, for processing predictive skyline queries over moving objects that not only contain spatio-temporal information, but also include non-spatial dimensions, such as other dynamic and static attributes. We present two schemes, RBBS (branch-and-bound skyline with rescanning and repacking) and TPBBS (time-parameterized branch-and-bound skyline), each with two alternative methods, to handle predictive skyline computation. The basic TPBBS is further extended to TPBBSE (TPBBS with expansion) to enhance the performance of memory space consumption and CPU time. Our schemes are flexible and thus can process point, range, and subspace predictive skyline queries. Extensive experiments show that our proposed schemes can handle predictive skyline queries effectively, and that TPBBS significantly outperforms RBBS. 相似文献

9.

Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods

Bin Jiang Jian Pei Xuemin Lin Yidong Yuan 《Journal of Intelligent Information Systems》2012,38(1):1-39

Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1). Computing probabilistic skylines on large uncertain data sets is challenging. We develop a bounding-pruning-refining framework and three algorithms systematically. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Combining the advantages of the bottom-up algorithm and the top-down algorithm, we develop a hybrid algorithm to further improve the performance. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our algorithms are efficient on large data sets. 相似文献

10.

An efficient approach to finding potential products continuously

《Information Systems》2017

Skyline points and queries are important in the context of processing datasets with multiple dimensions. As skyline points can be viewed as representing marketable products that are useful for clients and business owners, one may also consider non-skyline points that are highly competitive with the current skyline points. We address the problem of continuously finding such potential products from a dynamic d-dimensional dataset, and formally define a potential product and its upgrade promotion cost. In this paper, we propose the CP-Sky algorithm, an efficient approach for continuously evaluating potential products by utilizing a second-order skyline set, which consists of candidate points that are closest to regular skyline points (also termed the first-order skyline set), to facilitate efficient computations and updates for potential products. With the knowledge of the second-order skyline set, CP-Sky enables the system to (1) efficiently find substitute skyline points from the second-order skyline set only if a first-order skyline point is removed, and (2) continuously retrieve the top-k potential products. Within this context, the Approximate Exclusive Dominance Region algorithm (AEDR) is proposed to reduce the computational complexity of determining a candidate set for second-order skyline updates over a dynamic data set without affecting the result accuracy. Additionally, we extend the CP-Sky algorithm to support the computations of top-k potential products. Finally, we present experimental results on data sets with various distributions to demonstrate the performance and utility of our approach. 相似文献

11.

Efficient Processing of Metric Skyline Queries

Lei Chen Xiang Lian 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(3):351-365

Skyline query is of great importance in many applications, such as multi-criteria decision making and business planning. In particular, a skyline point is a data object in the database whose attribute vector is not dominated by that of any other objects. Previous methods to retrieve skyline points usually assume static data objects in the database (i.e. their attribute vectors are fixed), whereas several recent work focus on skyline queries with dynamic attributes. In this paper, we propose a novel variant of skyline queries, namely metric skyline, whose dynamic attributes are defined in the metric space (i.e. not limited to the Euclidean space). We illustrate an efficient and effective pruning mechanism to answer metric skyline queries through a metric index. Most importantly, we formalize the query performance of the metric skyline query in terms of the pruning power, by a cost model, in light of which we construct an optimized metric index aiming to maximize the pruning power of metric skyline queries. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques as well as the constructed index in answering metric skyline queries. 相似文献

12.

数值和名义属性混合数据空间上的轮廓体查询方法

张忠平夏炎李立宁《小型微型计算机系统》2011,32(6):1157-1163

近年来,数值和名义属性混合数据空间上的轮廓查询仅局限在单个空间上进行,而实际应用中存在对不同子空间轮廓查询的需求.为此,本文结合IPO-tree Search半物化轮廓的方法,定义了半物化轮廓体的概念,提出通过共享子空间轮廓结果集及查询条件计算半物化轮廓体的算法SMS,并设计了存储半物化轮廓体的索引结构NNAS-tre... 相似文献

13.

一种并行处理Skyline查询的有效方法

黄震华向阳薛永生赵杠《自动化学报》2010,36(7):968-975

Skyline查询是近年来数据库领域的一个研究重点和热点, 这主要是因为Skyline查询在许多领域有着广泛的应用. 现有的工作大都集中于单处理机环境, 然而, 由于Skyline查询是CPU敏感的, 因此,在实际应用中, 现有的方法具有很大的局限性. 基于此, 提出一种有效降低处理Skyline查询时间开销的并行算法PAPSQ (Parallel algorithm for processing skyline queries). 算法有机结合多维数据对象的自身特性和通用多处理机系统的实施优点, 以Skyline查询搜索偏序格为底层结构, 利用多维数据对象的同胚评估值和偏序格加权技术来有效提高并行处理Skyline查询的效率. 实验评估表明, PAPSQ算法具有有效性和实用性. 相似文献

14.

Efficient k-dominant skyline query over incomplete data using MapReduce

Linlin DING Shu WANG Baoyan SONG 《Frontiers of Computer Science》2021,15(4):154611

Skyline queries are extensively incorporated in various real-life applications by filtering uninteresting data objects. Sometimes, a skyline query may return so many results because it cannot control the retrieval conditions especially for highdimensional datasets. As an extension of skyline query, the kdominant skyline query reduces the control of the dimension by controlling the value of the parameter kto achieve the purpose of reducing the retrieval objects. In addition, with the continuous promotion of Bigdata applications, the data we acquired may not have the entire content that people wanted for some practically reasons of delivery failure, no power of battery, accidental loss, so that the data might be incomplete with missing values in some attributes. Obviously, the k-dominant skyline query algorithms of incomplete data depend on the user definition in some degree and the results cannot be shared. Meanwhile, the existing algorithms are unsuitable for directly used to the incomplete big data. Based on the above situations, this paper mainly studies k-dominant skyline query problem over incomplete dataset and combines this problem with the distributed structure like MapReduce environment. First, we propose an index structure over incomplete data, named incomplete data index based on dominate hierarchical tree (ID-DHT). Applying the bucket strategy, the incomplete data is divided into different buckets according to the dimensions of missing attributes. Second, we also put forward query algorithm for incomplete data in MapReduce environment, named MapReduce incomplete data based on dominant hierarchical tree algorithm (MR-ID-DHTA). The data in the bucket is allocated to the subspace according to the dominant condition by Map function. Reduce function controls the data according to the key value and returns the k-dominant skyline query result. The effective experiments demonstrate the validity and usability of our index structure and the algorithm. 相似文献

15.

Personalized top-k skyline queries in high-dimensional space

Jongwuk Lee Gae-won YouSeung-won Hwang 《Information Systems》2009

As data of an unprecedented scale are becoming accessible, it becomes more and more important to help each user identify the ideal results of a manageable size. As such a mechanism, skyline queries have recently attracted a lot of attention for its intuitive query formulation. This intuitiveness, however, has a side effect of retrieving too many results, especially for high-dimensional data. This paper is to support personalized skyline queries as identifying “truly interesting” objects based on user-specific preference and retrieval size k. In particular, we abstract personalized skyline ranking as a dynamic search over skyline subspaces guided by user-specific preference. We then develop a novel algorithm navigating on a compressed structure itself, to reduce the storage overhead. Furthermore, we also develop novel techniques to interleave cube construction with navigation for some scenarios without a priori structure. Finally, we extend the proposed techniques for user-specific preferences including equivalence preference. Our extensive evaluation results validate the effectiveness and efficiency of the proposed algorithms on both real-life and synthetic data. 相似文献

16.

Monitoring distributed fragmented skylines

Odysseas Papapetrou Minos Garofalakis 《Distributed and Parallel Databases》2018,36(4):675-715

Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is to efficiently monitor a continuous skyline query over a collection of distributed streams. All existing work relies on the assumption of a single point of reference for object attributes/dimensions: objects may be vertically or horizontally partitioned, but the accurate value of each dimension for each object is always maintained by a single site. This assumption is unrealistic for several distributed applications, where object information is fragmented over a set of distributed streams (each monitored by a different site) and needs to be aggregated (e.g., averaged) across several sites. Furthermore, it is frequently useful to define skyline dimensions through complex functions over the aggregated objects, which raises further challenges for dealing with distribution and object fragmentation. We present the first known distributed algorithms for continuous monitoring of skylines over complex functions of fragmented multi-dimensional objects. Our algorithms rely on decomposition of the skyline monitoring problem to a select set of distributed threshold-crossing queries, which can be monitored locally at each site. We propose several optimizations, including: (a) a technique for adaptively determining the most efficient monitoring strategy for each object, (b) an approximate monitoring technique, and (c) a strategy that reduces communication overhead by grouping together threshold-crossing queries. Furthermore, we discuss how our proposed algorithms can be used to address other continuous query types. A thorough experimental study with synthetic and real-life data sets verifies the effectiveness of our schemes and demonstrates order-of-magnitude improvements in communication costs compared to the only alternative centralized solution. 相似文献

17.

Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index

Xiaoyong Li Yijie Wang Xiaoling Li Yuan Wang 《Knowledge and Information Systems》2014,41(2):277-309

Skyline query processing over uncertain data streams has attracted considerable attention in database community recently, due to its importance in helping users make intelligent decisions over complex data in many real applications. Although lots of recent efforts have been conducted to the skyline computation over data streams in a centralized environment typically with one processor, they cannot be well adapted to the skyline queries over complex uncertain streaming data, due to the computational complexity of the query and the limited processing capability. Furthermore, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, as they are all developed to address the problem of parallel skyline queries over static certain data sets. In this paper, we formally define the parallel query problem over uncertain data streams with the sliding window streaming model. Particularly, for the first time, we propose an effective framework, named distributed parallel framework to address the problem based on the sliding window partitioning. Furthermore, we propose an efficient approach (parallel streaming skyline) to further optimize the parallel skyline computation with an optimized streaming item mapping strategy and the grid index. Extensive experiments with real deployment over synthetic and real data are conducted to demonstrate the effectiveness and efficiency of the proposed techniques. 相似文献

18.

Understanding the meaning of a shifted sky: a general framework on extending skyline query

Zhenjie Zhang Hua Lu Beng Chin Ooi Anthony K. H. Tung 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(2):181-201

Skyline queries are often used on data sets in multi-dimensional space for many decision-making applications. Traditionally, an object p is said to dominate another object q if, for all dimensions, it is no worse than q and is better on at least one dimension. Therefore, the skyline of a data set consists of all objects not dominated by any other object. To better cater to application requirements such as controlling the size of the skyline or handling data sets that are not well-structured, various works have been proposed to extend the definition of skyline based on variants of the dominance relationship. In view of the proliferation of variants, in this paper, a generalized framework is proposed to guide the extension of skyline query from conventional definition to different variants. Our framework explicitly and carefully examines the various properties that should be preserved in a variant of the dominance relationship so that: (1) maintaining original advantages, while extending adaptivity to application semantics, and (2) keeping computational complexity almost unaffected. We prove that traditional dominance is the only relationship satisfying all desirable properties, and present some new dominance relationships by relaxing some of the properties. These relationships are general enough for us to design new top-k skyline queries that return robust results of a controllable size. We analyze the existing skyline algorithms based on their minimum requirements on dominance properties. We also extend our analysis to data sets with missing values, and present extensive experimental results on the combinations of new dominance relationships and skyline algorithms. 相似文献

19.

Efficient Optimization of Multiple Subspace Skyline Queries

下载免费PDF全文

黄震华郭建奎孙圣力汪卫《计算机科学技术学报》2008,23(1):103-111

We present the first efficient sound and complete algorithm （i.e., AOMSSQ） for optimizing multiple subspace skyline queries simultaneously in this paper. We first identify three performance problems of the na/ve approach （i.e., SUBSKY） which can be used in processing arbitrary single-subspace skyline query. Then we propose a cell-dominance computation algorithm （i.e., CDCA） to efficiently overcome the drawbacks of SUBSKY. Specially, a novel pruning technique is used in CDCA to dramatically decrease the query time. Finally, based on the CDCA algorithm and the share mechanism between subspaces, we present and discuss the AOMSSQ algorithm and prove it sound and complete. We also present detailed theoretical analyses and extensive experiments that demonstrate our algorithms are both efficient and effective. 相似文献

20.

Efficient continuous skyline computation

M. Morse J.M. Patel 《Information Sciences》2007,177(17):3411-3437

In a number of emerging streaming applications, the data values that are produced have an associated time interval for which they are valid. A useful computation over such streaming data is to produce a continuous and valid skyline summary. Previous work on skyline algorithms have only focused on evaluating skylines over static data sets, and there are no known algorithms for skyline computation in the continuous setting. In this paper, we introduce the continuous time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm. In addition, we also examine the effect of the underlying spatial index structure when evaluating skylines. Whereas previous work on skyline computations have only considered using the R^∗-tree index structure, we show that for skyline computations using an underlying quadtree has significant performance benefits over an R^∗-tree index. 相似文献