首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Skyline query is important in the circumstances that require the support of decision making. The existing work on skyline queries is based mainly on the assumption that the datasets are static. Querying skylines over moving objects, however, is also important and requires more attention. In this paper, we propose a framework, namely PRISMO, for processing predictive skyline queries over moving objects that not only contain spatio-temporal information, but also include non-spatial dimensions, such as other dynamic and static attributes. We present two schemes, RBBS (branch-and-bound skyline with rescanning and repacking) and TPBBS (time-parameterized branch-and-bound skyline), each with two alternative methods, to handle predictive skyline computation. The basic TPBBS is further extended to TPBBSE (TPBBS with expansion) to enhance the performance of memory space consumption and CPU time. Our schemes are flexible and thus can process point, range, and subspace predictive skyline queries. Extensive experiments show that our proposed schemes can handle predictive skyline queries effectively, and that TPBBS significantly outperforms RBBS.  相似文献   

2.
We study a practical and novel problem of making recommendations between two parties such as applicants and job positions. We model the competent choices of each party using skylines. In order to make recommendations in various scenarios, we propose a series of skyline view queries. To make recommendations, we often need to answer skyline view queries for many entries in one or two parties in batch, such as for many applicants versus many jobs. However, the existing skyline computation algorithms focus on answering a single skyline query at a time and do not consider sharing computation when answering skyline view queries for many members in one party or both parties. To tackle the batch recommendation problem, we develop several efficient algorithms to process skyline view queries in batch. The experiment results demonstrate that our algorithms significantly outperform the state-of-the-art methods.  相似文献   

3.
Distributed skyline computation is important for a wide range of domains, from distributed and web-based systems to ISP-network monitoring and distributed databases. The problem is particularly challenging in dynamic distributed settings, where the goal is to efficiently monitor a continuous skyline query over a collection of distributed streams. All existing work relies on the assumption of a single point of reference for object attributes/dimensions: objects may be vertically or horizontally partitioned, but the accurate value of each dimension for each object is always maintained by a single site. This assumption is unrealistic for several distributed applications, where object information is fragmented over a set of distributed streams (each monitored by a different site) and needs to be aggregated (e.g., averaged) across several sites. Furthermore, it is frequently useful to define skyline dimensions through complex functions over the aggregated objects, which raises further challenges for dealing with distribution and object fragmentation. We present the first known distributed algorithms for continuous monitoring of skylines over complex functions of fragmented multi-dimensional objects. Our algorithms rely on decomposition of the skyline monitoring problem to a select set of distributed threshold-crossing queries, which can be monitored locally at each site. We propose several optimizations, including: (a) a technique for adaptively determining the most efficient monitoring strategy for each object, (b) an approximate monitoring technique, and (c) a strategy that reduces communication overhead by grouping together threshold-crossing queries. Furthermore, we discuss how our proposed algorithms can be used to address other continuous query types. A thorough experimental study with synthetic and real-life data sets verifies the effectiveness of our schemes and demonstrates order-of-magnitude improvements in communication costs compared to the only alternative centralized solution.  相似文献   

4.
Skyline queries have attracted considerable attention to assist multicriteria analysis of large-scale datasets. In this paper, we focus on multidimensional subspace skyline computation that has been actively studied for two approaches. First, to narrow down a full-space skyline, users may consider multiple subspace skylines reflecting their interest. For this purpose, we tackle the concept of a skycube, which consists of all possible non-empty subspace skylines in a given full space. Second, to understand diverse semantics of subspace skylines, we address skyline groups in which a skyline point (or a set of skyline points) is annotated with decisive subspaces. Our primary contributions are to identify common building blocks of the two approaches and to develop orthogonal optimization principles that benefit both approaches. Our experimental results show the efficiency of proposed algorithms by comparing them with state-of-the-art algorithms in both synthetic and real-life datasets.  相似文献   

5.
Skyline has been widely recognized as being useful for multi-criteria decision-making applications. While most of the existing work computes skylines in various contexts, in this paper, we consider a novel problem: how far away a point is from the skyline? We propose a novel notion of skyline distance that measures the minimum cost of upgrading a point to the skyline given a cost function. Skyline distance can be regarded as a measure of multidimensional competence and can be used to rank possible choices in recommendation systems. Computing skyline distances efficiently is far from trivial and cannot be handled by any straightforward extension of the existing skyline computation methods. To tackle this problem, we systematically explore several directions. We first present a dynamic programming method. Then, we investigate the boundary of skylines and develop a sort-projection method that utilizes the skyline boundary in calculating skyline distances. Last, we develop a space partitioning method to further improve the performance. We report extensive experiment results which show that our methods are efficient and scalable.  相似文献   

6.
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1). Computing probabilistic skylines on large uncertain data sets is challenging. We develop a bounding-pruning-refining framework and three algorithms systematically. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Combining the advantages of the bottom-up algorithm and the top-down algorithm, we develop a hybrid algorithm to further improve the performance. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our algorithms are efficient on large data sets.  相似文献   

7.
Skyline queries have been increasingly used in multi-criteria decision making and data mining applications. They retrieve a set of interesting points from a potentially large set of data points. A point is said to be interesting if it is not dominated by any other point. Skyline cube (skycube) consists of skylines of all possible non-empty subsets of a given set of dimensions. In this paper, we propose two algorithms for computing skycube using bitmaps that are derivable from indexes. The Point-based skycube algorithm is an improvement over the existing Bitmap algorithm, extended to compute skycube. The Point-based algorithm processes one point at a time to check for skylines in all subspaces. The Domain-based skycube algorithm views points as value combinations and probes entire search space for potential skyline points. It significantly reduces bitmap access for low cardinality dimensions. Our experimental study shows that the two algorithms strictly dominate, or at least comparable to, the current skycube algorithm in most of the cases, suggesting that such an approach could be a useful addition to the set of skyline query processing techniques.  相似文献   

8.
The importance of skyline analysis has been well recognized in multi-criteria decision making applications. All of the previous studies assume a fixed order on the attributes in question. However, in some applications, users may be interested in skylines with respect to various total or partial orders on nominal attributes. In this paper, we identify and tackle the problem of online skyline analysis with dynamic preferences on nominal attributes. We investigate how changes of orders in attributes lead to changes of skylines. We address two novel types of interesting queries: a viewpoint query returns with respect to which orders a point is (or is not) in the skylines and an order-based skyline query retrieves the skyline with respect to a specific order. We develop two methods systematically and report an extensive performance study using both synthetic and real data sets to verify their effectiveness and efficiency.  相似文献   

9.
Uncertain data are inevitable in many applications due to various factors such as the limitations of measuring equipment and delays in data updates. Although modeling and querying uncertain data have recently attracted considerable attention from the database community, there are still many critical issues to be resolved with respect to conducting advanced analysis on uncertain data. In this paper, we study the execution of the probabilistic skyline query over uncertain data streams. We propose a novel sliding window skyline model where an uncertain tuple may take the probability to be in the skyline at a certain timestamp t. Formally, a Wp-Skyline(p, t) contains all the tuples whose probabilities of becoming skylines are at least p at timestamp t. However, in the stream environment, computing a probabilistic skyline on a large number of uncertain tuples within the sliding window is a daunting task in practice. In order to efficiently calculate Wp-Skyline, we propose an efficient and effective approach, namely the candidate list approach, which maintains lists of candidates that might become skylines in future sliding windows. We also propose algorithms that continuously monitor the newly incoming and expired data to maintain the skyline candidate set incrementally. To further reduce the computation cost of deciding whether or not a candidate tuple belongs to the skyline, we propose an enhanced refinement strategy that is based on a multi-dimensional indexing structure combined with a grouping-and-conquer strategy. To validate the effectiveness of our proposed approach, we conduct extensive experiments on both real and synthetic data sets and make comparisons with basic techniques.  相似文献   

10.
In a number of emerging streaming applications, the data values that are produced have an associated time interval for which they are valid. A useful computation over such streaming data is to produce a continuous and valid skyline summary. Previous work on skyline algorithms have only focused on evaluating skylines over static data sets, and there are no known algorithms for skyline computation in the continuous setting. In this paper, we introduce the continuous time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm. In addition, we also examine the effect of the underlying spatial index structure when evaluating skylines. Whereas previous work on skyline computations have only considered using the R-tree index structure, we show that for skyline computations using an underlying quadtree has significant performance benefits over an R-tree index.  相似文献   

11.
We provide an overall framework for learning in search based systems that are used to find optimum solutions to problems. This framework assumes that prior knowledge is available in the form of one or more heuristic functions (or features) of the problem domain. An appropriate clustering strategy is used to partition the state space into a number of classes based on the available features. The number of classes formed will depend on the resource constraints of the system. In the training phase, example problems are run using a standard admissible search algorithm. In this phase, heuristic information corresponding to each class is learned. This new information can be used in the problem solving phase by appropriate search algorithms so that subsequent problem instances can be solved more efficiently. In this framework, we also show that heuristic information of forms other than the conventional single valued underestimate value can be used, since we maintain the heuristic of each class explicitly. We show some novel search algorithms that can work with some such forms. Experimental results have been provided for some domains  相似文献   

12.
Maintaining sliding window skylines on data streams   总被引:15,自引:0,他引:15  
The skyline of a multidimensional data set contains the "best" tuples according to any preference function that is monotonic on each dimension. Although skyline computation has received considerable attention in conventional databases, the existing algorithms are inapplicable to stream applications because 1) they assume static data that are stored in the disk (rather than continuously arriving/expiring), 2) they focus on "one-time" execution that returns a single skyline (in contrast to constantly tracking skyline changes), and 3) they aim at reducing the I/O overhead (as opposed to minimizing the CPU-cost and main-memory consumption). This paper studies skyline computation in stream environments, where query processing takes into account only a "sliding window" covering the most recent tuples. We propose algorithms that continuously monitor the incoming data and maintain the skyline incrementally. Our techniques utilize several interesting properties of stream skylines to improve space/time efficiency by expunging data from the system as early as possible (i.e., before their expiration). Furthermore, we analyze the asymptotical performance of the proposed solutions, and evaluate their efficiency with extensive experiments.  相似文献   

13.
This paper studies the problem of computing the skyline of a vast-sized spatial dataset in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently. The problem is particularly interesting due to advent of Big Spatial Data that are generated by modern applications run on mobile devices, and also because of the importance of the skyline operator for decision-making and supporting business intelligence. To this end, we present a scalable and efficient framework for skyline query processing that operates on top of SpatialHadoop, and can be parameterized by individual techniques related to filtering of candidate points as well as merging of local skyline sets. Then, we introduce two novel algorithms that follow the pattern of the framework and boost the performance of skyline query processing. Our algorithms employ specific optimizations based on effective filtering and efficient merging, the combination of which is responsible for improved efficiency. We compare our solution against the state-of-the-art skyline algorithm in SpatialHadoop. The results show that our techniques are more efficient and outperform the competitor significantly, especially in the case of large skyline output size.  相似文献   

14.
Efficient monitoring of skyline queries over distributed data streams   总被引:1,自引:0,他引:1  
Data management and data mining over distributed data streams have received considerable attention within the database community recently. This paper is the first work to address skyline queries over distributed data streams, where streams derive from multiple horizontally split data sources. Skyline query returns a set of interesting objects which are not dominated by any other objects within the base dataset. Previous work is concentrated on skyline computations over static data or centralized data streams. We present an efficient and an effective algorithm called BOCS to handle this issue under a more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm GridSky and an associated communication protocol. Based on the strategy of progressive refinement in BOCS, the skyline is incrementally computed by two phases. In the first phase, local skylines on remote sites are maintained by GridSky. At each time, only skyline increments on remote sites are sent to the coordinator. In the second phase, a global skyline is obtained by integrating remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms which use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.  相似文献   

15.
As more data-intensive applications emerge, advanced retrieval semantics, such as ranking and skylines, have attracted the attention of researchers. Geographic information systems are a good example of an application using a massive amount of spatial data. Our goal is to efficiently support exact and approximate skyline queries over massive spatial datasets. A spatial skyline query, consisting of multiple query points, retrieves data points that are not father than any other data points, from all query points. To achieve this goal, we present a simple and efficient algorithm that computes the correct results, also propose a fast approximation algorithm that returns a desirable subset of the skyline results. In addition, we propose a continuous query algorithm to trace changes of skyline points while a query point moves. To validate the effectiveness and efficiency of our algorithm, we provide an extensive empirical comparison between our algorithms and the best known spatial skyline algorithms from several perspectives.  相似文献   

16.
Skyline queries, together with other advanced query operators, are essential in order to help identify sets of interesting data points buried within huge amount of data readily available these days. A skyline query retrieves sets of non-dominated data points in a multi-dimensional dataset. As computing infrastructures become increasingly pervasive, connected by readily available network services, data storage and management have become inevitably more distributed. Under these distributed environments, designing efficient skyline querying with desirable quick response time and progressive returning of answers faces new challenges. To address this, in this paper, we propose a novel skyline query scheme termed MpSky. MpSky is based on a novel space partitioning scheme, employing the dependency relationships among data points on different servers. By grouping points of each server using dependencies, we are able to qualify a skyline point by only comparing it with data on dependent servers, and parallelize the skyline computation among non-dependent partitions that are from different servers or individual servers. By controlling the query propagation among partitions, we are able to generate skyline results progressively and prune partitions and points efficiently. Analytical and extensive simulation results show the effectiveness of the proposed scheme.  相似文献   

17.
Given a set of points with uncertain locations, we consider the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point. If each point’s uncertainty is described as a probability distribution over a discrete set of locations, we improve the best known exact solution. We also suggest why we believe our solution might be optimal. Next, we describe simple, near-linear time approximation algorithms for computing the probability of each point lying on the skyline. In addition, some of our methods can be adapted to construct data structures that can efficiently determine the probability of a query point lying on the skyline.  相似文献   

18.
A node ranking scheme provides the necessary structural view for developing algorithms on a network. We present two ranking schemes for the star interconnection network both of which allow constant time order preserving communication. The first scheme is based on a hierarchical view of the star network. It enables one to efficiently implement order preserving ASCEND/DESCEND class of algorithms. This class includes several important algorithms such as the Fast Fourier Transform (FFT) and matrix multiplication. The other ranking scheme gives a flexible pipelined view of the star interconnection network and provides a suitable framework for implementation of pipelined algorithms  相似文献   

19.
In this paper, we address the problem of domain adaptation for binary classification. This problem arises when the distributions generating the source learning data and target test data are somewhat different. From a theoretical standpoint, a classifier has better generalization guarantees when the two domain marginal distributions of the input space are close. Classical approaches try mainly to build new projection spaces or to reweight the source data with the objective of moving closer the two distributions. We study an original direction based on a recent framework introduced by Balcan et?al. enabling one to learn linear classifiers in an explicit projection space based on a similarity function, not necessarily symmetric nor positive semi-definite. We propose a well-founded general method for learning a low-error classifier on target data, which is effective with the help of an iterative procedure compatible with Balcan et?al.??s framework. A reweighting scheme of the similarity function is then introduced in order to move closer the distributions in a new projection space. The hyperparameters and the reweighting quality are controlled by a reverse validation procedure. Our approach is based on a linear programming formulation and shows good adaptation performances with very sparse models. We first consider the challenging unsupervised case where no target label is accessible, which can be helpful when no manual annotation is possible. We also propose a generalization to the semi-supervised case allowing us to consider some few target labels when available. Finally, we evaluate our method on a synthetic problem and on a real image annotation task.  相似文献   

20.
The application of specific learning schemes in memetic algorithms (MAs) can have significant impact on their performances. One main issue revolves around two different learning schemes, specifically, Lamarckian and Baldwinian. It has been shown that the two learning schemes are better suited for different types of problems and some previous studies have attempted to combine both learning schemes as a means to develop a single optimisation framework capable of solving more classes of problems. However, most of the past approaches are often implemented heuristically and have not investigated the effect of different learning scheme on noisy design optimisation. In this article, we introduce a simple probabilistic approach to address this issue. In particular, we investigate a centroid-based approach that combines the two learning schemes within an MA framework (centroid-based MS; CBMA) through the effective allocation of resources (in terms of local search cost) that are based on information obtained during the optimisation process itself. A scheme that applies the right learning scheme (Lamarckian or Baldwinian) at the right time (during search) would lead to higher search performance. We conducted an empirical study to test this hypothesis using two different types of benchmark problems. The first problem set consists of simple benchmark problems whereby the problem landscape is static and gradient information can be obtained accurately. These problems are known to favour Lamarckian learning while Baldwinian learning is known to exhibit slower convergence. The second problem set consists of noisy versions of the first problem set whereby the problem landscape is dynamic as a result of the random noise perturbation injected into the design vector. These problems are known to favour learning processes that re-sample search points such as Baldwinian learning. Our experiments show that CBMA manages to adaptively allocate resources productively according to problem in most of the cases.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号