首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Many recent applications involve processing and analyzing uncertain data. In this paper, we combine the feature of top-k objects with that of skyline to model the problem of top-k skyline objects against uncertain data. The problem of efficiently computing top-k skyline objects on large uncertain datasets is challenging in both discrete and continuous cases. In this paper, firstly an efficient exact algorithm for computing the top-k   skyline objects is developed for discrete cases. To address applications where each object may have a massive set of instances or a continuous probability density function, we also develop an efficient randomized algorithm with an ?‐approximation?approximation guarantee. Moreover, our algorithms can be immediately extended to efficiently compute p-skyline; that is, retrieving the uncertain objects with skyline probabilities above a given threshold. Our extensive experiments on synthetic and real data demonstrate the efficiency of both algorithms and the randomized algorithm is highly accurate. They also show that our techniques significantly outperform the existing techniques for computing p-skyline.  相似文献   

2.
We consider the following partition problem: Given a set S of n elements that is organized as k sorted subsets of size n/k each and given a parameter h with 1/k ≤ h ≤ n/k , partition S into g = O(n/(hk)) subsets D 1 , D 2 , . . . , D g of size Θ(hk) each, such that, for any two indices i and j with 1 ≤ i < j ≤ g , no element in D i is bigger than any element in D j . Note that with various combinations of the values of parameters h and k , several fundamental problems, such as merging, sorting, and finding an approximate median, can be formulated as or be reduced to this partition problem. The partition problem also finds many applications in solving problems of parallel computing and computational geometry. In this paper we present efficient parallel algorithms for solving the partition problem and a number of its applications. Our parallel partition algorithm runs in O( log n) time using processors in the EREW PRAM model. The complexity bounds of our parallel partition algorithm on the respective special cases match those of the optimal EREW PRAM algorithms for merging, sorting, and finding an approximate median. Using our parallel partition algorithm, we are also able to obtain better complexity bounds (even possibly on a weaker parallel model) than the previously best known parallel algorithms for several important problems, including parallel multiselection, parallel multiranking, and parallel sorting of k sorted subsets. Received May 5, 1996; revised July 30, 1998.  相似文献   

3.
Choosing the best location for starting a business or expanding an existing enterprize is an important issue. A number of location selection problems have been discussed in the literature. They often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets: a set of customers, a set of existing facilities, and a set of potential locations. The k-MSL selection problem outputs k locations from the potential location set, such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms’ performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently.  相似文献   

4.
Recently, due to intrinsic characteristics in many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome some inherent computational deficiency in an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is quite scalable against data set sizes, object areas, k values, etc. The randomized algorithm is also highly accurate in practice.  相似文献   

5.
We present a new approach for the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community is succinctly described by a set of labels. This requirement provides a better understanding for the discovered communities. The proposed problem formulation leads to the discovery of vertex-overlapping and dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we solve by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve results of high quality in terms of the reported measures, and are practical in terms of performance.  相似文献   

6.
The influence of a spatial facility object depicts the importance of the object in the whole data space. In this paper, we present a novel definition of object influence in applications where objects are of different categories. We study the problem of Spatial Influence Query which considers the contribution of an object in forming functional units consisting of a given set of objects with different categories designated by users. We first show that the problem of spatial influence query is NP-hard with respect to the number of object categories in the functional unit. To tackle the computational hardness, we develop an efficient framework following two main steps, possible participants finding and optimal functional unit computation. Based on this framework, for the first step, novel and efficient pruning techniques are developed based on the nearest neighbor set (NNS) approach. To find the optimal functional unit efficiently, we propose two algorithms, an exact algorithm and an efficient approximate algorithm with performance guarantee. Comprehensive experiments on both real and synthetic datasets demonstrate the effectiveness and efficiency of our techniques.  相似文献   

7.
We provide in this article a branch-and-bound algorithm that solves the problem of finding the k closest pairs of points (p,q), p?∈?P,q?∈?Q, considering two sets of points in the euclidean plane P,Q stored in external memory assuming that only one of the sets has a spatial index. This problem arises naturally in many scenarios, for instance when the set without an index is the answer to a spatial query. The main idea of our algorithm is to partition the space occupied by the set without an index into several cells or subspaces and to make use of the properties of a set of metrics defined on two Minimum Bounding Rectangles (MBRs). We evaluated our algorithm for different values of k by means of a series of experiments that considered both synthetical and real world datasets. We compared the performance of our algorithm with that of techniques that either assume that both datasets have a spatial index or that none has an index. The results show that our algorithm needs only between a 0.3 and a 35 % of the disk accesses required by such techniques. Our algorithm also shows a good scalability, both in terms of k and of the size of the data set.  相似文献   

8.
We study in this paper the problem of finding in a graph a subset of k edges whose deletion causes the largest increase in the weight of a minimum spanning tree. We propose for this problem an explicit enumeration algorithm whose complexity, when compared to the current best algorithm, is better for general k but very slightly worse for fixed k. More interestingly, unlike in the previous algorithms, we can easily adapt our algorithm so as to transform it into an implicit enumeration algorithm based on a branch and bound scheme. We also propose a mixed integer programming formulation for this problem. Computational results show a clear superiority of the implicit enumeration algorithm both over the explicit enumeration algorithm and the mixed integer program.  相似文献   

9.
In this paper we investigate the k-path cover problem for graphs, which is to find the minimum number of vertex disjoint k-paths that cover all the vertices of a graph. The k-path cover problem for general graphs is NP-complete. Though notable applications of this problem to database design, network, VLSI design, ring protocols, and code optimization, efficient algorithms are known for only few special classes of graphs. In order to solve this problem for cacti, i.e., graphs where no edge lies on more than one cycle, we introduce the so-called Steiner version of the k-path cover problem, and develop an efficient algorithm for the Steiner k-path cover problem for cacti, which finds an optimal k-path cover for a given cactus in polynomial time.  相似文献   

10.
We establish a refined search tree technique for the parameterized DOMINATING SET problem on planar graphs. Here, we are given an undirected graph and we ask for a set of at most k vertices such that every other vertex has at least one neighbor in this set. We describe algorithms with running times O(8kn) and O(8kk+n3), where n is the number of vertices in the graph, based on bounded search trees. We describe a set of polynomial time data-reduction rules for a more general “annotated” problem on black/white graphs that asks for a set of k vertices (black or white) that dominate all the black vertices. An intricate argument based on the Euler formula then establishes an efficient branching strategy for reduced inputs to this problem. In addition, we give a family examples showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final search tree algorithm is easy to implement; its analysis, however, is involved.  相似文献   

11.
We present a parallel algorithm for finding a maximum weight matching in general bipartite graphs with an adjustable time complexity of using O(nmax(2ω,4+ω)) processing elements for ω?1. Parameter ω is not bounded. This is the fastest known strongly polynomial parallel algorithm to solve this problem. This is also the first adjustable parallel algorithm for the maximum weight bipartite matching problem in which the execution time can be reduced by an unbounded factor. We also present a general approach for finding efficient parallel algorithms for the maximum matching problem.  相似文献   

12.
Traditional minimum spanning tree-based clustering algorithms only make use of information about edges contained in the tree to partition a data set. As a result, with limited information about the structure underlying a data set, these algorithms are vulnerable to outliers. To address this issue, this paper presents a simple while efficient MST-inspired clustering algorithm. It works by finding a local density factor for each data point during the construction of an MST and discarding outliers, i.e., those whose local density factor is larger than a threshold, to increase the separation between clusters. This algorithm is easy to implement, requiring an implementation of iDistance as the only k-nearest neighbor search structure. Experiments performed on both small low-dimensional data sets and large high-dimensional data sets demonstrate the efficacy of our method.  相似文献   

13.
Efficient processing of top-k queries has drawn increasing attention from both industry and academia due to its varied applications. Lower access cost is a crucial concern for a top-k query processing. Typically, when answering a top-k query, there exist two types of accesses: sorted access and random access. In some scenarios, the latter is not supported by the data source. Fagin et al. proposed the No Random Access (NRA) algorithm (Fagin et?al, J Comput Syst Sci 66:614–656, 2003) for this situation. In this paper, we motivate our work by a key observation of the NRA algorithm: the number of accesses could be further reduced by selectively (instead of in parallel) performing sorted accesses to different lists of the dataset. Based on this insight, we propose a Selective NRA (SNRA) algorithm aiming to cut down the unnecessary access cost. Later, we optimize the SNRA algorithm in terms of runtime cost and present the SNRA-opt algorithm. Furthermore, we address the problem of instance optimality theoretically and turn SNRA (and SNRA-opt) into instance optimal algorithms, termed as Hybrid-SNRA (HSNRA) and HSNRA-opt. Extensive experimental results show that our algorithms perform significantly fewer sorted accesses than NRA (and its state-of-the-art variations). In terms of runtime cost, the proposed SNRA-opt and HSNRA-opt algorithms are two orders of magnitude faster than the NRA algorithm. In addition, we discuss the parameter selection problem of the SNRA algorithms, both theoretically and experimentally.  相似文献   

14.
Networks of workstations (NOWs) provide an economical platform for high performance parallel computing. Such networks may comprise a variety of different types of workstations and network devices. This paper addresses the problem of efficient multicast in a heterogeneous communication model. Although the problem of finding optimal multicast schedules is known to be NP-complete in this model, a greedy algorithm has been shown experimentally to find good solutions in practice. In this paper we show that the greedy algorithm finds provably near-optimal schedules in polynomial time and that optimal schedules can be found in polynomial time when the number of distinct types of workstations is bounded by a constant. Specifically, this paper presents three results. First, when there are n workstations of some constant k distinct types, the greedy algorithm is shown to find schedules that complete at most a constant additive term later than optimal. Second, an algorithm is given that finds optimal schedules in time O(n2k). Finally, it is shown that for the general problem, the greedy algorithm finds solutions that complete the multicast in at most twice the optimal time.  相似文献   

15.
Temporal regularity of itemset appearance can be regarded as an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset can be said to be regular-frequent in a database if it appears at a regular period. Therefore, the problem of mining a complete set of regular-frequent itemsets requires the specification of a support and a regularity threshold. However, in practice, it is often difficult for users to provide an appropriate support threshold. In addition, the use of a support threshold tends to produce a large number of regular-frequent itemsets and it might be better to ask for the number of desired results. We thus propose an efficient algorithm for mining top-k regular-frequent itemsets without setting a support threshold. Based on database partitioning and support estimation techniques, the proposed algorithm also uses a best-first search strategy with only one database scan. We then compare our algorithm with the state-of-the-art algorithms for mining top-k regular-frequent itemsets. Our experimental studies on both synthetic and real data show that our proposal achieves high performance for small and large values of k.  相似文献   

16.
AVL (Adel'son-Vel'skii and Landis) trees are efficient data structures for implementing dictionaries. We present a parallel dictionary, using AVL trees, on the EREW PRAM by proposing optimal algorithms to performkoperations withp(1 ≤pk) processors. An explicit processor scheduling is devised to avoid simultaneous reads in our parallel algorithm to performksearches, which avoids the need for any additional memory in the parallelization. To perform multiple insertions and deletions, we identify rotations (in addition to AVL tree rotations) required to restore balance and present parallel algorithms to performpinsertions/deletions inO(logn+ logp) time withpprocessors.  相似文献   

17.
Using predicate logic, the concept of a linear problem is formalized. The class of linear problems is huge, diverse, complex, and important. Linear and randomized linear algorithms are formalized. For each linear problem, a linear algorithm is constructed that solves the problem and a randomized linear algorithm is constructed that completely solves it, that is, for any data of the problem, the output set of the randomized linear algorithm is identical to the solution set of the problem. We obtain a single machine, referred to as the Universal (Randomized) Linear Machine, which (completely) solves every instance of every linear problem. Conversely, for each randomized linear algorithm, a linear problem is constructed that the algorithm completely solves. These constructions establish a one-to-one and onto correspondence from equivalence classes of linear problems to equivalence classes of randomized linear algorithms.Our construction of (randomized) linear algorithms to (completely) solve linear problems as well as the algorithms themselves are based on Fourier Elimination and have superexponential complexity. However, there is no evidence that the inefficiency of our methods is unavoidable relative to the difficulty of the problem.  相似文献   

18.
Ensemble methods have been shown to be an effective tool for solving multi-label classification tasks. In the RAndom k-labELsets (RAKEL) algorithm, each member of the ensemble is associated with a small randomly-selected subset of k labels. Then, a single label classifier is trained according to each combination of elements in the subset. In this paper we adopt a similar approach, however, instead of randomly choosing subsets, we select the minimum required subsets of k labels that cover all labels and meet additional constraints such as coverage of inter-label correlations. Construction of the cover is achieved by formulating the subset selection as a minimum set covering problem (SCP) and solving it by using approximation algorithms. Every cover needs only to be prepared once by offline algorithms. Once prepared, a cover may be applied to the classification of any given multi-label dataset whose properties conform with those of the cover. The contribution of this paper is two-fold. First, we introduce SCP as a general framework for constructing label covers while allowing the user to incorporate cover construction constraints. We demonstrate the effectiveness of this framework by proposing two construction constraints whose enforcement produces covers that improve the prediction performance of random selection by achieving better coverage of labels and inter-label correlations. Second, we provide theoretical bounds that quantify the probabilities of random selection to produce covers that meet the proposed construction criteria. The experimental results indicate that the proposed methods improve multi-label classification accuracy and stability compared to the RAKEL algorithm and to other state-of-the-art algorithms.  相似文献   

19.
Kernelization algorithms for the cluster editing problem have been a popular topic in the recent research in parameterized computation. Most kernelization algorithms for the problem are based on the concept of critical cliques. In this paper, we present new observations and new techniques for the study of kernelization algorithms for the cluster editing problem. Our techniques are based on the study of the relationship between cluster editing and graph edge-cuts. As an application, we present a simple algorithm that constructs a 2k-vertex kernel for the integral-weighted version of the cluster editing problem. Our result matches the best kernel bound for the unweighted version of the cluster editing problem, and significantly improves the previous best kernel bound for the weighted version of the problem. For the more general real-weighted version of the problem, our techniques lead to a simple kernelization algorithm that constructs a kernel of at most 4k vertices.  相似文献   

20.
This paper presents improved algorithms for matroid-partitioning problems, such as finding a maximum cardinality set of edges of a graph that can be partitioned intok forests, and finding as many disjoint spanning trees as possible. The notion of a clump in a matroid sum is introduced, and efficient algorithms for clumps are presented. Applications of these algorithms are given to problems arising in the study of the structural rigidity of graphs, the Shannon switching game, and others.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号