期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A two-phase algorithm for the biobjective integer minimum cost flow problem

Andrea Raith Matthias Ehrgott 《Computers & Operations Research》2009

We present an algorithm to compute a complete set of efficient solutions for the biobjective integer minimum cost flow problem. We use the two phase method, with a parametric network simplex algorithm in phase 1 to compute all non-dominated extreme points. In phase 2, the remaining non-dominated points (non-extreme supported and non-supported) are computed using a k best flow algorithm on single-objective weighted sum problems. 相似文献

2.

A Dijkstra-like method computing all extreme supported non-dominated solutions of the biobjective shortest path problem

《Computers & Operations Research》2015

We address the problem of determining all extreme supported solutions of the biobjective shortest path problem. A novel Dijkstra-like method generalizing Dijkstra׳s algorithm to this biobjective case is proposed. The algorithm runs in O(N(m+n log n)) time to solve one-to-one and one-to-all biobjective shortest path problems determining all extreme supported non-dominated points in the outcome space and one supported efficient path associated with each one of them. Here n is the number of nodes, m is the number of arcs and N is the number of extreme supported points in outcome space for the one-to-all biobjective shortest path problem. The memory space required by the algorithm is O(n+m) for the one-to-one problem and O(N+m) for the one-to-all problem. A computational experiment comparing the performance of the proposed methods and state-of-the-art methods is included. 相似文献

3.

Finding extreme supported solutions of biobjective network flow problems: An enhanced parametric programming approach

《Computers & Operations Research》2017

We address the problem of determining a complete set of extreme supported efficient solutions of biobjective minimum cost flow (BMCF) problems. A novel method improving the classical parametric method for this biobjective problem is proposed. The algorithm runs in O(Nn(m + nlogn)) time determining all extreme supported non-dominated points in the outcome space and one extreme supported efficient solution associated with each one of them. Here n is the number of nodes, m is the number of arcs and N is the number of extreme supported non-dominated points in outcome space for the BMCF problem. The memory space required by the algorithm is O(n + m) when the extreme supported efficient solutions are not required to be stored in RAM. Otherwise, the algorithm requires O(N + m) space. Extensive computational experiments comparing the performance of the proposed method and a standard parametric network simplex method are presented. 相似文献

4.

An efficient procedure for finding best compromise solutions to the multi-objective assignment problem

《Computers & Operations Research》2014

In this paper, we consider the problem of determining a best compromise solution for the multi-objective assignment problem. Such a solution minimizes a scalarizing function, such as the weighted Tchebychev norm or reference point achievement functions. To solve this problem, we resort to a ranking (or k-best) algorithm which enumerates feasible solutions according to an appropriate weighted sum until a condition, ensuring that an optimal solution has been found, is met. The ranking algorithm is based on a branch and bound scheme. We study how to implement efficiently this procedure by considering different algorithmic variants within the procedure: choice of the weighted sum, branching and bounding schemes. We present an experimental analysis that enables us to point out the best variants, and we provide experimental results showing the remarkable efficiency of the procedure, even for large size instances. 相似文献

5.

The MinMax k-Means clustering algorithm

Grigorios Tzortzis Aristidis Likas 《Pattern recognition》2014

Applying k-Means to minimize the sum of the intra-cluster variances is the most popular clustering approach. However, after a bad initialization, poor local optima can be easily obtained. To tackle the initialization problem of k-Means, we propose the MinMax k-Means algorithm, a method that assigns weights to the clusters relative to their variance and optimizes a weighted version of the k-Means objective. Weights are learned together with the cluster assignments, through an iterative procedure. The proposed weighting scheme limits the emergence of large variance clusters and allows high quality solutions to be systematically uncovered, irrespective of the initialization. Experiments verify the effectiveness of our approach and its robustness over bad initializations, as it compares favorably to both k-Means and other methods from the literature that consider the k-Means initialization problem. 相似文献

6.

Determining the K-best solutions of knapsack problems

《Computers & Operations Research》2014

It is well-known that knapsack problems arise as subproblems of a number of large-scale integer optimization problems. In order to solve these large problems, it is necessary to solve the subproblems efficiently, and for many of them it can be useful to determine the K-best solutions. In this paper, a branch-and-bound method for the unbounded knapsack problem described in the literature is extended to determine the K-best solutions of unbounded and bounded knapsack problems. We show that the proposed extension determines exactly the K-best solutions and we solve important classical instances using high values of K. 相似文献

7.

Information theoretic clustering using a k-nearest neighbors approach

Vidar V. Vikjord Robert Jenssen 《Pattern recognition》2014

We develop a new non-parametric information theoretic clustering algorithm based on implicit estimation of cluster densities using the k-nearest neighbors (k-nn) approach. Compared to a kernel-based procedure, our hierarchical k-nn approach is very robust with respect to the parameter choices, with a key ability to detect clusters of vastly different scales. Of particular importance is the use of two different values of k, depending on the evaluation of within-cluster entropy or across-cluster cross-entropy, and the use of an ensemble clustering approach wherein different clustering solutions vote in order to obtain the final clustering. We conduct clustering experiments, and report promising results. 相似文献

8.

An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers

David Rebollo-MonederoAuthor Vitae Jordi FornéAuthor Vitae 《Data & Knowledge Engineering》2011,70(10):892-921

We present a multidisciplinary solution to the problems of anonymous microaggregation and clustering, illustrated with two applications, namely privacy protection in databases, and private retrieval of location-based information. Our solution is perturbative, is based on the same privacy criterion used in microdata k-anonymization, and provides anonymity through a substantial modification of the Lloyd algorithm, a celebrated quantization design algorithm, endowed with numerical optimization techniques.Our algorithm is particularly suited to the important problem of k-anonymous microaggregation of databases, with a small integer k representing the number of individual respondents indistinguishable from each other in the published database. Our algorithm also exhibits excellent performance in the problem of clustering or macroaggregation, where k may take on arbitrarily large values. We illustrate its applicability in this second, somewhat less common case, by means of an example of location-based services. Specifically, location-aware devices entrust a third party with accurate location information. This party then uses our algorithm to create distortion-optimized, size-constrained clusters, where k nearby devices share a common centroid location, which may be regarded as a distorted version of the original one. The centroid location is sent back to the devices, which use it when contacting untrusted location-based information providers, in lieu of the exact home location, to enforce k-anonymity.We compare the performance of our novel algorithm to the state-of-the-art microaggregation algorithm MDAV, on both synthetic and standardized real data, which encompass the cases of small and large values of k. The most promising aspect of our proposed algorithm is its capability to maintain the same k-anonymity constraint, while outperforming MDAV by a significant reduction in data distortion, in all the cases considered. 相似文献

9.

Fast modified global k-means algorithm for incremental cluster construction

Adil M. Bagirov^{Author Vitae} Julien Ugon Author VitaeAuthor Vitae 《Pattern recognition》2011,44(4):866-876

The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need of computing or storing the whole affinity matrix and thereby to reduce computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms. 相似文献

10.

Mining top-k regular-frequent itemsets using database partitioning and support estimation

Komate Amphawan Philippe Lenca Athasit Surarerks 《Expert systems with applications》2012,39(2):1924-1936

Temporal regularity of itemset appearance can be regarded as an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset can be said to be regular-frequent in a database if it appears at a regular period. Therefore, the problem of mining a complete set of regular-frequent itemsets requires the specification of a support and a regularity threshold. However, in practice, it is often difficult for users to provide an appropriate support threshold. In addition, the use of a support threshold tends to produce a large number of regular-frequent itemsets and it might be better to ask for the number of desired results. We thus propose an efficient algorithm for mining top-k regular-frequent itemsets without setting a support threshold. Based on database partitioning and support estimation techniques, the proposed algorithm also uses a best-first search strategy with only one database scan. We then compare our algorithm with the state-of-the-art algorithms for mining top-k regular-frequent itemsets. Our experimental studies on both synthetic and real data show that our proposal achieves high performance for small and large values of k. 相似文献

11.

N-gram posterior probability confidence measures for statistical machine translation: an empirical study 总被引：1，自引：1，他引：0

Adrià de Gispert Graeme Blackwood Gonzalo Iglesias William Byrne 《Machine Translation》2013,27(2):85-114

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework. 相似文献

12.

An algorithm for the biobjective integer minimum cost flow problem

《Computers & Operations Research》2001,28(2):139-156

In this paper, we study the single commodity flow problems, optimizing two objectives simultaneously, where the flow values must be integer values. We propose a method that finds all the efficient integer points in the objective space. Our algorithm performs two phases. In the first phase, all integer points on the efficient boundary are found and in the second phase, the efficient integer points that do not lie on the efficient boundary are calculated. In addition, we carry out a computational experiment showing that the number of efficient integer solutions that do not lie on the efficient boundary is greater than the number of integer solutions on the efficient boundary.Scope and purposeIn many combinatorial optimization problems, the selection of the optimum solution takes into account more than one criterion. For example, in transportation problems or in network flows problems, the criteria that can be considered are the minimization of the cost for selected routes, the minimization of arrival times at the destinations, the minimization of the deterioration of goods, the minimization of the load capacity that would not be used in the selected vehicles, the maximization of safety, reliability, etc. Often, these criteria are in conflict and for this reason, a multiobjective network flow formulation of the problem is necessary. The solution to this problem is searched for among the set of efficient points. Although multiobjective network flow problems can be solved using the techniques available for the multiobjective linear programming problem, network-based methods are computationally better. The multicriteria minimum cost flow problem has already merited the attention of several authors and the case which has been considered in literature is that which has two objectives, where the continuous flow values are permissible. However, the integer case of the biobjective minimum cost flow problem has scarcely been studied. Whereas, in many real network flow problems, integer values on flow values are required. In this paper, we propose an approach to solve the biobjective integer minimum cost flow problem. An algorithm to obtain all efficient integer solutions of this problem is introduced. This method is characterized by the use of the classic resolution tools of network flow problems, such as the network simplex method. It does not utilize the biobjective integer linear programming methodology. Furthermore, the method does not calculate dominated solutions, so it is not necessary to incorporate tools to eliminate dominated solutions. 相似文献

13.

Privacy-preserving data mining: A feature set partitioning approach

Nissim Matatov Oded Maimon 《Information Sciences》2010,180(14):2696-2720

In privacy-preserving data mining (PPDM), a widely used method for achieving data mining goals while preserving privacy is based on k-anonymity. This method, which protects subject-specific sensitive data by anonymizing it before it is released for data mining, demands that every tuple in the released table should be indistinguishable from no fewer than k subjects. The most common approach for achieving compliance with k-anonymity is to replace certain values with less specific but semantically consistent values. In this paper we propose a different approach for achieving k-anonymity by partitioning the original dataset into several projections such that each one of them adheres to k-anonymity. Moreover, any attempt to rejoin the projections, results in a table that still complies with k-anonymity. A classifier is trained on each projection and subsequently, an unlabelled instance is classified by combining the classifications of all classifiers.Guided by classification accuracy and k-anonymity constraints, the proposed data mining privacy by decomposition (DMPD) algorithm uses a genetic algorithm to search for optimal feature set partitioning. Ten separate datasets were evaluated with DMPD in order to compare its classification performance with other k-anonymity-based methods. The results suggest that DMPD performs better than existing k-anonymity-based algorithms and there is no necessity for applying domain dependent knowledge. Using multiobjective optimization methods, we also examine the tradeoff between the two conflicting objectives in PPDM: privacy and predictive performance. 相似文献

14.

Ranking uncertain sky: The probabilistic top-k skyline operator

Ying Zhang Wenjie Zhang Xuemin Lin Bin Jiang Jian Pei 《Information Systems》2011

Many recent applications involve processing and analyzing uncertain data. In this paper, we combine the feature of top-k objects with that of skyline to model the problem of top-k skyline objects against uncertain data. The problem of efficiently computing top-k skyline objects on large uncertain datasets is challenging in both discrete and continuous cases. In this paper, firstly an efficient exact algorithm for computing the top-k skyline objects is developed for discrete cases. To address applications where each object may have a massive set of instances or a continuous probability density function, we also develop an efficient randomized algorithm with an ?‐approximation

? ‐ approximation

guarantee. Moreover, our algorithms can be immediately extended to efficiently compute p-skyline; that is, retrieving the uncertain objects with skyline probabilities above a given threshold. Our extensive experiments on synthetic and real data demonstrate the efficiency of both algorithms and the randomized algorithm is highly accurate. They also show that our techniques significantly outperform the existing techniques for computing p-skyline. 相似文献

15.

Improving the performance of k-means for color quantization

M. Emre Celebi 《Image and vision computing》2011,29(4):260-271

Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. However, despite its popularity as a general purpose clustering algorithm, k-means has not received much respect in the color quantization literature because of its high computational requirements and sensitivity to initialization. In this paper, we investigate the performance of k-means as a color quantizer. We implement fast and exact variants of k-means with several initialization schemes and then compare the resulting quantizers to some of the most popular quantizers in the literature. Experiments on a diverse set of images demonstrate that an efficient implementation of k-means with an appropriate initialization strategy can in fact serve as a very effective color quantizer. 相似文献

16.

The global k-means clustering algorithm

Aristidis LikasAuthor Vitae Nikos VlassisAuthor VitaeJakob J. VerbeekAuthor Vitae 《Pattern recognition》2003,36(2):451-461

We present the global k-means algorithm which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N (with N being the size of the data set) executions of the k-means algorithm from suitable initial positions. We also propose modifications of the method to reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and they compare favorably to the k-means algorithm with random restarts. 相似文献

17.

On Multicast Algorithms for Heterogeneous Networks of Workstations

《Journal of Parallel and Distributed Computing》2001,61(11):1665-1679

Networks of workstations (NOWs) provide an economical platform for high performance parallel computing. Such networks may comprise a variety of different types of workstations and network devices. This paper addresses the problem of efficient multicast in a heterogeneous communication model. Although the problem of finding optimal multicast schedules is known to be NP-complete in this model, a greedy algorithm has been shown experimentally to find good solutions in practice. In this paper we show that the greedy algorithm finds provably near-optimal schedules in polynomial time and that optimal schedules can be found in polynomial time when the number of distinct types of workstations is bounded by a constant. Specifically, this paper presents three results. First, when there are n workstations of some constant k distinct types, the greedy algorithm is shown to find schedules that complete at most a constant additive term later than optimal. Second, an algorithm is given that finds optimal schedules in time O(n^2k). Finally, it is shown that for the general problem, the greedy algorithm finds solutions that complete the multicast in at most twice the optimal time. 相似文献

18.

Overlapping community detection in labeled graphs

Esther Galbrun Aristides Gionis Nikolaj Tatti 《Data mining and knowledge discovery》2014,28(5-6):1586-1610

We present a new approach for the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community is succinctly described by a set of labels. This requirement provides a better understanding for the discovered communities. The proposed problem formulation leads to the discovery of vertex-overlapping and dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we solve by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve results of high quality in terms of the reported measures, and are practical in terms of performance. 相似文献

19.

A derandomized approximation algorithm for the critical node detection problem

《Computers & Operations Research》2014

In this paper we propose an efficient approximation algorithm for determining solutions to the critical node detection problem (CNDP) on unweighted and undirected graphs. Given a user-defined number of vertices

k > 0

, the problem is to determine which k nodes to remove such as to minimize pairwise connectivity in the induced subgraph. We present a simple, yet powerful, algorithm that is derived from a randomized rounding of the relaxed linear programming solution to the CNDP. We prove that the expected solution quality obtained by the linear-time algorithm is bounded by a constant. To highlight the algorithm quality four common complex network models are utilized, in addition to four real-world networks. 相似文献

20.

An improved,randomized algorithm for parallel selection with an experimental study

《Journal of Parallel and Distributed Computing》2004,64(9):1051-1059

A common statistical problem is that of finding the median element in a set of data. This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank k, for an arbitrarily given integer k.Our general framework is an SPMD distributed-memory programming model that is enhanced by a set of communication primitives. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The algorithms have been coded in the message-passing standard MPI, and our experimental results from the IBM SP-2 illustrate the scalability and efficiency of our algorithm and improve upon all the related experimental results known to the author.The main contributions of this paper are(1) New techniques for speeding the performance of certain randomized algorithms, such as selection, which are efficient with likely probability.(2) A new, practical randomized selection algorithm (UltraFast) with significantly improved convergence. 相似文献