期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The knowledge base partitioning problem: Mathematical formulation and heuristic clustering

Tzvi Raz Nancy A. Botten 《Data & Knowledge Engineering》1992,8(4):329-337

Allocation of production rules among several partitions in a knowledge base can shorten the compilation and execution of expert systems applications and facilitate their verification, validation and maintenance. The problem of allocation of rules or groups of rules among partitions of limited size while minimizing the sum of inter-partition connections is formulated as a 0–1 integer programming problem with a quadratic objective function. Since this type of problems is NP-complete, a clustering algorithm based on the nearest neighbor heuristic is proposed. 相似文献

2.

A set partitioning reformulation of a school bus scheduling problem

Armin Fügenschuh 《Journal of Scheduling》2011,14(4):307-318

We present an integer programming model for the integrated optimization of bus schedules and school starting times, which is a single-depot vehicle scheduling problem with additional coupling constraints among the time windows. For instances with wide time windows the linear relaxation is weak and feasible solutions found by an ILP solver are of poor quality. We apply a set partitioning relaxation to compute better lower bounds and, in combination with a primal construction heuristic, also better primal feasible solutions. Integer programs with at most two non-zero coefficient per constraint play a prominent role in our approach. Computational results for several random and a real-world instance are given and compared with results from a standard branch-and-cut approach. 相似文献

3.

Privacy-preserving data mining: A feature set partitioning approach

Nissim Matatov Oded Maimon 《Information Sciences》2010,180(14):2696-2720

In privacy-preserving data mining (PPDM), a widely used method for achieving data mining goals while preserving privacy is based on k-anonymity. This method, which protects subject-specific sensitive data by anonymizing it before it is released for data mining, demands that every tuple in the released table should be indistinguishable from no fewer than k subjects. The most common approach for achieving compliance with k-anonymity is to replace certain values with less specific but semantically consistent values. In this paper we propose a different approach for achieving k-anonymity by partitioning the original dataset into several projections such that each one of them adheres to k-anonymity. Moreover, any attempt to rejoin the projections, results in a table that still complies with k-anonymity. A classifier is trained on each projection and subsequently, an unlabelled instance is classified by combining the classifications of all classifiers.Guided by classification accuracy and k-anonymity constraints, the proposed data mining privacy by decomposition (DMPD) algorithm uses a genetic algorithm to search for optimal feature set partitioning. Ten separate datasets were evaluated with DMPD in order to compare its classification performance with other k-anonymity-based methods. The results suggest that DMPD performs better than existing k-anonymity-based algorithms and there is no necessity for applying domain dependent knowledge. Using multiobjective optimization methods, we also examine the tradeoff between the two conflicting objectives in PPDM: privacy and predictive performance. 相似文献

4.

A Relax-and-Cut algorithm for the set partitioning problem

Victor F. Cavalcante Cid C. de Souza Abílio Lucena 《Computers & Operations Research》2008

Relax-and-Cut algorithms offer an alternative to strengthen Lagrangian relaxation bounds. The main idea behind these algorithms is to dynamically select and dualize inequalities (cuts) within a Lagrangian relaxation framework. This paper proposes a Relax-and-Cut algorithm for the Set Partitioning Problem. Computational tests are reported for benchmark instances from the literature. For Set Partitioning instances with integrality gaps, a variant of the classical Lagrangian relaxation is often used in the literature. It introduces a knapsack constraint to the standard formulation of the problem. Our results indicate that the proposed Relax-and-Cut algorithm outperforms the latter approach in terms of lower bound quality. Furthermore, it turns out to be very competitive in terms of CPU time. Decisive in achieving that performance was the implementation of dominance rules to manage inequalities in the cut pool. The Relax-and-Cut framework proposed here could also be used as a preprocessing tool for Linear Integer Programming solvers. Computational experiments demonstrated that the combined use of our framework and XPRESS improved the performance of that Linear Integer Programming solver for the test sets used in this study. 相似文献

5.

A genetic and set partitioning two-phase approach for the vehicle routing problem with time windows

G.B. Alvarenga G.R. Mateus G. de Tomi 《Computers & Operations Research》2007

The Vehicle Routing Problem with Time Windows (VRPTW) is a well-known and complex combinatorial problem, which has received considerable attention in recent years. This problem has been addressed using many different techniques including both exact and heuristic methods. The VRPTW benchmark problems of Solomon [Algorithms for the vehicle routing and scheduling problems with time window constraints, Operations Research 1987; 35(2): 254–65] have been most commonly chosen to evaluate and compare all algorithms. Results from exact methods have been improved considerably because of parallel implementations and modern branch-and-cut techniques. However, 24 out of the 56 high order instances from Solomon's original test set still remain unsolved. Additionally, in many cases a prohibitive time is needed to find the exact solution. Many of the heuristic methods developed have proved to be efficient in identifying good solutions in reasonable amounts of time. Unfortunately, whilst the research efforts based on exact methods have been focused on the total travel distance, the focus of almost all heuristic attempts has been on the number of vehicles. Consequently, it is more difficult to compare and take advantage of the strong points from each approach. This paper proposes a robust heuristic approach for the VRPTW using travel distance as the main objective through an efficient genetic algorithm and a set partitioning formulation. The tests were produced using real numbers and truncated data type, allowing a direct comparison of its results against previously published heuristic and exact methods. Furthermore, computational results show that the proposed heuristic approach outperforms all previously known and published heuristic methods in terms of the minimal travel distance. 相似文献

6.

UPS-k: a set partitioning problem with applications in UPS pickup-delivery system

Minghui Jiang 《Information Processing Letters》2005,93(4):173-175

相似文献

7.

基于遗传算法的集合划分问题求解 总被引：1，自引：0，他引：1

鲍江宏李炯城《计算机工程与设计》2008,29(11):2879-2882

集合划分问题是组合优化领域中有着广泛应用基础的著名问题,属于NP难问题.通过引入精英策略提出对遗传算法的改进,并为了能把遗传算法应用到集合划分问题,对数学模型进行了等价变换.针对集合划分问题,设计出一种高效的基因表示,避免了组合优化中处理约束条件的麻烦.解决了传统二进制基因编码无法精确适应离散优化问题,首次提出一种离散编码解决方案.最后,使用Visual C 6编程实现,取得较好的结果. 相似文献

8.

High-dimensional clustering: a clique-based hypergraph partitioning framework

Tianming Hu Chuanren Liu Yong Tang Jing Sun Hui Xiong Sam Yuan Sung 《Knowledge and Information Systems》2014,39(1):61-88

Hypergraph partitioning has been considered as a promising method to address the challenges of high-dimensional clustering. With objects modeled as vertices and the relationship among objects captured by the hyperedges, the goal of graph partitioning is to minimize the edge cut. Therefore, the definition of hyperedges is vital to the clustering performance. While several definitions of hyperedges have been proposed, a systematic understanding of desired characteristics of hyperedges is still missing. To that end, in this paper, we first provide a unified clique perspective of the definition of hyperedges, which serves as a guide to define hyperedges. With this perspective, based on the concepts of shared (reverse) nearest neighbors, we propose two new types of clique hyperedges and analyze their properties regarding purity and size issues. Finally, we present an extensive evaluation using real-world document datasets. The experimental results show that, with shared (reverse) nearest neighbor-based hyperedges, the clustering performance can be improved significantly in terms of various external validation measures without the need for fine tuning of parameters. 相似文献

9.

A rough set approach for selecting clustering attribute

Tutut Herawan Mustafa Mat Deris Jemal H. Abawajy 《Knowledge》2010,23(3):220-231

A few of clustering techniques for categorical data exist to group objects having similar characteristics. Some are able to handle uncertainty in the clustering process while others have stability issues. However, the performance of these techniques is an issue due to low accuracy and high computational complexity. This paper proposes a new technique called maximum dependency attributes (MDA) for selecting clustering attribute. The proposed approach is based on rough set theory by taking into account the dependency of attributes of the database. We analyze and compare the performance of MDA technique with the bi-clustering, total roughness (TR) and min–min roughness (MMR) techniques based on four test cases. The results establish the better performance of the proposed approach. 相似文献

10.

Algorithms and implementation of a set partitioning approach for modular machining line design

Pavel Borisovsky Alexandre Dolgui Sergey Kovalev 《Computers & Operations Research》2012

A transfer line design problem is considered. Transfer lines are sequences of workstations equipped with processing modules called blocks each of which performs specific operations. These lines are used for mass production of one type of product and thus execute repetitively a given set of operations. The machine parts move along the stations in the same direction. An identical cost is associated with each station and differing costs are associated with the blocks. The problem is to determine the number of stations, select a set of blocks and assign selected blocks to the stations so that operations of the selected blocks constitute the original set of operations and the total cost is minimized. A distinct feature of the problem is that operations at the same station are performed in parallel. Plus, there are inclusion, exclusion and precedence relations that restrict the assignment of the blocks and operations to the same station as well as the processing order of the operations on the transfer line. We implement a novel set partitioning formulation of this design problem with pre-processing procedures and heuristics. The presented approach has the best performance among the existing methods in terms of solution time and quality. 相似文献

11.

Topological operators: a relaxed query processing approach

Alberto Belussi Barbara Catania Paola Podestà 《GeoInformatica》2012,16(1):67-110

Relaxation and approximation techniques have been proposed as approaches for improving the quality of query results, in terms of completeness and accuracy, in environments where the user may not be able to specify the query in a complete and exact way, since data are quite heterogeneous or she may not know all the characteristics of data at hand. This problem, mainly addressed for relational and XML data, is nowadays quite relevant also for geo-spatial data, due to their increasing usage in highly critical decisional processes. Among geo-spatial queries, those based on spatial and more precisely topological relations are currently used in an increasing number of applications. As far as we know, no approach has been proposed so far for relaxing queries based on topological predicates when they return an empty or insufficient answer, in order to improve result quality and user satisfaction. In this paper, we consider this problem and we present a general relaxation strategy for, possibly multi-domain, topological selection and join queries. Two specific semantics are also provided: the first applies the minimum amount of relaxation in order to get an acceptable answer; the second relaxes the given query of a certain fixed amount, depending on the considered topological predicate. Index-based processing algorithms, for efficiently executing relaxed queries based on the proposed semantics, are also presented and a specific topological similarity function, to be used for relaxation purposes, is proposed. Experimental results show that the overhead given by query relaxation is acceptable. 相似文献

12.

Interval set clustering

Min Chen Duoqian Miao 《Expert systems with applications》2011,38(4):2923-2932

Rough k-means clustering describes uncertainty by assigning some objects to more than one cluster. Rough cluster quality index based on decision theory is applicable to the evaluation of rough clustering. In this paper we analyze rough k-means clustering with respect to the selection of the threshold, the value of risk for assigning an object and uncertainty of objects. According to the analysis, clusters presented as interval sets with lower and upper approximations in rough k-means clustering are not adequate to describe clusters. This paper proposes an interval set clustering based on decision theory. Lower and upper approximations in the proposed algorithm are hierarchical and constructed as outer-level approximations and inner-level ones. Uncertainty of objects in out-level upper approximation is described by the assignment of objects among different clusters. Accordingly, ambiguity of objects in inner-level upper approximation is represented by local uniform factors of objects. In addition, interval set clustering can be improved to obtain a satisfactory clustering result with the optimal number of clusters, as well as optimal values of parameters, by taking advantage of the usefulness of rough cluster quality index in the evaluation of clustering. The experimental results on synthetic and standard data demonstrate how to construct clusters with satisfactory lower and upper approximations in the proposed algorithm. The experiments with a promotional campaign for the retail data illustrates the usefulness of interval set clustering for improving rough k-means clustering results. 相似文献

13.

Spam query detection using stream clustering

Tahere?Shakiba Sajjad?Zarifzadeh Email author Vali?Derhami 《World Wide Web》2018,21(2):557-572

Nowadays, search engines play a gateway role for users to access their needed information in the Web. However, malicious users can also use them to facilitate their attacks by submitting excessive amounts of bot-generated queries, called spam queries. In this paper, we propose a novel semi-supervised method which can effectively detect spam queries in a practical manner. We first train a model to characterize normal and malicious users, using the linguistic properties of queries as well as the behavioral characteristics of users and IP addresses. Then, we use the trained model to predict the label of arriving requests with a fast and efficient algorithm which works based on the stream clustering approach. The results of our evaluation with the real log of a local search engine show that the proposed algorithm yields an accuracy of about %94, while incurring a low response-time and memory overhead. 相似文献

14.

The set LCS problem

D. S. Hirschberg L. L. Larmore 《Algorithmica》1987,2(1-4):91-95

An efficient algorithm is presented that solves a generalization of the Longest Common Subsequence problem, in which one of the two input strings contains sets of symbols which may be permuted. This problem arises from a music application. 相似文献

15.

Correlation-aware partitioning for skewed range query optimization

Ge Wei Li Xianxian Yuan Chunfeng Huang Yihua 《World Wide Web》2019,22(1):125-151

World Wide Web - Data partitioning is an effective way to reduce cost and improve query performance in large-scale Web data analytical applications. State-of-the-art partitioning approaches on... 相似文献

16.

The set LCS problem

D. S. Hirschberg L. L. Larmore 《Algorithmica》1987,2(1):91-95

An efficient algorithm is presented that solves a generalization of the Longest Common Subsequence problem, in which one of the two input strings contains sets of symbols which may be permuted. This problem arises from a music application. 相似文献

17.

Recursive partitioning clustering tree algorithm

Ji Hoon Kang Chan Hee Park Seoung Bum Kim 《Pattern Analysis & Applications》2016,19(2):355-367

Clustering analysis elicits the natural groupings of a dataset without requiring information about the sample class and has been widely used in various fields. Although numerous clustering algorithms have been proposed and proven to perform reasonably well, no consensus exists about which one performs best in real situations. In this study, we propose a nonparametric clustering method based on recursive binary partitioning that was implemented in a classification and regression tree model. The proposed clustering algorithm has two key advantages: (1) users do not have to specify any parameters before running it; (2) the final clustering result is represented by a set of if–then rules, thereby facilitating analysis of the clustering results. Experiments with the simulations and real datasets demonstrate the effectiveness and usefulness of the proposed algorithm. 相似文献

18.

Semi-supervised graph clustering: a kernel approach 总被引：6，自引：0，他引：6

Brian Kulis Sugato Basu Inderjit Dhillon Raymond Mooney 《Machine Learning》2009,74(1):1-22

Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets. 相似文献

19.

The capacitated centred clustering problem

Marcos Negreiros Augusto Palhano 《Computers & Operations Research》2006

The capacitated centred clustering problem (CCCP) consists of defining a set of clusters with limited capacity and maximum proper similarity per cluster. Each cluster is composed of individuals from whom we can compute a centre value and hence, determine a similarity measure. The clusters must cover the demands of their individuals. This problem can be applied to the design of garbage collection zones, defining salesmen areas, etc. In this work, we present two variations (p-CCCP and Generic CCCP) of this problem and their mathematical programming formulations. We first focus our attention on the Generic CCCP, and then we create a new procedure for p -CCCP. These problems being NP-HARD, we propose a two-phase polynomial heuristic algorithm. The first phase is a constructive phase for which we will propose two variants: the first technique uses known cluster procedures oriented by a log-polynomial geometric tree search, the other one uses unconstrained to constrained clustering. The second phase is a refinement of the variable neighbourhood search (VNS). We also show the results we have obtained for tests from the CCP literature, the design of garbage collection zones, and salesmen areas distribution using the approach implemented for the SISROT^®

{SISROT}^{®}

system. 相似文献

20.

Improving constrained clustering with active query selection

Viet-Vu Vu Nicolas Labroche Bernadette Bouchon-Meunier 《Pattern recognition》2012,45(4):1749-1758

In this article, we address the problem of automatic constraint selection to improve the performance of constraint-based clustering algorithms. To this aim we propose a novel active learning algorithm that relies on a k-nearest neighbors graph and a new constraint utility function to generate queries to the human expert. This mechanism is paired with propagation and refinement processes that limit the number of constraint candidates and introduce a minimal diversity in the proposed constraints. Existing constraint selection heuristics are based on a random selection or on a min–max criterion and thus are either inefficient or more adapted to spherical clusters. Contrary to these approaches, our method is designed to be beneficial for all constraint-based clustering algorithms. Comparative experiments conducted on real datasets and with two distinct representative constraint-based clustering algorithms show that our approach significantly improves clustering quality while minimizing the number of human expert solicitations. 相似文献