共查询到20条相似文献,搜索用时 15 毫秒
1.
Aristidis LikasAuthor Vitae Nikos VlassisAuthor VitaeJakob J. VerbeekAuthor Vitae 《Pattern recognition》2003,36(2):451-461
We present the global k-means algorithm which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N (with N being the size of the data set) executions of the k-means algorithm from suitable initial positions. We also propose modifications of the method to reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and they compare favorably to the k-means algorithm with random restarts. 相似文献
2.
Jim Z.C. Lai Author Vitae Author Vitae 《Pattern recognition》2008,41(12):3677-3681
In this paper, we present a modified filtering algorithm (MFA) by making use of center variations to speed up clustering process. Our method first divides clusters into static and active groups. We use the information of cluster displacements to reject unlikely cluster centers for all nodes in the kd-tree. We reduce the computational complexity of filtering algorithm (FA) through finding candidates for each node mainly from the set of active cluster centers. Two conditions for determining the set of candidate cluster centers for each node from active clusters are developed. Our approach is different from the major available algorithm, which passes no information from one stage of iteration to the next. Theoretical analysis shows that our method can reduce the computational complexity, in terms of the number of distance calculations, of FA at each stage of iteration by a factor of FC/AC, where FC and AC are the numbers of total clusters and active clusters, respectively. Compared with the FA, our algorithm can effectively reduce the computing time and number of distance calculations. It is noted that our proposed algorithm can generate the same clusters as that produced by hard k-means clustering. The superiority of our method is more remarkable when a larger data set with higher dimension is used. 相似文献
3.
In this paper, the conventional k-modes-type algorithms for clustering categorical data are extended by representing the clusters of categorical data with k-populations instead of the hard-type centroids used in the conventional algorithms. Use of a population-based centroid representation makes it possible to preserve the uncertainty inherent in data sets as long as possible before actual decisions are made. The k-populations algorithm was found to give markedly better clustering results through various experiments. 相似文献
4.
David Rebollo-MonederoAuthor Vitae Jordi FornéAuthor Vitae 《Data & Knowledge Engineering》2011,70(10):892-921
We present a multidisciplinary solution to the problems of anonymous microaggregation and clustering, illustrated with two applications, namely privacy protection in databases, and private retrieval of location-based information. Our solution is perturbative, is based on the same privacy criterion used in microdata k-anonymization, and provides anonymity through a substantial modification of the Lloyd algorithm, a celebrated quantization design algorithm, endowed with numerical optimization techniques.Our algorithm is particularly suited to the important problem of k-anonymous microaggregation of databases, with a small integer k representing the number of individual respondents indistinguishable from each other in the published database. Our algorithm also exhibits excellent performance in the problem of clustering or macroaggregation, where k may take on arbitrarily large values. We illustrate its applicability in this second, somewhat less common case, by means of an example of location-based services. Specifically, location-aware devices entrust a third party with accurate location information. This party then uses our algorithm to create distortion-optimized, size-constrained clusters, where k nearby devices share a common centroid location, which may be regarded as a distorted version of the original one. The centroid location is sent back to the devices, which use it when contacting untrusted location-based information providers, in lieu of the exact home location, to enforce k-anonymity.We compare the performance of our novel algorithm to the state-of-the-art microaggregation algorithm MDAV, on both synthetic and standardized real data, which encompass the cases of small and large values of k. The most promising aspect of our proposed algorithm is its capability to maintain the same k-anonymity constraint, while outperforming MDAV by a significant reduction in data distortion, in all the cases considered. 相似文献
5.
Ming-Chao Chiang Author Vitae Chun-Wei Tsai Author Vitae Chu-Sing Yang Author Vitae 《Information Sciences》2011,181(4):716-3410
This paper presents an efficient algorithm, called pattern reduction (PR), for reducing the computation time of k-means and k-means-based clustering algorithms. The proposed algorithm works by compressing and removing at each iteration patterns that are unlikely to change their membership thereafter. Not only is the proposed algorithm simple and easy to implement, but it can also be applied to many other iterative clustering algorithms such as kernel-based and population-based clustering algorithms. Our experiments—from 2 to 1000 dimensions and 150 to 10,000,000 patterns—indicate that with a small loss of quality, the proposed algorithm can significantly reduce the computation time of all state-of-the-art clustering algorithms evaluated in this paper, especially for large and high-dimensional data sets. 相似文献
6.
An image-hiding method with high hiding capacity based on best-block matching and k-means clustering
This paper presents a novel image-hiding method that exhibits a high hiding capacity that allows the embedded important image to be larger than the cover image, a facility that is seldom described in the literature. In the proposed method, the entire important image is divided into many nonoverlapping blocks. For each block of the important image, a block-matching procedure is used to search for the best similar block from a series of numbered candidate blocks. The obtained indices of the best-matching blocks are encoded using Huffman coding scheme, and then recorded in the least-significant-bit planes of the cover image through a monoalphabetic transposition cipher. The proposed method exhibits the following advantages over existing methods: (1) a high hiding capacity such that the embedded important image can be larger than the cover image; (2) a stego-image with a high quality, which improves the secrecy of the hidden image; and (3) a small error between the extracted important image and the original important image indicating that the extracted important image is of acceptable quality. 相似文献
7.
We develop a new non-parametric information theoretic clustering algorithm based on implicit estimation of cluster densities using the k-nearest neighbors (k-nn) approach. Compared to a kernel-based procedure, our hierarchical k-nn approach is very robust with respect to the parameter choices, with a key ability to detect clusters of vastly different scales. Of particular importance is the use of two different values of k, depending on the evaluation of within-cluster entropy or across-cluster cross-entropy, and the use of an ensemble clustering approach wherein different clustering solutions vote in order to obtain the final clustering. We conduct clustering experiments, and report promising results. 相似文献
8.
Clustering is one of the widely used knowledge discovery techniques to reveal structures in a dataset that can be extremely useful to the analyst. In iterative clustering algorithms the procedure adopted for choosing initial cluster centers is extremely important as it has a direct impact on the formation of final clusters. Since clusters are separated groups in a feature space, it is desirable to select initial centers which are well separated. In this paper, we have proposed an algorithm to compute initial cluster centers for k-means algorithm. The algorithm is applied to several different datasets in different dimension for illustrative purposes. It is observed that the newly proposed algorithm has good performance to obtain the initial cluster centers for the k-means algorithm. 相似文献
9.
A k-factor of graph G is defined as a k-regular spanning subgraph of G. For instance, a 2-factor of G is a set of cycles that span G. 2-factors have multiple applications in Graph Theory, Computer Graphics, and Computational Geometry. We define a simple 2-factor as a 2-factor without degenerate cycles. In general, simple k-factors are defined as k-regular spanning subgraphs where no edge is used more than once. We propose a new algorithm for computing simple k-factors for all values of k?2. 相似文献
10.
Adil M. Bagirov Author Vitae Julien Ugon Author VitaeAuthor Vitae 《Pattern recognition》2011,44(4):866-876
The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need of computing or storing the whole affinity matrix and thereby to reduce computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms. 相似文献
11.
Jim Z.C. Lai Author Vitae Author Vitae 《Pattern recognition》2010,43(5):1954-1963
In this paper, we present a fast global k-means clustering algorithm by making use of the cluster membership and geometrical information of a data point. This algorithm is referred to as MFGKM. The algorithm uses a set of inequalities developed in this paper to determine a starting point for the jth cluster center of global k-means clustering. Adopting multiple cluster center selection (MCS) for MFGKM, we also develop another clustering algorithm called MFGKM+MCS. MCS determines more than one starting point for each step of cluster split; while the available fast and modified global k-means clustering algorithms select one starting point for each cluster split. Our proposed method MFGKM can obtain the least distortion; while MFGKM+MCS may give the least computing time. Compared to the modified global k-means clustering algorithm, our method MFGKM can reduce the computing time and number of distance calculations by a factor of 3.78-5.55 and 21.13-31.41, respectively, with the average distortion reduction of 5,487 for the Statlog data set. Compared to the fast global k-means clustering algorithm, our method MFGKM+MCS can reduce the computing time by a factor of 5.78-8.70 with the average reduction of distortion of 30,564 using the same data set. The performances of our proposed methods are more remarkable when a data set with higher dimension is divided into more clusters. 相似文献
12.
M. Emre Celebi 《Image and vision computing》2011,29(4):260-271
Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. However, despite its popularity as a general purpose clustering algorithm, k-means has not received much respect in the color quantization literature because of its high computational requirements and sensitivity to initialization. In this paper, we investigate the performance of k-means as a color quantizer. We implement fast and exact variants of k-means with several initialization schemes and then compare the resulting quantizers to some of the most popular quantizers in the literature. Experiments on a diverse set of images demonstrate that an efficient implementation of k-means with an appropriate initialization strategy can in fact serve as a very effective color quantizer. 相似文献
13.
In recent years, there have been numerous attempts to extend the k-means clustering protocol for single database to a distributed multiple database setting and meanwhile keep privacy of each data site. Current solutions for (whether two or more) multiparty k-means clustering, built on one or more secure two-party computation algorithms, are not equally contributory, in other words, each party does not equally contribute to k-means clustering. This may lead a perfidious attack where a party who learns the outcome prior to other parties tells a lie of the outcome to other parties. In this paper, we present an equally contributory multiparty k-means clustering protocol for vertically partitioned data, in which each party equally contributes to k-means clustering. Our protocol is built on ElGamal's encryption scheme, Jakobsson and Juels's plaintext equivalence test protocol, and mix networks, and protects privacy in terms that each iteration of k-means clustering can be performed without revealing the intermediate values. 相似文献
14.
Ruijuan Li 《Information Processing Letters》2010,110(16):651-654
For a positive integer k, a graph G is k-ordered hamiltonian if for every ordered sequence of k vertices there is a hamiltonian cycle that encounters the vertices of the sequence in the given order. In this paper, we show that if G is a ⌊3k/2⌋-connected graph of order n?100k, and d(u)+d(v)?n for any two vertices u and v with d(u,v)=2, then G is k-ordered hamiltonian. Our result implies the theorem of G. Chen et al. [Ars Combin. 70 (2004) 245-255] [1], which requires the degree sum condition for all pairs of non-adjacent vertices, not just those distance 2 apart. 相似文献
15.
We say that a distribution over {0,1}n is (ε,k)-wise independent if its restriction to every k coordinates results in a distribution that is ε-close to the uniform distribution. A natural question regarding (ε,k)-wise independent distributions is how close they are to some k-wise independent distribution. We show that there exist (ε,k)-wise independent distributions whose statistical distance is at least nO(k)·ε from any k-wise independent distribution. In addition, we show that for any (ε,k)-wise independent distribution there exists some k-wise independent distribution, whose statistical distance is nO(k)·ε. 相似文献
16.
To protect individual privacy in data mining, when a miner collects data from respondents, the respondents should remain anonymous. The existing technique of Anonymity-Preserving Data Collection partially solves this problem, but it assumes that the data do not contain any identifying information about the corresponding respondents. On the other hand, the existing technique of Privacy-Enhancing k-Anonymization can make the collected data anonymous by eliminating the identifying information. However, it assumes that each respondent submits her data through an unidentified communication channel. In this paper, we propose k-Anonymous Data Collection, which has the advantages of both Anonymity-Preserving Data Collection and Privacy-Enhancing k-Anonymization but does not rely on their assumptions described above. We give rigorous proofs for the correctness and privacy of our protocol, and experimental results for its efficiency. Furthermore, we extend our solution to the fully malicious model, in which a dishonest participant can deviate from the protocol and behave arbitrarily. 相似文献
17.
Dejan Deli? 《Information Processing Letters》2010,110(16):662-665
Let k be a positive integer, and let G=(V,E) be a graph with minimum degree at least k−1. A function f:V→{−1,1} is said to be a signed k-dominating function (SkDF) if ∑u∈N[v]f(u)?k for every v∈V. An SkDF f of a graph G is minimal if there exists no SkDF g such that g≠f and g(v)?f(v) for every v∈V. The maximum of the values of ∑v∈Vf(v), taken over all minimal SkDFs f, is called the upper signed k-domination numberΓkS(G). In this paper, we present a sharp upper bound on this number for a general graph. 相似文献
18.
The densest k-subgraph (DkS) problem asks for a k-vertex subgraph of a given graph with the maximum number of edges. The DkS problem is NP-hard even for special graph classes including bipartite, planar, comparability and chordal graphs, while no constant approximation algorithm is known for any of these classes. In this paper we present a 3-approximation algorithm for the class of chordal graphs. The analysis of our algorithm is based on a graph theoretic lemma of independent interest. 相似文献
19.
The k-MST is a well known NP-hard problem and several approximation algorithms exist to solve this problem with a guaranteed performance bound. A closely related problem, called the bottleneck k-MST (BMST(k)) can however be solved in O(mlogn) time on graph with n nodes and m edges. We propose two algorithms to solve BMST(k), one of complexity O(m+nlogn) and the other of O(m) time. We also consider a generalization of BMST(k) which subsumes many bottleneck problems studied in the literature and show that this generalized problem can also be solved in O(m) time. 相似文献
20.
In clustering algorithm, one of the main challenges is to solve the global allocation of the clusters instead of just local tuning of the partition borders. Despite this, all external cluster validity indexes calculate only point-level differences of two partitions without any direct information about how similar their cluster-level structures are. In this paper, we introduce a cluster level index called centroid index. The measure is intuitive, simple to implement, fast to compute and applicable in case of model mismatch as well. To a certain extent, we expect it to generalize other clustering models beyond the centroid-based k-means as well. 相似文献