首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
In cluster analysis, determining number of clusters is an important issue because information about the most appropriate number of clusters do not exist in the real-world problems. Automatic clustering is a clustering approach which is able to automatically find the most suitable number of clusters as well as divide the instances into the corresponding clusters. This study proposes a novel automatic clustering algorithm using a hybrid of improved artificial bee colony optimization algorithm and K-means algorithm (iABC). The proposed iABC algorithm improves the onlooker bee exploration scheme by directing their movements to a better location. Instead of using a random neighborhood location, the improved onlooker bee considers the data centroid to find a better initial centroid for the K-means algorithm. To increase efficiency of the improvement, the updating process is only applied on the worst cluster centroid. The proposed iABC algorithm is verified using some benchmark datasets. The computational result indicates that the proposed iABC algorithm outperforms the original ABC algorithm for automatic clustering problem. Furthermore, the proposed iABC algorithm is utilized to solve the customer segmentation problem. The result reveals that the iABC algorithm has better and more stable result than original ABC algorithm.  相似文献   

2.
Cellular Learning Automata (CLAs) are hybrid models obtained from combination of Cellular Automata (CAs) and Learning Automata (LAs). These models can be either open or closed. In closed CLAs, the states of neighboring cells of each cell called local environment affect on the action selection process of the LA of that cell whereas in open CLAs, each cell, in addition to its local environment has an exclusive environment which is observed by the cell only and the global environment which can be observed by all the cells in CLA. In dynamic models of CLAs, one of their aspects such as structure, local rule or neighborhood radius may change during the evolution of the CLA. CLAs can also be classified as synchronous CLAs or asynchronous CLAs. In a synchronous CLA, all LAs in different cells are activated synchronously whereas in an asynchronous CLA, the LAs in different cells are activated asynchronously. In this paper, a new closed asynchronous dynamic model of CLA whose structure and the number of LAs in each cell may vary with time has been introduced. To show the potential of the proposed model, a landmark clustering algorithm for solving topology mismatch problem in unstructured peer-to-peer networks has been proposed. To evaluate the proposed algorithm, computer simulations have been conducted and then the results are compared with the results obtained for two existing algorithms for solving topology mismatch problem. It has been shown that the proposed algorithm is superior to the existing algorithms with respect to communication delay and average round-trip time between peers within clusters.  相似文献   

3.
Recently, sparse subspace clustering, as a subspace learning technique, has been successfully applied to several computer vision applications, e.g. face clustering and motion segmentation. The main idea of sparse subspace clustering is to learn an effective sparse representation that are used to construct an affinity matrix for spectral clustering. While most of existing sparse subspace clustering algorithms and its extensions seek the forms of convex relaxation, the use of non-convex and non-smooth l q (0 < q < 1) norm has demonstrated better recovery performance. In this paper we propose an l q norm based Sparse Subspace Clustering method (lqSSC), which is motivated by the recent work that l q norm can enhance the sparsity and make better approximation to l 0 than l 1. However, the optimization of l q norm with multiple constraints is much difficult. To solve this non-convex problem, we make use of the Alternating Direction Method of Multipliers (ADMM) for solving the l q norm optimization, updating the variables in an alternating minimization way. ADMM splits the unconstrained optimization into multiple terms, such that the l q norm term can be solved via Smooth Iterative Reweighted Least Square (SIRLS), which converges with guarantee. Different from traditional IRLS algorithms, the proposed algorithm is based on gradient descent with adaptive weight, making it well suit for general sparse subspace clustering problem. Experiments on computer vision tasks (synthetic data, face clustering and motion segmentation) demonstrate that the proposed approach achieves considerable improvement of clustering accuracy than the convex based subspace clustering methods.  相似文献   

4.
Emergence of MapReduce (MR) framework for scaling data mining and machine learning algorithms provides for Volume, while handling of Variety and Velocity needs to be skilfully crafted in algorithms. So far, scalable clustering algorithms have focused solely on Volume, taking advantage of the MR framework. In this paper we present a MapReduce algorithm—data aware scalable clustering (DASC), which is capable of handling the 3 Vs of big data by virtue of being (i) single scan and distributed to handle Volume, (ii) incremental to cope with Velocity and (iii) versatile in handling numeric and categorical data to accommodate Variety. DASC algorithm incrementally processes infinitely growing data set stored on distributed file system and delivers quality clustering scheme while ensuring recency of patterns. The up-to-date synopsis is preserved by the algorithm for the data seen so far. Each new data increment is processed and merged with the synopsis. Since the synopsis itself may grow very large in size, the algorithm stores it as a file. This makes DASC algorithm truly scalable. Exclusive clusters are obtained on demand by applying connected component analysis (CCA) algorithm over the synopsis. CCA presents subtle roadblock to effective parallelism during clustering. This problem is overcome by accomplishing the task in two stages. In the first stage, hyperclusters are identified based on prevailing data characteristics. The second stage utilizes this knowledge to determine the degree of parallelism, thereby making DASC data aware. Hyperclusters are distributed over the available compute nodes for discovering embedded clusters in parallel. Staged approach for clustering yields dual advantage of improved parallelism and desired complexity in \(\mathcal {MRC}^0\) class. DASC algorithm is empirically compared with incremental Kmeans and Scalable Kmeans++ algorithms. Experimentation on real-world and synthetic data with approximately 1.2 billion data points demonstrates effectiveness of DASC algorithm. Empirical observations of DASC execution are in consonance with the theoretical analysis with respect to stability in resources utilization and execution time.  相似文献   

5.
Unsupervised technique like clustering may be used for software cost estimation in situations where parametric models are difficult to develop. This paper presents a software cost estimation model based on a modified K-Modes clustering algorithm. The aims of this paper are: first, the modified K-Modes clustering which is an enhancement over the simple K-Modes algorithm using a proper dissimilarity measure for mixed data types, is presented and second, the proposed K-Modes algorithm is applied for software cost estimation. We have compared our modified K-Modes algorithm with existing algorithms on different software cost estimation datasets, and results showed the effectiveness of our proposed algorithm.  相似文献   

6.
Instagram is a popular photo-sharing social application. It is widely used by tourists to record their journey information such as location, time and interest. Consequently, a huge volume of geo-tagged photos with spatio-temporal information are generated along tourist’s travel trajectories. Such Instagram photo trajectories consist of travel paths, travel density distributions, and traveller behaviors, preferences, and mobility patterns. Mining Instagram photo trajectories is thus very useful for many mobile and location-based social applications, including tour guide and recommender systems. However, we have not found any work that extracts interesting group-like travel trajectories from Instagram photos asynchronously taken by different tourists. Motivated by this, we propose a novel concept: coterie, which reveals representative travel trajectory patterns hidden in Instagram photos taken by users at shared locations and paths. Our work includes the discovery of (1) coteries, (2) closed coteries, and (3) the recommendation of popular travel routes based on closed coteries. For this, we first build a statistically reliable trajectory database from Instagram geo-tagged photos. These trajectories are then clustered by the DBSCAN method to find tourist density. Next, we transform each raw spatio-temporal trajectory into a sequence of clusters. All discriminative closed coteries are further identified by a Cluster-Growth algorithm. Finally, distance-aware and conformityaware recommendation strategies are applied on closed coteries to recommend popular tour routes. Visualized demos and extensive experimental results demonstrate the effectiveness and efficiency of our methods.  相似文献   

7.
Individual privacy may be compromised during the process of mining for valuable information, and the potential for data mining is hindered by the need to preserve privacy. It is well known that k-means clustering algorithms based on differential privacy require preserving privacy while maintaining the availability of clustering. However, it is difficult to balance both aspects in traditional algorithms. In this paper, an outlier-eliminated differential privacy (OEDP) k-means algorithm is proposed that both preserves privacy and improves clustering efficiency. The proposed approach selects the initial centre points in accordance with the distribution density of data points, and adds Laplacian noise to the original data for privacy preservation. Both a theoretical analysis and comparative experiments were conducted. The theoretical analysis shows that the proposed algorithm satisfies ε-differential privacy. Furthermore, the experimental results show that, compared to other methods, the proposed algorithm effectively preserves data privacy and improves the clustering results in terms of accuracy, stability, and availability.  相似文献   

8.
In this paper, a steganographic scheme adopting the concept of the generalized K d -distance N-dimensional pixel matching is proposed. The generalized pixel matching embeds a B-ary digit (B is a function of K and N) into a cover vector of length N, where the order-d Minkowski distance-measured embedding distortion is no larger than K. In contrast to other pixel matching-based schemes, a N-dimensional reference table is used. By choosing d, K, and N adaptively, an embedding strategy which is suitable for arbitrary relative capacity can be developed. Additionally, an optimization algorithm, namely successive iteration algorithm (SIA), is proposed to optimize the codeword assignment in the reference table. Benefited from the high dimensional embedding and the optimization algorithm, nearly maximal embedding efficiency is achieved. Compared with other content-free steganographic schemes, the proposed scheme provides better image quality and statistical security. Moreover, the proposed scheme performs comparable to state-of-the-art content-based approaches after combining with image models.  相似文献   

9.
Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and solution space is large. Genetic algorithm deals effectively with larger solution space and provides better solution. In this paper, we proposed a novel clustering algorithm for distributed datasets, using combination of genetic algorithm (GA) with Mahalanobis distance and k-means clustering algorithm. The proposed algorithm is two phased; in phase 1, GA is applied in parallel on data chunks located across different machines. Mahalanobis distance is used as fitness value in GA, which considers covariance between the data points and thus provides a better representation of initial data. K-means with K-means\( ++ \) initialization is applied in phase 2 on intermediate output to get final result. The proposed algorithm is implemented on Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted for multiple real-life and synthetic datasets to measure performance of our proposed algorithm. Results were compared with MapReduce-based algorithms, mrk-means, parallel k-means and scaling GA.  相似文献   

10.
Huge volume of data over several domains demands the development of new more efficient tools for search, analysis, and interpretation. Clustering approaches represent an important step in exploring the internal structure and relationships in datasets. In this study, the cognitively motivated neural network Freeman K3-set was applied as a filter to preprocess the data, achieving a better clustering performance. We combine K3 with a variety of clustering algorithms commonly used, and tested its performance using standard UCI datasets and also datasets from social networks. A comprehensive evaluation using a number of cluster validation measures shows significant improvement in the overall performance of the K3-based clustering method for social data sets, for two types of clustering validation measures. Additionally, K3 filtering results in transparent representation of data, which leads to improved efficiency of data processing algorithms used.  相似文献   

11.
Traditional elasticity imaging systems use short pulses with low sound power, causing the signal to be attenuated severely in deep zones. On the basis of the coded excitation and spatial composition theorems, an ultrasonic elastography optimization algorithm is proposed in this paper. It takes advantage of coded excitation and spatial compounding such as high peak power and average sound power, suppresses speckle noise, and improves the imaging quality effectively. Specifically, a coded excitation system encodes the long pulses when transmitting, and then decodes the long pulses into short pulses upon receiving. This increases the average sound power of the beam without sacrificing the spatial resolution. A imaging system based on coded excitation can therefore achieve a good signal-to-noise ratio (SNR e ) and contrast-to-noise ratio (CNR e ) in deep zones below the detection surface. The proposed algorithm combines coded excitation with a filter-group based spatial compounding algorithm at the receiving terminal. Finally, experimental results show that the proposed algorithm yields a higher SNR e and CNR e than using chirp coded excitation or spatial compounding alone.  相似文献   

12.
This paper introduces α-systems of differential inclusions on a bounded time interval [t0, ?] and defines α-weakly invariant sets in [t0, ?] × ?n, where ?n is a phase space of the differential inclusions. We study the problems connected with bringing the motions (trajectories) of the differential inclusions from an α-system to a given compact set M ? ?n at the moment ? (the approach problems). The issues of extracting the solvability set W ? [t0, ?] × ?n in the problem of bringing the motions of an α-system to M and the issues of calculating the maximal α-weakly invariant set Wc ? [t0, ?] × ?n are also discussed. The notion of the quasi-Hamiltonian of an α-system (α-Hamiltonian) is proposed, which seems important for the problems of bringing the motions of the α-system to M.  相似文献   

13.
We consider the planning problem for freight transportation between two railroad stations. We are required to fulfill orders (transport cars by trains) that arrive at arbitrary time moments and have different value (weight). The speed of trains moving between stations may be different. We consider problem settings with both fixed and undefined departure times for the trains. For the problem with fixed train departure times we propose an algorithm for minimizing the weighted lateness of orders with time complexity O(qn 2 log n) operations, where q is the number of trains and n is the number of orders. For the problem with undefined train departure and arrival times we construct a Pareto optimal set of schedules optimal with respect to criteria wL max and C max in O(n 2 max{n log n, q log v}) operations, where v is the number of time windows during which the trains can depart. The proposed algorithm allows to minimize both weighted lateness wL max and total time of fulfilling freight delivery orders C max.  相似文献   

14.
Consideration was given to the classical NP-hard problem 1|rj|Lmax of the scheduling theory. An algorithm to determine the optimal schedule of processing n jobs where the job parameters satisfy a system of linear constraints was presented. The polynomially solvable area of the problem 1|rj|Lmax was expanded. An algorithm was described to construct a Pareto-optimal set of schedules by the criteria Lmax and Cmax for complexity of O(n3logn) operations.  相似文献   

15.
Given a tree T=(V,E) of n nodes such that each node v is associated with a value-weight pair (val v ,w v ), where value val v is a real number and weight w v is a non-negative integer, the density of T is defined as \(\frac{\sum_{v\in V}{\mathit{val}}_{v}}{\sum_{v\in V}w_{v}}\). A subtree of T is a connected subgraph (V′,E′) of T, where V′?V and E′?E. Given two integers w min? and w max?, the weight-constrained maximum-density subtree problem on T is to find a maximum-density subtree T′=(V′,E′) satisfying w min?≤∑vV w v w max?. In this paper, we first present an O(w max? n)-time algorithm to find a weight-constrained maximum-density path in a tree T, and then present an O(w max? 2 n)-time algorithm to find a weight-constrained maximum-density subtree in T. Finally, given a node subset S?V, we also present an O(w max? 2 n)-time algorithm to find a weight-constrained maximum-density subtree in T which covers all the nodes in S.  相似文献   

16.
We introduce the notion of a q-ary (r 0, r 1, ..., r q?1) superimposed code. We obtain upper and lower asymptotic bounds for the rate of these (colored) codes.  相似文献   

17.
The well known clustering algorithm DBSCAN is founded on the density notion of clustering. However, the use of global density parameter ε-distance makes DBSCAN not suitable in varying density datasets. Also, guessing the value for the same is not straightforward. In this paper, we generalise this algorithm in two ways. First, adaptively determine the key input parameter ε-distance, which makes DBSCAN independent of domain knowledge satisfying the unsupervised notion of clustering. Second, the approach of deriving ε-distance based on checking the data distribution of each dimension makes the approach suitable for subspace clustering, which detects clusters enclosed in various subspaces of high dimensional data. Experimental results illustrate that our approach can efficiently find out the clusters of varying sizes, shapes as well as varying densities.  相似文献   

18.
The notion of the equivalence of vertex labelings on a given graph is introduced. The equivalence of three bimagic labelings for regular graphs is proved. A particular solution is obtained for the problem of the existence of a 1-vertex bimagic vertex labeling of multipartite graphs, namely, for graphs isomorphic with Kn, n, m. It is proved that the sequence of bi-regular graphs Kn(ij)?=?((Kn???1???M)?+?K1)???(unui)???(unuj) admits 1-vertex bimagic vertex labeling, where ui, uj is any pair of non-adjacent vertices in the graph Kn???1???M, un is a vertex of K1, M is perfect matching of the complete graph Kn???1. It is established that if an r-regular graph G of order n is distance magic, then graph G + G has a 1-vertex bimagic vertex labeling with magic constants (n?+?1)(n?+?r)/2?+?n2 and (n?+?1)(n?+?r)/2?+?nr. Two new types of graphs that do not admit 1-vertex bimagic vertex labelings are defined.  相似文献   

19.
Nowadays, location-based services (LBS) are facilitating people in daily life through answering LBS queries. However, privacy issues including location privacy and query privacy arise at the same time. Existing works for protecting query privacy either work on trusted servers or fail to provide sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We recognize the sufficient and necessary condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ??-DPkA with minimized ??. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions.  相似文献   

20.
Clustering is a powerful machine learning technique that groups “similar” data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver “qbsolv.” The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号