Similar literature
 20 similar documents found (search time: 15 ms)
1.
A class-consistent k-means clustering algorithm (CCKM) and its hierarchical extension (Hierarchical CCKM) are presented for generating discriminative visual words for recognition problems. In addition to using the labels of training data themselves, we associate a class label with each cluster center to enforce discriminability in the resulting visual words. Our algorithms encourage data points from the same class to be assigned to the same visual word, and those from different classes to be assigned to different visual words. More specifically, we introduce a class consistency term in the clustering process which penalizes assignment of data points from different classes to the same cluster. The optimization process is efficient and bounded by the complexity of k-means clustering. A very efficient and discriminative tree classifier can be learned for various recognition tasks via the Hierarchical CCKM. The effectiveness of the proposed algorithms is validated on two public face datasets and four benchmark action datasets.
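One way to picture the class consistency term is as a penalty added to the usual squared-distance cost whenever a point's class differs from the class attached to a cluster center. The Python sketch below illustrates the assignment step under that assumption only; the function name `assign_with_class_penalty` and the `penalty` weight are hypothetical and are not taken from the paper.

```python
def assign_with_class_penalty(point, label, centers, center_labels, penalty=1.0):
    """Return the index of the cheapest center for one labeled point.

    Assumed cost (not the paper's formula): squared Euclidean distance,
    plus `penalty` if the center's class differs from the point's class.
    """
    costs = []
    for center, center_label in zip(centers, center_labels):
        sq_dist = sum((p - c) ** 2 for p, c in zip(point, center))
        mismatch = penalty if center_label != label else 0.0
        costs.append(sq_dist + mismatch)
    return costs.index(min(costs))
```

With `penalty=0.0` this reduces to the ordinary nearest-center rule of k-means; a larger penalty pushes points toward centers that carry their own class label.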

2.
During the last decade, artificial immune systems have drawn much attention from researchers. This work has produced many interesting algorithms that are useful for engineering problems such as data mining and analysis, anomaly detection and many others. Although they are constantly being developed and improved, algorithms based on immune metaphors still have some limitations. In this paper we elaborate on the concept of a novel artificial immune algorithm by considering the possibility of combining the clonal selection principle and the well-known K-means algorithm. This novel approach and a new way of performing suppression (based on the usefulness of the evolving lymphocytes) in clonal selection result in a very effective and stable immune algorithm for both unsupervised and supervised learning. Further improvements to cluster analysis by means of the proposed algorithm, immune K-means, are introduced. Different methods for cluster construction are compared, together with a multi-point cluster validity index and a novel strategy based on the minimal spanning tree (MST) and an analysis of the midpoints of its edges. Interesting and useful improvements of the proposed approach by means of negative selection algorithms are proposed and discussed.

3.
Security assessment is a major concern in planning and operation studies of a power system. The conventional method of security evaluation, performed by simulation, requires long computation times and generates voluminous results. This paper presents a K-means clustering approach for classifying power system states as secure or insecure under a given operating condition and contingency. It demonstrates how the traditional K-means clustering algorithm can be profitably modified for use as a classifier. The proposed algorithm combines particle swarm optimization (PSO) with the traditional K-means algorithm to satisfy the requirements of a classifier. The proposed PSO-based K-means clustering technique is implemented on the IEEE 30 Bus, 57 Bus, 118 Bus and 300 Bus standard test systems for static security and transient security evaluation. The simulation results of the proposed algorithm are compared with unsupervised K-means clustering, which uses different methods for cluster center initialization.

4.
This paper proposes a hybrid technique for color image segmentation. First, the input image is converted to the CIE L*a*b* color space. The color features “a” and “b” of CIE L*a*b* are then fed into fuzzy C-means (FCM) clustering, which is an unsupervised method. The labels obtained from FCM are used as the target of a supervised feed-forward neural network. The network is trained by the Levenberg-Marquardt back-propagation algorithm, and its performance is evaluated using mean square error and regression analysis. The main issues of clustering methods are determining the number of clusters and choosing cluster validity measures. This paper presents a co-occurrence-matrix-based algorithm for finding the number of clusters, and uses silhouette index values for cluster validation. The proposed method is tested on various color images obtained from the Berkeley database. The segmentation results are validated, and classification performance is evaluated in terms of sensitivity, specificity, and accuracy.

5.
This paper proposes a fuzzy clustering-based algorithm for fuzzy modeling. The algorithm incorporates unsupervised learning with an iterative process into a framework, which is based on the use of the weighted fuzzy c-means. In the first step, the learning vector quantization (LVQ) algorithm is exploited as a data pre-processor unit to group the training data into a number of clusters. Since different clusters may contain different numbers of objects, the centers of these clusters are assigned weight factors, the values of which are calculated from the respective cluster cardinalities. These centers, together with their weights, are considered to be a new data set, which is further elaborated by an iterative process. This process consists of applying in sequence the weighted fuzzy c-means and the back-propagation algorithm. The application of the weighted fuzzy c-means ensures that the contribution of each cluster center to the final fuzzy partition is determined by its cardinality, meaning that the real data structure can be discovered more easily. The algorithm is successfully applied to three test cases, where the produced fuzzy models prove to be very accurate as well as compact in size.

6.
An efficient unsupervised method is developed for automatic segmentation of the area covered by upwelling waters in the coastal ocean of Morocco using Sea Surface Temperature (SST) satellite images. The proposed approach first uses two popular unsupervised clustering techniques, k-means and fuzzy c-means (FCM), to provide different possible classifications for each SST image. Then several cluster validity indices are combined in order to determine the optimal number of clusters, followed by a cluster fusion scheme, which merges consecutive clusters to produce a first segmentation of the upwelling area. The region-growing algorithm is then used to filter noisy residuals and to extract the final upwelling region. The performance of our algorithm is compared to a popular algorithm used to detect upwelling regions and is validated by an oceanographer over a database of 92 SST images covering each week of the years 2006 and 2007. The results show that our proposed method outperforms the latter algorithm in terms of segmentation accuracy and computational efficiency.

7.
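An improved semi-supervised K-means clustering algorithm   Total citations: 4, self-citations: 1, citations by others: 3
The K-means clustering algorithm needs the number of clusters in advance, and randomly chosen initial centers make the clustering result unstable and prone to terminating at a local optimum. An improved K-means clustering algorithm based on semi-supervised learning theory is proposed: a small amount of labeled data is used to build the minimum spanning tree of a graph, which is split iteratively to obtain the number of clusters and the initial cluster centers that K-means requires. Experiments on the IRIS data set show that, although the spanning trees built from random samples differ and the cluster centers differ as well, the clustering is consistent and stable and needs fewer iterations, which confirms the effectiveness of the proposed algorithm.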
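The idea of deriving the number of clusters and the initial centers from a minimum spanning tree over labeled samples can be sketched in Python as follows. The abstract does not state the splitting criterion, so the rule used here (cut every tree edge whose endpoints carry different labels) is purely an assumption for illustration, and the names `minimum_spanning_tree` and `initial_centers` are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns edges (i, j)."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(points):
        # Cheapest edge leaving the current tree.
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(len(points))
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def initial_centers(points, labels):
    """Split the tree and return one mean per connected piece.

    Assumed split rule (not from the abstract): drop every tree edge
    whose two endpoints have different labels.
    """
    kept = [(i, j) for i, j in minimum_spanning_tree(points) if labels[i] == labels[j]]
    parent = list(range(len(points)))  # union-find over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for index, point in enumerate(points):
        groups.setdefault(find(index), []).append(point)
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups.values()]
```

The length of the returned list plays the role of K, and the means can be passed to any K-means implementation as starting centers.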

8.
The present paper considers the problem of partitioning a dataset into a known number of clusters using the sum of squared errors (SSE) criterion. A new clustering method, called DE-KM, which combines the differential evolution (DE) algorithm with the well-known K-means procedure, is described. In this method, the K-means algorithm is used to fine-tune each candidate solution obtained by the mutation and crossover operators of DE. Additionally, a reordering procedure which allows the evolutionary algorithm to tackle the redundant representation problem is proposed. The performance of the DE-KM clustering method is compared to that of differential evolution, the global K-means method, the genetic K-means algorithm and two variants of the K-means algorithm. The experimental results show that if the number of clusters K is sufficiently large, DE-KM obtains solutions with lower SSE values than the other five algorithms.

9.
We introduce a novel clustering algorithm named GAKREM (Genetic Algorithm K-means Logarithmic Regression Expectation Maximization) that combines the best characteristics of the K-means and EM algorithms but avoids their weaknesses such as the need to specify a priori the number of clusters, termination in local optima, and lengthy computations. To achieve these goals, genetic algorithms for estimating parameters and initializing starting points for the EM are used first. Second, the log-likelihood of each configuration of parameters and the number of clusters resulting from the EM is used as the fitness value for each chromosome in the population. The novelty of GAKREM is that in each evolving generation it efficiently approximates the log-likelihood for each chromosome using logarithmic regression instead of running the conventional EM algorithm until its convergence. Another novelty is the use of K-means to initially assign data points to clusters. The algorithm is evaluated by comparing its performance with the conventional EM algorithm, the K-means algorithm, and the likelihood cross-validation technique on several datasets.

10.
The fuzzy c-means (FCM) algorithm is one of the most popular methods for image segmentation. However, the standard FCM algorithm relies on expert users to estimate the number of clusters. We therefore propose an automatic fuzzy clustering algorithm (AFCM) for automatically grouping the pixels of an image into different homogeneous regions when the number of clusters is not known beforehand. To obtain better segmentation quality, this paper presents an algorithm based on AFCM, called the automatic modified fuzzy c-means cluster segmentation algorithm (AMFCM). AMFCM incorporates spatial information into the membership function for clustering. The spatial function is the weighted summation of the membership function in the neighborhood of each pixel under consideration. Experimental results show that AMFCM not only estimates the appropriate number of clusters automatically but also achieves better segmentation quality.
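A spatial function of this kind can be sketched in Python as a sum of memberships over a window around each pixel, which is then folded back into the membership values. The sketch assumes uniform neighborhood weights and the combination u^p · h^q normalized over clusters, a form used in some spatial FCM variants; neither choice is confirmed by the abstract, and the names `spatial_function` and `spatially_weighted_memberships` are hypothetical.

```python
def spatial_function(u, radius=1):
    """h[c][y][x]: sum of cluster-c memberships in the window around (y, x).

    `u` is indexed as u[cluster][row][column]. Uniform weights are an assumption.
    """
    n_clusters, height, width = len(u), len(u[0]), len(u[0][0])
    h = [[[0.0] * width for _ in range(height)] for _ in range(n_clusters)]
    for c in range(n_clusters):
        for y in range(height):
            for x in range(width):
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        h[c][y][x] += u[c][ny][nx]
    return h


def spatially_weighted_memberships(u, p=1, q=1, radius=1):
    """Combine each membership with its spatial function and renormalize."""
    h = spatial_function(u, radius)
    n_clusters, height, width = len(u), len(u[0]), len(u[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_clusters)]
    for y in range(height):
        for x in range(width):
            scores = [(u[c][y][x] ** p) * (h[c][y][x] ** q) for c in range(n_clusters)]
            total = sum(scores)
            for c in range(n_clusters):
                # Fall back to the original value if every score is zero.
                out[c][y][x] = scores[c] / total if total > 0 else u[c][y][x]
    return out
```

A pixel whose membership disagrees with all of its neighbors is pulled toward the neighborhood's dominant cluster, which is how spatial information suppresses isolated noisy pixels.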

11.
This paper studies supervised clustering in the context of label ranking data. The goal is to partition the feature space into K clusters, such that they are compact in both the feature and label ranking space. This type of clustering has many potential applications. For example, in target marketing we might want to come up with K different offers or marketing strategies for our target audience. Thus, we aim at clustering the customers’ feature space into K clusters by leveraging the revealed or stated, potentially incomplete customer preferences over products, such that the preferences of customers within one cluster are more similar to each other than to those of customers in other clusters. We establish several baseline algorithms and propose two principled algorithms for supervised clustering. In the first baseline, the clusters are created in an unsupervised manner, followed by assigning a representative label ranking to each cluster. In the second baseline, the label ranking space is clustered first, followed by partitioning the feature space based on the central rankings. In the third baseline, clustering is applied on a new feature space consisting of both features and label rankings, followed by mapping back to the original feature and ranking space. The RankTree principled approach is based on a Ranking Tree algorithm previously proposed for label ranking prediction. Our modification starts with K random label rankings and iteratively splits the feature space to minimize the ranking loss, followed by re-calculation of the K rankings based on cluster assignments. The MM-PL approach is a multi-prototype supervised clustering algorithm based on the Plackett-Luce (PL) probabilistic ranking model. It represents each cluster with a union of Voronoi cells that are defined by a set of prototypes, and assigns each cluster a set of PL label scores that determine the cluster central ranking. Cluster membership and ranking prediction for a new instance are determined by the cluster membership of its nearest prototype. The unknown cluster PL parameters and prototype positions are learned by minimizing the ranking loss, based on two variants of the expectation-maximization algorithm. Evaluation of the proposed algorithms was conducted on synthetic and real-life label ranking data by considering several measures of cluster goodness: (1) cluster compactness in feature space, (2) cluster compactness in label ranking space and (3) label ranking prediction loss. Experimental results demonstrate that the proposed MM-PL and RankTree models are superior to the baseline models. Further, MM-PL is shown to be much better than the other algorithms at handling situations with a significant fraction of missing label preferences.

12.
To cluster web documents, all of which contain the same named entities, we attempted to use existing clustering algorithms such as K-means and spectral clustering. Unexpectedly, these algorithms turned out not to be effective for clustering web documents. Our investigation found that clustering such web pages is more complicated because (1) the number of clusters (known as ground truth) is larger than the two or three clusters typical of general clustering problems and (2) clusters in the data set have extremely skewed distributions of cluster sizes. To overcome these problems, in this paper we propose an effective clustering algorithm to boost the accuracy of the K-means and spectral clustering algorithms. In particular, to deal with skewed distributions of cluster sizes, our algorithm performs both bisection and merge steps based on normalized cuts of the similarity graph G to correctly cluster web documents. Our experimental results show that our algorithm improves the performance by approximately 56% compared to spectral bisection and 36% compared to K-means.

13.
In cluster analysis, determining the number of clusters is an important issue because information about the most appropriate number of clusters is not available in real-world problems. Automatic clustering is a clustering approach which is able to automatically find the most suitable number of clusters as well as divide the instances into the corresponding clusters. This study proposes a novel automatic clustering algorithm (iABC) using a hybrid of an improved artificial bee colony optimization algorithm and the K-means algorithm. The proposed iABC algorithm improves the onlooker bee exploration scheme by directing the bees' movements to a better location. Instead of using a random neighborhood location, the improved onlooker bee considers the data centroid to find a better initial centroid for the K-means algorithm. To increase the efficiency of the improvement, the updating process is only applied to the worst cluster centroid. The proposed iABC algorithm is verified using some benchmark datasets. The computational results indicate that the proposed iABC algorithm outperforms the original ABC algorithm on the automatic clustering problem. Furthermore, the proposed iABC algorithm is used to solve the customer segmentation problem. The results reveal that the iABC algorithm gives better and more stable results than the original ABC algorithm.

14.
Fuzzy c-means (FCM) is an important and popular unsupervised partitioning algorithm used in several application domains such as pattern recognition, machine learning and data mining. Although FCM has shown good performance in detecting clusters, the membership values computed for each individual with respect to each cluster cannot indicate how well the individuals are classified. In this paper, a new approach to handling the memberships based on the inherent information in each feature is presented. The algorithm produces a membership matrix for each individual. The membership values lie between zero and one and measure the similarity of this individual to the center of each cluster according to each feature. These values can change at each iteration of the algorithm, and they differ from one feature to another and from one cluster to another in order to increase the performance of the fuzzy c-means clustering algorithm. To obtain a fuzzy partition by class of the input data set, a way to compute the class membership values is also proposed in this work. Experiments with synthetic and real data sets show that the proposed approach produces clusterings of good quality.

15.
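Objective: Hyperspectral images have a very large number of bands, which leads to the "curse of dimensionality" during interpretation and classification. To address this problem, a hyperspectral image classification algorithm based on entropy-weighted K-means clustering with global information is proposed; it builds on the K-means clustering algorithm and takes into account both the importance of each band to the different clusters and inter-class information. Method: First, band weights are introduced to characterize the importance of each band to the different clusters, and an entropy information measure is defined to express these weights. Second, to avoid locally optimal clusterings, an inter-class distance measure is introduced to achieve globally optimal clustering. Finally, both measures are incorporated into the K-means objective function, and the optimal classification is obtained by minimizing this objective. Results: To verify the effectiveness of the proposed method, the land-cover classes in the ground-truth maps of the Salinas and Pavia University hyperspectral images were merged according to the degree of difference in their spectral reflectance, and the merged maps were used as the new reference classification maps. The proposed algorithm and the traditional K-means algorithm were both applied to the Salinas and Pavia University hyperspectral images, and the results were evaluated and analyzed qualitatively and quantitatively. For the merged land-cover classes, whose spectral reflectance differs strongly, the proposed algorithm gives visually better classification results than traditional K-means. In terms of classification accuracy, the overall accuracy of the proposed algorithm is 92.20% and 82.96%, against 83.39% and 67.06% for K-means, an increase of 8.81 and 15.9 percentage points respectively. Conclusion: A hyperspectral image classification algorithm based on entropy-weighted K-means clustering with global information is proposed. The experimental results show that the algorithm achieves good classification results for land-cover targets with different degrees of spectral reflectance difference in hyperspectral images.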

16.
Source recording device recognition is an important emerging research field in digital media forensics. The literature has mainly focused on the source recording device identification problem, whereas few studies have focused on the source recording device verification problem. Sparse representation based classification methods have shown promise for many applications. This paper proposes a source cell phone verification scheme based on sparse representation. It can be further divided into three schemes, which use an exemplar dictionary, an unsupervised learned dictionary and a supervised learned dictionary, respectively. Specifically, the discriminative dictionary learned by the supervised learning algorithm, which considers representational and discriminative power simultaneously, unlike the unsupervised learning algorithm, is used to further improve the performance of verification systems based on sparse representation. Gaussian supervectors (GSVs) based on MFCCs, which have been shown to be effective in capturing the intrinsic characteristics of recording devices, are used for constructing and learning the dictionaries. SCUTPHONE, a corpus of speech recordings from 15 cell phones, is presented. Evaluation experiments are conducted on three corpora of speech recordings from cell phones and demonstrate the effectiveness of the proposed methods for cell phone verification. In addition, the influence of the number of target examples in the exemplar dictionary and of the size of the unsupervised learned dictionary on source cell phone verification performance is also analyzed.

17.
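A semi-supervised K-means clustering algorithm for multi-relational data   Total citations: 1, self-citations: 0, citations by others: 1
Gao Ying, Liu Dayou, Qi Hong, Liu He. Journal of Software (软件学报), 2008, 19(11): 2814-2821
A semi-supervised K-means clustering algorithm for multi-relational data is proposed. Building on the K-means clustering algorithm, it extends the method for selecting initial clusters and the object similarity measure so that they can be used for semi-supervised learning on multi-relational data. To achieve high performance, the algorithm makes full use of labeled data, object attributes and the various kinds of relational information during clustering. Experimental results on the multi-relational database Movie confirm the effectiveness of the algorithm.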

18.
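Based on an analysis of the shortcomings of the k-means algorithm, an optimized k-means algorithm based on Ward's method is proposed. The algorithm first performs a preliminary clustering of the sample data with Ward's method, uses it to determine the initial parameters of k-means, such as a suitable number of clusters and the initial cluster centers, and detects and removes outliers; the traditional k-means algorithm is then applied to the preprocessed data. The optimized k-means algorithm is applied to the analysis of criminal personality types, and the experimental results show that both its efficiency and its clustering quality are clearly better than those of the traditional k-means algorithm.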

19.
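A hybrid clustering algorithm based on artificial fish swarm   Total citations: 2, self-citations: 0, citations by others: 2
Cluster analysis is one of the core techniques of data mining and is an unsupervised form of pattern recognition. It partitions data into subclasses according to the similarity between data items and a specified criterion. By analyzing the strengths and weaknesses of the K-means algorithm, this paper proposes a clustering algorithm based on the artificial fish swarm algorithm and combines it with the traditional K-means algorithm to obtain a new hybrid clustering algorithm. Simulation experiments show that the algorithm is effective, with fast clustering and high accuracy.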

20.
Pattern Recognition Letters, 2001, 22(6-7): 603-610
The proposed stochastic K-means algorithm (SKA) associates a vector with a cluster according to a probability distribution, which depends on the distance between the vector and the cluster gravity centre. It is less dependent than the K-means algorithm (KMA) on the initial centre choice. It can reach local minima closer to the global minimum of a distortion measure than the KMA. It has been applied to vector quantization of speech signals.
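The stochastic assignment step can be illustrated with a short Python sketch. The abstract only says that the probability depends on the distance to the cluster centre, so the softmax over negative squared distances used here, the `temperature` parameter and both function names are assumptions made for illustration, not the published SKA.

```python
import math
import random


def assignment_probabilities(vector, centres, temperature=1.0):
    """Probability of assigning `vector` to each centre.

    Assumption: softmax over negative squared Euclidean distances.
    """
    sq_dists = [sum((v - c) ** 2 for v, c in zip(vector, centre)) for centre in centres]
    d_min = min(sq_dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / temperature) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def stochastic_kmeans(data, k, iterations=20, temperature=1.0, seed=0):
    """K-means loop in which each point samples its cluster instead of taking the nearest."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(data, k)]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for point in data:
            probs = assignment_probabilities(point, centres, temperature)
            j = rng.choices(range(k), weights=probs)[0]
            members[j].append(point)
        for j, pts in enumerate(members):
            if pts:  # keep the old centre if the cluster came up empty
                centres[j] = [sum(col) / len(pts) for col in zip(*pts)]
    return centres
```

As `temperature` approaches zero the sampling concentrates on the nearest centre and the loop behaves like ordinary K-means; larger values allow occasional assignments to farther centres, which is one way to reduce dependence on the initial centre choice.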
