Similar Literature
1.
This paper presents an online algorithm for mixture model-based clustering. Mixture modeling is the problem of identifying and modeling components in a given set of data. The online algorithm is based on unsupervised learning of finite Dirichlet mixtures and a stochastic approach for updating the estimates. To select the number of clusters, we use the minimum message length (MML) approach. The proposed method is validated on synthetic data and on an application concerning the dynamic summarization of image databases.

2.
Positive vectors clustering using inverted Dirichlet finite mixture models   Total citations: 1, self-citations: 0, citations by others: 1
In this work we present an unsupervised algorithm for learning finite mixture models from multivariate positive data. This kind of data arises naturally in many applications, yet it has not been adequately addressed in the past. The mixture model is based on the inverted Dirichlet distribution, which offers a good representation and modeling of positive non-Gaussian data. The proposed approach for estimating the parameters of an inverted Dirichlet mixture is based on maximum likelihood (ML) using the Newton-Raphson method. We also develop an approach, based on the minimum message length (MML) criterion, to select the optimal number of clusters to represent the data using such a mixture. Experimental results are presented using artificial histograms and real data sets. The challenging problem of software module classification is also investigated within the proposed statistical framework.
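To make the model concrete, here is a minimal sketch (ours, not the authors' code) of the inverted Dirichlet log-density for x in R+^D with parameters alpha_1..alpha_{D+1}, p(x) = Gamma(|alpha|) / prod Gamma(alpha_d) * prod x_d^(alpha_d - 1) * (1 + sum x_d)^(-|alpha|), where |alpha| is the sum of all D + 1 parameters, plus a mixture log-likelihood. The function names are hypothetical, and the Newton-Raphson ML updates and the MML criterion described above are not reproduced.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the inverted Dirichlet distribution.

    x     : D strictly positive values
    alpha : D + 1 strictly positive parameters
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have exactly one more entry than x")
    if any(v <= 0 for v in x) or any(a <= 0 for a in alpha):
        raise ValueError("x and alpha must be strictly positive")
    a_sum = sum(alpha)
    # log of Gamma(|alpha|) / prod_d Gamma(alpha_d), over all D + 1 parameters
    log_norm = math.lgamma(a_sum) - sum(math.lgamma(a) for a in alpha)
    # sum_{d=1..D} (alpha_d - 1) log x_d  (zip stops after the D entries of x)
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    # (1 + sum_d x_d)^(-|alpha|)
    return log_norm + log_kernel - a_sum * math.log1p(sum(x))


def mixture_loglik(data, weights, alphas):
    """Log-likelihood of a finite inverted Dirichlet mixture (log-sum-exp)."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + inverted_dirichlet_logpdf(x, a)
                 for w, a in zip(weights, alphas)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

For D = 1 and alpha = (1, 1) the density reduces to 1 / (1 + x)^2, which is a quick sanity check.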

3.
This paper proposes an unsupervised algorithm for learning a finite mixture of scaled Dirichlet distributions. Parameter estimation is based on the maximum likelihood approach, and the minimum message length (MML) criterion is proposed for selecting the optimal number of components. This research is motivated by the limited flexibility of the Dirichlet distribution, the widely used model for multivariate proportional data, which has prompted a number of researchers to search for generalizations of the Dirichlet. By introducing the extra parameters of the scaled Dirichlet, several useful statistical models can be obtained. Experimental results are presented using both synthetic and real datasets. Moreover, challenging real-world applications are investigated empirically to evaluate the efficiency of the proposed statistical framework.
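The extra parameters can be seen directly in the density. Below is a minimal sketch, assuming the usual scaled Dirichlet form with shape vector alpha and scale vector beta on the simplex, p(x) = Gamma(A) / prod Gamma(alpha_d) * prod beta_d^alpha_d x_d^(alpha_d - 1) / (sum beta_d x_d)^A with A = sum alpha_d. The function names are hypothetical; ML fitting and MML selection are not shown.

```python
import math


def scaled_dirichlet_logpdf(x, alpha, beta):
    """Log-density of the scaled Dirichlet distribution on the simplex."""
    if not (len(x) == len(alpha) == len(beta)):
        raise ValueError("x, alpha and beta must have the same length")
    if any(v <= 0 for v in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be strictly positive and sum to 1")
    a_sum = sum(alpha)
    log_norm = math.lgamma(a_sum) - sum(math.lgamma(a) for a in alpha)
    # prod_d beta_d^alpha_d * x_d^(alpha_d - 1), in log space
    log_num = sum(a * math.log(b) + (a - 1.0) * math.log(v)
                  for a, b, v in zip(alpha, beta, x))
    # (sum_d beta_d x_d)^A
    log_den = a_sum * math.log(sum(b * v for b, v in zip(beta, x)))
    return log_norm + log_num - log_den


def dirichlet_logpdf(x, alpha):
    """The ordinary Dirichlet is the special case beta = (1, ..., 1)."""
    return scaled_dirichlet_logpdf(x, alpha, [1.0] * len(alpha))
```

Any constant scale vector (c, ..., c) also gives back the Dirichlet, because the factors c^A cancel on the simplex; only unequal scales change the shape.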

4.
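Unsupervised clustering algorithms from machine learning have been widely applied to a variety of object recognition tasks. Clustering by fast search and find of density peaks (DPC) can determine cluster centers and the number of clusters quickly and effectively, but on data with complex distribution shapes and on high-dimensional image data, cluster centers are hard to identify and too few clusters tend to be found. To make it more robust on complex high-dimensional data, this paper proposes a density peaks clustering algorithm based on learned feature representations (AE-MDPC). The algorithm uses an unsupervised autoencoder to learn an optimal feature representation of the data and combines it with a manifold similarity that captures the global consistency of the data. This increases the compactness within classes and the separation between classes, so that the densities of the latent cluster centers become local maxima. AE-MDPC is compared with the classic K-means, DBSCAN and DPC algorithms, and with DPC combined with PCA, on 4 synthetic data sets and 4 real image data sets. The experimental results show that AE-MDPC outperforms the compared algorithms on clustering accuracy, adjusted mutual information and adjusted Rand index, and also provides better visualizations. In summary, AE-MDPC, which combines feature representation learning with manifold distance, can effectively handle complex manifold data and high-dimensional image data.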
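The baseline DPC step that AE-MDPC builds on can be sketched as follows. This is a simplified version under our own assumptions: plain Euclidean distance instead of the learned autoencoder features and manifold similarity, a cut-off kernel for the local density rho, and centers chosen as the densest point plus the largest rho * delta values for a given count rather than read off a decision graph.

```python
import math


def density_peaks(points, dc, n_centers):
    """Density peaks clustering with a cut-off kernel (simplified sketch).

    Returns local densities rho, distances delta and a label per point.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cut-off distance dc
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # visit points by decreasing density (ties broken by index)
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that comes earlier in the density order
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j
    # centers: the densest point plus the largest gamma = rho * delta
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + [i for i in ranked if i != order[0]][: n_centers - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # every other point inherits the label of its nearest denser neighbour
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

Because points are labelled in decreasing density order, each point's parent is always labelled before the point itself.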

5.
Bayesian Ying-Yang (BYY) learning provides a mechanism for parameter learning with automated model selection by maximizing a harmony function on a backward architecture of the BYY system for the Gaussian mixture. However, because the harmony function has a large number of local maxima, local search algorithms such as the hard-cut EM algorithm do not work well. To overcome this difficulty, we propose a simulated annealing learning algorithm that searches for the global maximum of the harmony function and can be expressed as a kind of deterministic annealing EM procedure. Simulation experiments demonstrate that this BYY annealing learning algorithm can efficiently and automatically determine the number of clusters (Gaussians) during learning. Moreover, the BYY annealing learning algorithm is successfully applied to two real-life problems: Iris data classification and unsupervised color image segmentation.

6.
This paper discusses unsupervised learning of finite mixtures of Gamma distributions. An important part of this problem is determining the number of clusters that best describes a set of data. We apply the Minimum Message Length (MML) criterion to the unsupervised learning of finite mixtures of Gamma distributions. MML and other criteria from the literature are compared in terms of their ability to estimate the number of clusters in a data set, using synthetic data and RADARSAT SAR images. The performance of our method is also tested through contextual evaluations involving SAR image segmentation and change detection.
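Whatever criterion is used to pick the number of components, fitting a Gamma mixture needs the per-component densities and the posterior responsibilities. A minimal sketch, assuming the shape/scale parameterization log p(x) = (k - 1) log x - x / theta - log Gamma(k) - k log theta; the function names are hypothetical and the MML computation itself is not shown.

```python
import math


def gamma_logpdf(x, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def responsibilities(x, weights, shapes, scales):
    """Posterior probability of each mixture component given x (E-step)."""
    logs = [math.log(w) + gamma_logpdf(x, k, s)
            for w, k, s in zip(weights, shapes, scales)]
    m = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]
```

With shape 1 and scale 1 the Gamma reduces to the unit exponential, so gamma_logpdf(2.0, 1.0, 1.0) equals -2.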

7.
This paper proposes a fuzzy clustering-based algorithm for fuzzy modeling. The algorithm incorporates unsupervised learning and an iterative process into a framework based on the weighted fuzzy c-means. In the first step, the learning vector quantization (LVQ) algorithm is used as a data preprocessor to group the training data into a number of clusters. Since different clusters may contain different numbers of objects, each cluster center is assigned a weight factor computed from the cardinality of its cluster. These weighted centers form a new data set, which is further refined by an iterative process that applies, in sequence, the weighted fuzzy c-means and the back-propagation algorithm. Using the weighted fuzzy c-means ensures that the contribution of each cluster center to the final fuzzy partition is determined by its cardinality, so that the real data structure can be discovered more easily. The algorithm is successfully applied to three test cases, where the resulting fuzzy models prove to be both very accurate and compact.
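The weighted fuzzy c-means step can be sketched as below, assuming the standard formulation in which each input point k carries a weight w_k (here, the cardinality of the LVQ cluster it stands for). Memberships follow the usual FCM rule u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)), while the centers become v_i = sum_k w_k u_ik^m x_k / sum_k w_k u_ik^m. The LVQ preprocessing and back-propagation stages are not shown, and the function name and stopping rule are our own.

```python
import math


def weighted_fcm(points, weights, centers, m=2.0, iters=100, tol=1e-6):
    """Weighted fuzzy c-means; returns final centers and the last memberships."""
    centers = [list(c) for c in centers]
    c, dim = len(centers), len(points[0])
    u = []
    for _ in range(iters):
        # membership update (the point weights do not enter here)
        u = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if min(d) == 0.0:  # point sits on a center: crisp membership
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                       for i in range(c)]
            u.append(row)
        # center update: point k contributes with coefficient w_k * u_ik^m
        new_centers = []
        for i in range(c):
            coef = [w * u[k][i] ** m for k, w in enumerate(weights)]
            total = sum(coef)
            new_centers.append([sum(cf * x[t] for cf, x in zip(coef, points)) / total
                                for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Raising a point's weight pulls the nearby center toward it, which is how cluster cardinality shapes the final partition.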

8.
Self-splitting competitive learning: a new on-line clustering paradigm   Total citations: 2, self-citations: 0, citations by others: 2
Clustering in the neural-network literature is generally based on the competitive learning paradigm. The paper addresses two major issues associated with conventional competitive learning, namely, sensitivity to initialization and difficulty in determining the number of prototypes. In general, selecting the appropriate number of prototypes is a difficult task, as we do not usually know the number of clusters in the input data a priori. It is therefore desirable to develop an algorithm that has no dependency on the initial prototype locations and is able to adaptively generate prototypes to fit the input data patterns. We present a new, more powerful competitive learning algorithm, self-splitting competitive learning (SSCL), that is able to find the natural number of clusters based on the one-prototype-take-one-cluster (OPTOC) paradigm and a self-splitting validity measure. It starts with a single prototype randomly initialized in the feature space and splits adaptively during the learning process until all clusters are found; each cluster is associated with a prototype at its center. We have conducted extensive experiments to demonstrate the effectiveness of the SSCL algorithm. The results show that SSCL has the desired ability for a variety of applications, including unsupervised classification, curve detection, and image segmentation.

9.
Competitive learning approaches with individual penalization or cooperation mechanisms have the attractive ability to select the number of clusters automatically in unsupervised data clustering. In this paper, we further study these two mechanisms and propose a novel learning algorithm called Cooperative and Penalized Competitive Learning (CPCL), which implements the cooperation and penalization mechanisms simultaneously in a single competitive learning process. Integrating these two kinds of competition mechanisms enables CPCL to locate the cluster centers more quickly and makes it insensitive to the number of seed points and their initial positions. Additionally, to handle nonlinearly separable clusters, we introduce the proposed competition mechanism into the kernel clustering framework. Correspondingly, a new kernel-based competitive learning algorithm that can perform nonlinear partitioning without knowing the true number of clusters is presented. Promising experimental results on real data sets demonstrate the superiority of the proposed methods.
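For readers unfamiliar with penalization mechanisms, the sketch below implements the classical rival penalized competitive learning (RPCL) rule, a well-known mechanism of this kind; it is not CPCL itself, and the cooperation mechanism and kernel extension are not shown. For each input, the winner (smallest frequency-weighted squared distance, with weight n_j / sum n) moves toward the input, while the second-best "rival" is pushed slightly away. Learning rates, epoch count and the function signature are our own assumptions.

```python
import math
import random


def rpcl(data, init_centers, lr_win=0.05, lr_rival=0.002, epochs=50, seed=0):
    """Classical RPCL: winner learns, rival de-learns (sketch, not CPCL)."""
    if len(init_centers) < 2:
        raise ValueError("RPCL needs at least two seed points")
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    k = len(centers)
    wins = [1] * k  # winning counts n_j used for frequency sensitivity
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            # frequency-sensitive score: (n_j / sum n) * ||x - m_j||^2
            ranked = sorted(range(k),
                            key=lambda j: wins[j] / total * math.dist(x, centers[j]) ** 2)
            winner, rival = ranked[0], ranked[1]
            # winner moves toward x; rival is pushed away from x
            centers[winner] = [mv + lr_win * (xi - mv) for mv, xi in zip(centers[winner], x)]
            centers[rival] = [mv - lr_rival * (xi - mv) for mv, xi in zip(centers[rival], x)]
            wins[winner] += 1
    return centers, wins
```

Keeping lr_rival much smaller than lr_win is what lets extra seed points drift away from the data instead of splitting a true cluster; the rival push also leaves each surviving center slightly offset from its cluster mean.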

10.
Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a given data set. Some existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits clustering performance on complex data structures. This paper proposes a clustering algorithm that represents each cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate regions of high density, because of its low computational cost and good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Prototypes with small separations are grouped into a given number of clusters using an agglomerative method. New prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure and is robust to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.

11.
Probabilistic self-organizing map and radial basis function networks   Total citations: 2, self-citations: 0, citations by others: 2
F. Anouar, F. Badran, S. Thiria. Neurocomputing, 1998, 20(1-3): 83-96
In this paper we propose a new learning algorithm, the probabilistic self-organizing map (PRSOM), which uses a probabilistic formalism for topological maps. This algorithm approximates the density distribution of the input set with a mixture of normal distributions. The unsupervised learning is based on the dynamic clusters principle and optimizes the likelihood function. A supervised version of this algorithm based on radial basis functions (RBF) is also proposed. To validate the theoretical approach, we perform regression tasks on simulated and real data using the PRSOM algorithm, and we compare our results with those of the normalized Gaussian basis functions (NGBF) algorithm.

12.
This paper proposes an image segmentation approach for multispectral remote sensing imagery based on rival penalized controlled competitive learning (RPCCL) and fuzzy entropy. In this approach, the cluster center component for each band of the image is first chosen based on the fuzzy entropy histogram of that band. The initial cluster centers are then formed by combining the obtained cluster center components. The number of clusters and the actual cluster centers are then determined using the RPCCL method. The advantages of the proposed approach are appropriate initial cluster centers and automatic determination of the number of clusters. Experimental results show that, without specifying the number of cluster centers before clustering, the proposed method can effectively perform unsupervised segmentation of remote sensing images.

13.
We propose a new clustering algorithm, called SyMP, which is based on synchronization of pulse-coupled oscillators. SyMP represents each data point by an Integrate-and-Fire oscillator and uses the relative similarity between the points to model the interaction between the oscillators. SyMP is robust to noise and outliers, determines the number of clusters in an unsupervised manner, and identifies clusters of arbitrary shapes. The robustness of SyMP is an intrinsic property of the synchronization mechanism. To determine the optimal number of clusters, SyMP uses a dynamic, cluster-dependent resolution parameter. To identify clusters of various shapes, SyMP models each cluster by an ensemble of Gaussian components. SyMP does not require the number of components for each cluster to be specified; this number is determined automatically using a dynamic intra-cluster resolution parameter. Clusters with simple shapes are modeled by few components, while clusters with more complex shapes require a larger number of components. The proposed clustering approach is evaluated empirically on several synthetic data sets, and its performance is compared with GK and CURE. To illustrate the performance of SyMP on real, high-dimensional data sets, we use it to categorize two image databases.

14.
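Most traditional clustering methods are based on partitioning the feature space and generally assume that the data follow a Gaussian mixture model, an assumption that often does not hold in practice. In most pattern classification problems, the common parametric forms do not fit the probability densities actually encountered; in particular, most classical parametric densities are unimodal, whereas remote sensing images generally have multimodal densities, so classification results are often not accurate enough. Nonparametric methods for pattern classification are an important way to address this problem: they overcome this limitation at its root and can find clusters of arbitrary shape. Mean shift is a nonparametric clustering method based on density estimation, and it can be used for cluster analysis of remote sensing images. The mean shift procedure does not need the number of land-cover classes in advance; it determines the number of classes automatically during clustering, which is convenient for unsupervised clustering of remote sensing images when the number of classes in an image is hard to determine.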
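The claim that mean shift finds the number of classes by itself can be illustrated with a minimal sketch: each sample climbs the kernel density estimate by repeatedly moving to the Gaussian-weighted mean of all samples, and samples that end at the same mode share a cluster. The Gaussian kernel, the bandwidth value and the merge tolerance (half the bandwidth) are our own assumptions; a real remote sensing application would run this on per-pixel spectral feature vectors and would need a faster neighbour search than this O(n^2) loop.

```python
import math


def mean_shift(points, bandwidth, iters=100, tol=1e-6, merge_tol=None):
    """Gaussian-kernel mean shift; returns the modes and a label per point."""
    if merge_tol is None:
        merge_tol = bandwidth / 2
    modes, labels = [], []
    for p in points:
        y = list(p)
        for _ in range(iters):
            # weighted mean of all samples, weights from a Gaussian kernel around y
            w = [math.exp(-math.dist(y, q) ** 2 / (2 * bandwidth ** 2)) for q in points]
            s = sum(w)
            new_y = [sum(wi * q[t] for wi, q in zip(w, points)) / s for t in range(len(y))]
            moved = math.dist(new_y, y)
            y = new_y
            if moved < tol:
                break
        # points whose trajectories end at the same mode share a cluster
        for idx, mode in enumerate(modes):
            if math.dist(mode, y) < merge_tol:
                labels.append(idx)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return modes, labels
```

The number of clusters is simply len(modes); the bandwidth, not a class count, is the parameter the user has to choose.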

15.
We consider the problem of determining the structure of high-dimensional data without prior knowledge of the number of clusters. Data are represented by a finite mixture model based on the generalized Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for approximating both symmetric and asymmetric distributions, which makes it more practical and useful. An important problem in mixture modeling is determining the number of clusters: a mixture with too many or too few components may not approximate the true model appropriately. Here, we apply the minimum message length (MML) principle to determine the number of clusters. The MML criterion is derived so as to choose the number of clusters in the mixture model that best describes the data. A comparison with other selection criteria is performed. The validation involves synthetic data, real data clustering, and two real applications: classification of web pages, and texture database summarization for efficient retrieval.
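The extra flexibility comes from the density's nested form. Below is a minimal sketch of the generalized Dirichlet log-density in the Connor-Mosimann parameterization (our assumption about the form used), for x_1..x_D > 0 with sum below 1: p(x) = prod_i Gamma(a_i + b_i) / (Gamma(a_i) Gamma(b_i)) * x_i^(a_i - 1) * (1 - x_1 - ... - x_i)^(g_i), with g_i = b_i - a_{i+1} - b_{i+1} for i < D and g_D = b_D - 1. The function name is hypothetical, and MML selection is not shown.

```python
import math


def generalized_dirichlet_logpdf(x, alpha, beta):
    """Log-density of the generalized Dirichlet (Connor-Mosimann form)."""
    dim = len(x)
    if len(alpha) != dim or len(beta) != dim:
        raise ValueError("x, alpha and beta must have the same length")
    if any(v <= 0 for v in x) or sum(x) >= 1:
        raise ValueError("need every x_i > 0 and sum(x) < 1")
    logp, cum = 0.0, 0.0
    for i in range(dim):
        cum += x[i]  # running sum x_1 + ... + x_i
        if i < dim - 1:
            g = beta[i] - alpha[i + 1] - beta[i + 1]
        else:
            g = beta[i] - 1.0
        logp += (math.lgamma(alpha[i] + beta[i]) - math.lgamma(alpha[i])
                 - math.lgamma(beta[i])
                 + (alpha[i] - 1.0) * math.log(x[i])
                 + g * math.log(1.0 - cum))
    return logp
```

With D = 1 this is the Beta(a, b) density, and choosing b_i = a_{i+1} + b_{i+1} (with b_D equal to the last Dirichlet parameter) recovers the ordinary Dirichlet, which is why the generalized form strictly contains it.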

16.
In this paper, we propose a Bayesian nonparametric approach for modeling and model selection based on a mixture of Dirichlet processes with Dirichlet distributions, which can also be seen as an infinite Dirichlet mixture model. The proposed model uses a stick-breaking representation and is learned by a variational inference method. Owing to the nature of the Bayesian nonparametric approach, overfitting and underfitting are avoided. Moreover, the difficulty of estimating the correct number of clusters is sidestepped by assuming an infinite number of clusters. Unlike other approximation techniques such as Markov chain Monte Carlo (MCMC), which have high computational cost and whose convergence is difficult to diagnose, the whole inference process in the proposed variational learning framework is analytically tractable, with closed-form solutions. Additionally, the proposed infinite Dirichlet mixture model with variational learning requires only a modest amount of computational power, which makes it suitable for large applications. The effectiveness of the model is investigated experimentally on synthetic data sets and on challenging real-life multimedia applications, namely image spam filtering and human action video categorization.
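The stick-breaking representation mentioned above turns a sequence of Beta draws into mixing weights: pi_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, concentration). A minimal sketch follows; truncating at T components by fixing v_T = 1 is the common device in variational treatments and is our assumption here, as are the function names. The variational updates themselves are not shown.

```python
import random


def stick_breaking_weights(v):
    """Turn stick proportions v_k into mixing weights pi_k."""
    weights, remaining = [], 1.0
    for vk in v:
        weights.append(vk * remaining)  # break off a fraction vk of what is left
        remaining *= 1.0 - vk
    return weights


def sample_truncated_weights(concentration, truncation, seed=None):
    """Draw weights from a stick-breaking prior truncated at `truncation` sticks."""
    rng = random.Random(seed)
    v = [rng.betavariate(1.0, concentration) for _ in range(truncation - 1)]
    v.append(1.0)  # last stick takes the remainder so the weights sum to 1
    return stick_breaking_weights(v)
```

For example, proportions (0.5, 0.5, 1.0) give weights (0.5, 0.25, 0.25). Smaller concentration values put most of the mass on the first few sticks, which is how the model effectively prunes unneeded clusters.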

17.
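Semi-supervised clustering uses a small amount of labeled data to guide unsupervised learning on a large amount of unlabeled data, thereby improving clustering performance. Most spectral clustering algorithms require the number of clusters to be fixed in advance. By using semi-supervised machine learning techniques and an adaptive clustering algorithm, this work addresses several shortcomings of such algorithms: the number of clusters must be specified beforehand, they easily fall into local optima, they converge slowly, and they are sensitive to outliers. Experiments show that the algorithm achieves good clustering results.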

18.
This study presents an image segmentation system that automatically segments and labels T1-weighted brain magnetic resonance (MR) images. The method combines the unsupervised self-organizing map (SOM) algorithm with supervised learning vector quantization (LVQ). The stationary wavelet transform (SWT) is applied to the images to obtain multiresolution information for distinguishing different tissues. Statistical information about the different tissues is extracted by applying spatial filtering to the SWT coefficients. A multidimensional feature vector is formed by combining the SWT coefficients and their statistical features, and this feature vector is used as input to the SOM. The SOM segments the images in a competitive, unsupervised manner, and an LVQ system is used for fine-tuning. Results are evaluated with the Tanimoto similarity index and compared with manually segmented images. Quantitative comparisons with other methods on real brain MR images, using the Tanimoto similarity index, show that our system achieves better segmentation performance for gray matter and average results for white matter.
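The Tanimoto similarity index used for evaluation compares, for one tissue class, the set of pixels labelled with that class by the system (A) and by the manual reference (B): T = |A ∩ B| / |A ∪ B|. A minimal sketch over flattened label images; the function name and the convention of scoring 1.0 when the class is absent from both images are our own assumptions.

```python
def tanimoto_index(seg, ref, label):
    """Tanimoto index |A ∩ B| / |A ∪ B| for one tissue label.

    seg, ref : flattened label images (sequences of equal length)
    """
    if len(seg) != len(ref):
        raise ValueError("images must have the same number of pixels")
    inter = sum(1 for s, r in zip(seg, ref) if s == label and r == label)
    union = sum(1 for s, r in zip(seg, ref) if s == label or r == label)
    # assumed convention: a label absent from both images counts as perfect agreement
    return inter / union if union else 1.0
```

A score of 1 means the automatic and manual masks coincide exactly for that tissue; the gray-matter and white-matter scores are computed separately by changing `label`.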

19.
The number of features extracted in real-world applications is increasing dramatically, and the performance of machine learning methods degrades in this scenario, so feature reduction is required. In fault diagnosis of rotating machinery in particular, the number of extracted features is large, because all the available information must be collected from several monitored signals. Several approaches achieve data reduction using supervised or unsupervised strategies; supervised strategies are the most reliable, but their main disadvantage is that they require prior knowledge of the fault condition. This work proposes a new unsupervised feature selection algorithm based on attribute clustering and rough set theory. Rough set theory is used to compute similarities between features through the relative dependency. The clustering approach combines distance-based classification with prototype-based clustering to group similar features, without requiring the number of clusters as an input. Additionally, the algorithm has an evolving property that allows the cluster structure to be adjusted dynamically during the clustering process, even when a new set of attributes is fed to the algorithm. This gives the algorithm an incremental learning property that avoids retraining. These properties are the main contribution of the proposed algorithm. Two fault diagnosis problems, fault severity classification in gears and in bearings, are studied to test the algorithm. Classification results show that the proposed algorithm selects adequate features, with accuracy comparable to other feature selection and reduction approaches.
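To give a feel for dependency-based similarity between discretized features, here is a minimal sketch of the classical Pawlak dependency degree gamma_P(Q) = |POS_P(Q)| / |U|: the fraction of objects whose P-indiscernibility class maps to a single Q-value. Note that this is a swapped-in, closely related measure; the "relative dependency" used in the paper is a different variant and is not reproduced, and the function name and table layout are our assumptions.

```python
from collections import defaultdict


def dependency_degree(rows, p, q):
    """Pawlak dependency degree gamma_P(Q) = |POS_P(Q)| / |U|.

    rows : list of tuples of discretized attribute values (the universe U)
    p, q : lists of column indices for the attribute sets P and Q
    """
    q_values = defaultdict(set)  # P-class -> set of Q-values seen in it
    sizes = defaultdict(int)     # P-class -> number of objects in it
    for row in rows:
        key = tuple(row[a] for a in p)
        q_values[key].add(tuple(row[a] for a in q))
        sizes[key] += 1
    # a P-class lies in the positive region if all its objects share one Q-value
    positive = sum(sizes[key] for key, vals in q_values.items() if len(vals) == 1)
    return positive / len(rows)
```

The measure is asymmetric (gamma_P(Q) need not equal gamma_Q(P)), so a feature-to-feature similarity built on it would typically be symmetrized before clustering.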

20.
An algorithm based on unsupervised Bayesian online learning is proposed for segmenting object-based video images. Video image segmentation is treated as a classification problem. First, different visual features (spatial location, colour and optical-flow vectors) are fused in a probabilistic framework for image pixel clustering. The probability density function (PDF) of each feature cluster is modeled as a Gaussian distribution. Each pixel is then assigned a cluster label in a maximum a posteriori (MAP) probability framework. Unlike previous segmentation methods, an unsupervised Bayesian online learning algorithm is developed to learn each cluster's PDF parameters over the image sequence. This online learning process uses the pixels of the previously clustered image and the feature-cluster information to update the PDF parameters for segmenting the current image. The unsupervised Bayesian online learning algorithm produced satisfactory experimental results on different video sequences.
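The labelling and update loop described above can be sketched as follows, under explicit assumptions: each cluster's PDF is a Gaussian with diagonal covariance over the fused feature vector, a pixel gets the label maximizing log prior + log likelihood (MAP), and the cluster parameters are updated from the pixels assigned in the previous frame. The paper's actual Bayesian update rule is not given in the abstract, so the exponential-forgetting blend with a hypothetical `rate` parameter is only a simple stand-in.

```python
import math


def gaussian_logpdf_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_label(x, priors, means, variances):
    """Index of the cluster maximizing log prior + log likelihood."""
    scores = [math.log(p) + gaussian_logpdf_diag(x, m, v)
              for p, m, v in zip(priors, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_cluster(mean, var, samples, rate, min_var=1e-6):
    """Blend old parameters with statistics of the previous frame's pixels.

    Stand-in for the paper's Bayesian update: exponential forgetting with `rate`.
    """
    n = len(samples)
    if n == 0:
        return mean, var  # cluster unused in the previous frame: keep parameters
    dim = len(mean)
    s_mean = [sum(s[t] for s in samples) / n for t in range(dim)]
    s_var = [sum((s[t] - s_mean[t]) ** 2 for s in samples) / n for t in range(dim)]
    new_mean = [(1 - rate) * m + rate * sm for m, sm in zip(mean, s_mean)]
    new_var = [max(min_var, (1 - rate) * v + rate * sv) for v, sv in zip(var, s_var)]
    return new_mean, new_var
```

In a frame loop, each pixel's feature vector would be labelled with `map_label`, and the pixels collected per label would then feed `update_cluster` before the next frame is segmented.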
