Similar literature
 Found 20 similar documents; search took 921 ms
1.
In this paper we propose two new EM-type algorithms for model-based clustering. The first algorithm, Ascent EM, draws its ideas from the Monte Carlo EM algorithm and uses only random subsets of the entire database. Using a subset rather than the entire database allows for significant computational improvements, since far fewer data points need to be evaluated in every iteration. We also argue that one can choose the subsets intelligently by appealing to EM's highly appreciated likelihood-ascent property. The second algorithm builds upon Ascent EM and incorporates ideas from evolutionary computation to find the global optimum. Model-based clustering can feature local, sub-optimal solutions, which can make it hard to find the global optimum. Our algorithm borrows ideas from the Genetic Algorithm (GA) by incorporating the concepts of crossover, mutation and selection into EM's updating scheme. We call this new algorithm the GA Ascent EM algorithm. We investigate the performance of these two algorithms on a functional database of online auction price curves gathered from eBay.com.
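
The idea of running EM on a random subset can be illustrated with a minimal sketch. It is not the Ascent EM algorithm itself: it performs one ordinary EM iteration for a one-dimensional Gaussian mixture on a random sample, and it omits the likelihood-ascent rule for choosing subsets that the abstract mentions. The function names (`normal_pdf`, `em_step_on_subset`) and all parameters are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step_on_subset(data, weights, means, variances, subset_size, rng):
    """One EM iteration for a 1-D Gaussian mixture, using only a random subset.

    Hypothetical helper: the subset is drawn uniformly without replacement,
    which is an assumption and not the selection rule of the paper.
    """
    sample = rng.sample(data, subset_size)
    k = len(weights)

    # E-step: posterior probability of each component for each sampled point.
    resp = []
    for x in sample:
        joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([v / total for v in joint])

    # M-step: re-estimate weights, means and variances from the subset only.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, sample)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, sample)) / nj
        new_w.append(nj / subset_size)
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # floor keeps the variance positive
    return new_w, new_m, new_v
```

Each call touches only `subset_size` points, so the per-iteration cost depends on the sample size and not on the size of the full database.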

2.
Model-Based Clustering by Probabilistic Self-Organizing Maps   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we consider the learning process of a probabilistic self-organizing map (PbSOM) as a model-based data clustering procedure that preserves the topological relationships between data clusters in a neural network. Based on this concept, we develop a coupling-likelihood mixture model for the PbSOM that extends the reference vectors in Kohonen's self-organizing map (SOM) to multivariate Gaussian distributions. We also derive three expectation-maximization (EM)-type algorithms, called the SOCEM, SOEM, and SODAEM algorithms, for learning the model (PbSOM) based on the maximum-likelihood criterion. SOCEM is derived by using the classification EM (CEM) algorithm to maximize the classification likelihood; SOEM is derived by using the EM algorithm to maximize the mixture likelihood; and SODAEM is a deterministic annealing (DA) variant of SOCEM and SOEM. Moreover, by shrinking the neighborhood size, SOCEM and SOEM can be interpreted, respectively, as DA variants of the CEM and EM algorithms for Gaussian model-based clustering. The experimental results show that the proposed PbSOM learning algorithms achieve data clustering performance comparable to that of the deterministic annealing EM (DAEM) approach, while maintaining the topology-preserving property.

3.
This study focuses on clustering algorithms for data on the unit hypersphere. This type of directional data, lying on the surface of a unit hypersphere, is used in geology, biology, meteorology, medicine and oceanography. The EM algorithm with mixtures of von Mises-Fisher distributions is often used for model-based clustering of data on the unit hypersphere. However, the EM algorithm is sensitive to initial values and outliers, and the number of clusters must be assigned a priori. In this paper, we propose an effective approach, called a learning-based EM algorithm with von Mises-Fisher distributions, to cluster this type of hyper-spherical data. The proposed clustering method is robust to outliers, needs no initialization, and automatically determines the number of clusters. It is thus a fully unsupervised model-based clustering method for data on the unit hypersphere. Numerical and real examples with comparisons are given to demonstrate the effectiveness and superiority of the proposed method. We also apply the proposed learning-based EM algorithm to cluster data on extrasolar planets. The clustering results have several important implications for exoplanet data and allow an interpretation of exoplanet migration.

4.
Meilă  Marina  Heckerman  David 《Machine Learning》2001,42(1-2):9-29
We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation–Maximization (EM) algorithm, a "winner take all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
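
The difference between the EM algorithm and its "winner take all" version lies in the E-step, and a small sketch may make it concrete. It assumes binary variables (a special case of the discrete variables in the abstract), and the function names `responsibilities` and `winner_take_all` are hypothetical, not taken from the paper.

```python
import math


def responsibilities(x, weights, thetas):
    """Soft E-step: posterior over clusters for one binary vector x.

    weights[k] is the prior of cluster k; thetas[k][i] is the probability
    that variable i equals 1 in cluster k (naive-Bayes independence).
    """
    log_joint = []
    for w, theta in zip(weights, thetas):
        ll = math.log(w)
        for xi, p in zip(x, theta):
            ll += math.log(p if xi == 1 else 1.0 - p)
        log_joint.append(ll)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_joint)
    unnorm = [math.exp(v - m) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def winner_take_all(resp):
    """Hard E-step: give all the weight to the most probable cluster."""
    best = max(range(len(resp)), key=resp.__getitem__)
    return [1.0 if k == best else 0.0 for k in range(len(resp))]
```

Ordinary EM feeds the soft responsibilities into the M-step, whereas the winner-take-all variant feeds in the hard assignment, which is what makes it resemble K-means.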

5.
In spite of the initialization problem, the Expectation-Maximization (EM) algorithm is widely used for estimating the parameters of finite mixture models. Most popular model-based clustering techniques may yield poor clusters if the parameters are not initialized properly. To reduce the sensitivity to initial points, a novel algorithm for learning mixture models from multivariate data is introduced in this paper. The proposed algorithm takes advantage of TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) to compute neighboring local maxima on the likelihood surface using stability regions. In essence, our method combines the advantages of traditional EM with the dynamic and geometric characteristics of the stability regions of the nonlinear dynamical system associated with the log-likelihood function. Two phases, namely the EM phase and the stability region phase, are repeated alternately in the parameter space to improve the maximum likelihood. The EM phase obtains a local maximum of the likelihood function, and the stability region phase helps to escape from that local maximum by moving towards the neighboring stability regions. The algorithm has been tested on both synthetic and real datasets, and improvements in performance over other approaches are demonstrated. The robustness with respect to initialization is also illustrated experimentally.

6.
Evolutionary Computing on Consumer Graphics Hardware   Total citations: 1, self-citations: 0, citations by others: 1
We propose implementing a parallel EA on consumer graphics cards, which we can find in many PCs. This lets more people use our parallel algorithm to solve large-scale, real-world problems such as data mining. Parallel evolutionary algorithms run on consumer-grade graphics hardware achieve better execution times than ordinary evolutionary algorithms and offer greater accessibility than those run on high-performance computers.

7.
We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize data into sub-clusters, and then generate Gaussian mixtures from their clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model. It provably converges. The experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.

8.
Reusable components for partitioning clustering algorithms   Total citations: 1, self-citations: 1, citations by others: 0
Clustering algorithms are well-established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions for specific sub-problems in the clustering process. These solutions are linked together in a clustering algorithm, and they define the process and the structure of the algorithm. Frequently, many of these solutions occur in more than one clustering algorithm. New clustering algorithms mostly include frequently occurring solutions to typical sub-problems from clustering, as well as from other machine-learning algorithms. The problem is that these solutions are usually tightly integrated into their algorithms, and that the original algorithms are not designed to share solutions to sub-problems outside the original algorithm easily. We propose a way of designing cluster algorithms, and of improving existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a specific area. We therefore first identify reusable components as solutions to characteristic sub-problems in partitioning cluster algorithms, and then identify a generic structure for the design of partitioning cluster algorithms. We analyze some partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM) and identify reusable components in them. We give examples of how new cluster algorithms can be designed based on them.

9.
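He Li  Lu Bingyuan 《Computer Engineering》2010,36(24):136-138
To address the fuzzy uncertainty of training samples caused by class overlap and the rough uncertainty of class boundaries caused by insufficient attributes, a fuzzy-rough set nearest neighbor classification algorithm based on expectation-maximization (EM), called EM-FRNN, is proposed. Experiments were conducted using sudden water pollution incident cases from the UCI database. The results show that, compared with naive KNN, the fuzzy nearest neighbor algorithm, and the fuzzy-rough nearest neighbor algorithm, the proposed algorithm achieves higher accuracy at lower computational cost.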

10.
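Yang Tianpeng  Chen Lifei 《Journal of Computer Applications》2018,38(10):2844-2849
To address the "uniform effect" problem of traditional K-means-type algorithms, a clustering algorithm based on a probabilistic model is proposed. First, a Gaussian mixture distribution model describing non-uniform data clusters is proposed; it allows a dataset to contain clusters that differ in both density and size. Second, the objective function for clustering non-uniform data is derived, and an expectation-maximization (EM)-type clustering algorithm that optimizes this function is defined. The analysis shows that the proposed algorithm can perform soft subspace clustering of non-uniform data. Finally, experiments on synthetic and real datasets show that the proposed algorithm achieves high clustering accuracy, with accuracy gains of 5% to 50% over existing K-means-type algorithms and under-sampling-based algorithms.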

11.
Finite mixture models are being increasingly used to provide model-based cluster analysis. To tackle the problem of block clustering, which aims to organize the data into homogeneous blocks, we recently proposed a block mixture model; we considered this model under the classification maximum likelihood approach and developed a new algorithm for simultaneous partitioning based on the classification EM algorithm. From the estimation point of view, the classification maximum likelihood approach yields inconsistent estimates of the parameters, so in this paper we consider the block clustering problem under the maximum likelihood approach. Unfortunately, the application of the classical EM algorithm to the block mixture model is not direct: difficulties arise due to the dependence structure in the model, and approximations are required. Considering the block clustering problem under a fuzzy approach, we propose a fuzzy block clustering algorithm to approximate the EM algorithm. To illustrate our approach, we study the case of binary data by using a Bernoulli block mixture.

12.
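A range image segmentation algorithm based on Gaussian mixture models   Total citations: 24, self-citations: 0, citations by others: 24
Xiang Rihua  Wang Runsheng 《Journal of Software》2003,14(7):1250-1257
A range image segmentation algorithm based on a Gaussian mixture model of surface normals is proposed. It makes full use of the physical meaning of the surface-normal Gaussian mixture model, which reduces the number of data clustering passes, and it achieves automatic model selection by computing the posterior probability of each model from the parameters estimated with the Expectation-Maximization (EM) algorithm. The algorithm was tested on 60 real range images from two types of range cameras, and the results were compared objectively with those of several popular segmentation algorithms.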

13.
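Robust clustering algorithms based on finite mixtures of multivariate t distributions   Total citations: 1, self-citations: 1, citations by others: 0
Yu Chengwen  Guo Lei 《Computer Science》2007,34(5):190-193
When clustering with mixture models, outliers in the data are a very difficult problem. To improve the robustness of mixture fitting, this paper replaces the Gaussian mixture model with a t mixture model to fit multivariate, multi-Gaussian data containing background noise. Two modified expectation-maximization (EM) algorithms for fitting the t mixture model are proposed and integrated with a model selection criterion; a component annihilation strategy with a combination rule is applied to select the number of cluster components, yielding two corresponding robust clustering algorithms. Extensive experiments with different clustering algorithms on multiple Gaussian components containing background noise show that the proposed robust clustering algorithms automatically select the best number of cluster components and are far more robust than clustering based on Gaussian mixture models. Compared with the traditional (EM/ECM) clustering methods for t mixture models, they effectively avoid the heavy dependence on initial values and the tendency to converge to the boundary of the parameter space, and they offer strong robustness and fast convergence.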

14.
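To address problems in parameter estimation for multinomial finite mixture models, namely dependence on initialization, parameters that tend to converge to boundary values, and susceptibility to local optima, the minimum message length criterion is introduced to optimize the parameter estimation process. On this basis, a clustering algorithm based on the multinomial finite mixture model is used to cluster user rating behavior, and the cluster membership probabilities obtained from the model are used to improve the Slope One algorithm. Experimental results show that optimizing the multinomial finite mixture model with the minimum message length criterion markedly improves clustering quality, and that the improved algorithm clearly outperforms the Slope One recommendation algorithm based on user clustering.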

15.
Algorithm selection can be performed using a model of runtime distribution, learned during a preliminary training phase. There is a trade-off between the performance of model-based algorithm selection, and the cost of learning the model. In this paper, we treat this trade-off in the context of bandit problems. We propose a fully dynamic and online algorithm selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to propose machine time shares for the algorithms. A bandit problem solver mixes the model-based shares with a uniform share, gradually increasing the impact of the best time allocators as the model improves. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark, and with a set of solvers for the Auction Winner Determination problem. This work was supported by SNF grant 200020-107590/1.

16.
The glowworm swarm optimization (GSO) algorithm is one of the newest nature-inspired heuristics for optimization problems. In order to enhance the accuracy and convergence rate of GSO, two strategies for the movement phase of GSO are proposed. One is a greedy acceptance criterion under which the glowworms update their positions one dimension at a time. The other is a set of new movement formulas inspired by the artificial bee colony (ABC) algorithm and particle swarm optimization (PSO). To compare and analyze the performance of the proposed improved GSO, a number of experiments are carried out on a set of well-known benchmark global optimization problems. The effects of the parameters of the improved algorithms are examined through uniform design experiments. Numerical results reveal that the proposed algorithms find better solutions than classical GSO and other heuristic algorithms, and that they are powerful search algorithms for various global optimization problems.

17.
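By analyzing the problems of traditional collaborative filtering algorithms and how they have been addressed, this paper adopts a collaborative filtering recommendation algorithm based on genetic fuzzy clustering combined with subtractive clustering. Fuzzy clustering combined with subtractive clustering partitions the data flexibly and more effectively, makes better use of the global search ability of the genetic algorithm, and speeds up convergence; it also handles well the cold-start problem caused by data sparsity.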

18.
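Vertex cover is a well-known NP-hard problem with important applications in communication networks, bioinformatics, and other fields. Research on vertex cover has focused mainly on heuristic or approximation algorithms, whose main shortcoming is that they cannot guarantee global optimality. Kernelization is a new approach to intractable problems. An algorithmic framework that combines heuristic operations with kernelization operations is proposed, using kernelization techniques to optimize heuristic vertex cover algorithms. The kernelization operation identifies vertices that belong to a globally optimal solution, while the heuristic operation changes the network topology so that the next round of kernelization can proceed; the two are executed alternately to improve solution accuracy. Experimental results show that the proposed algorithm achieves varying degrees of improvement on different networks and obtains optimal solutions on almost all sparse network instances.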

19.
Chen Yu  Tian Bojin  Peng Yunzhu  Liao Yong 《Journal of Computer Applications》2020,40(11):3217-3223
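To further improve the user experience of power system customers, and to address the weak optimization ability, insufficient compactness, and difficulty in determining the optimal number of clusters in existing clustering algorithms, a Gaussian mixture clustering algorithm that combines the elbow method with expectation-maximization (EM) is proposed to mine the latent information in large volumes of customer data. The algorithm uses EM iterations to obtain good clustering results and, to overcome the drawback that traditional Gaussian mixture clustering needs the number of customer groups in advance, uses the elbow method to find a reasonable number of groups. A case study shows that, compared with hierarchical clustering and K-Means, the proposed algorithm raises the FM and AR indices by more than 10% each and lowers the compactness index (CI) and degree of separation (DS) by less than 15% and 25%, respectively, indicating a substantial performance improvement.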
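
The elbow method mentioned above can be sketched in a few lines. The abstract does not say how the paper locates the elbow, so the rule used here (pick the point where the decrease in cost slows down the most, i.e. the largest second difference) is an assumption, and `elbow_k` is a hypothetical name. The costs would come from fitting the mixture model once for each candidate number of clusters.

```python
def elbow_k(costs):
    """Pick a number of clusters from a cost curve using a simple elbow rule.

    costs[i] is the clustering cost (for example, within-cluster error or
    negative log-likelihood) obtained with k = i + 1 clusters. The elbow is
    taken to be the k at which the improvement drops off most sharply.
    """
    if len(costs) < 3:
        raise ValueError("need costs for at least three values of k")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(costs) - 1):
        gain_before = costs[i - 1] - costs[i]  # improvement from reaching this k
        gain_after = costs[i] - costs[i + 1]   # improvement from adding one more
        bend = gain_before - gain_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k
```

With costs `[100, 60, 12, 10, 9]` for k = 1 to 5, the largest bend occurs at k = 3, so the function returns 3.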

20.
This paper describes a novel feature selection algorithm for unsupervised clustering that combines the clustering ensembles method and the population-based incremental learning algorithm. The main idea of the proposed unsupervised feature selection algorithm is to search for a subset of all features such that the clustering algorithm trained on this feature subset achieves the clustering solution most similar to the one obtained by an ensemble learning algorithm. In particular, a clustering solution is first obtained by a clustering ensembles method, and then the population-based incremental learning algorithm is used to find the feature subset that best fits the obtained clustering solution. One advantage of the proposed unsupervised feature selection algorithm is that it is dimensionality-unbiased. In addition, it leverages the consensus across multiple clustering solutions. Experimental results on several real data sets demonstrate that the proposed algorithm is often able to obtain a better feature subset than other existing unsupervised feature selection algorithms.
