Similar Documents
20 similar documents found
1.
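This paper discusses data fitting under general mixture distributions, using the EM algorithm with the minimum entropy principle as the optimization criterion. It briefly derives the EM algorithm for finite Gaussian mixtures and, to address the algorithm's slow convergence, designs an effective method for choosing initial parameter values. Experimental results show that the method helps the EM algorithm converge quickly in the neighborhood of the true parameter values.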
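The abstract does not describe its initialization method, so the sketch below uses a stand-in: a plain maximum-likelihood EM for a one-dimensional finite Gaussian mixture, with initial means placed at evenly spaced quantiles of the data. The quantile initialization, the function names, and the `min_var` floor are assumptions for illustration, and the minimum-entropy criterion is not implemented. The recorded log-likelihood history lets you check the monotone-increase property of EM.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def quantile_init(data, k):
    """Hypothetical initialization: means at evenly spaced quantiles of the
    sorted data, equal weights, and the pooled variance for every component."""
    xs = sorted(data)
    n = len(xs)
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    return [1.0 / k] * k, means, [var_all] * k


def em_gmm_1d(data, k, iters=100, tol=1e-8, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log-likelihood history)."""
    weights, means, variances = quantile_init(data, k)
    n = len(data)
    history = []
    for _ in range(iters):
        # E step: resp[i][j] = posterior probability that x_i came from component j
        resp = []
        loglik = 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        history.append(loglik)
        # M step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var_j)
        # Stop once the log-likelihood no longer changes appreciably
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return weights, means, variances, history


# Usage: two well-separated clusters centred at 0 and 10
random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(200)]
w, mu, var, hist = em_gmm_1d(data, 2)
```

A better starting point mainly shortens the early iterations; it does not change the fixed points EM can reach, which is why initialization matters for both speed and which local optimum is found.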

3.
We consider regression models with a group structure in the explanatory variables. This structure is common in practice, but it has only recently been recognized that taking it into account during modeling can improve both the interpretability and the accuracy of the model. In this paper, we study a new approach to group variable selection using random-effect models. Specific distributional assumptions on the random effects pertaining to a given structure lead to a new class of penalties that includes some existing penalties. We also develop an efficient computational algorithm. Numerical studies demonstrate better sensitivity and specificity without sacrificing prediction accuracy. Finally, we present some real-data applications of the proposed approach.
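The random-effect penalties of this paper are not given in the abstract. As an illustration of what a group penalty does, the sketch below implements the proximal operator of the group lasso, a standard group penalty (whether it is among the existing penalties recovered by the paper's class is an assumption). Each group of coefficients is shrunk as a block and either survives or is set exactly to zero, which is what makes selection happen at the group level. The function name is hypothetical.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2 (group lasso penalty).

    beta   : list of coefficients
    groups : list of index lists, one per group (a partition of range(len(beta)))
    lam    : penalty level (>= 0)
    Each group is scaled by max(0, 1 - lam / ||beta_g||), so small groups
    are zeroed out entirely and large groups are shrunk toward zero.
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out


# Usage: the first group (norm 5) is shrunk, the second (norm 0.5) is dropped
out = group_soft_threshold([3.0, 4.0, 0.3, -0.4], [[0, 1], [2, 3]], lam=1.0)
```

In a proximal-gradient solver this step would follow each gradient step on the least-squares loss.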

4.
Clustering is one of the most important techniques in data mining. This article focuses on the EM clustering algorithm and studies two fundamental aspects: achieving faster convergence and finding higher-quality clustering solutions. It introduces several improvements to the EM clustering algorithm, the most important of which are periodic M steps during the initial iterations, reseeding of low-weight clusters, and splitting of high-weight clusters. These improvements introduce two important parameters: the number of M steps per iteration, and a weight threshold below which low-weight clusters are reseeded. Experiments show how frequently the M step must be executed and which weight threshold values let EM reach higher-quality solutions. In general, the improved EM clustering algorithm finds higher-quality solutions than the classical EM algorithm and converges in fewer iterations.
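The abstract names reseeding of low-weight clusters but not its exact rule. The sketch below is one plausible version, stated as an assumption: after an M step, every component whose mixing weight is below the threshold has its mean moved to a randomly chosen data point and is given the threshold as its weight, after which all weights are renormalized. The function name and the "give it the threshold weight" rule are hypothetical; splitting of high-weight clusters is not shown.

```python
import random


def reseed_low_weight(weights, means, data, threshold, rng):
    """Hypothetical reseeding step for EM clustering.

    Components with weight < threshold get a new mean drawn from the data
    and a weight equal to the threshold; weights are then renormalized.
    Returns (new_weights, new_means, indices_reseeded).
    """
    new_w, new_m, reseeded = list(weights), list(means), []
    for j, w in enumerate(weights):
        if w < threshold:
            new_m[j] = rng.choice(data)   # restart the cluster at a data point
            new_w[j] = threshold          # give it a minimum share of the mass
            reseeded.append(j)
    total = sum(new_w)
    new_w = [w / total for w in new_w]    # keep the weights summing to one
    return new_w, new_m, reseeded


# Usage: the third component has drifted away and holds almost no weight
rng = random.Random(1)
data = [0.1, 0.2, 5.0, 5.1, 9.8, 10.0]
w, m, idx = reseed_low_weight([0.6, 0.39, 0.01], [0.15, 5.05, 42.0], data,
                              threshold=0.05, rng=rng)
```

The threshold plays the role of the second parameter mentioned in the abstract: too low and dead clusters linger, too high and healthy small clusters are disrupted.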

5.
This paper proposes a new method for estimating the true number of clusters and the initial cluster centers in a dataset with many clusters. Observation points are placed in the data space, and the clusters are observed through the distributions of distances between the observation points and the objects in the dataset. A Gamma Mixture Model (GMM) is fitted to a distance distribution to partition the dataset into subsets, and a GMM tree is obtained by partitioning the dataset recursively. From the leaves of the GMM tree, a set of initial cluster centers is identified and the true number of clusters is estimated. This method is implemented in the new GMM-Tree algorithm. Two GMM forest algorithms are further proposed that ensemble multiple GMM trees to handle high-dimensional data with many clusters: the GMM-P-Forest algorithm builds GMM trees in parallel, whereas the GMM-S-Forest algorithm builds a GMM forest sequentially. Experiments were conducted on 32 synthetic datasets and 15 real datasets to evaluate the new algorithms. The results show that the proposed algorithms outperformed the popular existing methods (Silhouette, Elbow, and Gap Statistic) as well as the recent method I-nice in estimating the true number of clusters in high-dimensional complex data.

6.
Model-based approaches, and finite mixture models in particular, are widely used for data clustering, which is a crucial step in many applications of practical importance. Indeed, many pattern recognition, computer vision, and image processing applications can be approached as feature-space clustering problems. For complex high-dimensional data, however, these approaches face several challenges, such as the presence of many irrelevant features, which may slow down the learning algorithm and compromise its accuracy. Another problem is the presence of outliers, which can distort the estimated model parameters. To address these issues, we propose and discuss an algorithm that partitions a data set without a priori information about the number of clusters, the saliency of the features, or the number of outliers. We illustrate the performance of our approach on applications involving synthetic data, real data, and object shape clustering.

7.
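A filtering approach is used to introduce pixel position information into the EM algorithm, and image downsampling is used to speed up its convergence. To avoid unstable selection of mixture components when the sample is small, the sampled data are weighted on the basis of the proposed position-constrained mixture model. The method achieves segmentation results close to those obtained at the original resolution while running markedly faster.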

8.
One of the most popular criteria for model selection is the Bayesian Information Criterion (BIC). It is derived from an asymptotic approximation based on Bayes' rule, in which the sample size tends to infinity while the dimension of the model stays fixed. Although it works well in classical applications, it performs less satisfactorily in high-dimensional problems, i.e. when the number of regressors is very large compared with the sample size. For this reason, a modified version of the BIC has been proposed for mapping quantitative trait loci (QTLs) in genetics, where one approach is to locate QTLs by model selection in a regression model with an extremely large number of potential regressors. Since the assumption of normally distributed errors is often unrealistic in such settings, we extend the idea underlying the modified BIC to robust regression.
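For reference, the classical criterion is BIC = k ln n - 2 ln L̂, where k is the number of estimated parameters, n the sample size, and L̂ the maximized likelihood. For a linear model with Gaussian errors and the ML variance estimate σ̂² = RSS/n, this gives -2 ln L̂ = n ln(2π RSS/n) + n. The sketch below computes this standard BIC only; the modified BIC and its robust extension described above add further penalty terms that are not reproduced here. The function name and the example RSS values are illustrative.

```python
import math


def gaussian_regression_bic(rss, n, k):
    """Classical BIC = k*ln(n) - 2*ln(L_hat) for a Gaussian linear model.

    rss : residual sum of squares of the fitted model
    n   : sample size
    k   : number of estimated parameters (coefficients plus error variance)
    Uses the ML variance estimate rss/n, for which
    -2*ln(L_hat) = n*ln(2*pi*rss/n) + n.
    """
    neg2_loglik = n * math.log(2 * math.pi * rss / n) + n
    return k * math.log(n) + neg2_loglik


# Usage: a slightly better fit does not justify three extra parameters
bic_a = gaussian_regression_bic(120.0, 100, 3)
bic_b = gaussian_regression_bic(118.0, 100, 6)
```

The model with the smaller BIC is preferred. The penalty grows only with k ln n and ignores how many candidate regressors were searched over, which is one way to see why the abstract reports weaker performance when the number of regressors is very large.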

9.
Finite Mixture Density Models and an EM Clustering Algorithm for Remote Sensing Imagery   Cited by: 3 (self: 0, others: 3)
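Remote sensing information is an integrated reflection of information about the Earth's surface layer. Because the Earth's surface system is complex and open, surface information is multidimensional and unbounded; together with the limitations of the remote sensing transmission process and the complex correlations among remote sensing data, this makes remote sensing results uncertain and non-unique. Remote sensing data have certain statistical properties but are also highly random and complex, and in feature space they often follow a mixture density distribution. To handle this statistical complexity, the paper proposes an expectation-maximization (EM) decomposition model for finite mixture densities. The model assumes that the overall distribution can be decomposed into a finite number of parameterized density distributions, and EM iterations estimate the maximum-likelihood parameter set of each. The finite-mixture EM clustering algorithm is applied to cluster analysis of remote sensing images and compared with traditional statistical clustering methods. The comparison shows that it has advantages in distinguishing complex ground objects and can be extended to incorporate expert knowledge, improved initialization, and similar refinements.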

10.
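This paper proposes an image-database indexing method based on pattern clustering and automatic selection of mixture-model parameters. The traditional EM (Expectation Maximization) algorithm provides a good solution for parameter estimation in mixture-model clustering, but it requires the number of clusters to be specified in advance, which reduces the accuracy and efficiency of indexing high-dimensional data. By combining an improved CEM2 (Component-wise EM of Mixture) algorithm for automatic mixture-model selection, vector quantization, and a probabilistic approximate indexing mechanism, the method effectively improves retrieval efficiency while preserving accuracy.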

11.
The problem of clustering probability density functions is emerging in various scientific domains. The methods proposed so far for clustering probability density functions focus mainly on univariate settings and rely on heuristic clustering solutions. This work investigates new aspects of the problem associated with the multivariate setting and with a model-based perspective. The novel approach relies on hierarchical mixture modeling of the data. The method is introduced in the univariate context and then extended to multivariate densities by means of a factorial model that performs dimension reduction. The model is fitted with an EM algorithm. The proposed method is illustrated through simulated experiments and applied to two real data sets to compare its performance with alternative clustering strategies.

12.
EM-Algorithm-Based Parameter Estimation for Mixture Models   Cited by: 3 (self: 0, others: 3)
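This paper introduces maximum likelihood parameter estimation, then describes an EM algorithm implementation of maximum likelihood estimation for mixture models, and finally verifies the algorithm's effectiveness and convergence through computer simulation experiments.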

13.
In this paper, we present a novel algorithm for estimating the parameters of a linear system when the observed output signal is quantized. This question is relevant to many areas, including sensor networks and telecommunications. The algorithms described here have closed-form solutions in the SISO case. In the MIMO case, however, a set of pre-computed scenarios is used to reduce the computational complexity of the EM-type algorithms typically deployed for this kind of problem. Comparisons are made with other algorithms previously described in the literature, as well as with implementations of algorithms based on the Quasi-Newton method.

14.
Software Reliability Model Selection Based on Clustering   Cited by: 6 (self: 1, others: 6)
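Inconsistency in the application of software reliability models has long been a major problem for software reliability researchers. Model selection and combination strategies, as the main solutions, have become a research focus in model application. This paper explores a method for selecting software reliability models using clustering. The values of reliability-model evaluation criteria computed on actual failure data are encoded; a Gaussian mixture model is then used for cluster analysis, with the EM algorithm estimating the mixture parameters and a Bayesian criterion used for model selection. To verify the method's effectiveness and feasibility, experiments were carried out on failure data from several real projects. The results show that this model selection method is simple and effective and helps resolve the inconsistency problem in applying software reliability models.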

15.
This paper investigates a genetic programming (GP) approach aimed at the multi-objective design of hierarchical consensus functions for clustering ensembles. By this means, data partitions obtained via different clustering techniques can be continuously refined (via selection and merging) by a population of fusion hierarchies having complementary validation indices as objective functions. To assess the potential of the novel framework in terms of efficiency and effectiveness, a series of systematic experiments, involving eleven variants of the proposed GP-based algorithm and a comparison with basic as well as advanced clustering methods (of which some are clustering ensembles and/or multi-objective in nature), have been conducted on a number of artificial, benchmark and bioinformatics datasets. Overall, the results corroborate the perspective that having fusion hierarchies operating on well-chosen subsets of data partitions is a fine strategy that may yield significant gains in terms of clustering robustness.
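The GP-evolved fusion hierarchies themselves cannot be reconstructed from the abstract. As background on what a consensus function consumes, the sketch below builds a co-association matrix, a common building block of clustering ensembles (not necessarily the one used in this paper): entry (i, j) is the fraction of the input partitions that place objects i and j in the same cluster. The function name is illustrative.

```python
def co_association(partitions):
    """Co-association matrix of a clustering ensemble.

    partitions : list of label lists; each assigns a cluster label to every object
    Returns an n x n matrix whose entry [i][j] is the fraction of partitions
    that put objects i and j in the same cluster.
    """
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


# Usage: three partitions of four objects from different base clusterers
C = co_association([[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]])
```

Label values are only compared within a partition, so partitions need not use consistent label names. A simple consensus merges pairs whose co-association exceeds a threshold; the paper instead evolves which partitions to select and how to merge them.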

16.
Mixture-model-based clustering (hereinafter simply called model-based clustering) consists of fitting a mixture model to data and identifying each cluster with one of its components. This paper tackles the model selection and parameter estimation problems in model-based clustering in order to improve clustering performance on data sets whose true kernel distribution functions are not in the assumed family, as well as on data sets with inherently overlapping clusters. Tailored to clustering applications, an effective model selection criterion is first proposed. Unlike most criteria, which measure only how well the model fits the data, the proposed criterion also evaluates whether the candidate model provides a reasonable partition of the observed data, which favors models with well-separated components. Accordingly, an improved method for estimating the mixture parameters is derived that aims to suppress the spurious estimates produced by the standard expectation maximization (EM) algorithm and to enforce well-supported components in the mixture model. Finally, mixture parameter estimation and model selection are integrated into a single algorithm that favors a compact mixture model whose components are both well supported and well separated. Extensive experiments on synthetic and real-world data sets show the effectiveness of the proposed approach to mixture-model-based clustering.

17.
Three-way data sets occur when various attributes are measured for a set of observational units in different situations. Examples are genotype by environment by attribute data obtained in a plant experiment, individual by time point by response data in a longitudinal study, and individual by brand by attribute data in a market research survey. Clustering observational units (genotypes/individuals) by means of a special type of normal mixture model has been proposed. An implicit assumption of this approach, however, is that observational units belong to the same cluster in all situations. An extension is presented that relaxes this assumption and may therefore yield much simpler clustering solutions. The proposed extension (which includes the earlier model as a special case) is obtained by adapting the multilevel latent class model for categorical responses to the three-way situation, as well as to situations in which the responses include continuous variables. An efficient EM algorithm for maximum likelihood parameter estimation is described, and two empirical examples are provided.

18.
A new variant of the dynamic hierarchical model (DHM) that describes a large number of parallel time series is presented. The separate series, which may be interdependent, are modeled by dynamic linear models (DLMs). This interdependence is included in the model through the definition of a 'top-level' or 'average' DLM. The model features explicit dependences between the latent states of the parallel DLMs and the states of the average model, so the many parallel time series are linked to one another. However, the combination of dependences within each time series and dependences between the different DLMs makes the computation time required for exact inference cubic in the number of parallel time series, which is unacceptable for practical tasks involving large numbers of parallel time series. Therefore, two methods for fast approximate inference are proposed: a variational approximation and a factorial approach. Under these approximations, inference can be performed in linear time while still yielding exact means. Learning is implemented through maximum likelihood (ML) estimation of the model parameters, realized by an expectation maximization (EM) algorithm with approximate inference in the E-step. Examples of learning and forecasting on two data sets show that adding direct dependences has a 'smoothing' effect on the evolution of the states of the individual time series and leads to better predictions. Using approximate instead of exact inference is further shown not to lead to inferior results on either data set.
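Each series in the model above is a dynamic linear model. As a minimal illustration of a single DLM, assuming the simplest case (a local-level model y_t = θ_t + v_t, θ_t = θ_{t-1} + w_t with known noise variances V and W), the sketch below runs the standard Kalman filter recursion. The coupling to an average DLM, the variational and factorial approximations, and EM learning of V and W are not shown; the function name is illustrative.

```python
def local_level_filter(ys, m0, c0, V, W):
    """Kalman filter for the local-level DLM.

    Observation: y_t = theta_t + v_t,     v_t ~ N(0, V)
    State:       theta_t = theta_{t-1} + w_t,  w_t ~ N(0, W)
    m0, c0 : prior mean and variance of theta_0
    Returns the filtered means and variances of theta_t for each t.
    """
    m, c = m0, c0
    means, variances = [], []
    for y in ys:
        a, r = m, c + W        # prior for theta_t given data up to t-1
        q = r + V              # one-step forecast variance of y_t
        k = r / q              # Kalman gain
        m = a + k * (y - a)    # posterior mean after seeing y_t
        c = r - k * r          # posterior variance after seeing y_t
        means.append(m)
        variances.append(c)
    return means, variances


# Usage: a constant signal observed with noise variance 1 and a vague prior
means, variances = local_level_filter([5.0] * 50, m0=0.0, c0=100.0, V=1.0, W=0.01)
```

Filtering one series costs time linear in its length; the cubic cost mentioned in the abstract arises only when the states of many such series are inferred jointly.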

19.
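Methods based on the differences between Gaussian Mixture Models (GMMs) are a commonly used class of speaker clustering methods. This paper proposes two novel measures of the difference between GMMs: "class divergence" and the mutual probability of GMMs. Class divergence is the ratio of between-model scatter to within-model scatter; its computation takes into account the weights, means, and variances of every GMM component (cell), so it fully reflects differences in the GMM parameters. The mutual probability of two GMMs is the probability of one GMM's parameters under the other GMM. Experiments show that both measures describe the differences between GMMs well and perform well in speaker clustering.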
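The abstract defines mutual probability only as "the probability of one GMM's parameters under the other GMM". The sketch below is one possible reading, stated as an assumption: for one-dimensional GMMs, take the weight-averaged log density of each model's component means under the other model, and symmetrize. The function names and the use of means only (not variances) are hypothetical choices, not the paper's exact definition.

```python
import math


def gmm_logpdf(x, weights, means, variances):
    """Log density of a 1-D Gaussian mixture at x."""
    s = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances))
    return math.log(s)


def mutual_log_prob(gmm_a, gmm_b):
    """Hypothetical symmetrized 'mutual probability' of two GMMs.

    Each GMM is a tuple (weights, means, variances). The score averages,
    weighted by component weight, the log density of each model's component
    means under the other model. Higher values mean more similar models.
    """
    def one_way(p, q):
        w, mu, _ = p
        return sum(wi * gmm_logpdf(mi, *q) for wi, mi in zip(w, mu))

    return 0.5 * (one_way(gmm_a, gmm_b) + one_way(gmm_b, gmm_a))


# Usage: B is a slightly shifted copy of A, C lies far away
A = ([0.5, 0.5], [0.0, 3.0], [1.0, 1.0])
B = ([0.5, 0.5], [0.1, 3.1], [1.0, 1.0])
C = ([0.5, 0.5], [8.0, 11.0], [1.0, 1.0])
```

For speaker clustering, a score like this can be turned into a distance (for example its negative) and fed to agglomerative clustering of per-segment GMMs.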

20.
Chen Chen, Chen Yongsheng. Journal of Computer Applications (计算机应用), 2008, 28(8): 2109-2112
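After reviewing several techniques that have become popular in software model checking in recent years, this paper proposes a software model checking scheme based on hierarchical unit partitioning and guided search. The scheme has three phases: preprocessing, unit partitioning, and state-space search, where on-the-fly techniques are used to improve search performance. Experiments show that the scheme is effective in mitigating the state explosion problem.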
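The scheme's unit partitioning and guidance heuristics are not described in the abstract. The sketch below only illustrates the generic on-the-fly idea it relies on: states are generated lazily from a successor function rather than building the full state graph up front, and the search stops at the first state that violates a safety property, returning a counterexample trace. It uses plain breadth-first search; the function name, the `max_states` budget, and the toy two-counter system are illustrative assumptions.

```python
from collections import deque


def on_the_fly_bfs(initial, successors, is_bad, max_states=100_000):
    """On-the-fly explicit-state search for a safety violation.

    successors(s) yields the successor states of s; it is called only when s
    is dequeued, so unreached parts of the state space are never built.
    Returns the shortest path from `initial` to a bad state, or None if no bad
    state is found before the search ends or the state budget is used up.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if is_bad(s):
            path = []
            while s is not None:          # walk back to the initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                if len(parent) >= max_states:
                    return None           # budget exhausted
                parent[t] = s
                queue.append(t)
    return None


def succ(s):
    """Toy system: two counters modulo 4, either one may be incremented."""
    x, y = s
    return [((x + 1) % 4, y), (x, (y + 1) % 4)]


path = on_the_fly_bfs((0, 0), succ, lambda s: s == (3, 3))
```

A guided search, as in the scheme above, would replace the FIFO queue with a priority queue ordered by a heuristic estimate of distance to a violation.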


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号