Similar Literature
 19 similar documents found (search time: 125 ms)
1.
Bayesian network learning divides into structure learning and parameter learning. The expectation-maximization (EM) algorithm is commonly used for parameter learning from incomplete data, but its relatively heavy computation, slow convergence, and susceptibility to local maxima make the traditional EM algorithm hard to apply to large data sets. This paper studies these problems and partitions a large data set into smaller sample blocks for processing, which reduces the computational cost of EM while also improving its accuracy. Experiments show that the improved EM algorithm achieves high performance.

2.
A Parallel Parameter Learning Algorithm for Bayesian Networks   Total citations: 2 (self-citations: 0; citations by others: 2)
To address the computational cost of learning Bayesian network parameters with the EM algorithm under large samples, a parallel EM algorithm (Parallel EM, PL-EM) is proposed to speed up parameter learning for complex Bayesian networks. In the E-step, PL-EM computes the posterior probabilities of the latent variables and the expected sufficient statistics in parallel; in the M-step, it exploits the conditional independence properties of the Bayesian network and the decomposability of the complete-data likelihood to compute each local likelihood function in parallel. Experimental results show that PL-EM provides an effective method for Bayesian network parameter learning under large samples.

3.
Research and Application of the EM Algorithm   Total citations: 2 (self-citations: 1; citations by others: 1)
This paper introduces the EM algorithm for handling missing data. EM is an iterative algorithm in which every iteration is guaranteed to increase the likelihood, converging to a local maximum. The paper analyzes the algorithm's basic principle and implementation steps. The algorithm is so named because each iteration consists of two steps: an expectation step (E-step) followed by a maximization step (M-step). The EM algorithm is mainly used to compute maximum likelihood estimates from incomplete data. On this basis, the algorithm is applied to parameter estimation in state-space models, and a parameter estimation method for linear state-space models based on Kalman smoothing and the EM algorithm is given.
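The E-step/M-step iteration described above can be sketched in the simplest setting, a two-component univariate Gaussian mixture (a minimal illustration only; the abstract's Kalman-smoothing state-space variant is not shown):

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Minimal EM for a two-component 1D Gaussian mixture.

    Each iteration is one E-step (posterior responsibilities) and one
    M-step (closed-form weighted ML updates); the likelihood never
    decreases, and the estimates converge to a local maximum.
    """
    mu = np.array([x.min(), x.max()], dtype=float)  # spread-out initialization
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: r[i, k] = P(component k | x_i) under the current parameters
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = dens * pi
        r = w / w.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum likelihood estimates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / nk.sum()
    return pi, mu, var
```

With well-separated data the two estimated means recover the true component means; the deterministic min/max initialization keeps the sketch reproducible.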

4.
Building on the EM algorithm, and given a fixed Bayesian network structure, this paper analyzes the Voting EM algorithm and applies it to online parameter learning of a flood-control decision Bayesian network. A comparison with the learning results of the standard EM algorithm shows that Voting EM not only supports online parameter learning but also achieves high learning accuracy.

5.
邹薇  王会进 《微型机与应用》2011,30(16):75-77,81
Incomplete data sets are common in practice, causing loss of information and complicating analysis, so the handling of missing data has become a research focus in classification. Because the EM method's random choice of initial cluster centers makes the clustering unstable, this paper uses the classification results of a naive Bayes algorithm to initialize the EM algorithm, then refines the estimates by alternating E- and M-steps, and fills the missing data with the resulting maximized values. Experimental results show that the proposed algorithm improves the stability of the clustering and yields better data imputation.

6.
A Missing-Data Imputation Algorithm Based on EM and Bayesian Networks   Total citations: 2 (self-citations: 0; citations by others: 2)
Data sets with missing values abound in practice, and handling them has become a research focus in classification. This paper analyzes and compares several general-purpose missing-data imputation algorithms and proposes a new one based on EM and Bayesian networks. The algorithm uses naive Bayes to estimate the initial values for EM, then combines EM with a Bayesian network and iterates to determine the final updater, simultaneously producing a completed data set. Experimental results show that, compared with classical imputation algorithms, the new algorithm achieves higher classification accuracy at substantially lower cost.

7.
Because the number of initial clusters in the EM algorithm is hard to choose and the iterations often get stuck in local optima, this paper combines the K-means algorithm with EM-based clustering to propose a new model-based clustering method for gene expression data. The new method first exploits the global perspective and efficiency of K-means to quickly obtain an initial partition, sets it as the initial parameter values of a Gaussian mixture model, and then runs EM to obtain the final clustering. In two experiments on real data sets, the new algorithm is compared against K-means and EM separately. The results show that the new algorithm is an effective clustering method with improved accuracy.
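The idea of seeding a Gaussian mixture with a K-means partition might be sketched as follows (a 1D, NumPy-only illustration under assumed names `kmeans_1d` and `gmm_init_from_kmeans`; the paper's exact procedure and data are not reproduced):

```python
import numpy as np

def kmeans_1d(x, k=2, n_iter=20):
    """Plain Lloyd's algorithm in 1D: fast, and gives the subsequent
    EM run a sensible starting partition."""
    centers = np.linspace(x.min(), x.max(), k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    labels = np.argmin(np.abs(x[:, None] - centers), axis=1)
    return centers, labels

def gmm_init_from_kmeans(x, k=2):
    """Turn the K-means partition into initial GMM parameters
    (mixing weights, means, variances) for a subsequent EM run."""
    centers, labels = kmeans_1d(x, k)
    weights = np.array([(labels == j).mean() for j in range(k)])
    variances = np.array([x[labels == j].var() if np.any(labels == j) else x.var()
                          for j in range(k)])
    return weights, centers, variances
```

EM then starts from these parameters instead of random centers, which is exactly where the claimed stability gain comes from.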

8.
The EM algorithm is an effective estimation algorithm for missing data, with wide applications in artificial intelligence, pattern recognition, mathematical statistics, image processing, signal detection, and more. This paper first gives a brief account of maximum likelihood estimation, then presents the main content of the algorithm, explaining in principle the iterative expectation-maximization approach to likelihood estimation. It discusses the convergence of the EM algorithm, describes its applications, and finally sketches several improved variants of EM.

9.
Since much sample data follows Gaussian distributions, clustering it with Gaussian mixture models (GMMs) can produce fairly accurate results. The EM algorithm (Expectation-Maximization) is typically used to estimate GMM parameters iteratively, but the traditional EM algorithm has two shortcomings: it is sensitive to the initial cluster centers, and its termination criterion, which stops when the distance between successive parameter estimates falls below a given threshold, does not guarantee convergence to the optimal parameter values. To remedy this, the paper proposes initializing EM with density peaks clustering (DPC) to improve robustness, and using relative entropy as the termination criterion to optimize the selection of GMM parameter values. Comparative experiments on synthetic and UCI data sets show that the proposed algorithm not only improves the robustness of EM but also produces better clusterings than traditional algorithms; on data sets following Gaussian distributions in particular, it greatly improves clustering accuracy.
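The relative-entropy termination test proposed above can be sketched as follows (hypothetical helper names; the paper's exact criterion and tolerance may differ):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions,
    with a small epsilon to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def em_converged(weights_new, weights_old, tol=1e-6):
    """Terminate EM when successive mixing-weight estimates are close in
    relative entropy rather than in Euclidean distance."""
    return kl_divergence(weights_new, weights_old) < tol
```

An EM loop would call `em_converged` after each M-step on the old and new mixing weights and stop once it returns true.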

10.
This paper proposes a method that computes feature weights from large numbers of user evaluations to analyze the similarity between query strings and search results in a search engine. The method relies entirely on users' "implicit evaluation" of search results: the clicks a user makes for an input query reflect an internal association, which the proposed method captures. A mathematical model of the problem is built, and the feature weights are computed with the EM algorithm. Because the model's objective function is complex and its convergence is hard to analyze, simulated annealing is used as a supplement to EM to verify convergence. Experiments were run on Baidu's sponsored search advertising, with a test sample of 100 advertisements and 144,132 queries; the results show that all features converge to the global optimum, and on a sampled subset the retrieval similarity precision is 93.32% with a recall of 87.43%.

11.
Accelerating EM for Large Databases   Total citations: 6 (self-citations: 0; citations by others: 6)
Thiesson  Bo  Meek  Christopher  Heckerman  David 《Machine Learning》2001,45(3):279-299
The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (In Jordan, M. (Ed.), Learning in Graphical Models, pp. 355–371. The Netherlands: Kluwer Academic Publishers) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM algorithm, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically affects the efficiency of the algorithm. We provide a method for selecting a near-optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the significance of each data case and then proceed for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases.
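The block-cycling incremental EM that the first approach builds on can be sketched as a partial E-step over one block of cases, followed by an M-step from the pooled sufficient statistics (a toy 1D two-component version under Neal and Hinton's scheme, not the authors' implementation):

```python
import numpy as np

def gauss(x, mu, var):
    """Normal density, used in the partial E-step."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def incremental_em(x, n_blocks=5, n_passes=5):
    """Incremental EM for a two-component 1D Gaussian mixture.

    A partial E-step recomputes responsibilities for ONE block only,
    replaces that block's contribution to the global sufficient
    statistics, and is immediately followed by an M-step.
    """
    blocks = np.array_split(x, n_blocks)
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    # per-block sufficient statistics: rows are (counts, sums, sums of
    # squares), one column per mixture component
    S = [np.zeros((3, 2)) for _ in blocks]
    for _ in range(n_passes):
        for b, xb in enumerate(blocks):
            # partial E-step: responsibilities for block b only
            dens = np.stack([gauss(xb, mu[k], var[k]) for k in range(2)], axis=1) * pi
            r = dens / dens.sum(axis=1, keepdims=True)
            S[b] = np.stack([r.sum(axis=0),
                             (r * xb[:, None]).sum(axis=0),
                             (r * xb[:, None] ** 2).sum(axis=0)])
            # M-step from the statistics pooled over all blocks seen so far
            n, s1, s2 = sum(S)
            mu = s1 / n
            var = s2 / n - mu ** 2
            pi = n / n.sum()
    return pi, mu, var
```

Each sweep touches every case once, but the parameters are updated after every block rather than once per full pass, which is the source of the speed-up.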

12.
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition such as image segmentation. However, the EM algorithm requires considerable computational time in its application to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be sped up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain.

13.
The joint segmentation of multiple series is considered. A mixed linear model is used to account for both covariates and correlations between signals. An estimation algorithm based on EM which involves a new dynamic programming strategy for the segmentation step is proposed. The computational efficiency of this procedure is shown and its performance is assessed through simulation experiments. Applications are presented in the field of climatic data analysis.

14.
The analysis of incomplete longitudinal data requires joint modeling of the longitudinal outcomes (observed and unobserved) and the response indicators. When non-response does not depend on the unobserved outcomes, within a likelihood framework, the missingness is said to be ignorable, obviating the need to formally model the process that drives it. For the non-ignorable or non-random case, estimation is less straightforward, because one must work with the observed data likelihood, which involves integration over the missing values, thereby giving rise to computational complexity, especially for high-dimensional missingness. The stochastic EM algorithm is a variation of the expectation-maximization (EM) algorithm and is particularly useful in cases where the E (expectation) step is intractable. Under the stochastic EM algorithm, the E-step is replaced by an S-step, in which the missing data are simulated from an appropriate conditional distribution. The method is appealing due to its computational simplicity. The SEM algorithm is used to fit non-random models for continuous longitudinal data with monotone or non-monotone missingness, using simulated, as well as case study, data. Resulting SEM estimates are compared with their direct likelihood counterparts wherever possible.
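The S-step mechanics can be illustrated in the simplest possible setting, a univariate normal sample with ignorable missingness, where missing values are simulated from the current model and the M-step is an ordinary complete-data estimate; the paper's non-random longitudinal models are far richer than this sketch:

```python
import numpy as np

def stochastic_em(x_obs, n_missing, n_iter=200, seed=0):
    """Stochastic EM for the mean/variance of a normal sample with
    missing values (ignorable/MCAR toy case).

    The E-step is replaced by an S-step: the missing values are
    *simulated* from the current model, after which the M-step is a
    complete-data maximum likelihood estimate.
    """
    rng = np.random.default_rng(seed)
    mu, var = x_obs.mean(), x_obs.var()
    for _ in range(n_iter):
        # S-step: draw the missing values from the current fitted model
        x_mis = rng.normal(mu, np.sqrt(var), n_missing)
        full = np.concatenate([x_obs, x_mis])
        # M-step: ordinary complete-data maximum likelihood
        mu, var = full.mean(), full.var()
    return mu, var
```

Unlike deterministic EM, the parameter sequence does not converge to a point but fluctuates around the ML estimate; in practice one averages the draws after a burn-in.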

15.
It is an important research issue to deal with mixture models when missing values occur in the data. In this paper, computational strategies using auxiliary indicator matrices are introduced for efficiently handling mixtures of multivariate normal distributions when the data are missing at random and have an arbitrary missing data pattern, meaning that missing data can occur anywhere. We develop a novel EM algorithm that can dramatically save computation time and be exploited in many applications, such as density estimation, supervised clustering and prediction of missing values. In the aspect of multiple imputations for missing data, we also offer a data augmentation scheme using the Gibbs sampler. Our proposed methodologies are illustrated through some real data sets with varying proportions of missing values.

16.
For the classification of very large data sets with a mixture model approach, a two-step strategy for the estimation of the mixture is proposed. In the first step data are scaled down using compression techniques. Data compression consists of clustering the single observations into a medium number of groups and the representation of each group by a prototype, i.e. a triple of sufficient statistics (mean vector, covariance matrix, number of observations compressed). In the second step the mixture is estimated by applying an adapted EM algorithm (called sufficient EM) to the sufficient statistics of the compressed data. The estimated mixture allows the classification of observations according to their maximum posterior probability of component membership. The performance of sufficient EM in clustering a real data set from a web-usage mining application is compared to standard EM and the TwoStep clustering algorithm as implemented in SPSS. It turns out that the algorithmic efficiency of the sufficient EM algorithm is much higher than for standard EM. While the TwoStep algorithm is even faster, its results show a lack of stability.
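The compression step, reducing each group of observations to a (count, mean vector, covariance matrix) prototype that later estimation works from, might look like this (a sketch; `compress` and `pooled_mean` are hypothetical names, not the paper's code):

```python
import numpy as np

def compress(x, labels):
    """Reduce each group of observations to a prototype: a triple of
    sufficient statistics (count, mean vector, covariance matrix)."""
    protos = []
    for g in np.unique(labels):
        xg = x[labels == g]
        protos.append((len(xg), xg.mean(axis=0),
                       np.cov(xg, rowvar=False, bias=True)))
    return protos

def pooled_mean(protos):
    """Example of estimating from prototypes alone: the overall mean is
    recovered exactly from (count, mean) pairs, never touching raw cases."""
    n_total = sum(n for n, _, _ in protos)
    return sum(n * m for n, m, _ in protos) / n_total
```

A "sufficient EM" step would weight each prototype by its count in the E-step instead of iterating over the individual observations, which is where the efficiency gain comes from.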

17.
Outlier mining is an important research topic in data mining, and incomplete data poses a double difficulty for it. This paper first extends the EM and MI algorithms for missing-data imputation to the mixed-missingness setting and, following Weisberg's theory of incomplete-data imputation, proposes the RE algorithm; it then combines cluster analysis with the forward search algorithm to obtain an algorithm superior to plain forward search. Finally, outlier mining on incomplete data is explored on the basis of these imputation algorithms. Both theoretical analysis and case studies show that the proposed outlier-mining algorithm for incomplete data is effective and feasible.

18.
The EM (Expectation Maximization) algorithm is the most effective algorithm for maximum likelihood and maximum a posteriori estimation in probabilistic parametric models with latent variables, but it easily falls into local optima. This paper therefore proposes an EM algorithm based on a semi-supervised machine learning mechanism: a penalized least-squares term is added to the likelihood function, a non-negativity constraint is introduced as prior information, and, combined with semi-supervised learning, the improved EM algorithm is recast as a minimization problem, which is then solved with maximum likelihood. This effectively estimates the mixing matrix and the Gaussian mixture model parameters, realizing the improvement of the EM algorithm. Simulation results show that the method handles EM's tendency to get stuck in local optima well.

19.
黄卓  王文峰  郭波 《控制与决策》2008,23(2):133-139
To address the initial-value sensitivity of current EM (Expectation-Maximization) algorithms for fitting continuous phase-type (PH) distributions, this paper proposes fitting the data with a deterministic annealing EM algorithm, gives a detailed theoretical derivation, and compares it with the standard EM algorithm on two fitting examples. The comparison shows that the proposed method effectively avoids the influence of the initial-value choice on the EM results, reduces the chance of getting trapped in a local optimum, and yields better results than standard EM.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号