Similar literature
 Found 20 similar documents; search took 15 ms
1.
The problem of selecting variables or features in a regression model in the presence of both additive (vertical) and leverage outliers is addressed. Since variable selection and the detection of anomalous data are not separable problems, the focus is on methods that select variables and outliers simultaneously. For selection, the fast forward selection algorithm, least angle regression (LARS), is used, but it is not robust. To achieve robustness to additive outliers, a dummy variable identity matrix is appended to the design matrix allowing both real variables and additive outliers to be in the selection set. For leverage outliers, these selection methods are used on samples of elemental sets in a manner similar to that used in high breakdown robust estimation. These results are compared to several other selection methods of varying computational complexity and robustness. The extension of these methods to situations where the number of variables exceeds the number of observations is discussed.
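The identity-matrix augmentation described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the function names and the toy data are hypothetical, and only the first step of a plain forward-selection pass is shown rather than full LARS.

```python
import math


def augment_with_outlier_dummies(X):
    """Append an n-by-n identity block to an n-by-p design matrix (list of rows).

    Dummy column p + i is 1 only for observation i, so selecting it
    amounts to flagging observation i as an additive outlier.
    """
    n = len(X)
    return [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
            for i, row in enumerate(X)]


def first_selected_column(Z, r):
    """Column of Z with the largest absolute normalized inner product with r.

    This is only the first step of a forward-selection pass, not full LARS.
    """
    n_rows, n_cols = len(Z), len(Z[0])

    def score(j):
        col = [Z[i][j] for i in range(n_rows)]
        norm = math.sqrt(sum(v * v for v in col))
        return abs(sum(c * ri for c, ri in zip(col, r))) / norm

    return max(range(n_cols), key=score)


# Toy data (made up): y equals x except for a gross error in observation 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-2.0, -1.0, 10.0, 1.0, 2.0]
y_mean = sum(y) / len(y)
y_centered = [v - y_mean for v in y]

Z = augment_with_outlier_dummies([[v] for v in x])
print(first_selected_column(Z, y_centered))  # column 3 is the dummy for observation 2
```

On this toy data the dummy column for the contaminated observation enters before the real variable, which is the behaviour the augmentation is meant to allow.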

2.
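TAO Zhiyong, LIU Xiaofang, WANG Hezhang. 《计算机应用》 (Journal of Computer Applications), 2018, 38(12): 3433-3437
The Gaussian mixture model (GMM) clustering algorithm is sensitive to its initial values and easily becomes trapped in local minima. To address this, the strong global search capability of the density peaks (DP) algorithm is used to optimize the initial cluster centers of the GMM algorithm, and a GMM clustering algorithm incorporating DP (DP-GMMC) is proposed. First, cluster centers are found with the DP algorithm to obtain the initial parameters of the mixture model; next, the expectation-maximization (EM) algorithm iteratively estimates the mixture model parameters; finally, data points are clustered according to the Bayesian posterior probability criterion. On the Iris dataset, the clustering accuracy of DP-GMMC reaches 96.67%, which is 33.6 percentage points higher than that of the traditional GMM algorithm and resolves the dependence on the initial cluster centers. Experimental results show that DP-GMMC clusters low-dimensional datasets well.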
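The three stages named in this abstract (density-peak initialization, EM estimation, posterior-based assignment) can be sketched for one-dimensional data as follows. This is a simplified illustration, not the paper's DP-GMMC: the function names, the cutoff distance `dc`, the tie-breaking rule, and the toy data are all assumptions.

```python
import math


def density_peak_centers(xs, k, dc):
    """Pick k initial centers: high local density rho and large distance
    delta to any denser point (ties in rho are broken by index)."""
    n = len(xs)
    rho = [sum(1 for b in xs if abs(a - b) < dc) - 1 for a in xs]
    delta = []
    for i in range(n):
        denser = [abs(xs[i] - xs[j]) for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(abs(xs[i] - b) for b in xs))
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [xs[i] for i in ranked[:k]]


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, pis, mus, variances):
    """E-step: posterior probability of each component for each point."""
    resp = []
    for x in xs:
        w = [p * normal_pdf(x, m, v) for p, m, v in zip(pis, mus, variances)]
        total = sum(w)
        resp.append([wj / total for wj in w])
    return resp


def fit_gmm(xs, centers, iters=50):
    """EM for a 1-D GMM started from the given centers."""
    n, k = len(xs), len(centers)
    mean = sum(xs) / n
    pooled = sum((x - mean) ** 2 for x in xs) / n
    pis, mus, variances = [1.0 / k] * k, list(centers), [pooled] * k
    for _ in range(iters):
        resp = responsibilities(xs, pis, mus, variances)
        for j in range(k):  # M-step for component j
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
            pis[j] = nj / n
    # Assign each point to the component with the largest posterior.
    resp = responsibilities(xs, pis, mus, variances)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mus, labels


# Toy data (made up): two well-separated groups.
xs = [0.0, 0.2, 0.4, 0.1, 0.3, 5.0, 5.2, 5.4, 5.1, 5.3]
centers = density_peak_centers(xs, k=2, dc=0.25)
mus, labels = fit_gmm(xs, centers)
print(centers, labels)
```

Starting EM from one density peak per group is the point of the initialization step: a random start could place both initial centers in the same group.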

3.
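When a GMM is trained with the traditional EM algorithm, information about which Gaussian component each training sample belongs to is not fully used, which degrades speaker recognition performance to some extent. To address this, the RPEM (rival penalized EM) algorithm is used to train the GMM, and a batch RPEM algorithm is introduced to reduce the heavy computation and slow convergence of RPEM. The variance optimization used in RPEM and batch RPEM training is also improved, giving an improved batch RPEM algorithm. Experiments on the Chains speaker recognition database show that the improved batch RPEM algorithm outperforms the traditional EM, RPEM, and batch RPEM algorithms while greatly improving training efficiency and reducing computation, which demonstrates its effectiveness for speaker recognition.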

4.
J. C., J. S. 《Pattern Recognition》 2002, 35(12): 2711-2718
This paper addresses the problem of tracking objects with complex motion dynamics or shape changes. It is assumed that some of the visual features detected in the image (e.g., edge strokes) are outliers, i.e., they do not belong to the object boundary. A robust tracking algorithm is proposed that efficiently tracks an object with complex shape or motion changes in cluttered environments. The algorithm relies on the use of multiple models, i.e., a bank of stochastic motion models switched according to a probabilistic mechanism. Robust filtering methods are used to estimate the label of the active model as well as the state trajectory.

5.
    
《Journal of Process Control》2014,24(9):1472-1488
In this paper, we propose a robust multiple-model linear parameter varying (LPV) approach to the identification of nonlinear processes contaminated with outliers. The identification problem is formulated and solved under the EM framework. Instead of assuming that the measurement noise comes from a Gaussian distribution, as conventional LPV approaches do, the proposed robust algorithm formulates the LPV solution using mixture t distributions and thus naturally addresses the robust identification problem. By modulating the distribution tails through the degrees of freedom, the proposed algorithm can handle various outliers. Two simulated examples and an experiment are studied to verify the effectiveness of the proposed approach.
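The way a t distribution downweights outliers inside EM can be seen in a much smaller setting than the paper's LPV model. The sketch below estimates only a location and a scale with the degrees of freedom `nu` held fixed; the function name, the choice `nu = 3`, and the toy data are assumptions made for illustration.

```python
def t_location_em(ys, nu=3.0, iters=100):
    """EM for the location and scale of a Student-t model with fixed nu.

    E-step: each point gets weight (nu + 1) / (nu + r^2 / sigma2), so a
    large residual r yields a small weight.
    M-step: weighted mean, then weighted squared residuals averaged over n.
    """
    n = len(ys)
    mu = sum(ys) / n
    sigma2 = sum((y - mu) ** 2 for y in ys) / n
    for _ in range(iters):
        w = [(nu + 1.0) / (nu + (y - mu) ** 2 / sigma2) for y in ys]
        mu = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
        sigma2 = sum(wi * (y - mu) ** 2 for wi, y in zip(w, ys)) / n
    return mu, sigma2


# Toy data (made up): five readings near 10 and one gross outlier.
ys = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
mu, sigma2 = t_location_em(ys)
print(sum(ys) / len(ys), mu)  # plain mean versus t-based estimate
```

A smaller `nu` gives heavier tails and stronger downweighting; as `nu` grows, the weights approach 1 and the estimate approaches the ordinary mean.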

6.
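A robust adaptive appearance model is proposed in which the variation of each pixel value is described by a mixture of Gaussians. To adapt to changes in the target's appearance, the Gaussian parameters are updated adaptively during tracking by an online EM algorithm. The target state is estimated with a particle filter, for which an observation model based on the adaptive appearance model is designed. Occlusion is handled with a robust estimation technique. Several sets of experiments show that the algorithm tracks robustly under illumination changes, pose changes, and partial or full occlusion.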

7.
Interval methods have been shown to be efficient, robust and reliable for solving difficult set-membership localization problems. However, they are unsuitable in a probabilistic context, where the approximation of an unbounded probability density function by a set cannot be accepted. This paper proposes a new probabilistic approach which makes it possible to use classical set-membership localization methods that are robust with respect to outliers. The approach is illustrated on two simulated examples.

8.
The observed difference between the swap rate and the government bond yield of corresponding maturity is known as the swap spread. The swap spread reflects the risk premium that is involved in a swap transaction instead of holding risk-free government bonds. It is primarily composed of the liquidity risk premium and the credit risk premium. In recent years there has been growing interest in modelling swap spreads because the swap spread is the key pricing variable for the swap rate. The Australian interest rate swap market is the most important over-the-counter (OTC) derivative market in Australia. In this paper we apply the class of mixture autoregressive conditional heteroscedastic (MARCH) models to three (3-year, 5-year and 10-year) swap spread series in Australia. The MARCH model is able to capture both of the stylised characteristics of the observed changes of the swap spread series: volatility persistence and the dependence of volatility on the level of the data. The proposed MARCH model also allows for regime switches in the swap spreads.

9.
A Greedy EM Algorithm for Gaussian Mixture Learning   Cited by: 7 (self-citations: 0; cited by others: 7)
Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get trapped in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture which tries to overcome these limitations. In particular, starting with a single component and adding components sequentially until a maximum number k, the algorithm is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture. This revised version was published online in August 2006 with corrections to the Cover Date.

10.
    
Gaussian Processes (GP) comprise a powerful kernel-based machine learning paradigm which has recently attracted the attention of the nonlinear system identification community, especially due to its inherent Bayesian-style treatment of uncertainty. However, since standard GP models assume a Gaussian distribution for the observation noise, i.e., a Gaussian likelihood, the learning and predictive capabilities of such models can be severely degraded when outliers are present in the data. In this paper, motivated by our previous work on GP learning with data containing outliers and recent advances in hierarchical (deep GPs) and recurrent GP (RGP) approaches, we introduce an outlier-robust recurrent GP model, the RGP-t. Our approach explicitly models the observation layer, which includes a heavy-tailed Student-t likelihood, and allows for a hierarchy of multiple transition layers to learn the system dynamics directly from estimation data contaminated by outliers. In addition, we modify the original variational framework of standard RGP in order to perform inference with the new RGP-t model. The proposed approach is comprehensively evaluated using six artificial benchmarks, under several outlier contamination levels, and two datasets related to process industry systems (pH neutralization and heat exchanger), whose estimation data undergo large contamination rates. The simulation results obtained by the RGP-t model indicate an impressive resilience to outliers and a superior capability to learn nonlinear dynamics directly from highly outlier-contaminated data in comparison to existing GP models.

11.
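JIA Kexin, HE Zishu. 《计算机工程》 (Computer Engineering), 2011, 37(19): 153-156
The Mahalanobis distance-based EM (MDEM) algorithm suffers from over-splitting. To address this, a competitive stop MDEM (CSMDEM) algorithm is proposed. It embeds the minimum description length criterion into the MDEM algorithm as the competitive stopping condition, so that the model order is selected while the mixture model parameters are estimated. Experimental results show that the algorithm needs a relatively low average number of EM iterations and fits Gaussian mixture models well. When applied to frequency-hopping network sorting, it sorts frequency-hopping signals with a high correct rate.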
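How a minimum description length (MDL) criterion trades fit against model order can be sketched as below. This shows only the common two-part form, negative log-likelihood plus half the parameter count times log n, applied after fitting. It is not the paper's embedded stopping rule, and the function names and log-likelihood values are made up for illustration.

```python
import math


def gmm_free_params(k, d):
    """Free parameters of a k-component, d-dimensional, full-covariance GMM:
    (k - 1) mixing weights, k*d means, k*d*(d+1)/2 covariance entries."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def mdl(log_likelihood, k, d, n):
    """Two-part MDL score; lower is better."""
    return -log_likelihood + 0.5 * gmm_free_params(k, d) * math.log(n)


def select_order(loglik_by_k, d, n):
    """Return the number of components with the smallest MDL score."""
    return min(loglik_by_k, key=lambda k: mdl(loglik_by_k[k], k, d, n))


# Hypothetical fitted log-likelihoods for k = 1, 2, 3 on n = 100 points in 1-D.
loglik_by_k = {1: -250.0, 2: -200.0, 3: -198.0}
print(select_order(loglik_by_k, d=1, n=100))  # prints 2
```

Going from two to three components improves the log-likelihood by only 2 in this made-up example, which is less than the added penalty, so the smaller model is kept.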

12.
The joint segmentation of multiple series is considered. A mixed linear model is used to account for both covariates and correlations between signals. An estimation algorithm based on EM which involves a new dynamic programming strategy for the segmentation step is proposed. The computational efficiency of this procedure is shown and its performance is assessed through simulation experiments. Applications are presented in the field of climatic data analysis.

13.
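SHAO Nan, ZHANG Ke. 《计算机应用》 (Journal of Computer Applications), 2013, 33(10): 2874-2877
The projection entropy feature as originally defined makes insufficient use of image information and is not invariant to image scaling. To address both shortcomings, an extended normalized projection entropy feature is defined, and the local projection entropy feature vectors of the normalized image are used for image recognition. For recognition, the expectation-maximization (EM) algorithm is used to obtain a Gaussian mixture probability distribution model of the local projection entropy features of the training images; the Mahalanobis distance from the corresponding feature of the target image to each Gaussian component is computed, and the class of the target image is determined by the distance discrimination principle. The algorithm is validated on images from the Columbia University computer vision database; the results show good recognition performance and that the algorithm parallelizes well.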
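The distance-based decision step in this abstract can be sketched for two-dimensional features. This is an illustration under assumptions, not the paper's pipeline: the function names, class labels, means, and covariances are invented, and the projection entropy features themselves are not computed.

```python
def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance for a 2-D point and a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def classify(x, components_by_class):
    """Assign x to the class owning the Gaussian component nearest in
    Mahalanobis distance (a simple distance-discrimination rule)."""
    best_label, best_dist = None, float("inf")
    for label, components in components_by_class.items():
        for mean, cov in components:
            dist = mahalanobis_sq_2d(x, mean, cov)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Hypothetical per-class mixtures, one component each for brevity.
model = {
    "A": [((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))],
    "B": [((4.0, 4.0), ((4.0, 0.0), (0.0, 4.0)))],
}
print(classify((1.5, 1.5), model))  # prints B
```

The point (1.5, 1.5) is closer to the mean of class A in Euclidean terms, but class B has the larger spread, so its Mahalanobis distance is smaller (3.125 versus 4.5) and B is chosen.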

14.
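A correspondence between uncertain Gaussian mixture models and type-2 Takagi-Sugeno-Kang (TSK) fuzzy models is established: every uncertain Gaussian mixture model corresponds to a unique type-2 TSK fuzzy system, and the conditional mean of the uncertain Gaussian mixture model is equivalent to the output of the type-2 TSK fuzzy system. On this basis, a new method for designing type-2 fuzzy systems is proposed: the type-2 TSK fuzzy system is determined by building an uncertain Gaussian mixture model, that is, the type-2 fuzzy system is designed by probabilistic and statistical methods. Simulation results show that type-2 fuzzy systems designed from uncertain Gaussian mixture models are more robust to noise and faster than other models.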

15.
For multimode processes, the Gaussian mixture model (GMM) has been applied in the last few years to estimate the probability density function of process data under normal operating conditions. However, learning a GMM with the expectation maximization (EM) algorithm from process data can be difficult or even infeasible for high-dimensional and collinear process variables. To address this issue, a novel multimode process monitoring approach based on a PCA mixture model is proposed. First, the PCA technique is directly applied to the covariance matrix of each Gaussian component to reduce the dimension of process variables and to obtain nonsingular covariance matrices. Then the Bayesian Ying-Yang incremental EM algorithm is adopted to automatically optimize the number of mixture components. With the obtained PCA mixture model, a novel process monitoring scheme is derived for fault detection of multimode processes. Three case studies are provided to evaluate the monitoring performance of the proposed method.

16.
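LIU Xin, CHEN Qiang, WANG Lanhao, DAI Wei. 《自动化学报》 (Acta Automatica Sinica), 2024, 50(10): 2022-2035
In existing system identification algorithms, the commonly used Gaussian, Student's t (St), and Laplace noise distributions are all symmetric and cannot describe asymmetric, biased output noise, so algorithm performance degrades under asymmetric skewed noise. This work therefore studies the generalized hyperbolic skew Student's t (GHSkewt) distribution and proposes a robust identification algorithm for linear systems under asymmetric skewed noise. First, the heavy-tailed and skewness properties of the GHSkewt distribution are described in detail, and the standard Student's t distribution is proved mathematically to be a special case of the GHSkewt distribution. Second, latent variables are introduced to decompose the GHSkewt distribution, which simplifies the derivation and implementation of the algorithm. Finally, under the expectation-maximization (EM) algorithm, the cost function of the system with latent variables is reconstructed, and the process dynamics and noise distribution are learned iteratively from the contaminated dataset, giving a joint estimate of the noise parameters and the model parameters.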

17.
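A clustering algorithm that learns a GMM with a greedy EM algorithm   Cited by: 2 (self-citations: 0; cited by others: 2)
Traditional clustering algorithms such as k-means need prior knowledge to set their initial parameters, and the choice of initial parameters usually has a large effect on the clustering result. A new model-based clustering algorithm is proposed that finds the best match between the data and a model by optimizing the fit between the given data and a mathematical model. Because the Gaussian mixture model can be viewed as a "soft assignment" clustering method, the algorithm uses a greedy EM algorithm to learn the Gaussian mixture model (GMM), so that the structure and parameters of the mixture are learned automatically without prior knowledge. The algorithm overcomes the shortcomings of k-means and similar algorithms, and experimental results show that it gives better clustering results.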

18.
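A semi-supervised learning algorithm based on hierarchical Gaussian mixture models   Cited by: 10 (self-citations: 0; cited by others: 10)
A semi-supervised learning algorithm based on a hierarchical Gaussian mixture model is proposed. The training samples for semi-supervised learning include both labeled and unlabeled samples. If a Gaussian mixture model is fitted to the distribution of the labeled samples of each class, and a hierarchical Gaussian mixture model whose number of Gaussians equals the number of classes is then fitted to the distribution of all (labeled and unlabeled) samples, the result is a semi-supervised learning problem based on a hierarchical Gaussian mixture model. Using the EM algorithm, a Gaussian mixture model is first learned from the labeled samples of each class; these model parameters and the frequency distribution of the labeled samples then serve as initial values for the parameters of the hierarchical Gaussian mixture model, which yields the proposed semi-supervised learning algorithm. In experiments on recognizing printed digits on bank bills, the algorithm achieves good results.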

19.
    
A composite multiple-model approach based on multivariate Gaussian process regression (MGPR) with correlated noises is proposed in this paper. In complex industrial processes, the observation noises of multiple response variables can be correlated with each other, and the process is nonlinear. In order to model multivariate nonlinear processes with correlated noises, a dependent multivariate Gaussian process regression (DMGPR) model is developed in this paper. The covariance functions of this DMGPR model are formulated by considering the “between-data” correlation, the “between-output” correlation, and the correlation between noise variables. Further, owing to the complexity of nonlinear systems as well as possible multiple-mode operation of industrial processes, this paper proposes a composite multiple-model DMGPR approach based on the Gaussian Mixture Model algorithm (GMM-DMGPR) to improve the performance of the proposed DMGPR model. When estimating model parameters through the expectation-maximization (EM) algorithm, the proposed modelling approach uses the weights of all the samples belonging to each sub-DMGPR model, which are evaluated with the GMM algorithm. The effectiveness of the proposed GMM-DMGPR approach is demonstrated by two numerical examples and a three-level drawing process of carbon fiber production.

20.
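HAN Guang, SUN Ning, LI Xiaofei, ZHAO Chunxia. 《计算机科学》 (Computer Science), 2014, 41(8): 289-292, 305
A terrain classification method is proposed that uses a Gaussian mixture model fitted by an improved hybrid particle swarm optimization (PSO) algorithm. Gaussian mixture models are usually fitted with the expectation-maximization (EM) algorithm, but EM is prone to local optima, converges at an unstable rate, and is sensitive to initial values. A hybrid PSO algorithm is therefore introduced and improved in several ways. Experimental results show that the improved algorithm has better global search capability and faster convergence than other optimization algorithms, that using it to fit the Gaussian mixture model improves the accuracy of parameter estimation, and that the proposed method performs well in terrain classification experiments on outdoor scene images.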
