Similar Documents
Found 20 similar documents (search time: 22 ms)
1.
Multivariate outlier identification requires the choice of reliable cut-off points for the robust distances that measure the discrepancy from the fit provided by high-breakdown estimators of location and scatter. Multiplicity issues affect the identification of the appropriate cut-off points. We describe how a careful choice of the error rate controlled during the outlier detection process can yield a good compromise between high power and low swamping when alternatives to the Family Wise Error Rate are considered. Multivariate outlier detection rules based on the False Discovery Rate and the False Discovery Exceedance criteria are proposed. The properties of these rules are evaluated through simulation, and the rules are then applied to real data examples. The conclusion is that the proposed approach provides a sensible strategy in many situations of practical interest.
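The combination of robust distances and FDR control described above can be sketched with NumPy/SciPy. The half-sample location/scatter fit and the median-based rescaling below are simplified stand-ins for a full high-breakdown estimator such as MCD, and the Benjamini-Hochberg step is the standard FDR procedure, not necessarily the paper's exact rule:

```python
import numpy as np
from scipy.stats import chi2

def fdr_outliers(X, q=0.05):
    """Flag multivariate outliers with Benjamini-Hochberg FDR control.

    Location and scatter come from the half of the points closest to the
    coordinatewise median (a crude stand-in for a full MCD fit), and the
    squared robust distances are rescaled so their median matches the
    chi-square reference before p-values are computed.
    """
    n, p = X.shape
    med = np.median(X, axis=0)
    half = X[np.argsort(np.linalg.norm(X - med, axis=1))[: n // 2]]
    mu = half.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(half, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, Sinv, diff)   # squared robust distances
    d2 *= chi2.ppf(0.5, p) / np.median(d2)            # consistency rescaling
    pvals = chi2.sf(d2, df=p)
    # Benjamini-Hochberg step-up over the n outlier tests
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, n + 1) / n
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    flags = np.zeros(n, dtype=bool)
    flags[order[:k]] = True
    return flags
```

On data with a few gross outliers, the stepped thresholds let the extreme points through while keeping swamping of the clean bulk low.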

2.
In data assimilation, the observation error covariance matrix is correlated and exhibits dependence on both time and state. Exploiting this correlation structure, we combine a robust filtering method with observation error covariance estimation, so that the observation error covariance evolves with state and time, and propose a new robust data assimilation method with observation error estimation that updates the observation error covariance and improves the estimates. The assimilation method is optimized from several angles, including the analysis error covariance and amplification of the transfer matrix eigenvalues. Using the nonlinear Lorenz-96 chaotic system, the robustness and assimilation accuracy of the robust filter with observation error estimation are evaluated under three optimization settings and compared with the original robust filter, including the behaviour of both methods as the model error, the number of observations, and the performance-level coefficient vary. The results show that the observation error estimation technique improves the accuracy of the state estimates, and that the robust filter with observation error estimation is robust to changes in the system parameters.
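The idea of updating the observation error covariance during filtering can be illustrated in a deliberately minimal scalar setting. The innovation-consistency relation below is a generic adaptive-filtering device, not the paper's Lorenz-96 scheme, and the random-walk model, window length, and variance floor are illustrative assumptions:

```python
import numpy as np

def adaptive_kf(zs, q=0.01, r0=1.0, window=100):
    """Scalar Kalman filter with innovation-based estimation of the
    observation error variance R (random-walk state, identity H).

    R is re-estimated from a moving window of innovations d via the
    consistency relation  R ~ mean(d^2) - P_pred.
    """
    x, p, r = zs[0], 1.0, r0
    innovations, xs = [], []
    for z in zs[1:]:
        p = p + q                        # predict (F = 1)
        d = z - x                        # innovation
        innovations.append(d)
        if len(innovations) >= window:
            recent = np.array(innovations[-window:])
            r = max(np.mean(recent ** 2) - p, 1e-6)
        k = p / (p + r)                  # Kalman gain
        x = x + k * d                    # update
        p = (1 - k) * p
        xs.append(x)
    return np.array(xs), r
```

Starting from a misspecified r0, the estimated observation error variance drifts toward the true value, and the state estimates are correspondingly less noisy than the raw observations.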

3.
Given n training examples, the training of a least squares support vector machine (LS-SVM) or kernel ridge regression (KRR) corresponds to solving a linear system of dimension n. In cross-validating LS-SVM or KRR, the training examples are split into two distinct subsets a number of times (l), wherein a subset of m examples is used for validation and the other subset of (n-m) examples is used for training the classifier. In this case l linear systems of dimension (n-m) need to be solved. We propose a novel method for cross-validation (CV) of LS-SVM or KRR in which, instead of solving l linear systems of dimension (n-m), we compute the inverse of an n-dimensional square matrix and solve l linear systems of dimension m, thereby reducing the complexity when l is large and/or m is small. Typical multi-fold, leave-one-out cross-validation (LOO-CV) and leave-many-out cross-validations are considered. For five-fold CV used in practice with five repetitions over randomly drawn slices, the proposed algorithm is approximately four times as efficient as the naive implementation. For large data sets, we propose to evaluate the CV approximately by applying the well-known incomplete Cholesky decomposition technique, and the complexity of these approximate algorithms scales linearly with the data size if the rank of the associated kernel matrix is much smaller than n. Simulations are provided to demonstrate the performance of LS-SVM and the efficiency of the proposed algorithm, with comparisons to the naive and some existing implementations of multi-fold and LOO-CV.
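The flavour of the speed-up is easiest to see in the leave-one-out case (m = 1): after one n-dimensional inversion of C = K + lam*I, every held-out residual follows from the standard KRR/LS-SVM closed form, instead of n separate (n-1)-dimensional solves. A NumPy sketch with a naive reference for comparison:

```python
import numpy as np

def krr_loo_residuals(K, y, lam):
    """Leave-one-out residuals for kernel ridge regression in one solve.

    Uses the identity e_i = alpha_i / (C^{-1})_{ii} with C = K + lam*I,
    so the n leave-one-out fits cost one n x n inversion instead of n
    separate (n-1)-dimensional solves.
    """
    n = len(y)
    Cinv = np.linalg.inv(K + lam * np.eye(n))
    alpha = Cinv @ y
    return alpha / np.diag(Cinv)

def krr_loo_naive(K, y, lam):
    """Reference implementation: refit with each point held out."""
    n = len(y)
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        a = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
        res[i] = y[i] - K[i, keep] @ a
    return res
```

Both routines return identical residuals; only the fast one avoids refitting per fold.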

4.
Exploring process data with the use of robust outlier detection algorithms
To implement on-line process monitoring techniques such as principal component analysis (PCA) or partial least squares (PLS), it is necessary to extract data associated with the normal operating conditions from the plant historical database for calibrating the models. One way to do this is to use robust outlier detection algorithms such as resampling by half-means (RHM), smallest half volume (SHV), or ellipsoidal multivariate trimming (MVT) in the off-line model building phase. While RHM and SHV are conceptually clear and statistically sound, their computational requirements are heavy. Closest distance to center (CDC) is proposed in this paper as an alternative for outlier detection. The use of the Mahalanobis distance in the initial step of MVT for detecting outliers is known to be ineffective; to improve MVT, CDC is incorporated with it. The performance was evaluated relative to the goal of finding the best half of a data set. Data sets were derived from the Tennessee Eastman process (TEP) simulator. Comparable results were obtained for RHM, SHV, and CDC. Better performance was obtained when CDC was incorporated with MVT, compared with using CDC or MVT alone. All robust outlier detection algorithms outperformed the standard PCA algorithm. The effects of auto scaling, robust scaling, and a new scaling approach called modified scaling were investigated. In the presence of multiple outliers, auto scaling was found to degrade the performance of all the robust techniques. Reasonable results were obtained with the use of robust scaling and modified scaling.
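A minimal sketch of the CDC idea, assuming the "center" is taken to be the coordinatewise median and that exactly half of the samples are retained (the paper may define both choices differently):

```python
import numpy as np

def closest_distance_to_center(X, frac=0.5):
    """Pick the fraction of samples closest to a robust center.

    Rank points by Euclidean distance to the coordinatewise median and
    keep the closest half as the 'normal operating condition' subset
    for calibrating a monitoring model.
    """
    center = np.median(X, axis=0)
    d = np.linalg.norm(X - center, axis=1)
    keep = np.argsort(d)[: int(len(X) * frac)]
    return np.sort(keep)
```

Gross outliers land far from the median and never enter the retained half, which is what the "best half of a data set" criterion above measures.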

5.
Stochastic adaptive estimation and control algorithms involving recursive prediction estimates have guaranteed convergence rates when the noise is not 'too' coloured, as when a positive-real condition on the noise model is satisfied. Moreover, the whiter the noise environment, the more robust the algorithms. This paper shows that for linear regression signal models, the suitable introduction of white noise into the estimation algorithm can make it more robust without compromising convergence rates. Indeed, there are guaranteed attractive convergence rates independent of the process noise colour. No positive-real condition is imposed on the noise model.

6.
The present paper investigates the influence of both the imbalance ratio and the classifier on the performance of several resampling strategies for dealing with imbalanced data sets. The study focuses on evaluating how learning is affected when different resampling algorithms transform the originally imbalanced data into artificially balanced class distributions. Experiments over 17 real data sets using eight different classifiers, four resampling algorithms, and four performance evaluation measures show that over-sampling the minority class consistently outperforms under-sampling the majority class when data sets are strongly imbalanced, whereas there are no significant differences for databases with a low imbalance. Results also indicate that the choice of classifier has very little influence on the effectiveness of the resampling strategies.
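The simplest of the compared strategies, random over-sampling of the minority class, can be sketched in a few lines (SMOTE-style synthetic generation or under-sampling variants would replace the `rng.choice` step):

```python
import numpy as np

def random_oversample(X, y):
    """Balance a binary data set by over-sampling the minority class.

    Minority examples are re-drawn with replacement until both classes
    have the same count.
    """
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    rng = np.random.default_rng(0)
    extra = rng.choice(np.flatnonzero(y == minority), size=deficit, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])
```

The returned set has an artificially balanced class distribution, which is exactly the transformation whose effect on learning the study measures.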

7.
Successful implementation of many control strategies is mainly based on accurate knowledge of the system and its parameters. Besides the stochastic nature of the systems, nonlinearity is one more feature that may be found in almost all physical systems. The application of the extended Kalman filter for the joint state and parameter estimation of stochastic nonlinear systems is well known and widespread. Measurements, however, often contain observations that are inconsistent with the bulk of the population (outliers). The presence of outliers can significantly reduce the efficiency of linear estimation algorithms derived under the assumption that observations have Gaussian distributions. Hence, the synthesis of robust algorithms is very important. Owing to its practical value in robust filtering and its rate of convergence, the modified extended Masreliez–Martin filter provides a natural framework for realizing the joint state and parameter estimator of nonlinear stochastic systems. Strong consistency is proved using the methodology of an associated ODE system. The behaviour of the new approach to joint estimation of the states and unknown parameters of nonlinear systems, in the case when measurements have non-Gaussian distributions, is illustrated by extensive simulations. Copyright © 2015 John Wiley & Sons, Ltd.

8.
The contribution of non-linear orthogonal regression to the estimation of individual pharmacokinetic parameters when drug concentrations and sampling times are subject to error was studied. The first objective was to introduce and compare four numerical approaches that involve different degrees of approximation for parameter estimation by orthogonal regression. The second objective was to compare orthogonal with non-orthogonal regression. These evaluations were based on simulated data sets from 300 'subjects', thereby enabling the precision and accuracy of parameter estimates to be determined. The pharmacokinetic model was a one-compartment open model with first-order absorption and elimination rates. The inter-individual coefficients of variation (CV) of the pharmacokinetic parameters were in the range 33-100%. Eight measurement-error models for times and concentrations (homo- or heteroscedastic with constant CV) were considered. The accuracy of the four algorithms was very close in almost all instances (typical bias, 1-4%). Precision showed three expected trends: root mean squared error (RMSE) increased when the residual error was larger or the number of observations was smaller, and it was highest for the absorption rate constant and the common error variance. Overall, RMSE ranged from 5 to 40%. It was found that the simplest algorithm for orthogonal regression performed as well as the more complicated approaches. Errors in sampling time resulted in increased bias and imprecision in the individual parameter estimates (especially for k(a) in our example) and in the common error variance when the estimation method did not take these errors into account. In this situation, the use of orthogonal regression resulted in smaller bias and better precision.
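The orthogonal-residual idea is easiest to see on a straight line, even though the pharmacokinetic model above is nonlinear. The total-least-squares fit below measures residuals perpendicular to the line, treating errors in sampling times and concentrations symmetrically; equal error variances on both axes are assumed:

```python
import numpy as np

def orthogonal_line_fit(x, y):
    """Fit y = a + b*x by orthogonal (total least squares) regression.

    Unlike ordinary least squares, residuals are measured perpendicular
    to the line, which treats errors in x and y symmetrically. The line
    normal is the right singular vector of the centered data matrix for
    the smallest singular value (fails for a vertical line, where the
    normal's y-component is zero).
    """
    mx, my = x.mean(), y.mean()
    A = np.column_stack([x - mx, y - my])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    nx, ny = Vt[-1]                 # unit normal to the best-fit line
    b = -nx / ny
    a = my - b * mx
    return a, b
```

With heteroscedastic errors (the constant-CV models above), the coordinates would first be rescaled by their error standard deviations.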

9.
A statistical methodology for handling aggregate data is proposed. Aggregate data arise in many fields such as medical science, ecology, social science, and reliability. They can be described as follows: individuals move progressively along a finite set of states, and observations are made in a time window split into several intervals. At each observation time, the only available information is the number of individuals in each state; the history of each item, viewed as a stochastic process, is thus lost, and the time spent in a given state is unknown. Using a data completion technique, an estimation of the hazard rate in each state based on sojourn times is obtained, and an estimation of the survival function is deduced. These methods are studied through simulations and applied to a data set. The simulation study shows that the algorithms involved in the methods converge and are robust.

10.
The main difficulties facing current remote sensing image classification algorithms are the trade-off between classification accuracy and algorithmic complexity, and a lack of robustness. To address this, a robust multi-model remote sensing image classification method is proposed that fuses non-parametric kernel density estimation clustering, based on feature space resampling, with edge detection. First, edge detection is applied to the remote sensing image to obtain the edge gradient and orientation of each pixel. Then, using a resampling strategy, weighted mean-shift filtering is performed on the new sample set in the joint domain to find the local maxima of the kernel density function in each image region, and nearby data points are iteratively moved to these local maxima. Finally, the segmented regions are merged to produce the final classification map. Experimental results show that the algorithm achieves high-accuracy classification of remote sensing images and is highly robust.

11.
A General Method for Geometric Feature Matching and Model Extraction
Popular algorithms for feature matching and model extraction fall into two broad categories: generate-and-test and Hough transform variations. However, both methods suffer from problems in practical implementations. Generate-and-test methods are sensitive to noise in the data. They often fail when the generated model fit is poor due to error in the data used to generate the model position. Hough transform variations are less sensitive to noise, but implementations for complex problems suffer from large time and space requirements and from the detection of false positives. This paper describes a general method for solving problems where a model is extracted from, or fit to, data that draws benefits from both generate-and-test methods and those based on the Hough transform, yielding a method superior to both. An important component of the method is the subdivision of the problem into many subproblems. This allows efficient generate-and-test techniques to be used, including the use of randomization to limit the number of subproblems that must be examined. Each subproblem is solved using pose space analysis techniques similar to the Hough transform, which lowers the sensitivity of the method to noise. This strategy is easy to implement and results in practical algorithms that are efficient and robust. We describe case studies of the application of this method to object recognition, geometric primitive extraction, robust regression, and motion segmentation.
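The generate-and-test half of this strategy resembles RANSAC: propose models from minimal subsets chosen at random, score each hypothesis by its consensus set, and keep the best. A line-extraction sketch (the paper's subdivision into subproblems and its pose-space analysis are not reproduced here):

```python
import numpy as np

def ransac_line(points, iters=200, tol=0.1, rng=None):
    """Generate-and-test line extraction: propose lines from random
    point pairs and keep the hypothesis with the largest consensus set."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        norm = np.linalg.norm(d)
        if norm < 1e-12:
            continue                               # degenerate pair
        n = np.array([-d[1], d[0]]) / norm         # unit normal to the line
        dist = np.abs((points - p) @ n)            # perpendicular distances
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

Randomization plays the same role as in the paper: only a limited number of hypotheses needs to be examined before a mostly-inlier subset is drawn with high probability.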

12.
With the steady growth of bot accounts on Weibo, their detection has become a hot topic in data mining. Most existing work on Weibo bot detection trains and validates models on small, class-balanced collections of crawled bot and ordinary-user data, which limits its applicability in real settings where the class distribution is heavily imbalanced. Resampling is a common technique for classifying imbalanced data sets. To examine its effect on supervised bot detection algorithms, this paper uses real data from a micro hot-spot data mining competition and proposes a Weibo bot detection framework that incorporates resampling: on top of five different sampling schemes, multiple evaluation metrics are used to comprehensively assess the classification performance of seven supervised learning algorithms on an imbalanced validation set. The experiments show that models trained on small balanced samples suffer a substantial drop in Recall under realistic conditions, whereas the resampling-based framework greatly improves the bot identification rate: NearMiss under-sampling sharply increases an algorithm's Recall, while ADASYN over-sampling improves its G-mean. In general, attributes such as a Weibo user's posting time, posting region, and posting intervals are important features for distinguishing ordinary users from bots. Resampling adjusts the feature distribution the machine learning algorithms rely on and thereby yields better predictive performance.
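A NearMiss-style under-sampling step of the kind mentioned above can be sketched with NumPy alone. The variant below (NearMiss-1: keep the majority samples closest on average to their k nearest minority neighbours) and the label convention are assumptions for illustration:

```python
import numpy as np

def nearmiss_undersample(X, y, k=3):
    """NearMiss-1 style under-sampling: keep the majority samples whose
    mean distance to their k nearest minority neighbours is smallest,
    until the two classes are balanced."""
    maj = np.flatnonzero(y == 0)   # assumed label: 0 = majority (normal users)
    mino = np.flatnonzero(y == 1)  # assumed label: 1 = minority (bot accounts)
    d = np.linalg.norm(X[maj][:, None] - X[mino][None], axis=2)
    score = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep_maj = maj[np.argsort(score)[: len(mino)]]
    keep = np.sort(np.concatenate([keep_maj, mino]))
    return X[keep], y[keep]
```

Keeping the majority samples nearest the minority class concentrates training on the decision boundary, which is why this family of under-samplers tends to raise Recall on the rare class.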

13.
Automated segmentation of images has been considered an important intermediate processing task to extract semantic meaning from pixels. We propose an integrated approach for image segmentation based on a generative clustering model combined with coarse shape information and robust parameter estimation. The sensitivity of segmentation solutions to image variations is measured by image resampling. Shape information is included in the inference process to guide ambiguous groupings of color and texture features. Shape and similarity-based grouping information is combined into a semantic likelihood map in the framework of Bayesian statistics. Experimental evidence shows that semantically meaningful segments are inferred even when image data alone gives rise to ambiguous segmentations.

14.
In this paper, we propose an affine parameter estimation algorithm from block motion vectors for extracting accurate motion information, with the assumption that the undergoing motion can be characterized by an affine model. The motion may be caused either by a moving camera or a moving object. The proposed method first extracts motion vectors from a sequence of images by using size-variable block matching and then processes them by adaptive robust estimation to estimate affine parameters. Typically, a robust estimation filters out outliers (velocity vectors that do not fit into the model) by fitting velocity vectors to a predefined model. To filter out potential outliers, our adaptive robust estimation defines a continuous weight function based on a Sigmoid function. During the estimation process, we tune the Sigmoid function gradually to its hard-limit as the errors between the model and input data are decreased, so that we can effectively separate non-outliers from outliers with the help of the finally tuned hard-limit form of the weight function. Experimental results show that the suggested approach is very effective in estimating affine parameters reliably.
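The adaptive robust estimation loop can be sketched as iteratively reweighted least squares with a sigmoid weight whose slope is hardened on each pass. The doubling schedule for beta and the generic linear model below are illustrative choices, not the paper's exact tuning:

```python
import numpy as np

def adaptive_robust_fit(A, b, c=1.0, iters=10):
    """Iteratively reweighted least squares with a sigmoid weight that
    is gradually tuned toward its hard limit:

        w(e) = 1 / (1 + exp(beta * (|e| - c)))

    Small beta down-weights large residuals softly; as beta grows, the
    weight approaches a 0/1 gate at |e| = c, separating non-outliers
    from outliers.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for it in range(iters):
        beta = 2.0 ** it                      # harden the sigmoid each pass
        e = b - A @ x
        w = 1.0 / (1.0 + np.exp(np.clip(beta * (np.abs(e) - c), -50.0, 50.0)))
        x = np.linalg.solve(A.T @ (A * w[:, None]) + 1e-9 * np.eye(A.shape[1]),
                            A.T @ (w * b))
    return x
```

For the affine-parameter case, A would be built from block coordinates and b from the motion vector components; the reweighting loop is unchanged.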

15.
A new theory applicable to data treatment is briefly presented. This (gnostical) theory derives a mathematical model of data disturbed by uncertainty, the statistical model of which may be unknown or even unjustifiable. Gnostical theory is based on two simple axioms. It results in laws governing the uncertainty of each individual datum, such as variational principles of the virtual kinematics of real data and of their dynamics, closely related to the entropy and information of the data. Algorithms resulting from gnostical theory maximize the information obtained from data and yield data characteristics robust with respect to outlying or inlying data. Fields of application include the estimation of both location and scale parameters of small data samples and of their generalized correlations, robust estimation of probability and of probability distributions, non-linear discrete filtering, prediction and smoothing, identification of systems under strong disturbances, adaptive setting of alarm systems, robust identification of regression models, robust control systems, etc. The main advantage of the new approach is that it leads to algorithms that are efficient even when applied to small samples of bad data.

16.
高婉玲  洪玫  杨秋辉  赵鹤 《计算机科学》2017,44(Z6):499-503, 533
Statistical model checking has been widely applied in recent years, and the choice of statistical algorithm affects its performance. This paper compares the time overhead that different statistical algorithms impose on statistical model checking, in order to determine the settings each algorithm suits. The algorithms considered are the Chernoff bound algorithm, the sequential algorithm, smart probability estimation, smart hypothesis testing, and Monte Carlo simulation. State reachability verification of a wireless LAN protocol and of the dining philosophers problem are used as case studies, checked with the PLASMA model checking tool. The experimental results show that different statistical algorithms affect model checking efficiency differently in different settings: the sequential algorithm is best suited to verifying state reachability properties, with the best time performance, while smart hypothesis testing and Monte Carlo simulation are suited to verifying complex models. This conclusion can guide the choice of statistical algorithm in model checking and thus improve its efficiency.
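The fixed-sample (Chernoff/Okamoto) approach is the simplest of the algorithms compared above: the bound fixes, before any trace is drawn, the number of simulation runs needed for a given precision eps and confidence 1 - delta. A sketch (PLASMA's implementation details are not assumed):

```python
import math
import random

def chernoff_sample_size(eps, delta):
    """Number of Monte Carlo runs the Chernoff (Okamoto) bound requires
    so that P(|estimate - p| > eps) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def smc_estimate(run_model, eps=0.02, delta=0.01, rng=None):
    """Fixed-sample statistical model checking: simulate N independent
    traces and return the fraction that satisfy the property."""
    if rng is None:
        rng = random.Random(0)
    n = chernoff_sample_size(eps, delta)
    return sum(run_model(rng) for _ in range(n)) / n
```

Sequential schemes (SPRT-style) instead stop as soon as the accumulated evidence decides the hypothesis, which is why they win on simple reachability properties in the comparison above.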

17.
《Advanced Robotics》2013,27(4):585-604
In order to solve the simultaneous localization and mapping (SLAM) problem of mobile robots, the Rao–Blackwellized particle filter (RBPF) has been intensively employed. However, it suffers from the particle depletion problem, i.e., the number of distinct particles becomes smaller during the SLAM process. As a result, the particles optimistically estimate the SLAM posterior, meaning that particles tend to underestimate their own uncertainty and the filter quickly becomes inconsistent. The main cause of the loss of particle diversity is the resampling process of RBPF-SLAM. Standard resampling algorithms for RBPF-SLAM cannot preserve particle diversity because of the way they remove and replicate particles. Thus, we propose rank-based resampling (RBR), which assigns selection probabilities for resampling particles based on the rankings of the particles. In addition, we provide an extensive analysis of the performance of RBR, including the scheduling of resampling. Through simulation results, we show that the estimation capability of RBPF-SLAM with RBR outperforms that with standard resampling algorithms. More importantly, RBR preserves particle diversity much longer, so it can prevent a certain particle from dominating the particle set and reduce the estimation errors. In addition, consistency tests show that RBPF-SLAM with the standard resampling algorithms is optimistically inconsistent, whereas RBPF-SLAM with RBR is pessimistically inconsistent, which leaves room to reduce the estimation errors.
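Rank-based selection probabilities can be sketched as follows. Linear ranking (selection probability proportional to rank) is assumed here; the paper's exact ranking scheme may differ:

```python
import numpy as np

def rank_based_resample(weights, rng=None):
    """Rank-based resampling: selection probabilities come from the
    particles' weight *rankings*, not the raw weights, which flattens
    the selection distribution and keeps more distinct particles alive.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(weights)
    ranks = np.empty(n)
    ranks[np.argsort(weights)] = np.arange(1, n + 1)   # worst weight gets rank 1
    probs = ranks / ranks.sum()
    return rng.choice(n, size=n, p=probs)
```

However dominant one particle's weight is, its selection probability is capped at n divided by the sum of ranks, which is what slows depletion relative to multinomial resampling on the raw weights.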

18.
Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that often appear in the visual data. Discriminative methods, such as LDA, which, on the other hand, are better suited for classification tasks, are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods which enables one to work on subsets of pixels in images to efficiently detect and reject the outliers. The proposed approach is therefore capable of robust classification with a high-breakdown point. We also show that subspace methods, such as CCA, which are used for solving regression tasks, can be treated in a similar manner. The theoretical results are demonstrated on several computer vision tasks showing that the proposed approach significantly outperforms the standard discriminative methods in the case of missing pixels and images containing occlusions and outliers.

19.
刘一  刘本永 《计算机应用》2014,34(3):815-819
Resampling is a typical operation in image tampering. Existing resampling forgery detectors perform poorly on JPEG-compressed images and cannot accurately estimate the scaling factor, so a resampling detection algorithm based on re-resampling is proposed. The algorithm first resamples the JPEG image under test again with a scaling factor smaller than 1, to weaken the influence of JPEG compression, and then detects the resampling operation using the periodicity of the second derivative of a resampled signal. Experimental results show that the algorithm is highly robust to JPEG compression and can accurately estimate the true scaling factor. It also clearly detects the resampling operation in composite images assembled from images scaled by different factors.
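The periodicity that such detectors exploit is easy to reproduce in one dimension: linear interpolation makes the magnitude of the second derivative periodic, which shows up as a spectral peak at the resampling rate. A sketch (the JPEG-suppression step of re-resampling with a factor below 1 is omitted):

```python
import numpy as np

def resampling_peak(signal):
    """Detect interpolation periodicity: FFT of |second derivative|.

    Returns the normalized frequency of the strongest non-DC spectral
    peak; for a 2x linearly upsampled signal the peak sits at 0.5
    cycles per sample.
    """
    d2 = np.abs(np.diff(signal, n=2))
    d2 = d2[: len(d2) // 2 * 2]            # even length so Nyquist is a bin
    spec = np.abs(np.fft.rfft(d2 - d2.mean()))
    k = 1 + np.argmax(spec[1:])            # skip the DC bin
    return k / len(d2)
```

For 2x linear upsampling, the second difference vanishes exactly at the original sample positions and is nonzero at the interpolated ones, so |d2| alternates with period 2 and the spectrum peaks at the Nyquist frequency.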

20.
The effect of the Bidirectional Reflectance Distribution Function (BRDF) is one of the most important factors in correcting the reflectance obtained from remotely sensed data. Estimation of BRDF model parameters can be degraded by various factors: contamination of the observations by undetected subresolution clouds or snow patches, inconsistent atmospheric correction in multiangular time series due to uncertainties in the atmospheric parameters, slight variations of the surface condition during a period of observation (for example due to soil moisture changes), diurnal effects on vegetation structure, and geolocation errors [Lucht and Roujean, 2000]. In the present paper, the robustness of parameter estimation is examined using Bidirectional Reflectance Factor (BRF) data measured for paddy fields in Japan. We compare both the M-estimator and the least median of squares (LMedS) methods for robust parameter estimation with the ordinary least squares method (LSM). The experiments used simulated data produced by adding noise to data measured on the ground surface. The experimental results demonstrate that when a robust fit is sought, the LMedS method can be adopted for the robust estimation of BRDF model parameters.
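The LMedS estimator referred to above is straightforward to sketch for a line model: candidate fits are generated from random minimal subsets, and the one minimizing the median squared residual wins, tolerating up to half of the observations being contaminated:

```python
import numpy as np

def lmeds_line(x, y, trials=300, rng=None):
    """Least median of squares line fit: try candidate lines from random
    point pairs and keep the one minimizing the median squared residual."""
    if rng is None:
        rng = np.random.default_rng(0)
    best, best_med = (0.0, 0.0), np.inf
    for _ in range(trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                         # degenerate pair
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        med = np.median((y - a - b * x) ** 2)
        if med < best_med:
            best, best_med = (a, b), med
    return best
```

For a BRDF model, the pair-based line hypothesis would be replaced by a minimal-subset fit of the kernel-driven model; the median-residual selection rule is unchanged.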
