Similar Literature
20 similar documents found.
1.
This article gives a robust estimator of the trend parameters in multivariate spatial linear models, presented as an alternative to the classical estimator obtained by cokriging. The goal is to improve predictions of spatial variables when the data contain both atypical and high-influence observations. The procedure extends robust methods used in linear regression models to the multivariate spatial context. The resulting estimator belongs to the class of GM-estimators; it therefore has bounded influence and good robustness properties, in particular a high breakdown point and high efficiency. An illustrative example shows how the proposed estimator works. Research partially supported by Ministerio de Ciencia y Tecnología, Project AGL2000-0978.
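As a rough illustration of the GM-estimation idea (bounded influence via downweighting both large residuals and high-leverage design points), here is a minimal Schweppe-type IRLS sketch for an ordinary linear trend. It deliberately ignores the spatial covariance structure that the article's cokriging setting supplies; the function name, the Huber constant c = 1.345, and the MAD-based scale are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def gm_estimate(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Schweppe-type GM-estimator sketch: IRLS with Huber weights on
    leverage-adjusted residuals, so leverage points are downweighted too."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    lev_w = np.sqrt(np.clip(1.0 - np.diag(H), 1e-6, None))  # position (leverage) weights
    beta = np.linalg.lstsq(X, y, rcond=None)[0]             # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))    # robust MAD scale
        u = r / (s * lev_w)          # residuals inflated at high-leverage points
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))    # Huber psi(u)/u weights
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```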

2.
A methodology to estimate overall travel time from individual travel time measurements within a time window is presented. To better handle data with complex outlier-generation mechanisms, fuzzy clustering techniques are used to represent relationships among the individual travel time data collected within a measuring time window. The data set is treated as a fuzzy set to which each data point belongs with some degree of membership, allowing transitions from the main body of data to extreme data points to be handled smoothly. Two algorithms have been developed, based on 'point' and 'line' fuzzy cluster prototypes, with iterative procedures to calculate the fuzzy cluster centre and the fuzzy line. A novel estimation method based on time projection of a fuzzy line is proposed; it is robust because it uses a wide time window, and timely because time projection resolves the most recent travel time estimate. Unlike deterministic approaches, where hard thresholds must be specified to exclude outliers, the proposed methods estimate travel times using all available data and can therefore be applied in a wide variety of scenarios without fine-tuning of thresholds.
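The iterative fuzzy-centre idea ('point' prototype) can be sketched as follows. The Cauchy-type membership function, the bandwidth delta, and the fuzzifier m are assumptions for illustration, not the authors' exact prototypes.

```python
import numpy as np

def fuzzy_centre(t, delta=60.0, m=2.0, n_iter=100, tol=1e-6):
    """Iteratively estimate a fuzzy cluster centre of travel times t (seconds).
    Each point gets a membership that decays smoothly with distance from the
    centre, so extreme values are downweighted rather than hard-thresholded."""
    c = np.median(t)                                 # robust starting point
    for _ in range(n_iter):
        u = 1.0 / (1.0 + ((t - c) / delta) ** 2)     # membership in (0, 1]
        c_new = np.sum(u**m * t) / np.sum(u**m)      # membership-weighted mean
        if abs(c_new - c) < tol:
            break
        c = c_new
    return c, u

times = np.array([310.0, 295.0, 305.0, 320.0, 300.0, 900.0])  # one extreme reading
centre, memberships = fuzzy_centre(times)            # centre stays near ~305 s
```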

3.
方俊涛, 何桢, 宋琳曦, 张阳. 《工业工程》 (Industrial Engineering) 2012, 15(3): 98-103
Response surface methodology (RSM) is a highly effective approach to process improvement and optimization. In building a traditional response surface model, the random errors are usually assumed to be independent and normally distributed with equal variance. In practice, however, the error variances are not always equal and the observations may contain outliers, so robust estimation methods are needed to suppress the influence of outliers on the fitted model. To reduce the impact of outliers on the optimum of the response surface model, this paper considers the central composite design (CCD) used in RSM and, accounting for outliers that may occur at different design points, presents a theoretical comparison of three robust M-regression methods: the Huber, Tukey, and Welsch estimators. The results show that the Welsch and Tukey estimators effectively mitigate the effect of outliers on the optimum of the response surface model and weaken their interference with the central composite design. A case study from the chemical industry computes the optimum of the response surface model with and without outliers at different positions of the CCD; the comparison shows that when an outlier deviates strongly from the mean response (by 10 standard deviations), robust M-estimation, and the Welsch and Tukey estimators in particular, markedly improves the robustness of response surface modeling.
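The Huber and Tukey M-fits the study compares are available off the shelf. Below is a minimal sketch contrasting them with OLS on a hypothetical one-factor quadratic response surface with an injected 10-sigma outlier; statsmodels has no Welsch norm, so that estimator is omitted here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(-1.7, 1.7, 13)                  # axial-to-axial range of one CCD factor
y = 50 + 4 * x - 3 * x**2 + rng.normal(0, 1, x.size)
y[3] += 10                                      # inject a 10-sigma outlier
X = sm.add_constant(np.column_stack([x, x**2]))

ols   = sm.OLS(y, X).fit()
huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
tukey = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()
for name, fit in [("OLS", ols), ("Huber", huber), ("Tukey", tukey)]:
    print(name, np.round(fit.params, 3))        # robust fits stay near (50, 4, -3)
```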

4.
Outliers are one of the main concerns in statistics. Parametric identification results from ordinary least squares are sensitive to outliers. Many robust estimators have been proposed to overcome this problem, but existing methods still have drawbacks. In this paper, a novel probabilistic method is proposed for robust parametric identification and outlier detection in linear regression problems. The crux of the method is to calculate the probability of outlier, which quantifies how probable it is that a data point is an outlier. The proposed method has several appealing features. First, not only the optimal values of the parameters and residuals but also the associated uncertainties are taken into account for outlier detection. Second, the size of the dataset is incorporated, because it is one of the key variables determining the probability of obtaining a large-residual data point. Third, the method requires no information on the outlier distribution model. Fourth, it provides the probability of outlier directly. In the illustrative examples, the proposed method is compared with three well-known methods; it turns out to be substantially superior and capable of robust parametric identification and outlier detection even in very challenging situations.
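As a simplified stand-in for the probability-of-outlier idea (not the paper's exact formulation), the sketch below combines internally studentized residuals, which fold in leverage and residual-variance uncertainty, with the dataset size n: it returns the chance that n in-control draws would all produce a less extreme standardized residual, so values near 1 flag outliers.

```python
import numpy as np
from scipy import stats

def outlier_probability(X, y):
    """Heuristic per-point 'probability of outlier' for a linear fit;
    a simplified stand-in for the paper's probabilistic formulation."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    s2 = r @ r / (n - p)                          # residual variance estimate
    z = r / np.sqrt(s2 * (1.0 - np.diag(H)))      # internally studentized residuals
    tail = 2 * stats.norm.sf(np.abs(z))           # two-sided tail probability
    return (1.0 - tail) ** n                      # accounts for the dataset size n
```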

5.
The two-parameter Weibull distribution is one of the most widely applied probability distributions, particularly in reliability and lifetime modeling. Correct estimation of the shape parameter of the Weibull distribution plays a central role in these areas of statistical analysis. Many different methods can be used to estimate this parameter, most of which utilize regression. In this paper, we present various regression methods for estimating the Weibull shape parameter and an experimental study comparing their results. The complete list of estimators considered in this study is: ordinary least squares (OLS), weighted least squares (WLS: Bergman, F&T, Lu), non-parametric robust Theil's (Theil) and weighted Theil's (WeTheil), robust Winsorized least squares (WinLS), and M-estimators (Huber, Andrew, Tukey, Cauchy, Welsch, Hampel, and Logistic). Estimator performance was compared on bias and mean square error criteria using Monte Carlo simulation. The simulation results demonstrate that for small, complete, non-outlier data sets, the Bergman, F&T, and Lu estimators are more efficient than the others; when the data set contains one or two outliers in the X direction, Theil is the most efficient estimator.
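The baseline OLS estimator on the Weibull probability plot is easy to reproduce. The sketch below uses Bernard's median-rank plotting positions, one common convention among the several the paper compares; it follows from ln(-ln(1-F(x))) = β ln x - β ln η, so the slope of the plot is the shape β.

```python
import numpy as np

def weibull_shape_wpp(x):
    """OLS estimate of the Weibull shape (and scale) from a complete sample
    via the Weibull probability plot with Bernard's median ranks."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)   # median-rank plotting positions
    u = np.log(x)                                  # ln(x)
    v = np.log(-np.log(1.0 - F))                   # ln(-ln(1-F))
    beta, intercept = np.polyfit(u, v, 1)          # slope is the shape estimate
    eta = np.exp(-intercept / beta)                # scale parameter
    return beta, eta

rng = np.random.default_rng(0)
sample = rng.weibull(2.5, size=30) * 100.0         # true shape 2.5, scale 100
print(weibull_shape_wpp(sample))
```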

6.
Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm is proposed that integrates efficient clustering procedures with CCP approaches for computing classifiers on large datasets. The key idea is to identify high-density regions, or clusters, from the individual class-conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation depends heavily on second-order statistics; the formulation, and such moment-based relaxations in general, are therefore susceptible to moment estimation errors. One contribution of the paper is to propose several formulations that are robust to such errors; in particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either confidence set is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with the true moments, even when the moment estimates are erroneous. The results also illustrate the benefits of the proposed methodology for robust classification of large-scale datasets.
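Under the multivariate Chebyshev bound, requiring each cluster to be classified correctly with probability at least η yields second-order cone constraints with safety factor κ = sqrt(η / (1 - η)). Below is a minimal cvxpy sketch of the nominal (non-robust) formulation with hypothetical cluster moments; the paper's confidence-set robustifications are not reproduced.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
mu_pos = [rng.normal(2.0, 0.5, 2) for _ in range(2)]    # hypothetical cluster means (+)
mu_neg = [rng.normal(-2.0, 0.5, 2) for _ in range(2)]   # hypothetical cluster means (-)
Sigma_sqrt = 0.8 * np.eye(2)          # common cluster covariance square root
eta = 0.9                             # required per-cluster classification rate
kappa = np.sqrt(eta / (1.0 - eta))    # multivariate Chebyshev safety factor

w, b = cp.Variable(2), cp.Variable()
cons = [mu @ w + b >= 1 + kappa * cp.norm(Sigma_sqrt @ w, 2) for mu in mu_pos]
cons += [-(mu @ w + b) >= 1 + kappa * cp.norm(Sigma_sqrt @ w, 2) for mu in mu_neg]
prob = cp.Problem(cp.Minimize(cp.norm(w, 2)), cons)     # max-margin style objective
prob.solve()
print(np.round(w.value, 3), round(float(b.value), 3))
```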

7.
Discussion     
Quantitative high-throughput screening (qHTS) assays use cells or tissues to screen thousands of compounds in a short period of time. Data generated from qHTS assays are evaluated using nonlinear regression models, such as the Hill model, and decisions regarding toxicity are made from the estimates of the model parameters. For any given compound, the variability in the observed response may either be constant across dose groups (homoscedasticity) or vary with dose (heteroscedasticity). Since thousands of compounds are evaluated simultaneously in a qHTS assay, it is not practically feasible for an investigator to perform residual analysis to determine the variance structure before performing statistical inference on each compound. Because the variance structure is well known to play an important role in the analysis of linear and nonlinear regression models, it is important to have practically useful, easy-to-interpret methodology that is robust to the variance structure. Furthermore, given the number of chemicals investigated in a qHTS assay, outliers and influential observations are not uncommon. In this article, we describe preliminary test estimation (PTE)-based methodology that is robust to the variance structure as well as to potential outliers and influential observations. Performance of the proposed methodology is evaluated in terms of false discovery rate (FDR) and power in a simulation study mimicking real qHTS data. Of the two methods currently in use, our simulation studies suggest that one is extremely conservative, with very small power in comparison with the proposed PTE-based method, whereas the other is very liberal; in contrast, the proposed PTE-based methodology achieves better control of the FDR while maintaining good power. The methodology is illustrated using a dataset obtained from the National Toxicology Program (NTP). Additional information, simulation results, data, and computer code are available online as supplementary materials.
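A heavily simplified sketch of the preliminary-test idea: test the variance structure first, then branch between an unweighted and a variance-weighted Hill fit. The Levene test, the Hill parameterization, and the group-wise weights are illustrative choices standing in for the article's exact PTE procedure; dose groups are assumed to have replicates.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def hill(x, e0, emax, ec50, h):
    """Standard Hill dose-response curve."""
    return e0 + emax * x**h / (ec50**h + x**h)

def pte_hill_fit(dose, resp, alpha=0.05):
    doses = np.unique(dose)
    groups = [resp[dose == d] for d in doses]
    _, pval = stats.levene(*groups)               # preliminary variance-structure test
    p0 = [resp.min(), resp.max() - resp.min(), np.median(dose), 1.0]
    if pval < alpha:                              # heteroscedastic branch: weight by group SD
        sd = np.empty_like(resp, dtype=float)
        for d, g in zip(doses, groups):
            sd[dose == d] = g.std(ddof=1)
        popt, _ = curve_fit(hill, dose, resp, p0=p0, sigma=sd)
    else:                                         # homoscedastic branch: plain least squares
        popt, _ = curve_fit(hill, dose, resp, p0=p0)
    return popt, pval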

8.
The objective of this paper is to propose a cluster analysis methodology for measuring the performance of research activities in terms of productivity, visibility, quality, prestige, and international collaboration. The proposed methodology is based on bibliometric techniques and permits a robust multi-dimensional cluster analysis at different levels. The main goal is to form clusters that maximize within-cluster homogeneity and between-cluster heterogeneity. The methodology has been applied to the Spanish public universities and their academic staff in the computer science area. Results show that Spanish public universities fall into four different clusters, whereas academic staff fall into six. Each cluster characterizes the research activity of the universities or academic staff in it, identifying both strengths and weaknesses. The resulting clusters could have implications for research policy: proposing collaborations and alliances among universities, supporting institutions in strategic planning, and verifying the effectiveness of research policies, among others.
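A minimal sketch of the clustering step on a hypothetical indicator matrix; the indicator set, the standardization, and plain k-means are assumptions standing in for the paper's robust multi-dimensional procedure (k = 4 echoes the reported university-level result).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-university bibliometric indicators:
# [papers per staff, cites per paper, % Q1 journals, h-index, % intl. co-authored]
rng = np.random.default_rng(0)
indicators = rng.uniform(0, 1, size=(48, 5))       # 48 universities, 5 dimensions

Xz = StandardScaler().fit_transform(indicators)    # put dimensions on comparable scales
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Xz)
labels = km.labels_                                # cluster membership per university
profiles = km.cluster_centers_                     # standardized strengths/weaknesses
```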

9.
This paper develops a methodology for robust Bayesian inference through the use of disparities. Metrics such as the Hellinger distance and the negative exponential disparity have a long history in robust frequentist estimation. We demonstrate that an equivalent robustification can be made in Bayesian inference by substituting an appropriately scaled disparity for the log likelihood, to which standard Markov chain Monte Carlo (MCMC) methods may then be applied. A particularly appealing property of minimum-disparity methods is that, while they yield robustness with a breakdown point of 1/2, the resulting parameter estimates are also efficient when the posited probabilistic model is correct; we demonstrate that a similar property holds for disparity-based Bayesian inference. We further show that, in the Bayesian setting, these methods can be extended to robustify regression models, random-effects distributions, and other hierarchical models. Such models require integrating out a random effect, which is achieved via MCMC but would otherwise be numerically challenging. The methods are demonstrated on real-world data.
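The substitution at the heart of the method, sketched for a normal location model: replace the log likelihood with a scaled negative squared Hellinger disparity between a kernel density estimate and the model density, then run random-walk Metropolis on the result. The grid, the KDE bandwidth, the proposal scale, and the n·H² scaling are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 8.0)])  # 5% gross outliers

grid = np.linspace(-6.0, 12.0, 600)
gn = stats.gaussian_kde(data)(grid)               # kernel density estimate of the data

def disparity_loglik(mu, n=data.size):
    # Scaled negative squared Hellinger distance between N(mu, 1) and the KDE
    f = stats.norm.pdf(grid, mu, 1.0)
    h2 = 2.0 - 2.0 * trapezoid(np.sqrt(f * gn), grid)
    return -n * h2

mu, cur, chain = 0.0, disparity_loglik(0.0), []
for _ in range(5000):                             # random-walk Metropolis, flat prior
    prop = mu + rng.normal(0.0, 0.3)
    new = disparity_loglik(prop)
    if np.log(rng.uniform()) < new - cur:
        mu, cur = prop, new
    chain.append(mu)
print(np.mean(chain[1000:]))                      # stays near 0 despite the outliers
```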

10.
The developments in linear regression methodology that have taken place during the 25-year history of Technometrics are summarized. Major topics covered are variable selection, biased estimation, robust estimation, and regression diagnostics.

11.
The Weibull shape parameter is important in reliability estimation because it characterizes the ageing property of the system; hence, it must be estimated accurately. This paper presents a study of the efficiency of robust regression methods relative to the ordinary least-squares regression method based on the Weibull probability plot, with emphasis on estimating the shape parameter of the two-parameter Weibull distribution. Both small data sets with outliers and multiply censored data sets are considered, and maximum-likelihood estimation is also compared with the linear regression methods. Simulation results show that robust regression is an effective method for reducing bias and that it performs well in most cases.
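One of the robust alternatives is especially compact to sketch: a Theil-Sen fit on the same Weibull-probability-plot coordinates used for OLS (the paper also studies maximum likelihood and other robust regressions; the median-rank positions here are one common convention).

```python
import numpy as np
from scipy.stats import theilslopes

def weibull_shape_theil(x):
    """Robust shape estimate: Theil-Sen slope on the Weibull probability
    plot, resistant to a few outlying observations."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)    # median-rank plotting positions
    u, v = np.log(x), np.log(-np.log(1.0 - F))
    slope, intercept, _, _ = theilslopes(v, u)     # median of pairwise slopes
    return slope                                    # robust shape estimate

rng = np.random.default_rng(1)
sample = rng.weibull(1.8, 20) * 50.0               # true shape 1.8
sample[0] = 0.01                                   # one outlier in the X direction
print(weibull_shape_theil(sample))
```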

12.
Automatic recognition of the extent of a point cluster using Voronoi diagrams
A dynamic threshold is used to "erode" the Delaunay triangulation of a point cluster, yielding polygons that delimit the cluster's extent at different visual proximity distances. A series of virtual boundary points is then constructed outside the polygon boundary, using two key parameters: the extension distance and the extension direction. The virtual boundary points and the original points together form a new point set, and constructing the Voronoi diagram of this new set determines the exact extent of the Voronoi regions of the boundary points. Finally, the optimized allocation of educational resources distributed as point clusters is used as an example to illustrate the application of the method.
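The erosion step can be sketched with scipy's Delaunay triangulation. A simple mean-plus-k·std rule stands in for the paper's dynamic threshold, and the virtual-boundary-point and Voronoi stages are omitted.

```python
import numpy as np
from scipy.spatial import Delaunay

def erode_delaunay(pts, k=1.5):
    """'Erode' the Delaunay triangulation of a 2-D point cluster by dropping
    edges longer than a threshold; the surviving edges bound the cluster's
    extent at that proximity level. A simplified sketch of the first stage."""
    tri = Delaunay(pts)
    edges = set()
    for s in tri.simplices:                        # collect unique triangulation edges
        for i in range(3):
            a, b = sorted((int(s[i]), int(s[(i + 1) % 3])))
            edges.add((a, b))
    edge_list = sorted(edges)
    lengths = np.array([np.linalg.norm(pts[a] - pts[b]) for a, b in edge_list])
    thr = lengths.mean() + k * lengths.std()       # stand-in for the dynamic threshold
    return [e for e, d in zip(edge_list, lengths) if d <= thr]
```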

13.
We consider change-point detection and estimation in sequences of functional observations. This setting often arises when the quality of a process is characterized by such observations, called profiles, and monitoring profiles for changes in structure can be used to ensure the stability of the process over time. While interest in phase II profile monitoring has grown, few methods approach the problem from a Bayesian perspective. We propose a wavelet-based Bayesian methodology that bases inference on the posterior distribution of the change point without placing restrictive assumptions on the form of profiles. By obtaining an analytic form of this posterior distribution, we allow the proposed method to run online without using Markov chain Monte Carlo (MCMC) approximation. Wavelets, an effective tool for estimating nonlinear signals from noise-contaminated observations, enable us to flexibly distinguish between sustained changes in profiles and the inherent variability of the process. We analyze observed profiles in the wavelet domain and consider two possible prior distributions for coefficients corresponding to the unknown change in the sequence. These priors, previously applied in the nonparametric regression setting, yield tuning-free choices of hyperparameters. We present additional considerations for controlling computational complexity over time and their effects on performance. The proposed method significantly outperforms a relevant frequentist competitor on simulated data.
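A stripped-down sketch of a wavelet-domain change-point posterior: each profile is summarized by one detail-band energy, a Gaussian mean shift with prior N(0, v²) after the unknown change point is integrated out analytically, and a flat prior on the change location is assumed. The summary statistic, the baseline estimates, and the single-shift model are simplifications of the paper's per-coefficient priors.

```python
import numpy as np
import pywt

def changepoint_posterior(profiles, wavelet="db4", level=2, v=1.0):
    # Summarize each profile by the energy of its finest wavelet detail band
    # (assumes profiles are long enough for the chosen decomposition level)
    x = np.array([np.sum(pywt.wavedec(p, wavelet, level=level)[-1] ** 2)
                  for p in profiles])
    T = x.size
    mu = x[: T // 4].mean()                        # baseline level (assumes a stable start)
    s2 = x[: T // 4].var(ddof=1)                   # baseline noise variance
    z = x - mu
    logpost = np.full(T, -np.inf)                  # logpost[tau]: change right after tau
    for tau in range(2, T - 2):
        pre = z[:tau] @ z[:tau] / s2               # pre-change segment, no shift
        zt = z[tau:]
        m = zt.size
        # Shift delta ~ N(0, v^2) integrated out (Sherman-Morrison on s2*I + v2*11')
        quad = (zt @ zt - (v**2 / (s2 + m * v**2)) * zt.sum() ** 2) / s2
        logpost[tau] = -0.5 * (pre + quad + np.log(1.0 + m * v**2 / s2))
    logpost -= logpost.max()
    post = np.exp(logpost)
    return post / post.sum()                       # posterior over the change location
```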

14.
Active sonar typically estimates target bearing using the split-beam, dual-channel cross-spectrum method. During signal processing, however, the target echo pulse length often does not match the length of the signal processing frame, or the signal-to-noise ratio is not uniform across the frame, so algorithms that derive the target's course from instantaneous bearing estimates can have large errors, which hampers target parameter estimation and classification. Processing the instantaneous bearing estimates with robust statistics can effectively reduce the error of the estimated target bearing course. This paper analyzes the Huber estimator, proposes an improved Huber estimator, and gives an algorithm for its engineering implementation; experiments verify that this robust statistic converges stably and reduces the error of the target bearing-course estimate.
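The basic Huber location step is easy to sketch for a frame of instantaneous bearing estimates. The tuning constant c = 1.345 and the MAD scale are conventional choices, the paper's improved Huber variant is not reproduced, and bearing wrap-around near 0/360 degrees is ignored for simplicity.

```python
import numpy as np

def huber_bearing(theta, c=1.345, n_iter=30):
    """Huber M-estimate of bearing from a frame of instantaneous bearing
    measurements (degrees): small residuals count quadratically, large ones
    linearly, so low-SNR portions of the frame are downweighted."""
    mu = np.median(theta)                             # robust starting point
    for _ in range(n_iter):
        r = theta - mu
        s = 1.4826 * np.median(np.abs(r)) + 1e-12     # MAD scale
        w = np.where(np.abs(r) <= c * s, 1.0, c * s / np.abs(r))
        mu_new = np.sum(w * theta) / np.sum(w)
        if abs(mu_new - mu) < 1e-9:
            break
        mu = mu_new
    return mu
```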

15.
Use of Hotelling's T2 charts with high-breakdown robust estimates to monitor multivariate individual observations is a recent trend in control chart methodology. Vargas (J. Qual. Tech. 2003; 35: 367-376) introduced Hotelling's T2 charts based on the minimum volume ellipsoid (MVE) and the minimum covariance determinant (MCD) estimates to identify outliers in Phase I data. Studies carried out by Jensen et al. (Qual. Rel. Eng. Int. 2007; 23: 615-629) indicated that the performance of these charts depends heavily on the sample size, the number of outliers, and the dimensionality of the Phase I data. Chenouri et al. (J. Qual. Tech. 2009; 41: 259-271) recently proposed robust Hotelling's T2 control charts for monitoring Phase II data based on the reweighted MCD (RMCD) estimates of the mean vector and covariance matrix from Phase I. They showed that Phase II RMCD charts perform better than Phase II standard Hotelling's T2 charts based on outlier-free Phase I data, where the outlier-free Phase I data were obtained by applying MCD and MVE T2 charts to historical data. The reweighted MVE (RMVE) and S-estimators are two competitors of the RMCD estimators, and it is natural to ask whether Phase II Hotelling's T2 charts with RMCD and RMVE estimates exhibit the pattern observed by Jensen et al. for MCD- and MVE-based Phase I Hotelling's T2 charts. In this paper, we conduct a comparative study of Hotelling's T2 charts with RMCD, RMVE, and S-estimators, using a large number of Monte Carlo simulations over different data scenarios. Our results are generally in favor of the RMCD-based charts irrespective of sample size, outliers, and dimensionality of the Phase I data.
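A minimal sketch of the RMCD-based T2 idea using scikit-learn's MinCovDet (which reweights the raw MCD fit by default). The chi-square control limit is a common large-sample approximation; the papers cited use limits adjusted for parameter estimation, so treat the threshold here as illustrative.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
phase1 = rng.multivariate_normal([0, 0, 0], np.eye(3), size=100)
phase1[:5] += 6.0                                   # contaminated Phase I rows

mcd = MinCovDet(random_state=0).fit(phase1)         # reweighted MCD location/scatter
inv_S = np.linalg.inv(mcd.covariance_)

def t2(x):
    d = x - mcd.location_
    return d @ inv_S @ d                            # robust Hotelling's T2 statistic

ucl = stats.chi2.ppf(0.995, df=3)                   # large-sample approximate limit
new_obs = rng.multivariate_normal([3, 3, 3], np.eye(3), size=5)  # shifted Phase II data
print([bool(t2(x) > ucl) for x in new_obs])         # signals for the shifted observations
```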

16.
Chen L, Sze SK, Yang H. Analytical Chemistry 2006, 78(14): 5006-5018
This paper describes a new automated intensity descent algorithm for analysis of complex high-resolution mass spectra. The algorithm has been successfully applied to interpret Fourier transform mass spectra of proteins; however, it should be generally applicable to complex high-resolution mass spectra of large molecules recorded by other instruments. The algorithm locates all possible isotopic clusters by a novel peak selection method and a robust cluster subtraction technique according to the order of descending peak intensity after global noise level estimation and baseline correction. The peak selection method speeds up charge state determination and isotopic cluster identification. A Lorentzian-based peak subtraction technique resolves overlapping clusters in high peak density regions. A noise flag value is introduced to minimize false positive isotopic clusters. Moreover, correlation coefficients and matching errors between the identified isotopic multiplets and the averagine isotopic abundance distribution are the criteria for real isotopic clusters. The best fitted averagine isotopic abundance distribution of each isotopic cluster determines the charge state and the monoisotopic mass. Three high-resolution mass spectra were interpreted by the program. The results show that the algorithm is fast in computational speed, robust in identification of overlapping clusters, and efficient in minimization of false positives. In approximately 2 min, the program identified 611 isotopic clusters for a plasma ECD spectrum of carbonic anhydrase. Among them, 50 new identified isotopic clusters, which were missed previously by other methods, have been discovered in the high peak density regions or as weak clusters by this algorithm. As a result, 18 additional new bond cleavages have been identified from the 50 new clusters of carbonic anhydrase.
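A bare-bones sketch of the intensity-descent loop: walk peaks in decreasing intensity, test candidate charge states via 1/z isotopic spacings, and claim matching peaks as a cluster. Noise-level estimation, baseline correction, Lorentzian cluster subtraction, and averagine scoring, which do the real work in the paper, are all omitted; tolerances and the 3-peak acceptance rule are illustrative.

```python
import numpy as np

def intensity_descent(mz, intensity, charges=(1, 2, 3), tol=0.01):
    """Toy intensity-descent isotopic-cluster picker for a centroided spectrum."""
    order = np.argsort(intensity)[::-1]           # most intense peaks first
    free = np.ones(mz.size, dtype=bool)
    clusters = []
    for i in order:
        if not free[i]:
            continue
        best = None
        for z in charges:                         # candidate charge states
            members = {int(i)}
            for step in (-1, 1, 2, 3):            # a few isotopes either side
                target = mz[i] + step / z
                j = int(np.argmin(np.abs(mz - target)))
                if free[j] and abs(mz[j] - target) < tol:
                    members.add(j)
            if best is None or len(members) > len(best[1]):
                best = (z, members)
        if len(best[1]) >= 3:                     # accept only multi-peak clusters
            idx = sorted(best[1])
            free[idx] = False
            clusters.append((best[0], idx))
        else:
            free[i] = False                       # unassignable singleton
    return clusters                               # list of (charge, peak indices)
```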

17.
Near-field plasmonic coupling and local field enhancement in metal nanoarchitectures, such as arrangements of nanoparticle clusters, have applications in many technologies, from medical diagnostics and solar cells to sensors. Although nanoparticle-based cluster assemblies have exhibited signal enhancements in surface-enhanced Raman scattering (SERS) sensors, achieving highly reproducible SERS response with low-cost fabrication methods remains challenging. Here an innovative method is developed for fabricating self-organized clusters of metal nanoparticles on diblock copolymer thin films as SERS-active structures. Monodisperse colloidal gold nanoparticles are attached via a crosslinking reaction to self-organized, chemically functionalized poly(methyl methacrylate) domains on polystyrene-block-poly(methyl methacrylate) templates, yielding nanoparticle clusters with sub-10-nanometer interparticle spacing. Varying the molar concentration of functional chemical groups and crosslinking agent during assembly is found to affect the agglomeration of Au nanoparticles into clusters. Samples with high surface coverage of nanoparticle cluster assemblies yield relative enhancement factors on the order of 10^9 while producing uniform signal enhancements in point-to-point measurements across each sample. The high enhancement factors are associated, in full-wave electromagnetic simulations, with the narrow gaps between nanoparticles assembled in clusters. Reusability for small-molecule detection is also demonstrated. The combination of high signal enhancement and reproducibility is thus achievable with a completely non-lithographic fabrication process, producing high-performance SERS substrates at low cost.

18.
Clustering procedures that allow general covariance structures for the obtained clusters need some constraints on the solutions, and several proposals have been introduced in the literature with this in mind. The TCLUST procedure works with a restriction on the "eigenvalues ratio" of the clusters' scatter matrices. To try to achieve robustness with respect to outliers, the procedure allows a proportion α of the most outlying observations to be trimmed off. The resistance of TCLUST to infinitesimal contamination has already been studied; this paper examines its resistance to larger amounts of contamination through its breakdown behavior. The rather new concept of the restricted breakdown point demonstrates that the TCLUST procedure resists a proportion α of contamination as soon as the data set is sufficiently "well clustered".
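The trimming idea is easiest to see in its simplest spherical form, trimmed k-means: at each iteration the fraction α of points farthest from their nearest centre is discarded before centres are updated. TCLUST itself additionally estimates full scatter matrices under the eigenvalue-ratio constraint, which this sketch omits.

```python
import numpy as np

def trimmed_kmeans(X, k=3, alpha=0.10, n_iter=50, seed=0):
    """Spherical trimmed k-means: a simplified stand-in for TCLUST's
    trimming step (no scatter matrices, no eigenvalue-ratio constraint)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(X.shape[0], k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        nearest, dmin = d.argmin(axis=1), d.min(axis=1)
        keep = dmin <= np.quantile(dmin, 1.0 - alpha)   # trim the alpha farthest points
        new = []
        for j in range(k):
            pts = X[keep & (nearest == j)]
            new.append(pts.mean(axis=0) if len(pts) else centres[j])
        centres = np.asarray(new)
    return centres, nearest, keep                       # keep == False marks trimmed points
```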

19.
A Phase I estimator of dispersion should be efficient under in-control data and robust against contamination. Most estimation methods proposed in the literature are either efficient, or robust against only one kind of contamination: sustained shifts or scattered disturbances. In this article, we propose a new estimation method for the dispersion parameter, based on exponentially weighted moving average (EWMA) charting, that is efficient and robust to both types of unacceptable observations in Phase I. We compare the method with various existing estimation methods and show that it has the best overall performance when it is unknown what types of contamination are present in Phase I. We also study the effect of the robust Phase I estimator on the performance of the Phase II exponentially weighted moving average control chart.
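A rough sketch of the screening idea: smooth the absolute moving ranges with an EWMA, flag stretches where the smoothed dispersion drifts from its robust centre, and estimate σ from the unflagged ranges. The charting constants and the reset rule here are illustrative, not the article's exact procedure.

```python
import numpy as np

def ewma_phase1_sigma(x, lam=0.2, L=3.0):
    """EWMA-screened Phase I dispersion estimate for individual observations
    (a simplified sketch of the EWMA-charting-based estimator)."""
    mr = np.abs(np.diff(x))                       # moving ranges of span 2
    target = np.median(mr)                        # robust centre for the EWMA
    spread = 1.4826 * np.median(np.abs(mr - target))
    z, flags = target, np.zeros(mr.size, dtype=bool)
    for i, m in enumerate(mr):
        z = lam * m + (1 - lam) * z               # EWMA recursion
        sd = spread * np.sqrt(lam / (2 - lam))    # asymptotic EWMA standard deviation
        flags[i] = abs(z - target) > L * sd
        if flags[i]:
            z = target                            # reset after a signal
    return np.mean(mr[~flags]) / 1.128            # d2 = 1.128 for moving ranges of 2
```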

20.
Accident data sets can include unusual data points that are not typical of the rest of the data. The presence of these points (usually termed outliers) can have a significant impact on the estimates of the parameters of safety performance functions (SPFs). Few studies have considered outlier analysis in the development of SPFs, and in those studies the practice has been to identify and then exclude outliers from further analysis. This paper introduces alternative mixture models based on multivariate Poisson lognormal (MVPLN) regression. The proposed approach presents outlier-resistant modeling techniques that provide robust safety inferences by down-weighting the outlying observations rather than rejecting them. The first proposed model is a scale-mixture model obtained by replacing the normal distribution in the Poisson-lognormal hierarchy with the Student t distribution, which has heavier tails. The second is a two-component mixture (contaminated normal model) in which most observations are assumed to come from a basic distribution while the remaining few outliers arise from an alternative distribution with larger variance. The results indicate that the estimates of the extra-Poisson variation parameters were considerably smaller under the mixture models, leading to higher precision, and both mixture models identified the same set of outliers. In terms of goodness of fit, both mixture models outperformed the MVPLN. The outlier-rejecting MVPLN model provided a superior fit in terms of a much smaller DIC and smaller standard deviations for the parameter estimates; however, that approach tends to underestimate uncertainty by producing standard deviations that are too small, which may lead to incorrect conclusions. It is recommended that the proposed outlier-resistant modeling techniques be used unless exclusion of the outlying observations can be justified on data-related grounds (e.g., data collection errors).
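The full MVPLN machinery is beyond a short sketch, but the down-weighting mechanism of the scale-mixture variant is easy to demonstrate on the log scale: a Student-t likelihood pulls the location estimate far less toward outlying sites than a normal likelihood does. The df = 4 choice and the toy data are assumptions for illustration only.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
# Hypothetical log accident rates at 50 sites, two of them grossly outlying
log_rate = np.concatenate([rng.normal(2.0, 0.3, 48), [5.0, 5.5]])

def nll_normal(params):
    mu, log_s = params
    return -stats.norm.logpdf(log_rate, loc=mu, scale=np.exp(log_s)).sum()

def nll_student_t(params, df=4.0):
    mu, log_s = params
    return -stats.t.logpdf(log_rate, df, loc=mu, scale=np.exp(log_s)).sum()

fit_n = optimize.minimize(nll_normal, x0=[2.0, -1.0])
fit_t = optimize.minimize(nll_student_t, x0=[2.0, -1.0])
print("normal mu:", round(fit_n.x[0], 3), "  t mu:", round(fit_t.x[0], 3))
# The t fit stays near 2.0; the normal fit is dragged toward the outliers.
```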
