Similar Literature
Found 20 similar documents.
1.
Methods specifically targeting missing values in a wide spectrum of statistical analyses are now part of serious statistical thinking, owing to many advances in computational statistics and increased awareness among sophisticated consumers of statistics. Despite many advances in both theory and applied methods for missing data, missing-data methods for multilevel applications remain comparatively underdeveloped. In this paper, I consider a popular inferential tool, multiple imputation, in multilevel applications with missing values. I specifically consider missing values occurring arbitrarily at any level of observational units. I use Bayesian arguments for drawing multiple imputations from the underlying (posterior) predictive distribution of missing data. Multivariate extensions of well-known mixed-effects models form the basis for simulating the posterior predictive distribution, and hence for creating the multiple imputations. These topics are demonstrated in an application assessing correlates of unmet need for mental health care among children with special health care needs.
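To make the posterior-predictive idea concrete, here is a minimal sketch of multiple imputation for a single variable under a simple normal model with a noninformative prior; the paper's actual method uses multivariate mixed-effects models, which this toy example does not attempt.

```python
# Multiple imputation by drawing missing values from the posterior
# predictive distribution of a univariate normal model (flat prior).
import numpy as np

rng = np.random.default_rng(0)

def multiply_impute(y, n_imputations=5):
    """Draw imputations for NaN entries of y from the posterior predictive
    of a normal model with a noninformative prior on (mu, sigma^2)."""
    obs = y[~np.isnan(y)]
    n = len(obs)
    completed = []
    for _ in range(n_imputations):
        # Draw sigma^2 from its scaled inverse-chi-square posterior,
        # then mu | sigma^2, then the missing values | (mu, sigma^2).
        sigma2 = (n - 1) * obs.var(ddof=1) / rng.chisquare(n - 1)
        mu = rng.normal(obs.mean(), np.sqrt(sigma2 / n))
        filled = y.copy()
        miss = np.isnan(filled)
        filled[miss] = rng.normal(mu, np.sqrt(sigma2), miss.sum())
        completed.append(filled)
    return completed

y = np.array([1.2, np.nan, 0.7, 2.1, np.nan, 1.5])
imputations = multiply_impute(y)  # analyse each, then pool (Rubin's rules)
```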

2.
Knowledge discovery in ergonomics is complicated by the presence of missing data, because most methodologies do not tolerate incomplete sample instances. Data miners cannot always simply remove incomplete sample instances. Imputation methods are needed to ‘fill in’ estimated values for the missing entries in order to construct a complete dataset. Even with emerging methodologies, the ergonomics field seems to rely on outdated imputation techniques. This survey presents an overview of a variety of imputation methods found in current academic research, not limited to ergonomic studies. The objective is to strengthen the community's understanding of imputation methodologies and briefly highlight their benefits and limitations. This survey suggests that multiple imputation is the current state-of-the-art missing value technique. The method has proven robust to many of the shortcomings that plague other methods and should be considered the primary choice for missing value problems found in ergonomic studies.
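As one widely available route to multiple imputation, the sketch below uses scikit-learn's IterativeImputer (chained equations); drawing several completed datasets with sample_posterior=True approximates proper multiple imputation, and the data here are purely illustrative.

```python
# Multiple-imputation-style filling with scikit-learn's IterativeImputer.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[7.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [10.0, 5.0, np.nan],
              [8.0, 8.0, 9.0]])

# Each random_state yields one plausible completed dataset; analyses are
# run on each completed dataset and the results pooled afterwards.
completed = [
    IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
    for s in range(5)
]
```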

3.
This paper examines the impact of differential item functioning (DIF), missing item values, and different methods for handling missing item values on theta estimates, with data simulated from the partial credit model and Andrich's rating scale model. Both Rasch family models are commonly used when estimating a respondent's attitude. The degree of missing data, the DIF magnitude, and the percentage of DIF items were varied in MCAR data conditions in which the focal group was 10% of the total population. Four methods for handling missing data were compared: complete-case analysis, mean substitution, hot-decking, and multiple imputation. Bias, RMSE, means, and standard errors of the theta estimates for the focal group were adversely affected by the number and magnitude of DIF items. RMSE and fidelity coefficients for both the reference and focal groups were adversely impacted by the amount of missing data. While all methods of handling missing data performed fairly similarly, multiple imputation and hot-decking showed slightly better performance.
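For reference, here are minimal sketches of two of the simpler treatments compared in the study, mean substitution and random hot-deck imputation; the data layout (rows = respondents, columns = items) is an assumption.

```python
# Mean substitution and random hot-deck imputation for a numeric matrix.
import numpy as np

rng = np.random.default_rng(1)

def mean_substitution(X):
    X = X.copy()
    col_means = np.nanmean(X, axis=0)          # per-item means
    idx = np.where(np.isnan(X))
    X[idx] = np.take(col_means, idx[1])        # fill each NaN with its column mean
    return X

def hot_deck(X):
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        donors = col[~np.isnan(col)]           # observed respondents on item j
        miss = np.isnan(col)
        col[miss] = rng.choice(donors, size=miss.sum())
    return X
```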

4.
In production data, missing values commonly appear for several reasons, including changes in measurement and inspection items, sampling inspections, and unexpected process events. When applied to product failure prediction, the incompleteness of the data should be properly addressed to avoid performance degradation in prediction models. Well-known approaches to missing data treatment, such as elimination and imputation, do not perform well under scenarios that are usual in production data, including high missing rates, systematic missingness and class imbalance. To address these limitations, we present a method for predictive modelling with missing data that considers the characteristics of production data. It builds multiple prediction models on different complete data subsets derived from the original dataset, each of which has a different coverage of instances and input variables. These models are selectively used to make predictions for new instances with missing values. We demonstrate the effectiveness of the proposed method through a case study using actual datasets from a home appliance manufacturer.
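A hedged sketch of this idea follows: one classifier is trained per complete-data subset, each defined by a set of fully observed input variables, and a new instance is scored by the largest sub-model whose variables it has observed. The candidate variable sets, the logistic regression learner, and the selection rule are illustrative assumptions, not the paper's exact design.

```python
# Train one model per complete-data subset; select by observed variables.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_submodels(X, y, variable_sets):
    """variable_sets: candidate column subsets, e.g. [(0, 1), (0, 2)]."""
    models = {}
    for cols in variable_sets:
        rows = ~np.isnan(X[:, cols]).any(axis=1)   # instances complete on cols
        if rows.sum() > 1 and len(np.unique(y[rows])) > 1:
            models[cols] = LogisticRegression().fit(X[np.ix_(rows, cols)], y[rows])
    return models

def predict_with_missing(models, x):
    # Prefer the largest variable set that is fully observed for x.
    for cols in sorted(models, key=len, reverse=True):
        if not np.isnan(x[list(cols)]).any():
            return models[cols].predict(x[list(cols)].reshape(1, -1))[0]
    raise ValueError("no applicable sub-model")
```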

5.
Based on a two-dimensional relational table, this paper studies the missing values in sample data on land prices in Shunde District of Foshan City. GeoDa software was used to eliminate insignificant factors by stepwise regression analysis; NORM software was adopted to construct the multiple imputation models; the EM algorithm and the data augmentation algorithm were applied to fit multiple linear regression equations and construct five different imputed datasets. Statistical analysis was performed on the imputed datasets to calculate the mean and variance of each, and weights were determined according to the differences. Finally, comprehensive integration was implemented to obtain the imputation expression for the missing values. The results concern three missing-data scenarios: the PRICE variable missing at a 5% deletion rate, the PRICE variable missing at a 10% deletion rate, and both the PRICE and CBD variables missing. In these scenarios, the new method produced estimates closer to the true values than the traditional multiple imputation methods in 75% vs. 25%, 62.5% vs. 37.5%, and 100% vs. 0% of comparisons, respectively. The new method is therefore clearly better than the traditional multiple imputation methods, and the missing values it estimates have reference value.
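The paper determines the combination weights from differences in the means and variances of the imputed datasets; the sketch below substitutes a simple inverse-variance weighting as an illustrative stand-in.

```python
# Combine several imputed versions of one variable with precision weights.
import numpy as np

def combine_imputations(imputed_columns):
    """imputed_columns: (m, n) array, m imputed versions of one variable.
    Weight each version by the inverse of its variance and average; this is
    an assumed scheme, not the paper's exact weighting."""
    v = imputed_columns.var(axis=1)            # per-version variance (assume > 0)
    w = (1.0 / v) / (1.0 / v).sum()            # normalised inverse-variance weights
    return w @ imputed_columns                 # combined imputation expression
```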

6.
Surveys are mainly conducted to obtain valuable information on some criteria from a specified population, but the results often become biased due to non-response of the subjects under study on highly significant attributes. Such non-ignorable missingness needs to be treated and the actual values retrieved. Many methods have been proposed for handling missing values in either discrete or continuous attributes, but a large gap remains in handling non-ignorable missing values in datasets with mixed attributes. To address this gap, this paper proposes a methodology called the Bayesian Genetic Algorithm (BAGEL), which hybridizes Bayesian and genetic algorithm principles. In BAGEL, the initial population is generated using a Bayesian model and the fitness values of the chromosomes are evaluated using Bayesian principles. BAGEL is implemented on real datasets to impute both discrete and continuous missing values, and the imputation accuracy is observed. The experimental results show the superior performance of BAGEL over other standard imputation techniques, and statistical tests validating the experimental results confirm that BAGEL outperforms them at all missing rates from 5% to 50%.
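The following is a deliberately rough sketch of a GA loop for imputation in the spirit of BAGEL, for one numeric column only: chromosomes are candidate fills for the missing entries and fitness is a Gaussian log-likelihood of the completed column. The paper's Bayesian initialisation and fitness are far more elaborate; everything here is an illustrative assumption.

```python
# A toy genetic-algorithm imputation loop for a single numeric column.
import numpy as np

rng = np.random.default_rng(2)

def ga_impute(col, pop_size=30, generations=50):
    obs = col[~np.isnan(col)]
    n_miss = np.isnan(col).sum()
    # Initial population sampled from the observed values (a crude stand-in
    # for BAGEL's Bayesian initialisation).
    pop = rng.choice(obs, size=(pop_size, n_miss))

    def fitness(fill):
        # Gaussian log-likelihood of the completed column (up to constants).
        full = np.concatenate([obs, fill])
        mu, sd = full.mean(), full.std() + 1e-9
        return -0.5 * np.sum(((full - mu) / sd) ** 2) - len(full) * np.log(sd)

    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]     # selection
        children = parents + rng.normal(0, obs.std() * 0.1,    # mutation
                                        parents.shape)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(p) for p in pop])]
    out = col.copy()
    out[np.isnan(out)] = best
    return out
```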

7.
The effect of methods for handling missing values on the performance of Phase I multivariate control charts has not been investigated. In this paper, we discuss the effect of four imputation methods: mean substitution, regression, stochastic regression and the expectation maximization algorithm. Estimates of the mean vector and variance-covariance matrix from the treated data set are used to estimate the unknown parameters in the Hotelling's T2 chart statistic. Based on a Monte Carlo simulation study, the performance of each of the four methods is investigated in terms of its ability to attain the nominal in-control and out-of-control overall probability of a signal. We consider three sample sizes, five levels of the percentage of missing values and three numbers of variables. Our simulation results show that the stochastic regression method has the best overall performance among all the competing methods.
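For concreteness, this is a minimal sketch of the Hotelling T2 statistic computed from a treated (imputed) dataset, with the parameters estimated from the completed data; the control limit and the imputation step itself are omitted.

```python
# Hotelling T^2 statistics for each observation, with parameters estimated
# from the (imputed) data matrix X of shape (n, p).
import numpy as np

def hotelling_t2(X):
    xbar = X.mean(axis=0)                        # estimated mean vector
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance
    diff = X - xbar
    # T2_i = (x_i - xbar)' S^{-1} (x_i - xbar) for every row i.
    return np.einsum("ij,jk,ik->i", diff, S_inv, diff)
```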

8.
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include the annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among the disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference, which leads to under- or overestimation of the standard error and consequently may produce erroneous inferences. Having established the adverse consequences of ignoring serial dependency, the paper then describes rigorous statistical techniques for overcoming them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating accident occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches, are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research.
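As a small illustration of a dedicated time series technique, the sketch below fits an ARIMA model to a short, made-up series of annual accident counts with statsmodels; the data and the model order are assumptions for demonstration only.

```python
# Fitting an ARIMA model that respects serial dependency, instead of an
# ordinary regression that ignores it.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

counts = np.array([820, 790, 805, 760, 742, 730, 705, 690, 668, 655],
                  dtype=float)                      # illustrative annual counts

model = ARIMA(counts, order=(1, 1, 0))              # AR(1) on first differences
fit = model.fit()
forecast = fit.forecast(steps=3)                    # near-future development
```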

9.
The class of exponential smoothing models which vary their parameter values to adapt to changing conditions in a time series is referred to as adaptive forecasting techniques. In this article criteria for evaluating forecasting models are presented and the features of a simple exponential smoothing model that are exploited by the adaptive techniques are discussed. Several adaptive forecasting schemes are described and classified, and examples of the performance of these techniques are presented.
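One classical adaptive scheme of this kind is the Trigg-Leach tracking-signal approach, sketched below under the assumption of a simple level-only model: the smoothing constant is reset each period to the absolute value of the tracking signal (smoothed error divided by smoothed absolute error).

```python
# Trigg-Leach adaptive exponential smoothing for a level-only series.
import numpy as np

def trigg_leach(y, phi=0.2):
    e_s, a_s = 0.0, 1e-9            # smoothed error and smoothed |error|
    level = y[0]
    forecasts = [level]
    for obs in y[1:]:
        err = obs - level
        e_s = phi * err + (1 - phi) * e_s
        a_s = phi * abs(err) + (1 - phi) * a_s
        alpha = min(abs(e_s / a_s), 1.0)   # adaptive smoothing constant
        level = level + alpha * err        # update the smoothed level
        forecasts.append(level)
    return np.array(forecasts)
```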

10.
Many methods are available for air quality forecasting based on statistical and back-trajectory models, which require past time series data. Model-based prediction of future air quality is the best tool for rational decision-making by policy makers. Limited work has been done on air quality forecasting using dispersion models, which require good meteorological boundary conditions. The Weather Research and Forecasting (WRF) model and the American Meteorological Society/Environmental Protection Agency Regulatory Model (AERMOD) have not previously been combined for air quality forecasting. Here, a case study has been carried out to forecast air quality using onsite meteorological data from the WRF model and the AERMOD dispersion model. Prior to the use of AERMOD, a comprehensive emission inventory was prepared for all sources in the study region, Chembur, in the city of Mumbai. Chembur has been designated an “air pollution control region” by the local authority due to high levels of air pollution caused by the presence of four major industries and six major roads, in addition to a crematorium and a biomedical waste incineration facility. The WRF-AERMOD system was applied to predict concentration levels of the pollutants SO2, NOx and PM10. Reasonable agreement was obtained when predicted values were compared with observed data. The results indicate that air quality forecasting can be carried out using AERMOD with forecasted meteorological parameters derived from WRF, without any requirement for past time series air quality data. This kind of forecasting method can be used by policy makers for air quality management of any region.

11.
A multivariate data matrix containing a number of missing values was obtained from a study on the changes in colour and phenolic composition during the ageing of port. Two approaches were taken in the analysis of the data. The first involved the use of multiple imputation (MI) followed by principal components analysis (PCA). The second examined the use of maximum likelihood principal component analysis (MLPCA). The use of multiple imputation allows missing value uncertainty to be incorporated into the analysis of the data. Initial estimates of missing values were first calculated using the Expectation Maximization (EM) algorithm, followed by Data Augmentation (DA), in order to generate five imputed data matrices. Each complete data matrix was subsequently analysed by PCA, and the principal component (PC) scores and loadings were averaged to give an estimate of the errors. The first three PCs accounted for 93.3% of the explained variance. Changes to colour and monomeric anthocyanin composition were explained on PC1 (79.63% explained variance), phenolic composition and hue mainly on PC2 (8.61% explained variance), and phenolic composition and the formation of polymeric pigment on PC3 (5.04% explained variance). In MLPCA, estimates of measurement uncertainty are incorporated in the decomposition step, with missing values being assigned large measurement uncertainties. PC scores on the first two PCs after multiple imputation and PCA (MI+PCA) were comparable to maximum likelihood scores on the first two PCs extracted by MLPCA.
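A hedged sketch of the MI+PCA route follows: PCA is run on each imputed matrix and the scores and loadings are averaged across imputations. Sign and rotation alignment between PCA runs is glossed over here, though it matters in practice.

```python
# PCA on each imputed data matrix, then averaging scores and loadings.
import numpy as np
from sklearn.decomposition import PCA

def mi_pca(completed_matrices, n_components=3):
    """completed_matrices: list of imputed versions of the same data matrix."""
    scores, loadings = [], []
    for X in completed_matrices:
        pca = PCA(n_components=n_components)
        scores.append(pca.fit_transform(X))     # PC scores for this imputation
        loadings.append(pca.components_)        # PC loadings for this imputation
    # Spread across imputations reflects missing-value uncertainty.
    return np.mean(scores, axis=0), np.mean(loadings, axis=0)
```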

12.
In this study, missing value analysis and homogeneity tests were conducted for 267 precipitation stations throughout Turkey. For this purpose, the monthly and annual total precipitation records from 1968 to 1998 at stations operated by the Turkish State Meteorological Service (DMI) were considered. At these stations, the precipitation record for each month was investigated separately, and stations with missing values for too many years were eliminated. Missing values were completed by the Expectation Maximization (EM) method using the precipitation records of the nearest gauging station. In this analysis, 38 stations were eliminated because they had missing values for more than 5 years, 161 stations had no missing values, and missing precipitation values were completed at the remaining 68 stations. Annual total precipitation data were then obtained from the monthly values. These data should be hydrologically and statistically reliable for later hydrological, meteorological and climate change modelling and forecasting studies. For this reason, the Standard Normal Homogeneity Test (SNHT), the (Swed-Eisenhart) Runs Test and the Pettitt homogeneity test were applied to the annual total precipitation data at 229 gauging stations from 1968 to 1998. The results of each testing method were evaluated separately at the 95% confidence level and the inhomogeneous years were determined. With the application of the aforementioned methods, inhomogeneity was detected at 50 stations, whose natural structure had deteriorated, and the remaining 179 stations were found to be homogeneous.
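Of the three tests, the Pettitt test is the most compact to illustrate; below is a minimal implementation of its statistic and approximate p-value, assuming a single change point in an annual series.

```python
# Pettitt homogeneity test: U_t = sum of sign(x_j - x_i) over i <= t < j,
# with the change point at the t maximising |U_t|.
import numpy as np

def pettitt(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    signs = np.sign(x[None, :] - x[:, None])      # sign(x_j - x_i)
    U = np.array([signs[: t + 1, t + 1:].sum() for t in range(n - 1)])
    K = np.abs(U).max()
    t_change = int(np.argmax(np.abs(U)))          # index of suspected break
    p = 2.0 * np.exp(-6.0 * K**2 / (n**3 + n**2))  # approximate p-value
    return t_change, K, min(p, 1.0)
```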

13.
Time series analysis methods have been applied to a large number of practical problems, including modeling and forecasting economic time series and process and quality control. One aspect of time series analysis is the use of discrete linear transfer functions to model the interrelationships between input and output time series. This paper is an introduction to the identification, estimation, and diagnostic checking of these models. Some aspects of forecasting with transfer function models are also discussed. A survey of intervention analysis models in which the input series is an indicator variable corresponding to an isolated event thought to influence the output is also given. Familiarity with univariate autoregressive integrated moving average modeling is assumed. Extensions to more general multiple time series analysis methods are also briefly discussed.
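As a small, simplified illustration related to both transfer function and intervention analysis models, the sketch below fits a regression with ARMA errors in which a step indicator for an isolated event enters as the input series; the data are synthetic and the model order is an assumption.

```python
# Intervention-analysis-style model: a step input with ARMA-type errors,
# a simple special case of the transfer function models surveyed here.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=80)) + 5.0 * (np.arange(80) >= 50)  # level shift
step = (np.arange(80) >= 50).astype(float)                        # intervention

fit = SARIMAX(y, exog=step, order=(1, 1, 0)).fit(disp=False)
print(fit.params)   # includes the estimated intervention effect
```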

14.
Economic time series are of two types, stocks and flows, and may be available at different levels of aggregation (for instance, monthly or quarterly). In many situations the economist is interested in forecasting the aggregated observations. The forecast function can then be based either on the disaggregated series or on the aggregated series. Forecasts based on disaggregated data are at least as efficient, in terms of mean squared forecast error, as forecasts based on temporally aggregated observations when the data generating process (DGP) is a known ARIMA process. However, the effect of outliers on both forecast functions is not known. In this paper, we consider the effect of additive and innovation outliers on forecasting aggregated values based on aggregated and disaggregated models when the DGP is a known ARIMA process and the presence of the outliers is ignored. Results for the case where the model is not known and tests are applied for the detection of outliers are derived through simulation.

15.
Forecasting Method for Product Reliability Along with Performance Data
Existing reliability theory is based on prior knowledge of the probability distributions and trends of product performance. This article proposes a method for forecasting product reliability from performance data, without any prior information on probability distributions or trends. By fusing an evaluation indicator with five chaotic forecasting methods, five future runtime values of the performance are predicted from current performance data. Via the bootstrap, many generated runtime values along with the performance data are obtained, and the predicted reliability function of the product runtime can therefore be established. An experimental investigation of rolling bearing friction torque shows that the calculated values agree very well with the measured values.
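A minimal sketch of the bootstrap step is shown below: predicted runtime values are resampled to build an empirical reliability function R(t) = P(runtime > t). The inputs are assumed, and the chaotic forecasting step is omitted.

```python
# Bootstrap an empirical reliability function from predicted runtimes.
import numpy as np

rng = np.random.default_rng(4)

def bootstrap_reliability(runtimes, t_grid, n_boot=1000):
    runtimes = np.asarray(runtimes, dtype=float)
    curves = np.empty((n_boot, len(t_grid)))
    for b in range(n_boot):
        sample = rng.choice(runtimes, size=len(runtimes), replace=True)
        curves[b] = [(sample > t).mean() for t in t_grid]  # R(t) per resample
    return curves.mean(axis=0)        # pointwise estimate of R(t)

rel = bootstrap_reliability([120, 135, 150, 160, 170],
                            t_grid=np.arange(100, 200, 10))
```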

16.
Parameter selection is very important for successful modelling of the input-output relationship in a function approximation model. In this study, a support vector machine (SVM) has been used as a function approximation tool for a price series, and a genetic algorithm (GA) has been utilised to optimise the parameters of the SVM model. Instead of using a single time series, a separate time series for each trading interval has been employed to model each day's price profile, and the SVM parameters of these separate series have been optimised using the GA. The developed model has been applied to two large power systems from the National Electricity Market (NEM) of Australia. The forecasting performance of the proposed model has been compared with a heuristic technique, a linear regression model and other works reported in the literature. The effect of price volatility on the performance of the models has also been analysed. Testing results show that the proposed GA-SVM model has better forecasting ability than the other forecasting techniques.
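Below is a rough sketch of GA-style parameter search for an SVR price model: a small population of (C, gamma, epsilon) triples evolves by truncation selection and multiplicative Gaussian mutation, scored by validation MAPE. This is a simplified stand-in for the paper's GA-SVM procedure, with all settings assumed.

```python
# GA-style search over SVR hyperparameters, scored by validation MAPE.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(5)

def ga_svr(X_tr, y_tr, X_val, y_val, pop_size=10, generations=15):
    pop = rng.uniform([0.1, 1e-3, 1e-3], [100.0, 1.0, 1.0],
                      size=(pop_size, 3))           # (C, gamma, epsilon)

    def mape(params):
        C, gamma, eps = params
        pred = SVR(C=C, gamma=gamma, epsilon=eps).fit(X_tr, y_tr).predict(X_val)
        return np.mean(np.abs((y_val - pred) / y_val))  # assumes y_val != 0

    for _ in range(generations):
        scores = np.array([mape(p) for p in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]   # keep the best
        children = np.abs(parents * rng.normal(1.0, 0.2,     # mutate, stay > 0
                                               parents.shape))
        pop = np.vstack([parents, children])
    return min(pop, key=mape)     # best (C, gamma, epsilon) found
```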

17.
Regression Imputation Using Response Probabilities
For missing data, based on unbiased estimators of the population totals of the target and auxiliary variables over the non-respondents, this paper constructs a regression imputation that uses response probabilities via resampling (replication) techniques; furthermore, a variance estimator of the resulting imputed estimator is obtained, also via resampling (replication) techniques. Extensive simulations show that the regression imputation estimator using response probabilities and its variance estimator have good properties.
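A generic, hedged sketch of the idea follows: a response-probability model estimated from an auxiliary variable supplies inverse-probability weights for the imputation regression fitted to respondents. The logistic response model and the single auxiliary variable are assumptions, and the paper's replication-based variance estimation is omitted.

```python
# Regression imputation weighted by estimated response probabilities.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def rp_regression_impute(x_aux, y):
    """x_aux: auxiliary variable (fully observed); y: target with NaNs."""
    observed = ~np.isnan(y)
    X = x_aux.reshape(-1, 1)
    # Estimate response probabilities from the auxiliary variable.
    p_hat = LogisticRegression().fit(X, observed).predict_proba(X)[:, 1]
    # Fit the imputation regression on respondents, weighted by 1 / p_hat.
    reg = LinearRegression().fit(X[observed], y[observed],
                                 sample_weight=1.0 / p_hat[observed])
    y_imp = y.copy()
    y_imp[~observed] = reg.predict(X[~observed])
    return y_imp
```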

18.
Non-Gaussian dynamic models are proposed to analyse time series of counts. Three models are proposed for responses generated by a Poisson, a negative binomial, and a mixture of Poisson distributions. The parameters of these distributions are allowed to vary dynamically according to state space models. Particle filters, or sequential Monte Carlo methods, are used for inference and forecasting purposes. The performance of the proposed methodology is evaluated by two simulation studies for the Poisson and the negative binomial models. The methodology is illustrated with data consisting of medical contacts of schoolchildren suffering from asthma in England.
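To illustrate the sequential Monte Carlo machinery, here is a minimal bootstrap particle filter for a Poisson count model whose log-rate follows a random walk; the model form and tuning values are illustrative assumptions rather than the paper's specification.

```python
# Bootstrap particle filter for x_t = x_{t-1} + w_t, y_t ~ Poisson(exp(x_t)).
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(6)

def particle_filter(y, n_particles=2000, sigma_w=0.1):
    x = rng.normal(0.0, 1.0, n_particles)          # initial particles
    filtered_rate = []
    for obs in y:
        x = x + rng.normal(0.0, sigma_w, n_particles)  # propagate the state
        w = poisson.pmf(obs, np.exp(x)) + 1e-300       # weight by likelihood
        w = w / w.sum()
        filtered_rate.append(np.sum(w * np.exp(x)))    # E[rate | data so far]
        x = rng.choice(x, size=n_particles, p=w)       # multinomial resampling
    return np.array(filtered_rate)

y = np.array([2, 3, 1, 4, 6, 5, 7, 9, 8, 11])          # illustrative counts
rates = particle_filter(y)
```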

19.
One step-ahead ANFIS time series model for forecasting electricity loads
In the electric industry, electricity load forecasting has become more and more important, because demand quantity is a major determinant of electricity supply strategy. Furthermore, accurate regional load forecasting is one of the principal ways for the electric industry to improve management performance. Recently, time series analysis and statistical methods have been developed for electricity load forecasting. However, past forecasting models have two drawbacks: (1) conventional statistical methods, such as regression models, are unable to deal well with nonlinear relationships, and electricity loads are known to be nonlinear; and (2) the rules generated by conventional statistical methods (i.e., ARIMA) and artificial intelligence technologies (i.e., support vector machines (SVM) and artificial neural networks (ANN)) are not easily comprehensible for policy-makers. For these reasons, this paper proposes a new model, which incorporates the one step-ahead concept into an adaptive-network-based fuzzy inference system (ANFIS) to build a fusion ANFIS model, and enhances electricity load forecasting through an adaptive forecasting equation. The fuzzy if-then rules produced by the fusion ANFIS model are comprehensible to humans, and the adaptive network in the fusion ANFIS model can deal with the nonlinear relationships. This study optimizes the proposed model by the adaptive network and the adaptive forecasting equation to improve electricity load forecasting accuracy. To evaluate forecasting performance, six different models are used as comparison models. The experimental results indicate that the proposed model is superior to the listed models in terms of mean absolute percentage error (MAPE).

20.
Financial forecasting is an important and challenging task for both academic researchers and business practitioners. A recent trend for improving prediction accuracy is to combine individual forecasts using a simple average or a weighted average in which the weight reflects the inverse of the prediction error. In existing combining methods, however, the errors between actual and predicted values are reflected equally in the weights, regardless of their time order within the forecasting horizon. In this paper, we propose a new approach in which the forecasts of Generalized AutoRegressive Conditional Heteroskedastic (GARCH), neural network, and random walk models are combined based on a weight that reflects the inverse of the exponentially weighted moving average of the Mean Absolute Percentage Error (MAPE) of each individual prediction model. The results of an empirical study indicate that the proposed method is more accurate than the GARCH, neural network, and random walk models, as well as combining methods that use the plain MAPE for the weight.
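A minimal sketch of the proposed combination rule follows: each model's weight is the inverse of an exponentially weighted moving average (EWMA) of its absolute percentage errors, so recent errors count more; the smoothing constant lam is an assumed tuning value.

```python
# Combine model forecasts with inverse-EWMA-of-MAPE weights.
import numpy as np

def combine_forecasts(actuals, model_preds, new_preds, lam=0.3):
    """actuals: (T,); model_preds: (m, T) past predictions; new_preds: (m,)."""
    ape = np.abs((actuals - model_preds) / actuals)   # (m, T) percentage errors
    ewma = ape[:, 0]
    for t in range(1, ape.shape[1]):
        ewma = lam * ape[:, t] + (1 - lam) * ewma     # recent errors weigh more
    w = (1.0 / ewma) / (1.0 / ewma).sum()             # inverse-EWMA weights
    return w @ new_preds                              # combined forecast
```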
