Similar literature
 20 similar articles found; search took 31 ms
1.
The use of ordinary least squares estimators (OLS) in regression analysis is widespread. The OLS estimates are, however, very sensitive to the presence of large disturbances. As an alternative to the OLS estimator, the minimum absolute deviation estimator (MAD) is studied. The purpose of this study is, first, to determine the effect of error distributions, with progressively heavier tails starting from the normal distribution and ending with the Cauchy distribution, on the performance of the MAD estimates and the OLS estimates. This provides a framework for when to choose the MAD estimator over the OLS estimator. Second, the effect of some of the other parameters in regression analysis, namely, the unknown parameter vector, the multicollinearity between the independent variables, and the size of the sample on the relative performance of the MAD and OLS estimators is investigated. Some guidelines regarding the choice of the MAD estimator in regression analysis are provided.
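
The sensitivity described in this abstract can be illustrated with a small sketch. It is not the study's simulation design: the data, the single gross outlier, and the function names `ols_fit` and `lad_fit` are assumptions made for illustration. The MAD (least absolute deviations) fit is found by brute force, relying on the fact that for a straight-line model an optimal least-absolute-deviations line passes through at least two data points.

```python
from itertools import combinations


def ols_fit(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lad_fit(xs, ys):
    """Least absolute deviations fit by exhaustive search over point pairs."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid candidate
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(abs(y - intercept - slope * x) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: y = 1 + 2x, with one large disturbance on the last point.
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[-1] += 100.0

print("OLS (intercept, slope):", ols_fit(xs, ys))
print("MAD (intercept, slope):", lad_fit(xs, ys))
```

On this toy data the single outlier pulls the OLS slope well away from 2, while the MAD line stays on the nine undisturbed points. The exhaustive search costs O(n^3) and is only suitable for small samples.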

2.
Handling outliers is one of the primary concerns of today's data mining techniques. The concept of an outlier, its handling, and its diagnosis are context specific and vary according to the field of application. The existence of outliers while mining web data is inevitable by virtue of the unique characteristic features exhibited by a typical web user. As the output of a regression algorithm always differs from the actual value, it challenges knowledge workers and researchers to define what an outlier is in such cases. In this paper, we propose to develop the concept of an outlier with respect to regression analysis of any Web-based dataset. A framework for finding outliers in the output of a regression algorithm is formulated with the help of Ordered Weighted operators. The underlying idea is to find an error rectification value, ϵ, that works in association with the predicted value from the regression model and helps to distinguish an outlier. This also provides a possible range of deviation from the predicted output. A case study on a web dataset shows the usefulness of the proposed approach.

3.
This research examined the use of the International Software Benchmarking Standards Group (ISBSG) repository for estimating effort for software projects in an organization not involved in ISBSG. The study investigates two questions: (1) What are the differences in accuracy between ordinary least-squares (OLS) regression and Analogy-based estimation? (2) Is there a difference in accuracy between estimates derived from the multi-company ISBSG data and estimates derived from company-specific data? Regarding the first question, we found that OLS regression performed as well as Analogy-based estimation when using company-specific data for model building. Using multi-company data the OLS regression model provided significantly more accurate results than Analogy-based predictions. Addressing the second question, we found in general that models based on the company-specific data resulted in significantly more accurate estimates.

4.
Environmental diversity and net primary productivity (NPP) are powerful indicators of local plant species richness (α-diversity). Remote sensing proxies of environmental diversity, such as spectral heterogeneity and NPP, are often used in modelling species richness variability, usually through regression analysis. As multicollinearity may affect analysis of species diversity, the interdependence of such proxies should be a major concern in their use. However, few attempts have been made to examine the interdependence between spectral heterogeneity and NPP proxies such as the Normalized Difference Vegetation Index (NDVI), and most of them used Ordinary Least Squares (OLS) regression or Pearson correlations. We test the possible dependence of Landsat Enhanced Thematic Mapper (ETM+) local spectral heterogeneity on NDVI using quantile regression, rejecting the main assumption of OLS regression, i.e. the symmetry of model residuals. A second-order polynomial function was fitted to the data, and both OLS and quantile regression led to a humped-back relationship between spectral heterogeneity and biomass. Nonetheless, while for most of the quantiles the humped-back curve was significant (with a negative and significant quadratic slope), for quantiles higher than 0.90 the parabola opened up until it reached an almost linear shape, showing that, at very low values of biomass, pixels may show high levels of local heterogeneity. Hence, patterns of spectral heterogeneity versus NDVI are possible when considering maximum potential spectral variability. We show that the investigation of all possible subsets within a scatter plot may lead to identification of patterns that remain hidden in OLS regression.

5.
Remote sensing often involves the estimation of in situ quantities from remote measurements. Linear regression, where there are no non-linear combinations of regressors, is a common approach to this prediction problem in the remote sensing community. A review of recent remote sensing articles using univariate linear regression indicates that in the majority of cases, ordinary least squares (OLS) linear regression has been applied, with approximately half the articles using the in situ observations as regressors and the other half using the inverse regression with remote measurements as regressors. OLS implicitly assumes an underlying normal structural data model to arrive at unbiased estimates of the response. OLS regression can be a biased predictor in the presence of measurement errors when the regression problem is based on a functional rather than structural data model. Parametric (Modified Least Squares) and non-parametric (Theil-Sen) consistent predictors are given for linear regression in the presence of measurement errors, together with analytical approximations of their prediction confidence intervals. Three case studies involving estimation of leaf area index from nadir reflectance estimates are used to compare these unbiased estimators with OLS linear regression. A comparison to Geometric Mean regression, a standardized version of Reduced Major Axis regression, is also performed. The Theil-Sen approach is suggested as a potential replacement of OLS for linear regression in remote sensing applications. It offers simplicity in computation, analytical estimates of confidence intervals, robustness to outliers, and testable assumptions regarding residuals, and it requires limited a priori information regarding measurement errors.
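
The computational simplicity claimed for the Theil-Sen approach can be seen in a minimal sketch of the basic estimator: the slope is the median of all pairwise slopes, and the intercept is taken here as the median of the residual offsets. The function name `theil_sen` and the toy data are assumptions for illustration; the confidence-interval approximations discussed in the abstract are not reproduced.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Return (intercept, slope) of the Theil-Sen line through the data."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[i] != xs[j]  # skip pairs with identical x (undefined slope)
    ]
    slope = median(slopes)
    # One common intercept choice: median of y - slope * x.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope


# Hypothetical data: y = 0.5 + 3x with one corrupted observation.
xs = [float(x) for x in range(10)]
ys = [0.5 + 3.0 * x for x in xs]
ys[-1] = -50.0

intercept, slope = theil_sen(xs, ys)
print("Theil-Sen (intercept, slope):", (intercept, slope))
```

Because only 9 of the 45 pairwise slopes involve the corrupted point, the median slope is unaffected by it, which is the robustness to outliers the abstract refers to.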

6.
Ge Yan, Wu Haixia. Neural Computing & Applications, 2020, 32(22): 16843-16855

This paper analyzes the trend of corn prices and the factors that affect them. Using regression analysis, univariate nonlinear and multivariate linear regression models are established to predict the corn price. First, a univariate nonlinear regression model is established with time as the independent variable and corn price as the dependent variable, based on an analysis in MATLAB of the trend in big data on Chinese corn prices from 2005 to 2016. The model fits the variation of the corn price over time and, to a certain extent, predicts the price trend. However, the 2017 corn price estimated with this model deviates from the actual value. Considering changes in related policies in China, we analyze the deviation of the original model and find that the relationship between supply and demand is the main underlying factor affecting the price of corn. We then select corn-related big data from 2005 to 2016, set production, consumption, and import and export volume as independent variables, and again use corn price as the dependent variable to establish a multiple linear regression model. Time series analysis of each independent variable gives its forecast value for 2017, and the model is then used to predict the 2017 corn price more accurately.

7.
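A remote sensing method for estimating urbanization level based on DMSP/OLS imagery   Cited by: 1 (self-citations: 0, citations by others: 1)
A method is proposed for quantitatively estimating regional urbanization level from DMSP/OLS nighttime stable light imagery. First, a Night Light Compositive Index (NLCI) was extracted from the nighttime stable light data acquired by the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS). A statistical model was then built between the Urbanization Level Index (ULI), derived from socioeconomic statistics, and the NLCI, and was used to estimate urbanization level at the prefecture scale in mainland China. The results show a clear linear relationship between ULI and NLCI, and NLCI reflects urbanization level well. The ULI estimates for other years obtained from the model agree strongly with the actual ULI values based on statistical data, so the model has a degree of reliability and general applicability. The novelty of this paper lies in improving the NLCI formula proposed by earlier researchers and in proposing a search algorithm for the optimal parameters in the formula.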

8.
In recent years, data broadcasting has become a promising technique for designing a mobile information system with power conservation, high scalability, and high bandwidth utilization. In many applications, the query issued by a mobile client corresponds to multiple items that should be accessed in a sequential order. In this paper, we study the scheduling approach in such a sequential data broadcasting environment. Explicitly, we propose a general framework referred to as MULS (standing for MUltiLevel Service) for an information system. There are two primary stages in MULS: online scheduling (OLS) and an optimization procedure. In the first stage, we propose an OLS algorithm to allocate the data items into multiple channels. As for the second stage, we devise an optimization procedure, called sampling with controlled iteration (SCI), to enhance the quality of broadcast programs generated by algorithm OLS. Procedure SCI is able to strike a compromise between effectiveness and efficiency by tuning the control parameters. The experimental results show that algorithm OLS with procedure SCI markedly outperforms the approaches in prior works in both effectiveness (that is, the average access time of mobile users) and efficiency (that is, the complexity of the scheduling algorithm). Therefore, by combining algorithm OLS with procedure SCI, the proposed MULS framework is able to generate broadcast programs with the flexibility of providing different service qualities under different requirements of effectiveness and efficiency: in the dynamic environment in which the access patterns and information contents change rapidly, the parameters used in SCI will perform OLS with satisfactory service quality. As for the static environment in which the query profile and the database are updated infrequently, larger values of parameters are helpful to generate an optimized broadcast program, indicating the advantageous feature of MULS.

9.
A pharmacokinetic program that allows individualization of Factor VIII dosage regimens in hemophilia patients undergoing major surgery is described. The program, which is designed for the IBM PC microcomputer and compatible machines, is based upon the one-compartment open model with instantaneous input. In the framework of such a pharmacokinetic model, it is assumed that the elimination of Factor VIII is faster during the early post-operative period and that it decreases progressively over the following days. Since Factor VIII half-life is dependent on the time elapsed since the operation (short half-life values during the early post-operative period, longer half-life values thereafter), the pharmacokinetic model is a nonlinear one. A first-order 'variation' rate constant is used to describe the prolongation of Factor VIII half-life from the initial value immediately after surgery to the final value achieved several days later. Individualized estimation of the patient's kinetic parameters (initial half-life, 'variation' rate constant and volume of distribution) is performed through the Bayesian method. Therefore, for such estimation the program exploits the Factor VIII plasma levels measured in the individual patient as well as the population pharmacokinetic data of Factor VIII. After estimating the individual's Bayesian parameters, the program predicts the dosage regimen that will elicit the desired time-course of Factor VIII plasma levels. If requested, the program is able to calculate the least-squares estimates for the parameters of the pharmacokinetic model and dosage prediction can also be made on the basis of such estimates. The least-squares estimates are useful for calculating population pharmacokinetic parameters according to the Standard Two-Stage method. Some examples of clinical use of the program are presented.

10.
Formerly, tree height has been more difficult to measure accurately in the field than tree diameter at breast height. As a consequence, models to predict height from diameter measurements have been widely developed in the forestry literature. Through the use of airborne laser scanning technology (e.g., LiDAR), tree variables such as height and crown diameter can be measured accurately, a development which has spawned the need for models to predict diameter from airborne laser-derived measurements. Although some work has been done on fitting such models, none has incorporated spatial information to improve the accuracy of the predicted diameters. Using a simple linear model for predicting tree diameter from laser-derived tree height and crown diameter measurements, we compared the performance of ordinary least squares (OLS), generalized least squares with a non-null correlation structure (GLS), a linear mixed-effects model (LME), and geographically weighted regression (GWR). Our data were obtained from 36 sample plots established in Norway. This is the first study to examine the use of spatial statistical models for tree-level LiDAR data. Root mean square prediction errors in tree diameter are 3.5% with LME, 10% with GWR, and 17% with OLS and GLS. LME also exhibited low variability in predictive performance across all the validation classes (based on laser-derived height). Given the difficulties of using parametric statistical inference (such as maximum likelihood-based indices) for GWR, we used permutation tests to detect statistical differences. LME was significantly better than the other models, and GWR was significantly better than OLS and GLS. Our results indicate that the LME model produced the best predictions of tree diameter from LiDAR-based variables to a degree that has previously not been possible.

11.
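Linear regression is used to analyze the data to be forecast and to remove anomalous data, and the GM(1,1) model is then used for forecasting, which effectively reduces the relative error of the data and improves forecast accuracy. Compression deformation values of printing packing are selected; after linear regression analysis and removal of anomalous data, GM(1,1) forecasting is applied, giving forecasts with higher accuracy and adaptability. Experimental and simulation results show that the grey forecasting model applied after this preliminary data analysis and cleaning yields predicted expected values far better than those of the pure regression model or the GM(1,1) model alone.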
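
For readers unfamiliar with GM(1,1), the following is a minimal sketch of the standard grey model: accumulate the series, fit the development coefficient `a` and grey input `b` by least squares on the background values, then difference the fitted accumulated curve. The function names, the toy series, and the omission of the regression-based outlier screening step are assumptions of this sketch, not details taken from the paper.

```python
import math


def gm11_fit(series):
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    # Accumulated generating operation (1-AGO): running sum of the raw series.
    acc = []
    total = 0.0
    for value in series:
        total += value
        acc.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, len(acc))]
    y = list(series[1:])
    # Least squares for y[k] = -a * z[k] + b (needs at least 3 raw points).
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    szz = sum((zk - mean_z) ** 2 for zk in z)
    szy = sum((zk - mean_z) * (yk - mean_y) for zk, yk in zip(z, y))
    slope = szy / szz
    return -slope, mean_y - slope * mean_z


def gm11_predict(series, k):
    """Predict the raw-series value at index k >= 1 (k may lie beyond the sample)."""
    a, b = gm11_fit(series)  # a == 0 (a constant series) is not handled here
    x0 = series[0]

    def acc_hat(i):
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        return (x0 - b / a) * math.exp(-a * i) + b / a

    return acc_hat(k) - acc_hat(k - 1)


# Hypothetical series growing by 10% per step; forecast one step ahead.
series = [100.0 * 1.1 ** k for k in range(5)]
forecast = gm11_predict(series, 5)
print("one-step-ahead forecast:", forecast)
```

The model assumes a positive, roughly exponential series, which is why screening out anomalous points before fitting, as the abstract describes, can matter for accuracy.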

12.
In recent years, data broadcasting has become a promising technique for designing a mobile information system with power conservation, high scalability and high bandwidth utilization. In many applications, the query issued by a mobile client corresponds to multiple items which should be accessed in a sequential order. In this paper, we study the scheduling approach in such a sequential data broadcasting environment. Explicitly, we propose a general framework referred to as MULS (standing for MUlti-Level Service) for an information system. There are two primary stages in MULS: on-line scheduling and an optimization procedure. In the first stage, we propose an On-Line Scheduling algorithm (denoted by OLS) to allocate the data items into multiple channels. As for the second stage, we devise an optimization procedure SCI, standing for Sampling with Controlled Iteration, to enhance the quality of broadcast programs generated by algorithm OLS. Procedure SCI is able to strike a compromise between effectiveness and efficiency by tuning the control parameters. The experimental results show that algorithm OLS with procedure SCI markedly outperforms the approaches in prior works in both effectiveness (i.e., the average access time of mobile users) and efficiency (i.e., the complexity of the scheduling algorithm). Therefore, by combining algorithm OLS with procedure SCI, the proposed MULS framework is able to generate broadcast programs with the flexibility of providing different service qualities under different requirements of effectiveness and efficiency: in the dynamic environment in which the access patterns and information contents change rapidly, the parameters used in SCI will perform online scheduling with satisfactory service quality. As for the static environment in which the query profile and the database are updated infrequently, larger values of parameters are helpful to generate an optimized broadcast program, indicating the advantageous feature of MULS.

13.
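An enterprise order forecasting model is proposed that builds an RBF neural network by combining the orthogonal least squares (OLS) algorithm with evolutionary particle swarm optimization (EPSO). OLS uses a forward regression procedure to select suitable centers from the input data, dynamically avoiding an oversized network and the numerical ill-conditioning caused by random center selection. EPSO tunes the network parameters, such as the RBF center positions, the RBF widths, and the weights between the hidden and output layers, to improve the network's generalization ability.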

14.
15.
Empirical validation of software metrics used to predict software quality attributes is important to ensure their practical relevance in software organizations. The aim of this work is to find the relation of object-oriented (OO) metrics with fault proneness at different severity levels of faults. For this purpose, different prediction models have been developed using regression and machine learning methods. We evaluate and compare the performance of these methods to find which method performs better at different severity levels of faults, and we empirically validate the OO metrics given by Chidamber and Kemerer. The results of the empirical study are based on a public-domain NASA data set. The performance of the predicted models was evaluated using Receiver Operating Characteristic (ROC) analysis. The results show that the area under the curve (measured from the ROC analysis) of models predicted using high severity faults is low compared with the area under the curve of the models predicted with respect to medium and low severity faults. However, the number of faults in the classes correctly classified by predicted models with respect to high severity faults is not low. This study also shows that the performance of machine learning methods is better than that of the logistic regression method with respect to all the severities of faults. Based on the results, it is reasonable to claim that models targeted at different severity levels of faults could help in planning and executing testing by focusing resources on fault-prone parts of the design and code that are likely to cause serious failures.

16.
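For an acetone refining process, a soft-sensor model of acetone product quality is proposed that combines FA and SVR. Factor analysis (FA) is used to extract feature information from the auxiliary variables and to remove the correlation among them; support vector regression (SVR) is then used to build the soft-sensor model of the acetone product quality index. Simulation experiments were carried out on actual production process data and compared with traditional robust regression analysis and neural network methods; the results show that the proposed method predicts well.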

17.
A new optimized classification algorithm is presented that combines neural networks using ordinary least squares (OLS). When a neural network is used to recognize complex high-dimensional data, designing the network is a challenge, and a single network model can hardly achieve satisfactory recognition accuracy. First, feature dimension reduction is carried out to make the network design more convenient. An Elman neural network based on PCA is taken as sub-classifier I; its recognition precision is relatively high, but its convergence rate is unsatisfactory. An RBF neural network based on factor analysis is taken as sub-classifier II; it converges fast, but its recognition precision is relatively low. To make up for these deficiencies, the two sub-classifiers are combined by ensemble learning, with the optimal weight of each sub-classifier determined by the OLS principle. The resulting ensemble classification algorithm compensates, to some extent, for the information loss caused by dimensionality reduction. Finally, the validity of the model is tested by case analysis.
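
One plausible reading of "determining optimal weights by the OLS principle" is to choose the weights that minimise the squared error between the weighted sub-classifier outputs and the targets. The sketch below solves the resulting 2x2 normal equations directly. The function name, the absence of any constraint on the weights (such as summing to one), and the toy outputs are assumptions; the abstract does not specify these details.

```python
def ols_ensemble_weights(p1, p2, y):
    """Weights (w1, w2) minimising sum((y - w1 * p1 - w2 * p2) ** 2)."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    s1y = sum(a * t for a, t in zip(p1, y))
    s2y = sum(b * t for b, t in zip(p2, y))
    # Solve the normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("sub-classifier outputs are collinear")
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2


# Hypothetical scores from two sub-classifiers and a target built from them.
scores_1 = [0.9, 0.2, 0.6, 0.4]
scores_2 = [0.7, 0.1, 0.8, 0.3]
target = [0.3 * a + 0.7 * b for a, b in zip(scores_1, scores_2)]

w1, w2 = ols_ensemble_weights(scores_1, scores_2, target)
print("ensemble weights:", (w1, w2))
```

The combined score for a new sample would then be `w1 * s1 + w2 * s2`, so a sub-classifier that tracks the targets more closely receives the larger weight.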

18.
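To address bandwidth resource management for Web applications, a network-simulation-based method for predicting the bandwidth demand and quality of service (QoS) of Web applications is proposed. The method provides a modelling framework and formal specification suited to Web services, adopts a simplified parallel workload model, uses automated data mining to extract model parameters from Web application access logs, and uses a network simulation tool to build a system model that simulates the complex network transmission process. It can predict how bandwidth demand and QoS change under different load intensities. The accuracy of the predictions was verified with the TPC-W benchmark system. Theoretical analysis and simulation results show that, compared with traditional linear regression prediction, network simulation can stably simulate the real system, with average relative prediction errors of 4.6% for the total number of requests and 3.3% for the total number of bytes. Finally, taking the TPC-W benchmark system as an example, different bandwidth scaling schemes for Web applications were evaluated by simulation; the evaluation results can support decision-making in Web application resource management.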

19.
The detection and subsequent treatment of influential observations have been well covered with respect to ordinary least squares (OLS) under an assumed multiple linear regression (MLR) model using measures such as Cook’s Distance. However, OLS can be shown to be a useful method under a much wider variety of models. The purpose of this paper is twofold. Firstly we introduce a new diagnostic, similar to Cook’s Distance, that is useful for detecting influential observations under an assumed single-index model. Secondly we show, via simulation, how trimming observations according to such diagnostics can greatly benefit the analysis even when no gross outliers are evident.
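
As background, the classical Cook's Distance that the new diagnostic is compared to can be sketched for a straight-line OLS fit. This is the textbook measure, not the paper's single-index diagnostic; the function name and toy data are assumptions. For observation i with residual e_i and leverage h_i, the distance is e_i^2 / (p s^2) * h_i / (1 - h_i)^2, with p = 2 fitted parameters and s^2 the residual variance.

```python
def cooks_distances(xs, ys):
    """Cook's Distance of each observation under y = intercept + slope * x."""
    n = len(xs)
    p = 2  # number of fitted parameters (intercept and slope)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    # Leverage of each point in simple linear regression.
    leverages = [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]
    return [
        e * e / (p * s2) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: y = 1 + 2x with the last observation shifted upward.
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[-1] += 30.0

distances = cooks_distances(xs, ys)
most_influential = max(range(len(distances)), key=distances.__getitem__)
print("most influential index:", most_influential)
```

Trimming, as described in the abstract, would then amount to refitting after dropping the observations with the largest diagnostic values.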

20.
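As China vigorously promotes the development of the e-commerce industry, more and more e-commerce companies are joining the online competition. As sales volumes grow, third-party e-commerce companies hold ever more sales data. Because these data are scattered across categories, they make data processing and forecasting harder, often leaving the data incomplete during forecasting or producing very large deviations in the forecasts. To mitigate this problem, a product re-classification forecasting model based on sales data is proposed: product clusters are extracted from common sales characteristics, a time-series model then produces the forecast, and a hidden Markov forecasting model gives the probability distribution of the forecast. Experimental analysis shows that forecasting with this model gives good results and offers some reference value for e-commerce companies formulating marketing strategies.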
