Similar Documents
1.
Susceptibility or hazard models are often built with logistic regression to describe the effect of a group of explanatory variables on the probability of a dichotomous (binary) response. Because the available variables do not always satisfy the logit-linearity assumption of logistic regression, a modified approach is proposed. First, a favorability function based on conditional probability measures is introduced for each explanatory variable. Next, a simple transformation based on the empirical probability function is suggested for non-continuous variables, while nonparametric kernel estimation is used for continuous ones. These favorability-based transformations yield new explanatory variables for the logistic regression model. The performance of the method is evaluated on simulated data. In addition, a real case study is presented in which a GIS-based landslide susceptibility model is built.
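A minimal Python sketch of the two transformations, assuming the favorability of a category or value is taken to be the empirical conditional probability P(Y=1 | X) (the paper's exact favorability function may be defined differently). Discrete variables use observed frequencies; continuous ones use a Gaussian-kernel Nadaraya-Watson estimate, where the kernel and the `bandwidth` value are assumptions. The function names and the toy lithology data are hypothetical.

```python
import math
from collections import defaultdict


def empirical_transform(categories, y):
    """Map each category to the empirical P(Y=1 | X = category)."""
    counts = defaultdict(lambda: [0, 0])  # category -> [positives, total]
    for c, label in zip(categories, y):
        counts[c][0] += label
        counts[c][1] += 1
    return {c: pos / total for c, (pos, total) in counts.items()}


def kernel_transform(x_train, y, bandwidth):
    """Return a function giving a Nadaraya-Watson (Gaussian kernel)
    estimate of P(Y=1 | X = x) for a continuous variable."""
    def estimate(x):
        # Kernel weight of every training point relative to x.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in x_train]
        return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    return estimate


# Hypothetical example: lithology classes and landslide presence (1) / absence (0).
lithology = ["A", "A", "B", "B", "B", "C"]
landslide = [1, 0, 1, 1, 0, 0]
table = empirical_transform(lithology, landslide)
```

The transformed values, not the raw categories, would then be passed to the logistic regression as explanatory variables.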

2.
3.
Effective identification of the change point of a multivariate process is an important research issue, since it is tied to determining the assignable causes that may seriously affect the underlying process. Most existing studies use either the maximum likelihood estimator (MLE) method or a machine learning (ML) method to estimate or identify the change point of a process. The MLE method is typically criticized for assuming that the process distribution is known, while the ML method may require a large number of input variables in the modeling procedure. Departing from existing approaches, this study proposes an integrated hybrid scheme to mitigate the difficulties of both the MLE and ML methods. The scheme has four components: a logistic regression (LR) model, a multivariate adaptive regression splines (MARS) model, a support vector machine (SVM) classifier and a change point identification strategy. It performs three tasks to identify the change point in a multivariate process. First, the LR and MARS models are used to reduce and refine the full set of input (explanatory) variables. Second, the remaining variables serve as inputs to the SVM. Third, the SVM outputs are combined with the proposed identification strategy to determine the change point. Simulation experiments show that the proposed hybrid scheme identifies the change point effectively and outperforms both a typical statistical process control (SPC) chart used alone and single-stage SVM methods.
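The abstract does not spell out the identification strategy, so the following is only a hypothetical rule for turning a sequence of SVM outputs into a change-point estimate: pick the split index τ that disagrees least with the labels, counting "shifted" outputs before τ and "in-control" outputs after it as errors. The function name and the 0/1 label encoding are assumptions.

```python
def estimate_change_point(labels):
    """labels: per-observation classifier outputs (0 = in control, 1 = shifted).
    Return the index tau minimising (1s before tau) + (0s from tau onward);
    ties keep the earliest tau."""
    n = len(labels)
    total_ones = sum(labels)
    best_tau, best_cost = 0, n - total_ones  # tau = 0: every 0 counts as an error
    ones_before = 0
    for tau in range(1, n + 1):
        ones_before += labels[tau - 1]
        zeros_after = (n - tau) - (total_ones - ones_before)
        cost = ones_before + zeros_after
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau
```

For example, `estimate_change_point([0, 0, 0, 1, 0, 1, 1, 1])` places the change at index 3, tolerating the single isolated in-control output after it.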

4.
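Selection of the optimal simulation scale for land use and simulation of land-use spatial patterns   Total citations: 1, self-citations: 0, citations by others: 1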
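Land use change is a dynamic process shaped by the interaction of multiple factors. It has become an important topic in global environmental change and sustainable development, and simulating regional land-use spatial patterns has become a key part of LUCC research. Using interpreted TM remote-sensing imagery from 2000 and 2010, together with digital elevation model, river network, railway, highway, precipitation and temperature data, a binary logistic regression model was used to select the optimal land-use simulation scale for the loess tableland region, and on this basis the spatial pattern of each land-use type in the study area was simulated. The results show that: (1) across the ten spatial scales of land-use pattern simulation, the relationship between the spatial pattern of land-use change and its driving factors shows a degree of scale dependence; (2) the ROC values for cropland, forest land and grassland in the loess tableland region first increase and then decrease across the ten scales, with the turning point near 400 m, indicating that, given scale effects and scale transformation, 400 m × 400 m is the optimal simulation scale for land-use pattern optimization in this region; (3) at the optimal 400 m scale, the simulated distributions of grassland and forest land are both significantly correlated with per-capita GDP and a composite terrain index, while the composite terrain index is the variable with the strongest effect on the distribution of cropland.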
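The scale-selection step can be sketched as computing the ROC area for each candidate cell size and keeping the size with the largest value. This Python sketch uses the Mann-Whitney form of the AUC; the dictionary layout of `results` and the toy scores are hypothetical, and in the study the scores would be fitted binary logistic regression probabilities for each land-use type.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive cell scores higher than a randomly chosen negative cell
    (ties count one half). O(n_pos * n_neg), fine for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_scale(results):
    """results: dict mapping cell size in metres -> (scores, labels).
    Return the cell size with the highest AUC and all AUCs."""
    aucs = {size: roc_auc(s, y) for size, (s, y) in results.items()}
    return max(aucs, key=aucs.get), aucs
```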

5.
A spatially explicit land use change model is typically based on the assumption that the relationship between land use change and its explanatory processes is stationary. As a result, model structure and parameterization are usually kept constant over the model runtime, ignoring potential systemic changes in this relationship driven by societal change. We developed a methodology to test for systemic changes and demonstrate it by assessing whether a land use change model with a constant structure adequately represents the land use system, given a time series of observations of past land use. This was done by assimilating observations of actual land use into a land use change model with a Bayesian data assimilation technique, the particle filter. The particle filter was used to update prior knowledge about the model structure (i.e. the selection and relative importance of the explanatory processes for land use change allocation) and about the parameters. For each time step with available observations, the optimal model structure and parameterization were determined. In a case study of sugar cane expansion in Brazil, the assumption of a constant model structure was found to be not fully adequate, indicating systemic change during the modelling period (2003–2012). The systemic change appeared to be indirect: some factor affects the demand for sugar cane, an input variable, in such a way that the transition rules and parameters must change as well. Although an inventory was made of societal changes in the study area during this period, none could be directly linked to the onset of the observed systemic change in the land use system. Allowing for systemic changes in the model structure on average doubled the 95% confidence interval of the projected sugar cane fractions compared with assuming a stationary system.
This shows the importance of accounting for systemic changes in land use change projections so that the uncertainty of future projections is not underestimated.
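A generic bootstrap particle filter alternates two steps: reweight each particle by the likelihood of the new observation, then resample. The sketch below shows only those two steps in Python; in the study each particle would carry one candidate model structure and parameter set, and how its likelihood is computed against the observed land use map is not specified here, so the log-likelihood inputs are assumed.

```python
import math
import random


def update_weights(weights, log_likelihoods):
    """Bayesian update w_i <- w_i * L_i, then normalise. Log-likelihoods are
    shifted by their maximum before exponentiating for numerical stability."""
    m = max(log_likelihoods)
    new = [w * math.exp(ll - m) for w, ll in zip(weights, log_likelihoods)]
    total = sum(new)
    return [w / total for w in new]


def systematic_resample(weights, rng):
    """Return the indices of the particles kept after systematic resampling:
    one random offset, then n evenly spaced positions through the CDF."""
    n = len(weights)
    start = rng.random() / n
    positions = [start + i / n for i in range(n)]
    indices, cumulative, j = [], weights[0], 0
    for p in positions:
        while cumulative < p and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices
```

Resampling concentrates particles on model structures that fit the observations; repeating the pair of steps at each observation date gives the time-varying posterior the abstract describes.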

6.
Urban cellular automata (CA) models are widely used for quantitative analysis and prediction of urban land-use dynamics. However, most urban CA models built with neighborhood rules consider only a small neighborhood at a specific spatial resolution. Here, we quantify neighborhood effects in a relatively large cellular space and analyze their role in the performance of an urban land use model. The extracted neighborhood rules were integrated into a commonly used logistic regression urban CA model (Logistic-CA), yielding a large-neighborhood urban land use model (Logistic-LNCA). Land-use simulations with both models were evaluated against urban expansion data for Xiamen City, China. Compared with the Logistic-CA model using a 3 × 3 kernel, the Logistic-LNCA model improved the simulation accuracy for built-up land by 3.0%–3.9% in two simulation periods. Parameter sensitivity analysis indicated that there is an optimal large window size in cellular space, with a corresponding optimal parameter configuration.
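Neighborhood rules in urban CA are usually driven by the share of developed cells around a target cell. This sketch, assuming a binary grid (1 = built-up) and an unweighted square window, computes that share; `radius=1` is the 3 × 3 kernel, and larger radii give the larger windows discussed above. The paper's quantification of neighborhood effects may use distance-decay weights instead.

```python
def neighborhood_density(grid, row, col, radius):
    """Fraction of built-up cells (value 1) in the (2*radius+1)^2 window
    centred on (row, col), excluding the centre cell. The window is clipped
    at the grid edge, so edge cells have fewer neighbours."""
    built = total = 0
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            if (r, c) == (row, col):
                continue
            built += grid[r][c]
            total += 1
    return built / total


grid = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
```

On this toy grid the centre cell sees no development within a 3 × 3 window but a density of 4/24 within a 5 × 5 window, which is the kind of information a small kernel discards.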

7.
This paper compares six land use change (LUC) models: artificial neural networks (ANNs), support vector regression (SVR), random forest (RF), classification and regression trees (CART), logistic regression (LR), and multivariate adaptive regression splines (MARS). The models were used to simulate urban growth in the megacity of the Tehran Metropolitan Area (TMA). Each LUC model was coupled with cellular automata (CA) and validated using a variety of goodness-of-fit metrics. The percent correct metric (PCM) ranged from 54.6% for LR to 59.6% for MARS, while the area under the ROC curve (AUC) ranged from 67.6% for LR to 74.7% for ANNs. The spatial patterns in the error maps also differed considerably between models. These results will help decision makers and scholars understand how the models perform when reducing misses and false alarms is a priority.

8.
The most commonly used technique for credit scoring is logistic regression, and more recent research has suggested that the support vector machine is more effective. However, both logistic regression and the support vector machine suffer from the curse of dimensionality. In this paper, we introduce a new way to address this problem, which we call orthogonal dimension reduction. We discuss its properties in detail and compare it with other common statistical approaches, namely principal component analysis and hybrid logistic regression. In experiments on the German credit data set, we also observe an interesting phenomenon in the use of support vector machines, which we call "dimensional interference", and discuss it in general terms. Cross-validation results show that when logistic regression is used to filter the dummy variables and orthogonal feature extraction is applied, the support vector machine not only has lower complexity and converges faster, but also performs better.
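Logistic regression, the baseline in this and several other abstracts, models P(y=1 | x) = 1 / (1 + exp(-(b0 + b·x))). Below is a minimal batch gradient-ascent fit in plain Python, intended only to make the model concrete; it does not implement the paper's orthogonal dimension reduction, and the learning rate, epoch count and toy data are arbitrary choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b . x) by batch gradient ascent on the
    mean log-likelihood. X is a list of feature lists, y a list of 0/1."""
    n, d = len(X), len(X[0])
    b0, b = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Residual y - p is the gradient of the log-likelihood w.r.t. the logit.
            err = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += err
            for j in range(d):
                g[j] += err * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(b0, b, x):
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


b0, b = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```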

9.
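Prediction of the spatial distribution of soil nickel in agricultural land based on a GWR model   Total citations: 2, self-citations: 0, citations by others: 2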
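Traditional methods for analyzing agricultural soils are time-consuming and labor-intensive. Rapid inversion of heavy-metal concentrations from spectral data and other factors is attracting growing interest because it is efficient, fast and low-cost. This paper explores building a geographically weighted regression (GWR) inversion model from spectral information and measured soil nickel content in order to predict the spatial distribution of nickel in agricultural soils. First, with the measured nickel content of agricultural land in Shibaogou, Luanchuan County, as the target variable, and with Landsat-8 band reflectance, the spatial locations of the sampling points and terrain information as candidate variables, correlation analysis and stepwise regression were used to select three explanatory variables: the shortest distance from the sampling point to the factory area, ln(band2/band3), and band3 - band5. Second, a model was built with geographically weighted regression, reaching a coefficient of determination of 0.64. Third, the model was validated with test samples, giving an Acc value of 96.51%, which shows that the model fits the nickel content of agricultural soils well. Finally, the inversion was carried out over all agricultural land in the study area and the resulting spatial distribution was evaluated.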
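GWR fits a separate weighted least-squares regression at every location, with weights that decay with distance from that location. The sketch below shows one local fit with a single explanatory variable and a Gaussian kernel; the study used three explanatory variables, and the kernel form and `bandwidth` here are assumptions (GWR software typically selects the bandwidth by cross-validation or AICc).

```python
import math


def gwr_at(u, v, coords, x, y, bandwidth):
    """Local weighted least squares for y = a + b*x at location (u, v),
    using Gaussian kernel weights w_i = exp(-0.5 * (d_i / bandwidth)^2).
    Returns the local (intercept, slope)."""
    w = [math.exp(-0.5 * (math.dist((u, v), c) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b
```

Repeating `gwr_at` over a grid of locations yields the spatially varying coefficients that are then used to map predicted nickel content.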

10.
Analysis of cancer data: a data mining approach   Total citations: 1, self-citations: 1, citations by others: 0
Abstract: Although cancer research has traditionally been clinical and biological in nature, data-driven analytic studies have become a common complement in recent years. In medical domains where data- and analytics-driven research has been applied successfully, novel research directions have been identified that further advance clinical and biological studies. In this research, we used three popular data mining techniques (decision trees, artificial neural networks and support vector machines) together with logistic regression, the most commonly used statistical technique, to develop prediction models for prostate cancer survivability. The data set contained about 120,000 records and 77 variables. k-fold cross-validation was used for model building, evaluation and comparison. The results showed that support vector machines were the most accurate predictor in this domain (test set accuracy of 92.85%), followed by artificial neural networks and decision trees.
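A plain k-fold split can be sketched as follows; the records are assumed to have been shuffled beforehand, and the value of k used in the study is not stated in the abstract.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by
    at most one, and yield (train, test) index lists for each fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Each record appears in exactly one test fold, so averaging the k test accuracies uses every record once for evaluation.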

11.
Ma Jianghong, Zhang Wenxiu, Liang Yi. Chinese Journal of Computers, 2003, 26(12): 1652-1659
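Complex massive data often appear as a mixture of several structural features, and regression-class mixture models describe such mixtures. Based on finite mixture distribution theory and related identifiability results, this paper discusses the identifiability of mixture models for mining general regression classes under three settings for the regression variables: (1) fixed explanatory variables, (2) random explanatory variables, and (3) fixed explanatory variables with specified class parameters. For each setting, sufficient conditions are given for the identifiability of mixture models of regression classes from the same family. A common feature of these conditions is that they all involve a special set of explanatory-variable values, which is uniquely determined by the regression functions and regression parameters of the family; its elements are the points at which different regression parameters give the same value of the regression function. In particular, when the regression function is linear, this set is a hyperplane in the explanatory-variable space.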
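For the linear case, the special set can be written out explicitly. Assuming the notation f(x; β) = xᵀβ for the regression function and two distinct parameter vectors β₁ ≠ β₂ (notation introduced here, not taken from the paper):

```latex
A(\beta_1, \beta_2) = \{\, x : f(x;\beta_1) = f(x;\beta_2) \,\}
                    = \{\, x : x^{\top}\beta_1 = x^{\top}\beta_2 \,\}
                    = \{\, x : x^{\top}(\beta_1 - \beta_2) = 0 \,\}
```

Because β₁ - β₂ ≠ 0, this is a hyperplane through the origin with normal vector β₁ - β₂; if x includes a constant 1 for the intercept, it is an affine hyperplane in the space of the remaining explanatory variables.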

12.
The aim of this study was to produce models for predicting high-risk pregnancies, with particular emphasis on pre-term delivery. Neural network and logistic regression models were developed using pregnancy and delivery data spanning seven years. Five input factors were used as explanatory variables: age, number of previous stillbirths, gestational age at first clinical assessment, diabetes and a measure of socio-economic status. There was little difference in average model performance between the two techniques. The best neural network was a fully connected feed-forward network with a single hidden layer of three nodes and a single output node, which achieved a Receiver Operating Characteristic (ROC) curve area of 0.700; the ROC area for the logistic regression models was 0.695. This performance reflects weak associations within the data, but it is encouraging given the relatively limited number of predictive inputs.
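To make the reported architecture concrete, this sketch runs one forward pass of a 5-3-1 feed-forward network with sigmoid activations. The weights are placeholders that would come from training, and the activation function and input encoding are assumptions, not details from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: 5 inputs -> 3 sigmoid hidden nodes -> 1 sigmoid output.
    w_hidden is a 3x5 list of lists, b_hidden 3 biases, w_out 3 weights."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # The single output node gives an estimated probability of pre-term delivery.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```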

13.
Support vector machines for urban growth modeling   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a novel method for modeling urban land use conversion using support vector machines (SVMs), a new generation of machine learning algorithms for classification and regression. The method derives the relationship between rural-urban land use change and various factors, such as population, distance to roads and facilities, and surrounding land use. Our study showed that SVMs are an effective approach for estimating the land use conversion model, owing to their ability to model non-linear relationships, their good generalization performance, and the fact that training reaches a global and unique optimum. Rural-urban land use conversions in New Castle County, Delaware, over 1984–1992, 1992–1997, and 1997–2002 were used as a case study to demonstrate the applicability of SVMs to urban expansion modeling. The SVMs were also compared with a commonly used binomial logistic regression (BLR) model; both overall modeling accuracy and McNemar's test consistently confirmed the better performance of SVMs.
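McNemar's test compares two classifiers evaluated on the same cells using only the discordant cases. This sketch computes the continuity-corrected chi-square statistic (1 degree of freedom); a value above about 3.84 indicates a difference at the 5% level. The argument names are hypothetical.

```python
def mcnemar(b, c, continuity=True):
    """McNemar chi-square statistic (1 df) for two classifiers on the same cases.
    b = cases only model A classified correctly,
    c = cases only model B classified correctly."""
    if b + c == 0:
        return 0.0
    diff = abs(b - c) - (1 if continuity else 0)
    return max(diff, 0) ** 2 / (b + c)
```

For instance, if the SVM alone is right on 10 cells and BLR alone on 2, the statistic is 49/12 ≈ 4.08, just above the 5% threshold.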

14.
Context: Software has been developed since the 1960s, but the success rate of software development projects is still low. During development, the probability of success is affected by various practices and aspects. To date, it is not clear which of these aspects matter most for project outcome.
Objective: In this research, we identify aspects that could influence project success, build prediction models based on these aspects using data collected from multiple companies, and then test their performance on data from a single organization.
Method: A survey-based empirical investigation was used to examine variables and factors that contribute to project outcome. Variables highly correlated with project success were selected, and this set was reduced to three factors using principal components analysis. A logistic regression model was built for both the set of variables and the set of factors, using heterogeneous data collected from two countries and a variety of organizations. We tested these models on a homogeneous hold-out dataset from one organization, using receiver operating characteristic (ROC) analysis to compare the performance of the variable-based and factor-based models.
Results: Using raw variables or factors in the logistic regression models made no significant difference in predictive capability. The prediction accuracy of these models is more balanced when the cut-off is set to the ratio of successes to failures in the datasets used to build the models. Both the raw-variable and factor-based models predict significantly better than random chance.
Conclusion: An organization wishing to estimate whether a project will succeed or fail may use a model created from heterogeneous data derived from multiple organizations.

15.
This study aims to predict the spatial distribution of tropical deforestation. Landsat images from 1974, 1986 and 1991 were classified to generate digital deforestation maps that locate areas of deforestation and forest persistence. These maps were overlaid with spatial variables such as proximity to roads and settlements, forest fragmentation, elevation, slope and soil type to determine the relationship between deforestation and these explanatory variables. A multi-layer perceptron was trained to estimate the propensity to deforestation as a function of the explanatory variables and was used to produce deforestation risk assessment maps. Comparing the risk assessment map with actual deforestation shows that the model correctly classified 69% of the grid cells into two categories: forest persistence versus deforestation. The artificial neural network approach was found to have great potential for predicting land cover change because it allows complex, non-linear models to be developed.

16.
Oak forests are essential to the ecosystems of many countries, particularly when they are used in vegetation restoration. Models that predict the potential habitat of oaks can therefore be a valuable tool for environmental work. To this end, we build and compare data mining models for predicting the potential habitat of the oak forest type in Mediterranean areas (southern Spain), with conclusions applicable to other regions. Thirty-one environmental input variables were measured, and six base models for supervised classification were selected: linear and quadratic discriminant analysis, logistic regression, classification trees, neural networks and support vector machines. Three ensemble methods, based on combining classification tree models fitted to samples and variable subsets generated from the original data set, were also evaluated: bagging, random forests and boosting. The available data set was randomly split into three parts: a training set (50%), a validation set (25%) and a test set (25%). Accuracy, sensitivity, specificity and the area under the ROC curve on the test set show that bagging and random forests are the best models for our oak data set. All of these models can be fitted with free R programs using the libraries and functions described in this paper. Furthermore, the methodology used in this study will allow researchers to determine the potential distribution of oaks in other kinds of areas.
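The 50/25/25 random partition can be sketched as below; the fixed seed is an assumption added for reproducibility, and with sizes not divisible by four the test set absorbs the remainder.

```python
import random


def split_50_25_25(n, seed=0):
    """Randomly partition indices 0..n-1 into training (50%),
    validation (25%) and test (remaining ~25%) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n // 2
    n_val = n // 4
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The validation set is used to tune each model, and only the untouched test set feeds the final accuracy, sensitivity, specificity and AUC comparison.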

17.
A fuzzy regression model is developed to construct the relationship between the response and explanatory variables in fuzzy environments. To enhance explanatory power and account for the uncertainty of the formulated model and its parameters, a new operator called the fuzzy product core (FPC) is proposed for building fuzzy regression models with fuzzy parameters from fuzzy observations, in which both the response and explanatory variables are fuzzy. In addition, the sign of each parameter can be determined during model building. Compared with existing approaches, the proposed approach reduces the unnecessary or unimportant information arising from fuzzy observations and determines the signs of the model parameters, which improves model performance. This remedies a weakness of related approaches, in which the model parameters are fuzzy and must be predetermined in the formulation process. The proposed approach outperforms existing models in terms of distance, mean similarity and credibility measures, even when crisp explanatory variables are used.

18.
When examining the relationship between landscape characteristics and water quality, most previous studies have paid insufficient attention to the spatial aspects of landscape characteristics and of water quality sampling stations. We analyzed the spatial pattern of total nitrogen (TN), total phosphorus (TP), chemical oxygen demand (COD) and suspended solids (SS) in the Han River basin of South Korea to explore how different distance considerations and spatial statistical approaches help explain variation in water quality. Five-year (2012 through 2016) seasonal averages of these water quality attributes were used as response variables, while explanatory variables such as land cover, elevation, slope and hydrologic soil group were given different weightings based on distance and flow accumulation. Moran's eigenvector-based spatial filters were used to capture spatial relations among water quality sampling sites and were included in regression models. Seasonal water quality shows distinct spatial patterns, with the highest concentrations of TN, TP, COD and SS in downstream urban areas and the lowest in upstream forest areas. TN concentrations are higher in the dry winter than in the wet summer season, while SS concentrations are higher in the wet summer than in the dry season. Spatial models substantially improved model fit compared with aspatial models. The flow accumulation-based models performed best when spatial filters were not used, but all models performed similarly when spatial filters were used. The distance weighting approaches were instrumental in understanding watershed-level processes that affect the sources, mobilization and delivery of the physicochemical parameters that flow into the river. We conclude that considering the spatial aspects of sampling sites is as important as accounting for different distances and hydrological processes when modeling water quality.
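Moran's eigenvector spatial filters are eigenvectors of the doubly centred spatial weight matrix, and each eigenvector's Moran's I is proportional to its eigenvalue. This sketch computes only Moran's I itself for a set of sampling sites, given a weight matrix; the eigenvector extraction and the choice of weights used in the study are not shown, and the binary adjacency weights in the example are an assumption.

```python
def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2,
    where S0 is the sum of all weights. weights is an n x n list of lists."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Four sampling sites along a river reach, each adjacent to its neighbours.
line_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
```

A steadily increasing concentration along the reach, such as `[1, 2, 3, 4]`, gives a positive value (1/3), while an alternating pattern gives -1.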

19.
Simulation models based on cellular automata (CA) are useful for revealing the complex mechanisms and processes involved in urban growth and have become supplementary tools for urban land use planning and management. Although urban growth mechanisms are characterized by multilevel and spatiotemporal heterogeneity, most existing studies simulate the urban growth of single regions only, without considering the heterogeneity of the urban growth process or the multilevel factors that drive urban growth in regions composed of multiple subregions. As a result, urban growth models perform less well when simulating multi-regional areas. To address this issue, we propose a multilevel logistic CA model (MLCA) that incorporates a multilevel logistic regression model into the traditional logistic CA model (LCA). In the MLCA, multilevel driving factors are considered, and the multilevel logistic model allows the transition rules not only to vary in space but also to change when subregional-level factors change. To verify its validity, the MLCA was applied to simulate urban growth in Tongshan County, Xuzhou Prefecture, China. The results were compared with three models: LCA1, which considers only grid cell-level factors; LCA2, which considers both grid cell- and subregional-level factors; and an artificial neural network CA. Urban growth data for 2000–2009 and 2009–2017 were used. The MLCA performed better in both visual comparison and the accuracy indicators. Kappa increased by less than 5%, though the improvement was significant, while the accuracy of urban land and the figure of merit increased by much more than 5%. In addition, the MLCA results had the smallest mean absolute percentage error when allocating new urban land to the various subregions.
The results reveal that higher-level (e.g., town-level) factors either strengthened or weakened the effects of grid cell-level factors on urban growth, which indirectly affected the spatial allocation of new urban land. The MLCA model is an effective step toward simulating nonstationary urban growth in multi-regional areas using the combined effects of multilevel driving factors.
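Kappa and the figure of merit can both be computed from the cross-tabulation of observed and simulated change. This sketch assumes two equal-length 0/1 lists marking cells that converted to urban land; the figure of merit here is hits / (hits + misses + false alarms), its usual form for a single change category.

```python
def change_agreement(observed, simulated):
    """Return (kappa, figure_of_merit) for 0/1 change maps
    (1 = converted to urban). Assumes at least one observed or simulated
    change and imperfect chance agreement (pe < 1)."""
    hits = sum(1 for o, s in zip(observed, simulated) if o == 1 and s == 1)
    misses = sum(1 for o, s in zip(observed, simulated) if o == 1 and s == 0)
    false_alarms = sum(1 for o, s in zip(observed, simulated) if o == 0 and s == 1)
    n = len(observed)
    correct_rejections = n - hits - misses - false_alarms
    po = (hits + correct_rejections) / n  # observed agreement
    pe = ((hits + misses) * (hits + false_alarms)  # chance agreement
          + (correct_rejections + false_alarms) * (correct_rejections + misses)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    fom = hits / (hits + misses + false_alarms)
    return kappa, fom
```

Because correct rejections (persistent non-urban cells) dominate most landscapes, Kappa can move little while the figure of merit, which ignores them, changes much more, consistent with the pattern reported above.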

20.
The purpose of this study is to detect landslide locations using web-based digital aerial photographs and to map landslide susceptibility from those locations in Jinbu, Korea. Landslide susceptibility maps were generated and validated using frequency ratio, weight of evidence, logistic regression and artificial neural network models in a geographic information system (GIS). Landslide locations in the study area were identified by interpreting digital aerial photographs provided on an Internet portal (http://map.daum.net) and checked by field survey. A spatial database of topography, soil, forest, geology and land use was constructed, and landslide-related factors were extracted. Using these factors, landslide susceptibility was analysed with the four models. Seventy percent of the landslides were used for susceptibility mapping and the remaining 30% for validation. Validation showed accuracies of 84.94%, 82.82%, 87.72% and 81.44% for the frequency ratio, weight of evidence, logistic regression and artificial neural network models, respectively, an overall satisfactory agreement of more than 80%, with the logistic regression model giving the best result. The resulting maps could be used to estimate the risk to population, property and existing infrastructure such as the transportation network.
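The frequency ratio for a factor class is the share of landslide cells falling in that class divided by the share of all cells in that class; values above 1 indicate a positive association. A minimal sketch with hypothetical slope classes is below; summing the ratios over all factors for each cell gives a susceptibility index.

```python
def frequency_ratio(classes, landslide):
    """classes: factor class of each cell; landslide: 0/1 per cell.
    FR(c) = (share of landslide cells in class c) / (share of all cells in class c)."""
    total_cells = len(classes)
    total_slides = sum(landslide)
    cells, slides = {}, {}
    for c, s in zip(classes, landslide):
        cells[c] = cells.get(c, 0) + 1
        slides[c] = slides.get(c, 0) + s
    return {c: (slides[c] / total_slides) / (cells[c] / total_cells) for c in cells}


# Hypothetical slope classes for ten cells, three of which had landslides.
fr = frequency_ratio(["steep"] * 4 + ["flat"] * 6, [1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
```

Here steep cells hold 2/3 of the landslides but only 40% of the area, giving a ratio of 5/3, while flat cells get 5/9.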
