Similar Literature
20 similar articles retrieved.
1.
In this paper, a distributed scheme is proposed for the ensemble learning method of bagging, which aims to address classification problems for large datasets by developing a group of cooperative logistic regression learners in a connected network. Moreover, each weak learner/agent can share its local weight vector with its immediate neighbors through a diffusion strategy in a fully distributed manner. Our diffusion logistic regression algorithms can effectively avoid overfitting and obtain high classification accuracy compared with the non-cooperative mode. Furthermore, simulations with a real dataset are given to demonstrate the effectiveness of the proposed methods in comparison with the centralized one.
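A minimal numerical sketch of the adapt-then-combine diffusion idea summarized above, written in Python with NumPy. The ring topology, step size, and synthetic data are hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data split across 4 agents connected in a ring (hypothetical setup).
n_agents, n_features, n_local = 4, 5, 50
w_true = rng.normal(size=n_features)
X = [rng.normal(size=(n_local, n_features)) for _ in range(n_agents)]
y = [(sigmoid(Xk @ w_true) > rng.random(n_local)).astype(float) for Xk in X]

# Ring topology: each agent averages with itself and its two neighbors.
A = np.eye(n_agents)
for k in range(n_agents):
    A[k, (k - 1) % n_agents] = A[k, (k + 1) % n_agents] = 1.0
A /= A.sum(axis=1, keepdims=True)   # row-stochastic combination weights

w = np.zeros((n_agents, n_features))
lr = 0.1
for _ in range(200):
    # Adapt: each agent takes a local logistic-regression gradient step.
    psi = np.array([
        w[k] - lr * X[k].T @ (sigmoid(X[k] @ w[k]) - y[k]) / n_local
        for k in range(n_agents)
    ])
    # Combine: diffuse intermediate estimates among immediate neighbors.
    w = A @ psi

print(np.round(w[0], 2), np.round(w_true, 2))
```

Each agent first takes a gradient step on its own data shard (adapt) and then averages the intermediate estimates of its immediate neighbors (combine), so no raw data or central coordinator is needed.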

2.
Logistic regression models are frequently used in epidemiological studies for estimating the associations that demographic, behavioral, and risk-factor variables have with a dichotomous outcome, such as disease being present versus absent. After the coefficients in a logistic regression model have been estimated, the goodness-of-fit of the resulting model should be examined, particularly if the purpose of the model is to estimate probabilities of event occurrences. While various goodness-of-fit tests have been proposed, the properties of these tests have been studied under the assumption that observations were independent and identically distributed. Increasingly, epidemiologists are using large-scale sample survey data when fitting logistic regression models, such as the National Health Interview Survey or the National Health and Nutrition Examination Survey. Unfortunately, for such situations no goodness-of-fit testing procedures have been developed or implemented in available software. To address this problem, goodness-of-fit tests for logistic regression models when data are collected using complex sampling designs are proposed. Properties of the proposed tests were examined using extensive simulation studies, and results were compared to traditional goodness-of-fit tests. A Stata ado function svylogitgof for estimating the F-adjusted mean residual test after svylogit fit is available at the author's website http://www.people.vcu.edu/~kjarcher/Research/Data.htm.

3.
Active learning for logistic regression: an evaluation
Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models including maximum entropy, generalized linear, log-linear, and conditional random field models. For the logistic regression model we re-derive the variance reduction method known in experimental design circles as 'A-optimality.' We then run comparisons against different variations of the most widely used heuristic schemes, query by committee and uncertainty sampling, to discover which methods work best for different classes of problems and why. We find that among the strategies tested, the experimental design methods are most likely to match or beat a random sample baseline. The heuristic alternatives produced mixed results, with an uncertainty sampling variant called margin sampling and a derivative method called QBB-MM providing the most promising performance at very low computational cost. Computational running times of the experimental design methods were a bottleneck to the evaluations. Meanwhile, evaluation of the heuristic methods led to an accumulation of negative results. We explore alternative evaluation design parameters to test whether these negative results are merely an artifact of settings where experimental design methods can be applied. The results demonstrate a need for improved active learning methods that will provide reliable performance at a reasonable computational cost.
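As a loose illustration of the margin-sampling heuristic mentioned above, the following scikit-learn sketch queries the pool points whose top-two class probabilities are closest. The synthetic data, batch size, and number of rounds are hypothetical; this is not the A-optimality method:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
labeled = list(range(30))                       # small initial labeled pool
pool = [i for i in range(len(y)) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(10):                             # ten active-learning rounds
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    top2 = np.sort(proba, axis=1)[:, -2:]       # two largest class probabilities per sample
    margin = top2[:, 1] - top2[:, 0]
    query = [pool[i] for i in np.argsort(margin)[:5]]   # smallest margins queried first
    labeled += query                            # oracle labels the queried points
    pool = [i for i in pool if i not in query]

print("labeled examples after active learning:", len(labeled))
```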

4.
Understanding and forecasting the dynamics of urban growth can be helpful for making sustainable land-use policies. Computing models can simulate urban growth, but many require extensive data input, which cannot always be met. Here we propose coupling localized spatio-temporal association (LSTA) analysis and binary logistic regression (BLR) to model urban growth from historical land cover configurations. An indicator called the neighborhood aggregation index (NAI) was defined first to measure configuration enrichment for any land cover type under spatial and temporal contexts. Multiple NAIs for different land cover types were taken into the proposed LSTA-BLR model to project future urban growth. A case study was selected in Wuhan, China, where land cover was classified for each year during 2014–2017 based on Landsat imagery from Google Earth Engine. Urban growth from the year 2016 to 2017 was extracted from the classified land cover maps as the dependent variable, which was modeled by the LSTA-BLR using predictors of the NAIs from the previous years. The LSTA-BLR models were tested under different neighborhood sizes (3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11) and time windows (2016, 2015–2016, and 2014–2016). Results indicated that the best accuracy of the modeled urban growth reached 72.9% under the setting of 5 × 5 neighborhood size and the time window 2014–2016. Urbanization was most likely to occur close to previously urbanized areas and unlikely near neighborhoods enriched with forest and water bodies. The neighborhood size affected the modeled result, and the time window defining the NAIs had the most significant impact on model performance. We conclude that prior land cover configurations should be integrated for mapping future urban growth, and the proposed LSTA-BLR model can serve as a useful tool to understand the near-future urbanization process based on historical land cover configurations.
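The NAI is the authors' indicator; as a rough, simplified stand-in, the sketch below uses the fraction of each land-cover class in a 5 × 5 window as predictors for a binary logistic regression. The raster data are hypothetical and this is not the published LSTA-BLR model:

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical 100 x 100 land-cover map: 0 = other, 1 = urban, 2 = forest, 3 = water.
lc_2016 = rng.integers(0, 4, size=(100, 100))
urban_2017 = rng.integers(0, 2, size=(100, 100))    # stand-in for observed new urban cells

# Fraction of each class in a 5 x 5 neighborhood (crude stand-in for an NAI).
feats = [uniform_filter((lc_2016 == c).astype(float), size=5) for c in range(4)]
X = np.stack([f.ravel() for f in feats], axis=1)
y = urban_2017.ravel()

blr = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients per land-cover class:", np.round(blr.coef_[0], 3))
```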

5.
Sepsis is one of the main causes of death for non-coronary ICU (Intensive Care Unit) patients and has become the 10th most common cause of death in western societies. This is a transversal condition affecting immunocompromised patients, critically ill patients, post-surgery patients, patients with AIDS, and the elderly. In western countries, septic patients account for as much as 25% of ICU bed utilization and the pathology affects 1-2% of all hospitalizations. Its mortality rates range from 12.8% for sepsis to 45.7% for septic shock.

The prediction of mortality caused by sepsis is, therefore, a relevant research challenge from a medical viewpoint. The clinical indicators currently in use for this type of prediction have been criticized for their poor prognostic significance. In this study, we redescribe sepsis indicators through latent model-based feature extraction, using factor analysis. These extracted indicators are then applied to the prediction of mortality caused by sepsis. The reported results show that the proposed method improves on the results obtained with the current standard mortality predictor, which is based on the APACHE II score.
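A compact sketch of the general pipeline described above, latent-factor feature extraction followed by a logistic mortality classifier, using scikit-learn. The synthetic data and the choice of five factors are assumptions; this is not the study's model or the APACHE II score:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for sepsis indicators and a binary mortality outcome.
X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=0)

# Factor analysis compresses the raw indicators into latent factors,
# which then feed a logistic-regression mortality predictor.
model = make_pipeline(FactorAnalysis(n_components=5, random_state=0),
                      LogisticRegression(max_iter=1000))
print("CV accuracy:", round(cross_val_score(model, X, y, cv=5).mean(), 3))
```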

6.
Recent developments in computing and technology, along with the availability of large amounts of raw data, have contributed to the creation of many effective techniques and algorithms in the fields of pattern recognition and machine learning. The main objectives for developing these algorithms include identifying patterns within the available data or making predictions, or both. Great success has been achieved with many classification techniques in real-life applications. With regard to binary data classification in particular, analysis of data containing rare events or disproportionate class distributions poses a great challenge to industry and to the machine learning community. This study examines rare events (REs) with binary dependent variables containing many more non-events (zeros) than events (ones). These variables are difficult to predict and to explain, as has been evidenced in the literature. This research combines rare-events corrections to Logistic Regression (LR) with truncated Newton methods and applies these techniques to Kernel Logistic Regression (KLR). The resulting model, Rare Event Weighted Kernel Logistic Regression (RE-WKLR), is a combination of weighting, regularization, approximate numerical methods, kernelization, bias correction, and efficient implementation, all of which are critical to enabling RE-WKLR to be an effective and powerful method for predicting rare events. Comparing RE-WKLR to SVM and TR-KLR on non-linearly separable, small and large binary rare-event datasets, we find that RE-WKLR is as fast as TR-KLR and much faster than SVM. In addition, according to the statistical significance test, RE-WKLR is more accurate than both SVM and TR-KLR.
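RE-WKLR combines several corrections; as a rough approximation of the weighted-kernel idea only, the sketch below pairs an RBF kernel approximation with class-weighted logistic regression in scikit-learn. The imbalanced data are synthetic, and the bias correction and truncated Newton solver of the paper are not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Imbalanced synthetic data: roughly 5% events, 95% non-events.
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# RBF feature map plus class-weighted logistic regression as a kernelized,
# event-weighted classifier.
model = make_pipeline(
    Nystroem(kernel="rbf", n_components=200, random_state=0),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_tr, y_tr)
print("balanced accuracy:",
      round(balanced_accuracy_score(y_te, model.predict(X_te)), 3))
```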

7.
We present a methodology for managing outsourcing projects from the vendor's perspective, designed to maximize the value to both the vendor and its clients. The methodology is applicable across the outsourcing lifecycle, providing the capability to select and target new clients, manage the existing client portfolio, and quantify the realized benefits to the client resulting from the outsourcing agreement. Specifically, we develop a statistical analysis framework to model client behavior at each stage of the outsourcing lifecycle, including: (1) a predictive model and tool for white-space client targeting and selection (opportunity identification); (2) a model and tool for client risk assessment and project portfolio management (client tracking); and (3) a systematic analysis of outsourcing results (impact analysis) to gain insights into potential benefits of IT outsourcing as a part of a successful management strategy. Our analysis is formulated in a logistic regression framework, modified to allow for non-linear input-output relationships, auxiliary variables, and small sample sizes. We provide examples to illustrate how the methodology has been successfully implemented for targeting, tracking, and assessing outsourcing clients within IBM's Global Services division.

Scope and purpose: The predominant literature on IT outsourcing often examines various aspects of the vendor-client relationship, strategies for successful outsourcing from the client perspective, and key sources of risk to the client, generally ignoring the risk to the vendor. However, in a rapidly changing market, a significant share of risks and responsibilities falls on the vendor, as outsourcing contracts are often renegotiated, providers replaced, or services brought back in house. With the transformation of outsourcing engagements, the risk on the vendor's side has increased substantially, driving the vendor's financial and business performance and eventually impacting the value delivered to the client. As a result, only well-run vendor firms with robust processes and tools that allow identification and active management of risk at all stages of the outsourcing lifecycle are able to deliver value to the client. This paper presents a framework and methodology for managing a portfolio of outsourcing projects from the vendor's perspective throughout the entire outsourcing lifecycle. We address three key stages of the outsourcing process: (1) opportunity identification and qualification (i.e. selection of the most likely new clients), (2) client portfolio risk management during engagement and delivery, and (3) quantification of benefits to the client throughout the life of the deal.

8.
This paper deals with the problem of image retrieval from large image databases. A particularly interesting problem is the retrieval of all images which are similar to the one in the user's mind, taking into account his/her feedback, which is expressed as positive or negative preferences for the images that the system progressively shows during the search. Here we present a novel algorithm for the incorporation of user preferences in an image retrieval system based exclusively on the visual content of the image, which is stored as a vector of low-level features. The algorithm considers the probability of an image belonging to the set of those sought by the user, and models the logit of this probability as the output of a generalized linear model whose inputs are the low-level image features. The image database is ranked by the output of the model and shown to the user, who selects a few positive and negative samples, repeating the process in an iterative way until he/she is satisfied. The problem of the small sample size with respect to the number of features is solved by adjusting several partial generalized linear models and combining their relevance probabilities by means of an ordered weighted averaging operator. Experiments were conducted with 40 users, and they exhibited good performance in finding a target image (4 iterations on average) in a database of about 4700 images. The mean number of positive and negative examples is 4 and 6 per iteration, respectively. A clustering of users into sets also shows consistent patterns of behavior.
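A bare-bones sketch of the relevance-feedback loop described above: fit a logistic model on the user's positive and negative examples over low-level feature vectors, then re-rank the database by predicted relevance. The data are toy values, and the partial-model combination with the ordered weighted averaging operator is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(4700, 16))     # low-level feature vectors of the image database

# Indices the user marked as relevant (1) or not relevant (0) in earlier iterations.
pos, neg = [10, 42, 99, 150], [3, 7, 20, 31, 55, 80]
X = features[pos + neg]
y = np.array([1] * len(pos) + [0] * len(neg))

model = LogisticRegression(max_iter=1000).fit(X, y)
relevance = model.predict_proba(features)[:, 1]    # probability of belonging to the sought set
ranking = np.argsort(-relevance)                   # images shown to the user next, best first
print("top 10 candidates:", ranking[:10])
```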

9.
We develop a goodness-of-fit measure with desirable properties for use in the hierarchical logistic regression setting. The statistic is an unweighted sum of squares (USS) of the kernel-smoothed model residuals. We develop expressions for the moments of this statistic and create a standardized statistic with a hypothesized asymptotic standard normal distribution under the null hypothesis that the model is correctly specified. Extensive simulation studies demonstrate that the kernel-smoothed USS statistic adheres satisfactorily to Type I error rates in a variety of likely data settings. Finally, we discuss issues of bandwidth selection for using our proposed statistic in practice and illustrate its use in an example.
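A hedged sketch of how such a statistic could be computed, smoothing raw residuals with a Gaussian kernel over a single covariate and summing the squares. The standardization through the statistic's moments, which is the paper's contribution, is omitted, and the bandwidth is an arbitrary placeholder:

```python
import numpy as np

def kernel_smoothed_uss(x, y, p, bandwidth=0.5):
    """Unweighted sum of squares of kernel-smoothed residuals (illustrative only)."""
    r = y - p                                      # raw model residuals
    d = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * d ** 2)
    K /= K.sum(axis=1, keepdims=True)              # row-normalized Gaussian weights
    r_smooth = K @ r
    return float((r_smooth ** 2).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=300)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))         # fitted probabilities from some model
y = (rng.random(300) < p).astype(float)            # outcomes consistent with that model
print(kernel_smoothed_uss(x, y, p))
```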

10.
11.
Ridge logistic regression has been used successfully in text categorization problems, and it has been shown to reach the same performance as the Support Vector Machine but with the main advantage of computing a probability value rather than a score. However, the dense solution of the ridge makes its use impractical for large-scale categorization. On the other hand, LASSO regularization is able to produce sparse solutions, but its performance is dominated by the ridge when the number of features is larger than the number of observations and/or when the features are highly correlated. In this paper, we propose a new model selection method which tries to approach the ridge solution by a sparse solution. The method first computes the ridge solution and then performs feature selection. The experimental evaluations show that our method gives a solution which is a good trade-off between the ridge and LASSO solutions.
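A hedged sketch of the general two-step idea, fitting the dense ridge (L2) solution and then keeping only the strongest features for a sparse refit. The selection rule shown (top-k coefficients by magnitude) is an illustrative placeholder, not the paper's criterion:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)

# Step 1: dense ridge (L2-penalized) logistic regression.
ridge = LogisticRegression(penalty="l2", C=1.0, max_iter=2000).fit(X, y)

# Step 2: keep the k features with the largest ridge coefficients and refit,
# giving a sparse model that tries to approach the ridge solution.
k = 10
keep = np.argsort(-np.abs(ridge.coef_[0]))[:k]
sparse = LogisticRegression(penalty="l2", C=1.0, max_iter=2000).fit(X[:, keep], y)
print("selected features:", np.sort(keep))
```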

12.
This study aims to compare the logistic regression model with the classification tree method in determining socio-demographic risk factors that affected the depression status of 1447 women in different postpartum periods. Data obtained from a prevalence study of postpartum depression were used to determine the risk factors. The cut-off value for the calculated postpartum depression scores was taken as 13. Social and demographic risk factors were identified with the help of the classification tree and the logistic regression model. According to the optimal classification tree, a total of six risk factors were determined, but in the logistic regression model only three of their effects were found to be significant. In addition, while the relations among risk factors were evaluated within the tree structure, the logistic regression model provided adjusted main effects for the risk factors. Although the classification success of the maximal tree was better than that of both the optimal tree and the logistic regression model, this tree structure is very difficult to use in practice. The logistic regression model and the optimal tree had lower sensitivity, possibly because the numbers of individuals in the two groups were not equal and clinical risk factors were not considered in this study. The classification tree method gives more detailed diagnostic information than the logistic regression model by evaluating many risk factors together, but selecting correctly among the constructed tree structures is very important for increasing the success of the results and reaching information that can provide appropriate explanations.

13.
In clinical studies, covariates are often measured with error due to biological fluctuations, device error, and other sources. Summary statistics and regression models that are based on mis-measured data will differ from the corresponding analysis based on the "true" covariate. Statistical analysis can be adjusted for measurement error; however, various methods exhibit a tradeoff between convenience and performance. Moment Adjusted Imputation (MAI) is a measurement error correction method for a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated, multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well.

14.
The classical machinery of supervised learning machines relies on a correct set of training labels. Unfortunately, there is no guarantee that all of the labels are correct. Labelling errors are increasingly noticeable in today's classification tasks, as the scale and difficulty of these tasks increase so much that perfect label assignment becomes nearly impossible. Several algorithms have been proposed to alleviate this problem, of which a robust Kernel Fisher Discriminant is a successful example. However, for classification, discriminative models are of primary interest, and rather curiously, the very few existing label-robust discriminative classifiers are limited to linear problems.

15.
Modeling urban growth and generating scenarios are essential for studying the impact and sustainability of an urban hydrologic system. Urban systems are regarded as complex self-organizing systems, where the dynamic transitions from one form of landuse to another occur over a period of time. Therefore, a modeling framework that captures and simulates this complex behavior is essential for generating urban growth scenarios. Cellular Automata (CA)-based models have the potential to model such discrete dynamic systems. In this study, a constraint-based binary CA model was used to predict the future urban growth scenario of the city of Roorkee (India). A hydrologic model was applied on the simulated urban catchment to study its hydrologic response. The Natural Resources Conservation Service Curve Number (NRCS-CN) method, which is suitable for ungauged urban watersheds, was adopted to determine the impact of urban growth on the quantity of storm water runoff over a period of time. The results indicate that urban growth has a linear relationship with peak discharge and time to peak for the catchment under investigation.
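For reference, a small sketch of the NRCS-CN runoff relation mentioned above, using the standard curve-number equations in SI units; the curve numbers and rainfall depth are hypothetical:

```python
def nrcs_cn_runoff(p_mm: float, cn: float) -> float:
    """Direct runoff depth Q (mm) from rainfall P (mm) using the NRCS-CN method."""
    s = 25400.0 / cn - 254.0      # potential maximum retention (mm)
    ia = 0.2 * s                  # initial abstraction (common assumption)
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# Runoff for a 60 mm storm as the catchment urbanizes (higher CN = more impervious).
for cn in (65, 75, 85, 95):
    print(cn, round(nrcs_cn_runoff(60.0, cn), 1), "mm")
```

Higher curve numbers, which accompany increasing imperviousness as the simulated catchment urbanizes, translate directly into larger direct-runoff depths.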

16.
We propose a logistic regression method based on the hybridization of a linear model and product-unit neural network models for binary classification. In the first step, we use an evolutionary algorithm to determine the basic structure of the product-unit model, and afterwards we apply logistic regression in the new space of the derived features. This hybrid model has been applied to seven benchmark data sets and a new microbiological problem. The hybrid model outperforms both the linear part and the nonlinear part, obtaining a good compromise between them, and performs well compared with several other classification learning techniques. We obtain a binary classifier with very promising results in terms of classification accuracy and the complexity of the classifier.
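As a loose illustration of the second step only, the sketch below builds product-unit features (products of inputs raised to real exponents) and fits a logistic regression over them. The exponents here are random placeholders, whereas the paper determines the product-unit structure with an evolutionary algorithm:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X = MinMaxScaler(feature_range=(1.0, 2.0)).fit_transform(X)    # keep inputs positive

# Hypothetical product-unit basis functions: products of inputs raised to real exponents.
rng = np.random.default_rng(0)
W = rng.uniform(-2.0, 2.0, size=(4, X.shape[1]))                # 4 product units, fixed exponents
PU = np.exp(np.log(X) @ W.T)                                    # prod_i x_i ** w_ji for each unit

# Logistic regression in the augmented space of linear plus product-unit features.
Z = np.hstack([X, PU])
clf = LogisticRegression(max_iter=2000).fit(Z, y)
print("training accuracy:", round(clf.score(Z, y), 3))
```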

17.
18.
Forecasting the direction of the daily changes of stock indices is an important yet difficult task for market participants. Advances in data mining and machine learning make it possible to develop more accurate predictions to assist investment decision making. This paper attempts to develop a learning architecture, LR2GBDT, for forecasting and trading stock indices, mainly by cascading the logistic regression (LR) model onto the gradient boosted decision trees (GBDT) model. Without any assumption on the underlying data generating process, raw price data and twelve technical indicators are employed for extracting the information contained in the stock indices. The proposed architecture is evaluated by comparing the experimental results with the LR, GBDT, SVM (support vector machine), NN (neural network) and TPOT (tree-based pipeline optimization tool) models on three stock indices from two different stock markets: an emerging market (Shanghai Stock Exchange Composite Index) and a mature stock market (Nasdaq Composite Index and S&P 500 Composite Stock Price Index). Given the same test conditions, the cascaded model not only outperforms the other models, but also shows statistically and economically significant improvements when exploiting simple trading strategies, even when transaction cost is taken into account.
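One common way to cascade LR onto GBDT is to one-hot encode the leaf each sample reaches in every tree and fit a logistic regression on those indicators. The scikit-learn sketch below illustrates that pattern with synthetic data and is not necessarily the exact LR2GBDT architecture:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Stand-in for technical-indicator features and next-day direction labels.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: gradient boosted decision trees.
gbdt = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Stage 2: one-hot encode the leaf each sample lands in, then fit LR on top.
enc = OneHotEncoder(handle_unknown="ignore")
leaves_tr = gbdt.apply(X_tr).reshape(X_tr.shape[0], -1)
leaves_te = gbdt.apply(X_te).reshape(X_te.shape[0], -1)
lr = LogisticRegression(max_iter=1000).fit(enc.fit_transform(leaves_tr), y_tr)

print("cascade accuracy:", round(lr.score(enc.transform(leaves_te), y_te), 3))
```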

19.
When continuous predictors are present, classical Pearson and deviance goodness-of-fit tests for assessing logistic model fit break down. The Hosmer-Lemeshow test can be used in these situations. While simple to perform and widely used, it does not have desirable power in many cases and provides no further information on the source of any detectable lack of fit. Tsiatis proposed a score statistic to test for covariate regional effects. While conceptually elegant, its lack of a general rule for how to partition the covariate space has, to a certain degree, limited its popularity. We propose a new method for goodness-of-fit testing that uses a very general partitioning strategy (clustering) in the covariate space and either a Pearson statistic or a score statistic. Properties of the proposed statistics are discussed, and a simulation study demonstrates increased power to detect model misspecification in a variety of settings. An application of these different methods to data from a clinical trial illustrates their use. Discussions on further improving the proposed tests and extending this new method to other data situations, such as ordinal response regression models, are also included.
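For context, a short sketch of the classical Hosmer-Lemeshow decile-of-risk test discussed above (standard formulation with toy data; the proposed clustering-based partition is not implemented here):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow statistic: group by deciles of predicted risk,
    compare observed and expected event counts with a chi-square test."""
    order = np.argsort(p)
    groups = np.array_split(order, g)
    stat = 0.0
    for idx in groups:
        obs, exp = y[idx].sum(), p[idx].sum()
        n = len(idx)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n) + 1e-12)
    return stat, chi2.sf(stat, df=g - 2)

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=1000)          # fitted probabilities (toy example)
y = (rng.random(1000) < p).astype(float)        # outcomes consistent with the model
print(hosmer_lemeshow(y, p))
```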

20.
A stochastically constrained cellular model of urban growth
Recent approaches to modeling urban growth use the notion that urban development can be conceived as a self-organizing system in which natural constraints and institutional controls (land-use policies) temper the way in which local decision-making processes produce macroscopic patterns of urban form. In this paper a cellular automata (CA) model that simulates local decision-making processes associated with fine-scale urban form is developed and used to explore the notion of urban systems as self-organizing phenomena. The CA model is integrated with a stochastic constraint model that incorporates broad-scale factors that modify or constrain urban growth. Local neighborhood access rules are applied within a broader neighborhood in which friction-of-distance limitations and constraints associated with socio-economic and bio-physical variables are stochastically realized. The model provides a means for simulating the different land-use scenarios that may result from alternative land-use policies. Application results are presented for possible growth scenarios in a rapidly urbanizing region in south-east Queensland, Australia.
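A toy sketch of a stochastically constrained CA growth step in the spirit described above; the neighborhood rule and the random constraint surface are hypothetical placeholders, not the published model:

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
size = 100
urban = (rng.random((size, size)) < 0.02).astype(float)    # initial urban seed cells
constraint = rng.random((size, size))                      # 0-1 suitability (slope, policy, etc.)

kernel = np.ones((3, 3))
kernel[1, 1] = 0                                           # 8-cell neighborhood

for _ in range(20):                                        # 20 growth iterations
    neighbors = convolve(urban, kernel, mode="constant")
    # Local rule: development pressure grows with urban neighbors,
    # stochastically tempered by the broad-scale constraint surface.
    p_develop = (neighbors / 8.0) * constraint
    urban = np.maximum(urban, (rng.random((size, size)) < p_develop).astype(float))

print("urban cells after simulation:", int(urban.sum()))
```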
