Similar documents (20 results)
1.
Zoran  Igor   《Data & Knowledge Engineering》2008,67(3):504-516
The paper compares different approaches to estimate the reliability of individual predictions in regression. We compare the sensitivity-based reliability estimates developed in our previous work with four approaches found in the literature: variance of bagged models, local cross-validation, density estimation, and local modeling. By combining pairs of individual estimates, we compose a combined estimate that performs better than the individual estimates. We tested the estimates by running data from 28 domains through eight regression models: regression trees, linear regression, neural networks, bagging, support vector machines, locally weighted regression, random forests, and generalized additive models. The results demonstrate the potential of a sensitivity-based estimate, as well as the local modeling of prediction error with regression trees. Among the tested approaches, the best average performance was achieved by estimation using the bagging variance approach, which achieved the best performance with neural networks, bagging and locally weighted regression.
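The bagging-variance reliability estimate described above can be sketched in a few lines: fit an ensemble of regressors on bootstrap resamples and use the spread of their predictions at a query point as a per-prediction reliability score. The noisy-sine data, the polynomial base learner, and all parameter values below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: noisy sine.
X = np.linspace(0, 6, 200)
y = np.sin(X) + rng.normal(0, 0.2, size=X.shape)

def bagged_predictions(X, y, x_query, n_models=30, degree=3):
    """Fit polynomial regressors on bootstrap resamples and collect
    each model's prediction at x_query."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
        coefs = np.polyfit(X[idx], y[idx], degree)
        preds.append(np.polyval(coefs, x_query))
    return np.array(preds)

# Reliability estimate: spread (std. dev.) of the bagged predictions.
preds = bagged_predictions(X, y, x_query=3.0)
point_estimate = preds.mean()
reliability = preds.std()
print(f"prediction={point_estimate:.3f}, bagging-variance estimate={reliability:.3f}")
```

A large spread signals a query point where the ensemble disagrees, i.e. a less reliable individual prediction.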

2.
Analysis of cancer data: a data mining approach
Abstract: Even though cancer research has traditionally been clinical and biological in nature, data-driven analytic studies have become a common complement in recent years. In medical domains where data- and analytics-driven research is successfully applied, novel research directions are identified that further advance clinical and biological studies. In this research, we used three popular data mining techniques (decision trees, artificial neural networks and support vector machines) along with the most commonly used statistical analysis technique, logistic regression, to develop prediction models for prostate cancer survivability. The data set contained around 120,000 records and 77 variables. A k-fold cross-validation methodology was used in model building, evaluation and comparison. The results showed that the support vector machine is the most accurate predictor for this domain (with a test set accuracy of 92.85%), followed by artificial neural networks and decision trees.
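The k-fold cross-validation methodology mentioned above can be sketched as follows. The synthetic two-class data and the simple nearest-centroid classifier are stand-ins for the study's records and models; only the fold mechanics are the point here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary "survivability" data: two Gaussian classes, 4 features.
n = 400
X = np.vstack([rng.normal(0, 1, (n // 2, 4)), rng.normal(1.5, 1, (n // 2, 4))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

def nearest_centroid_fit(X, y):
    # One centroid per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[d.argmin(axis=1)]

def kfold_accuracy(X, y, k=10):
    """Shuffle, split into k folds, train on k-1 folds and test on the
    held-out fold, then average the k test accuracies."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = nearest_centroid_fit(X[train], y[train])
        accs.append((nearest_centroid_predict(model, X[test]) == y[test]).mean())
    return float(np.mean(accs))

acc = kfold_accuracy(X, y)
print(f"10-fold CV accuracy: {acc:.3f}")
```

Averaging over folds gives a comparison metric that uses every record for both training and testing, which is why the study adopts it for model comparison.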

3.
The thin-film transistor liquid-crystal display (TFT-LCD) industry has developed rapidly in recent years. Because TFT-LCD manufacturing is highly complex and requires different tools for different products, accurately estimating the cost of manufacturing TFT-LCD equipment is essential. Conventional cost estimation models include linear regression (LR), artificial neural networks (ANNs), and support vector regression (SVR). Nevertheless, in accordance with recent evidence that a hierarchical structure outperforms a flat structure, this study proposes a hierarchical classification and regression (HCR) approach for improving the accuracy of cost predictions for TFT-LCD inspection and repair equipment. Specifically, first-level analyses by HCR classify new unknown cases into specific classes. The cases are then input into the corresponding prediction models for the final output. Experimental results based on a real-world dataset containing data for TFT-LCD equipment development projects performed by a leading Taiwanese provider show that the three prediction models based on the HCR approach are generally comparable to or better than three conventional flat models (LR, ANN, and SVR) in terms of prediction accuracy. In particular, the 4-class and 5-class support vector machines in the first-level HCR, combined with individual SVR, obtain the lowest root mean square error (RMSE) and mean average percentage error (MAPE) rates, respectively.

4.
Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied for estimating software development effort. Some works have demonstrated that the level of accuracy in software effort estimates strongly depends on the values of the parameters of these methods. In addition, it has been shown that the selection of the input features may also have an important influence on estimation accuracy.
Objective: This paper proposes and investigates the use of a genetic algorithm for simultaneously (1) selecting an optimal input feature subset and (2) optimizing the parameters of machine learning methods, aiming at a higher accuracy level for software effort estimates.
Method: Simulations are carried out using six benchmark data sets of software projects, namely Desharnais, NASA, COCOMO, Albrecht, Kemerer, and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models.
Results: In all data sets, the simulations showed that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations also demonstrated that the proposed method outperforms some methods reported in the recent literature for software effort estimation. Furthermore, the use of the GA for feature selection considerably reduced the number of input features for five of the data sets used in our analysis.
Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. It also reduces model complexity, which may help in understanding the relevance of each input feature; some input features can therefore be ignored without loss of accuracy in the estimations.
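A genetic algorithm that jointly encodes a feature mask and a model parameter, as the paper proposes, can be sketched like this. The synthetic data, ridge-regression base learner, candidate parameter grid, and all GA settings below are illustrative assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "effort" data: 8 candidate features, only the first 3 matter.
n, p = 120, 8
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.3, n)

LAMBDAS = [0.01, 0.1, 1.0, 10.0]  # candidate ridge parameters

def fitness(chrom):
    """Negative validation MSE of ridge regression using the encoded
    feature subset (genes 0..p-1) and ridge parameter (last gene)."""
    mask = chrom[:p].astype(bool)
    if not mask.any():
        return -np.inf
    lam = LAMBDAS[chrom[p]]
    Xs = X[:, mask]
    tr, va = slice(0, 90), slice(90, n)
    A = Xs[tr].T @ Xs[tr] + lam * np.eye(mask.sum())
    w = np.linalg.solve(A, Xs[tr].T @ y[tr])
    return -np.mean((Xs[va] @ w - y[va]) ** 2)

def random_chrom():
    return np.append(rng.integers(0, 2, p), rng.integers(0, len(LAMBDAS)))

pop = [random_chrom() for _ in range(20)]
for gen in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                     # elitist selection
    children = []
    for _ in range(10):
        a, b = rng.choice(10, 2, replace=False)
        cut = rng.integers(1, p)
        child = np.append(parents[a][:cut], parents[b][cut:])  # crossover
        if rng.random() < 0.3:                                 # mutation
            g = rng.integers(0, p + 1)
            child[g] = rng.integers(0, 2 if g < p else len(LAMBDAS))
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best[:p]), "lambda:", LAMBDAS[best[p]])
```

The single chromosome carries both decisions, so selection pressure optimizes the feature subset and the regularization parameter together rather than in separate passes.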

5.
We present a data-driven method for monitoring machine status in manufacturing processes. Audio and vibration data from precision machining are used for inference in two operating scenarios: (a) variable machine health states (anomaly detection); and (b) settings of machine operation (state estimation). Audio and vibration signals are first processed through Fast Fourier Transform and Principal Component Analysis to extract transformed and informative features. These features are then used in the training of classification and regression models for machine state monitoring. Specifically, three classifiers (K-nearest neighbors, convolutional neural networks and support vector machines) and two regressors (support vector regression and neural network regression) were explored, in terms of their accuracy in machine state prediction. It is shown that the audio and vibration signals are sufficiently rich in information about the machine that 100% state classification accuracy could be accomplished. Data fusion was also explored, showing overall superior accuracy of data-driven regression models.
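The FFT-then-PCA feature pipeline described above can be sketched as follows. The simulated vibration windows, sampling rate, and number of retained components are illustrative assumptions; the paper's signals come from actual machining.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated vibration signals: 60 windows x 1024 samples, two machine
# states that differ in their dominant frequency (50 Hz vs 120 Hz).
fs, n_samples = 1000.0, 1024
t = np.arange(n_samples) / fs
signals = []
for i in range(60):
    f0 = 50.0 if i % 2 == 0 else 120.0     # state-dependent frequency
    signals.append(np.sin(2 * np.pi * f0 * t) + 0.5 * rng.normal(size=n_samples))
signals = np.array(signals)

# Step 1: FFT magnitude spectra as raw frequency-domain features.
spectra = np.abs(np.fft.rfft(signals, axis=1))

# Step 2: PCA (via SVD on the centered spectra) to compress the features.
centered = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
features = centered @ Vt[:5].T             # keep 5 principal components

explained = (S[:5] ** 2).sum() / (S ** 2).sum()
print(f"feature shape: {features.shape}, variance explained: {explained:.2%}")
```

The compressed feature vectors would then feed the classifiers and regressors listed in the abstract.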

6.
Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students register but comparatively few drop out. Classification techniques applied to imbalanced datasets can yield deceivingly high prediction accuracy, where the overall accuracy is driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve the predictive accuracy for the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (over-sampling, under-sampling and synthetic minority over-sampling, SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used a large, feature-rich institutional student data set (covering the years 2005 to 2011) to assess the efficacy of the balancing techniques as well as the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with a 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately identify at-risk students and help reduce student dropout rates.
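The SMOTE technique named above generates synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbors. Below is a minimal from-scratch sketch of that idea; the toy "retained vs dropout" data and all parameters are assumptions, and a production system would use a maintained implementation such as imbalanced-learn's.

```python
import numpy as np

rng = np.random.default_rng(4)

def smote(X_minority, n_synthetic, k=5):
    """Minimal SMOTE: for each synthetic point, pick a random minority
    sample and interpolate toward one of its k nearest minority neighbors."""
    n = len(X_minority)
    # Pairwise distances within the minority class.
    d = np.linalg.norm(X_minority[:, None] - X_minority[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors per sample
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(0, n)
        j = neighbors[i, rng.integers(0, k)]
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

# Imbalanced toy data: 200 "retained" vs 20 "dropout" students, 3 features.
X_majority = rng.normal(0, 1, (200, 3))
X_minority = rng.normal(2, 1, (20, 3))

X_new = smote(X_minority, n_synthetic=180)
print("balanced minority size:", len(X_minority) + len(X_new))
```

Because the synthetic points lie on segments between real minority samples, the classifier sees a denser minority region instead of exact duplicates, which is what distinguishes SMOTE from plain over-sampling.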

7.
There has been intensive research from academics and practitioners regarding models for predicting bankruptcy and default events, for credit risk management. Seminal academic research has evaluated bankruptcy using traditional statistics techniques (e.g. discriminant analysis and logistic regression) and early artificial intelligence models (e.g. artificial neural networks). In this study, we test machine learning models (support vector machines, bagging, boosting, and random forest) to predict bankruptcy one year prior to the event, and compare their performance with results from discriminant analysis, logistic regression, and neural networks. We use data from 1985 to 2013 on North American firms, integrating information from the Salomon Center database and Compustat, analysing more than 10,000 firm-year observations. The key insight of the study is a substantial improvement in prediction accuracy using machine learning techniques, especially when, in addition to the original Altman's Z-score variables, we include six complementary financial indicators. Based on Carton and Hofer (2006), we use new variables, such as the operating margin, change in return-on-equity, change in price-to-book, and growth measures related to assets, sales, and number of employees, as predictive variables. Machine learning models show, on average, approximately 10% higher accuracy relative to traditional models. Comparing the best models with all predictive variables, the random forest machine learning technique led to 87% accuracy, whereas logistic regression and linear discriminant analysis led to 69% and 50% accuracy, respectively, in the testing sample. We find that bagging, boosting, and random forest models outperform the other techniques, and that prediction accuracy in the testing sample improves for all models when the additional variables are included.
Our research adds to the continuing debate about the superiority of computational methods over statistical techniques, as in Tsai, Hsu, and Yen (2014) and Yeh, Chi, and Lin (2014). In particular, among the machine learning mechanisms, we do not find SVM to lead to higher accuracy rates than other models. This result contradicts outcomes from Danenas and Garsva (2015) and Cleofas-Sanchez, Garcia, Marques, and Senchez (2016), but corroborates, for instance, Wang, Ma, and Yang (2014), Liang, Lu, Tsai, and Shih (2016), and Cano et al. (2017). Our study supports the applicability of expert systems by practitioners, as in Heo and Yang (2014), Kim, Kang, and Kim (2015) and Xiao, Xiao, and Wang (2016).

8.
Research on stock market prediction has become more popular in recent years. Numerous researchers have tried to predict immediate future stock prices or indices based on technical indices with various mathematical models and machine learning techniques such as artificial neural networks (ANN), support vector machines (SVM) and ARIMA models. Although some studies in the literature exhibit satisfactory prediction performance when the average percentage error and root mean square error are used as performance metrics, the accuracy of predicting whether the stock market goes up or down is seldom analyzed. This paper employs a wrapper approach to select the optimal feature subset from an original feature set composed of 23 technical indices, and then uses a voting scheme that combines different classification algorithms to predict the trend in the Korea and Taiwan stock markets. Experimental results show that the wrapper approach achieves better performance than commonly used feature filters, such as the χ2-statistic, information gain, ReliefF, symmetrical uncertainty and CFS. Moreover, the proposed voting scheme outperforms single classifiers such as SVM, k-nearest neighbor, back-propagation neural networks, decision trees, and logistic regression.
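A wrapper approach, as used above, scores candidate feature subsets by the cross-validated accuracy of the actual classifier rather than by a filter statistic. The greedy forward-selection sketch below illustrates this; the synthetic "technical index" data and the nearest-centroid classifier are assumptions standing in for the paper's 23 indices and classifier ensemble.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "technical index" data: 10 candidate indices, 2 carry signal
# about the up/down trend label.
n, p = 300, 10
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 3] + rng.normal(0, 0.5, n) > 0).astype(int)

def cv_accuracy(cols, k=5):
    """Score a candidate feature subset with a nearest-centroid classifier."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        c0 = X[tr][y[tr] == 0][:, cols].mean(axis=0)
        c1 = X[tr][y[tr] == 1][:, cols].mean(axis=0)
        d0 = np.linalg.norm(X[te][:, cols] - c0, axis=1)
        d1 = np.linalg.norm(X[te][:, cols] - c1, axis=1)
        accs.append(((d1 < d0).astype(int) == y[te]).mean())
    return float(np.mean(accs))

# Greedy forward wrapper: add the feature that most improves CV accuracy,
# stop when no candidate helps.
selected, best_acc = [], 0.0
improved = True
while improved:
    improved = False
    for f in range(p):
        if f in selected:
            continue
        acc = cv_accuracy(selected + [f])
        if acc > best_acc + 1e-3:
            best_acc, best_f, improved = acc, f, True
    if improved:
        selected.append(best_f)

print("selected indices:", sorted(selected), "CV accuracy:", round(best_acc, 3))
```

Because the selection criterion is the classifier's own accuracy, the chosen subset is tailored to the model that will use it, which is the advantage the abstract reports over filter methods.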

9.
Comparative analysis of data mining methods for bankruptcy prediction
A great deal of research has been devoted to the prediction of bankruptcy, including the application of data mining. Neural networks, support vector machines, and other algorithms often fit data well, but because they lack comprehensibility, they are considered black-box technologies. Conversely, decision trees are more comprehensible to human users. However, sometimes far too many rules result in another form of incomprehensibility. The number of rules obtained from decision tree algorithms can be controlled to some degree by setting different minimum support levels. This study applies a variety of data mining tools to bankruptcy data, with the purpose of comparing accuracy and the number of rules. For this data, decision trees were found to be relatively more accurate than neural networks and support vector machines, but produced more rule nodes than desired. Adjusting the minimum support yielded more tractable rule sets.

10.
Bank failures threaten the economic system as a whole. Therefore, predicting bank financial failures is crucial to prevent and/or lessen their negative effects on the economic system. This is essentially a classification problem: categorizing banks as healthy or unhealthy. This study aims to apply various neural network techniques, support vector machines and multivariate statistical methods to the bank failure prediction problem in a Turkish case, and to present a comprehensive computational comparison of the classification performance of the techniques tested. Twenty financial ratios in six feature groups, covering capital adequacy, asset quality, management quality, earnings, liquidity and sensitivity to market risk (CAMELS), are selected as predictor variables. Four data sets with different characteristics are developed using official financial data to improve prediction performance, and each is divided into training and validation sets. In the category of neural networks, four architectures are employed: multi-layer perceptron, competitive learning, self-organizing map and learning vector quantization. The multivariate statistical methods tested are multivariate discriminant analysis, k-means cluster analysis and logistic regression analysis. Experimental results are evaluated with respect to classification accuracy. Results show that the multi-layer perceptron and learning vector quantization can be considered the most successful models in predicting the financial failure of banks.

11.
Industrial robots (IRs) are widely used to increase productivity and efficiency in manufacturing industries. Therefore, it is critical to reduce the energy consumption of IRs to maximize their use in polishing, assembly, welding, and handling tasks. This study adopted a data-driven modeling approach using a batch-normalized long short-term memory (BN-LSTM) network to construct a robust energy-consumption prediction model for IRs. The adopted method applies batch normalization (BN) to the input-to-hidden transition to allow faster convergence of the model. We compared the prediction accuracy with that of the 1D-ResNet14 model on a public UR (UR3e and UR10e) database. The adopted model achieved a root mean square (RMS) error of 2.82 W, compared with the 6.52 W error of the 1D-ResNet14 model, a performance improvement of 56.74%. We also compared prediction accuracy on the UR3e dataset using machine learning and deep learning models such as regression trees, linear regression, ensemble trees, support vector regression, multilayer perceptrons, and a convolutional neural network-gated recurrent unit. Furthermore, the layers of the well-trained UR3e power model were transferred to the UR10e cobot to construct a rapid power model with an 80% reduced UR10e dataset. This transfer learning approach showed an RMS error of 3.67 W, outperforming the 1D-ResNet14 model (RMS error: 4.78 W). Finally, the BN-LSTM model was validated using unseen test datasets from a Yaskawa polishing motion task, with an average prediction accuracy of 99%.

12.
In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to accurately classify some e-learning students, whereas another may succeed, three decision schemes, which combine in different ways the results of the three machine learning techniques, were also tested. The method was examined in terms of overall accuracy, sensitivity and precision and its results were found to be significantly better than those reported in relevant literature.

13.
We present a framework for the unsupervised segmentation of switching dynamics using support vector machines. Following the architecture by Pawelzik et al., where annealed competing neural networks were used to segment a nonstationary time series, in this paper, we exploit the use of support vector machines, a well-known learning technique. First, a new formulation of support vector regression is proposed. Second, an expectation-maximization step is suggested to adaptively adjust the annealing parameter. Results indicate that the proposed approach is promising.

14.
Demand for high-quality, affordable healthcare services is increasing with the aging population in the US. To cope with this situation, decision makers in healthcare (managerial, administrative and/or clinical) need to be increasingly more effective and efficient at what they do. Along with expertise, information and knowledge are the other key resources for better decisions. Data mining techniques are becoming a popular tool for extracting information and knowledge hidden deep within large healthcare databases. In this study, using a large, feature-rich, nationwide inpatient database along with four popular machine learning techniques, we developed predictive models, and using an information-fusion-based sensitivity analysis on these models, we explained the surgical outcome of patients undergoing coronary artery bypass grafting. Support vector machines produced the best prediction results (87.74%), followed by decision trees and neural networks. Studies like this illustrate that accurate prediction and better understanding of such complex medical interventions can potentially lead to more favorable outcomes and optimal use of limited healthcare resources.

15.
SVR-based chaotic time series prediction
Support vector machines are a novel machine learning method based on statistical learning theory; owing to their excellent learning performance, the technique has become a research focus of the international machine learning community and has been widely used to solve classification and regression problems. This paper introduces several versions of the support vector regression algorithm, applies them to chaotic time series prediction, and compares their predictive performance, providing a basis for choosing a suitable model in practical applications.
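Chaotic time series prediction of this kind typically starts from a delay embedding: each target value is predicted from the m preceding values. The sketch below builds such an embedding for the logistic map and uses k-nearest-neighbor regression as a simple stand-in for the SVR regressors the paper compares; the series, embedding dimension, and regressor choice are all illustrative assumptions.

```python
import numpy as np

# Chaotic series: logistic map x_{t+1} = 4 x_t (1 - x_t).
x = np.empty(1000)
x[0] = 0.3
for t in range(999):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])

# Delay embedding: predict x[t] from the m previous values.
m = 3
X = np.stack([x[i:i + m] for i in range(len(x) - m)])
y = x[m:]
X_tr, y_tr, X_te, y_te = X[:800], y[:800], X[800:], y[800:]

def knn_predict(X_tr, y_tr, X_te, k=3):
    """k-nearest-neighbor regression on the embedded vectors (a simple
    stand-in here for the SVR regressor)."""
    d = np.linalg.norm(X_te[:, None] - X_tr[None, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return y_tr[nn].mean(axis=1)

pred = knn_predict(X_tr, y_tr, X_te)
rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
print(f"one-step-ahead RMSE: {rmse:.4f}")
```

Swapping the regressor for an SVR variant, as the paper does, keeps the same embedded feature matrix; only the learner applied to it changes.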

16.
The scour below spillways can endanger the stability of dams. Hence, determining the scour depth downstream of spillways is of vital importance. Recently, soft computing models and, in particular, artificial neural networks (ANNs) have been used for scour depth prediction. However, ANNs are not as comprehensible and easy to use as empirical formulas for the estimation of scour depth. Therefore, in this study, two decision-tree methods based on model trees and classification and regression trees were employed for the prediction of scour depth downstream of free overfall spillways. The advantage of model trees and classification and regression trees over ANNs is that these models can provide practical prediction equations. A comparison is made between the results obtained in the present study and those obtained using empirical formulas. The statistical measures indicate that the proposed soft computing approaches outperform empirical formulas. The results also indicated that model trees were more accurate than classification and regression trees for the estimation of scour depth.

17.
We present a method for explaining predictions for individual instances. The presented approach is general and can be used with all classification models that output probabilities. It is based on the decomposition of a model's predictions on individual contributions of each attribute. Our method works for the so-called black box models such as support vector machines, neural networks, and nearest neighbor algorithms, as well as for ensemble methods such as boosting and random forests. We demonstrate that the generated explanations closely follow the learned models and present a visualization technique that shows the utility of our approach and enables the comparison of different prediction methods.

18.
In recent years, there have been many studies focusing on improving the accuracy of prediction of transmembrane segments, and many significant results have been achieved. In spite of these considerable results, the existing methods lack the ability to explain how a learning result is reached and why a prediction decision is made. The explanation of a decision is important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. While support vector machines (SVM) have shown strong generalization ability in a number of application areas, including protein structure prediction, they are black-box models and hard to understand. On the other hand, decision trees provide insightful interpretation but have lower prediction accuracy. In this paper, we present an innovative approach to rule generation for understanding prediction of transmembrane segments by integrating the merits of both SVMs and decision trees. This approach combines SVMs with decision trees into a new algorithm called SVM_DT. The results of experiments for prediction of transmembrane segments on the 165 low-resolution test data set show that not only is the comprehensibility of SVM_DT much better than that of SVMs, but the test accuracy of the generated rules is also high. Rules with confidence values over 90% have an average prediction accuracy of 93.4%. We also found that the confidence and prediction accuracy values of the rules generated by SVM_DT are quite consistent. We believe that SVM_DT can be used not only for transmembrane segment prediction, but also for understanding the prediction. The prediction and its interpretation can be used to guide biological experiments.
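The core idea of SVM_DT, distilling an accurate black box into interpretable rules, can be sketched as follows: label the data with the black box's own predictions, then fit a simple rule to those labels and measure how faithfully it mimics the black box. Here a linear classifier plays the role of the SVM and a single depth-1 rule ("stump") plays the role of the decision tree; both substitutions, the data, and the fidelity metric are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy stand-in for the trained black box: a linear classifier playing
# the role of the SVM in the SVM_DT pipeline.
n, p = 500, 4
X = rng.normal(size=(n, p))
w = np.array([1.5, -1.0, 0.2, 0.0])
black_box = lambda X: (X @ w > 0).astype(int)

# Distillation step: fit an interpretable depth-1 rule to the black
# box's own predictions, then measure the rule's fidelity to it.
y_bb = black_box(X)

best = (0.0, None, None)
for f in range(p):
    for thr in np.quantile(X[:, f], np.linspace(0.05, 0.95, 19)):
        pred = (X[:, f] > thr).astype(int)
        # Allow either orientation of the rule.
        fidelity = max((pred == y_bb).mean(), ((1 - pred) == y_bb).mean())
        if fidelity > best[0]:
            best = (fidelity, f, thr)

fidelity, f, thr = best
print(f"rule: 'feature {f} > {thr:.2f}' agrees with the black box {fidelity:.1%}")
```

A deeper tree would recover more of the black box's behavior at the cost of more rules, which is exactly the comprehensibility/accuracy trade-off the abstract discusses.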

19.
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), aimed at preventing delays and misdiagnosis, using the proposed robust inference system. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The methods evaluated include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by a Tabu search algorithm as the classifier and Haar wavelets as the projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All experiments are conducted at 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally shows the best results with supportive statistical inference.

20.
Over the past years, artificial intelligence techniques such as artificial neural networks have been widely used in hydrological modeling studies. In spite of their advantages, these techniques have drawbacks, including the possibility of getting trapped in local minima, overtraining, and subjectivity in determining model parameters. In the last few years, a new kernel-based technique, support vector machines (SVM), has become popular in modeling studies because of its advantages over established artificial intelligence techniques. In addition, the relevance vector machines (RVM) approach has been proposed to recast the main ideas behind SVM in a Bayesian context. The main purpose of this study is to examine the applicability and capability of RVM for long-term flow prediction and to compare its performance with feed-forward neural networks, SVM, and multiple linear regression models. Meteorological data (rainfall and temperature) and lagged rainfall data were used in the modeling application. Several widely used statistical performance measures were applied to evaluate the models. According to the evaluations, the RVM method provided an improvement in model performance compared with the other employed methods. It is thus an alternative to popular soft computing methods for long-term flow prediction, offering at least comparable efficiency.
