Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
In this paper, we compare some traditional statistical methods for predicting financial distress with some more “unconventional” methods, such as decision tree classification, neural networks, and evolutionary computation techniques, using data collected from 200 Taiwan Stock Exchange Corporation (TSEC) listed companies. Empirical experiments were conducted using a total of 42 ratios, including 33 financial, 8 non-financial, and 1 combined macroeconomic index, with principal component analysis (PCA) used to extract suitable variables. This paper makes four critical contributions: (1) with nearly 80% fewer financial ratios after PCA, the models still provide highly accurate forecasts of financial bankruptcy; (2) we show that traditional statistical methods are better able to handle large datasets without sacrificing prediction performance, while intelligent techniques achieve better performance with smaller datasets and are adversely affected by huge datasets; (3) empirical results show that C5.0 and CART provide the best prediction performance for imminent bankruptcies; and (4) Support Vector Machines (SVMs) with evolutionary computation provide a good balance of high-accuracy short- and long-term performance predictions for healthy and distressed firms. Therefore, the experimental results show that Particle Swarm Optimization (PSO) integrated with SVM (PSO-SVM) could be considered for predicting potential financial distress.

2.
Stock and derivative securities markets around the world have been evolving continuously and rapidly. As markets develop, the operating status of enterprises is disclosed periodically in financial statements. Unfortunately, if a firm's executives intentionally dress up the financial statements, no possibility of financial distress will be observable in either the short or the long run. Many financial crises have occurred recently in international markets, such as the Enron, Kmart, Global Crossing, WorldCom, and Lehman Brothers events. How these events affect business worldwide, especially the financial services industry and investors, has become a matter of public concern. To improve the accuracy of the financial distress prediction model, this paper referred to the operating rules of the Taiwan Stock Exchange Corporation (TSEC) and collected 100 listed companies as the initial sample. The empirical experiment used a total of 37 ratios, composed of financial and other non-financial ratios, and applied principal component analysis (PCA) to extract suitable variables. Decision tree (DT) classification methods (C5.0, CART, and CHAID) and logistic regression (LR) were used to implement the financial distress prediction model. The experiments yielded satisfactory results, which support the feasibility and validity of the proposed methods for predicting the financial distress of listed companies. This paper makes four critical contributions: (1) the more PCA was used, the lower the accuracy obtained by the DT classification approach, whereas PCA had no significant impact on the LR approach; (2) the closer to the actual occurrence of financial distress, the higher the accuracy of the DT classification approach, with 97.01% correct classification 2 seasons prior to the occurrence of financial distress; (3) the empirical results show that PCA increases the error of classifying companies in financial crisis as normal companies; and (4) the DT classification approach obtains better prediction accuracy than the LR approach in the short run (less than one year), whereas the LR approach obtains better prediction accuracy in the long run (more than one and a half years). Therefore, this paper proposes that the artificial intelligence (AI) approach could be a more suitable methodology than traditional statistics for predicting the potential financial distress of a company in the short run.

3.
The human liver is one of the major organs of the body, and liver disease can cause many problems in human life. Fast and accurate prediction of liver disease allows early and effective treatment, and various data mining techniques can help improve that prediction. Because of the importance of liver disease and the increasing number of people who suffer from it, we studied liver disease using two well-known data mining methods. In this paper, novel decision tree based algorithms are used, which consider more factors in general and yield predictions with higher accuracy than other studies of liver disease. We consider the 583 instances of the liver disease dataset from the UCI repository, consisting of 416 records of liver disease and 167 records of healthy liver. The dataset is analyzed with two algorithms, Boosted C5.0 and CHAID. Until now, no work in the literature has used boosted C5.0 and CHAID to create rules for liver disease. Our results show that in both algorithms the DB, ALB, SGPT, TB and A/G factors have a significant impact on predicting liver disease; according to the rules generated by both algorithms, the important ranges are DB = [10.900–1.200], ALB = [4.00–4.300], SGPT = [34–37], TB = [0.600–1.200] (by boosted C5.0), and A/G = [1.180–1.390]. In the Boosted C5.0 algorithm, Alkphos, SGOT and Age also have a significant impact on the prediction of liver disease. Comparing the performance of the algorithms shows that C5.0 with boosting has an accuracy of 93.75%, better than the CHAID algorithm's 65.00%. Another important achievement of this paper concerns the ability of both algorithms to produce rules in one class for liver disease. The results of our assessment show that the Boosted C5.0 and CHAID algorithms are capable of producing rules for liver disease. Our results also show that boosted C5.0 considers gender, a factor missing in many other studies of liver disease; using the rules generated by the boosted C5.0 algorithm, we obtained the important result that females are less susceptible to liver disease than males. Therefore, our proposed computer-aided diagnostic methods, as an expert and intelligent system, have a considerable impact on liver disease detection. Based on the results obtained, our model performed better than existing methods in the literature.

4.
A decision tree approach was applied and validated for analysis of landslide susceptibility using a geographic information system (GIS). The study area was the Pyeongchang area in Gangwon Province, Korea, where many landslides occurred in 2006 and where the 2018 Winter Olympics are to be held. Spatial data, such as landslides, topography, and geology, were detected, collected, and compiled in a database using remote sensing and GIS. The 3994 recorded landslide locations were randomly split 50/50 for training and validation of the models. A decision tree model, which is a type of data-mining classification model, was applied and decision trees were constructed using the chi-squared (χ2) automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. Also, as a reference, a frequency-ratio model was applied using the same database. The relationships between the detected landslide locations and their factors were identified and quantified by frequency-ratio and decision tree models. The relationships were used as factor ratings in the overlay analysis to create landslide susceptibility indices and maps. Then, the resulting landslide-susceptibility maps were validated using area-under-the-curve (AUC) analysis with the landslide area data that had not been used for training the model. The decision tree models using the CHAID and QUEST algorithms had accuracies of 81.56% and 80.91%, respectively, which were somewhat better than the results for the frequency-ratio model (80.15%). These results indicate that decision tree models using the CHAID and QUEST algorithms can be useful for landslide susceptibility analysis.  相似文献   
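
The 50/50 random split and the area-under-the-curve validation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' GIS workflow: the function names `split_half` and `auc`, the seed, and the toy scores are assumptions, and the AUC here is the generic rank-based one (the probability that a landslide cell outscores a non-landslide cell).

```python
import random

def split_half(items, seed=0):
    """Randomly split items into two halves (training, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def auc(pos_scores, neg_scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical IDs standing in for the 3994 recorded landslide locations.
train_ids, valid_ids = split_half(range(3994))

# Hypothetical susceptibility scores for validation cells with and without landslides.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise loop is quadratic in the number of cells, which is acceptable for a sketch; a sort-based ranking would be the usual choice for a full raster.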

5.
While extensive research in data mining has been devoted to developing better classification algorithms, relatively little research has been conducted to examine the effects of feature construction, guided by domain knowledge, on classification performance. However, in many application domains, domain knowledge can be used to construct higher-level features to potentially improve performance. For example, past research and regulatory practice in early warning of bank failures has resulted in various explanatory variables, in the form of financial ratios, that are constructed based on bank accounting variables and are believed to be more effective than the original variables in identifying potential problem banks. In this study, we empirically compare the performance of two sets of classifiers for bank failure prediction, one built using raw accounting variables and the other built using constructed financial ratios. Four popular data mining methods are used to learn the classifiers: logistic regression, decision tree, neural network, and k-nearest neighbor. We evaluate the classifiers on the basis of expected misclassification cost under a wide range of possible settings. The results of the study strongly indicate that feature construction, guided by domain knowledge, significantly improves classifier performance and that the degree of improvement varies significantly across the methods.  相似文献   
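
To make "expected misclassification cost under a wide range of possible settings" concrete, here is a small sketch. The formula is the standard prior-weighted cost of the two error types; the function name, the error rates, the failure prior, and the cost ratios are all hypothetical and do not come from the study.

```python
def expected_misclassification_cost(fn_rate, fp_rate, prior_fail, cost_fn, cost_fp):
    """Expected cost per bank: each error type weighted by class prior and unit cost."""
    missed_failures = prior_fail * fn_rate * cost_fn
    false_alarms = (1.0 - prior_fail) * fp_rate * cost_fp
    return missed_failures + false_alarms

# Two hypothetical classifiers: A raises fewer false alarms, B misses fewer failures.
classifier_a = {"fn_rate": 0.2, "fp_rate": 0.1}
classifier_b = {"fn_rate": 0.1, "fp_rate": 0.3}

# Sweep the relative cost of a missed failure (assumed values) at a 5% failure prior.
for ratio in (1, 10, 50):
    cost_a = expected_misclassification_cost(prior_fail=0.05, cost_fn=ratio, cost_fp=1.0, **classifier_a)
    cost_b = expected_misclassification_cost(prior_fail=0.05, cost_fn=ratio, cost_fp=1.0, **classifier_b)
    print(ratio, round(cost_a, 4), round(cost_b, 4))
```

Sweeping the cost ratio shows why a single setting can mislead: the preferred classifier can change as missed failures become more expensive relative to false alarms.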

6.
Corporate financial failure prediction is of critical importance for the decision making of managers, investors and shareholders. In current financial failure prediction models, various financial ratios are usually selected as prediction variables, which implies that these financial ratios represent the possible causes of financial failure. It is widely recognized that a main cause of financial failure is poor management, and that business operation efficiency is a good reflection of a firm’s management. In this paper, we propose a financial failure prediction model using efficiency as a predictor variable. In the proposed method, data envelopment analysis (DEA) is employed as a tool to evaluate the input/output efficiency of each corporation. To verify the efficacy of efficiency as a predictor, we use data on corporations listed on the Shanghai Stock Exchange (SSE) and compare the accuracy of the same prediction method with and without the variable. Experimental results for three main financial failure prediction models, i.e., multiple discriminant analysis (MDA), logistic regression, and support vector machines (SVMs), all suggest that efficiency is an effective predictor variable.

7.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, in 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.  相似文献   

8.
Predicting corporate failure is an important management science problem. This is a typical classification question where the objective is to determine which indicators are involved in the failure/success of a corporation. Despite the importance of this problem, until now only classical machine learning tools have been considered to tackle this classification task. The objective of this paper is twofold. On the one hand, we introduce novel discerning measures to rank independent variables in a generic classification task. On the other hand, we apply boosting techniques to improve the accuracy of a classification tree. We apply this methodology to a set of European firms, considering the usual predicting variables such as financial ratios, as well as including novel variables rarely used before in corporate failure prediction, such as firm size, activity and legal structure. We show that our approach decreases the generalization error by about thirty percent with respect to the error produced with a classification tree. In addition, the most important ratios deal with profitability and indebtedness, as is usual in failure prediction studies.
E. A. Cortés · M. G. Martínez · N. G. Rubio. The authors teach Statistics at the Faculty of Economic and Business Sciences in the University of Castilla-La Mancha. Esteban Alfaro completed his degree in Business in 1999 and got his Ph.D. in Economics in 2005, both in the University of Castilla-La Mancha. His thesis dealt with the application of ensemble classifiers to corporate failure prediction. Matías Gámez got his degree in Mathematics at the University of Granada in 1991 and finished a Master in Applied Statistics a year after. He completed his Ph.D. in Economics at the University of Castilla-La Mancha in 1998 on the application of geo-statistical techniques to the estimation of housing prices. Noelia García got her degree in Economics at the University of Madrid (UAM) in 1996 and completed her Ph.D. in Economics in 2004 on the construction of an intelligent and automated system for property valuation through the combination of neural nets and a geographic information system (GIS). Current research deals with spatial statistics and the combination of classifiers (decision trees and neural nets) for solving hot topics in economics.

9.

Automated textual analysis of firm-related documents has become an important decision support tool for stock market investors. Previous studies tended to adopt either a dictionary-based or a machine learning approach. Nevertheless, little is known about their concurrent use. Here we use the combination of financial indicators, readability, sentiment categories, and bag-of-words (BoW) to increase prediction accuracy. This paper aims to extract both sentiment and BoW information from the annual reports of US firms. The sentiment analysis is based on two commonly used dictionaries, namely a general dictionary, Diction 7.0, and a finance-specific dictionary proposed by Loughran and McDonald (J Finance 66:35–65, 2011. doi:10.1111/j.1540-6261.2010.01625.x). The BoW are selected according to their tf–idf. We combine these features with financial indicators to predict abnormal stock returns using a multilayer perceptron neural network with dropout regularization and rectified linear units. We show that this method performs similarly to naïve Bayes and outperforms other machine learning algorithms (support vector machine, C4.5 decision tree, and k-nearest neighbour classifier) in predicting positive/negative abnormal stock returns in terms of ROC. We also show that the quality of the prediction increased significantly when using correlation-based feature selection of BoW. This prediction performance is robust to industry categorization and event window.
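
Selecting bag-of-words terms by tf–idf can be illustrated with a few lines of plain Python. This is a sketch under assumptions: it uses the common tf × log(N/df) weighting and ranks each term by its maximum weight across documents, whereas the abstract does not say which tf–idf variant or ranking rule the authors used. The toy documents are invented.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def top_terms(docs, k):
    """Keep the k terms with the highest tf-idf weight in any document."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, weight in doc_weights.items():
            best[term] = max(best.get(term, 0.0), weight)
    return sorted(best, key=lambda t: (-best[t], t))[:k]

# Invented tokenized snippets standing in for annual reports.
reports = [
    ["profit", "growth", "risk"],
    ["loss", "risk", "risk"],
    ["profit", "loss", "risk"],
]
print(top_terms(reports, 2))
```

A term that occurs in every document, such as "risk" here, gets weight zero and drops out of the selection, which is the behaviour that makes tf–idf useful for pruning a vocabulary.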


10.
Credit classification is an important component of critical financial decision making tasks such as credit scoring and bankruptcy prediction. Credit classification methods are usually evaluated in terms of their accuracy, interpretability, and computational efficiency. In this paper, we propose an approach for automatic designing of fuzzy rule-based classifiers (FRBCs) from financial data using multi-objective evolutionary optimization algorithms (MOEOAs). Our method generates, in a single experiment, an optimized collection of solutions (financial FRBCs) characterized by various levels of accuracy-interpretability trade-off. In our approach we address the complexity- and semantics-related interpretability issues, we introduce original genetic operators for the classifier's rule base processing, and we implement our ideas in the context of Non-dominated Sorting Genetic Algorithm II (NSGA-II), i.e., one of the presently most advanced MOEOAs. A significant part of the paper is devoted to an extensive comparative analysis of our approach and 24 alternative methods applied to three standard financial benchmark data sets, i.e., Statlog (Australian Credit Approval), Statlog (German Credit Approval), and Credit Approval (also referred to as Japanese Credit) sets available from the UCI repository of machine learning databases (http://archive.ics.uci.edu/ml). Several performance measures including accuracy, sensitivity, specificity, and some number of interpretability measures are employed in order to evaluate the obtained systems. Our approach significantly outperforms the alternative methods in terms of the interpretability of the obtained financial data classifiers while remaining either competitive or superior in terms of their accuracy and the speed of decision making.  相似文献   

11.
Predicting financial activity by examining short-term liquidity is crucial in today’s turbulent financial environment. Firms, governments, and individuals all need an effective methodology based on liquidity information that warns of performance deterioration prior to bankruptcy prediction. In this paper, we propose a hybrid decision model using case-based reasoning (CBR) augmented with genetic algorithms (GAs) and fuzzy k nearest neighbor (fuzzy k-NN) methods for predicting the financial activity rate. GAs are used to determine the optimal or near-optimal weight vector of financial features expressed in linguistic values by the expert. A fuzzy k-NN-based CBR scheme is designed to compute memberships of financial activity rates and to provide a more flexible and practical mechanism for acquiring, creating, and reusing the expert’s decision knowledge. An empirical experiment using 746 publicly traded Taiwanese firms shows that the average accuracy of the rating is about 92.36%, which is superior to other related models. The proposed approach not only can lend support to the decision of an expert, but also allows proper feedback for the expert to improve the quality of the decision.

12.
Balance-sheet data offer a potentially large number of candidate predictors of corporate financial failure. In this paper we provide a novel predictor selection procedure based on non-parametric regression and classification tree method (CART) and test its performance within a standard logit model. We show that a simple logit model with dummy variables created in accordance with the nodes of estimated classification tree outperforms both standard logit model with step-wise-selected financial ratios, and CART itself. On a population of Slovenian companies our method achieves remarkable rates of precision in out-of-sample bankruptcy prediction. Our selection method thus represents an efficient way of introducing non-linear effects of predictor variables on the default probability in standard single-index models like logit. These findings are robust to choice-based sampling of estimation samples.  相似文献   

13.
Hybrid systems are a potential tool for dealing with construction engineering and management problems. This study proposes an optimized hybrid artificial intelligence model that integrates a fast messy genetic algorithm (fmGA) with a support vector machine (SVM). The fmGA-based SVM (GASVM) is used for early prediction of dispute propensity in the initial phase of public–private partnership projects. In particular, the SVM provides learning and curve fitting while the fmGA optimizes the SVM parameters. Accuracy, precision, sensitivity, specificity, area under the curve, and a synthesis index are used to evaluate the performance of the proposed hybrid intelligence classification model. Experimental comparisons indicate that GASVM achieves better cross-fold prediction accuracy than other baseline models (i.e., CART, CHAID, QUEST, and C5.0) and previous works. The forecasting results provide the proactive-warning and decision-support information needed to manage potential disputes.

14.
A newly introduced method called isotonic separation is evaluated in the prediction of firm bankruptcy. Feature reduction methods are first applied to reduce the ratios used in the prediction. Then, various classification methods, including discriminant analysis, neural networks, decision tree induction, learning vector quantization, rough sets, and isotonic separation, are used with the reduced ratios. Experiments show that the isotonic separation method is a viable technique, performing generally better than other methods for short-term bankruptcy prediction.  相似文献   

15.
Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), using the results to identify one of the better variants for further comparisons. We introduce a new (to KDD) financial prediction problem (predicting positive and negative earnings surprises), and experiment with GLOWER, contrasting it with tree- and rule-induction approaches. Our results are encouraging, showing that GLOWER has the ability to uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.  相似文献   

16.
Prediction of the insolvency of non-life insurance companies has arisen as an important problem in the field of financial research, due to the necessity of protecting the general public whilst minimizing the costs associated with this problem, such as the effects on state insurance guaranty funds or the responsibilities of management and auditors. Most methods applied in the past to predict business failure in non-life insurance companies are traditional statistical techniques, which use financial ratios as explicative variables. However, these variables do not usually satisfy statistical assumptions, which complicates the application of the mentioned methods. Emergent statistical learning methods like neural networks or SVMs provide a successful approach in terms of error rate, but their black-box character makes the obtained results difficult to interpret and discuss. In this paper, we propose an approach to predict insolvency of non-life insurance companies based on the application of genetic programming (GP). GP is a class of evolutionary algorithms, which operates by codifying the solution of the problem as a population of LISP trees. This type of algorithm provides a diagnosis output in the form of a decision tree with given functions and data. We can treat it like a computer program which returns an answer depending on the input, and, more importantly, the tree can potentially be inspected, interpreted and re-used for different data sets. We have compared the performance of GP with other classifier approaches, a Support Vector Machine and a Rough Set algorithm. The final purpose is to create an automatic diagnostic system for analysing non-insurance firms using their financial ratios as explicative variables.

17.
This article uses an integrated methodology based on a chi-squared automatic interaction detection (CHAID) model combined with analytic hierarchy process (AHP) for pair-wise comparison to assess medium-scale landslide susceptibility in a catchment in the Inje region of South Korea. An inventory of 3596 landslide locations was collected using remote sensing, and a random sample comprising 30% of these was used to validate the model. The remaining portion (70%) was processed by the nearest-neighbour index (NNI) technique and used for extracting the cluster patterns at each location. These data were used for model training purposes. Ten landslide-conditioning factors (independent variables) representing four main domains, namely (1) topology, (2) geology, (3) hydrology, and (4) land cover, were used to produce two landslide-susceptibility maps. The first landslide-susceptibility map (LSM1) was produced by overlaying the terminal nodes of the CHAID result tree. The second landslide-susceptibility map (LSM2) was produced using the overlay result of AHP pair-wise comparisons of CHAID terminal nodes. The prediction rate curve results were better with LSM2 (area under the prediction curve (AUC) = 0.80) than with LSM1 (AUC = 0.76). The results confirmed that the integrated hybrid model has superior prediction performance and reliability, and it is recommended for future use in medium-scale landslide-susceptibility mapping.  相似文献   

18.
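Classification rule mining of traffic flow data (交通流量数据的分类规则挖掘)   Total citations: 2, self-citations: 0, citations by others: 2
Gong Shuai (巩帅), Computer Engineering and Applications (《计算机工程与应用》), 2006, 42(6): 219-220, 232
The paper outlines classification algorithms in data mining and briefly introduces the C5.0 decision tree algorithm. Taking traffic flow data from Beijing's "three horizontal, two vertical" arterial roads as an example, it uses the C5.0 decision tree algorithm to extract classification rules for traffic flow. The rules are used to analyze traffic flow regularities, information patterns, and data trends, and the classification tree is quantified to provide decision support for traffic signal design, road network planning, road design, and road network node design.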

19.
We analyze the performance of top–down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions that label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top–down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion used by the top–down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 − γ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/ε)^O(1/(γ²ε²)) and (1/ε)^O(log(1/ε)/γ²) (respectively) suffice to drive the error below ε. Thus (for example), a small constant advantage over random guessing is amplified to any larger constant advantage with trees of constant size. For a new splitting criterion suggested by our analysis, the much stronger bound of (1/ε)^O(1/γ²) (which is polynomial in 1/ε) is obtained, which is provably optimal for decision tree algorithms. The differing bounds have a natural explanation in terms of concavity properties of the splitting criterion. The primary contribution of this work is in proving that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.
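
The role of the splitting criterion can be illustrated with a short sketch. The Gini index and binary entropy below are the textbook criteria associated with CART and C4.5; the square-root function is only an assumed example of a more strongly concave criterion, since the abstract does not give the formula of the new criterion. All function names and the example split are hypothetical.

```python
import math

def gini(q):
    """Gini index, scaled so that gini(0.5) == 1 (CART-style criterion)."""
    return 4.0 * q * (1.0 - q)

def entropy(q):
    """Binary entropy in bits (C4.5-style criterion)."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def sqrt_criterion(q):
    """Assumed example of a more strongly concave criterion; not taken from the abstract."""
    return 2.0 * math.sqrt(q * (1.0 - q))

def split_gain(criterion, q_parent, frac_left, q_left, q_right):
    """Drop in impurity when a node with positive fraction q_parent is split in two."""
    children = frac_left * criterion(q_left) + (1.0 - frac_left) * criterion(q_right)
    return criterion(q_parent) - children

# A weak split: a balanced node divided into children with 40% and 60% positives.
for criterion in (gini, entropy, sqrt_criterion):
    print(criterion.__name__, split_gain(criterion, 0.5, 0.5, 0.4, 0.6))
```

Because each criterion is strictly concave, any split whose children differ in class balance yields a positive gain; how much gain a weak split yields depends on the shape of the criterion, which is the concavity property the abstract refers to.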

20.
Financial distress forecasting has long been of great interest to both scholars and practitioners. It is basically a dichotomous decision: a firm is either in financial distress or not. Most statistical and artificial intelligence methods estimate the probability of financial distress, and if this probability is greater than the cutoff value, the firm is predicted to be in financial distress. To improve the accuracy of financial distress prediction, this paper first analyzed the yearly financial data of 1888 manufacturing corporations collected by the Korea Credit Guarantee Fund (KODIT). We then developed a financial distress prediction model based on radial basis function support vector machines (RSVM). We compare the classification accuracy of our RSVM with that of artificial intelligence techniques, and suggest a better financial distress prediction model to help a chief finance officer or a board of directors make better decisions about corporate financial distress. The experiments demonstrate that RSVM always outperforms the other models in predicting corporate financial distress, and hence we can predict future financial distress more correctly than with any of the other models. This improvement in the predictability of future financial distress can contribute significantly to the correct valuation of a company, so that investors, financial managers, and other decision makers of a company can use RSVM for better financing and investing decisions, which can eventually lead to higher profits and firm values.
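
The cutoff rule described in this abstract can be written out in a few lines. This is a minimal sketch: the function names, the probabilities, and the labels are invented for illustration, and the abstract does not state which cutoff value the authors used.

```python
def predict_distress(probabilities, cutoff=0.5):
    """Label a firm 1 (distressed) when its estimated probability exceeds the cutoff."""
    return [1 if p > cutoff else 0 for p in probabilities]

def accuracy(predicted, actual):
    """Share of firms whose predicted label matches the observed label."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

# Hypothetical model outputs and observed outcomes for four firms.
estimated = [0.10, 0.35, 0.62, 0.80]
observed = [0, 1, 1, 1]

for cutoff in (0.5, 0.3):
    labels = predict_distress(estimated, cutoff)
    print(cutoff, labels, accuracy(labels, observed))
```

Lowering the cutoff flags more firms as distressed, so measured accuracy depends on the cutoff as well as on the model that produces the probabilities.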
