Similar Articles
20 similar articles found.
1.
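Classification is a very important problem in data mining, and many popular classifiers build decision trees to produce class models. This paper introduces the idea behind data mining algorithms that construct a decision tree by comparing information gain or entropy. It presents a method for constructing decision trees with rough set theory and illustrates the tree-generation process with an example from surface modeling. Compared with the ID3 method, this approach reduces the complexity of the decision tree, optimizes its structure, and mines better rules.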
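The abstract describes building a tree by comparing information gain (entropy reduction), as ID3 does. The following is a minimal sketch of that criterion only, assuming categorical attributes stored as tuple columns. The rough-set method the paper proposes is not reproduced, and the toy data is hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` on column index `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0
```

A split-selection step simply picks the attribute with the largest gain.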

2.
Recent Advances in Decision Tree Algorithms for Data Mining   Total citations: 28; self-citations: 1; citations by others: 27
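This paper outlines the basic principles and advantages of traditional decision tree methods and points out their limitations in data mining on very large datasets. It then summarizes the main recent advances of decision tree methods in data mining under five headings, and discusses the challenges facing decision tree methods and their future trends.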

3.
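This paper first explains the basic idea of decision trees in data mining and then briefly introduces the classic decision tree algorithm, ID3. Focusing on ID3, it discusses four factors that influence a decision tree and analyzes each of them in detail using real data. Experiments show that if any one of the four factors changes, the decision tree must be rebuilt.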
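The abstract does not name its four factors, so this sketch does not model them. It only shows a minimal ID3 baseline: at each node, split on the attribute with the highest information gain and recurse until nodes are pure or no attributes remain. The training data is hypothetical. Because the tree is a function of the training data, flipping a few labels can yield a completely different tree, which is consistent with the abstract's point about rebuilding.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def id3(rows, labels, attrs):
    """Build a tree as nested dicts {attr: {value: subtree}}; leaves are class labels."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attrs:                      # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[a], []).append(y)
        rest = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - rest

    best = max(attrs, key=gain)        # attribute with the highest information gain
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree


def classify(tree, row):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree


rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
tree = id3(rows, ["no", "no", "yes", "yes"], [0, 1])
print(tree)  # {0: {'rain': 'yes', 'sunny': 'no'}}
```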

4.
While simple crop and hydrological models are limited in the number and accuracy of the processes they incorporate, complex models have a high demand for data. Because of the limitations of both categories of models, there is a need for new agro-hydrological models that simulate both crop productivity and water availability in agricultural catchments, with low data and calibration requirements. This study aimed to develop a widely applicable, parsimonious agro-hydrological model, AquaCrop-Hydro, which couples the AquaCrop crop water productivity model with a conceptual hydrological model. AquaCrop-Hydro, which simulates crop productivity, the daily soil water balance and discharge at the catchment outlet, performed well for an agricultural catchment in Belgium. The model can be used to investigate the effect of agricultural management and environmental changes from field to catchment scale in support of sustainable water management in agricultural areas.
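A "daily soil water balance" in its simplest conceptual form is a single-layer bucket. The sketch below is a generic textbook bucket, not AquaCrop-Hydro's actual formulation, which the abstract does not give. Rain adds water and evapotranspiration removes it. Anything above a storage capacity leaves the same day as excess water, which a conceptual hydrological model would then route toward the catchment outlet. The parameter names and numbers are hypothetical.

```python
def bucket_water_balance(rain, et, capacity, storage=0.0):
    """Daily single-layer 'bucket' soil water balance (all values in mm).

    Each day: add rain, remove evapotranspiration (storage never goes
    below zero), then spill anything above `capacity` as excess water.
    Returns the end-of-day storage and excess for every day.
    """
    storages, excess = [], []
    for p, e in zip(rain, et):
        storage = max(storage + p - e, 0.0)
        spill = max(storage - capacity, 0.0)   # water the soil cannot hold
        storage -= spill
        storages.append(storage)
        excess.append(spill)
    return storages, excess


s, q = bucket_water_balance(rain=[10, 0, 50], et=[2, 3, 4], capacity=40, storage=30.0)
print(s, q)  # [38.0, 35.0, 40.0] [0.0, 0.0, 41.0]
```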

5.
Comparative analysis of data mining methods for bankruptcy prediction   Total citations: 1; self-citations: 0; citations by others: 1
A great deal of research has been devoted to the prediction of bankruptcy, including the application of data mining. Neural networks, support vector machines, and other algorithms often fit data well, but because they lack comprehensibility, they are considered black-box technologies. Conversely, decision trees are more comprehensible to human users. However, they sometimes produce far too many rules, which is another form of incomprehensibility. The number of rules obtained from decision tree algorithms can be controlled to some degree by setting different minimum support levels. This study applies a variety of data mining tools to bankruptcy data in order to compare accuracy and number of rules. For this data, decision trees were found to be relatively more accurate than neural networks and support vector machines, but they produced more rule nodes than desired. Adjusting the minimum support yielded more tractable rule sets.
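The following sketch shows how a minimum-support threshold can shrink a rule set. It assumes that "minimum support" acts as a pre-pruning stop, so a node with fewer than `min_support` training examples becomes a leaf, and every leaf yields one rule. That is an assumption about the mechanism, not the tools used in the study. To keep it short, the hypothetical `extract_rules` splits on attributes in a fixed order instead of by a purity criterion. The toy data is a 3-bit parity problem.

```python
from collections import Counter


def extract_rules(rows, labels, attrs, min_support, conds=()):
    """Grow a tree by splitting on `attrs` in order; return one rule per leaf.

    A node becomes a leaf (predicting its majority class) when it has fewer
    than `min_support` examples, is pure, or has no attributes left.
    Each rule is (conditions, predicted_class, support).
    """
    if len(labels) < min_support or len(set(labels)) == 1 or not attrs:
        majority = Counter(labels).most_common(1)[0][0]
        return [(conds, majority, len(labels))]
    a, rest = attrs[0], attrs[1:]
    rules = []
    for value in sorted({row[a] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[a] == value]
        rules += extract_rules([rows[i] for i in idx], [labels[i] for i in idx],
                               rest, min_support, conds + ((a, value),))
    return rules


rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [(a + b + c) % 2 for a, b, c in rows]
for m in (1, 3, 8):
    print(m, len(extract_rules(rows, labels, (0, 1, 2), m)))  # 8, 4, 2 rules
```

Raising the threshold trades rule count for purity: fewer, broader rules that misclassify more training examples.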

6.
The successive launch of high-resolution satellites has created new opportunities for applying domestic (Chinese) high-resolution remote sensing data. The study explored whether GF (Gaofen) data are feasible for small- and medium-scale crop remote sensing monitoring, with the aim of establishing a suitable technical system. Taking Yangzhou as an example, it used a decision tree model and object-oriented classification to test whether crop planting information can be extracted from GF wide-field-view (WFV) data, and it explored ways to improve accuracy. The results showed that sub-region preprocessing can reduce the adverse effects of crop spatial distribution on planting-area extraction. For winter wheat, the overall accuracy was 97% and the Kappa coefficient 0.93; for rape, the overall accuracy was 96% and the Kappa coefficient 0.84. The research shows that domestic GF-1 WFV images can be applied to crop planting information extraction. They can also provide an important reference and decision support for adjusting the spatial distribution of crops and optimizing the management of grain-producing areas.
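The abstract reports accuracy as overall accuracy (OA) plus the Kappa coefficient. The sketch below computes both from a confusion matrix using the standard Cohen's kappa definition. It assumes rows are reference classes and columns are predicted classes. The example matrix is made up and is not the study's data.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of samples of reference class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n            # overall accuracy
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k))
                   for i in range(k)) / n ** 2
    # Undefined (division by zero) if chance agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


oa, kappa = overall_accuracy_and_kappa([[45, 5], [5, 45]])
print(round(oa, 2), round(kappa, 2))  # 0.9 0.8
```

Kappa discounts the agreement expected by chance, so two classes with similar OA can have noticeably different Kappa values.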

7.
Decision-tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data are present at one central location. Given the growth of distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper describes one such method, which generates compact trees using multi-feature splits instead of the single-feature splits used by most existing methods for distributed data. Our method is based on Fisher's linear discriminant function and is capable of dealing with multiple classes in the data. For homogeneously distributed data, the decision trees produced by our method are identical to decision trees generated using Fisher's linear discriminant function with centrally stored data. For heterogeneously distributed data, a certain approximation is involved, with a small change in performance relative to the tree generated with centrally stored data. Experimental results for several well-known datasets are presented and compared with decision trees generated using Fisher's linear discriminant function with centrally stored data.
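A multi-feature (oblique) split based on Fisher's linear discriminant tests `w·x > t` instead of a single feature against a threshold. The sketch below is limited to two classes, two features and centrally stored data. It computes `w = S_W^{-1} (m1 - m0)` and places the threshold halfway between the projected class means. The paper's multi-class and distributed versions are not reproduced, and the points are hypothetical. Class means and the within-class scatter are both built from sums, which is one plausible reason why horizontally partitioned data can give the same split as centralized data.

```python
def fisher_split(class0, class1):
    """Return (w, t) for a two-class, two-feature oblique split: class 1 if w·x > t."""
    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(2)]

    m0, m1 = mean(class0), mean(class1)
    # Within-class scatter matrix S_W (2x2), summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, m in ((class0, m0), (class1, m1)):
        for p in pts:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]   # assumes S_W is non-singular
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    def project(m):
        return w[0] * m[0] + w[1] * m[1]

    t = (project(m0) + project(m1)) / 2           # midpoint of projected means
    return w, t


class0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
class1 = [(4, 0), (5, 0), (4, 1), (5, 1)]
w, t = fisher_split(class0, class1)
print(w, t)  # [2.0, 0.0] 5.0
```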

8.
We study the possibility of constructing decision trees with evolutionary algorithms in order to increase their predictive accuracy. We present a self-adapting evolutionary algorithm for the induction of decision trees and describe the principle of decision making based on multiple evolutionarily induced decision trees, that is, a decision forest. The developed model is used as a fault prediction approach to foresee dangerous software modules, whose identification can greatly enhance software reliability.

9.
An important objective of data mining is the development of predictive models. Based on a number of observations, a model is constructed that allows the analysts to provide classifications or predictions for new observations. Currently, most research focuses on improving the accuracy or precision of these models, and comparatively little research has been undertaken to increase their comprehensibility to the analyst or end user. This is mainly due to the subjective nature of "comprehensibility", which depends on many factors outside the model, such as the user's experience and prior knowledge. Despite this influence of the observer, some representation formats are generally considered to be more easily interpretable than others. In this paper, an empirical study is presented which investigates the suitability of a number of alternative representation formats for classification when interpretability is a key requirement. The formats under consideration are decision tables, (binary) decision trees, propositional rules, and oblique rules. An end-user experiment was designed to test the accuracy, response time, and answer confidence for a set of problem-solving tasks involving these representations. Analysis of the results reveals that decision tables perform significantly better on all three criteria, while post-test voting also reveals a clear preference of users for decision tables in terms of ease of use.

10.
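A Decision Tree Generation Algorithm for Incomplete Information Systems   Total citations: 1; self-citations: 1; citations by others: 0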
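Decision trees are an effective data mining method for classifying instances. When handling missing values in incomplete information systems, most existing decision tree algorithms rely on guessing techniques. This paper leaves the missing values unchanged. It uses the concept of maximal consistent blocks to define the decision support degree of the condition attributes with respect to the decision attribute in an incomplete decision table, and uses this degree as heuristic information for attribute selection. It also proposes IDTBDS, a decision tree generation algorithm for incomplete information systems that not only obtains rule sets quickly but also achieves high accuracy.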

11.
A successful water management scheme for irrigated crops requires an integrated approach that accounts for water, crop, soil and field management. Most existing models are designed for a specific irrigation system, a specific process such as water and solute movement, infiltration, leaching or water uptake by plant roots, or a combination of these. There is a shortage of generic models, that is, models that can be used for a variety of irrigation systems, soil types, soil stratifications, crops and trees, water management strategies (blending or cyclic), leaching requirements and water qualities. The SALTMED model has been developed for such generic applications. The model employs established water and solute transport, evapotranspiration and crop water uptake equations. In this paper, the model has been run with five example applications for one growing season using data from the literature. In all cases, the model successfully illustrated the effect of the irrigation system, the soil type and the salinity level of the irrigation water on soil moisture and salinity distribution, leaching requirements, and crop yield. Data sets suitable for testing the model over a complete growing season, with different processes acting simultaneously, are scarce. A follow-up paper will therefore present model tests using data being collected from two sites, in Egypt and in Syria, as part of the ongoing SALTMED project.

12.
This paper presents Simulation-based Innovization (SBI), a method for analyzing production systems by applying multi-objective optimization and data mining techniques to discrete-event simulation models. The aim of SBI analysis is to reveal insights into the parameters that affect the performance measures and to gain a deeper understanding of the problem through post-optimality analysis of the solutions obtained from multi-objective optimization. This paper provides empirical results from an industrial case study carried out on an automotive machining line in order to explain the SBI procedure. The SBI method was found to be particularly suitable for this case study because the three objectives under study, namely total tardiness, makespan and average work-in-process, conflict with each other. Depending on the system load of the line, different decision variables were found to be influential. The paper shows how the SBI method is used to find important patterns in the explored solution set. It also shows how SBI can support decision making to improve scheduling under different system loadings in the machining line.
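The post-optimality analysis in SBI starts from the set of non-dominated (Pareto-optimal) solutions. The sketch below shows the dominance test for three objectives that are all minimized, assumed here to be total tardiness, makespan and average work-in-process. The solution tuples are made-up numbers, and the data mining step that follows in SBI is not shown.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in one.

    All objectives are minimized (e.g. tardiness, makespan, average WIP).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (total tardiness, makespan, average WIP) values.
solutions = [(10, 100, 5), (12, 90, 5), (11, 110, 6), (15, 95, 4)]
print(pareto_front(solutions))  # [(10, 100, 5), (12, 90, 5), (15, 95, 4)]
```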

13.
Seasonal climate forecasts (SCFs) have received a lot of attention for climate risk management in agriculture. The question is, how can we use SCFs for informing decisions in agriculture? SCFs are provided in formats not so conducive for decision-making. The commonly issued tercile probabilities of most likely rainfall categories i.e., below normal (BN), near normal (NN) and above normal (AN), are not easy to translate into metrics useful for decision support. Linking SCF with crop models is one way that can produce useful information for supporting strategic and tactical decisions in crop production e.g., crop choices, management practices, insurance, etc. Here, we developed a decision support system (DSS) tool, Climate-Agriculture-Modeling and Decision Tool (CAMDT), that aims to facilitate translations of probabilistic SCFs to crop responses that can help decision makers adjust crop and water management practices that may improve outcomes given the expected climatic condition of the growing season.
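One simple way to turn tercile probabilities into a decision-relevant number is to weight crop-model yields by the forecast probabilities. The sketch below assumes yields have already been simulated for historical seasons and grouped by each season's observed rainfall tercile. It is an illustrative assumption, not CAMDT's documented procedure, and the function name and numbers are hypothetical.

```python
def forecast_weighted_yield(tercile_probs, yields_by_category):
    """Expected yield under a tercile forecast.

    tercile_probs: {"BN": p, "NN": p, "AN": p}, summing to 1.
    yields_by_category: simulated yields (e.g. t/ha) from historical seasons,
    grouped by the rainfall tercile each season fell into.
    """
    if abs(sum(tercile_probs.values()) - 1.0) > 1e-6:
        raise ValueError("tercile probabilities must sum to 1")
    # Probability-weighted mean of the per-category average yields.
    return sum(p * sum(yields_by_category[c]) / len(yields_by_category[c])
               for c, p in tercile_probs.items())


probs = {"BN": 0.5, "NN": 0.3, "AN": 0.2}
yields = {"BN": [2.0, 2.4], "NN": [3.0], "AN": [4.0, 4.2, 3.8]}
print(round(forecast_weighted_yield(probs, yields), 2))  # 2.8
```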

14.
The paper examines the validation of prediction models of acceptance of academic placement offers by students in the context of international applications at a large metropolitan Australian University using data mining techniques. Earlier works in enrolment management have examined various classification problems such as inquiry to enrol, persistence and graduation. The data and settings from different institutions are often different, which implies that in order to find out which models and techniques are applicable at a given university, the dataset from that university needs to be used in the validation effort. The whole dataset from the Australian university comprised 24,283 offers made to international applicants from the year 2008 to 2013. Every year around 2000–2500 new international students who accept offers of academic placement commence their studies. The important predictors for the acceptance of offers were as follows: the chosen course and faculty, whether the student was awarded any form of scholarship, and also the visa assessment level of the country by the immigration department. Prediction models were developed using a number of classification methods such as logistic regression, Naïve Bayes, decision trees, support vector machines, random forests, k-nearest neighbour, neural networks and their performances compared. Overall, the neural network prediction model with a single hidden layer produced the best result.

15.
Precise prediction of stock prices is difficult chiefly because of the many intervening factors. Unpredictability is particularly notable in the aftermath of the global financial crisis. Data mining may, however, be used to discover highly correlated estimation models. This study uses artificial neural networks (ANN), decision trees, and a hybrid of ANN and decision trees (hybrid model), three common algorithmic methods for numerical analysis, to forecast stock prices. The author compared the stock price forecasting models derived from the three methods and applied them to 10 different stocks in 320 data sets in an empirical forecast. In terms of matching real market stock prices, ANN had the highest average accuracy, at 15.31%, followed by decision trees at 14.06% and the hybrid model at 13.75%. The study also finds that, compared with the other two methods, ANN is a more stable method for predicting stock prices in the volatile post-crisis stock market.

16.
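This paper describes the design and implementation of a system for automatically generating and managing decision models. The system helps users build decision models based on decision tables, performs various checks automatically, and can convert such a model into two other types of decision model. It also manages a decision model base and an application case base. The system has been used successfully in the development of several large information systems.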

17.
Although a substantial number of decision and management science models of production management decisions have been developed, these models have not generally addressed the strategic framework of manufacturing, considered as an integrated pattern of decisions. They have usually been concerned with only one aspect of manufacturing strategy such as capacity or technology choice and have framed that choice using a single dimension of value such as cost. This study develops a comprehensive decision analysis model base which can be implemented as a DSS, to gain insight about the broader manufacturing strategy set. Decision trees, influence diagrams, Monte Carlo risk analysis and multiple criteria utility functions can contribute to a better understanding of and to decision support for manufacturing strategy formulation.
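Of the tools the abstract lists, the decision tree is the easiest to illustrate. It is evaluated by backward induction: chance nodes take probability-weighted averages and decision nodes take the best option. The sketch below uses a single value dimension for brevity. In the multi-criteria setting the abstract argues for, payoffs would be replaced by utility values. The node encoding, option names and payoffs are hypothetical.

```python
def rollback(node):
    """Expected value of a decision tree by backward induction.

    A node is ("payoff", value), ("chance", [(prob, child), ...]),
    or ("decision", {option_name: child, ...}).
    """
    kind, body = node
    if kind == "payoff":
        return body
    if kind == "chance":
        return sum(p * rollback(child) for p, child in body)
    return max(rollback(child) for child in body.values())   # decision node


def best_option(node):
    """Name of the option with the highest rolled-back value at a decision node."""
    kind, options = node
    if kind != "decision":
        raise ValueError("best_option expects a decision node")
    return max(options, key=lambda name: rollback(options[name]))


# Hypothetical capacity decision: expand (uncertain demand) or keep current capacity.
tree = ("decision", {
    "expand": ("chance", [(0.6, ("payoff", 100)), (0.4, ("payoff", -20))]),
    "keep": ("payoff", 30),
})
print(rollback(tree), best_option(tree))  # 52.0 expand
```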

18.
Software architects consider capturing and sharing architectural decisions increasingly important; many tacit dependencies exist in this architectural knowledge. Architectural decision modeling makes these dependencies explicit and serves as a foundation for knowledge management tools. In practice, however, text templates and informal rich pictures rather than models are used to capture the knowledge; a formal definition of model entities and their relations is missing in the current state of the art. In this paper, we propose such a formal definition of architectural decision models as directed acyclic graphs with several types of nodes and edges. In our models, architectural decision topic groups, issues, alternatives, and outcomes form trees of nodes connected by edges expressing containment and refinement, decomposition, and triggers dependencies, as well as logical relations such as (in)compatibility of alternatives. The formalization can be used to verify integrity constraints and to organize the decision making process; production rules and dependency patterns can be defined. A reusable architectural decision model supporting service-oriented architecture design demonstrates how we use these concepts. We also present tool support and give a quantitative evaluation.
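One integrity constraint implied by modeling decisions as a directed acyclic graph is that the typed dependency edges contain no cycle. The sketch below checks this with Kahn's topological-sort algorithm. The node names and edge-type strings are hypothetical and not taken from the paper's metamodel. Undirected relations such as (in)compatibility of alternatives are left out.

```python
from collections import defaultdict, deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm; `edges` are (source, target, edge_type) triples.

    Returns True if every node can be placed in a topological order,
    i.e. the directed edges contain no cycle.
    """
    indegree = {n: 0 for n in nodes}
    out = defaultdict(list)
    for src, dst, _edge_type in edges:
        out[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return visited == len(nodes)


nodes = ["group:integration", "issue:protocol", "alt:SOAP", "alt:REST", "issue:format"]
edges = [("group:integration", "issue:protocol", "contains"),
         ("issue:protocol", "alt:SOAP", "contains"),
         ("issue:protocol", "alt:REST", "contains"),
         ("alt:REST", "issue:format", "triggers")]
print(is_acyclic(nodes, edges))                                                # True
print(is_acyclic(nodes, edges + [("issue:format", "issue:protocol", "triggers")]))  # False
```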

19.
Fan Min  Qihe Liu 《Information Sciences》2009,179(14):2442-2452
Cost-sensitive learning is an important issue in both data mining and machine learning, in that it deals with the problem of learning from decision systems relative to a variety of costs. In this paper, we introduce a hierarchy of cost-sensitive decision systems from a test cost perspective. Two major issues are addressed with regard to test cost dependency. The first is concerned with the common test cost, where a group of tests share a common cost, while the other relates to the sequence-dependent test cost, where the order of the test sequence influences the total cost. Theoretical aspects of each of the six models in our hierarchy are investigated and illustrated via examples. The proposed models are shown to be useful for exploring cost-related information in various applications.
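The "common test cost" idea is easiest to see with an example. Suppose several tests share a one-off setup cost that is paid once as soon as any test in the group is performed. The sketch below encodes that reading, which is an assumption because the abstract does not give the paper's formal definitions. The sequence-dependent case and the six-model hierarchy are not covered, and the test names and costs are hypothetical.

```python
def total_test_cost(selected, own_cost, group_of, group_cost):
    """Cost of running the `selected` tests.

    Each test pays its own cost; each group used pays its common cost once.
    """
    groups_used = {group_of[t] for t in selected if t in group_of}
    return (sum(own_cost[t] for t in selected)
            + sum(group_cost[g] for g in groups_used))


# Hypothetical example: two blood tests share one blood draw.
own_cost = {"blood_sugar": 5, "cholesterol": 5, "xray": 20}
group_of = {"blood_sugar": "blood_draw", "cholesterol": "blood_draw"}
group_cost = {"blood_draw": 10}
print(total_test_cost(["blood_sugar", "cholesterol"], own_cost, group_of, group_cost))  # 20
print(total_test_cost(["xray", "blood_sugar"], own_cost, group_of, group_cost))         # 35
```

Under this model, adding a second test from an already-used group costs only its own price, which changes which attribute subsets are cheapest.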

20.
Previous studies on ensembles of classifiers for bankruptcy prediction and credit scoring applied different ensemble schemes to complex classifiers, and the best results were obtained with the Random Subspace method. Bagging was one of the ensemble methods used in the comparison; however, it was not used correctly. It is very important to apply this ensemble scheme to weak and unstable classifiers in order to produce diversity in the combination. To improve the comparison, Bagging is applied with several decision tree models to bankruptcy prediction and credit scoring. Decision trees encourage diversity in the combination of classifiers. Finally, an experimental study shows that Bagging on decision trees gives the best results for bankruptcy prediction and credit scoring.
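Bagging trains each base classifier on a bootstrap sample (drawn with replacement, same size as the training set) and combines the classifiers by majority vote. The sketch below uses depth-1 decision trees (stumps) as the weak base learner to keep it short. The paper applies Bagging to several fuller decision tree models, and the one-feature toy data is hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Best single-feature threshold split (a depth-1 decision tree) by training error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            lab_l = Counter(left).most_common(1)[0][0]
            lab_r = Counter(right).most_common(1)[0][0] if right else lab_l
            err = sum(yi != lab_l for yi in left) + sum(yi != lab_r for yi in right)
            if best is None or err < best[0]:
                best = (err, f, t, lab_l, lab_r)
    _, f, t, lab_l, lab_r = best
    return lambda x: lab_l if x[f] <= t else lab_r


def bagging(X, y, n_models=25, seed=0):
    """Train stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = bagging(X, y)
print(predict((0.0,)), predict((13.0,)))
```

An odd `n_models` avoids tied votes in the two-class case. The diversity the abstract refers to comes from each stump seeing a different bootstrap sample.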


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417