Similar Documents
20 similar documents found (search time: 15 ms)
1.
Costs are often an important part of the classification process, and cost factors have been taken into consideration in many previous studies of decision tree models. In this study, we also consider a cost-sensitive decision tree construction problem. We assume that test costs must be paid to obtain the values of condition attributes and that a record must be classified without exceeding a spending-cost threshold. Unlike previous studies, however, in which records were classified using only a single condition attribute, in this study we classify records using multiple condition attributes simultaneously. An algorithm is developed to build such a cost-constrained decision tree. The experimental results show that our algorithm satisfactorily handles data with multiple condition attributes under different cost constraints.
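The cost-threshold idea can be illustrated with a minimal sketch (not the authors' algorithm): greedily choose attribute tests by gain-per-cost ratio until the spending threshold would be exceeded. All attribute names, gains, and costs below are hypothetical.

```python
def select_tests(gains, costs, budget):
    """Greedy knapsack-style selection: pick attribute tests by
    gain-per-cost ratio until the cost budget is exhausted."""
    ranked = sorted(gains, key=lambda a: gains[a] / costs[a], reverse=True)
    chosen, spent = [], 0.0
    for attr in ranked:
        if spent + costs[attr] <= budget:
            chosen.append(attr)
            spent += costs[attr]
    return chosen, spent

# Hypothetical per-attribute information gains and test costs.
gains = {"blood_test": 0.40, "x_ray": 0.35, "biopsy": 0.55}
costs = {"blood_test": 10.0, "x_ray": 40.0, "biopsy": 200.0}
print(select_tests(gains, costs, budget=60.0))
```

The greedy ratio rule is only a heuristic; the point is that a hard budget can rule out the individually most informative test (here the hypothetical biopsy) in favour of several cheaper ones.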

2.
This paper proposes a complete framework to assess the overall performance of classification models from a user perspective in terms of accuracy, comprehensibility, and justifiability. A review is provided of accuracy and comprehensibility measures, and a novel metric is introduced that allows one to measure the justifiability of classification models. Furthermore, a taxonomy of domain constraints is introduced, and an overview is presented of existing approaches to imposing constraints and including domain knowledge in data mining techniques. Finally, the justifiability metric is applied to a credit scoring case and a customer churn prediction case.

3.
Extracting decision trees from trained neural networks   (cited by: 4; self-citations: 0; other citations: 4)
In this paper we present a methodology for extracting decision trees from input data generated by querying trained neural networks, rather than directly from the original data. A genetic algorithm is used to query the trained network and extract prototypes. A prototype selection mechanism is then used to select a subset of the prototypes. Finally, a standard induction method such as ID3 or C5.0 is used to extract the decision tree. The extracted decision trees can be used to understand the workings of the neural network in addition to performing classification. This method is able to extract different decision trees of high accuracy and comprehensibility from the trained neural network.
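A stripped-down version of this pipeline can be sketched as follows: treat any black-box labeling function as the trained network, replace the genetic-algorithm querying with a simple enumeration of prototypes, and induce a tree with a tiny ID3. The oracle and attribute names are invented for illustration; this is a sketch of the idea, not the paper's method.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Tiny ID3: returns a nested dict tree over categorical attributes."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        total = entropy(labels)
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            total -= len(sub) / len(labels) * entropy(sub)
        return total
    best = max(attrs, key=gain)
    tree = {}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[(best, v)] = id3(sub_rows, sub_labels,
                              [a for a in attrs if a != best])
    return tree

# Stand-in "trained network": any black-box labeling function works here.
oracle = lambda r: "yes" if r["temp"] == "high" and r["humid"] == "low" else "no"

# Query the oracle on generated prototypes, then induce a tree from its answers.
prototypes = [{"temp": t, "humid": h} for t in ("high", "low")
              for h in ("high", "low")]
answers = [oracle(r) for r in prototypes]
tree = id3(prototypes, answers, ["temp", "humid"])
print(tree)
```

The key property the paper exploits is that the tree is fit to the network's answers, not to the raw training data, so it approximates what the network actually learned.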

4.
We previously proposed a decision tree classifier named MMC (multi-valued and multi-labeled classifier), known for its capability of classifying large multi-valued and multi-labeled data sets. Aiming to improve the accuracy of MMC, this paper develops another classifier named MMDT (multi-valued and multi-labeled decision tree). MMDT differs from MMC mainly in attribute selection. MMC attempts to split a node into child nodes whose records approach the same multiple labels; it measures the average similarity of the labels in each child node to determine the goodness of each splitting attribute. MMDT, in contrast, uses a measuring strategy that considers not only the average similarity of the labels in each child node but also their average appropriateness. The new strategy takes a scoring approach to obtain a look-ahead measure of each splitting attribute's contribution to accuracy. The experimental results show that MMDT improves on the accuracy of MMC.

5.
This paper deals with improvements to rule induction algorithms that resolve ties arising in special cases during the rule generation procedure for specific training data sets; the improvements are demonstrated by experimental results on various data sets. A tie occurs in a decision tree induction algorithm when the class prediction at a leaf node cannot be determined by majority voting. When there is such a conflict at a leaf node, we need to find the source of the problem and a solution to it. In this paper, we propose calculating an Influence factor for each attribute, and we suggest an update procedure for the decision tree, together with subsequent rectification steps, to deal with the problem.
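The leaf-node tie the paper addresses is easy to reproduce. The sketch below detects a tied majority vote and falls back to the parent node's majority class, a deliberately simpler rectification than the Influence-factor procedure proposed in the paper.

```python
from collections import Counter

def leaf_prediction(labels, parent_majority=None):
    """Majority vote at a leaf; when the top classes tie, the vote is
    undecidable, so fall back to the parent node's majority class.
    Returns (predicted_class, tie_detected)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return parent_majority, True   # tie: majority voting fails here
    return counts[0][0], False

print(leaf_prediction(["A", "A", "B"]))                    # clear majority
print(leaf_prediction(["A", "B"], parent_majority="A"))    # tied vote
```

In a real inducer the `tie_detected` flag would trigger the paper's rectification step instead of the parent fallback used here.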

6.
This paper contributes to the conceptualisation and analysis of double-sided matching problems, taking the land-use planning problem as an example. It does so by introducing functional classification theory at the knowledge level, the symbol level, and the system level of a DSS. This theory explicitly expresses the methodological viewpoint of relational realism. At the knowledge level, this implies defining knowledge on the basis of matching the intension and extension of concepts. At the symbol level, it deals with knowledge representation; here, decision tables are advanced and formally introduced. At the system level, the formalism used at the symbol level is implemented to develop a relational matching DSS.

7.
Culverts are important components of a roadway and should be properly maintained to ensure adequate road surface drainage and public safety. Culvert maintenance relies heavily on culvert inspection, which is time consuming and requires a large number of skilled labor hours. Currently, State Departments of Transportation use rigid methods for scheduling culvert inspection based on one or two factors, such as culvert size and/or condition. The objective of the research described in this paper is to develop a more intelligent scheduling system for culvert inspection that improves the utilization of limited resources. The proposed intelligent system first predicts the conditions of the culverts due for inspection in a given year and then, based on the prediction results, schedules inspections only for those predicted to be in poor condition. The prediction models use a decision tree algorithm together with the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the highly imbalanced data in the culvert inventory database. The case study presented in the paper used 12,400 culvert records from the Ohio Department of Transportation to train and test the prediction models. The developed prediction models achieved accuracies of over 80% on the training set and 75% on the testing set, with satisfactory areas under the curve of about 0.8. The case study concluded that implementing the proposed intelligent culvert inspection scheduling system reduces the number of culverts needing inspection by 44%. Implementation of the proposed system could assist state and local agencies in prioritizing inspection of culverts needing attention while maximizing the use of limited resources. While this study is applied to culverts in Ohio, the proposed framework can be used on any similarly available culvert data set worldwide. The paper ends by providing suggestions to improve the quality of the data in culvert inventory databases.
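SMOTE itself is straightforward to sketch: generate synthetic minority examples by interpolating between a minority point and one of its nearest minority neighbours. The toy implementation below is a minimal illustration of that idea, not the implementation used in the study; the points are invented.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: create synthetic minority points by
    interpolating a sampled point toward one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        # k nearest minority neighbours of p (excluding p itself)
        neighbours = sorted(
            (q for q in minority if q is not p),
            key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))[:k]
        q = rng.choice(neighbours)
        t = rng.random()
        # Synthetic point lies on the segment between p and q.
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote(minority, n_new=4)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority points, the oversampled region stays inside the minority neighbourhood rather than duplicating records verbatim.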

8.
Shuyu, Zhongying. 《Knowledge》, 2006, 19(8): 675-680
This paper proposes an improved decision tree method for web information retrieval with self-map attributes. Our self-map tree holds a value of a self-map attribute in each internal node, with information based on the dissimilarity between a pair of map sequences. Our method selects the self-maps that exist between data items by exhaustive search based on relation and attribute information. Experimental results confirm that our improved method constructs comprehensible and accurate decision trees. Moreover, an example shows that our self-map decision tree is promising for data mining and knowledge discovery.

9.
In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised accordingly and in a timely manner. To detect concept change, a common methodology is to observe the online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, however, this assumption is often violated in the real world, where data streams carry noise that can also significantly reduce classification accuracy. To compound the problem, traditional noise-cleansing methods are not suited to data streams: they normally need to scan the data multiple times, whereas learning from data streams can afford only a one-pass scan because of the data's high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model classifies them and how it updates itself according to them are issues whose solutions are far from explored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are three-fold. First, FlexDT offers a flexible structure to handle concept change effectively and efficiently. Second, FlexDT is robust to noise; hence it can prevent noise from interfering with classification accuracy, and an accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way.
Extensive evaluations are conducted to compare FlexDT with representative existing data stream classification algorithms, using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise, and missing values coexist.
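The classical accuracy-drop detector that FlexDT improves on can be sketched in a few lines: monitor accuracy over a sliding window and flag a possible concept change when it falls below a threshold. Window size and threshold below are illustrative; as the abstract notes, noise trips this detector just as easily as genuine drift.

```python
from collections import deque

class DriftMonitor:
    """Sliding-window accuracy monitor: flags a possible concept change
    when windowed accuracy falls below a threshold (the classical
    detector, without FlexDT's noise robustness)."""
    def __init__(self, window=50, threshold=0.7):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def update(self, correct):
        """Record one prediction outcome; return True if drift is flagged."""
        self.results.append(1 if correct else 0)
        acc = sum(self.results) / len(self.results)
        return len(self.results) == self.results.maxlen and acc < self.threshold

m = DriftMonitor(window=10, threshold=0.7)
# Simulated stream: predictions are correct until step 12, then all wrong.
flags = [m.update(i < 12) for i in range(20)]
print(flags.index(True))
```

Note the detection lag: the flag fires only once enough post-change errors have filled the window, which is the timeliness/stability trade-off inherent in window-based monitoring.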

10.
Choice of a classification algorithm is generally based on a number of factors, among which are availability of software, ease of use, and performance, measured here by overall classification accuracy. The maximum likelihood (ML) procedure is, for many users, the algorithm of choice because of its ready availability and the fact that it does not require an extended training process. Artificial neural networks (ANNs) are now widely used by researchers, but their operational applications are hindered by the need for the user to specify the configuration of the network architecture and to provide values for a number of parameters, both of which affect performance. The ANN also requires an extended training phase.
In the past few years, the use of decision trees (DTs) to classify remotely sensed data has increased. Proponents of the method claim that it has a number of advantages over the ML and ANN algorithms: the DT is computationally fast, makes no statistical assumptions, and can handle data represented on different measurement scales. Software to implement DTs is readily available over the Internet. Pruning can make DTs smaller and more easily interpretable, while the use of boosting techniques can improve performance.
In this study, separate test and training data sets from two different geographical areas and two different sensors—multispectral Landsat ETM+ and hyperspectral DAIS—are used to evaluate the performance of univariate and multivariate DTs for land cover classification. Factors considered are the effects of variations in training data set size and of the dimensionality of the feature space, together with the impact of boosting, attribute selection measures, and pruning. The level of classification accuracy achieved by the DT is compared to results from back-propagating ANN and ML classifiers.
Our results indicate that the performance of the univariate DT is acceptably good in comparison with that of the other classifiers, except with high-dimensional data. Classification accuracy increases linearly with training data set size up to a limit of 300 pixels per class in this case. Multivariate DTs do not appear to perform better than univariate DTs. While boosting produces an increase in classification accuracy of between 3% and 6%, the use of attribute selection methods does not appear to be justified in terms of accuracy increases. However, neither the univariate DT nor the multivariate DT performed as well as the ANN or ML classifiers with high-dimensional data.

11.
As two classical measures, approximation accuracy and consistency degree can be employed to evaluate the decision performance of a decision table. However, these two measures cannot give an elaborate depiction of the certainty and consistency of a decision table when their values are equal to zero. To overcome this shortcoming, we first classify decision tables in rough set theory into three types according to their consistency and introduce three new measures for evaluating the decision performance of a decision-rule set extracted from a decision table. We then analyze how each of these three measures depends on the condition granulation and decision granulation of each of the three types of decision tables. Experimental analyses on three practical data sets show that the three new measures are well suited for evaluating the decision performance of a decision-rule set and are much better than the two classical measures.
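The classical approximation-accuracy measure criticized here can be computed directly from a decision table. The sketch below implements Pawlak's accuracy of approximation of a classification (total lower-approximation size over total upper-approximation size across the decision classes); the toy table and attribute names are invented for illustration.

```python
from collections import defaultdict

def approximation_accuracy(rows, condition_attrs, decision_attr):
    """Pawlak's approximation accuracy for a decision table given as a
    list of dicts: sum |lower(X)| / sum |upper(X)| over decision classes."""
    # Equivalence classes (indiscernibility blocks) induced by the
    # condition attributes.
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[a] for a in condition_attrs)].append(r)
    lower = upper = 0
    for d in set(r[decision_attr] for r in rows):
        for block in blocks.values():
            hits = sum(1 for r in block if r[decision_attr] == d)
            if hits == len(block):
                lower += len(block)   # block entirely inside class d
            if hits > 0:
                upper += len(block)   # block intersects class d
    return lower / upper

table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "no",  "play": "no"},   # conflicts with row 1
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]
print(approximation_accuracy(table, ["outlook", "windy"], "play"))
```

The conflicting pair of rows shows the weakness the paper targets: inconsistency drags the measure down as a single number, without saying anything finer about where or why the table is uncertain.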

12.
An effective incident information management system needs to deal with several challenges. It must support heterogeneous distributed incident data, allow decision makers (DMs) to detect anomalies and extract useful knowledge, assist DMs in evaluating risks and selecting an appropriate alternative during an incident, and provide differentiated services to satisfy the requirements of different incident management phases. To address these challenges, this paper proposes an incident information management framework that consists of three major components. The first component is a high-level data integration module in which heterogeneous data sources are integrated and presented in a uniform format. The second component is a data mining module that uses data mining methods to identify useful patterns and presents a process for providing differentiated services for pre-incident and post-incident information management. The third component is a multi-criteria decision-making (MCDM) module that utilizes MCDM methods to assess the current situation, find satisfactory solutions, and take appropriate responses in a timely manner. To validate the proposed framework, this paper conducts a case study on agrometeorological disasters that occurred in China between 1997 and 2001. The case study demonstrates that the combination of data mining and MCDM methods can provide objective and comprehensive assessments of incident risks.

13.
The main objective of the present paper is to characterize smoking behavior among older adults by assessing psychological distress, physical health status, alcohol use, and demographic variables in relation to current smoking. We targeted 466 American smokers 65 years of age or older from the 2006 National Survey on Drug Use and Health (NSDUH, 2006). We employed a decision tree algorithm to conduct a classification analysis relating these variables to the average number of cigarettes used per day. The results showed that the most important explanatory variable for predicting the average number of cigarettes used per day is the age at which the respondent first started smoking cigarettes every day, followed by education level and psychological distress. These results suggest that social workers need to provide more customized and individualized interventions to older adults.

14.
Extracting classification rules from data is an important task of data mining that has gained considerably more attention in recent years. In this paper, a new meta-heuristic algorithm called TACO-miner is proposed for rule extraction from artificial neural networks (ANNs). The proposed rule extraction algorithm works on trained ANNs to discover the hidden knowledge available in the form of connection weights within the ANN structure. The algorithm is based on a meta-heuristic known as touring ant colony optimization (TACO) and consists of a two-step hierarchical structure. It is experimentally evaluated on six binary and n-ary classification benchmark data sets. Results of the comparative study show that TACO-miner is able to discover accurate and concise classification rules.

15.
Decision tree mining techniques and development trends   (cited by: 18; self-citations: 0; other citations: 18)
This paper introduces the main content and latest applications of decision tree mining techniques, compares decision tree growing and pruning algorithms, and points out research directions for decision tree mining.

16.
This paper proposes an expert system called VIBEX (VIBration EXpert) to aid plant operators in diagnosing the causes of abnormal vibration in rotating machinery. To automate the diagnosis, a decision table based on a cause-symptom matrix is used as a probabilistic method for diagnosing abnormal vibration. A decision tree is also used to acquire structured knowledge in the form of concepts and to build the knowledge base that is indispensable for a vibration expert system. The decision tree is a technique for building knowledge-based systems by inductive inference from examples, and it serves as a vibration diagnostic tool in its own right. The proposed system has been successfully implemented in the Microsoft Windows environment and is written in Microsoft Visual Basic and Visual C++. To validate the system's performance, the diagnostic system was tested on examples using the two diagnostic methods.

17.
Decision tree (DT) induction is among the more popular data mining techniques. An important component of DT induction algorithms is the splitting method, with the most commonly used methods being based on the Conditional Entropy (CE) family. However, it is well known that no single splitting method gives the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting method, other datasets are very sensitive to it. For example, some of the CAMI family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets with nominal predictor attributes, and are competitive with the GR method for datasets where all predictor attributes are numeric. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family be included in data mining toolsets.
Kweku-Muata Osei-Bryson is Professor of Information Systems at Virginia Commonwealth University, where he also served as Coordinator of the Ph.D. program in Information Systems during 2001-2003. Previously he was Professor of Information Systems and Decision Analysis in the School of Business at Howard University, Washington, DC, U.S.A. He has also worked as an Information Systems practitioner in both industry and government. He holds a Ph.D. in Applied Mathematics (Management Science & Information Systems) from the University of Maryland at College Park, an M.S. in Systems Engineering from Howard University, and a B.Sc. in Natural Sciences from the University of the West Indies at Mona.
He currently does research in various areas including Data Mining, Expert Systems, Decision Support Systems, Group Support Systems, Information Systems Outsourcing, and Multi-Criteria Decision Analysis. His papers have been published in various journals including Information & Management, Information Systems Journal, Information Systems Frontiers, Business Process Management Journal, International Journal of Intelligent Systems, IEEE Transactions on Knowledge & Data Engineering, Data & Knowledge Engineering, Information & Software Technology, Decision Support Systems, Information Processing and Management, Computers & Operations Research, European Journal of Operational Research, Journal of the Operational Research Society, Journal of the Association for Information Systems, Journal of Multi-Criteria Decision Analysis, and Applications of Management Science. He currently serves as an Associate Editor of the INFORMS Journal on Computing and is a member of the Editorial Board of the Computers & Operations Research journal. Kendall E. Giles received the BS degree in Electrical Engineering from Virginia Tech in 1991, the MS degree in Electrical Engineering from Purdue University in 1993, the MS degree in Information Systems from Virginia Commonwealth University in 2002, and the MS degree in Computer Science from Johns Hopkins University in 2004. He is currently a PhD student (ABD) in Computer Science at Johns Hopkins and a Research Assistant in the Applied Mathematics and Statistics department. He has over 15 years of work experience in industry, government, and academic institutions. His research interests can be partially summarized by the following keywords: network security, mathematical modeling, pattern classification, and high-dimensional data analysis.
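The CE-family baseline the paper compares against can be made concrete. The snippet below computes information gain and C4.5's gain ratio for a toy decision table; the CAMI measures themselves are not reproduced here, and the data and attribute names are invented. The unique `id` attribute illustrates why gain ratio exists: plain information gain rewards many-valued attributes, while the split-information denominator penalizes them.

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def info_gain(rows, labels, attr):
    """Conditional-entropy information gain of splitting on attr."""
    return entropy(labels) - sum(
        len(sub) / len(labels) * entropy(sub)
        for v in set(r[attr] for r in rows)
        for sub in [[l for r, l in zip(rows, labels) if r[attr] == v]])

def gain_ratio(rows, labels, attr):
    """C4.5's gain ratio: information gain normalized by split information."""
    split_info = entropy([r[attr] for r in rows])
    return info_gain(rows, labels, attr) / split_info if split_info else 0.0

rows = [{"id": i, "size": s} for i, s in enumerate("SSLL")]
labels = ["neg", "neg", "pos", "pos"]
# "id" is unique per row: maximal gain, but heavily penalised by gain ratio.
print(info_gain(rows, labels, "id"), gain_ratio(rows, labels, "id"))
print(info_gain(rows, labels, "size"), gain_ratio(rows, labels, "size"))
```

Both attributes achieve the same information gain on this table, yet gain ratio prefers `size`, which is exactly the kind of measure-dependent divergence the paper studies.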

18.
Classification in imbalanced domains is a recent challenge in data mining. We refer to imbalanced classification when the data present many examples from one class and few from the other, and the less represented class is the one of greater interest from the point of view of the learning task. One of the most widely used techniques to tackle this problem consists in preprocessing the data prior to the learning process. This preprocessing can be done through under-sampling, which removes examples mainly belonging to the majority class, or through over-sampling, which replicates or generates new minority examples. In this paper, we propose an under-sampling procedure guided by evolutionary algorithms that performs training set selection to enhance the decision trees obtained by the C4.5 algorithm and the rule sets obtained by the PART rule induction algorithm. The proposal has been compared with other under-sampling and over-sampling techniques, and the results indicate that the new approach is very competitive in terms of accuracy with over-sampling and outperforms standard under-sampling. Moreover, the obtained models are smaller in terms of the number of leaves or rules generated and can be considered more interpretable. The results have been contrasted through non-parametric statistical tests over multiple data sets.
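The standard under-sampling baseline that the evolutionary approach improves on is easy to sketch: keep every minority example and a random, equal-sized subset of the majority class. The paper replaces this blind random choice with an evolutionary search over majority subsets; the sketch below shows only the baseline, with invented data.

```python
import random
from collections import Counter

def undersample(rows, labels, seed=0):
    """Random under-sampling sketch: keep all minority examples and an
    equal-sized random subset of the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    keep = [(r, l) for r, l in zip(rows, labels) if l == minority]
    majority_pool = [(r, l) for r, l in zip(rows, labels) if l != minority]
    kept = keep + rng.sample(majority_pool, len(keep))
    rng.shuffle(kept)
    return [r for r, _ in kept], [l for _, l in kept]

rows = list(range(100))
labels = ["maj"] * 90 + ["min"] * 10
x, y = undersample(rows, labels)
print(Counter(y))
```

The obvious weakness, and the motivation for a guided search, is that a random majority subset may discard exactly the examples that define the decision boundary.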

19.
A comparative study of Lazy and Eager classification algorithms   (cited by: 1; self-citations: 1; other citations: 0)
The two high-level goals of data mining are prediction and description, and classification algorithms are used very widely in this process. In machine learning, classification algorithms can be divided into Lazy and Eager types, each with its own characteristics. Based on experiments, this paper analyzes these two types of classification algorithms and summarizes the conditions under which each type is suitable, aiming to offer practical, experience-based conclusions for algorithm selection.

20.
A preliminary data-warehouse-based decision support system framework for enterprise financial management   (cited by: 2; self-citations: 0; other citations: 2)
Based on the Internet environment and applying data warehouse and data mining technology, this paper constructs a decision support system framework for enterprise financial management and proposes the system's architecture.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号