
1.

Admission control for a responsive distributed middleware using decision trees to model run-time parameters

Luis Garcés-Erice 《Parallel Computing》2011,37(8):379-391

The software in modern systems has become too complex to make accurate predictions about their performance under different configurations. Real-time or even responsiveness requirements cannot be met because it is not possible to perform admission control for new or changing tasks if we cannot tell how their execution affects the other tasks already running. Previously, we proposed a resource-allocation middleware that manages the execution of tasks in a complex distributed system with real-time requirements. The middleware behavior can be modeled depending on the configuration of the tasks running, so that the performance of any given configuration can be calculated. This makes it possible to have admission control in such a system, but the model requires knowledge of run-time parameters. We propose the utilization of machine-learning algorithms to obtain the model parameters, and be able to predict the system performance under any configuration, so that we can provide a full admission control mechanism for complex software systems. In this paper, we present such an admission control mechanism, we measure its accuracy in estimating the parameters of the model, and we evaluate its performance to determine its suitability for a real-time or responsive system.

2.

Inducing decision trees with an ant colony optimization algorithm

Fernando E.B. Otero Alex A. Freitas Colin G. Johnson 《Applied Soft Computing》2012,12(11):3615-3626

Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, in 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.

3.

Ranking with decision tree

Fen Xia Wensheng Zhang Fuxin Li Yanwu Yang 《Knowledge and Information Systems》2008,17(3):381-395

Ranking problems have recently become an important research topic in the joint field of machine learning and information retrieval. This paper presented a new splitting rule that introduces a metric, i.e., an impurity measure, to construct decision trees for ranking tasks. We provided a theoretical basis and some intuitive explanations for the splitting rule. Our approach is also meaningful to collaborative filtering in the sense of dealing with categorical data and selecting relative features. Some experiments were made to illustrate our ranking approach, whose results showed that our algorithm outperforms both perceptron-based ranking and the classification tree algorithms in terms of accuracy as well as speed.


4.

Optimizing airline passenger prescreening systems with Bayesian decision models

Karl D. Majeske 《Computers & Operations Research》2012,39(8):1827-1836

The Transportation Security Agency provides airline security in the United States using a variety of measures including a computer-based passenger prescreening system. This paper develops Bayesian decision models of two prescreening systems: one that places ticketed passengers into two classifications (fly and no-fly), and a three-classification system that includes potential flight. Using a parameterized cost structure and the expected monetary value decision criterion, this paper develops optimal levels of undesirable personal characteristics that should place people into the various categories. The models are explored from both the government perspective and the passenger's perspective.

5.

Building a cost-constrained decision tree with multiple condition attributes

Yen-Liang Chen Chia-Chi Wu 《Information Sciences》2009,179(7):967-5226

Costs are often an important part of the classification process. Cost factors have been taken into consideration in many previous studies regarding decision tree models. In this study, we also consider a cost-sensitive decision tree construction problem. We assume that there are test costs that must be paid to obtain the values of the decision attribute and that a record must be classified without exceeding the spending cost threshold. Unlike previous studies, however, in which records were classified with only a single condition attribute, in this study, we are able to simultaneously classify records with multiple condition attributes. An algorithm is developed to build a cost-constrained decision tree, which allows us to simultaneously classify multiple condition attributes. The experimental results show that our algorithm satisfactorily handles data with multiple condition attributes under different cost constraints.

6.

Flexible decision tree for data stream classification in the presence of concept change, noise and missing values (Cited by: 1; self-citations: 0; citations by others: 1)

Sattar Hashemi Ying Yang 《Data mining and knowledge discovery》2009,19(1):95-131

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised accordingly and in a timely manner. To detect concept change, a common methodology is to observe the online classification accuracy. If accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, however, this assumption is often violated in the real world, where data streams carry noise that can also introduce a significant reduction in classification accuracy. To compound this problem, traditional noise cleansing methods are ill-suited to data streams. Those methods normally need to scan data multiple times, whereas learning for data streams can only afford a one-pass scan because of the data's high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model classifies them and how the learning model updates itself according to them is an issue whose solution is far from being explored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are three-fold. First, FlexDT offers a flexible structure to effectively and efficiently handle concept change. Second, FlexDT is robust to noise. Hence it can prevent noise from interfering with classification accuracy, and accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way.
Extensive evaluations are conducted to compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
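The baseline methodology that this abstract describes (and criticizes), namely flagging a concept change when online accuracy falls below a threshold, can be sketched roughly as follows. This is only an illustration of that baseline, not of FlexDT; the class name `AccuracyDriftMonitor`, the sliding-window design, and the default window size and threshold are all assumptions introduced here.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Hypothetical helper: flags a possible concept change when the
    accuracy over a sliding window of recent predictions drops below
    a fixed threshold."""

    def __init__(self, window_size=100, threshold=0.7):
        # Each entry is True when the prediction matched the true label.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, predicted, actual):
        """Record one prediction; return True if a change is suspected."""
        self.window.append(predicted == actual)
        # Do not judge until the window has filled up once.
        if len(self.window) < self.window.maxlen:
            return False
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.threshold
```

As the abstract points out, a monitor of this kind cannot tell noise from genuine concept change, since both lower the windowed accuracy in the same way.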

7.

Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families

Kweku-Muata Osei-Bryson Kendall Giles 《Information Systems Frontiers》2006,8(3):195-209

Decision tree (DT) induction is among the more popular of the data mining techniques. An important component of DT induction algorithms is the splitting method, with the most commonly used method being based on the Conditional Entropy (CE) family. However, it is well known that there is no single splitting method that will give the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family that is based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting methods, other datasets are very sensitive to the choice of splitting methods. For example, some of the CAMI family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets which have nominal predictor attributes, and are competitive with the GR method for those datasets where all predictor attributes are numeric. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family should be included in data mining toolsets.

Kweku-Muata Osei-Bryson is Professor of Information Systems at Virginia Commonwealth University, where he also served as the Coordinator of the Ph.D. program in Information Systems during 2001–2003. Previously he was Professor of Information Systems and Decision Analysis in the School of Business at Howard University, Washington, DC, U.S.A. He has also worked as an Information Systems practitioner in both industry and government. He holds a Ph.D. in Applied Mathematics (Management Science & Information Systems) from the University of Maryland at College Park, an M.S. in Systems Engineering from Howard University, and a B.Sc. in Natural Sciences from the University of the West Indies at Mona. He currently does research in various areas including: Data Mining, Expert Systems, Decision Support Systems, Group Support Systems, Information Systems Outsourcing, Multi-Criteria Decision Analysis. His papers have been published in various journals including: Information & Management, Information Systems Journal, Information Systems Frontiers, Business Process Management Journal, International Journal of Intelligent Systems, IEEE Transactions on Knowledge & Data Engineering, Data & Knowledge Engineering, Information & Software Technology, Decision Support Systems, Information Processing and Management, Computers & Operations Research, European Journal of Operational Research, Journal of the Operational Research Society, Journal of the Association for Information Systems, Journal of Multi-Criteria Decision Analysis, Applications of Management Science. Currently he serves as an Associate Editor of the INFORMS Journal on Computing, and is a member of the Editorial Board of the Computers & Operations Research journal.

Kendall E. Giles received the BS degree in Electrical Engineering from Virginia Tech in 1991, the MS degree in Electrical Engineering from Purdue University in 1993, the MS degree in Information Systems from Virginia Commonwealth University in 2002, and the MS degree in Computer Science from Johns Hopkins University in 2004. Currently he is a PhD student (ABD) in Computer Science at Johns Hopkins, and is a Research Assistant in the Applied Mathematics and Statistics department. He has over 15 years of work experience in industry, government, and academic institutions. His research interests can be partially summarized by the following keywords: network security, mathematical modeling, pattern classification, and high dimensional data analysis.
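For readers unfamiliar with entropy-based splitting, the following is a minimal sketch of two textbook measures, information gain (class entropy minus conditional entropy) and the Gain Ratio named in the abstract, for a single nominal attribute. It does not implement the CAMI family studied in the paper, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def information_gain(values, labels):
    """Class entropy minus the conditional entropy of the class labels
    given the attribute values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the attribute itself
    (the split information); defined as 0 when the attribute is constant."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info
```

A tree inducer would evaluate such a score for every candidate attribute at a node and split on the highest-scoring one; swapping the scoring function is what distinguishes one splitting method from another.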

8.

Discovering interobserver variability in the cytodiagnosis of breast cancer using decision trees and Bayesian networks

Nicandro Héctor-Gabriel Humberto Rocío-Erandi 《Applied Soft Computing》2009,9(4):1331-1342

We evaluate the performance of two decision tree procedures and four Bayesian network classifiers as potential decision support systems in the cytodiagnosis of breast cancer. In order to test their performance thoroughly, we use two real-world databases containing 692 cases and 322 cases collected by a single observer and 19 observers, respectively. The results show that, in general, there are considerable differences in all tests (accuracy, sensitivity, specificity, PV+, PV− and ROC) when a specific classifier uses the single-observer dataset compared to those when this same classifier uses the multiple-observer dataset. These results suggest that different observers see different things: a problem known as interobserver variability. We graphically unveil such a problem by presenting the structures of the decision trees and Bayesian networks resultant from running both databases.

9.

Development of two-level decision tree-based real-time scheduling system under product mix variety environment (Cited by: 1; self-citations: 0; citations by others: 1)

Yeou-Ren Shiue 《Robotics and Computer》2009,25(4-5):709-720

Most of the research on machine learning-based real-time scheduling (RTS) systems has been aimed toward product constant mix environments. However, in a product mix variety manufacturing environment, the scheduling knowledge base (KB) is dynamic; therefore, it would be interesting to develop a procedure that would automatically modify the scheduling knowledge when important changes occur in the manufacturing system. All of the machine learning-based RTS systems (including a KB refinement mechanism) proposed in earlier studies periodically require the addition of new training samples and regeneration of new KBs. Hence, previous approaches investigating machine learning-based RTS systems have been confronted with the training data overflow problem and an increase in the scheduling KB building time, which are unsuitable for RTS control. The objective of this paper is to develop a KB class selection mechanism that can be supported in various product mix ratio environments. Hence, the RTS KB is developed by a two-level decision tree (DT) learning approach. First, a suitable scheduling KB class is selected. Then, for each KB class, the best (proper) dispatching rule is selected for the next scheduling period. Here, the proposed two-level DT RTS system comprises five key components: (1) training samples generation mechanism, (2) GA/DT-based feature selection mechanism, (3) building a KB class label by a two-level self-organizing map, (4) DT-based KB class selection module, and (5) DT-based dynamic dispatching rule selection module. The proposed two-level DT-based KB RTS system yields better system performance than that by a one-level DT-based RTS system and heuristic individual dispatching rules in a flexible manufacturing system under various performance criteria over a long period.

10.

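A study of feature selection in Web page recognition (Cited by: 26; self-citations: 0; citations by others: 26)

Zhu Ming (朱明), Wang Jun (王军), Wang Junpu (王俊普) 《计算机工程》 (Computer Engineering) 2000,26(8):35-37

This paper examines in depth two important feature-selection problems in Web page recognition. It proposes a new method for selecting descriptive features and compares it experimentally with three existing descriptive-feature methods, confirming its effectiveness. It also evaluates and compares how five representative discriminative feature-selection methods from text categorization perform in practice on Web page recognition, and finds that information gain and the statistical method select discriminative features most effectively.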

11.

Decision trees using model ensemble-based nodes

Hakan 《Pattern recognition》2007,40(12):3540-3551

Decision trees recursively partition the instance space by generating nodes that implement a decision function belonging to an a priori specified model class. Each decision may be univariate, linear or nonlinear. Alternatively, in omnivariate decision trees, one of the model types is dynamically selected by taking into account the complexity of the problem defined by the samples reaching that node. The selection is based on statistical tests where the most appropriate model type is selected as the one providing significantly better accuracy than others. In this study, we propose the use of model ensemble-based nodes where a multitude of models are considered for making decisions at each node. The ensemble members are generated by perturbing the model parameters and input attributes. Experiments conducted on several datasets and three model types indicate that the proposed approach achieves better classification accuracies compared to individual nodes, even in cases when only one model class is used in generating ensemble members. 相似文献

12.

GA-based learning bias selection mechanism for real-time scheduling systems (Cited by: 1; self-citations: 0; citations by others: 1)

Yeou-Ren Shiue Ruey-Shiang Guh Tsung-Yuan Tseng 《Expert systems with applications》2009,36(9):11451-11460

The use of machine learning technologies to develop knowledge bases (KBs) for real-time scheduling (RTS) problems has produced encouraging results in recent research. However, few studies focus on how to select proper learning biases in the early development stage of the RTS system to enhance the generalization ability of the resulting KBs. The selected learning bias usually assumes a set of proper system features that are known in advance. Moreover, the machine learning algorithm for developing scheduling KBs is predetermined. The purpose of this study is to develop a genetic algorithm (GA)-based learning bias selection mechanism to determine an appropriate learning bias that includes the machine learning algorithm, feature subset, and learning parameters. Three machine learning algorithms are considered: the back propagation neural network (BPNN), C4.5 decision tree (DT) learning, and support vector machines (SVMs). The proposed GA-based learning bias selection mechanism can search for the best machine learning algorithm and simultaneously determine the optimal subset of features and the learning parameters used to build the RTS system KBs. In terms of the accuracy of prediction of unseen data under various performance criteria, it also offers better generalization ability as compared to the case where the learning bias selection mechanism is not used. Furthermore, the proposed approach to build RTS system KBs can improve the system performance as compared to other classifier KBs under various performance criteria over a long period.

13.

Cost-sensitive decision tree ensembles for effective imbalanced classification

《Applied Soft Computing》2014

Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximizing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
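The general idea of classifying according to a cost matrix can be illustrated with a small sketch: instead of predicting the most probable class, predict the class with the lowest expected misclassification cost. This shows only that generic decision rule, not the ensemble, the random feature subspaces, or the evolutionary weighting described in the abstract; the function name and the example costs are assumptions.

```python
def min_expected_cost_class(probabilities, cost_matrix):
    """Pick the prediction with the lowest expected cost.

    probabilities[i] is the estimated probability that the true class is i.
    cost_matrix[i][j] is the cost of predicting class j when the true
    class is i (hypothetical values supplied by the caller).
    Returns (best_class_index, list_of_expected_costs).
    """
    n_classes = len(probabilities)
    expected = [
        sum(probabilities[i] * cost_matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return best, expected


# Class 1 is the rare class; missing it is assumed to cost 10 times
# more than a false alarm.
best, costs = min_expected_cost_class([0.8, 0.2], [[0, 1], [10, 0]])
```

With these illustrative numbers the rule predicts the minority class even though its estimated probability is only 0.2, which is the kind of shift toward minority-class recognition that cost-sensitive methods aim for.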

14.

Region-based image retrieval with high-level semantics using decision tree learning

Ying Dengsheng Guojun 《Pattern recognition》2008,41(8):2554-2570

Semantic-based image retrieval has attracted great interest in recent years. This paper proposes a region-based image retrieval system with high-level semantic learning. The key features of the system are: (1) it supports both query by keyword and query by region of interest. The system segments an image into different regions and extracts low-level features of each region. From these features, high-level concepts are obtained using a proposed decision tree-based learning algorithm named DT-ST. During retrieval, a set of images whose semantic concept matches the query is returned. Experiments on a standard real-world image database confirm that the proposed system significantly improves the retrieval performance, compared with a conventional content-based image retrieval system. (2) The proposed decision tree induction method DT-ST for image semantic learning is different from other decision tree induction algorithms in that it makes use of the semantic templates to discretize continuous-valued region features and avoids the difficult image feature discretization problem. Furthermore, it introduces a hybrid tree simplification method to handle the noise and tree fragmentation problems, thereby improving the classification performance of the tree. Experimental results indicate that DT-ST outperforms two well-established decision tree induction algorithms ID3 and C4.5 in image semantic learning.

15.

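An SVM-based risk assessment model for software requirements analysis (Cited by: 1; self-citations: 0; citations by others: 1)

Pan Meisen (潘梅森), Xiong Qi (熊齐) 《计算机工程》 (Computer Engineering) 2007,33(12):78-81

Requirements-analysis risk is an important part of software project risk management. Based on 13 types of risk, this paper builds a new risk assessment model for software project requirements analysis. The 13 requirements-analysis risks of each past software project are treated as a 1×13 row vector and used as an SVM training vector; the projects are divided into three classes (low, medium, and high risk), and the model is used to predict the requirements-analysis risk level of a project.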

16.

Decision trees and genetic algorithms for condition monitoring forecasting of aircraft air conditioning

M. Gerdes 《Expert systems with applications》2013,40(12):5021-5026

Unscheduled maintenance of aircraft can cause significant costs. The machine needs to be repaired before it can operate again. Thus it is desirable to have concepts and methods to prevent unscheduled maintenance. This paper proposes a method for forecasting the condition of an aircraft air conditioning system based on observed past data. Forecasting is done in a point-by-point way, by iterating the algorithm. The proposed method uses decision trees to find and learn patterns in past data and uses these patterns to select the best forecasting method to forecast future data points. Forecasting a data point is based on selecting the best applicable approximation method. The selection is done by calculating different features/attributes of the time series and then evaluating the decision tree. A genetic algorithm is used to find the best feature set for the given problem to increase the forecasting performance. The experiments show a good forecasting ability even when the function is disturbed by noise.

17.

Recognition of hand-printed Chinese characters using decision trees/machine learning C4.5 system

A. Amin S. Singh 《Pattern Analysis & Applications》1998,1(2):130-141

Recognition of Chinese characters has been an area of major interest for many years, and a large number of research papers and reports have already been published in this area. There are several major problems with Chinese character recognition: Chinese characters are distinct and ideographic, the character size is very large and a lot of structurally similar characters exist in the character set. Thus, classification criteria are difficult to generate. This paper presents a new technique for the recognition of hand-printed Chinese characters using the C4.5 machine learning system. Conventional methods have relied on hand-constructed dictionaries which are tedious to construct and difficult to make tolerant to variation in writing styles. The paper discusses Chinese character recognition using the Hough transform for feature extraction and the C4.5 system. The system was tested with 900 characters written by different writers from poor to acceptable quality (each character has 40 samples) and the rate of recognition obtained was 84%.

18.

Multispectral and LiDAR data fusion for fuel type mapping using Support Vector Machine and decision rules (Cited by: 1; self-citations: 0; citations by others: 1)

Mariano García David Riaño Emilio Chuvieco F. Mark Danson 《Remote sensing of environment》2011,115(6):1369-1379

This paper presents a method for mapping fuel types using LiDAR and multispectral data. A two-phase classification method is proposed to discriminate the fuel classes of the Prometheus classification system, which is adapted to the ecological characteristics of the European Mediterranean basin. The first phase mapped the main fuel groups, namely grass, shrub and tree, as well as non-fuel classes. This phase was carried out using a Support Vector Machine (SVM) classification combining LiDAR and multispectral data. The overall accuracy of this classification was 92.8% with a kappa coefficient of 0.9. The second phase of the proposed method focused on discriminating additional fuel categories based on vertical information provided by the LiDAR measurements. Decision rules were applied to the output of the SVM classification based on the mean height of LiDAR returns and the vertical distribution of fuels, described by the relative LiDAR point density in different height intervals. The final fuel type classification yielded an overall accuracy of 88.24% with a kappa coefficient of 0.86. Some confusion was observed between fuel types 7 (dense tree cover presenting vertical continuity with understory vegetation) and 5 (trees with less than 30% of shrub cover) in some areas covered by Holm oak, which showed low LiDAR pulse penetration so that the understory vegetation was not correctly sampled.
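The two figures of merit quoted in this abstract, overall accuracy and the kappa coefficient, are both derived from a confusion matrix. A minimal sketch of the standard Cohen's kappa computation is given below; the function name is an assumption, and the matrix in the example is made-up illustrative data, not the paper's results.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts samples of reference class i assigned to
    class j. Kappa is undefined (division by zero) when chance agreement
    equals 1, i.e. when every sample falls in a single class.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative two-class matrix (not taken from the paper).
accuracy, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the overall accuracy in both results reported above.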

19.

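Advances in research on automatic model selection methods (Cited by: 2; self-citations: 0; citations by others: 2)

Huang Tiyun (黄梯云), Wu Fei (吴菲), Lu Tao (卢涛) 《计算机应用研究》 (Application Research of Computers) 2001,18(4):6-8

Building on a systematic analysis of existing automatic model selection methods, this paper proposes a new automatic model selection method based on natural language understanding and genetic algorithms.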

20.

Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning (Cited by: 1; self-citations: 0; citations by others: 1)

Tao Wang 《Journal of Systems and Software》2010,83(7):1137-1147

Cost-sensitive learning algorithms are typically designed for minimizing the total cost when multiple costs are taken into account. Like other learning algorithms, cost-sensitive learning algorithms must face a significant challenge, over-fitting, in an applied context of cost-sensitive learning. Specifically, they can generate good results on training data but normally do not produce an optimal model when applied to unseen data in real-world applications. This is called data over-fitting. This paper deals with the issue of data over-fitting by designing three simple and efficient strategies, feature selection, smoothing and threshold pruning, against the TCSDT (test cost-sensitive decision tree) method. The feature selection approach is used to pre-process the data set before applying the TCSDT algorithm. The smoothing and threshold pruning are used in a TCSDT algorithm before calculating the class probability estimate for each decision tree leaf. To evaluate our approaches, we conduct extensive experiments on the selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification cost. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms on reducing data over-fitting.