Found 20 similar documents; search took 15 ms
1.
Luis Garcés-Erice 《Parallel Computing》2011,37(8):379-391
The software in modern systems has become too complex to make accurate predictions about its performance under different configurations. Real-time or even responsiveness requirements cannot be met because it is not possible to perform admission control for new or changing tasks if we cannot tell how their execution affects the other tasks already running. Previously, we proposed a resource-allocation middleware that manages the execution of tasks in a complex distributed system with real-time requirements. The middleware behavior can be modeled depending on the configuration of the tasks running, so that the performance of any given configuration can be calculated. This makes it possible to have admission control in such a system, but the model requires knowledge of run-time parameters. We propose the use of machine-learning algorithms to obtain the model parameters and to predict the system performance under any configuration, so that we can provide a full admission control mechanism for complex software systems. In this paper, we present such an admission control mechanism, measure its accuracy in estimating the parameters of the model, and evaluate its performance to determine its suitability for a real-time or responsive system.
2.
3.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, in 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.
4.
Ranking problems have recently become an important research topic in the joint field of machine learning and information retrieval.
This paper presented a new splitting rule that introduces a metric, i.e., an impurity measure, to construct decision trees
for ranking tasks. We provided a theoretical basis and some intuitive explanations for the splitting rule. Our approach is
also meaningful to collaborative filtering in the sense of dealing with categorical data and selecting relative features.
Some experiments were made to illustrate our ranking approach, whose results showed that our algorithm outperforms both perceptron-based
ranking and the classification tree algorithms in terms of accuracy as well as speed.
5.
6.
Karl D. Majeske 《Computers & Operations Research》2012,39(8):1827-1836
The Transportation Security Administration provides airline security in the United States using a variety of measures including a computer-based passenger prescreening system. This paper develops Bayesian decision models of two prescreening systems: one that places ticketed passengers into two classifications (fly and no-fly), and a three-classification system that includes potential flight. Using a parameterized cost structure and the expected monetary value decision criterion, this paper develops optimal levels of undesirable personal characteristics that should place people into the various categories. The models are explored from both the government perspective and the passenger's perspective.
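The expected monetary value criterion mentioned in this abstract can be illustrated with a minimal two-class sketch. This is not the paper's model: the function names, the two cost parameters and the single probability `p_threat` are hypothetical simplifications chosen only to show how a cost structure turns into a classification threshold.

```python
def expected_cost(action, p_threat, cost_miss, cost_false_alarm):
    """Expected monetary cost of an action (hypothetical two-cost structure).

    cost_miss:        cost of letting a true threat fly (assumed parameter)
    cost_false_alarm: cost of denying a harmless passenger (assumed parameter)
    """
    if action == "fly":
        return p_threat * cost_miss
    return (1.0 - p_threat) * cost_false_alarm


def best_action(p_threat, cost_miss, cost_false_alarm):
    """Pick the action with the lowest expected cost."""
    return min(
        ("fly", "no-fly"),
        key=lambda a: expected_cost(a, p_threat, cost_miss, cost_false_alarm),
    )


def threshold(cost_miss, cost_false_alarm):
    """Threat probability at which both actions have equal expected cost."""
    return cost_false_alarm / (cost_miss + cost_false_alarm)


if __name__ == "__main__":
    for p in (0.0001, 0.01):
        print(p, best_action(p, cost_miss=1000.0, cost_false_alarm=1.0))
```

Under these assumptions, a passenger is placed in the no-fly class once the estimated threat probability exceeds `cost_false_alarm / (cost_miss + cost_false_alarm)`, which is one way to read the "optimal levels" the abstract refers to.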
7.
Costs are often an important part of the classification process. Cost factors have been taken into consideration in many previous studies regarding decision tree models. In this study, we also consider a cost-sensitive decision tree construction problem. We assume that there are test costs that must be paid to obtain the values of the decision attribute and that a record must be classified without exceeding the spending cost threshold. Unlike previous studies, however, in which records were classified with only a single condition attribute, in this study, we are able to simultaneously classify records with multiple condition attributes. An algorithm is developed to build a cost-constrained decision tree, which allows us to simultaneously classify multiple condition attributes. The experimental results show that our algorithm satisfactorily handles data with multiple condition attributes under different cost constraints.
8.
Flexible decision tree for data stream classification in the presence of concept change, noise and missing values (Cited by: 1 total; self-citations: 0; citations by others: 1)
In recent years, classification learning for data streams has become an important and active research topic. A major challenge
posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised
accordingly and timely. To detect concept change, a common methodology is to observe the online classification accuracy. If
accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this
methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, however,
this assumption is often violated in the real world where data streams carry noise that can also introduce a significant reduction
in classification accuracy. To compound this problem, traditional noise cleansing methods are unsuitable for data streams.
Those methods normally need to scan data multiple times, whereas learning for data streams can only afford a one-pass scan because
of the data’s high speed and huge volume. Another open problem in data stream classification is how to deal with missing values.
When new instances containing missing values arrive, how a learning model classifies them and how the learning model updates
itself according to them is an issue whose solution is far from being explored. To solve these problems, this paper proposes
a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification.
The advantages are three-fold. First, FlexDT offers a flexible structure to effectively and efficiently handle concept change. Second, FlexDT is robust to noise. Hence it can prevent noise
from interfering with classification accuracy, and accuracy drop can be safely attributed to concept change. Third, it deals
with missing values in an elegant way. Extensive evaluations are conducted to compare FlexDT with representative existing
data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results
suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change,
noise and missing values coexist.
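The common accuracy-threshold methodology that this abstract criticizes can be sketched in a few lines. This is not FlexDT; the class name, the window size and the threshold value are illustrative assumptions. The sketch also shows why noise is a problem for this approach: the detector sees only correct and incorrect predictions, so it cannot tell noisy labels from a genuine concept change.

```python
from collections import deque


class AccuracyDriftDetector:
    """Flags a possible concept change when windowed accuracy drops.

    Hypothetical sketch: window size and threshold are assumed values.
    """

    def __init__(self, window=100, threshold=0.7):
        self.results = deque(maxlen=window)  # most recent 0/1 outcomes
        self.threshold = threshold

    def update(self, correct):
        """Record one prediction outcome; return True if a change is flagged."""
        self.results.append(1 if correct else 0)
        if len(self.results) < self.results.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.threshold


if __name__ == "__main__":
    detector = AccuracyDriftDetector(window=10, threshold=0.7)
    stream = [True] * 10 + [False] * 4
    print([detector.update(outcome) for outcome in stream])
```

Each instance is processed once and only the last `window` outcomes are kept, which fits the one-pass constraint the abstract describes.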
9.
10.
Decision tree (DT) induction is among the more popular of the data mining techniques. An important component of DT induction
algorithms is the splitting method, with the most commonly used method being based on the Conditional Entropy (CE) family.
However, it is well known that there is no single splitting method that will give the best performance for all problem instances.
In this paper we explore the relative performance of the Conditional Entropy family and another family that is based on the
Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice
of splitting methods, other datasets are very sensitive to the choice of splitting methods. For example, some of the CAMI
family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets which have nominal predictor attributes,
and are competitive with the GR method for those datasets where all predictor attributes are numeric. Given that it is never
known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance
of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family should be included in data
mining toolsets.
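As a point of reference for the splitting methods compared in this abstract, the Gain Ratio measure for a nominal attribute can be sketched as follows. The function names are illustrative, and the CAMI family is not shown because the abstract does not define it; this is the standard entropy-based formulation, not code from the paper.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain Ratio of a nominal attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy of the class given the attribute value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)
    # A constant attribute has zero split information and cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    attribute = ["a", "a", "b", "b"]
    classes = [0, 0, 1, 1]
    print(gain_ratio(attribute, classes))
```

A tree inducer would evaluate such a score for every candidate attribute at a node and split on the highest-scoring one, which is why the choice of measure can change the resulting tree.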
Kweku-Mauta Osei-Bryson is Professor of Information Systems at Virginia Commonwealth University, where he also served as the Coordinator of the Ph.D.
program in Information Systems during 2001–2003. Previously he was Professor of Information Systems and Decision Analysis
in the School of Business at Howard University, Washington, DC, U.S.A. He has also worked as an Information Systems practitioner
in both industry and government. He holds a Ph.D. in Applied Mathematics (Management Science & Information Systems) from the
University of Maryland at College Park, a M.S. in Systems Engineering from Howard University, and a B.Sc. in Natural Sciences
from the University of the West Indies at Mona. He currently does research in various areas including: Data Mining, Expert
Systems, Decision Support Systems, Group Support Systems, Information Systems Outsourcing, Multi-Criteria Decision Analysis.
His papers have been published in various journals including: Information & Management, Information Systems Journal, Information
Systems Frontiers, Business Process Management Journal, International Journal of Intelligent Systems, IEEE Transactions on
Knowledge & Data Engineering, Data & Knowledge Engineering, Information & Software Technology, Decision Support Systems, Information
Processing and Management, Computers & Operations Research, European Journal of Operational Research, Journal of the Operational
Research Society, Journal of the Association for Information Systems, Journal of Multi-Criteria Decision Analysis, Applications
of Management Science. Currently he serves as an Associate Editor of the INFORMS Journal on Computing, and is a member of the
Editorial Board of the Computers & Operations Research journal.
Kendall E. Giles received the BS degree in Electrical Engineering from Virginia Tech in 1991, the MS degree in Electrical Engineering from
Purdue University in 1993, the MS degree in Information Systems from Virginia Commonwealth University in 2002, and the MS
degree in Computer Science from Johns Hopkins University in 2004. Currently he is a PhD student (ABD) in Computer Science
at Johns Hopkins, and is a Research Assistant in the Applied Mathematics and Statistics department. He has over 15 years of
work experience in industry, government, and academic institutions. His research interests can be partially summarized by
the following keywords: network security, mathematical modeling, pattern classification, and high dimensional data analysis.
11.
12.
Data from many real-world applications can be high dimensional and features of such data are usually highly redundant. Identifying informative features has become an important step for data mining to not only circumvent the curse of dimensionality but to reduce the amount of data for processing. In this paper, we propose a novel feature selection method based on bee colony and gradient boosting decision tree aiming at addressing problems such as efficiency and informative quality of the selected features. Our method achieves global optimization of the inputs of the decision tree using the bee colony algorithm to identify the informative features. The method initializes the feature space spanned by the dataset. Less relevant features are suppressed according to the information they contribute to the decision making using an artificial bee colony algorithm. Experiments are conducted with two breast cancer datasets and six datasets from the public data repository. Experimental results demonstrate that the proposed method effectively reduces the dimensions of the dataset and achieves superior classification accuracy using the selected features.
13.
We evaluate the performance of two decision tree procedures and four Bayesian network classifiers as potential decision support systems in the cytodiagnosis of breast cancer. In order to test their performance thoroughly, we use two real-world databases containing 692 cases and 322 cases collected by a single observer and 19 observers, respectively. The results show that, in general, there are considerable differences in all tests (accuracy, sensitivity, specificity, PV+, PV− and ROC) when a specific classifier uses the single-observer dataset compared to those when this same classifier uses the multiple-observer dataset. These results suggest that different observers see different things: a problem known as interobserver variability. We graphically unveil this problem by presenting the structures of the decision trees and Bayesian networks resulting from running both databases.
14.
Development of two-level decision tree-based real-time scheduling system under product mix variety environment (Cited by: 1 total; self-citations: 0; citations by others: 1)
Most of the research on machine learning-based real-time scheduling (RTS) systems has been aimed toward product constant mix environments. However, in a product mix variety manufacturing environment, the scheduling knowledge base (KB) is dynamic; therefore, it would be interesting to develop a procedure that would automatically modify the scheduling knowledge when important changes occur in the manufacturing system. All of the machine learning-based RTS systems (including a KB refinement mechanism) proposed in earlier studies periodically require the addition of new training samples and regeneration of new KBs. Hence, previous approaches investigating machine learning-based RTS systems have been confronted with the training data overflow problem and an increase in the scheduling KB building time, which are unsuitable for RTS control. The objective of this paper is to develop a KB class selection mechanism that can be supported in various product mix ratio environments. Hence, the RTS KB is developed by a two-level decision tree (DT) learning approach. First, a suitable scheduling KB class is selected. Then, for each KB class, the best (proper) dispatching rule is selected for the next scheduling period. Here, the proposed two-level DT RTS system comprises five key components: (1) training samples generation mechanism, (2) GA/DT-based feature selection mechanism, (3) building a KB class label by a two-level self-organizing map, (4) DT-based KB class selection module, and (5) DT-based dynamic dispatching rule selection module. The proposed two-level DT-based KB RTS system yields better system performance than that by a one-level DT-based RTS system and heuristic individual dispatching rules in a flexible manufacturing system under various performance criteria over a long period.
15.
Decision trees recursively partition the instance space by generating nodes that implement a decision function belonging to an a priori specified model class. Each decision may be univariate, linear or nonlinear. Alternatively, in omnivariate decision trees, one of the model types is dynamically selected by taking into account the complexity of the problem defined by the samples reaching that node. The selection is based on statistical tests where the most appropriate model type is selected as the one providing significantly better accuracy than others. In this study, we propose the use of model ensemble-based nodes where a multitude of models are considered for making decisions at each node. The ensemble members are generated by perturbing the model parameters and input attributes. Experiments conducted on several datasets and three model types indicate that the proposed approach achieves better classification accuracies compared to individual nodes, even in cases when only one model class is used in generating ensemble members.
16.
Yeou-Ren Shiue Ruey-Shiang Guh Tsung-Yuan Tseng 《Expert systems with applications》2009,36(9):11451-11460
The use of machine learning technologies in order to develop knowledge bases (KBs) for real-time scheduling (RTS) problems has produced encouraging results in recent research. However, few studies focus on the manner of selecting proper learning biases in the early developing stage of the RTS system to enhance the generalization ability of the resulting KBs. The selected learning bias usually assumes a set of proper system features that are known in advance. Moreover, the machine learning algorithm for developing scheduling KBs is predetermined. The purpose of this study is to develop a genetic algorithm (GA)-based learning bias selection mechanism to determine an appropriate learning bias that includes the machine learning algorithm, feature subset, and learning parameters. Three machine learning algorithms are considered: the back propagation neural network (BPNN), C4.5 decision tree (DT) learning, and support vector machines (SVMs). The proposed GA-based learning bias selection mechanism can search for the best machine learning algorithm and simultaneously determine the optimal subset of features and the learning parameters used to build the RTS system KBs. In terms of the accuracy of prediction of unseen data under various performance criteria, it also offers better generalization ability as compared to the case where the learning bias selection mechanism is not used. Furthermore, the proposed approach to build RTS system KBs can improve the system performance as compared to other classifier KBs under various performance criteria over a long period.
17.
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximizing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
18.
Semantic-based image retrieval has attracted great interest in recent years. This paper proposes a region-based image retrieval system with high-level semantic learning. The key features of the system are: (1) it supports both query by keyword and query by region of interest. The system segments an image into different regions and extracts low-level features of each region. From these features, high-level concepts are obtained using a proposed decision tree-based learning algorithm named DT-ST. During retrieval, a set of images whose semantic concept matches the query is returned. Experiments on a standard real-world image database confirm that the proposed system significantly improves the retrieval performance, compared with a conventional content-based image retrieval system. (2) The proposed decision tree induction method DT-ST for image semantic learning is different from other decision tree induction algorithms in that it makes use of the semantic templates to discretize continuous-valued region features and avoids the difficult image feature discretization problem. Furthermore, it introduces a hybrid tree simplification method to handle the noise and tree fragmentation problems, thereby improving the classification performance of the tree. Experimental results indicate that DT-ST outperforms two well-established decision tree induction algorithms ID3 and C4.5 in image semantic learning.
19.
20.
M. Gerdes 《Expert systems with applications》2013,40(12):5021-5026
Unscheduled maintenance of aircraft can cause significant costs. The machine needs to be repaired before it can operate again. Thus it is desirable to have concepts and methods to prevent unscheduled maintenance. This paper proposes a method for forecasting the condition of an aircraft air conditioning system based on observed past data. Forecasting is done in a point-by-point way, by iterating the algorithm. The proposed method uses decision trees to find and learn patterns in past data and uses these patterns to select the best forecasting method to forecast future data points. Forecasting a data point is based on selecting the best applicable approximation method. The selection is done by calculating different features/attributes of the time series and then evaluating the decision tree. A genetic algorithm is used to find the best feature set for the given problem to increase the forecasting performance. The experiments show a good forecasting ability even when the function is disturbed by noise.