Similar literature
 20 similar documents found (search time: 15 ms)
1.
Numerous interestingness measures have been proposed in statistics and data mining to assess object relationships. This is especially important in recent studies of association or correlation pattern mining. However, it is still not clear whether there is any intrinsic relationship among many proposed measures, and which one is truly effective at gauging object relationships in large data sets. Recent studies have identified a critical property, null-(transaction) invariance, for measuring associations among events in large data sets, but many measures do not have this property. In this study, we re-examine a set of null-invariant interestingness measures and find that they can be expressed as the generalized mathematical mean, leading to a total ordering of them. Such a unified framework provides insights into the underlying philosophy of the measures and helps us understand and select the proper measure for different applications. Moreover, we propose a new measure called Imbalance Ratio to gauge the degree of skewness of a data set. We also discuss the efficient computation of interesting patterns of different null-invariant interestingness measures by proposing an algorithm, GAMiner, which complements previous studies. Experimental evaluation verifies the effectiveness of the unified framework and shows that GAMiner speeds up the state-of-the-art algorithm by an order of magnitude.

2.
Academic research has produced many model-based specification and analysis techniques; however, most organisations continue to document requirements as textual statements. To help bridge this gap between academic research and requirements practice, this paper reports an extension to the RESCUE process in which patterns for generating requirements statements from i* system models were manually applied to i* models developed for a complex air traffic control system. The paper reports the results of this application and describes them with examples, the benefits of the approach to the project, and ongoing research to implement these patterns in the REDEPEND modelling tool to make requirements engineers more productive. We review similar work on requirements modelling and expression, and compare our work to it to demonstrate the proposed advance in the state of the art. Finally, the paper discusses future uses of requirements generation from model patterns in RESCUE.

3.
A pattern is a model or a template used to summarize and describe the behavior (or the trend) of data that generally contain some recurrent events. Patterns have received considerable attention in recent years and have been widely studied in the data mining field. Various pattern mining approaches have been proposed and used for different applications such as network monitoring, moving object tracking, financial or medical data analysis, scientific data processing, etc. In these different contexts, discovered patterns were useful to detect anomalies, to predict data behavior (or trend) or, more generally, to simplify data processing or to improve system performance. However, to the best of our knowledge, patterns have never been used in the context of Web archiving. Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can be useful tools to efficiently archive Websites. We first define our pattern model that describes the importance of page changes. Then, we present the strategy used to (i) extract the temporal evolution of page changes, (ii) discover patterns, and (iii) exploit them to improve Web archives. The archive of French public TV channels France Télévisions is chosen as a case study to validate our approach. Our experimental evaluation based on real Web pages shows the utility of patterns to improve archive quality and to optimize indexing or storing.

4.
A standard approach to determining decision trees is to learn them from examples. A disadvantage of this approach is that once a decision tree is learned, it is difficult to modify it to suit different decision making situations. Such problems arise, for example, when an attribute assigned to some node cannot be measured, or there is a significant change in the costs of measuring attributes or in the frequency distribution of events from different decision classes. An attractive approach to resolving this problem is to learn and store knowledge in the form of decision rules, and to generate from them, whenever needed, a decision tree that is most suitable in a given situation. An additional advantage of such an approach is that it facilitates building compact decision trees, which can be much simpler than the logically equivalent conventional decision trees (by compact trees are meant decision trees that may contain branches assigned a set of values, and nodes assigned derived attributes, i.e., attributes that are logical or mathematical functions of the original ones). The paper describes an efficient method, AQDT-1, that takes decision rules generated by an AQ-type learning system (AQ15 or AQ17), and builds from them a decision tree optimizing a given optimality criterion. The method can work in two modes: the standard mode, which produces conventional decision trees, and the compact mode, which produces compact decision trees. The preliminary experiments with AQDT-1 have shown that the decision trees generated by it from decision rules (conventional and compact) have outperformed those generated from examples by the well-known C4.5 program both in terms of their simplicity and their predictive accuracy.

5.
In this paper, we present an experimental comparison among different strategies for combining decision trees built by means of imprecise probabilities and uncertainty measures. It has been proven that the combination or fusion of the information obtained from several classifiers can improve the final process of the classification. We use previously developed schemes, known as Bagging and Boosting, along with a new one based on the variation of the root node via the information rank of each feature of the class variable. To this end, we applied two different approaches to deal with missing data and continuous variables. We use a set of tests on the performance of the methods analyzed here, to show that, with the appropriate approach, the Boosting scheme constitutes an excellent way to combine this type of decision tree. It should be noted that it provides good results, even compared with a standard Random Forest classifier, a successful procedure very commonly used in the literature.

6.
If software for embedded processors is based on a time-triggered architecture, using co-operative task scheduling, the resulting system can have very predictable behaviour. Such a system characteristic is highly desirable in many applications, including (but not restricted to) those with safety-related or safety-critical functions. In practice, a time-triggered, co-operatively scheduled (TTCS) architecture is less widely employed than might be expected, not least because care must be taken during the design and implementation of such systems if the theoretically predicted behaviour is to be obtained. In this paper, we argue that the use of appropriate ‘design patterns’ can greatly simplify the process of creating TTCS systems. We briefly explain the origins of design patterns. We then illustrate how an appropriate set of patterns can be used to facilitate the development of a non-trivial embedded system.

7.
Data mining (DM) techniques are being increasingly used in many modern organizations to retrieve valuable knowledge structures from organizational databases, including data warehouses. An important knowledge structure that can result from data mining activities is the decision tree (DT) that is used for the classification of future events. The induction of the decision tree is done using a supervised knowledge discovery process in which prior knowledge regarding classes in the database is used to guide the discovery. The generation of a DT is a relatively easy task, but in order to select the most appropriate DT it is necessary for the DM project team to generate and analyze a significant number of DTs based on multiple performance measures. A process that empowers DM project teams to do thorough experimentation and analysis, without being overwhelmed by the task of analyzing a significant number of DTs, would offer a positive contribution to the DM process; we propose such a process, based on multi-criteria decision analysis. We also offer some new approaches for measuring some of the performance criteria.

8.
Measures of interestingness play a crucial role in association rule mining. An important methodological problem, on which several papers have appeared in the literature, is to provide a reasonable classification of the measures. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of clustering the measures. Unlike the existing studies, our method reveals overlapping clusters of interestingness measures. We argue that the overlap between clusters is a desired feature of natural groupings of measures and that, because formal concepts are used as factors in Boolean factor analysis, the resulting clusters have a clear meaning and are easy to interpret. We conduct three case studies on clustering of measures, provide interpretations of the resulting clusters and compare the results to those of the previous approaches reported in the literature.

9.
10.
A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.

11.
A new decision tree method for application in data mining, machine learning, pattern recognition, and other areas is proposed in this paper. The new method incorporates a classical multivariate statistical method, the linear discriminant function, into the recursive partitioning process of decision trees. The proposed method considers not only the linear combination with all variables, but also combinations with fewer variables. It uses a tabu search technique to find appropriate variable combinations within a reasonable length of time. For problems with more than two classes, the tabu search technique is also used to group the data into two superclasses before each split. The results of our experimental study indicate that the proposed algorithm appears to outperform some of the major classification algorithms in terms of classification accuracy, generates decision trees of relatively small size, and runs faster than most multivariate decision tree methods. Its computing time increases linearly with data size, indicating that the algorithm is scalable to large datasets.

12.
Air traffic controllers are responsible for the safe, expeditious and orderly flow of air traffic. Their training relies heavily on the use of simulators that can represent various normal and emergency situations. Accurate classification of air traffic scenarios can provide assistance towards a better understanding of how controllers respond to the complexity of a traffic scenario. To this end, we conducted a field study using qualified air traffic controllers, who participated in simulator sessions of terminal radar approach control in a variety of scenarios. The aim of the study was twofold: first, to explore how decision trees and classification rules can be used for realistic classification of air traffic scenarios, and second, to explore which factors better reflect operational complexity. We applied machine learning methods to the data and developed decision trees and classification rules for these scenarios. Results indicated that decision trees and classification rules are useful tools in accurately categorizing scenarios and that complexity requires a larger set of predictors beyond simple aircraft counts. The derived decision trees and classification rules performed well in prediction, stability and interpretability. Practical benefits can be derived in the areas of operations and system design in the context of air traffic flow and capacity management systems.

13.
Speech emotion recognition has been one of the interesting issues in speech processing over the last few decades. Modelling of the emotion recognition process serves to understand as well as assess the performance of the system. This paper compares two different models for speech emotion recognition using vocal tract features namely, the first four formants and their respective bandwidths. The first model is based on a decision tree and the second one employs logistic regression. Whereas the decision tree models are based on machine learning, regression models have a strong statistical basis. The logistic regression models and the decision tree models developed in this work for several cases of binary classifications were validated by speech emotion recognition experiments conducted on a Malayalam emotional speech database of 2800 speech files, collected from ten speakers. The models are not only simple, but also meaningful since they indicate the contribution of each predictor. The experimental results indicate that speech emotion recognition using formants and bandwidths was better modelled using decision trees, which gave higher emotion recognition accuracies compared to logistic regression. The highest accuracy obtained using decision tree was 93.63%, for the classification of positive valence emotional speech as surprised or happy, using seven features. When using logistic regression for the same binary classification, the highest accuracy obtained was 73%, with eight features.

14.
Fuzzy decision trees: issues and methods   Total citations: 15, self-citations: 0, citations by others: 15
Decision trees are one of the most popular choices for learning and reasoning from feature-based examples. They have undergone a number of alterations to deal with language and measurement uncertainties. We present another modification, aimed at combining symbolic decision trees with approximate reasoning offered by fuzzy representation. The intent is to exploit complementary advantages of both: popularity in applications to learning from examples, high knowledge comprehensibility of decision trees, and the ability to deal with inexact and uncertain information of fuzzy representation. The merger utilizes existing methodologies in both areas to full advantage, but is by no means trivial. In particular, knowledge inferences must be newly defined for the fuzzy tree. We propose a number of alternatives, based on rule-based systems and fuzzy control. We also explore capabilities that the new framework provides. The resulting learning method is most suitable for stationary problems, with both numerical and symbolic features, when the goal is both high knowledge comprehensibility and gradually changing output. We describe the methodology and provide simple illustrations.

15.
Not‐for‐profit private organisations that provide social services to children, the elderly and the disabled apply for financial support to develop or to renew their social infrastructures, through the Portuguese Institute for Social Welfare. In the context of scarce financial resources, the Institute decision‐makers felt the need to adopt an improved “rationality” in resource allocation, in order to increase transparency and to ensure that the collective best use is made of a limited budget. This paper describes the socio‐technical process followed in building a multicriteria value model, under a decision conferencing framework in which participation and interaction among decision‐actors were key features in the development of the three main phases of problem structuring, evaluation and prioritisation.

16.

Back break is an undesirable phenomenon in mines caused by rock condition, blast geometry, explosive and initiation system. Because of the cracks, overhang and under-hang it produces, it hinders the creation of a smooth high wall and free face for the next blast. The cracks present in the in situ rock mass at the perimeter can cause rockfall during drilling. The improper free face created by the previous blast and the presence of loose strata in the face increase the overall cost of production. Therefore, predicting and subsequently optimising back break should reduce these problems to some extent. In this paper, an attempt is made to predict back break using the random forest method. The variables used for the study included the burden-to-spacing ratio, the stemming-to-hole-depth ratio, the p-wave velocity and the density of the explosive. For the random forest model, R² was 0.9791 and the root mean square error (RMSE) was 0.87899; for linear regression, R² was 0.824 and RMSE was 0.72. From the field trials, it was evident that the use of low-density emulsion can help reduce back break and optimise the overall cost of the blasting process. The same results were validated using the random forest method, wherein the model R² was 0.9791 and RMSE was 0.8799.

17.
18.
Cybernetics studies information processes in the context of interaction with physical systems. Because such information is sometimes vague and exhibits complex interactions, it can only be discerned using approximate representations. Machine learning provides solutions that create approximate models of information, and decision trees are one of its main components. However, decision trees are susceptible to information overload and can become overly complex when a large amount of data is fed into them. Granulation of a decision tree remedies this problem by providing only the essential structure of the tree, although this can decrease its utility. To evaluate the relationship that exists between granulation and decision tree complexity, data uncertainty and prediction accuracy, the deficiencies obtained by nursing homes during annual inspections were taken as a case study. Using rough sets, three forms of granulation were performed: (1) attribute grouping, (2) removing insignificant attributes and (3) removing uncertain records. Attribute grouping significantly reduces tree complexity without having any strong effect upon data consistency and accuracy. On the other hand, removing insignificant features decreases data consistency and tree complexity, while increasing the error in prediction. Finally, a decrease in the uncertainty of the dataset results in an increase in accuracy and has no impact on tree complexity.

19.
A decision tree approach was applied and validated for analysis of landslide susceptibility using a geographic information system (GIS). The study area was the Pyeongchang area in Gangwon Province, Korea, where many landslides occurred in 2006 and where the 2018 Winter Olympics are to be held. Spatial data, such as landslides, topography, and geology, were detected, collected, and compiled in a database using remote sensing and GIS. The 3994 recorded landslide locations were randomly split 50/50 for training and validation of the models. A decision tree model, which is a type of data-mining classification model, was applied and decision trees were constructed using the chi-squared (χ²) automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. Also, as a reference, a frequency-ratio model was applied using the same database. The relationships between the detected landslide locations and their factors were identified and quantified by frequency-ratio and decision tree models. The relationships were used as factor ratings in the overlay analysis to create landslide susceptibility indices and maps. Then, the resulting landslide-susceptibility maps were validated using area-under-the-curve (AUC) analysis with the landslide area data that had not been used for training the model. The decision tree models using the CHAID and QUEST algorithms had accuracies of 81.56% and 80.91%, respectively, which were somewhat better than the results for the frequency-ratio model (80.15%). These results indicate that decision tree models using the CHAID and QUEST algorithms can be useful for landslide susceptibility analysis.

20.
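Wang Yongsheng, Chai Peiqi. Journal of Computer Applications (《计算机应用》), 2006, 26(3): 651-0654
Both the prosody generation module and the homograph disambiguation module of an English text-to-speech (TTS) system require part-of-speech information for each word, so part-of-speech tagging is a very important part of an English TTS system. This paper discusses how, under the limited condition of having only a lexicon, part-of-speech tagging can be learned without supervision using the C4.5 decision tree algorithm. It also discusses how to guess the part of speech of out-of-vocabulary words.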
