Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
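Associative classification offers high classification accuracy and strong adaptability; however, because the classifier is built from a set of high-confidence rules, it can sometimes overfit. This paper proposes associative classification based on rule interestingness (ACIR). It extends the TD-FP-growth algorithm to mine the training set efficiently and generate interesting rules that satisfy minimum support and minimum confidence. A small rule set is then selected by pruning to build the classifier. During rule pruning, rule interestingness is used to evaluate rule quality, taking into account both the predictive accuracy of a rule and the interestingness of the items in it. Experimental results show that the method outperforms See5, CBA, and CMAR in classification accuracy and offers good comprehensibility and scalability.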
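
The minimum support and minimum confidence thresholds mentioned in item 1 can be illustrated with a small sketch. This is not ACIR or TD-FP-growth; it only computes the two standard measures for a single class association rule. The function name `rule_support_confidence` and the toy records are assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy training set: each record is (set of items, class label).
Record = Tuple[FrozenSet[str], str]


def rule_support_confidence(
    records: List[Record], antecedent: FrozenSet[str], label: str
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `antecedent -> label`."""
    n = len(records)
    # Labels of all records whose item set contains the antecedent.
    covered = [lbl for items, lbl in records if antecedent <= items]
    # Covered records that also carry the predicted class label.
    correct = sum(1 for lbl in covered if lbl == label)
    support = correct / n if n else 0.0
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


records: List[Record] = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a", "c"}), "yes"),
    (frozenset({"a", "b", "c"}), "no"),
    (frozenset({"b"}), "no"),
]
support, confidence = rule_support_confidence(records, frozenset({"a"}), "yes")
```

A rule would be kept as a candidate only if both values reach the chosen thresholds; the thresholds themselves are not given in the abstract.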

2.
Measures of interestingness play a crucial role in association rule mining. An important methodological problem, on which several papers have appeared in the literature, is to provide a reasonable classification of the measures. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of clustering the measures. Unlike the existing studies, our method reveals overlapping clusters of interestingness measures. We argue that the overlap between clusters is a desired feature of natural groupings of measures and that, because formal concepts are used as factors in Boolean factor analysis, the resulting clusters have a clear meaning and are easy to interpret. We conduct three case studies on clustering of measures, provide interpretations of the resulting clusters, and compare the results to those of the previous approaches reported in the literature.

3.
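Association rules are a common form of knowledge representation. This paper introduces extraction patterns for association rules and the shortcomings of extraction patterns based on the PS architecture; it also introduces definitions of association-rule interestingness, including objective, subjective, and combined interestingness.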

4.
Mining fuzzy association rules for classification problems   Total citations: 3, self-citations: 0, citations by others: 3
The effective development of data mining techniques for the discovery of knowledge from training samples for classification problems in industrial engineering is necessary in applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy association rules for classification problems. The consequent part of each rule is one class label. The proposed learning algorithm consists of two phases: one to generate large fuzzy grids from training samples by fuzzy partitioning in each attribute, and the other to generate fuzzy association rules for classification problems from the large fuzzy grids. The proposed learning algorithm is implemented by scanning training samples stored in a database only once and applying a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can be easily extended to discover other types of fuzzy association rules. The simulation results from the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy association rules for classification problems.

5.
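This paper proposes a multi-label algorithm based on association rules (MLAC). It generates multi-label rules through decomposition and combination using a multi-label FP-tree, and makes classification predictions by combining multiple association-rule classifiers, which reduces the high computational complexity caused by high-dimensional attributes and effectively improves the algorithm's performance and efficiency. Experiments on multi-label datasets show that MLAC outperforms multi-label classification algorithms such as ML-KNN in both performance and efficiency.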

6.
The wavelet domain association rules method is proposed for efficient texture characterization. The concept of association rules is used to capture the frequently occurring local intensity variations in textures. The frequency of occurrence of these local patterns within a region is used as texture features. Since texture is basically a multi-scale phenomenon, multi-resolution approaches, such as wavelets, are expected to perform efficiently for texture analysis. Thus, this study proposes a new algorithm which uses wavelet domain association rules for texture classification. Essentially, this work is an extension of earlier work by Rushing et al. [10], [11], where the generation of intensity domain association rules was proposed for efficient texture characterization. The wavelet domain and the intensity domain (gray scale) association rules were generated for performance comparison purposes. Rushing et al. [10], [11] demonstrated that intensity domain association rules produce much more accurate results than the methods they were compared against in that work. Moreover, the experimental studies performed here showed that wavelet domain association rules are more effective than intensity domain association rules for the texture classification problem. The overall success rate is about 97%.

7.
This paper proposes a cellular automata-based solution to a binary classification problem. The proposed method is based on a two-dimensional, three-state cellular automaton (CA) with the von Neumann neighborhood. Since the number of possible CA rules (potential CA-based classifiers) is huge, the search for efficient rules is conducted with a genetic algorithm (GA). Experiments show an excellent performance of the discovered rules in solving the classification problem. The best rules found perform better than the heuristic CA rule designed by a human and also better than one of the most widely used statistical methods: the k-nearest neighbors algorithm (k-NN). Experiments show that CA rules can be successfully reused in the process of searching for new rules.
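
To make the setting in item 7 concrete, here is a minimal sketch of one synchronous update of a two-dimensional, three-state CA with the von Neumann neighborhood. The update rule (plurality vote over the cell and its four neighbors), the periodic boundary, and the name `von_neumann_step` are assumptions chosen for illustration; the abstract does not state the rules found by the GA.

```python
from typing import List

Grid = List[List[int]]
STATES = (0, 1, 2)  # three cell states


def von_neumann_step(grid: Grid) -> Grid:
    """Apply one synchronous update to every cell of the grid.

    Assumed rule: a cell takes the most frequent state among itself and
    its four von Neumann neighbors (up, down, left, right). On a tie the
    cell keeps its own state if that state is among the winners,
    otherwise it takes the lowest-numbered winning state.
    Boundaries wrap around (periodic), which is also an assumption.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighborhood = [
                grid[r][c],
                grid[(r - 1) % rows][c],
                grid[(r + 1) % rows][c],
                grid[r][(c - 1) % cols],
                grid[r][(c + 1) % cols],
            ]
            counts = [neighborhood.count(s) for s in STATES]
            best = max(counts)
            winners = [s for s in STATES if counts[s] == best]
            new_grid[r][c] = grid[r][c] if grid[r][c] in winners else winners[0]
    return new_grid


# A single cell in state 2 surrounded by state 0 is outvoted.
grid = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
next_grid = von_neumann_step(grid)
```

In a GA-driven search, the fixed voting rule would be replaced by a lookup table encoded in each chromosome.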

8.
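Classifying Web documents can help bring order to the clutter of information online. This paper introduces the background and key techniques of Web document classification, summarizes current classification methods, reviews the state of research and the main techniques of association rule mining in Web document classification, and points out the development trend of negative association rules in Web document classification.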

9.
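The MLKNN algorithm handles labels independently and ignores the correlations between labels that exist in the real world. To address this, an association-rule-based MLKNN multi-label classification algorithm (FP-MLKNN) is proposed. The algorithm uses association rule mining to discover high-order correlations between labels and uses the resulting label association rules to improve MLKNN, with the aim of improving classification performance. First, MLKNN is used to compute the feature confidence of each sample; next, an association rule algorithm mines a series of strong association rules; the two algorithms are then fused to build a multi-label classifier that predicts new labels; finally, the proposed algorithm is compared experimentally with MLKNN, AdaBoostMH, and BPMLL. The results show that the proposed algorithm outperforms these three algorithms on the yeast, emotions, and enron datasets, achieving good classification results.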

10.
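Given the special characteristics of medical image data, an associative classification algorithm suited to mining large volumes of medical image data is proposed. The algorithm is based on the frequent pattern tree and introduces a dual support threshold to exclude items that are meaningless for classification and act as interference, thereby improving classification accuracy. Experimental results show that, when applied to medical image classification, the algorithm achieves higher execution efficiency and better classification performance than CMAR, which is likewise an association-rule-based classification algorithm.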

11.
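Metarule-guided association rule mining can improve the efficiency and precision of the mining process. Many metarule-guided mining algorithms for association rules have been proposed, especially for relational databases; comparatively few exist for data cubes, and most of those are based on the Apriori approach and suffer from redundant predicate search. To address this, an improved algorithm, LRS, is proposed that is based on the different types of dimensions in the metarule, and experiments demonstrate its effectiveness.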

12.
Using association rules as texture features   Total citations: 1, self-citations: 0, citations by others: 1
A new type of texture feature based on association rules is proposed in this paper. Association rules have been used in applications such as market basket analysis to capture relationships present among items in large data sets. It is shown that association rules can be adapted to capture frequently occurring local structures in images. Association rules capture both structural and statistical information, and automatically identify the structures that occur most frequently and relationships that have significant discriminative power. Methods for classification and segmentation of textured images using association rules as texture features are described. Simulation results using images consisting of man-made and natural textures show that association rule features perform well compared to other widely used texture features. It is shown that association rule features can distinguish texture pairs with identical first, second, and third order statistics, and texture pairs that are not easily discriminable visually.

13.
Formulating Question Answering Validation as a classification problem facilitates the introduction of Machine Learning techniques to improve the overall performance of Question Answering systems. The different proportion of positive and negative examples in the evaluation collections has led to the use of measures based on precision and recall. However, an evaluation based on the analysis of Receiver Operating Characteristic (ROC) space is sometimes preferred in classification with unbalanced collections. In this article we compare both evaluation approaches according to their rationale, their stability, their discrimination power and their adequacy to the particularities of the Answer Validation task.
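
The two evaluation views contrasted in item 13 can be computed from the same confusion counts. The sketch below uses the standard definitions of precision, recall, and a single point in ROC space (false positive rate, true positive rate); the function names and the toy labels are assumptions for illustration and are not taken from the article.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_point(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate)."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


# Hypothetical unbalanced collection: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
fpr, tpr = roc_point(tp, fp, fn, tn)
```

Precision ignores true negatives while the false positive rate depends on them, which is one reason the two views can behave differently on unbalanced collections.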

14.
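Genetic algorithms are an important class of classification mining algorithms in data mining, but a simple genetic algorithm is highly random and error-prone, and cannot easily meet the needs of data mining. To address this, GAA, a classification mining algorithm based on a genetic algorithm and Apriori, is proposed. The encoding design, fitness function, and genetic operator design are discussed and analyzed, and the algorithm is applied to a concrete example. The results show that, with a small number of generations, the algorithm effectively improves classification accuracy and has practical value.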

15.
16.
A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.
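
What "equivalent in ranking behavior" means in item 16 can be shown with a small, well-known case that is not one of the paper's new results: for rules sharing the same consequent, lift equals confidence divided by a constant, so the two measures order the rules identically. The support values and helper names below are hypothetical.

```python
from typing import List, Tuple


def confidence(supp_xy: float, supp_x: float) -> float:
    """confidence(X -> Y) = supp(X and Y) / supp(X)."""
    return supp_xy / supp_x


def lift(supp_xy: float, supp_x: float, supp_y: float) -> float:
    """lift(X -> Y) = supp(X and Y) / (supp(X) * supp(Y))."""
    return supp_xy / (supp_x * supp_y)


def ranking(scores: List[float]) -> List[int]:
    """Return rule indices ordered from highest to lowest score (ties by index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


# Hypothetical rules with a shared consequent Y, supp(Y) = 0.4.
# Each entry is (supp(X and Y), supp(X)).
supp_y = 0.4
rules: List[Tuple[float, float]] = [(0.2, 0.5), (0.3, 0.4), (0.1, 0.3)]

conf_scores = [confidence(xy, x) for xy, x in rules]
lift_scores = [lift(xy, x, supp_y) for xy, x in rules]
conf_rank = ranking(conf_scores)
lift_rank = ranking(lift_scores)
```

When consequents differ, supp(Y) is no longer constant and the two rankings can diverge, which is why ranking-based comparison across many datasets is informative.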

17.
This paper presents a constructive method for association rule extraction, where the knowledge of data is encoded into an SVM classification tree (SVMT), and linguistic association rules are extracted by decoding the trained SVMT. The method of rule extraction over the SVMT (SVMT-rule), in the spirit of decision-tree rule extraction, achieves rule extraction not only from the SVM, but also over the decision-tree structure of the SVMT. Thus, the rules obtained from SVMT-rule have the better comprehensibility of decision-tree rules while retaining the good classification accuracy of SVM. Moreover, profiting from the strong generalization ability of SVMT owing to the aggregation of a group of SVMs, SVMT-rule is capable of performing very robust classification on datasets that have seriously, even overwhelmingly, class-imbalanced data distributions. Experiments with Gaussian synthetic data, seven benchmark cancer diagnosis tasks, and one application of cell-phone fraud detection have highlighted the utility of SVMT and SVMT-rule for comprehensible and effective knowledge discovery, as well as the superior properties of SVMT-rule as compared to purely support-vector-based rule extraction. (A version of SVMT Matlab software is available online at )

18.
This paper presents a novel technique for recognizing broken characters found in degraded text documents by modeling it as a set-partitioning problem (SPP). The proposed technique searches for the optimal set-partition of the connected components by which each subset yields a reconstructed character. Given the non-linear nature of the objective function needed for optimal set-partitioning, we design an algorithm that we call Heuristic Incremental Integer Programming (HIIP). The algorithm employs integer programming (IP) with an incremental approach using heuristics to hasten the convergence. The objective function is formulated as probability functions that reflect common OCR measurements: pattern resemblance, sizing conformity and distance between connected components. We applied the HIIP technique to Thai and English degraded text documents and achieved accuracy rates over 90%. We also compared HIIP against three competing algorithms and achieved higher comparative accuracy in each case.

19.
20.
Bui-Thi Danh, Meysman Pieter, Laukens Kris. Applied Intelligence, 2022, 52(3): 3090-3102
Applied Intelligence - A crucial characteristic of machine learning models in various domains (such as medical diagnosis, financial analysis, or real-time process monitoring) is the...
