Similar literature
 20 similar documents found
1.
《Applied Soft Computing》2007,7(3):1102-1111
Classification and association rule discovery are important data mining tasks. Using association rule discovery to construct classification systems, also known as associative classification, is a promising approach. In this paper, a new associative classification technique, the Ranked Multilabel Rule (RMR) algorithm, is introduced, which generates rules with multiple labels. Rules derived by current associative classification algorithms overlap in their training objects, resulting in many redundant and useless rules. The proposed algorithm resolves this overlap by generating rules that do not share training objects during the training phase, resulting in a more accurate classifier. Results from experiments on 20 binary, multi-class and multi-label data sets show that the proposed technique is able to produce classifiers that contain rules associated with multiple classes. Furthermore, the results reveal that removing the overlap of training objects between the derived rules produces classifiers that are highly competitive, with respect to error rate, with those extracted by decision trees and other associative classification techniques.

2.
One of the known classification approaches in data mining is rule induction (RI). RI algorithms such as PRISM usually produce If-Then classifiers, which have predictive performance comparable to that of other traditional classification approaches such as decision trees and associative classification. These classifiers are therefore well suited to supporting users' decisions and can be utilised as decision making tools. Nevertheless, RI methods, including PRISM and its successors, suffer from a number of drawbacks, primarily the large number of rules derived. This can be a burden, especially when the input data is high-dimensional. Pruning unnecessary rules therefore becomes essential for the success of this type of classifier. This article proposes a new RI algorithm that reduces the search space for candidate rules by pruning irrelevant items early while the classifier is being built. Whenever a rule is generated, the algorithm updates the frequency of the candidate items to reflect the data examples discarded with the derived rules. This makes item frequency dynamic rather than static and ensures that irrelevant rules are deleted at preliminary stages, when they do not cover enough data. The major benefit is a concise set of decision making rules that are easy for the decision maker to understand and control. The proposed algorithm has been implemented in the WEKA (Waikato Environment for Knowledge Analysis) environment and can therefore be utilised by different types of users such as managers, researchers, students and others. Experimental results using real data from the security domain as well as sixteen classification datasets from the University of California Irvine (UCI) repository reveal that the proposed algorithm is competitive in classification accuracy with known RI algorithms. Moreover, the classifiers produced by the algorithm are smaller in size, which increases their potential use in practical applications.
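
The dynamic item frequency idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: the data layout (each example as a set of (attribute, value) items) and the helper names `item_frequencies`, `remove_covered` and `surviving_items` are assumptions made only for illustration.

```python
from collections import Counter


def item_frequencies(examples):
    """Count how many of the remaining examples contain each item."""
    counts = Counter()
    for items in examples:
        counts.update(items)  # each item in a set is counted once
    return counts


def remove_covered(examples, rule_items):
    """Discard every example matched by a rule (all rule items present)."""
    return [items for items in examples if not rule_items <= items]


def surviving_items(examples, min_freq):
    """Items still frequent enough to be candidates for the next rule."""
    counts = item_frequencies(examples)
    return {item for item, n in counts.items() if n >= min_freq}


# Hypothetical toy data: each example is a set of (attribute, value) items.
examples = [
    {("outlook", "sunny"), ("windy", "no")},
    {("outlook", "sunny"), ("windy", "yes")},
    {("outlook", "rain"), ("windy", "yes")},
]

before = surviving_items(examples, min_freq=2)
# Suppose a rule with the single item ("outlook", "sunny") has been derived.
remaining = remove_covered(examples, {("outlook", "sunny")})
after = surviving_items(remaining, min_freq=2)
```

Recomputing the counts on `remaining` is what makes the frequencies dynamic: an item that was frequent before a rule was derived can fall below the threshold afterwards and be pruned from the search.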

3.
Dynamic weighting ensemble classifiers based on cross-validation   Cited by: 1 (self-citations: 1, citations by others: 0)
Ensembles of classifiers are one of the main current directions in machine learning and data mining. Ensemble methods are generally divided into static and dynamic ones. Dynamic ensemble methods use different classifiers for different samples and may therefore generalize better than static ensemble methods. However, most dynamic approaches based on the KNN rule must set aside part of the training samples to estimate the “local classification performance” of each base classifier. When training samples are scarce, this lowers the accuracy of the trained model and makes the estimates of the base classifiers' local performance unreliable, which in turn hurts the performance of the ensemble. This paper presents a new dynamic ensemble model that introduces cross-validation into the evaluation of local performance and then dynamically assigns a weight to each component classifier. Experimental results on 10 UCI data sets demonstrate that, when the training set is not large enough, the proposed method outperforms several dynamic ensemble methods as well as several classical static ensemble approaches.

4.
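A distributed auxiliary associative classification algorithm for big data environments   Cited by: 4 (self-citations: 0, citations by others: 4)
张明卫, 朱志良, 刘莹, 张斌. 《软件学报》 (Journal of Software) 2015,26(11):2795-2810
In many real-world classification applications, the class label of new data must ultimately be determined by domain experts, and the classifier's output plays only an auxiliary role. Moreover, as the value hidden in big data receives increasing attention, classifier training will gradually shift from single data sets to distributed spatial data sets, and auxiliary classification in big data environments will become an important branch of future classification applications. However, existing classification research pays little attention to such applications. Auxiliary classification in big data environments faces three problems: 1) the training set is a distributed big data set; 2) spatially, the class distributions of the local data sources that make up the training set differ; 3) temporally, the training set changes dynamically and class migration occurs. Taking these problems into account, this paper proposes a distributed auxiliary associative classification method for big data environments. The method first presents an algorithm for building a distributed associative classifier in a big data environment; in this algorithm, horizontal weighting accounts for the spatial differences in class distribution across the classification data sets, and an "antecedent spatial support - correlation coefficient" measurement framework is given to remedy the weakness of associative classification algorithms on imbalanced data. It then presents an adaptation-factor-based method for dynamically adjusting the auxiliary associative classifier, which makes full use of real-time feedback from domain experts while the classifier is in use, improving its classification performance on dynamic data sets and reducing classifier degradation and the frequency of retraining. Experimental results show that the method can quickly train associative classifiers with high classification accuracy on distributed data sets and improves classification performance as the data sets keep expanding and changing, making it an effective method for auxiliary classification applications in big data environments.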

5.
Building a highly-compact and accurate associative classifier   Cited by: 1 (self-citations: 1, citations by others: 0)
Associative classification has attracted significant research attention in recent years owing to its rule-based form and satisfactory accuracy. However, the rules in associative classifiers derived from typical association rule mining (e.g., Apriori-type) can easily become too numerous to understand and are sometimes redundant or conflicting. To deal with these issues, a recently proposed approach (GARC) appears superior to other existing approaches (e.g., C4.5-type, NN, SVM, CBA) in two respects: its classification accuracy is equally satisfactory, and its classifier is more compact, consisting of far fewer rules. Following this line of thinking, this paper presents a novel GARC-type approach, GEAR, to build an associative classifier with three distinctive and desirable features. First, the rules in the GEAR classifier are more intuitively appealing; second, the GEAR classification accuracy is improved or at least as good as that of other approaches; and third, the GEAR classifier is significantly more compact. To this end, a number of notions, including rule redundancy and compact set, are introduced, together with related properties that can be incorporated into the rule mining process as pruning strategies. Experimental results on benchmark datasets also show that GEAR outperforms GARC and other approaches.

6.
In handwritten pattern recognition, the multiple classifier system has been shown to be useful for improving recognition rates. One of the most important tasks in optimizing a multiple classifier system is to select a group of adequate classifiers, known as an Ensemble of Classifiers (EoC), from a pool of classifiers. Static selection schemes select one EoC for all test patterns, whereas dynamic selection schemes select different classifiers for different test patterns. Nevertheless, it has been shown that traditional dynamic selection performs no better than static selection. We propose four new dynamic selection schemes which explore the properties of the oracle concept. Our results suggest that the proposed schemes, using the majority voting rule for combining classifiers, perform better than the static selection method.

7.
Knowledge-based systems such as expert systems are of particular interest in medical applications because extracted if-then rules can provide interpretable results. Various rule induction algorithms have been proposed to effectively extract knowledge from data, and they can be combined with classification methods to form rule-based classifiers. However, most rule-based classifiers cannot directly handle numerical data such as blood pressure. A data preprocessing step called discretization is required to convert such numerical data into a categorical format. Existing discretization algorithms do not take into account the multimodal class densities of numerical variables in datasets, which may degrade the performance of rule-based classifiers. In this paper, a new Gaussian Mixture Model based Discretization Algorithm (GMBD) is proposed that preserves the most frequent patterns of the original dataset by taking into account the multimodal distribution of the numerical variables. The effectiveness of the GMBD algorithm was verified using six publicly available medical datasets. According to the experimental results, the GMBD algorithm outperformed five other static discretization methods in terms of the number of generated rules and classification accuracy in the associative classification algorithm. Consequently, the proposed approach has the potential to enhance the performance of rule-based classifiers used in clinical expert systems.

8.
Associative classifiers are classification systems based on associative classification rules. Although associative classification is more accurate than traditional classification approaches, it cannot handle numerical data and the relationships within it. Therefore, an ongoing research problem is how to build associative classifiers from numerical data. In this work, we focus on stock trading data with many numerical technical indicators, and the classification problem is finding sell and buy signals from the technical indicators. This study proposes a GA-based algorithm for building an associative classifier that can discover trading rules from these numerical indicators. The experimental results show that the proposed approach is an effective classification technique with high prediction accuracy and is highly competitive when compared with the data distribution method.

9.
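Rough subspace ensemble based on dynamic weighting   Cited by: 1 (self-citations: 0, citations by others: 1)
A rough subspace ensemble method based on dynamic weighting, EROS-DW, is proposed. Rough set attribute reduction is used to obtain multiple reduced feature subsets, on which the base classifiers are trained. In the classification stage, a weight is dynamically assigned to each base classifier according to the specific features of the given test sample, and the outputs of the classifiers are combined by a weighted voting rule. The method's performance is tested on UCI benchmark data sets. Experimental results show that EROS-DW achieves higher classification accuracy than classical ensemble methods.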
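
The weighted voting step described in this abstract can be sketched as follows. The abstract does not say how EROS-DW computes its per-sample weights, so the `local_accuracy` helper below (the fraction of a test sample's neighbours that a base classifier labels correctly) is only an assumed stand-in, and both function names are hypothetical.

```python
from collections import defaultdict


def local_accuracy(correct_flags):
    """Assumed weight: share of neighbouring samples a classifier got right."""
    return sum(correct_flags) / len(correct_flags)


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per base classifier, for a single test sample
    weights:     one weight per base classifier, computed for that same sample
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three base classifiers vote on one test sample.
predictions = ["a", "b", "b"]
# Hypothetical correctness of each classifier on the sample's 4 neighbours.
weights = [
    local_accuracy([True, True, True, True]),     # 1.0
    local_accuracy([True, False, False, False]),  # 0.25
    local_accuracy([True, True, False, False]),   # 0.5
]
decision = weighted_vote(predictions, weights)
```

With equal weights the majority label "b" would win; here the single classifier that is reliable near this sample outweighs the other two, which is the point of assigning weights per test sample.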

10.
From a data mining perspective, sequence classification means building a classifier from frequent sequential patterns. However, mining a complete set of sequential patterns on a large dataset can be extremely time-consuming, and the large number of patterns discovered also makes pattern selection and classifier building very time-consuming. In sequence classification, it is much more important to discover discriminative patterns than a complete pattern set. In this paper, we propose a novel hierarchical algorithm to build sequential classifiers using discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class. In this step, an aggressive strategy is employed to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are performed on the mined patterns. The patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. Third, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at each level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. The novel sequence classification algorithm is shown to be effective and efficient at predicting debt occurrences from customer activity sequence data.

11.
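Aircraft type recognition by multiple classifier fusion   Cited by: 2 (self-citations: 0, citations by others: 2)
To address aircraft type recognition in air combat target identification, a recognition method based on multiple classifier fusion is proposed. The method takes tactical performance parameters as input, which helps meet the real-time requirements of air combat. Classification features for aircraft type recognition are obtained through extensive data collection; subsets of these features are selected as the features of the individual classifiers, the individual classifiers are designed with BP networks, and the well-performing sum rule is then used to fuse the classifiers and reach the final decision. Experimental results show that the recognition performance of the fused classifier is clearly better than that of the classifiers taking part in the fusion, and also better than that of a single classifier with the same input. Another feature of the method is that it can perform default reasoning, which gives it strong robustness to interference and makes it suitable for real battlefield environments.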

12.
The output of a classifier is usually determined by the value of a discriminant function, and a decision is made based on this output, which does not necessarily represent the posterior probability needed for a soft classification decision. In this context, it is desirable to calibrate the output of a classifier so that it can be interpreted as the posterior probability of class membership. This paper presents a new postprocessing method for the probabilistic scaling of a classifier's output. For this purpose, the output of a classifier is analyzed and its distribution is described by the parameters of a beta distribution. To approximate the class output distribution more accurately, the beta distribution parameters as well as the kernel parameters describing the discriminant function are adjusted to improve the uniformity of the beta cumulative distribution function (CDF) values for the given class output samples. As a result, the classifier with the proposed scaling method, referred to as the class probability output network (CPON), can provide accurate posterior probabilities for the soft decision of classification. To show the effectiveness of the proposed method, pattern classification simulations using support vector machine (SVM) classifiers are performed on University of California at Irvine (UCI) data sets. The simulation results show that SVM classifiers with the proposed CPON achieve a statistically meaningful performance improvement over the SVM and SVM-related classifiers, and also over other probabilistic scaling methods.

13.
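A classifier design based on particle swarm optimization   Cited by: 9 (self-citations: 2, citations by others: 7)
Particle swarm optimization (PSO) is applied to data classification. A classification rule encoding suited to PSO is given, a new fitness function for classification rules is constructed to extract the rule set more accurately, and the particle position update equation is modified so that PSO is suited to classification rule mining, yielding a PSO-based classifier design. The proposed PSO classifier is further tested on UCI benchmark data sets, and PSO classifiers with several different velocity and position update strategies are compared with a genetic algorithm classifier. Experimental results show that the PSO classifier is an effective and feasible classifier design.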

14.
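Ensemble classification can effectively improve classification performance by combining several weak classifiers according to some rule. In the combination process, the weak classifiers often differ in their importance to the classification result. The extreme learning machine (ELM) is a recently proposed learning algorithm for training single-hidden-layer feedforward neural networks. Using the ELM as the base classifier, a weighted ELM ensemble method based on differential evolution is proposed. The method uses differential evolution to optimize the weights of the base classifiers in the ensemble. Experimental results show that, compared with ensembles based on simple voting and on AdaBoost, the method has higher classification accuracy and better generalization ability.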

15.
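杜超, 王志海, 江晶晶, 孙艳歌. 《软件学报》 (Journal of Software) 2017,28(11):2891-2904
Pattern-based Bayesian classification models are an effective approach to classification problems in data mining. However, most pattern-based Bayesian classifiers consider only the support of a pattern in the data set of the target class and ignore its support in the data set of the opposing class. In addition, pattern-based Bayesian classifiers designed for static data sets are not applicable to unbounded data streams that change dynamically at high speed. To solve these problems, EPDS (Bayesian classifier algorithm based on emerging pattern for data stream), a Bayesian classification model for data streams based on emerging patterns, is proposed. The model uses a simple hybrid forest structure to maintain the itemsets of the transactions in memory and adopts a fast pattern extraction mechanism to speed up the algorithm. EPDS uses a semi-lazy learning strategy to update the emerging patterns continuously and builds a local classification model under each class for each transaction to be classified. Extensive experimental results show that the algorithm achieves higher accuracy than other data stream classification models.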

16.
Non-parametric classification procedures based on a certainty measure and the nearest neighbour rule were explored for motor unit potential (MUP) classification during electromyographic (EMG) signal decomposition. A diversity-based classifier fusion approach is developed and evaluated to achieve improved classification performance. The developed system allows the construction of a set of non-parametric base classifiers and then automatically chooses, from the pool of base classifiers, subsets of classifiers to form candidate classifier ensembles. The system selects the ensemble members by means of a diversity measure. The kappa statistic is used as the diversity measure to estimate the level of agreement between base classifier outputs, i.e., to measure the degree of decision similarity between base classifiers. The pool of base classifiers consists of two kinds of classifiers, adaptive certainty-based classifiers (ACCs) and adaptive fuzzy k-NN classifiers (AFNNCs), which utilize different types of features. Once the classifier fusion system has assigned the patterns to their classes, firing pattern consistency statistics are calculated for each class to detect classification errors in an adaptive fashion. Performance of the developed system was evaluated using real and simulated EMG signals and was compared with the performance of the constituent base classifiers and of the fixed ensemble containing the full set of base classifiers. Across the EMG signal data sets used, the diversity-based classifier fusion approach had better average classification performance overall, especially in terms of reducing classification errors.
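
The kappa statistic mentioned in this abstract measures agreement between two classifiers' outputs beyond what chance would produce. The sketch below computes the standard pairwise Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e); how the paper aggregates kappa over a whole ensemble is not stated in the abstract, so only the pairwise form is shown, and the function name `cohen_kappa` is an illustrative choice.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between the label sequences of two classifiers."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    # p_o: observed share of samples on which the two classifiers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from each classifier's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always output the same single label
    return (observed - expected) / (1.0 - expected)


identical = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])
chance_level = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])
opposite = cohen_kappa([1, 1, 0, 0], [0, 0, 1, 1])
```

A kappa near 1 means the two classifiers make nearly the same decisions, so lower kappa values indicate more diverse pairs, which is the property a diversity-based selection scheme looks for.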

17.
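Ensemble learning is a data mining approach that can effectively improve the performance of classification systems. A dynamic classifier ensemble selection algorithm is used for the intelligent evaluation of the sensory quality of cigarettes. A pool containing multiple base classifiers is generated; classifiers that meet the requirements are selected according to the base classifiers' performance in the neighbourhood of the test sample; and the selected classifiers produce the final prediction. To verify the effectiveness of the method, comparative experiments were carried out on a historical cigarette sensory evaluation data set provided by a tobacco company in China. Experimental results show that the method performs markedly better than other methods.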

18.
Recently, considerable attention has focused on compound sequence classification methods which integrate multiple data mining techniques. Among these methods, sequential pattern mining (SPM) based sequence classifiers are considered to be efficient for solving complex sequence classification problems. Although previous studies have demonstrated the strength of SPM-based sequence classification methods, the challenges of pattern redundancy, inappropriate sequence similarity measures, and hard-to-classify sequences remain unsolved. This paper proposes an efficient two-stage SPM-based sequence classification method to address these three problems. In the first stage, during the sequential pattern mining process, a sequential pattern is identified as redundant if it is a sub-sequence of another sequential pattern. A list of compact sequential patterns is generated excluding redundant patterns and used as representative features for the second stage. In the second stage, a sequence similarity measurement is used to evaluate partial similarity between sequences and patterns. Finally, a particle swarm optimization-AdaBoost (PSO-AB) sequence classifier is developed to improve sequence classification accuracy. In the PSO-AB sequence classifier, the PSO algorithm is used to optimize the weights in the individual sequence classifier, while the AdaBoost strategy is used to adaptively change the distribution of patterns that are hard to classify. The experiments show that the proposed two-stage SPM-based sequence classification method is efficient and superior to other approaches.
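
The first-stage redundancy test in this abstract (a pattern is redundant if it is a sub-sequence of another pattern) can be sketched in a few lines. This sketch assumes each pattern is an ordered sequence of single items rather than itemsets, and the function names `is_subsequence` and `compact_patterns` are hypothetical; the paper's own procedure may apply further conditions that the abstract does not mention.

```python
def is_subsequence(pattern, other):
    """True if the items of pattern appear in other in the same order (gaps allowed)."""
    remaining = iter(other)
    # Each membership test consumes the iterator up to the match, preserving order.
    return all(item in remaining for item in pattern)


def compact_patterns(patterns):
    """Drop every pattern that is a sub-sequence of a different pattern."""
    unique = list(dict.fromkeys(tuple(p) for p in patterns))  # remove exact duplicates
    return [
        p for p in unique
        if not any(p != q and is_subsequence(p, q) for q in unique)
    ]


mined = [("a", "b"), ("a", "b", "c"), ("b", "d"), ("a", "c")]
compact = compact_patterns(mined)
```

Here ("a", "b") and ("a", "c") are both contained in ("a", "b", "c") and are dropped, leaving two representative patterns to serve as features for the second stage.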

19.
Peculiarity-oriented mining is a data mining method consisting of peculiar data identification and peculiar data analysis. The peculiarity factor and the local peculiarity factor (LPF) are important concepts used to describe the peculiarity of a data point in the identification step. These notions can be studied at both the attribute and the record level. In this paper, a new record LPF called distance-based record LPF (D-record LPF) is proposed, which is defined as the sum of distances between a point and its nearest neighbors. The authors prove that D-record LPF can characterize the probability density of a continuous m-dimensional distribution accurately. This provides a theoretical basis for some existing distance-based anomaly detection techniques. More importantly, it also provides an effective method for describing the class-conditional probabilities in a Bayesian classifier. The result makes it possible to apply D-record LPF to classification problems. A novel algorithm called the LPF-Bayes classifier and its kernelized implementation are proposed, which have some connection to the Bayesian classifier. Experimental results on several benchmark datasets demonstrate that the proposed classifiers are competitive with some excellent classifiers such as AdaBoost, support vector machines and kernel Fisher discriminant.
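
The definition of D-record LPF given in this abstract, the sum of distances between a point and its nearest neighbors, can be written down directly. The sketch assumes Euclidean distance and a fixed neighbour count k, neither of which the abstract specifies, and the function name `d_record_lpf` is chosen here for illustration.

```python
import math


def d_record_lpf(data, index, k):
    """Sum of Euclidean distances from data[index] to its k nearest neighbours."""
    if not 0 < k < len(data):
        raise ValueError("k must be between 1 and len(data) - 1")
    point = data[index]
    distances = sorted(
        math.dist(point, other)
        for j, other in enumerate(data)
        if j != index  # a point is not its own neighbour
    )
    return sum(distances[:k])


# Three points close together and one far away.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
dense_score = d_record_lpf(data, 0, k=2)     # distances 1.0 and 2.0
isolated_score = d_record_lpf(data, 3, k=2)  # both neighbours are far away
```

A point in a dense region gets a small score and an isolated point a large one, which is consistent with the abstract's statement that the measure reflects the local probability density.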

20.
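In multiple classifier ensembles, the base classifiers differ in effectiveness, and giving them all the same weight limits how well they can contribute. On this basis, BCPSO, a weighted multiple classifier ensemble method based on an extension of PSO, is proposed. The method uses random subspaces to generate independent sub-classifiers, whose outputs are combined by a weighted voting rule. Experimental results show that the method is effective and feasible and achieves high classification accuracy.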
