
1.

A method for improving the accuracy of data mining classification algorithms

Nikolaos Mastrogiannis Basilis Boutsinas Ioannis Giannikos 《Computers & Operations Research》2009

In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining) that employs aspects of the methodological framework of the ELECTRE I outranking method and aims to increase the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted during the training process of the data mining classification algorithms and then assigns the classes that correspond to these rules to the objects that must be classified. Three well-known data mining classification algorithms are tested on five widely used databases to verify the robustness of the proposed method.

2.

The discovery of experts' decision rules from qualitative bankruptcy data using genetic algorithms (Total citations: 1; self-citations: 0; citations by others: 1)

Myoung-Jong Kim Ingoo Han 《Expert systems with applications》2003,25(4):10210-646

Numerous studies on bankruptcy prediction have applied data mining techniques to find useful knowledge automatically in financial databases, while few studies have proposed qualitative data mining approaches capable of eliciting and representing experts' problem-solving knowledge from experts' qualitative decisions. In an actual risk assessment process, the discovery of bankruptcy prediction knowledge from experts is still regarded as an important task because experts' predictions depend on their subjectivity. This paper proposes a genetic algorithm-based data mining method for discovering bankruptcy decision rules from experts' qualitative decisions. The results of the experiment show that the genetic algorithm generates rules with higher accuracy and larger coverage than inductive learning methods and neural networks. They also indicate that considerable agreement is achieved between the GA method and experts' problem-solving knowledge. This means that the proposed method is a suitable tool for eliciting and representing experts' decision rules and thus provides effective decision support for solving bankruptcy prediction problems.

3.

An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization (Total citations: 1; self-citations: 0; citations by others: 1)

Victoria Pachón Álvarez Jacinto Mata Vázquez 《Expert systems with applications》2012,39(1):585-593

Association rules are one of the most frequently used tools for finding relationships between different attributes in a database. There are various techniques for obtaining these rules, the most common of which yield categorical association rules. However, when we need to relate attributes that are numeric and discrete, we turn to methods that generate quantitative association rules, a far less studied approach than the former. In addition, when the database is extremely large, many of these tools cannot be used. In this paper, we present an evolutionary tool for finding association rules in databases (both small and large) comprising quantitative and categorical attributes without the need for an a priori discretization of the domain of the numeric attributes. Finally, we evaluate the tool using both real and synthetic databases.
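To make the notion of a quantitative association rule concrete, the following minimal sketch computes the standard support and confidence measures for a rule whose conditions are numeric intervals rather than pre-discretized categories. It is only an illustration, not the evolutionary tool described in the abstract; the attribute names, the sample rows, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]      # closed interval [low, high]
Condition = Dict[str, Interval]     # attribute name -> allowed interval


def matches(row: Dict[str, float], cond: Condition) -> bool:
    """True if every attribute named in cond falls inside its interval."""
    return all(low <= row[attr] <= high for attr, (low, high) in cond.items())


def support_confidence(rows: List[Dict[str, float]],
                       antecedent: Condition,
                       consequent: Condition) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent -> consequent."""
    n_ante = sum(1 for r in rows if matches(r, antecedent))
    n_both = sum(1 for r in rows
                 if matches(r, antecedent) and matches(r, consequent))
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical data: age in years, income in thousands.
rows = [
    {"age": 25, "income": 30.0},
    {"age": 32, "income": 52.0},
    {"age": 41, "income": 58.0},
    {"age": 38, "income": 35.0},
    {"age": 55, "income": 80.0},
]

# Rule: age in [30, 45] -> income in [50, 60]
support, confidence = support_confidence(
    rows, {"age": (30, 45)}, {"income": (50.0, 60.0)}
)
```

An evolutionary search of the kind the abstract mentions would adjust the interval bounds themselves, using measures such as these as part of its fitness evaluation.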

4.

A Brief Analysis of Data Mining Techniques

王晓燕《办公自动化》2009,(10)

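This paper discusses the research and development of data mining methods, briefly introduces the definition, functions, and methods of data mining, and describes in detail several commonly used data mining techniques, including their concepts, scope of application, and criteria for selecting splits.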

5.

A Fitness Function Based on Information Gain

杨新武刘椿年《计算机工程》2004,30(2):38-39,161

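The key to mining first-order rules with genetic algorithms is evaluating those rules accurately: the fitness of a rule must effectively distinguish good rules from bad ones and thereby guide the algorithm toward the target rule. Building on the concept of binding and drawing on information theory, this paper proposes a new fitness function based on information gain. Compared with the commonly used criterion based on the numbers of positive and negative examples covered by a rule, the new fitness function makes full use of the information hidden in the examples and the background knowledge, quantifies the quality of first-order rules more accurately, and thus improves the search performance of the algorithm and the readability of the rules.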
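The abstract does not give the formula of its fitness function, so the sketch below does not reproduce it. As a stand-in, it contrasts a plain coverage-count score with the well-known FOIL gain, an information-based measure defined over positive and negative bindings of a first-order rule. The function names and the example counts are assumptions made for illustration.

```python
import math


def coverage_score(pos_covered: int, neg_covered: int) -> int:
    """Baseline criterion: positives covered minus negatives covered."""
    return pos_covered - neg_covered


def foil_gain(p0: int, n0: int, p1: int, n1: int, t: int) -> float:
    """FOIL gain of refining a rule.

    p0, n0: positive/negative bindings covered before the refinement
    p1, n1: positive/negative bindings covered after the refinement
    t:      positive bindings of the old rule still covered by the new rule
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # no positive coverage, so no information is gained
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return t * (after - before)


# Hypothetical refinement: 10+/10- bindings before, 8+/2- after.
gain = foil_gain(p0=10, n0=10, p1=8, n1=2, t=8)
baseline = coverage_score(8, 2)
```

The coverage score looks only at raw counts, whereas the gain measures how much the refinement reduces the number of bits needed to signal that a binding is positive.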

6.

Classifier hierarchy learning by means of genetic algorithms

J.M. Martínez-Otzeta B. Sierra E. Lazkano A. Astigarraga 《Pattern recognition letters》2006,27(16):1998-2004

Classifier combination belongs to the data mining area. Its aim is to combine several paradigms from supervised classification (sometimes with a prior unsupervised data division phase) in order to improve on the individual accuracy of the component classifiers. Forming classifier hierarchies is one alternative among the several methods of classifier combination. In this paper we present a novel method to find good hierarchies of classifiers for given databases. In this new proposal, a search is performed by means of genetic algorithms, returning the best individual according to the classification accuracy over the dataset, estimated through 10-fold cross-validation. Experiments have been carried out over 14 databases from the UCI repository, showing an improvement in performance compared to the single classifiers. Moreover, results similar to or better than those of other approaches, such as decision tree bagging and boosting, have been obtained.
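The abstract scores each candidate hierarchy by accuracy estimated through 10-fold cross-validation. The sketch below shows only that estimation step for an arbitrary classifier, under simplifying assumptions: folds are assigned round-robin without shuffling or stratification, and the `fit_majority` baseline, the data, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k folds whose sizes differ by at most one."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_val_accuracy(X: Sequence, y: Sequence,
                       fit: Callable[[list, list], Callable],
                       k: int = 10) -> float:
    """Fraction of examples classified correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        predict = fit(train_X, train_y)  # train on the other k - 1 folds
        correct += sum(1 for i in test_idx if predict(X[i]) == y[i])
    return correct / len(X)


def fit_majority(train_X: list, train_y: list) -> Callable:
    """Hypothetical baseline classifier: always predict the majority class."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority


X = [[float(i)] for i in range(20)]
y = ["a"] * 14 + ["b"] * 6
accuracy = cross_val_accuracy(X, y, fit_majority, k=10)
```

In a genetic search like the one described, a value such as `accuracy` would serve as the fitness of one individual, so the estimate is recomputed for every candidate.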

7.

A Survey of Parallel Association Rule Mining (Total citations: 3; self-citations: 0; citations by others: 3)

尚学群沈均毅《计算机工程》2004,30(14):1-3,13

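Association rule discovery is an important research topic in data mining and has been widely applied in many practical domains. Because the mining process involves large amounts of data and computation, high-performance computing has become an important component of large-scale data mining applications. This paper reviews current research progress in parallel association rule mining, analyzes and evaluates several typical algorithms, and outlines future research directions in terms of degree of parallelism, load balancing, and integration with databases.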

8.

Research on Generalized Association Rules and Their Algorithms (Total citations: 2; self-citations: 0; citations by others: 2)

欧阳军马稳沈钧毅史保怀《计算机工程与应用》2002,38(20):201-204

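Mining generalized association rules is an important aspect of data mining research. Researchers in the field have done a great deal of work on it, making it a general and practical data mining method. This article presents an in-depth study of algorithms for mining generalized association rules.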

9.

A Two-Stage Decision Tree Construction Method and Its Application (Total citations: 2; self-citations: 0; citations by others: 2)

朱应庄吴耿锋《计算机工程》2004,30(1):82-84

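This paper proposes a novel two-stage method for building decision trees: after a coarse classification of the dataset, a genetic algorithm searches for rule sets that form the leaf nodes of the decision tree. The method can evaluate several attributes at once and avoids the pruning step of decision tree construction.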

10.

An Evaluation of Current Data Mining Algorithms (Total citations: 11; self-citations: 2; citations by others: 11)

王清毅张波蔡庆生《小型微型计算机系统》2000,21(1):75-78

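The paper first discusses criteria for evaluating data mining algorithms, then evaluates current classification algorithms using data envelopment analysis, and finally evaluates current association rule mining algorithms on the basis of experimental results.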

11.

Research on Parallel Strategies for Data Mining (Total citations: 3; self-citations: 1; citations by others: 3)

颜雪松蔡之华周燕叶静《计算机工程与应用》2003,39(3):187-189

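This article classifies parallel strategies for data mining algorithms. The classification focuses mainly on partitioning the training data and on extracting attributes from the processors at the end of each stage. This approach has been studied extensively for association rules and decision trees. The DD algorithm is used as an example to illustrate how the strategies are applied. The article concludes with an outlook on the future development of parallel data mining.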

12.

Improving the prediction of the clinical outcome of breast cancer using evolutionary algorithms

M. Wahde Z. Szallasi 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(4):338-345

There exist several methods for binary classification of gene expression data sets. However, in the majority of published methods, little effort has been made to minimize classifier complexity. In view of the small number of samples available in most gene expression data sets, there is a strong motivation for minimizing the number of free parameters that must be fitted to the data. In this paper, a method is introduced for evolving (using an evolutionary algorithm) simple classifiers involving a minimal subset of the available genes. The classifiers obtained by this method perform well, reaching 97% correct classification of clinical outcome on training samples from the breast cancer data set published by van't Veer, and up to 89% correct classification on validation samples from the same data set, easily outperforming previously published results.

13.

Multi-objective PSO algorithm for mining numerical association rules without a priori discretization

《Expert systems with applications》2014,41(9):4259-4273

In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most of the popular approaches to numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases such objectives include conflicting measures. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (i.e., MOPAR) for numerical ARM that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method to the most popular evolutionary-based proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
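The abstract selects the best rules by Pareto optimality over several objectives. The sketch below shows a generic non-dominated filter for objective vectors that are all to be maximized; it is not the MOPAR algorithm itself, and the objective triples (confidence, comprehensibility, interestingness) are made-up values for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse on every objective and better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical rules scored as (confidence, comprehensibility, interestingness).
rules = [
    (0.95, 0.40, 0.30),
    (0.90, 0.60, 0.20),
    (0.85, 0.35, 0.25),  # worse than the first rule on every objective
    (0.70, 0.80, 0.50),
]
front = pareto_front(rules)
```

This quadratic filter is adequate for small rule sets; multi-objective optimizers typically maintain the non-dominated set incrementally in an archive instead.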

14.

An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules

Bilal Alataş Erhan Akin 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(3):230-237

In this paper, a genetic algorithm (GA) is proposed as a search strategy for mining not only positive but also negative quantitative association rules (ARs) within databases. Unlike conventional methods, ARs are mined directly without generating frequent itemsets. The proposed GA takes a database-independent approach that does not rely on the minimum support and minimum confidence thresholds, which are hard to determine for each database. Instead of a randomly generated initial population, a uniform population is used, which keeps the initial population close to the solutions and distributes it uniformly over the feasible region. An adaptive mutation probability, a new operator called the uniform operator that ensures genetic diversity, and an efficient adjusted fitness function are used to mine all interesting ARs from the last population in a single run of the GA. The efficiency of the proposed GA is validated on synthetic and real databases.

15.

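A New Algorithm for Mining Classification Rules Based on Decision Tables (Total citations: 2; self-citations: 0; citations by others: 2)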

谢娟英冯德民《计算机科学》2003,30(10):61-63

The mining of classification rules is an important field in data mining. The decision table of rough set theory is an efficient tool for mining classification rules. This paper introduces the elementary concepts of rough set theory that relate to decision tables. A new algorithm for mining classification rules based on decision tables is presented, along with a discernibility function for the reduction of attribute values and a new principle for the accuracy of rules. An example of its application to a car classification problem is included, and the accuracy of the discovered rules is analyzed. Potential fields for its application in data mining are also discussed.

16.

Attribute Generation Based on Association Rules (Total citations: 1; self-citations: 0; citations by others: 1)

Masahiro Terabe Takashi Washio Hiroshi Motoda Osamu Katai Tetsuo Sawaragi 《Knowledge and Information Systems》2002,4(3):329-349

A decision tree is considered to be appropriate (1) if the tree can classify unseen data accurately, and (2) if the size of the tree is small. One approach to inducing such a good decision tree is to add new attributes and their values to enhance the expressiveness of the training data at the data pre-processing stage. There are many existing methods for attribute extraction and construction, but constructing new attributes is still an art. These methods are very time-consuming, and some of them need a priori knowledge of the data domain. They are not suitable for data mining that deals with large volumes of data. We propose a novel approach in which knowledge about the attributes relevant to the class is extracted as association rules from the training data. The new attributes and their values are generated from the association rules among the originally given attributes. We elaborate on the method and investigate its features. The effectiveness of our approach is demonstrated through some experiments. Received 6 December 1999 / Revised 28 October 2000 / Accepted in revised form 9 March 2001

17.

Research on a New Algorithm for Mining Classification Rules from Continuous Attributes

厍向阳薛惠锋《计算机工程》2005,31(18):28-30

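This paper analyzes the shortcomings of data mining on samples with continuous attributes and proposes an algorithm that mines classification rules directly from such samples. The algorithm classifies the sample instances by split points on attribute values, uses each split point's ability to classify the instances as the criterion for selecting split points, and stops when all consistent samples have been partitioned into subsets that share the same class attribute value, which makes classification rule mining for continuous-attribute samples fully automatic. Taking the goals and requirements of data mining into account, it exploits the dependence between attributes and classes and the complementarity among attributes to obtain few split points, simple classification rules, and attribute reduction. Finally, the algorithm is validated on an example and compared with the C4.5 algorithm.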
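The abstract does not say how a split point's classifying ability is measured, so the sketch below is not the paper's algorithm. It shows one common choice, the entropy-based threshold selection used by decision tree learners for continuous attributes: candidate thresholds are midpoints between adjacent distinct values, and the one with the highest information gain wins. The data and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values: List[float],
               labels: List[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, information gain) of the best binary split, or (None, 0.0)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain


values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]
threshold, gain = best_split(values, labels)
```

Applying such a selection recursively until every subset holds a single class mirrors the stopping condition the abstract describes, although the paper's own selection criterion may differ.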

18.

Data Mining with MLC++

刘晓平《计算机仿真》2006,23(4):103-105,113

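Data mining is the process of extracting hidden knowledge from large volumes of raw data. Most data mining tools use rule discovery and decision tree classification techniques to find patterns and rules in data, and at their core are induction algorithms. Compared with traditional statistical methods, classification results obtained with machine learning techniques are more interpretable. When mining a particular dataset without the relevant domain knowledge, users or decision makers find it hard to determine which induction algorithm to choose, so a variety of algorithms must be tried. With MLC++, decision makers can easily compare the effectiveness of different classification algorithms on a particular dataset and select a suitable one. System developers can also use MLC++ to design various hybrid algorithms.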

19.

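A Survey of Multi-Pass Association Rule Mining Based on the Apriori Algorithm (Total citations: 5; self-citations: 0; citations by others: 5)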

袁军鹏朱东华《计算机科学》2004,31(1):114-117

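This paper describes the state of research on association rule mining based on the Apriori algorithm, analyzes and evaluates several typical mining algorithms, summarizes the improvements to Apriori made by researchers in China and abroad, and outlines future research directions for association rule mining.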
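For readers unfamiliar with the algorithm this survey covers, the following is a minimal sketch of the frequent-itemset phase of Apriori in its basic level-wise form, with none of the improvements the survey reviews. The transactions and the support threshold are made-up values.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # One pass over the data: count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items()
                 if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
```

Each iteration of the loop scans the whole transaction list once, which is the repeated-pass cost that many of the surveyed improvements aim to reduce.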

20.

Adequacy of training data for evolutionary mining of trading rules

Kumar Siddhartha 《Decision Support Systems》2004,37(4):461

A crucial issue in data mining on time series is the duration of the training period. The training horizon used affects the nature of the rules obtained and their predictability over time. Longer training horizons are generally sought in order to discern sustained patterns with robust training data performance that extends well into the predictive period. However, in dynamic environments, patterns that persist over time may be unavailable, and shorter-term patterns may hold higher predictive ability, albeit with shorter predictive periods. Such potentially useful shorter-term patterns may be lost when the training duration covers much longer periods. Too short a training duration can, of course, be susceptible to over-fitting to noise. We conduct experiments using different training horizons with daily data for the S&P500 index and report the sensitivity of the performance of the obtained rules with respect to the training durations. We show that while the performance of the rules in the training period is important for inducing the “best” rules, it is not indicative of their performance in the test period, and we propose alternative measures that can be used to help identify appropriate training durations.