1.

An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization (total citations: 1; self-citations: 0; citations by others: 1)

Victoria Pachón Álvarez, Jacinto Mata Vázquez 《Expert systems with applications》2012,39(1):585-593

Association rules are one of the most frequently used tools for finding relationships between different attributes in a database. There are various techniques for obtaining these rules, the most common of which produce categorical association rules. However, when we need to relate attributes that are numeric and discrete, we turn to methods that generate quantitative association rules, a far less studied area than categorical rule mining. In addition, when the database is extremely large, many of these tools cannot be used. In this paper, we present an evolutionary tool for finding association rules in databases (both small and large) comprising quantitative and categorical attributes, without the need for an a priori discretization of the domain of the numeric attributes. Finally, we evaluate the tool using both real and synthetic databases.
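The abstract does not say how candidate rules are scored. As a minimal sketch, assuming the usual support and confidence measures, a quantitative rule can be evaluated directly on raw values by treating each numeric item as an interval whose bounds the evolutionary search would tune, so no fixed discretization is needed. The attribute names, data and interval bounds below are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A record maps attribute names to numeric or categorical values.
Record = Dict[str, Union[float, str]]
# A condition is an interval (low, high) for a numeric attribute,
# or a single required value for a categorical attribute.
Condition = Union[Tuple[float, float], str]


def matches(record: Record, conditions: Dict[str, Condition]) -> bool:
    """Return True if the record satisfies every condition."""
    for attr, cond in conditions.items():
        value = record[attr]
        if isinstance(cond, tuple):
            low, high = cond
            if not low <= value <= high:
                return False
        elif value != cond:
            return False
    return True


def support_confidence(
    data: List[Record],
    antecedent: Dict[str, Condition],
    consequent: Dict[str, Condition],
) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent => consequent."""
    n_antecedent = sum(matches(r, antecedent) for r in data)
    n_rule = sum(matches(r, antecedent) and matches(r, consequent) for r in data)
    support = n_rule / len(data)
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical data; in an evolutionary miner the interval bounds would be evolved.
data = [
    {"age": 25, "income": 30.0, "owns_car": "no"},
    {"age": 32, "income": 45.0, "owns_car": "yes"},
    {"age": 41, "income": 52.0, "owns_car": "yes"},
    {"age": 38, "income": 48.0, "owns_car": "no"},
    {"age": 55, "income": 61.0, "owns_car": "yes"},
]
support, confidence = support_confidence(
    data, {"age": (30, 45), "income": (40.0, 55.0)}, {"owns_car": "yes"}
)
print(support, confidence)
```

Here three records fall inside both intervals and two of them also own a car, so the rule has support 2/5 and confidence 2/3. An evolutionary miner would use such scores inside its fitness function while it moves the interval bounds.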

2.

Multi-objective PSO algorithm for mining numerical association rules without a priori discretization

《Expert systems with applications》2014,41(9):4259-4273

In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most popular approaches to numerical ARM require a priori data discretization to handle numerical attributes. Moreover, discovering relations among data often requires more than one objective (quality measure), and in most cases these objectives conflict. In such a situation, it is advisable to obtain the optimal trade-off between objectives. This paper addresses the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) for numerical ARM that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section, we analyze the effect of the operators used in this study, compare our method with the most popular evolutionary proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible and interesting numerical ARs while attaining the optimal trade-off between confidence, comprehensibility and interestingness.
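The step where "the best ARs are extracted using Pareto optimality" can be illustrated with a plain non-dominance filter. This sketch covers only that filter, not the particle swarm itself; it assumes all three objectives (confidence, comprehensibility, interestingness) are to be maximized, and the scores are made up.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one.

    All objectives are assumed to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (confidence, comprehensibility, interestingness) scores of four rules.
rules = [
    (0.95, 0.5, 0.2),
    (0.90, 0.8, 0.3),
    (0.85, 0.4, 0.1),
    (0.95, 0.5, 0.1),
]
print(pareto_front(rules))
```

Rules 2 and 3 are dominated by rule 0, while rules 0 and 1 trade confidence against comprehensibility and interestingness, so both stay on the front.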

3.

An evolutionary approach for automatically extracting intelligible classification rules

I. De Falco, A. Della Cioppa, A. Iazzetta, E. Tarantino 《Knowledge and Information Systems》2005,7(2):179-201

The process of automatically extracting novel, useful and ultimately comprehensible information from large databases, known as data mining, has become highly important because of the ever-increasing amounts of data collected by large organizations. Particular emphasis is placed on heuristic search methods able to discover patterns that are hard or impossible to detect using standard query mechanisms and classical statistical techniques. In this paper, an evolutionary system capable of extracting explicit classification rules is presented. Special attention is paid to finding easily interpretable rules that can be used to make crucial decisions. The results are compared with those achieved by other methods on a real problem, breast cancer diagnosis.

4.

Analysis and design of evolutionary algorithms for classification rule extraction

Qin Jun, Kang Lishan, Chen Yuping 《Computer Engineering and Applications》2004,40(2):13-15,31

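In the knowledge discovery process, classification rule mining is one of the main tasks. To address the weakness of traditional statistics-based mining algorithms in guaranteeing the interestingness of the discovered knowledge, this paper proposes using evolutionary computation, an intelligent computing model, for the automatic mining and processing of classification knowledge. Its global search capability and purely fitness-driven behavior mean that no prior knowledge is required, which helps ensure that the knowledge found is interesting. IF-THEN rules, a high-level form of knowledge representation, are proposed to improve the comprehensibility of the knowledge. Design principles and methods are also given for several components that play key roles in evolutionary algorithms: individual representation, genetic operators and fitness evaluation.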
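The abstract names the individual representation (IF-THEN rules) and fitness evaluation as key design elements but does not give a fitness formula. The sketch below assumes a rule is a conjunction of attribute = value tests predicting one class, and uses sensitivity × specificity, one common choice in evolutionary rule induction, as a stand-in fitness. The attributes and examples are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Example = Dict[str, str]


@dataclass
class Rule:
    """IF every attribute = value test holds THEN predict target_class."""

    conditions: Dict[str, str]
    target_class: str

    def covers(self, example: Example) -> bool:
        return all(example.get(a) == v for a, v in self.conditions.items())


def fitness(rule: Rule, data: List[Tuple[Example, str]]) -> float:
    """Sensitivity * specificity of the rule on labelled data (assumed fitness)."""
    tp = fp = tn = fn = 0
    for example, label in data:
        covered = rule.covers(example)
        positive = label == rule.target_class
        if covered and positive:
            tp += 1
        elif covered:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


# Hypothetical training data.
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
]
good = Rule({"windy": "no"}, "play")    # IF windy = no THEN play
weak = Rule({"outlook": "sunny"}, "play")  # IF outlook = sunny THEN play
print(fitness(good, data), fitness(weak, data))
```

The product rewards rules that cover most examples of their class while excluding the others; the genetic operators would then act on the `conditions` dictionary.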

5.

A memetic algorithm for evolutionary prototype selection: A scaling up approach

Salvador García, José Ramón Cano, Francisco Herrera 《Pattern recognition》2008,41(8):2693-2709

The prototype selection problem consists of reducing the size of databases by removing samples that are considered noisy or not influential in nearest neighbour classification tasks. Evolutionary algorithms have recently been used for prototype selection with good results. However, because the complexity of this problem grows with the size of the database, the behaviour of evolutionary algorithms can deteriorate considerably owing to a lack of convergence. This additional problem is known as the scaling up problem.

Memetic algorithms are heuristic search approaches to optimization problems that combine a population-based algorithm with a local search. In this paper, we propose a memetic algorithm that incorporates an ad hoc local search specifically designed for optimizing the properties of the prototype selection problem, with the aim of tackling the scaling up problem. To check its performance, we carried out an empirical study comparing our proposal with previous evolutionary and non-evolutionary approaches from the literature.

The results have been contrasted using non-parametric statistical procedures and show that our approach outperforms previously studied methods, especially when the database scales up.
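The two ingredients described above, a fitness over candidate prototype subsets and a local search that refines them, can be sketched as follows. The fitness (α × leave-one-out 1-NN accuracy + (1 − α) × reduction rate) is assumed here as a common formulation for evolutionary prototype selection, and the local search is a generic first-improvement bit-flip hill climber, not the paper's ad hoc operator. The population-based part of the memetic algorithm is omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fitness(mask: Sequence[bool], X: List[Point], y: List[str], alpha: float = 0.5) -> float:
    """alpha * leave-one-out 1-NN accuracy + (1 - alpha) * reduction rate (assumed form)."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, x in enumerate(X):
        # Nearest selected prototype, never the instance itself.
        best_dist, best_label = math.inf, None
        for j in selected:
            if j != i:
                d = math.dist(x, X[j])
                if d < best_dist:
                    best_dist, best_label = d, y[j]
        if best_label == y[i]:
            correct += 1
    accuracy = correct / len(X)
    reduction = 1 - len(selected) / len(X)
    return alpha * accuracy + (1 - alpha) * reduction


def local_search(mask: Sequence[bool], X: List[Point], y: List[str], alpha: float = 0.5):
    """First-improvement bit-flip hill climbing on the selection mask."""
    mask = list(mask)
    best = fitness(mask, X, y, alpha)
    improved = True
    while improved:
        improved = False
        for i in range(len(mask)):
            mask[i] = not mask[i]
            f = fitness(mask, X, y, alpha)
            if f > best:
                best, improved = f, True
            else:
                mask[i] = not mask[i]  # undo the flip
    return mask, best


# Hypothetical two-class data.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = ["a", "a", "a", "b", "b", "b"]
start = [True, False, False, True, False, False]
refined, score = local_search(start, X, y)
print(fitness(start, X, y), score)
```

Because the hill climber only accepts strict improvements, the refined subset is never worse than the one it started from; in a memetic algorithm this step would be applied to individuals produced by the evolutionary operators.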

6.

Web text classification and strategies for reducing blocking (total citations: 1; self-citations: 0; citations by others: 1)

Xu Chunrong, Ouyang Weimin, Gou Haibo 《Computer Applications and Software》2007,24(1):58-60,128

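In Web mining, classifying Web documents by content is a crucial step. A common approach to Web document classification is hierarchical classification, which assigns documents top-down to the corresponding categories of a classification tree. However, hierarchical classification often causes a document to be wrongly rejected by a classifier at an upper level of the tree (blocking). To address this, a classifier-centric blocking factor is used to measure the degree of blocking, and two new hierarchical classification methods are introduced, one based on threshold reduction and one based on restricted voting, to reduce the wrongful blocking of documents in Web document classification.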
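Blocking and the threshold-reduction idea can be illustrated with a toy top-down classifier. Everything here is hypothetical: the taxonomy, the keyword-overlap scorer and the threshold values stand in for real trained classifiers, and neither the paper's blocking factor nor its restricted-voting method is reproduced. A document descends into every child whose score meets the threshold for that depth; if it reaches no leaf, it was blocked.

```python
from typing import Callable, Dict, List, Set

Tree = Dict[str, List[str]]  # category -> child categories (leaves have none)

# Hypothetical taxonomy and keyword lists.
TREE: Tree = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}
KEYWORDS: Dict[str, Set[str]] = {
    "science": {"research", "experiment", "theory"},
    "sports": {"match", "team", "league"},
    "physics": {"quantum", "particle"},
    "biology": {"cell", "gene"},
    "football": {"goal", "penalty"},
    "tennis": {"serve", "racket"},
}


def score(doc: Set[str], category: str) -> float:
    """Toy classifier: fraction of the category's keywords present in the document."""
    words = KEYWORDS[category]
    return len(doc & words) / len(words)


def classify(doc: Set[str], threshold_at_depth: Callable[[int], float]) -> List[str]:
    """Top-down classification; an empty result means the document was blocked."""
    leaves: List[str] = []

    def descend(node: str, depth: int) -> None:
        children = TREE.get(node, [])
        if not children:
            leaves.append(node)
            return
        for child in children:
            if score(doc, child) >= threshold_at_depth(depth):
                descend(child, depth + 1)

    descend("root", 0)
    return leaves


doc = {"gene", "cell", "sequencing"}
fixed = classify(doc, lambda depth: 0.3)                       # blocked at the top level
lowered = classify(doc, lambda depth: 0.0 if depth == 0 else 0.3)  # top threshold lowered
print(fixed, lowered)
```

With a uniform threshold the biology document is rejected by the generic "science" classifier and never reaches "biology"; lowering the top-level threshold lets it through, at the cost of also exploring the "sports" branch, which the stricter lower-level classifiers then reject.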

7.

A new algorithm for mining classification rules from continuous attributes

She Xiangyang, Xue Huifeng 《Computer Engineering》2005,31(18):28-30

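This paper analyzes the shortcomings of data mining on samples with continuous attributes and proposes an algorithm that mines classification rules directly from such samples. The algorithm partitions the instances using cut points on attribute values, selects cut points according to their ability to discriminate between instances, and stops when all consistent samples have been divided into subsets that share the same class value, so that classification rule mining on continuous attributes is fully automatic. Taking the goals and requirements of data mining into account, it makes full use of the dependency between attributes and classes and the complementarity among attributes, yielding few cut points, simple classification rules and attribute reduction. Finally, the algorithm is validated on an example and compared with the C4.5 algorithm.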
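The abstract says cut points are chosen by their ability to discriminate between instances, without giving the measure. This sketch assumes that ability is the number of differently labelled instance pairs a cut separates, a discernibility count used in rough-set style discretization, and evaluates a single attribute only. The values and labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def candidate_cuts(values: Sequence[float]) -> List[float]:
    """Midpoints between consecutive distinct attribute values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def separated_pairs(values: Sequence[float], labels: Sequence[Hashable], cut: float) -> int:
    """Number of instance pairs with different labels on opposite sides of the cut."""
    left = Counter(lab for v, lab in zip(values, labels) if v <= cut)
    right = Counter(lab for v, lab in zip(values, labels) if v > cut)
    all_pairs = sum(left.values()) * sum(right.values())
    same_label_pairs = sum(left[c] * right[c] for c in left)
    return all_pairs - same_label_pairs


def best_cut(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Cut point that separates the most differently labelled pairs."""
    return max(candidate_cuts(values), key=lambda c: separated_pairs(values, labels, c))


values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(candidate_cuts(values), best_cut(values, labels))
```

In the full algorithm this choice would be repeated, across attributes, on each subset that still mixes class values, until every consistent subset is pure.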

8.

Implementing data mining with MLC++

Liu Xiaoping 《Computer Simulation》2006,23(4):103-105,113

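Data mining is the process of extracting hidden knowledge from large amounts of raw data. Most data mining tools use rule discovery and decision tree classification to discover data patterns and rules, with induction algorithms at their core. Compared with traditional statistical methods, classification results obtained with machine learning techniques are easier to interpret. When mining a particular data set without the relevant domain knowledge, users or decision-makers find it hard to decide which induction algorithm to choose, so various algorithms need to be tried. With MLC++, decision-makers can easily compare the effectiveness of different classification algorithms on a given data set and choose a suitable one. System developers can also use MLC++ to design various hybrid algorithms.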

9.

Improving the prediction of the clinical outcome of breast cancer using evolutionary algorithms

M. Wahde, Z. Szallasi 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(4):338-345

There exist several methods for binary classification of gene expression data sets. However, most published methods make little effort to minimize classifier complexity. Given the small number of samples available in most gene expression data sets, there is a strong motivation for minimizing the number of free parameters that must be fitted to the data. In this paper, a method is introduced for evolving (using an evolutionary algorithm) simple classifiers involving a minimal subset of the available genes. The classifiers obtained by this method perform well, reaching 97% correct classification of clinical outcome on training samples from the breast cancer data set published by van 't Veer, and up to 89% correct classification on validation samples from the same data set, easily outperforming previously published results.

10.

Evolving rule induction algorithms with multi-objective grammar-based genetic programming (total citations: 4; self-citations: 4; citations by others: 0)

Gisele L. Pappa, Alex A. Freitas 《Knowledge and Information Systems》2009,19(3):283-309

Multi-objective optimization has played a major role in solving problems where two or more conflicting objectives must be optimized simultaneously. This paper presents a multi-objective grammar-based genetic programming (MOGGP) system that automatically evolves complete rule induction algorithms, which in turn produce both accurate and compact rule models. The system was compared with a single-objective GGP and three other rule induction algorithms. In total, 20 UCI data sets were used to generate and test generic rule induction algorithms, which can now be applied to any classification data set. Experiments showed that, in general, the proposed MOGGP finds rule induction algorithms with competitive predictive accuracies and more compact models than the algorithms it was compared with.

11.

《Applied Soft Computing》2016

Evolutionary approaches are among the most accurate prototype selection algorithms, that is, preprocessing techniques that select a subset of instances from the data before nearest neighbor classification is applied. These algorithms achieve very high accuracy and reduction rates, but unfortunately at a substantial computational cost. In this paper, we introduce a framework that efficiently uses the intermediate results of prototype selection algorithms to further increase their accuracy. Instead of using only the fittest prototype subset generated by the evolutionary algorithm, we use multiple prototype subsets in an ensemble setting. Moreover, to classify a test instance, we use only those prototype subsets that accurately classify the training instances in the neighborhood of that test instance. In an experimental evaluation, we apply our framework to four state-of-the-art prototype selection algorithms and show that it yields more accurate results after fewer evaluations of the prototype selection method. We also present a case study with a prototype generation algorithm, showing that the framework extends easily to other preprocessing paradigms.
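The ensemble idea can be sketched as follows: each prototype subset is a 1-NN classifier, and a test instance is classified only by the subsets that correctly label its k nearest training instances. The neighbourhood size k, the "all neighbours correct" competence test, the leave-one-out check and the fallback to all subsets are assumptions made for illustration; the paper's exact selection criterion is not given in the abstract. The data and subsets are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def nn_label(x: Point, subset: Sequence[int], X: List[Point], y: List[str],
             exclude: Optional[int] = None) -> Optional[str]:
    """1-NN prediction using only the prototypes in `subset`, optionally skipping one index."""
    candidates = [j for j in subset if j != exclude]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda j: math.dist(x, X[j]))
    return y[nearest]


def competent_subsets(x: Point, subsets, X, y, k: int = 3):
    """Subsets that correctly classify all k nearest training neighbours of x (leave-one-out)."""
    neighbours = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    return [s for s in subsets
            if all(nn_label(X[i], s, X, y, exclude=i) == y[i] for i in neighbours)]


def ensemble_predict(x: Point, subsets, X, y, k: int = 3) -> Optional[str]:
    """Majority vote of the competent subsets (all subsets if none is competent)."""
    chosen = competent_subsets(x, subsets, X, y, k) or subsets
    votes = Counter(nn_label(x, s, X, y) for s in chosen)
    return votes.most_common(1)[0][0]


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = ["a", "a", "a", "b", "b", "b"]
# Hypothetical subsets produced during an evolutionary run.
subsets = [[0, 1, 3, 4], [0, 1, 2], [0, 2]]
query = (5.5, 5.5)
print(ensemble_predict(query, subsets, X, y))
```

A plain vote over all three subsets would answer "a", because two of them contain only class-a prototypes; the local competence filter keeps only the first subset and answers "b".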

12.

Forest Optimization Algorithm

《Expert systems with applications》2014,41(15):6676-6687

In this article, a new evolutionary algorithm, the Forest Optimization Algorithm (FOA), suitable for continuous nonlinear optimization problems, is proposed. It is inspired by the few trees in forests that can survive for several decades, while other trees live only for a limited period. FOA simulates the seeding procedure of trees: some seeds fall just under the trees, while others are distributed over wide areas by natural processes and by animals that feed on the seeds or fruits. Applying the proposed algorithm to several benchmark functions demonstrated its good capability compared with the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). We also tested FOA on feature weighting as a real optimization problem, and the experiments showed good performance on several data sets from the UCI repository.
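The seeding metaphor maps onto two operators: local seeding (new trees close to young trees) and global seeding (random jumps for trees removed from the forest). The sketch below follows that general scheme with aging, a lifetime limit and an area limit on forest size. The parameter names and default values (`life_time`, `lsc`, `gsc`, `transfer_rate`, `dx`, `area_limit`) are illustrative assumptions, not settings taken from the paper.

```python
import random
from typing import Callable, List, Tuple


def foa_minimize(f: Callable[[List[float]], float], dim: int, lower: float, upper: float,
                 iterations: int = 100, area_limit: int = 30, life_time: int = 6,
                 lsc: int = 2, gsc: int = 1, transfer_rate: float = 0.1,
                 dx: float = 0.5, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over [lower, upper]^dim with a forest-style search (illustrative sketch)."""
    rng = random.Random(seed)
    # Each tree is [age, position].
    forest = [[0, [rng.uniform(lower, upper) for _ in range(dim)]] for _ in range(area_limit)]
    for _ in range(iterations):
        # Local seeding: each age-0 tree drops lsc seeds that change one variable slightly.
        seedlings = []
        for age, pos in forest:
            if age == 0:
                for _ in range(lsc):
                    child = list(pos)
                    k = rng.randrange(dim)
                    child[k] = min(upper, max(lower, child[k] + rng.uniform(-dx, dx)))
                    seedlings.append([0, child])
        for tree in forest:
            tree[0] += 1  # existing trees grow older; new seedlings keep age 0
        forest.extend(seedlings)
        # Population limiting: old trees and trees beyond the area limit become candidates.
        candidates = [t for t in forest if t[0] > life_time]
        forest = sorted((t for t in forest if t[0] <= life_time), key=lambda t: f(t[1]))
        candidates.extend(forest[area_limit:])
        forest = forest[:area_limit]
        # Global seeding: some candidates are moved to random spots in gsc variables.
        for _, pos in rng.sample(candidates, int(transfer_rate * len(candidates))):
            child = list(pos)
            for k in rng.sample(range(dim), min(gsc, dim)):
                child[k] = rng.uniform(lower, upper)
            forest.append([0, child])
        # The best tree is kept young so it keeps seeding locally.
        min(forest, key=lambda t: f(t[1]))[0] = 0
    best = min(forest, key=lambda t: f(t[1]))
    return best[1], f(best[1])


def sphere(v: List[float]) -> float:
    return sum(t * t for t in v)


position, value = foa_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(position, value)
```

Resetting the best tree's age each iteration keeps it in the forest and lets it keep producing local seedlings, so the best value found never gets worse; global seeding supplies the wide-area exploration described in the abstract.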

13.

Classification mining based on hierarchical clustering

Zhan Yucai, Liu Xiyu 《Network Security Technology & Application》2013,(1):54-55

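As data mining technology matures, it plays an increasingly important role in everyday life. This paper first introduces the basics of data mining, cluster analysis and classification analysis, and then applies hierarchical clustering to classification rule mining.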

14.

A pseudo-projection algorithm for mining classification rules with multiple minimum supports

Liu Junqiang, Sun Xiaoying, Wang Xun 《Computer Applications and Software》2003,20(9):8-10

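This paper first proposes PP, an efficient algorithm for mining frequent itemsets. It uses a tree-based representation of pattern support sets, which avoids repeatedly scanning the database and recursively constructing as many pattern support sets as there are frequent patterns; it is 1 to 3 orders of magnitude faster than Apriori and FPGrowth. PP is then extended into CRM-PP, an efficient algorithm for discovering classification rules. CRM-PP integrates multiple-minimum-support pruning into the frequent itemset discovery phase, turning two-phase mining into single-phase mining. CRM-PP is also 1 to 3 orders of magnitude faster than two-phase algorithms based on Apriori and FPGrowth.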

15.

A survey of privacy-preserving data mining algorithms (total citations: 1; self-citations: 0; citations by others: 1)

Chen Xiaoming, Li Junhuai, Peng Jun, Liu Hailing, Zhang Jing 《Computer Science》2007,34(6):183-186

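How to prevent private information or sensitive knowledge from leaking during mining while still obtaining reasonably accurate mining results has become a meaningful research topic in data mining. This paper groups representative privacy-preserving data mining algorithms by data distribution and describes in detail their data modification methods, data mining algorithms, and data or rule hiding techniques. It analyzes and compares their respective strengths and weaknesses and summarizes the characteristics of each algorithm. In addition, based on this comparison, it proposes evaluation criteria for privacy-preserving data mining algorithms, namely privacy, rule effectiveness, algorithmic complexity and scalability, to support the development of new, effective algorithms in future research.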
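As one concrete instance of the data modification methods such surveys cover, the sketch below shows additive random perturbation: each released value is the true value plus independent zero-mean noise, which hides individual records while leaving aggregates such as the mean approximately intact. This is a generic illustration of the randomization family, not an algorithm taken from this paper; the noise level and the data are hypothetical.

```python
import random
import statistics
from typing import List


def perturb(values: List[float], noise_sd: float, seed: int = 0) -> List[float]:
    """Release x + noise instead of x; noise ~ N(0, noise_sd^2), independent per record."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Hypothetical ages of 10,000 individuals.
source = random.Random(42)
ages = [source.uniform(20, 70) for _ in range(10_000)]
released = perturb(ages, noise_sd=5.0)

# Individual values are masked, but the mean remains usable for mining.
print(statistics.mean(ages), statistics.mean(released))
```

The noise level sets the trade-off that the survey's criteria describe: larger `noise_sd` gives stronger privacy for each record but less accurate aggregates and mined rules.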

16.

Knowledge discovery in medicine: Current issue and future trend

《Expert systems with applications》2014,41(9):4434-4463

Data mining is a powerful method for extracting knowledge from data. Raw data poses various challenges that make traditional methods unsuitable for knowledge extraction, and data mining is expected to handle various data types in all formats. The relevance of this paper is underlined by the fact that data mining is a research topic in many different areas. In this paper, we review previous work on knowledge extraction from medical data. The main aim is to describe key papers and provide guidelines to help medical practitioners. Medical data mining is a multidisciplinary field drawing on both medicine and data mining, so previous work should be classified in a way that covers the requirements of users from various fields. To this end, we studied papers published between 1999 and 2013 that aim to extract knowledge from structured medical data. We clarify what medical data mining is and what its main goals are. Each paper is then studied according to six medical tasks: screening, diagnosis, treatment, prognosis, monitoring and management. Within each task, five data mining approaches are considered: classification, regression, clustering, association and hybrid. Each task ends with a brief summary and discussion. In addition, a standard framework based on CRISP-DM is adapted to manage all activities. Finally, current issues and future trends are discussed. The amount of work published in this area is substantial, and it is impossible to discuss all of it in a single paper. We hope this paper will make it possible to explore previous work and identify interesting areas for future research.

17.

Rule induction in data mining: effect of ordinal scales

Helen M. Moshkovich, Alexander I. Mechitov, David L. Olson 《Expert systems with applications》2002,22(4)

Many classification tasks can be viewed as ordinal. Numeric information usually allows more powerful analysis than ordinal data; ordinal data, in turn, allows more powerful analysis than nominal data. It is therefore important not to overlook knowledge about ordinal dependencies in data sets used in data mining. This paper investigates the data mining support available from ordinal data. The effect of taking ordinal dependencies in the data set into account on the overall results of constructing decision trees and induction rules is illustrated, and the degree of improved prediction of ordinal over nominal data is demonstrated. When the data were very representative and consistent, using ordinal information reduced the number of final rules and lowered the error rate. Data treatment alternatives are presented for data sets with greater imperfections.

18.

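A classification-based algorithm for exploring optimal production processes (total citations: 29; self-citations: 1; citations by others: 29)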

Gao Junbo, Yang Xuebing, Cai Qingsheng, Gao Zhongming 《Journal of Computer Applications》2000,20(1):27-29

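Computers in enterprises are no longer used only for office automation; they are now widely applied in industrial process control, and important process parameters from production are often collected and stored on them. How to find the optimal production process from these data has therefore become an important and pressing problem. Based on the basic idea of classification methods, this paper proposes an algorithm for exploring the optimal process.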

19.

A study of classification-based association rules

Wang Yong, Zhang Wei 《Computer Science》2008,35(7):170-172

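The traditional Apriori association rule algorithm must scan the database many times to generate candidate itemsets, which makes it inefficient. This paper proposes an improved algorithm, CBA (Classification Based Apriori). It scans the database only once: after preprocessing, the transaction database is classified and the classification results are stored, so that a comparison does not have to be made against every transaction record. This reduces both the number of database scans and the comparison time while ensuring that the mining results remain complete and correct.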
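The abstract does not detail how CBA classifies and stores the transactions. The sketch below is therefore a stand-in: it shows a different, well-known single-scan idea, the vertical transaction-ID list layout used by Eclat-style miners. One pass builds, for each item, the set of transactions containing it. After that, supports come from set intersections instead of rescanning the database. It is not the CBA algorithm, and the transactions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def build_tidlists(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Single pass over the database: item -> ids of transactions containing it."""
    tidlists: Dict[str, Set[int]] = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], int]:
    """All itemsets whose support count reaches min_support * len(transactions)."""
    min_count = min_support * len(transactions)
    current = {frozenset([item]): tids
               for item, tids in build_tidlists(transactions).items()
               if len(tids) >= min_count}
    result = {itemset: len(tids) for itemset, tids in current.items()}
    while current:
        next_level: Dict[FrozenSet[str], Set[int]] = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != len(a) + 1 or union in next_level:
                continue
            tids = current[a] & current[b]  # transactions containing the whole union
            if len(tids) >= min_count:
                next_level[union] = tids
        result.update({itemset: len(tids) for itemset, tids in next_level.items()})
        current = next_level
    return result


transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, count in frequent_itemsets(transactions, min_support=0.5).items():
    print(sorted(itemset), count)
```

Because the transaction-ID set of a union is the intersection of its parts' sets, counts stay exact and complete without any further pass over the data.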

20.

A GEP-based classification rule mining algorithm (total citations: 1; self-citations: 0; citations by others: 1)

Peng Jinguo, Cai Zhihua, Kang Lishan 《Computer Engineering》2007,33(9):90-91,102

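Building on gene expression programming (GEP), a new automatic programming method, this paper effectively improves the original GEP algorithm by designing the fitness function, optimizing population initialization, adding new genetic operators and adopting the (λ+μ) selection strategy from evolution strategies, yielding a new data mining algorithm. The algorithm is tested on data sets from the UCI Machine Learning Repository, and its accuracy is verified by comparison with C4.5 and with the method of reference [3].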
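The "plus" survivor selection borrowed from evolution strategies (usually written (μ+λ), with μ parents and λ offspring) pools parents and offspring and keeps the μ fittest, so the best solution is never lost. The sketch shows only this selection step, driving a toy real-valued search. GEP chromosomes, their decoding into rules and the paper's fitness function are not modelled, and the objective and mutation settings are hypothetical.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def plus_selection(parents: List[T], offspring: List[T],
                   fitness: Callable[[T], float], mu: int) -> List[T]:
    """(mu + lambda) survivor selection: keep the mu fittest of parents and offspring together."""
    pool = parents + offspring
    return sorted(pool, key=fitness, reverse=True)[:mu]


def evolve(mu: int = 5, lam: int = 20, generations: int = 50, seed: int = 1) -> float:
    """Toy run: maximize -(x - 3)^2 with Gaussian mutation and plus selection."""
    rng = random.Random(seed)

    def fitness(x: float) -> float:
        return -(x - 3.0) ** 2

    parents = [rng.uniform(-10.0, 10.0) for _ in range(mu)]
    for _ in range(generations):
        offspring = [rng.choice(parents) + rng.gauss(0.0, 0.5) for _ in range(lam)]
        parents = plus_selection(parents, offspring, fitness, mu)
    return parents[0]  # the fittest survivor


print(plus_selection([3, 7, 1], [5, 9, 2, 8], fitness=lambda v: v, mu=3))
print(evolve())
```

Because parents compete with their offspring, the best fitness is monotonically non-decreasing across generations; the price is a higher risk of premature convergence than with comma selection, where parents are always discarded.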