Similar documents
 Found 20 similar documents (search time: 15 ms)
1.
The problem of knowledge acquisition has been recognized as the major bottleneck in the development of knowledge-based systems. An encouraging approach to alleviating this problem is inductive learning. Inductive learning systems accept, as input, a set of data that represent instances of the problem domain and produce, as output, the rules of the knowledge base. Each data item is described by a set of attribute values and is assigned to a unique decision class. A common characteristic of existing inductive learning systems is that they are empirical in nature and do not take into account the implications of the inductive rule generation process for the performance of the resulting set of rules. That performance is assessed when the rules are used to classify new unlabelled data. This paper demonstrates that the performance of a rule set is a function of both the rule generation and the rule interpretation processes. These two processes are interrelated and should not be considered separately. The interrelation of rule generation and rule interpretation is analysed, and suggestions for improving the performance of existing inductive learning systems are put forward.

2.

Genetic programming (GP) usually has a wide search space and high flexibility, so it can search for a globally optimal solution. In general, however, GP's learning speed is slow. The Apriori algorithm is an association rule algorithm that can be applied to large databases, but its parameters are difficult to set without experience. We propose a technique for generating rules from a database using GP combined with an association rule algorithm. It takes the rules generated by the association rule algorithm as the initial individuals of GP. The combined algorithm improves the learning speed of GP. To verify the effectiveness of the proposed method, we apply it to a decision tree construction problem from the University of California at Irvine (UCI) machine-learning repository and to a rule discovery problem on a database of hypertension occurrence. We compare the results of the proposed method with prior ones.

3.
Social media, especially Twitter, is now one of the most popular platforms where people can freely express their opinions. However, it is difficult to extract important summary information from the many millions of tweets sent every hour. In this work we propose a new concept, sentimental causal rules, and techniques that combine sentiment analysis and causal rule discovery to extract such rules from textual data sources such as Twitter. Sentiment analysis refers to the task of extracting public sentiment from textual data. Its value lies in its ability to reflect popularly voiced perceptions that are stated in natural language. Causal rules, on the other hand, indicate associations between different concepts in a context where one or several concepts cause the others. We believe that sentimental causal rules are an effective summarization mechanism that combines the causal relations among different aspects extracted from textual data with the sentiment embedded in these causal relationships. To show the effectiveness of sentimental causal rules, we have conducted experiments on Twitter data collected on the Kurdish political issue in Turkey, which has been the subject of heated public debate for many years. Our experiments show that sentimental causal rule discovery is an effective method for summarizing information about important aspects of an issue on Twitter, which may further be used by politicians for better policy making.

4.
Several approaches using fuzzy techniques have been proposed to provide a practical method for evaluating student academic performance. However, these approaches are largely based on expert opinions and make it difficult to explore and utilize valuable information embedded in collected data. This paper proposes a new method for evaluating student academic performance based on data-driven fuzzy rule induction. A suitable fuzzy inference mechanism and an associated rule induction algorithm are given. The new method has been applied to perform Criterion-Referenced Evaluation (CRE), and comparisons with typical existing methods reveal significant advantages of the present work. The new method has also been applied to perform Norm-Referenced Evaluation (NRE), demonstrating its potential as an extended method of evaluation that can produce new and informative scores based on information gathered from data. Khairul Rasmani is a lecturer at the Faculty of Information Technology and Quantitative Sciences, Universiti Teknologi MARA, Malaysia. He received his Master's degree in Mathematical Education from the University of Leeds, UK in 1997 and his Ph.D. degree from the University of Wales, Aberystwyth, UK in December 2005. His research interests include fuzzy approximate reasoning, fuzzy rule-based systems and fuzzy classification systems. Qiang Shen is a Professor and the Director of Research with the Department of Computer Science at the University of Wales, Aberystwyth, UK. He is also an Honorary Fellow at the University of Edinburgh, UK. His research interests include fuzzy systems, knowledge modelling, qualitative reasoning, and pattern recognition. Prof. Shen serves as an associate editor or editorial board member of a number of world-leading journals, including the IEEE Transactions on Systems, Man, and Cybernetics (Part B), the IEEE Transactions on Fuzzy Systems, and Fuzzy Sets and Systems. He has acted as a Chair or Co-chair at a number of major conferences in the field of Computational Intelligence. He has published a book and over 170 peer-refereed articles in international journals and conferences in Artificial Intelligence and related areas.

5.
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge number of association rules depending on how parameters are set. Users are often interested only in the strongest rules, and do not want to go through a large number of rules or wait for these rules to be generated. To address those needs, algorithms have been proposed to mine the top-k association rules in databases, where users can directly set a parameter k to obtain the k most frequent rules. However, a major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to reduce the search space more effectively. These properties are applied during the candidate selection process to identify items that should not be used to expand a rule, based on its confidence, and so reduce the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in terms of both runtime and memory usage.
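To make the notion of "top-k association rules" concrete, the following is a minimal brute-force sketch, not ETARM or TopKRules. It assumes that rules are ranked by support count and filtered by a minimum confidence; the function names, the tie-breaking order, and the toy transactions are hypothetical. It applies none of the pruning properties the abstract mentions and is only practical for very small inputs.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def top_k_rules(transactions, k, min_conf):
    """Enumerate all rules X -> Y, keep those with confidence >= min_conf,
    and return the k with the highest support count (illustrative only)."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            sup = support(whole, transactions)
            if sup == 0:
                continue  # the itemset never occurs, so no rule can be built from it
            for r in range(1, size):
                for antecedent in combinations(combo, r):
                    x = frozenset(antecedent)
                    conf = sup / support(x, transactions)
                    if conf >= min_conf:
                        rules.append((sup, conf, tuple(sorted(x)), tuple(sorted(whole - x))))
    # Highest support first; ties broken by confidence, then alphabetically.
    rules.sort(key=lambda rule: (-rule[0], -rule[1], rule[2], rule[3]))
    return rules[:k]

# Hypothetical toy data: five transactions over the items a, b, c.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
)]
best = top_k_rules(transactions, k=2, min_conf=0.7)
```

Each returned tuple is (support count, confidence, antecedent, consequent). Setting k directly replaces the minimum-support threshold that a conventional miner would require.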

6.
The lack of tools for rule generation, analysis, and run-time monitoring appears to be one of the main obstacles to the widespread adoption of active database applications. This paper describes a complete tool environment for assisting the design of active rule applications; the tools were developed at Politecnico di Milano in the context of the IDEA Project, a 4-year Esprit project sponsored by the European Commission which was launched in June 1992. We describe tools for active rule generation, analysis, debugging, and browsing; rules are defined in Chimera, a conceptual design model and language for the specification of active rule applications. We also introduce a tool for mapping from Chimera into Oracle, a relational product supporting triggers. Most of the tools described in this paper are fully implemented and currently in operation (beta-testing) within the companies participating in the IDEA Project, with the exception of two of them (called Argonaut-V and Pandora), which will be completed by the end of 1996. Research presented in this paper is supported by Esprit project P6333 IDEA, and by ENEL contract VDS 1/94: Integrity Constraint Management.

7.
The analysis of relationships in databases for rule derivation   Total citations: 2, self-citations: 0, citations by others: 2
Owing to the rapid growth in the sizes of databases, potentially useful information may be embedded in a large amount of data. Knowledge discovery is the search for semantic relationships which exist in large databases. One of the main problems for knowledge discovery is that the number of possible relationships can be very large, so searching for interesting relationships and reducing the search complexity are important. The relationships can be represented as rules which can be used in efficient query processing. We present a technique to analyze relationships among attribute values and to derive a compact rule set. We also propose a mechanism and some heuristics to reduce the search complexity of the rule derivation process. An evaluation model is presented to evaluate the quality of the derived rules. Moreover, in the real world, databases may contain uncertain data. We also propose a technique to analyze the relationships among uncertain data and derive probabilistic rules.

8.
Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed in this context in order to extract this information in the most efficient way. However, efficiency is not our only concern in this study. The security and privacy issues over the extracted knowledge must be seriously considered as well. Taking this into consideration, we study the procedure of hiding sensitive association rules in binary data sets by blocking some data values, and we present an algorithm for solving this problem. We also provide a fuzzification of the support and the confidence of an association rule in order to accommodate the existence of blocked/unknown values. In addition, we quantitatively compare the proposed algorithm with other already published algorithms by running experiments on binary data sets, and we also qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We utilize the notion of border rules, assigning a weight to each rule, and we use effective data structures for the representation of the rules so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we study the overall security of the modified database, using the C4.5 decision tree algorithm of the WEKA data mining tool, and we discuss the advantages and the limitations of blocking.
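The effect of blocked values on support can be illustrated with a small sketch. This is not the paper's fuzzification, whose definition the abstract does not give; it assumes a simple interval view in which a blocked cell, written "?", may hide either a 0 or a 1. The function name and the toy rows are hypothetical.

```python
def support_interval(itemset, rows):
    """Return (min_support, max_support) of `itemset` over binary rows
    whose cells are 1, 0, or "?" (blocked/unknown)."""
    # Rows that certainly contain the itemset: every cell is a known 1.
    certain = sum(1 for row in rows if all(row[i] == 1 for i in itemset))
    # Rows that possibly contain it: every cell is 1 or blocked.
    possible = sum(1 for row in rows if all(row[i] in (1, "?") for i in itemset))
    n = len(rows)
    return certain / n, possible / n

# Hypothetical toy data set with one blocked value.
rows = [
    {"A": 1, "B": 1},
    {"A": 1, "B": "?"},
    {"A": 0, "B": 1},
    {"A": 1, "B": 0},
]
low, high = support_interval(["A", "B"], rows)
```

Under this assumption, blocking a value widens the interval between the lower and upper support, so an observer can no longer be sure whether a sensitive rule meets the mining threshold.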

9.
Association Rule Mining (ARM) can be considered a combinatorial problem whose purpose is to extract the correlations between items in sizeable datasets. The numerous polynomial exact algorithms already proposed for ARM are ill-suited to large databases, especially those on the web. Treating datasets as a large search space, intelligent algorithms have been used to find high-quality rules and solve the ARM problem. This paper deals with a cooperative multi-swarm bat algorithm for association rule mining. It is based on the bat-inspired algorithm adapted to the rule discovery problem (BAT-ARM). The latter suffers from an absence of communication between bats in the population, which reduces the exploration of the search space. However, it has a powerful rule generation process which leads to a very good local search. Therefore, to maintain a good trade-off between diversification and intensification, our proposed approach introduces cooperative strategies between the swarms that have already proved their efficiency in multi-swarm optimization algorithms (Ring, Master-slave). Furthermore, we introduce a new topology called Hybrid that merges the Ring strategy with the Master-slave plan developed in our earlier work [23]. A series of experiments is carried out on nine well-known datasets in the ARM field, and the performance of the proposed approach is evaluated and compared with that of other recently published methods. The results show a clear superiority of our proposal over similar approaches in terms of time and rule quality. The analysis also shows competitive outcomes in terms of quality when compared with multi-objective optimization methods.

10.
Simple association rules (SAR) and the SAR-based rule discovery   Total citations: 13, self-citations: 0, citations by others: 13
Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a problem of concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Instead, this paper concentrates on a smaller set of rules, namely, a set of simple association rules, each with a consequent containing only a single attribute. Such a rule set can be used to derive all other association rules, meaning that the original rule set based on conventional algorithms can be ‘recovered’ from the simple rules without any information loss. The number of simple rules is much smaller than the number of all rules. Moreover, corresponding algorithms are developed such that certain forms of rules (e.g. ‘P ⇒ ?’ or ‘? ⇒ Q’) can be generated more efficiently based on simple rules.

11.
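Objective: Deep-learning-based multi-focus image fusion methods mainly use convolutional neural networks (CNNs) to classify pixels as focused or defocused. Supervised training often relies on synthetic datasets, and the accuracy of the label data directly affects classification accuracy, which in turn affects the accuracy of the subsequent hand-designed fusion rules and the quality of the fused all-in-focus image. To let the fusion network adjust its fusion rules adaptively, a multi-focus image fusion algorithm based on self-learned fusion rules is proposed. Method: An autoencoder architecture is used to extract features and to learn the fusion rules and reconstruction rules simultaneously, yielding an unsupervised end-to-end fusion network. The initial decision map of the multi-focus images is supplied as a prior input so that the network learns the rich detail information of the images. A local strategy comprising the structural similarity index measure (SSIM) and the mean squared error (MSE) is added to the loss function to ensure more accurate image restoration. Results: The model is evaluated subjectively and objectively on public datasets such as Lytro to verify the soundness of the fusion algorithm's design. Subjectively, the model fuses the focused regions well, effectively avoids artifacts in the fused image, and retains sufficient detail, giving a natural and clear visual result. Objectively, in a quantitative comparison of images fused by the model against those produced by other mainstream multi-focus image fusion algorithms, the model achieves the best average scores on entropy, Qw, correlation coefficient, and visual information fidelity: 7.4574, 0.9177, 0.9788, and 0.8908, respectively. Conclusion: A fusion algorithm for multi-focus images is proposed that can self-learn and adjust its fusion rules and produces fused images comparable to those of existing methods, which helps to further the understanding of deep-learning-based multi-focus image fusion mechanisms.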

12.
Association rule mining is an effective data mining technique which has been used widely in health informatics research since its introduction. Because health informatics has received a lot of attention from researchers in the last decade and has developed various sub-domains, it is both interesting and essential to review state-of-the-art health informatics research. Knowledge discovery researchers and practitioners have applied an array of data mining techniques to extract knowledge from health data; this survey focuses on, and studies in detail, the application of association rule mining techniques to the health informatics domain. A critical analysis of the literature on applications of association rule mining to health informatics from 2005 to 2014 shows that, despite more efficient alternative approaches, the Apriori algorithm is still a widely used frequent itemset generation technique in this domain. Moreover, other limitations related to applications of association rule mining to health informatics have been identified, and recommendations have been made to mitigate those limitations. Furthermore, the algorithms and tools utilized in applications of association rule mining have been identified, conclusions have been drawn from the literature surveyed, and future research directions have been presented.

13.

The discovery of multi-level knowledge is important to allow queries at and across different levels of abstraction. While there are some similarities between our research and that of others in this area, the work reported in this paper does not directly involve databases and is differently motivated. Our research is concerned with taking data in the form of rule bases and finding multi-level knowledge. This paper describes our motivation, our preferred technique for acquiring the initial knowledge, known as Ripple-Down Rules, the use of Formal Concept Analysis to develop an abstraction hierarchy, and our application of these ideas to knowledge bases from the domain of chemical pathology. We also provide an example of how the approach can be applied to other propositional knowledge bases and suggest that it can be used as an additional phase in many existing data mining approaches.

14.
15.
Redundancy checking (RC) is a key knowledge reduction technology. Extension rule (ER) is a new reasoning method, first presented in 2003 and well received by experts at home and abroad. Novel extension rule (NER) is an improved ER-based reasoning method, presented in 2009. In this paper, we first analyse the characteristics of the extension rule, and then present a simple algorithm for redundancy checking based on extension rule (RCER). In addition, we introduce MIMF, a type of heuristic strategy. Using the aforementioned rule and strategy, we design and implement the RCHER algorithm, which relies on MIMF. Next we design and implement an RCNER (redundancy checking based on NER) algorithm based on NER. Parallel computing greatly accelerates the NER algorithm, whose tasks depend only weakly on one another when executed. Considering this, we present PNER (parallel NER) and apply it to redundancy checking and necessity checking. Furthermore, we design and implement the RCPNER (redundancy checking based on PNER) and NCPPNER (necessary clause partition based on PNER) algorithms as well. The experimental results show that MIMF significantly accelerates the RCER algorithm on large-scale, highly redundant formulae. Comparing PNER with NER and RCPNER with RCNER, the average speedup can reach up to the number of task decompositions when executed. Comparing NCPNER with the RCNER-based algorithm on separating redundant formulae, speedup increases steadily as the scale of the formulae increases. Finally, we describe the challenges that the extension rule will face and suggest possible solutions.

16.
《Knowledge》2006,19(6):413-421
We present a multi-objective genetic algorithm for mining highly predictive and comprehensible classification rules from large databases. We emphasize the predictive accuracy and comprehensibility of the rules. However, these two objectives often conflict with each other, which makes this an optimization problem that is very difficult to solve efficiently. We have proposed a multi-objective evolutionary algorithm called the improved niched Pareto genetic algorithm (INPGA) for this purpose. We have compared the rule generation by INPGA with that by the simple genetic algorithm (SGA) and the basic niched Pareto genetic algorithm (NPGA). The experimental results confirm that our rule generation has a clear edge over SGA and NPGA.
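The conflict between accuracy and comprehensibility is what makes Pareto-based selection relevant here. The sketch below shows only plain Pareto dominance over two objectives that are both maximized; it is not INPGA or NPGA, it includes no niching, and the scores assigned to the candidate rules are hypothetical.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical (accuracy, comprehensibility) scores for four candidate rules.
rules = [(0.90, 0.40), (0.85, 0.70), (0.80, 0.60), (0.60, 0.90)]
front = pareto_front(rules)
```

The rule scored (0.80, 0.60) is dropped because (0.85, 0.70) is better on both objectives; the remaining three represent different trade-offs, none of which is strictly preferable to the others.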

17.

Traditional association-rule mining (ARM) considers only the frequency of items in a binary database, which provides insufficient knowledge for making efficient decisions and strategies. The mining of useful information from quantitative databases is not a trivial task compared to conventional algorithms in ARM. Fuzzy-set theory was invented to represent a more valuable form of knowledge for human reasoning, which can also be applied and utilized for quantitative databases. Many approaches have adopted fuzzy-set theory to transform the quantitative value into linguistic terms with its corresponding degree based on defined membership functions for the discovery of FFIs, also known as fuzzy frequent itemsets. Only linguistic terms with maximal scalar cardinality are considered in traditional fuzzy frequent itemset mining, but the uncertainty factor is not involved in past approaches. In this paper, an efficient fuzzy mining (EFM) algorithm is presented to quickly discover multiple FFIs from quantitative databases under type-2 fuzzy-set theory. A compressed fuzzy-list (CFL)-structure is developed to maintain complete information for rule generation. Two pruning techniques are developed for reducing the search space and speeding up the mining process. Several experiments are carried out to verify the efficiency and effectiveness of the designed approach in terms of runtime, the number of examined nodes, memory usage, and scalability under different minimum support thresholds and different linguistic terms used in the membership functions.


18.
Data mining extracts implicit, previously unknown, and potentially useful information from databases. Many approaches have been proposed to extract information, and one of the most important is finding association rules. Although a large amount of research has been devoted to this subject, none of it finds association rules from directed acyclic graph (DAG) data. Without such a mining method, any hidden knowledge cannot be discovered from databases storing DAG data such as family genealogy profiles, product structures, XML documents, task precedence relations, and course structures. In this article, we define a new kind of association rule in DAG databases called the predecessor–successor rule, where a node x is a predecessor of another node y if we can find a path in the DAG on which x appears before y. The predecessor–successor rules enable us to observe how the characteristics of the predecessors influence the successors. An approach containing four stages is proposed to discover the predecessor–successor rules. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 621–637, 2006.
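The predecessor relation defined above reduces to reachability in the graph. The following sketch checks it with an iterative depth-first search; it covers only this definition, not the four-stage mining approach, and the adjacency-list representation and the genealogy example are assumptions made for illustration.

```python
def is_predecessor(dag, x, y):
    """True if a directed path with at least one edge leads from x to y.
    `dag` maps each node to a list of its direct successors."""
    stack = list(dag.get(x, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == y:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return False

# Hypothetical genealogy-style DAG.
dag = {
    "grandparent": ["parent"],
    "parent": ["child", "sibling"],
    "child": [],
    "sibling": [],
}
```

Here `is_predecessor(dag, "grandparent", "child")` holds through the path grandparent, parent, child, whereas "sibling" and "child" are unrelated because no path connects them in either direction.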

19.
In recent years, data mining has become one of the most popular techniques for data owners to determine their strategies. Association rule mining is a data mining approach that is widely used in traditional databases, usually to find positive association rules. However, there are other challenging rule mining topics, such as data stream mining and negative association rule mining. In addition, organizations want to concentrate on their own business and outsource the rest of their work. This approach, known as the “database as a service” concept, provides many benefits to the data owner but at the same time raises some security problems. In this paper, a rule mining system is proposed that provides an efficient and secure solution for computing positive and negative association rules on XML data streams under the database as a service concept. The system has been implemented, and several experiments have been carried out with different synthetic data sets to show the performance and efficiency of the proposed system.

20.
Distributed data mining applications, such as those dealing with health care, finance, counter-terrorism and homeland defence, use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual’s need for and right to privacy. In this paper, we propose a privacy-preserving distributed association rule mining protocol based on a new semi-trusted mixer model. Our protocol can protect the privacy of each distributed database against a coalition of up to n − 2 other data sites, or even against the mixer, provided that the mixer does not collude with any data site. Furthermore, our protocol needs only two communications between each data site and the mixer in one round of data collection.
