Similar Literature
20 similar documents retrieved.
1.
The purpose of the work described in this paper is to provide an intelligent intrusion detection system (IIDS) that uses two of the most popular data mining tasks, namely classification and association rules mining together, for predicting different behaviors in networked computers. To achieve this, we propose a method based on iterative rule learning using a fuzzy rule-based genetic classifier. Our approach is composed of two main phases. First, a large number of candidate rules are generated for each class using fuzzy association rules mining, and they are pre-screened using two rule evaluation criteria in order to reduce the fuzzy rule search space. The candidate rules obtained after pre-screening are used in the genetic fuzzy classifier to generate rules for the classes specified in the IIDS, namely Normal, PRB (probe), DOS (denial of service), U2R (user to root) and R2L (remote to local). In the next stage, a boosting genetic algorithm is employed for each class to find the fuzzy rules required to classify its data, one fuzzy rule being extracted and included in the system at each iteration. The boosting mechanism evaluates the weight of each data item so that rule extraction focuses on items with relatively higher weight, i.e., those covered less by the rules extracted up to the current iteration. Each extracted fuzzy rule is assigned a weight, and the weighted fuzzy rules in each class are aggregated to compute the vote of each class label for each data item.
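A minimal sketch of the boosting-style reweighting described above, assuming a simple setup in which each newly extracted fuzzy rule reports how well it covers each training item; the function names, the coverage threshold, and the down-weighting factor are illustrative assumptions, not taken from the paper.

```python
# Sketch: after a rule is extracted, items it covers well are down-weighted
# so the next rule-extraction step focuses on items covered less so far.
def update_weights(weights, coverage, beta=0.5):
    """weights: current item weights; coverage: degree in [0, 1] to which
    the new fuzzy rule covers each item; beta: assumed down-weighting factor."""
    reweighted = [w * (beta if c > 0.5 else 1.0) for w, c in zip(weights, coverage)]
    total = sum(reweighted)
    return [w / total for w in reweighted]  # renormalize to sum to 1

# Example: three items, the new rule covers the first two well.
weights = [1 / 3, 1 / 3, 1 / 3]
coverage = [0.9, 0.8, 0.1]
print(update_weights(weights, coverage))  # the third item's weight now dominates
```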

2.
A performance model designed for studying active DBMS performance issues is described. The authors present the results of simulation experiments in which system performance was studied as a function of transaction boundary semantics for varying levels of data contention, rule complexity, and data sharing between externally submitted tasks and rule management tasks. The results demonstrate that the way in which transaction boundaries are imposed can have a major impact on the performance of an active DBMS. It is therefore concluded that this aspect of rule semantics must be carefully considered at the time that rules are specified.

3.
To build a real-time intelligent analysis rule base for distribution network monitoring information, a machine-learning-based construction method is proposed. All distribution network monitoring rule heads in the rule base are sorted and set as the main chain, and the rules are imported into a linked list to generate the rule set, ensuring that every monitoring data packet has a corresponding analysis rule. A machine-learning-based fault data classification method for distribution networks is used to identify fault data in the monitoring information and to extract frequent itemsets of fault data. A MapReduce-based parallel incremental association rule update algorithm is then used to update the intelligent analysis rules in the rule base, keeping them current in real time. Experimental results show that the proposed method achieves mean accuracy and detection rates above 0.97 with a false positive rate of 0.01; it can promptly identify fault information detected by the distribution network monitoring system and guarantees real-time updating of the intelligent analysis rules.

4.
Model generation by domain refinement and rule reduction
The granularity and interpretability of a fuzzy model are influenced by the method used to construct the rule base. Models obtained by a heuristic assessment of the underlying system are generally highly granular with interpretable rules, while models algorithmically generated from an analysis of training data consist of a large number of rules with small granularity. This paper presents a method for increasing the granularity of rules while satisfying a prescribed precision bound on the training data. The model is generated by a two-stage process. The first step iteratively refines the partitions of the input domains until a rule base is generated that satisfies the precision bound. In this step, the antecedents of the rules are obtained from decomposable partitions of the input domains and the consequents are generated using proximity techniques. A greedy merging algorithm is then applied to increase the granularity of the rules while preserving the precision bound. To enhance the representational capabilities of a rule and reduce the number of rules required, the rules constructed by the merging procedure have multi-dimensional antecedents. A model defined with rules of this form incorporates advantageous features of both clustering and proximity methods for rule generation. Experimental results demonstrate the ability of the algorithm to reduce the number of rules in a fuzzy model with both precise and imprecise training information.
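The greedy merging step can be pictured with a one-dimensional toy example. The sketch below merges adjacent interval rules as long as a prescribed precision bound on the training data still holds; the rule format, the averaged consequent, and the error measure are simplifying assumptions for illustration only.

```python
# Sketch: greedily merge adjacent rules while the precision bound eps holds.
def greedy_merge(rules, data, eps):
    """rules: list of ((lo, hi), consequent); data: list of (x, y) pairs."""
    def max_error(rule_set):
        err = 0.0
        for x, y in data:
            for (lo, hi), c in rule_set:
                if lo <= x < hi:
                    err = max(err, abs(y - c))
                    break
        return err

    merged = True
    while merged and len(rules) > 1:
        merged = False
        for i in range(len(rules) - 1):
            (lo1, _), c1 = rules[i]
            (_, hi2), c2 = rules[i + 1]
            candidate = rules[:i] + [((lo1, hi2), (c1 + c2) / 2)] + rules[i + 2:]
            if max_error(candidate) <= eps:   # accept merge only if the bound still holds
                rules, merged = candidate, True
                break
    return rules

rules = [((0, 1), 0.1), ((1, 2), 0.15), ((2, 3), 0.9)]
data = [(0.5, 0.1), (1.5, 0.15), (2.5, 0.9)]
print(greedy_merge(rules, data, eps=0.1))   # the first two rules merge, the third stays
```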

5.
A rule quality measure is important to a rule induction system for determining when to stop generalization or specialization. Such measures are also important to a rule-based classification procedure for resolving conflicts among rules. We describe a number of statistical and empirical rule quality formulas and present an experimental comparison of these formulas on a number of standard machine learning datasets. We also present a meta-learning method for generating a set of formula-behavior rules from the experimental results, which show the relationships between a formula's performance and the characteristics of a dataset. These formula-behavior rules are combined into formula-selection rules that can be used in a rule induction system to select a rule quality formula before rule induction. We report experimental results showing the effects of formula selection on the predictive performance of a rule induction system.

6.
In this article, an SVD–QR-based approach is proposed to extract the important fuzzy rules from a rule base with several fuzzy rule tables, so as to design an appropriate fuzzy system directly from input-output data of the identified system. A fuzzy system with fuzzy rule tables is defined to approximate the input-output pairs of the identified system. In the rule base of the defined fuzzy system, each fuzzy rule table corresponds to a partition of an input space. In order to extract the important fuzzy rules from the rule base of the defined fuzzy system, a firing strength matrix determined by the membership functions of the premise fuzzy sets is constructed. Based on the firing strength matrix, the number of important fuzzy rules is determined by singular value decomposition (SVD), and the important fuzzy rules are selected by the SVD–QR-based method. Consequently, a reconstructed fuzzy rule base composed of significant fuzzy rules is determined from the firing strength matrix. Furthermore, the recursive least-squares method is applied to determine the consequent part of the reconstructed fuzzy system according to the gathered input-output data, so that a refined fuzzy system is obtained by the proposed method. Finally, three nonlinear systems illustrate the efficiency of the proposed method.
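The selection step can be illustrated numerically. The sketch below is a rough approximation of the SVD–QR idea rather than the paper's exact procedure: it assumes a firing strength matrix with one column per rule, estimates the number of significant rules from the singular values, and picks columns with QR column pivoting; the threshold and the toy data are assumptions.

```python
# Sketch: pick important rules from a firing strength matrix P
# (rows = data points, columns = candidate rules).
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
P = rng.random((50, 8))        # toy firing strengths: 50 samples, 8 candidate rules
P[:, 3] = 0.99 * P[:, 0]       # make one rule nearly redundant

s = np.linalg.svd(P, compute_uv=False)
r = int(np.sum(s > 0.05 * s[0]))     # number of significant rules (heuristic cut-off)
_, _, piv = qr(P, pivoting=True)     # column pivoting ranks the rules by importance
print("rules kept:", sorted(piv[:r]))
```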

7.
Fuzzy inference systems (FIS) are widely used for process simulation and control. They can be designed either from expert knowledge or from data. For complex systems, a FIS based on expert knowledge only may suffer from a loss of accuracy; this is the main incentive for using fuzzy rules inferred from data. Designing a FIS from data can be decomposed into two main phases: automatic rule generation and system optimization. Rule generation leads to a basic system with a given space partitioning and the corresponding set of rules. System optimization can be done at various levels. Variable selection can be an overall selection or it can be managed rule by rule. Rule base optimization aims to select the most useful rules and to optimize rule conclusions. Space partitioning can be improved by adding or removing fuzzy sets and by tuning membership function parameters. Structure optimization is of major importance: selecting variables, reducing the rule base, and optimizing the number of fuzzy sets. Over the years, many methods have become available for designing a FIS from data. Their efficiency is usually characterized by a numerical performance index. However, for human-computer cooperation another criterion is needed: rule interpretability. An implicit assumption states that fuzzy rules are by nature easy to interpret; this can be wrong when dealing with complex multivariable systems or when the generated partitioning is meaningless for experts. The paper analyzes the main methods for automatic rule generation and structure optimization. They are grouped into several families and compared according to the rule interpretability criterion. For this purpose, three conditions for a set of rules to be interpretable are defined.

8.
We investigate the use of the rough set model for financial time-series data analysis and forecasting. The rough set model is an emerging technique for dealing with vagueness and uncertainty in data. It has many advantages over other techniques, such as fuzzy sets and neural networks, including attribute reduction and variable partitioning of data. These characteristics can be very useful for improving the quality of results from data analysis. We demonstrate a rough set data analysis model for the discovery of decision rules from time-series data, using the New Zealand stock exchange as an example. Rules are generated through reducts and can be used for future prediction. A unique ranking system for the decision rules, based on both the strength and the stability of each rule, is used in this study. The ranking system gives the user confidence regarding their market decisions. Our experimental results indicate that forecasting future stock index values using rough sets yields decision rules with high accuracy and coverage.
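As an illustration of combining strength and stability into a single ranking, a hedged sketch follows; the linear weighting and the attribute names are assumptions, not the paper's exact ranking formula.

```python
# Sketch: rank decision rules by a weighted combination of strength and stability.
def rank_rules(rules, alpha=0.5):
    """rules: list of dicts with 'strength' and 'stability' in [0, 1];
    alpha: assumed trade-off between the two criteria."""
    score = lambda r: alpha * r["strength"] + (1 - alpha) * r["stability"]
    return sorted(rules, key=score, reverse=True)

rules = [
    {"name": "r1", "strength": 0.8, "stability": 0.4},
    {"name": "r2", "strength": 0.5, "stability": 0.9},
    {"name": "r3", "strength": 0.9, "stability": 0.6},
]
print([r["name"] for r in rank_rules(rules)])  # ['r3', 'r2', 'r1']
```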

9.
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or the crafting of real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency and resistance to missing data. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.

10.
An action rule is an implication rule that shows the expected change in the decision value of an object as a result of changes made to some of its conditional values. An example of an action rule is 'credit card holders of young age are expected to keep their cards for an extended period of time if they receive a movie ticket once a year'. In this case, the decision value is the account status, and the condition value is whether the movie ticket is sent to the customer. The type of action that can be taken by the company is to send out movie tickets to young customers. Conventional action rule discovery algorithms build action rules from existing classification rules. This paper discusses an agglomerative strategy that generates the shortest action rules directly from a decision system. In particular, the algorithm can be used to discover rules from an incomplete decision system where attribute values are partially missing. As one of the testing domains for our research we use the HEPAR system, which was built through a collaboration between the Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences and physicians at the Medical Center of Postgraduate Education in Warsaw, Poland. HEPAR was designed for gathering and processing clinical data on patients with liver disorders. Action rules will be used to construct the decision-support module for HEPAR.
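For concreteness, the example rule from the abstract can be written as a small data structure separating the stable conditions, the flexible (actionable) attribute, and the expected decision change; the field names and values below are illustrative assumptions.

```python
# Sketch of one action rule: fixed conditions, an actionable change,
# and the decision value that the change is expected to produce.
action_rule = {
    "stable":   {"age_group": "young"},                              # unchangeable condition
    "flexible": {"movie_ticket": ("not_sent", "sent_yearly")},       # (from, to) action
    "decision": {"account_status": ("may_close", "kept_long_term")}, # expected (from, to) change
}
print(action_rule["flexible"])
```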

11.
Fuzzy rules obtained from expert experience and from input-output sample data are often incomplete; because some rules are missing, a fuzzy system with an incomplete rule base can produce very unreasonable outputs for certain possible input values. This paper proposes using interpolation to generate new rules online when needed, and designs a fuzzy system that fills the gaps in the rule base by interpolation. Experimental results show that this approach greatly improves the continuity and stability of the outputs of a fuzzy system with an incomplete rule base over the entire input domain.
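A hedged sketch of the interpolation idea: when no existing rule fires for an input, a new conclusion is built from the two nearest rules, weighted by distance. Rules are simplified here to (antecedent centre, consequent value) pairs; the paper's actual fuzzy-system design is richer than this.

```python
# Sketch: fill a gap in an incomplete rule base by interpolating between
# the two rules whose antecedents are closest to the unmatched input.
def interpolate_rule(x, rules):
    rules = sorted(rules, key=lambda r: abs(r[0] - x))
    (a1, c1), (a2, c2) = rules[0], rules[1]    # two nearest rules
    if a1 == a2:
        return c1
    t = (x - a1) / (a2 - a1)
    return c1 + t * (c2 - c1)                  # linearly interpolated conclusion

rule_base = [(0.0, 0.0), (1.0, 2.0), (3.0, 5.0)]   # incomplete coverage of the input domain
print(interpolate_rule(1.8, rule_base))             # conclusion generated for a gap input: 3.2
```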

12.
This paper describes E-DEVICE, an extensible active knowledge base system (KBS) that supports the processing of event-driven, production, and deductive rules within the same active OODB system. E-DEVICE provides the infrastructure for the smooth integration of various declarative rule types, such as production and deductive rules, into an active OODB system that natively supports only low-level event-driven rules, by: (1) mapping each declarative rule into one event-driven rule, offering centralized rule selection control for correct run-time behavior and conflict resolution, and (2) using complex events to map the conditions of declarative rules and monitor the database to incrementally match those conditions. E-DEVICE also provides the infrastructure for easily extending the system by adding: (1) new rule types as subtypes of existing ones, and (2) transparent optimizations to the rule matching network. The resulting system is a flexible, yet efficient, KBS that gives the user the ability to express knowledge in a variety of high-level forms for advanced problem solving in data-intensive applications.

13.
14.
In association rule mining, two parameters, namely support and confidence, are used to arrange association rules in either increasing or decreasing order. These two parameters are assigned values by counting the number of transactions satisfying a rule, without considering the user's perspective. Hence, an association rule with low support and confidence values, but meaningful to the user, does not receive the importance the user perceives it to have. Reflecting the user's perspective is of paramount importance for improving user satisfaction with a given recommendation system. In this paper, we propose a model and an algorithm to extract association rules meaningful to a user, with ad-hoc support and confidence, by allowing the user to specify the importance of each transaction. In addition, we apply the characteristics of a concept lattice, the core data structure of Formal Concept Analysis (FCA), to reflect the subsumption relation among association rules when assigning a priority to each rule. Finally, we describe experimental results that verify the potential and efficiency of the proposed method.
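A minimal sketch of support and confidence computed over user-weighted transactions, so that transactions the user marks as important contribute more; the helper names and the toy data are illustrative assumptions, not the paper's exact model.

```python
# Sketch: weighted support / confidence with user-assigned transaction importance.
def weighted_support(itemset, transactions, weights):
    total = sum(weights)
    covered = sum(w for t, w in zip(transactions, weights) if itemset <= t)
    return covered / total

def weighted_confidence(antecedent, consequent, transactions, weights):
    return (weighted_support(antecedent | consequent, transactions, weights)
            / weighted_support(antecedent, transactions, weights))

transactions = [{"bread", "milk"}, {"bread"}, {"milk", "eggs"}]
weights = [3.0, 1.0, 1.0]   # the user rates the first transaction as most important
print(weighted_support({"bread"}, transactions, weights))               # 0.8
print(weighted_confidence({"bread"}, {"milk"}, transactions, weights))  # 0.75
```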

15.

Association rule mining is a popular data mining modeling tool. It discovers interesting associations or correlation relationships among a large set of data items, showing attribute values that occur frequently together in a given dataset. Despite their great potential benefit, current association rule modeling tools are far from optimal. This article studies how visualization techniques can be applied to facilitate the association rule modeling process, in particular which visualization elements should be incorporated and how they can be displayed. Original designs for the visualization of rules, the integration of data and rule visualizations, and the visualization of the rule derivation process for supporting interactive visual association rule modeling are proposed in this research. Experimental results indicated that, compared to an automatic association rule modeling process, the proposed interactive visual association rule modeling can significantly improve the effectiveness of modeling, enhance understanding of the applied algorithm, and bring users greater satisfaction with the task. The proposed integration of data and rule visualizations can significantly facilitate understanding of rules compared to their nonintegrated counterpart.

16.
Diffuse nutrient emission from agricultural land is one of the major sources of pollution for groundwater, rivers and coastal waters. The quantification of pollutant loads requires mathematical modelling of water and nutrient cycles. The deterministic simulation of nitrogen dynamics, represented by complicated, highly non-linear processes, requires the application of detailed models with many parameters and large associated databases. The operation of those models within integrated assessment tools or decision support systems for large regions is often not feasible. Fuzzy rule based modelling provides a fast, transparent and parameter-parsimonious alternative. Moreover, it allows regionalisation and integration of results from different models and measurements at a higher, more generalised level, and enables explicit consideration of expert knowledge. In this paper an algorithm for the assessment of fuzzy rules for fuzzy modelling using simulated annealing is presented. The fuzzy rule system is applied to simulate nitrogen leaching for selected agricultural soils within the 23,687 km² Saale River Basin. The fuzzy rules are defined and calibrated using results from simulation experiments carried out with the deterministic modelling system SWIM. Monthly aggregated time series of simulated water balance components (e.g. percolation and evapotranspiration), fertilization amounts, resulting nitrogen leaching and crop parameters are used for the derivation of the fuzzy rules. The 30-year simulation period was divided into 20 years for training and 10 years for validation, with the latter taken from the middle part of the period. Three specific fuzzy rule systems were created from the simulation experiments, one for each selected soil profile. Each rule system includes 15 rules, as well as one rule prescribed from expert knowledge, and uses 7 input variables. The performance of the fuzzy rule system is satisfactory for the assessment of nitrate leaching at annual to long-term time steps. The approach allows rapid scenario analysis for large regions and has the potential to become part of decision support systems for generalised integrated assessment of water and nutrients in macroscale regions.

17.
Neglected conditions are an important but difficult-to-find class of software defects. This paper presents a novel approach to revealing neglected conditions that integrates static program analysis and advanced data mining techniques to discover implicit conditional rules in a code base and to discover rule violations that indicate neglected conditions. The approach requires the user to indicate minimal constraints on the context of the rules to be sought, rather than specific rule templates. To permit this generality, rules are modeled as graph minors of enhanced procedure dependence graphs (EPDGs), in which control and data dependence edges are augmented by edges representing shared data dependences. A heuristic maximal frequent subgraph mining algorithm is used to extract candidate rules from EPDGs, and a heuristic graph matching algorithm is used to identify rule violations. We also report the results of an empirical study in which the approach was applied to four open source projects (openssl, make, procmail, amaya). These results indicate that the approach is effective and reasonably efficient.

18.
Considering only item weights or only temporal semantics during data mining leads to incomplete mining results. To address this problem, weighted association rules, temporal association rules, and the periodicity of temporal data are studied; the concepts of weight, K-support expectation, and period are introduced into temporal association rules, and a weighted temporal association rule mining algorithm based on periodicity is proposed. Experiments on the audit data of a management system show that the algorithm accurately mines the weighted temporal association rules in the database and, compared with a weighted association rule algorithm of the same time complexity, yields more complete association rule mining results.
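As a hedged illustration of combining item weights with a periodic time window, the sketch below evaluates weighted support only over transactions that fall inside a recurring period; the field names, the window, and the toy audit data are assumptions, not the algorithm from the paper.

```python
# Sketch: weighted support restricted to transactions inside a recurring
# time-of-day window (the "period").
def periodic_weighted_support(itemset, transactions, window=(9, 17)):
    lo, hi = window
    inside = [t for t in transactions if lo <= t["hour"] < hi]
    if not inside:
        return 0.0
    total = sum(t["weight"] for t in inside)
    hit = sum(t["weight"] for t in inside if itemset <= t["items"])
    return hit / total

audit_log = [
    {"items": {"login", "export"}, "hour": 10, "weight": 2.0},
    {"items": {"login"},           "hour": 14, "weight": 1.0},
    {"items": {"login", "export"}, "hour": 22, "weight": 5.0},  # outside the window
]
print(periodic_weighted_support({"login", "export"}, audit_log))  # 2/3
```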

19.
Much of the research on extracting rules from a large amount of data has focused on the extraction of a general rule that covers as many data items as possible. In the field of health care, where people's lives are at stake, it is necessary to diagnose appropriately without overlooking the small number of patients who show different symptoms. Thus, exceptional rules for rare cases are also important. From such a viewpoint, multiple rules, each of which covers a part of the data, are needed to cover all of the data. In this paper, we describe the extraction of such multiple rules, each of which is expressed by a tree-structural program. We consider a multi-agent approach to be effective for this purpose. Each agent holds a rule that covers a part of the data set, and multiple rules that cover all the data are extracted by multi-agent cooperation. In order to realize this approach, we propose a new method for rule extraction using Automatically Defined Groups (ADG). ADG, which is based on Genetic Programming, is an evolutionary optimization method for multi-agent systems. By using this method, we can acquire both the number of necessary rules and the tree-structural programs that represent these respective rules. We applied this method to a database used in the machine learning field and showed its effectiveness. Moreover, we applied this method to medical data and developed a diagnostic system for coronary heart disease.

20.
We present ELEM2, a machine learning system that induces classification rules from a set of data based on a heuristic search over a hypothesis space. ELEM2 is distinguished from other rule induction systems in three aspects. First, it uses a new heuristic function to guide the heuristic search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region is used as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe the features of ELEM2, its rule induction algorithm and its classification procedure. We report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.
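The abstract does not give ELEM2's exact quality formula, so the sketch below uses a generic discrimination-style stand-in built from a rule's 2x2 contingency table (coverage of positives minus coverage of negatives), only to illustrate the kind of measure meant.

```python
# Sketch: a generic rule quality measure that rewards covering positive
# examples of a class while penalizing coverage of negative examples.
def rule_quality(tp, fp, fn, tn):
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # share of positives covered
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # share of negatives covered
    return tpr - fpr                             # higher = better discrimination

print(rule_quality(tp=40, fp=5, fn=10, tn=45))   # 0.7
```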
