Similar Literature
20 similar documents found (search time: 31 ms)
1.
The incremental technique is a way to handle newly added data in a dynamic database without re-running the original algorithm. There are numerous studies of incremental rough set based approaches. However, these approaches are applied to traditional rough set based rule induction, which may generate redundant rules without focus, and they do not verify the classification of a decision table. In addition, these previous incremental approaches are not efficient on large databases. In this paper, an incremental rule-extraction algorithm based on the previous rule-extraction algorithm is proposed to resolve these aforementioned issues. With this algorithm, when a new object is added to an information system, it is unnecessary to re-compute rule sets from the very beginning. The proposed approach updates rule sets by partially modifying the original rule sets, which increases efficiency. This is especially useful when extracting rules from a large database.
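A minimal sketch of the incremental idea described above, assuming a toy rule representation (conditions as attribute–value pairs, hypothetical attributes such as "fever" and "cough"); it is not the paper's algorithm, but illustrates how a newly added object can be absorbed by touching only the rules it matches rather than re-inducing the whole rule set:

```python
# Hypothetical sketch of the incremental idea: keep rules with their matching
# objects, and when a new object arrives, revisit only the rules it matches.
from dataclasses import dataclass, field

@dataclass
class Rule:
    conditions: dict          # attribute -> required value
    decision: str             # predicted class
    support: set = field(default_factory=set)  # ids of matching objects

def matches(rule, obj):
    """True if the object satisfies every condition of the rule."""
    return all(obj.get(a) == v for a, v in rule.conditions.items())

def add_object(rules, obj_id, obj, decision):
    """Update the rule set for one new object instead of re-inducing everything."""
    touched = []
    for rule in rules:
        if matches(rule, obj):
            if rule.decision == decision:
                rule.support.add(obj_id)      # rule stays consistent, grow its support
            else:
                touched.append(rule)          # rule now contradicted, must be revised
    if not any(matches(r, obj) and r.decision == decision for r in rules):
        # no consistent rule covers the new object: add a maximally specific one
        rules.append(Rule(conditions=dict(obj), decision=decision, support={obj_id}))
    return touched  # contradicted rules would be specialized or dropped here

# usage
rules = [Rule({"fever": "yes"}, "flu", {1, 2})]
conflicts = add_object(rules, 3, {"fever": "yes", "cough": "no"}, "cold")
```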

2.
刘洋  张卓  周清雷 《计算机科学》2014,41(12):164-167
Medical and health data usually contain many attributes and a mixture of continuous and discrete values, which greatly limits the efficiency of knowledge discovery methods on such data. Based on fuzzy rough set theory, this paper studies a classification rule mining method for mixed data. A generalization threshold is introduced into the rule acquisition algorithm to control the size and complexity of the extracted rule set and to improve the classification efficiency of rough set based knowledge discovery on medical and health data. Finally, comparative experiments verify the effectiveness of the algorithm for mining rules from medical decision tables.

3.
In many real world applications, classification models are required to be in line with domain knowledge and to respect monotone relations between predictor variables and the target class in order to be acceptable for implementation. This paper presents a novel heuristic approach, called RULEM, to induce monotone ordinal rule based classification models. The proposed approach can be applied in combination with any rule- or tree-based classification technique, since monotonicity is guaranteed in a post-processing step. RULEM checks whether a rule set or decision tree violates the imposed monotonicity constraints, and existing violations are resolved by inducing a set of additional rules which enforce monotone classification. The approach is able to handle non-monotonic noise and can be applied to both partially and totally monotone problems with an ordinal target variable. Two novel justifiability measures are introduced which are based on RULEM and allow calculating the extent to which a classification model is in line with domain knowledge expressed in the form of monotonicity constraints. An extensive benchmarking experiment and subsequent statistical analysis of the results on 14 public data sets indicate that RULEM preserves the predictive power of a rule induction technique while guaranteeing monotone classification. On the other hand, the post-processed rule sets are found to be significantly larger, which is due to the induction of additional rules. For example, when combined with Ripper, the median performance difference in terms of PCC was zero and the average difference was −0.66%, with on average 5 rules added to the rule sets. The average and minimum justifiability of the original rule sets equal 92.66% and 34.44% respectively in terms of the RULEMF justifiability index, and 91.28% and 40.1% in terms of RULEMS, indicating the effective need for monotonizing the rule sets.
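The following fragment is only an illustration of the kind of monotonicity check such a post-processing step relies on, not the RULEM algorithm itself; it assumes numeric antecedents expressed as hypothetical lower bounds and an ordinal class label:

```python
# Illustrative check (not RULEM itself): for rules over numeric attributes with
# lower-bound conditions, a pair violates monotonicity when one rule's antecedent
# dominates the other's yet predicts a strictly lower class.
def dominates(cond_a, cond_b):
    """cond = {attr: lower_bound}; A dominates B if every bound of A is >= B's."""
    return all(cond_a.get(attr, float("-inf")) >= bound
               for attr, bound in cond_b.items())

def monotonicity_violations(rules):
    """rules = [(conditions, ordinal_class)]; return offending rule pairs."""
    violations = []
    for i, (cond_i, cls_i) in enumerate(rules):
        for j, (cond_j, cls_j) in enumerate(rules):
            if i != j and dominates(cond_i, cond_j) and cls_i < cls_j:
                violations.append((i, j))
    return violations

rules = [({"income": 50}, 2), ({"income": 80}, 1)]   # higher income, lower class
print(monotonicity_violations(rules))                # [(1, 0)]
```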

4.
Most rule learning systems posit hard decision boundaries for continuous attributes and point estimates of rule accuracy, with no measures of variance, which may seem arbitrary to a domain expert. These hard boundaries and point estimates change with small perturbations to the training data due to algorithm instability. Moreover, rule induction typically produces a large number of rules that must be filtered and interpreted by an analyst. This paper describes a method of combining rules over multiple bootstrap replications of rule induction so as to reduce the total number of rules presented to an analyst, to measure and increase the stability of the rule induction process, and to provide a measure of variance for continuous attribute decision boundaries and accuracy point estimates. A measure of similarity between rules is also introduced as a basis for multidimensional scaling to visualize rule similarity. The method was applied to perioperative data and to the UCI (University of California, Irvine) thyroid dataset.
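A hedged sketch of the bootstrap idea, using a depth-1 scikit-learn decision tree as a stand-in rule inducer (the abstract does not prescribe this inducer): the same boundary is re-learned on bootstrap resamples, and its mean and standard deviation replace a single hard cut-point:

```python
# Re-induce a simple one-split rule on bootstrap resamples and report the spread
# of the learned threshold instead of a single hard decision boundary.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_thresholds(X, y, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    thresholds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))        # bootstrap resample
        stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        thresholds.append(stump.tree_.threshold[0])  # learned cut at the root split
    return np.mean(thresholds), np.std(thresholds)

X = np.random.default_rng(1).normal(size=(200, 1))
y = (X[:, 0] > 0.3).astype(int)
mean_t, std_t = bootstrap_thresholds(X, y)
print(f"boundary ≈ {mean_t:.2f} ± {std_t:.2f}")
```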

5.
One of the most important problems with rule induction methods is that they cannot extract rules that plausibly represent experts' decision processes. In this paper, the characteristics of experts' rules are closely examined and a new approach to extracting plausible rules is introduced, which consists of the following three procedures. First, the characterization of the decision attributes (given classes) is extracted from the database and a concept hierarchy for the given classes is computed. Second, based on this hierarchy, rules for each hierarchical level are induced from the data. Then, for each given class, the rules for all hierarchical levels are integrated into one rule. The proposed method was evaluated on a medical database; the experimental results show that the induced rules correctly represent experts' decision processes.

6.
Classification in imbalanced domains is a recent challenge in data mining. We refer to imbalanced classification when the data present many examples from one class and few from the other, and the less represented class is the one of greater interest from the point of view of the learning task. One of the most widely used techniques to tackle this problem is to preprocess the data prior to the learning process. This preprocessing can be done through under-sampling, which removes examples, mainly from the majority class, or through over-sampling, which replicates or generates new minority examples. In this paper, we propose an under-sampling procedure guided by evolutionary algorithms that performs training set selection to enhance the decision trees obtained by the C4.5 algorithm and the rule sets obtained by the PART rule induction algorithm. The proposal has been compared with other under-sampling and over-sampling techniques, and the results indicate that the new approach is very competitive in terms of accuracy compared with over-sampling and outperforms standard under-sampling. Moreover, the obtained models are smaller in terms of the number of leaves or rules generated and can be considered more interpretable. The results have been contrasted through non-parametric statistical tests over multiple data sets.
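A much simplified sketch of evolutionary training set selection for imbalanced data (the paper's own evolutionary under-sampling is more elaborate); it encodes which majority-class examples to keep as a binary chromosome and scores candidates by the cross-validated accuracy of a decision tree used as a stand-in learner:

```python
# Simplified evolutionary under-sampling: a genetic algorithm over keep/drop masks
# for the majority-class instances (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def evolutionary_undersample(X, y, majority_label, generations=30, pop_size=20, seed=0):
    rng = np.random.default_rng(seed)
    maj_idx = np.where(y == majority_label)[0]
    min_idx = np.where(y != majority_label)[0]

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        keep = np.concatenate([min_idx, maj_idx[mask.astype(bool)]])
        return cross_val_score(DecisionTreeClassifier(), X[keep], y[keep], cv=3).mean()

    pop = rng.integers(0, 2, size=(pop_size, len(maj_idx)))      # chromosomes: keep/drop
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]        # truncation selection
        children = parents.copy()
        flip = rng.random(children.shape) < 0.05                  # bit-flip mutation
        children[flip] ^= 1
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    return np.concatenate([min_idx, maj_idx[best.astype(bool)]])  # selected training indices
```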

7.

During the last decade, databases have been growing rapidly in size and number as a result of rapid advances in database capacity and management techniques. This expansive growth in data and databases has created a pressing need for more powerful techniques to convert the vast pool of data into valuable information. For the purposes of strategy and decision-making, many companies and researchers have recognized mining useful information and knowledge from large databases as a key research topic and as an opportunity for major revenues and improved competitiveness. In this paper, we explore a new rule generation algorithm (based on rough sets theory) that can generate a minimal set of rule reducts, and a rule generation and rule induction program (RGRIP) which can efficiently induce decision rules from conflicting information systems. All the methods are also illustrated with numerical examples.
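The rough set machinery underlying this kind of rule generation can be sketched as below (a generic illustration, not the RGRIP program): indiscernibility classes over the condition attributes give the lower and upper approximations of a decision class, from which certain and possible rules can be read off. The decision table and attribute names are invented for the example:

```python
# Indiscernibility classes and lower/upper approximations of a decision class.
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group object ids by their values on the chosen condition attributes."""
    blocks = defaultdict(set)
    for obj_id, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(obj_id)
    return list(blocks.values())

def approximations(table, attributes, decision_attr, decision_value):
    target = {o for o, row in table.items() if row[decision_attr] == decision_value}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:
            lower |= block          # certainly in the class -> certain rules
        if block & target:
            upper |= block          # possibly in the class -> possible rules
    return lower, upper

table = {
    1: {"fever": "yes", "cough": "yes", "flu": "yes"},
    2: {"fever": "yes", "cough": "no",  "flu": "yes"},
    3: {"fever": "yes", "cough": "no",  "flu": "no"},
    4: {"fever": "no",  "cough": "no",  "flu": "no"},
}
print(approximations(table, ["fever", "cough"], "flu", "yes"))
# lower = {1}, upper = {1, 2, 3}
```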

8.
A weighted decision rule mining algorithm based on decision tables (cited by: 1; self-citations: 0; citations by others: 1)
张宏宇  梁吉业 《计算机工程》2003,29(18):62-63,143
Decision rules are an important form of knowledge representation, and rough set theory is an important data mining method. With the deepening study of rough set theory, mining decision rules from decision tables with rough sets has therefore become a hot topic. By proposing a new definition of rule support, this paper extends the existing model and presents a new decision rule mining algorithm; experimental results show its effectiveness.
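For reference, the standard support and confidence of a decision rule over a decision table can be computed as below; this is only the baseline that a weighted support definition would build on, and the paper's weighting scheme itself is not reproduced. The attribute names are hypothetical:

```python
# Standard support/confidence of a decision rule over a decision table.
def rule_support_confidence(table, conditions, decision_attr, decision_value):
    covered = [row for row in table if all(row[a] == v for a, v in conditions.items())]
    correct = [row for row in covered if row[decision_attr] == decision_value]
    support = len(correct) / len(table)
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence

table = [
    {"fever": "yes", "cough": "yes", "flu": "yes"},
    {"fever": "yes", "cough": "no",  "flu": "yes"},
    {"fever": "yes", "cough": "no",  "flu": "no"},
    {"fever": "no",  "cough": "no",  "flu": "no"},
]
print(rule_support_confidence(table, {"fever": "yes"}, "flu", "yes"))  # (0.5, 0.666...)
```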

9.
Rough set theory is applied to a rule induction system, and an incremental learning method for acquiring a rule knowledge base based on rough sets is proposed. The method can effectively handle inconsistency in a decision table and uses a heuristic algorithm to obtain the simplest rules of the decision table. When new objects are added, the rule knowledge base is updated incrementally on the basis of the existing rule set, avoiding re-running the rule acquisition algorithm just to update the rules. The algorithm is tested on several UCI data sets in terms of the number of rules, the data condensation rate, and predictive ability, and the experiments demonstrate its effectiveness.

10.
Breast cancer has become a leading cause of death among women worldwide. An accurate and interpretable method is necessary for diagnosing breast cancer patients so that treatment can be well targeted. Many ensemble methods, such as Random Forest, have been widely applied to breast cancer diagnosis and can achieve high accuracy. However, they are black-box methods that cannot explain the reasons behind a diagnosis. To overcome this limitation, a rule extraction method named improved Random Forest (RF)-based rule extraction (IRFRE) is developed to derive accurate and interpretable classification rules from a decision tree ensemble for breast cancer diagnosis. First, a number of decision tree models are constructed using Random Forest to generate abundant candidate decision rules. Then a rule extraction approach is devised to detach decision rules from the trained trees. Finally, an improved multi-objective evolutionary algorithm (MOEA) is employed to search for an optimal rule predictor whose constituent rule set is the best trade-off between accuracy and interpretability. The developed method is evaluated on three breast cancer data sets: the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, the Wisconsin Original Breast Cancer (WOBC) dataset, and the Surveillance, Epidemiology and End Results (SEER) breast cancer dataset. The experimental results demonstrate that the developed method can effectively explain the black-box model and outperforms several popular single algorithms, ensemble learning methods, and rule extraction methods in terms of accuracy and interpretability. Moreover, the proposed method can be extended to other cancer diagnoses in practice, offering a more interpretable and more accurate diagnostic process.
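A hedged sketch of the rule-extraction stage only: walk each tree of a trained scikit-learn RandomForest and emit its root-to-leaf paths as candidate rules on the WDBC data. The MOEA selection stage of IRFRE is not shown, and scikit-learn is used here only as a convenient stand-in implementation:

```python
# Extract root-to-leaf paths of every tree in a Random Forest as candidate rules.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def tree_to_rules(tree, feature_names):
    t = tree.tree_
    rules = []

    def walk(node, conditions):
        if t.children_left[node] == -1:                       # leaf node
            predicted = int(t.value[node][0].argmax())
            rules.append((list(conditions), predicted))
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node],  conditions + [f"{name} <= {thr:.3f}"])
        walk(t.children_right[node], conditions + [f"{name} > {thr:.3f}"])

    walk(0, [])
    return rules

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0)
forest.fit(data.data, data.target)
all_rules = [r for est in forest.estimators_ for r in tree_to_rules(est, data.feature_names)]
print(len(all_rules), "candidate rules;", all_rules[0])
```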

11.
12.
A hybrid coevolutionary algorithm for designing fuzzy classifiers (cited by: 1; self-citations: 0; citations by others: 1)
Rule learning is one of the most common tasks in knowledge discovery. In this paper, we investigate the induction of fuzzy classification rules for data mining purposes and propose a hybrid genetic algorithm for learning approximate fuzzy rules. A novel niching method is employed to promote coevolution within the population, which enables the algorithm to discover multiple rules in a single run by means of a coevolutionary scheme. To improve the quality of the learned rules, a local search method is devised to fine-tune the offspring generated by the genetic operators in each generation. After the GA terminates, a fuzzy classifier is built by extracting a rule set from the final population. The proposed algorithm was tested on datasets from the UCI repository, and the experimental results verify its validity in learning rule sets and its comparative advantage over conventional methods.

13.
Rule extraction from decision information systems is one of the research topics of data mining, and concept lattice theory and granular computing are the main mathematical tools in this area. By exploring the relationship between these two theories, this paper uses equivalence relations to define the minimal optimistic concept and its lattice structure; the minimal optimistic concept differs from the traditional classical concept but still forms a lattice. On this basis, a rule extraction algorithm for decision information systems is proposed. The algorithm introduces the idea of granularity: it computes the minimal optimistic concepts at each granular level and extracts decision rules according to the inclusion relation between the extents of the minimal optimistic concepts and the equivalence classes of the decision attribute, with a termination condition set to speed up convergence, thereby achieving knowledge reduction of the decision information system. The definition of the minimal optimistic concept is broader than that of the classical concept, and its generation process is simpler. Finally, the correctness and advantages of the method are verified through theoretical proofs, illustrative examples, and comparative numerical experiments.

14.
In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which may reveal the presence of specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of the induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

15.
Rough Sets Theory is often applied to the task of classification and prediction, in which objects are assigned to some pre-defined decision classes. When the classes are preference-ordered, the process of classification is referred to as sorting. To deal with the specificity of sorting problems, an extension of the Classic Rough Sets Approach, called the Dominance-based Rough Sets Approach, was introduced. The final result of the analysis is a set of decision rules induced from what are called rough approximations of decision classes. The main role of the induced decision rules is to discover regularities in the analyzed data set, but the same rules, when combined with a particular classification method, may also be used to classify/sort new objects (i.e. to assign the objects to appropriate classes). There exist many different rule induction strategies, including induction of an exhaustive set of rules. This strategy produces the most comprehensive knowledge base on the analyzed data set, but it requires a considerable amount of computing time, as the complexity of the process is exponential. In this paper we present a shortcut that allows classifying new objects without generating the rules. The presented approach bears some resemblance to the idea of lazy learning.
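The "classify without generating rules" idea can be illustrated for preference-ordered classes as below: a new object is compared with the training objects through the dominance relation, which bounds its class from below and above. This is a simplification for illustration, not the paper's exact procedure; the criteria vectors and class labels are invented:

```python
# Lazy, dominance-based class bounds for a new object over ordered classes.
def dominates(a, b):
    """a dominates b if a is at least as good as b on every criterion."""
    return all(x >= y for x, y in zip(a, b))

def classify_by_dominance(train, new_obj):
    """train = [(criteria_vector, ordinal_class)]; return (lower, upper) class bounds."""
    lower = max((cls for obj, cls in train if dominates(new_obj, obj)), default=None)
    upper = min((cls for obj, cls in train if dominates(obj, new_obj)), default=None)
    return lower, upper

train = [((3, 2), 1), ((5, 5), 2), ((8, 7), 3)]
print(classify_by_dominance(train, (6, 6)))   # bounded by classes 2 and 3
```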

16.
An important issue in text mining is how to make use of multiple pieces of discovered knowledge to improve future decisions. In this paper, we propose a new approach to combining multiple sets of rules for text categorization using Dempster's rule of combination. We develop a boosting-like technique for generating multiple sets of rules based on rough set theory and model the classification decisions from multiple sets of rules as pieces of evidence which can be combined by Dempster's rule of combination. We apply these methods to 10 of the 20 newsgroups, a benchmark data collection (Baker and McCallum 1998), individually and in combination. Our experimental results show that the performance of the best combination of the multiple sets of rules on the 10 groups of the benchmark data is statistically significantly better than that of the best single set of rules. The comparative analysis between the Dempster–Shafer and the majority voting (MV) methods, along with an overfitting study, confirms the advantage and robustness of our approach.
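Dempster's rule of combination for two basic probability assignments over a small frame of class labels can be sketched as follows; this is a generic implementation of the combination rule, and the paper's modelling of rule sets as evidence is not reproduced. The newsgroup labels and mass values are illustrative:

```python
# Dempster's rule of combination; focal elements are frozensets of class labels.
def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2                  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

frame = frozenset({"sci.med", "sci.space"})
m1 = {frozenset({"sci.med"}): 0.7, frame: 0.3}       # evidence from rule set 1
m2 = {frozenset({"sci.med"}): 0.4, frozenset({"sci.space"}): 0.4, frame: 0.2}
print(dempster_combine(m1, m2))                      # sci.med ends up with mass 0.75
```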

17.
Decision trees are an important method in inductive learning and data mining, mainly used for classification and prediction. This paper introduces the concept of a generalized decision tree, unifying classification rule sets and decision tree structures. It also proposes a novel method for constructing decision trees based on a DNA-encoded genetic algorithm: the C4.5 algorithm is first used to classify the data set and obtain an initial rule set, which is then optimized by the proposed algorithm and used to build the decision tree. Experiments show that the method effectively avoids the drawbacks of traditional decision tree construction and exhibits good parallelism.

18.
A rule acquisition algorithm based on asymmetric similarity rough sets (cited by: 1; self-citations: 0; citations by others: 1)
To address the problems that rule inference using the rough set similarity relation together with the LEM2 algorithm yields few rules and insufficiently simplified rules, a method for computing asymmetric similarity relations and approximation sets in rough sets is proposed, and the rule acquisition process of the existing LEM2 algorithm is improved and extended, forming a new rule acquisition algorithm based on asymmetric similarity rough sets that can extract more potential rules from incomplete information. Finally, both algorithms are tested on a practical example and their results are compared; the simulation results show that the new rule acquisition algorithm achieves better optimization performance and better results without changing the structure or content of the original information set.

19.
Because of noise and other subjective and objective factors in data, inconsistent data have become very common, so methods and techniques that can directly analyze and handle inconsistent data are needed. This paper studies the acquisition of generalized decision rules in inconsistent decision systems, discusses the basic principles of decision rule acquisition from the perspective of granular computing, and accordingly gives a general method for computing the set of all minimal generalized decision rules. The method does not require constructing a discernibility matrix and can be executed in parallel, thereby reducing space overhead and improving computational efficiency. In addition, the method can be extended to compute other types of minimal decision rule sets, providing a general approach to rule acquisition in inconsistent decision systems.

20.
This paper studies a method of discovering classification rules with Bayes' theorem: classification rules are discovered using Bayes' theorem and then used to classify data. Two cases are discussed with examples: data sets with only categorical attributes, and data sets containing both numeric and categorical attributes. The examples show that Bayes' theorem is an effective data classification method in data mining.
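The abstract does not spell out the exact procedure, so the sketch below shows the standard way Bayes' theorem is turned into a classifier for categorical attributes (naive Bayes with Laplace smoothing); the attributes and records are invented:

```python
# Minimal naive Bayes for categorical attributes with Laplace smoothing.
from collections import Counter, defaultdict

def train_nb(rows, target):
    class_counts = Counter(r[target] for r in rows)
    cond_counts = defaultdict(Counter)                 # (class, attr) -> value counts
    attr_values = defaultdict(set)                     # attr -> observed values
    for r in rows:
        for a, v in r.items():
            if a != target:
                cond_counts[(r[target], a)][v] += 1
                attr_values[a].add(v)
    return class_counts, cond_counts, attr_values, len(rows)

def predict_nb(model, obj):
    class_counts, cond_counts, attr_values, n = model
    best_cls, best_score = None, float("-inf")
    for cls, cnt in class_counts.items():
        score = cnt / n                                # prior P(class)
        for a, v in obj.items():
            score *= (cond_counts[(cls, a)][v] + 1) / (cnt + len(attr_values[a]))  # Laplace
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

rows = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
]
model = train_nb(rows, "play")
print(predict_nb(model, {"outlook": "sunny", "windy": "no"}))   # -> "yes"
```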

