Similar literature
20 similar documents found (search time: 343 ms)
1.
《Applied Soft Computing》2013,13(1):372-389
Rough set (RS) theory can be seen as a new mathematical approach to vagueness and is capable of discovering important facts hidden in data. However, the traditional rough set approach ignores the fact that the desired reducts are not necessarily unique, since several reducts could have the same value of the strength index. In addition, current RS algorithms can generate a set of classification rules efficiently, but they cannot generate rules incrementally when new objects are given. Numerous incremental approaches cannot deal with the problems of large databases. Therefore, an incremental rule-extraction algorithm is proposed in this study to solve these issues. With this algorithm, when a new object is added to an information system, it is unnecessary to re-compute the rule sets from the very beginning, and the complete but non-repetitive rules can be generated quickly. In the case study, the results show that the incremental issues of adding new data are resolved and a large amount of computation time is saved.

2.
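Rough set theory is applied to rule induction systems, and an incremental learning method for acquiring a rule knowledge base based on rough sets is proposed. The method handles inconsistencies in decision tables effectively and uses a heuristic algorithm to obtain the minimal rules of a decision table. When a new object is added, the rule knowledge base is updated incrementally on the basis of the existing rule set, which avoids re-running the rule acquisition algorithm just to update the rules. The algorithm is tested on several UCI data sets in terms of the number of rules in the rule set, the data reduction rate, and predictive ability. The experiments demonstrate the effectiveness of the algorithm.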

3.
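The Apriori algorithm must scan the database repeatedly to find frequent itemsets, so it is inefficient, and it does not support update mining. To solve these problems, an association rule mining algorithm based on rough sets, single-transaction item combination, and set operations is proposed. The algorithm first uses rough sets for attribute reduction, combines the "data items" of each transaction in the new decision table and marks their addresses, and then computes support and confidence with set operations to mine the valid rules. The algorithm needs to scan the database only once and effectively supports update mining of association rules. An application example and experimental results show that the algorithm clearly outperforms Apriori and is an effective and fast association rule mining algorithm.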
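
The idea of computing support and confidence with set operations after a single scan can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the rough set reduction step is omitted, and the function names (`add_transaction`, `support_count`, `confidence`) and the sample transactions are assumptions made for illustration. Each item is mapped to the set of transaction IDs ("addresses") that contain it, and the support of an itemset is the size of the intersection of those sets.

```python
from typing import Dict, Iterable, List, Set

TidSets = Dict[str, Set[int]]


def add_transaction(items: Iterable[str], tid: int, tid_sets: TidSets) -> None:
    """Record one transaction: add its ID to the ID set of every item it contains."""
    for item in items:
        tid_sets.setdefault(item, set()).add(tid)


def build_tid_sets(transactions: List[Set[str]]) -> TidSets:
    """Single pass over the data (hypothetical helper): item -> transaction IDs."""
    tid_sets: TidSets = {}
    for tid, items in enumerate(transactions):
        add_transaction(items, tid, tid_sets)
    return tid_sets


def support_count(itemset: Iterable[str], tid_sets: TidSets) -> int:
    """Support count = size of the intersection of the items' ID sets."""
    sets = [tid_sets.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


def confidence(antecedent: Set[str], consequent: Set[str], tid_sets: TidSets) -> float:
    """confidence(A -> B) = support(A and B) / support(A)."""
    base = support_count(antecedent, tid_sets)
    both = support_count(antecedent | consequent, tid_sets)
    return both / base if base else 0.0


# Illustrative data only; not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
tid_sets = build_tid_sets(transactions)
```

Under these assumptions, a newly arriving transaction only needs a call to `add_transaction`; the earlier transactions are never rescanned, which is the sense in which such a layout supports update mining.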

4.
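Incremental-update association rule mining mainly addresses the maintenance of association rules when the transaction records in a transaction database are continually updated and the minimum support changes. Because many existing incremental-update association rule mining algorithms suffer from low efficiency, high computational cost, and rules that are hard to maintain, an incremental-update association mining algorithm based on an inverted index tree is proposed. The algorithm combines inverted index techniques with a tree structure, so that when the data in the transaction database are continually updated and the minimum support changes with the application environment, frequent itemsets can be generated without scanning the original transaction database and without generating candidate itemsets. Experimental results show that the algorithm requires little storage space, retrieves itemsets efficiently, and efficiently solves the problem that incrementally updated association rules are hard to maintain.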

5.
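To obtain useful knowledge from large amounts of data, a data mining method based on rough sets and neural network techniques is proposed. Rough set theory is first used to eliminate redundant attributes and derive some rules from the data set; these rules are then used to construct a neural network, and neural network techniques are used to refine the rough rules. The article surveys the research methods of this technique and proposes an improved rough set reduction method.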

6.
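Incremental rule reduction algorithm based on rough sets and decision trees   Total citations: 2, self-citations: 0, citations by others: 2
The rough set method is an important tool for handling uncertain or vague knowledge. Research on minimal rule sets in traditional rough set models targets static data and is powerless for dynamic data. In practical applications, however, the data in a database often change dynamically, so research on incremental algorithms for rule reduction is one of the urgent problems in knowledge discovery. The article presents an incremental rule reduction algorithm based on rough sets and decision trees and compares it with the traditional algorithm and the RRIA algorithm; the experimental results show that the proposed algorithm is better in both approach and effectiveness.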

7.
夏英  刘婉蓉 《计算机应用》2008,28(12):3224-3226
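Most existing association rule algorithms that address incremental updating need to scan the data set multiple times and cannot process massive data effectively. To address this, a sliding-window-based incremental updating algorithm for association rules (SWIUA) is proposed, which uses a sliding window to update the data and mines the association rules that interest the user. The algorithm needs to scan the original data set and the updated data only once each, which reduces I/O time. It also applies an optimization strategy to filter and delete candidate itemsets, which improves mining performance and allows large amounts of newly added data to be handled effectively.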
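
The abstract does not give the details of SWIUA, so the following Python sketch only illustrates the general sliding-window idea and should not be read as the authors' algorithm. The class name `SlidingWindowCounter`, the cap on itemset length, and the sample data are all assumptions. Itemset counts are incremented when a transaction enters the window and decremented when the oldest one leaves, so each transaction is touched once on entry and once on exit.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowCounter:
    """Hypothetical helper: itemset counts over the most recent transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len      # longest itemset tracked (assumption)
        self.window = deque()       # transactions currently in the window
        self.counts = Counter()     # itemset (sorted tuple) -> count in window

    def _itemsets(self, transaction):
        items = sorted(transaction)
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        """Slide the window by one transaction, updating counts in place."""
        transaction = frozenset(transaction)
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))

    def frequent(self, min_support):
        """Itemsets whose relative support in the window reaches min_support."""
        threshold = min_support * len(self.window)
        return {s: c for s, c in self.counts.items() if c > 0 and c >= threshold}


# Illustrative data only.
counter = SlidingWindowCounter(window_size=3)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]:
    counter.add(t)
```

After the fourth call to `add`, the first transaction has left the window, and `counter.frequent(0.6)` reports only the single items, each with a count of 2.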

8.
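Algorithm for mining rule generating sets based on concept lattices   Total citations: 27, self-citations: 0, citations by others: 27
The rule sets produced by traditional rule extraction algorithms are very large and contain many redundant rules. Using closed itemsets can reduce the number of rules, and the generalization and specialization relations between concept lattice nodes are well suited to rule extraction. Based on concept lattice theory and the notion of closed itemsets, a new lattice structure that is better suited to rule extraction is proposed, together with a corresponding incremental construction algorithm based on closed labels and a rule extraction algorithm. What is finally presented to the user is an intuitive, easily understood subset of rules, from which the user can selectively derive other rules. Experiments show that the method can mine rule generating sets efficiently.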

9.
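The conversion between the support vector (SV) set and the non-SV set of the samples during SVM incremental learning is analyzed. Taking into account the influence of the initial non-SV set and the newly added samples on the classification information, the original KKT condition is improved and combined with an improved error-driven strategy to propose a new error-driven incremental learning algorithm based on the KKT condition. Without affecting processing speed, the algorithm retains as much useful information from the original samples as possible and discards useless information from the new samples, which improves classifier accuracy. Experiments show that the algorithm is effective in optimizing classification results and improving classifier performance.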

10.
Our objective is a comparison of two data mining approaches to dealing with imbalanced data sets. The first approach is based on saving the original rule set, induced by the LEM2 (Learning from Example Module) algorithm, and changing the rule strength for all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase the sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is statistically insignificant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for a specific data set.

11.
白鹤翔  王健  李德玉  陈千 《计算机应用》2015,35(8):2355-2359
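To address the problem that feature selection on the large-scale unsupervised data sets common in "big data" is too slow for practical applications, a fast attribute selection algorithm is proposed on the basis of the classical incremental absolute reduction algorithm of rough sets. First, the large-scale data set is treated as a sequence of randomly arriving objects, and the candidate reduct is initialized to the empty set. Then, at each step, one object is drawn at random without replacement from the large-scale data set; the algorithm checks whether the current candidate reduct can distinguish this object from all objects in the current object set that it should be distinguished from, and the object is added to the current object set; if it cannot be distinguished, suitable attributes are added to the candidate reduct. Finally, if no indistinguishable object is found for I consecutive draws, the candidate reduct is taken as the reduct of the large-scale data set. Experiments on five unsupervised large-scale data sets show that the resulting reduct can distinguish more than 95% of object pairs, and that computing it takes less than 1% of the time required by the discernibility-matrix-based algorithm and the incremental reduction algorithm. In a text topic mining experiment, the topics mined from the reduced data set are essentially the same as those mined from the original data set. The two sets of experiments show that the method performs attribute selection on large-scale data sets effectively and quickly.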
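
The three steps above can be sketched in Python as follows. This is a reading of the abstract under stated assumptions, not the authors' implementation: objects are tuples of attribute values, two objects "should be distinguished" when they differ on at least one attribute, and the "suitable attribute" is simply the first attribute on which a clashing pair differs. The function name `sampled_reduct` and the parameter `patience` (the abstract's I) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sampled_reduct(objects: Sequence[Tuple], patience: int, seed: int = 0) -> List[int]:
    """Return attribute indices forming a candidate reduct (illustrative sketch)."""
    rng = random.Random(seed)
    order = list(objects)
    rng.shuffle(order)  # walking a shuffled copy = random draws without replacement

    reduct: List[int] = []   # candidate reduct, initially empty
    seen: List[Tuple] = []   # current object set
    quiet_draws = 0          # consecutive draws that needed no new attribute

    for obj in order:
        clash_found = False
        for other in seen:
            if other == obj:
                continue  # identical objects need not be distinguished
            if all(obj[a] == other[a] for a in reduct):
                # The candidate reduct cannot tell the pair apart:
                # add the first attribute on which they differ (assumed rule).
                attr = next(a for a in range(len(obj)) if obj[a] != other[a])
                reduct.append(attr)
                clash_found = True
        seen.append(obj)
        quiet_draws = 0 if clash_found else quiet_draws + 1
        if quiet_draws >= patience:
            break  # I consecutive draws without a clash: accept the candidate
    return sorted(reduct)


# Illustrative data: attribute 0 is constant, attributes 1 and 2 both matter.
objects = [(0, a, b) for a in range(3) for b in range(2)]
```

With `patience` set to the number of objects, the early stop never triggers on this data, so `sampled_reduct(objects, patience=len(objects))` scans everything and returns `[1, 2]`. A smaller `patience` trades completeness for speed, which matches the abstract's report that the reduct distinguishes more than 95% of object pairs rather than all of them.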

12.
Wang Ling, Gui Lingpeng, Zhu Hui 《Applied Intelligence》2022,52(2):1389-1405

Traditional temporal association rule mining algorithms cannot dynamically update the temporal association rules within the valid time interval as data increase. In this paper, a new algorithm called incremental fuzzy temporal association rule mining using fuzzy grid table (IFTARMFGT) is proposed by combining the advantages of Boolean matrices with incremental mining. First, multivariate time series data are transformed into discrete fuzzy values that contain the time intervals and fuzzy membership. Second, in order to improve the mining efficiency, the concept of Boolean matrices is introduced into the fuzzy membership to generate a fuzzy grid table for mining the frequent itemsets. Finally, in view of the Fast UPdate (FUP) algorithm, fuzzy temporal association rules are incrementally mined and updated without repeatedly scanning the original database by considering the lifespan of each item and inheriting the information from previous mining results. The experiments show that our algorithm provides better efficiency and interpretability in mining temporal association rules than other algorithms.


13.
This paper considers the problem of finding predictive and useful association rules with a new Web mining algorithm, a streaming association rule (SAR) model. We first adopt a weighted order-dependent scheme (assigning more weight to early visited pages) rather than a traditional Boolean scheme (assigning 1 for visited and 0 for non-visited pages). In this way, we intend to improve the limited representation of navigation patterns in previous association rule mining (ARM) algorithms. We also note that most traditional association rule models are not scalable because they require multiple scans of all records to re-calibrate a predictive model when there are new updates in the original databases. The proposed SAR model takes a “divide-and-conquer” approach and requires only a single scan of the data sets to avoid the curse of dimensionality. Through comparative experiments on a real-world data set, we show that prediction models based on a weighted order-dependent representation are more accurate in predicting the next moves of Web navigators than models based on a Boolean representation. In particular, when combined with several heuristics developed to eliminate redundant association rules, SAR models show a very comparable prediction accuracy while maintaining a small fraction of the association rules compared to traditional ARM models. Finally, we quantify and graphically show the significance or contribution of each page to forming unique rule sets in each database segment.
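
The contrast between the Boolean scheme and a weighted order-dependent scheme can be made concrete with a small Python sketch. The abstract does not state the weighting function, so the linearly decreasing weight used here, `(n - position) / n`, is purely an assumption, as are the function names and the sample session.

```python
from typing import List, Sequence


def boolean_vector(session: Sequence[str], pages: Sequence[str]) -> List[int]:
    """Boolean scheme: 1 for a visited page, 0 for a non-visited page."""
    visited = set(session)
    return [1 if page in visited else 0 for page in pages]


def weighted_vector(session: Sequence[str], pages: Sequence[str]) -> List[float]:
    """Order-dependent scheme: earlier visits receive larger weights.

    The linear weight (n - position) / n is an assumption for illustration;
    only the first visit to a page counts.
    """
    n = len(session)
    first_position = {}
    for position, page in enumerate(session):
        first_position.setdefault(page, position)
    return [
        (n - first_position[page]) / n if page in first_position else 0.0
        for page in pages
    ]


# Illustrative data only.
pages = ["home", "search", "cart", "help"]
session = ["home", "search", "cart"]
```

For this session the Boolean vector is `[1, 1, 1, 0]` and carries no order information, while the weighted vector ranks `home` above `search` above `cart`.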

14.
A minimal rule mining method based on rough set theory   Total citations: 4, self-citations: 0, citations by others: 4
赛煜  王海洋 《计算机工程》2003,29(20):77-79
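A minimal rule mining method based on rough set theory is proposed. It is a new method for mining multi-concept classification rules using a rough set model based on classification accuracy. It handles inconsistency in decision tables effectively and uses a heuristic algorithm to mine minimal production rules that satisfy a given accuracy. The algorithm is tested on several UCI data sets and compared experimentally with the well-known Rosetta software; the results show that the method greatly increases the overall amount of data reduction and can effectively simplify the resulting rule knowledge.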

15.
Association rules form one of the most widely used techniques to discover correlations among attributes in a database. So far, some efficient methods have been proposed to obtain these rules with respect to an optimal goal, such as maximizing the number of large itemsets and interesting rules, or the values of support and confidence for the discovered rules. This paper first introduces optimized fuzzy association rule mining in terms of three important criteria: strongness, interestingness and comprehensibility. Then, it proposes multi-objective Genetic Algorithm (GA) based approaches for discovering these optimized rules. The optimization technique for a given criterion may take one of two forms. The first tries to determine the appropriate fuzzy sets of quantitative attributes in a prespecified rule, which is also called a certain rule. The second deals with finding both uncertain rules and their appropriate fuzzy sets. Experimental results on a real data set show the effectiveness and applicability of the proposed approach.

16.
刘洋  张卓  周清雷 《计算机科学》2014,41(12):164-167
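Medical and health data usually have many attributes and are mixed data in which continuous and discrete attributes coexist, which greatly limits the efficiency of knowledge discovery methods when mining such data. Based on fuzzy rough set theory, a method for mining classification rules from mixed data is studied. A generalization threshold is introduced into the rule acquisition algorithm to control the size and complexity of the acquired rule set and to improve the classification efficiency of rough set knowledge discovery methods on medical and health data. Finally, comparative experiments verify the effectiveness of the algorithm in mining rules from medical decision tables.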

17.
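Reduction is an important concept in rough set theory. Computing reducts from the definition is a typical NP problem, and because reducts are not unique, the attribute set obtained for large or high-dimensional data sets is often not the minimal attribute reduct. This paper studies attribute reduction in rough set theory. Obtaining the attribute reduct through the discernibility matrix is studied, and a heuristic attribute reduction algorithm is proposed by combining rough sets with grey theory. The fitting results show that the reduction algorithm is effective.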
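
As an illustration of attribute reduction through a discernibility matrix, the Python sketch below builds the matrix for a small decision table, reads off the core, and then picks attributes greedily. It is not the paper's method: the grey-theory heuristic is not described in the abstract, so a simple "most frequent attribute first" rule is swapped in, and the function names and sample table are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Matrix = Dict[Tuple[int, int], FrozenSet[int]]


def discernibility_matrix(rows: Sequence[Tuple], decisions: Sequence) -> Matrix:
    """For each pair of objects with different decisions, the attributes on which they differ."""
    entries: Matrix = {}
    for (i, x), (j, y) in combinations(enumerate(rows), 2):
        if decisions[i] != decisions[j]:
            entries[(i, j)] = frozenset(a for a in range(len(x)) if x[a] != y[a])
    return entries


def core(entries: Matrix) -> Set[int]:
    """Core attributes: those that appear alone in some matrix entry."""
    return {next(iter(e)) for e in entries.values() if len(e) == 1}


def greedy_reduct(entries: Matrix) -> List[int]:
    """Greedy stand-in heuristic: repeatedly take the attribute covering most entries."""
    remaining = [e for e in entries.values() if e]
    chosen: List[int] = []
    while remaining:
        counts = Counter(a for e in remaining for a in e)
        best = max(sorted(counts), key=counts.get)  # ties go to the smallest index
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    return chosen


# Illustrative decision table: three condition attributes, one decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]
decisions = [0, 1, 1]
matrix = discernibility_matrix(rows, decisions)
```

On this table the matrix has two entries, `{1}` for objects 0 and 1 and `{0, 1, 2}` for objects 0 and 2, so attribute 1 is the core and also the whole greedy reduct. Because the greedy rule is a heuristic, it is not guaranteed to return a minimal reduct in general, which is the difficulty the abstract points out.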

18.
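Reduction is an important concept in rough set theory. Computing reducts from the definition is a typical NP problem, and because reducts are not unique, the attribute set obtained for large or high-dimensional data sets is often not the minimal attribute reduct. This paper studies attribute reduction in rough set theory. Obtaining the attribute reduct through the discernibility matrix is studied, and a heuristic attribute reduction algorithm is proposed by combining rough sets with grey theory. The fitting results show that the reduction algorithm is effective.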

19.
An incremental algorithm generating satisfactory decision rules and a rule post-processing technique are presented. The rule induction algorithm is based on the Apriori algorithm. It is extended to handle preference-ordered domains of attributes (called criteria) within the Variable Consistency Dominance-based Rough Set Approach. It also deals with the problem of missing values in the data set. The algorithm has been designed for medical applications, which require: (i) a careful selection of the set of decision rules representing medical experience, (ii) an easy update of these decision rules because the data set evolves over time, and (iii) not only a high predictive capacity of the set of decision rules but also a thorough explanation of a proposed decision. To satisfy all these requirements, we propose an incremental algorithm for induction of a satisfactory set of decision rules and a post-processing technique on the generated set of rules. The user's preferences with respect to attributes are also taken into account. A measure of the quality of a decision rule is proposed. It is used to select the most interesting representatives in the final set of rules.

20.
Recent research shows that rule-based models perform well when classifying large data sets such as data streams with concept drifts. A genetic algorithm is a strong rule-based classification algorithm, but it is used only for mining small static data sets. If the genetic algorithm can be made scalable and adaptable by reducing its I/O intensity, it will become an efficient and effective tool for mining large data sets such as data streams. In this paper a scalable and adaptable online genetic algorithm is proposed to mine classification rules for data streams with concept drifts. Since data streams are generated continuously at a rapid rate, the proposed method does not use a fixed static data set for fitness calculation. Instead, it extracts a small snapshot of the training examples from the current part of the data stream whenever data is required for the fitness calculation. The proposed method also builds rules for all the classes separately in a parallel, independent, iterative manner. This makes the proposed method scalable to data streams and adaptable to the concept drifts that occur in the stream in a fast and more natural way, without storing the whole stream or a part of the stream in compressed form as other rule-based algorithms do. The results of the proposed method are comparable with those of the other standard methods used for mining data streams.
