Similar literature
 20 similar documents found (search time: 31 ms)
1.
Online mining of fuzzy multidimensional weighted association rules   Total citations: 1 (self-citations: 1, citations by others: 0)
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rules mining. It contributes to the ongoing research on multidimensional online association rules mining by proposing a general architecture that utilizes a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to provide users with the flexibility to view data from different perspectives, as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described in this paper involves introducing a fuzzy data cube as a remedy to the problem of handling quantitative values of dimensional attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed fuzzy data cube. Then, we investigate combining the concepts of weight and multiple-level to mine fuzzy weighted multi-cross-level association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single dimension, multidimensional and hybrid (integrating the other two methods) fuzzy weighted association rules mining. Each of the three methods utilizes a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this is the first effort in this direction. We compared the proposed approach to an existing approach that does not utilize fuzziness. Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States census in year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach. OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis. In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).

2.
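Wang Xiaopeng (王晓鹏). Computer Simulation (《计算机仿真》), 2020, 37(1): 234-238
Mining interval-valued attribute datasets can effectively reveal the relationships among data. Existing data mining methods do not cluster large-scale data, so the mining process consumes a large amount of memory and yields low accuracy. To address this, a new mining algorithm for interval-valued attribute datasets is proposed. The modules for problem definition, data preparation, data extraction, pattern prediction and data clustering are analyzed in detail to cluster the interval-valued attribute data. Based on the clustering results, the interval-valued attribute data are divided into multiple datasets, and the itemsets that satisfy the minimum support are selected as frequent itemsets. Association rules among the datasets are then extracted and incorporated into the data computation step to complete the mining. Simulations carried out to verify the algorithm show that, compared with traditional mining algorithms, the proposed algorithm uses less memory and achieves higher mining accuracy.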

3.
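Multidimensional concept lattices and incremental mining of multidimensional sequential patterns   Total citations: 1 (self-citations: 0, citations by others: 1)
Multidimensional sequential pattern mining aims to combine the association patterns found in one or more background dimensions with the sequential patterns found in ordered transaction sequences, so as to give users multidimensional sequential patterns that are richer in information and more directly applicable. Although some work on mining multidimensional sequential patterns exists, it discovers association patterns and sequential patterns separately, using different data structures. This paper proposes a new concept lattice structure, the multidimensional concept lattice, which extends and generalizes the concept lattice. Its intent is richer: it contains multiple ordered task intents as well as multiple unordered background intents. An incremental multidimensional sequential pattern mining algorithm based on this structure is designed and implemented; it uses a unified data model to mine association patterns and sequential patterns efficiently and simultaneously. Experimental results on synthetic datasets verify the effectiveness of the algorithm, and its application to a real banking dataset demonstrates its practicality.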

4.
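Given the big-data characteristics of linked data collections and the semantic information they contain, this paper proposes first building schema-level links between linked datasets and then mining association rules. RDF data item patterns are defined over RDF datasets in the same domain, together with rules for generating them. RDF query techniques are used to obtain sets of RDF data items from the data item patterns, from which association rules within a specific domain are derived. The proposed method, based on RDF data item patterns of linked data, extends association rule mining from a single dataset to collections of datasets within the same domain. A Hadoop-based implementation scheme for association rule mining on large-scale RDF datasets is also given. Experimental results confirm the value of schema-level links for association rule mining and the effectiveness of the proposed method.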

5.
In item promotion applications, there is a strong need for tools that can help to unlock the hidden profit within each individual customer’s transaction history. Discovering association patterns with data mining techniques is helpful for this purpose. However, the conventional association mining approach, while generating “strong” association rules, cannot detect potential profit-building opportunities that can be exposed by “soft” association rules, which recommend items with looser but still significant associations. This paper proposes a novel mining method that automatically detects hidden profit-building opportunities by discovering soft associations among items from historical transactions. Specifically, this paper proposes a relaxation method of association mining with a new support measurement, called soft support, that can be used for mining soft association patterns expressed with the “most” fuzzy quantifier. In addition, a novel measure for validating the soft-associated rules is proposed based on the estimated possibility of a conditioned quantified fuzzy event. The new measure is shown to be effective by comparison with several existing measures. A new association mining algorithm based on a modification of the FT-Tree algorithm is proposed to accommodate this new support measure. Finally, the mining algorithm is applied to several data sets to investigate its effectiveness in finding soft patterns and in content recommendation.

6.
An efficient approach to mining indirect associations   Total citations: 1 (self-citations: 0, citations by others: 1)
Discovering association rules is one of the important tasks in data mining. While most of the existing algorithms are developed for efficient mining of frequent patterns, it has been noted recently that some of the infrequent patterns, such as indirect associations, provide useful insight into the data. In this paper, we propose an efficient algorithm, called HI-mine, based on a new data structure, called HI-struct, for mining the complete set of indirect associations between items. Our experimental results show that HI-mine's performance is significantly better than that of the previously developed algorithm for mining indirect associations on both synthetic and real-world data sets over practical ranges of support specifications.

7.
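Dong Lin (董林), Shu Hong (舒红), Li Sha (李莎). Application Research of Computers (《计算机应用研究》), 2013, 30(8): 2330-2333
To simplify the preprocessing steps of spatial frequent pattern mining and improve mining efficiency, a mining algorithm named FISA (fast intersect spatial Apriori) is proposed that takes spatial vector and raster layers directly as input. The algorithm counts the support of predicate sets through layer intersection and area calculation, and thereby mines frequent predicate sets and association rules. Compared with transaction-based spatial association rule mining algorithms, FISA does not require converting spatial data into transactions beforehand, and every result has a corresponding layer, which makes the results easy to visualize. Compared with other mining algorithms based on spatial analysis, FISA supports both vector and raster formats of spatial data and introduces a fast intersection method to ensure scalability. Experimental results show that the algorithm can mine frequent patterns directly from spatial data efficiently and correctly.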

8.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. In real-world applications, transactions may contain quantitative values, and each item in a temporal database may have a lifespan. In this paper, we thus propose a data mining algorithm for deriving fuzzy temporal association rules. It first transforms each quantitative value into a fuzzy set using the given membership functions. Meanwhile, item lifespans are collected and recorded in a temporal information table through a transformation process. The algorithm then calculates the scalar cardinality of each linguistic term of each item. A mining process based on fuzzy counts and item lifespans is then performed to find fuzzy temporal association rules. Finally, experiments on two simulation datasets and the foodmart dataset show the effectiveness and the efficiency of the proposed approach.
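
The fuzzification and scalar-cardinality steps mentioned in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the triangular membership functions, the term names `low`/`middle`/`high` and their breakpoints are assumptions made for the example, and item lifespans are not modeled.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical membership functions for the purchased quantity of one item.
TERMS = {
    "low": (0, 2, 6),
    "middle": (2, 6, 10),
    "high": (6, 10, 14),
}


def fuzzify(quantity):
    """Map one quantitative value to a fuzzy set: {linguistic term: degree}."""
    return {term: triangular(quantity, *abc) for term, abc in TERMS.items()}


def scalar_cardinality(quantities):
    """Sum the membership degrees of each linguistic term over all transactions."""
    totals = {term: 0.0 for term in TERMS}
    for q in quantities:
        for term, degree in fuzzify(q).items():
            totals[term] += degree
    return totals


# Quantities of the same item in four transactions (made-up data).
quantities = [2, 4, 6, 8]
counts = scalar_cardinality(quantities)
print(counts)
```

A fuzzy support for a term could then be obtained by dividing its count by the number of transactions considered; how the lifespan restricts that denominator is specific to the paper and is left out here.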

9.
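Using the MIS-tree structure to store information about frequent patterns, the CFP-tax algorithm is proposed based on the frequent-pattern-growth mining paradigm. The algorithm avoids candidate set generation and costly database scans, and efficiently finds all frequent itemsets in the database. Its performance was evaluated on synthetic datasets, and the results show that CFP-tax performs significantly better than the classical MMS-Cumulate algorithm.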

10.
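Zhou Ming (周明), Li Hong (李宏). Computer Engineering (《计算机工程》), 2007, 33(2): 74-76
Traditional frequent itemset mining algorithms are inefficient and generate many redundant patterns when processing dense or long datasets such as gene expression datasets. To solve these problems, researchers have proposed the concept of closed patterns and algorithms for mining them; studies have shown that mining closed patterns significantly reduces the number of itemsets and eliminates many redundant patterns. Targeting the characteristics of biological data, this paper proposes a novel algorithm for mining frequent closed patterns, REMFOR. Building on the concept of closed patterns and the idea of row enumeration, the algorithm uses a vertical data structure and the fp-tree technique, constructing a row fp-tree over row sets to mine frequent closed patterns. Examples and experiments show that the algorithm is correct and effective.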

11.
One of the major challenges in data mining is the extraction of comprehensible knowledge from recorded data. In this paper, a coevolutionary-based classification technique, namely COevolutionary Rule Extractor (CORE), is proposed to discover classification rules in data mining. Unlike existing approaches where candidate rules and rule sets are evolved at different stages in the classification process, the proposed CORE coevolves rules and rule sets concurrently in two cooperative populations to confine the search space and to produce good rule sets that are comprehensive. The proposed coevolutionary classification technique is extensively validated on seven datasets obtained from the University of California, Irvine (UCI) machine learning repository, which are representative artificial and real-world data from various domains. Comparison results show that the proposed CORE produces comprehensive and good classification rules for most datasets, which are competitive with existing classifiers in the literature. Simulation results obtained from box plots also show that CORE is relatively robust and invariant to random partitioning of datasets.

12.
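Although the FP-Growth algorithm can mine frequent patterns from a database effectively, efficiently generating association rules from the frequent patterns it finds remains a fairly complex problem. This paper proposes the threaded frequent pattern tree (TFPT) for organizing frequent patterns, and an efficient algorithm for mining association rules from the TFPT, the shortest-pattern-first (SPF) algorithm. When mining the association rules of a pattern Y, SPF applies two optimization strategies that avoid checking the many subsets of Y that cannot be the left-hand side X of a rule X ⇒ Y − X, and thus achieves good performance. Experiments show that, when combined with FP-Growth-like algorithms, SPF runs far faster than the Apriori algorithm and scales well.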
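
For context, the problem this abstract addresses (deriving rules X ⇒ Y − X from an already-mined frequent pattern Y) can be illustrated with a naive baseline that checks every non-empty proper subset of Y. This is not the SPF algorithm and does not use a TFPT; it is the exhaustive enumeration whose cost such optimizations aim to reduce. The function name `rules_from_pattern` and the support table are hypothetical.

```python
from itertools import combinations


def rules_from_pattern(pattern, supports, min_conf):
    """Naively enumerate rules X => Y - X for a frequent pattern Y.

    `supports` maps frozensets to their support; every subset of the
    pattern is assumed to be present in it (assumption of this sketch).
    """
    pattern = frozenset(pattern)
    rules = []
    for size in range(1, len(pattern)):
        for lhs in combinations(sorted(pattern), size):
            lhs = frozenset(lhs)
            # confidence(X => Y - X) = support(Y) / support(X)
            conf = supports[pattern] / supports[lhs]
            if conf >= min_conf:
                rules.append((lhs, pattern - lhs, conf))
    return rules


# Made-up supports for the pattern {a, b} and its subsets.
supports = {
    frozenset("a"): 0.8,
    frozenset("b"): 0.5,
    frozenset("ab"): 0.4,
}
rules = rules_from_pattern("ab", supports, min_conf=0.6)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

With these numbers only b ⇒ a passes the 0.6 confidence threshold, since a ⇒ b has confidence 0.4 / 0.8 = 0.5.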

13.
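A fast algorithm for mining association rules based on multidimensional scaling   Total citations: 13 (self-citations: 0, citations by others: 13)
Mining association rules is an important aspect of data mining research. Based on an analysis of its basic model and a study of the basic properties of multidimensional scaling, this paper proposes a new association rule mining algorithm based on multidimensional scaling. Using an association measure between data items, the algorithm projects each data item into a multidimensional space, performs dimensionality reduction, and finally presents the associations among itemsets to the user in visual form.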

14.
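Traditional Apriori-like frequent sequential pattern mining algorithms are all based on the support framework and require a support threshold to be set in advance, which usually demands deep domain knowledge or extensive practice; no good method for setting it exists so far. In addition, sequential pattern mining often produces a large number of results that are hard to understand and of limited usability. To address these problems, a logic-based frequent sequential pattern mining algorithm, LFSPM, is proposed, which introduces logic into frequent sequential pattern mining for the first time. Filtering with logical rules greatly refines the result set. Experiments show that the algorithm handles both the support-setting problem and the poor comprehensibility of mining results well.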

15.
A genetic-fuzzy mining approach for items with multiple minimum supports   Total citations: 2 (self-citations: 2, citations by others: 0)
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Mining association rules from transaction data is among the most common mining techniques. Most of the previous mining approaches set a single minimum support threshold for all the items and identify the relationships among transactions using binary values. In the past, we proposed a genetic-fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions under a single minimum support. In real applications, different items may have different criteria to judge their importance. In this paper, we thus propose an algorithm which combines clustering, fuzzy and genetic concepts for extracting reasonable multiple minimum support values, membership functions and fuzzy association rules from quantitative transactions. It first uses the k-means clustering approach to gather similar items into groups. All items in the same cluster are considered to have similar characteristics and are assigned similar values for initializing a better population. Each chromosome is then evaluated by the criteria of requirement satisfaction and suitability of membership functions to estimate its fitness value. Experimental results also show the effectiveness and the efficiency of the proposed approach.

16.
A core issue of the association rule extraction process in data mining is finding the frequent patterns in a database of operational transactions. If these patterns are discovered, decision making and strategy setting in organizations can be accomplished with greater precision. A frequent pattern is a pattern seen in a significant number of transactions. Because such data are unbounded and produced at high speed, they cannot be stored in memory, so techniques are needed that process them online and find recurring patterns. Several mining methods have been proposed in the literature that attempt to efficiently extract a complete or a closed set of different types of frequent patterns from a dataset. In this paper, a method based on Cellular Learning Automata (CLA) is presented for mining frequent itemsets. The proposed method is compared with the Apriori, FP-Growth and BitTable methods, and it is concluded that frequent itemset mining can be achieved in less running time. The experiments are conducted on several data sets with different values of minsup for each of the algorithms as well as the presented method. The results point to the effectiveness of the proposed method.

17.
Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed in this context in order to extract this information in the most efficient way. However, efficiency is not our only concern in this study. The security and privacy issues over the extracted knowledge must be seriously considered as well. By taking this into consideration, we study the procedure of hiding sensitive association rules in binary data sets by blocking some data values and we present an algorithm for solving this problem. We also provide a fuzzification of the support and the confidence of an association rule in order to accommodate for the existence of blocked/unknown values. In addition, we quantitatively compare the proposed algorithm with other already published algorithms by running experiments on binary data sets, and we also qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We utilize the notion of border rules, by putting weights in each rule, and we use effective data structures for the representation of the rules so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we study the overall security of the modified database, using the C4.5 decision tree algorithm of the WEKA data mining tool, and we discuss the advantages and the limitations of blocking.

18.
The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence), with the purpose of giving the user indicators for understanding and using the newly discovered knowledge. Low-quality datasets have a very bad impact on the quality of the discovered association rules, and one might legitimately wonder if a so-called “interesting” rule noted LHS → RHS is meaningful when 30% of the LHS data are not up-to-date anymore, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well-known for its bad credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed for improving the quality awareness of the knowledge discovery and data mining processes. We propose to integrate data quality indicators for quality-aware association rule mining. We propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality have a great impact on the cost and quality of discovered association rules and confirm our approach for the integrated management of data quality indicators in the KDD process, which ensures the quality of data mining results.
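
The two interestingness measures named in this abstract, support and confidence of a rule LHS → RHS, have standard definitions that can be computed as in the sketch below. The transactions are made-up data, and the data quality indicators and cost-based model that the paper proposes are not represented here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """support(LHS and RHS together) / support(LHS); 0.0 if LHS never occurs."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Made-up market-basket transactions.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here bread → milk holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions containing bread (confidence 2/3). Neither number says anything about whether the underlying records are accurate or current, which is the gap the abstract points to.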

19.
Time series analysis has always been an important and interesting research field because time series appear frequently in different applications. In the past, many approaches based on regression, neural networks and other mathematical models were proposed to analyze time series. In this paper, we attempt to use data mining techniques to analyze time series. Many previous studies on data mining have focused on handling binary-valued data. Time series data, however, are usually quantitative values. We thus extend our previous fuzzy mining approach for handling time-series data to find linguistic association rules. The proposed approach first uses a sliding window to generate continuous subsequences from a given time series and then analyzes the fuzzy itemsets from these subsequences. Appropriate post-processing is then performed to remove redundant patterns. Experiments are also conducted to show the performance of the proposed mining algorithm. Since the final results are represented by linguistic rules, they will be friendlier to humans than a quantitative representation.
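
The first step described in this abstract, generating contiguous subsequences with a sliding window, might look like the following sketch. The step size of one and the example series are assumptions; the abstract does not state them, and the subsequent fuzzy itemset analysis is not shown.

```python
def sliding_windows(series, window_size):
    """Return every contiguous subsequence of length `window_size`, stepping by one."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        series[i:i + window_size]
        for i in range(len(series) - window_size + 1)
    ]


# Made-up time series of quantitative values.
series = [3, 5, 8, 6, 2]
subsequences = sliding_windows(series, 3)
print(subsequences)
```

Each subsequence can then be treated like one transaction whose positions play the role of items, so that the values at each position can be fuzzified and mined.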

20.
Association rule mining is one of the data mining techniques used to discover information that represents associations among data. Some data in a database appear infrequently but are highly associated with specific data. This paper proposes a technique for mining significant rare data by introducing a second support for discovering the association rules of such data. We show that the proposed approach provides better performance than standard association rule techniques.
