Similar literature
 Found 20 similar articles; search took 15 ms
1.
Mining sequential patterns means discovering the sequential purchasing behaviors of most customers from a large number of customer transactions. Past transaction data can be analyzed to discover customer purchasing behaviors so that the quality of business decisions can be improved. However, the transaction database can be very large. Finding all the sequential patterns in a large database is very time consuming, and users may be interested in only some of them. Moreover, different users may apply different criteria to the patterns they want, so traditional mining methods can generate many patterns that are of no interest to a given user. Hence, a data mining language is needed that lets users query only the knowledge of interest to them from a large database of customer transactions. In this article, such a data mining language is presented. With it, users can specify the items of interest and the criteria of the sequential patterns to be discovered. An efficient data mining technique is also proposed to extract the sequential patterns according to the users' requests. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 73–87, 2005.

2.
In this paper, we examine a new data mining issue: mining association rules from customer databases and transaction databases. The problem is decomposed into two subproblems: identifying all the large itemsets from the transaction database, and mining association rules from the customer database and the large itemsets identified. For the first subproblem, we propose an efficient algorithm to discover all the large itemsets from the transaction database. Experimental results show that our approach reduces the total execution time significantly. For the second subproblem, a relationship graph is constructed according to the large itemsets identified from the transaction database and the priorities of condition attributes from the customer database. Based on the relationship graph, we present an efficient graph-based algorithm to discover interesting association rules embedded in the transaction database and the customer database.

3.
Discovery of fuzzy temporal association rules   Cited by: 1 (self-citations: 0, citations by others: 1)
We propose a data mining system for discovering interesting temporal patterns from large databases. The mined patterns are expressed as fuzzy temporal association rules that satisfy the temporal requirements specified by the user. Temporal requirements specified by human beings tend to be ill-defined or uncertain. To deal with this kind of uncertainty, a fuzzy calendar algebra is developed that allows users to describe the desired temporal requirements as fuzzy calendars easily and naturally. Fuzzy operations are provided, and users can define complicated fuzzy calendars to discover knowledge in the time intervals that are of interest to them. A border-based mining algorithm is proposed to find association rules incrementally. By keeping useful information about the database in a border, candidate itemsets can be computed efficiently. The discovered knowledge can also be updated efficiently when transactions are added or deleted. The kept information saves counting work, and unnecessary scans over the updated database can be avoided. Simulation results show the effectiveness of the proposed system. A performance comparison with other systems is also given.

4.
《Information Systems》2001,26(1):1-14
In this paper, we examine the two issues of mining association rules and mining sequential patterns in a large database of sales transactions. The problems of mining association rules and mining sequential patterns focus on discovering large itemsets and large sequences, respectively. We present PSI and PSI_seq for efficient generation of large itemsets and large sequences, respectively. The main idea of these two algorithms is to use prestored information to minimize the numbers of candidate itemsets and candidate sequences counted in each database scan. The prestored information for PSI and PSI_seq consists of the itemsets and the sequences, respectively, along with their support counts found in the last mining. Typically, a user may need to tune the value of the minimum support many times before a set of useful association rules can be obtained from the transaction database. Using prestored information reduces the total computation time effectively. Empirical results show that our approaches outperform previous methods by an order of magnitude, using little storage space for the prestored information.

5.
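Web-based data mining is a popular research topic that combines data mining with Internet systems. This paper first surveys several classes of Web-based data mining techniques, including Web content mining, Web access mining, Web page clustering, and the discovery of users' frequent access paths. It then focuses on concrete applications of Web data mining techniques in e-commerce.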

6.
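Research on Web-based data mining techniques and their application in e-commerce   Cited by: 1 (self-citations: 0, citations by others: 1)
Web-based data mining is a popular research topic that combines data mining with Internet systems. This paper first surveys several classes of Web-based data mining techniques, including Web content mining, Web access mining, Web page clustering, and the discovery of users' frequent access paths. It then focuses on concrete applications of Web data mining techniques in e-commerce.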

7.
Mining sequential patterns means discovering the sequential purchasing behaviours of most customers from a large number of customer transactions. The strategy of mining sequential patterns focuses on discovering frequent sequences. A frequent sequence is an ordered list of itemsets purchased by a sufficient number of customers. Previous approaches for mining sequential patterns need to scan the database repeatedly, so they take a large amount of computation time to find frequent sequences. Customer transactions grow rapidly, and some of them may become outdated. Consequently, the frequent sequences may change when new customer transactions are inserted into the database or old ones are deleted from it, which may require rediscovering all the patterns by scanning the entire updated customer transaction database. In this paper, we propose an incremental updating technique to maintain the discovered sequential patterns when transactions are inserted into or deleted from the database. Our approach partitions the database into segments and scans the database segment by segment. After each segment scan, our approach prunes those sequences that can no longer be frequent, which accelerates the search for frequent sequences. Therefore, our approach significantly reduces the number of database scans. The experimental results show that our algorithms are more efficient than other algorithms for maintaining sequential patterns.
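
The definition of a frequent sequence given in this abstract can be illustrated with a minimal sketch. It is not the incremental algorithm of the paper; the function names `contains` and `support` and the toy database `db` are assumptions made for illustration only. A customer supports a pattern when each itemset of the pattern is covered by a distinct transaction of that customer, in order.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(customer: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if `pattern` occurs, in order, in the customer's transactions."""
    pos = 0
    for itemset in pattern:
        # Advance to the earliest remaining transaction that covers this itemset.
        while pos < len(customer) and not itemset <= customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1  # the next pattern itemset must match a strictly later transaction
    return True


def support(db: List[Sequence[Itemset]], pattern: Sequence[Itemset]) -> float:
    """Fraction of customers whose transaction sequence contains `pattern`."""
    return sum(contains(customer, pattern) for customer in db) / len(db)


# Hypothetical database: one transaction sequence per customer.
db = [
    [frozenset({"a"}), frozenset({"b", "c"})],
    [frozenset({"a", "d"}), frozenset({"c"})],
    [frozenset({"c"}), frozenset({"a"})],
]
pattern = [frozenset({"a"}), frozenset({"c"})]
```

Under these assumptions, `support(db, pattern)` is the share of customers who bought `a` and later `c`; a pattern is frequent when that share reaches a user-chosen minimum support.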

8.
Mining association rules is an important task for knowledge discovery. We can analyze past transaction data to discover customer behaviors so that the quality of business decisions can be improved. Various types of association rules may exist in a large database of customer transactions. The strategy of mining association rules focuses on discovering large item sets, which are groups of items that appear together in a sufficient number of transactions. We propose a graph-based approach to generate various types of association rules from a large database of customer transactions. This approach scans the database once to construct an association graph and then traverses the graph to generate all large item sets. Empirical evaluations show that our algorithms outperform other algorithms that need to make multiple passes over the database.

9.
Many researchers in the database and machine learning fields are interested in data mining because it offers opportunities to discover useful information and important patterns in large databases. Most previous studies have shown how binary-valued transaction data may be handled. Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data mining algorithm able to deal with various types of data presents a challenge to workers in this research field. In the past, we proposed a fuzzy data mining algorithm to find association rules. Since sequential patterns are also very important for real-world applications, this paper focuses on finding fuzzy sequential patterns from quantitative data. A new mining algorithm is proposed, which integrates fuzzy-set concepts and the AprioriAll algorithm. It first transforms quantitative values in transactions into linguistic terms, then filters them to find sequential patterns by modifying the AprioriAll mining algorithm. Each quantitative item uses only the linguistic term with the maximum cardinality in later mining processes, so the number of fuzzy regions to be processed is the same as the number of original items. The mined patterns thus exhibit the sequential quantitative regularity in databases and can be used to provide suggestions to the appropriate supervisors.
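
The transformation step described in this abstract can be sketched roughly as follows. The triangular membership functions, the three regions in `REGIONS`, and the function names are assumptions made for illustration; the abstract does not give the membership functions the authors use. The sketch maps quantities of one item to linguistic terms and keeps only the term with the largest cardinality, taken here as the sum of membership values.

```python
def triangular(x, left, peak, right):
    """Triangular membership function (an assumed shape, not from the paper)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy regions for the purchased quantity of one item.
REGIONS = {"low": (0, 1, 6), "middle": (1, 6, 11), "high": (6, 11, 16)}


def fuzzify(quantity):
    """Map one quantitative value to a membership degree per linguistic term."""
    return {term: triangular(quantity, *abc) for term, abc in REGIONS.items()}


def best_term(quantities):
    """Return the term with the maximum cardinality and all cardinalities."""
    cardinality = {term: 0.0 for term in REGIONS}
    for quantity in quantities:
        for term, degree in fuzzify(quantity).items():
            cardinality[term] += degree
    return max(cardinality, key=cardinality.get), cardinality


# Hypothetical quantities of the same item across four transactions.
term, cardinality = best_term([2, 5, 7, 8])
```

Keeping a single term per item is what holds the number of fuzzy regions equal to the number of original items, as the abstract states.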

10.
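An efficient algorithm for mining and updating association rules based on an itemset knowledge base   Cited by: 2 (self-citations: 2, citations by others: 2)
Through an in-depth analysis of existing algorithms for mining and updating association rules, this paper points out the problems and shortcomings they share and proposes a method for mining and updating association rules based on an itemset knowledge base. The method handles both the case where the data in database D are unchanged while the user-specified minimum support and minimum confidence thresholds change, and the case where the data in transaction database D change. When the data in D are unchanged, a single scan of the database suffices to build the itemset knowledge base KBD, after which the minimum support and minimum confidence can be adjusted repeatedly to mine and update association rules. When the data in D change, only the data sets d and d- need to be scanned, once each; the frequent itemsets and association rules are then updated by updating the itemset knowledge base KBD.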
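
The idea of answering threshold changes from stored counts can be sketched as below. This is a minimal illustration and not the paper's data structure: the function names, the `max_size` cap on itemset length, and the toy transactions are all assumptions. One scan stores a count for every itemset, later threshold changes only read the stored counts, and data changes are applied from the added and deleted transactions alone.

```python
from collections import Counter
from itertools import combinations


def build_knowledge_base(transactions, max_size=3):
    """Scan the transactions once, counting every itemset of up to `max_size` items."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, k))
    return counts


def frequent_itemsets(kb, n_transactions, min_support):
    """Read frequent itemsets from the stored counts; no database scan is needed."""
    return {s: c for s, c in kb.items() if c / n_transactions >= min_support}


def apply_delta(kb, added=(), deleted=(), max_size=3):
    """Return updated counts computed from the added and deleted transactions only."""
    updated = kb + build_knowledge_base(added, max_size)
    updated.subtract(build_knowledge_base(deleted, max_size))
    return +updated  # unary plus drops itemsets whose count fell to zero


# Hypothetical transaction database.
D = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
kb = build_knowledge_base(D)
loose = frequent_itemsets(kb, len(D), 0.5)    # threshold changed, no rescan
strict = frequent_itemsets(kb, len(D), 0.75)  # threshold changed again, no rescan
kb2 = apply_delta(kb, added=[["b", "c"]], deleted=[["b"]])
```

Storing every itemset grows exponentially with transaction length, which is why this sketch caps itemset size; how the paper bounds the knowledge base is not stated in the abstract.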

11.
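A new FP-Tree-based incremental updating algorithm for association rules   Cited by: 2 (self-citations: 0, citations by others: 2)
Mining association rules is an important aspect of data mining research. Many algorithms have been proposed for efficiently discovering association rules in large databases, and maintaining the rules already discovered is equally important. This paper studies how to update association rules when transactions are added to the database and the minimum support changes at the same time, proposes a new incremental updating algorithm for association rules based on the frequent pattern tree, and analyzes and discusses the algorithm.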

12.
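An effective incremental updating algorithm for association rules   Cited by: 6 (self-citations: 2, citations by others: 6)
Association rules are an important research topic in data mining. Many algorithms have been proposed for efficiently discovering association rules in large databases, but the problem of updating and maintaining discovered rules has received less attention. This paper proposes an incremental updating algorithm for association rules based on the frequent pattern tree, which handles the update of association rules after a new set of transactions is added to the transaction database, and analyzes its performance.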

13.
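A practical incremental updating algorithm for association rules   Cited by: 2 (self-citations: 0, citations by others: 2)
薛锦  陈原斌 《计算机工程与应用》 (Computer Engineering and Applications), 2003, 39(13): 212-213, 217
Association rules are an important research topic in data mining. Many algorithms have been proposed for efficiently discovering association rules in large databases, but the problem of updating and maintaining discovered rules has received less attention. This paper proposes a practical incremental updating algorithm for association rules, which handles the update of association rules after a new set of transactions is added to the transaction database, and analyzes its performance.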

14.
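Traditional data mining methods generate large numbers of patterns and rules that are hard to understand, while users are actually interested in only a small portion of them. To address this problem, a sequential pattern mining method with item constraints is proposed on the basis of the PrefixSpan algorithm for mining sequential patterns; the item constraints reduce the search space. Experimental results show that the method can effectively mine sequential patterns that satisfy the item constraints.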

15.
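A study of the interestingness factors that affect association rule mining   Cited by: 7 (self-citations: 2, citations by others: 7)
Association rule mining is an important aspect of data mining research, and one important problem within it is evaluating how interesting the mined rules are. In practice, a large number of rules can be mined from a data source, but most of them are not necessarily of interest to the user. The interestingness of association rules can be evaluated from both an objective and a subjective perspective. Templates are used to separate the rules that interest the user from those that do not, which provides the subjective evaluation; for the objective evaluation, constraints are added on top of the confidence and support of the rules.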
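
The two kinds of evaluation mentioned in this abstract can be sketched under simple assumptions. The support and confidence formulas are the standard definitions; the template here is reduced to a pair of allowed item sets, which is an assumption made for illustration and not the template language of the paper. All names and data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def matches_template(rule, allowed_antecedent, allowed_consequent):
    """Subjective filter: keep a rule only if it uses items the user cares about."""
    antecedent, consequent = rule
    return (set(antecedent) <= set(allowed_antecedent)
            and set(consequent) <= set(allowed_consequent))


# Hypothetical transactions and rule.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule = ({"bread"}, {"milk"})
rule_support = support(transactions, rule[0] | rule[1])
rule_confidence = confidence(transactions, *rule)
interesting = matches_template(rule, {"bread", "butter"}, {"milk"})
```

An objective constraint in this setting would be a further numeric condition applied alongside the support and confidence thresholds; the abstract does not say which measure the authors add.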

16.
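The variety of data collection methods now available makes it easy to acquire and store large amounts of data. Extracting useful knowledge and information from such large and heterogeneous data is the main task of data mining, and association rules are an important branch of the field. This paper addresses the problem of updating and maintaining association rules after a new data set is added to the transaction database and proposes an incremental updating algorithm for association rules.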

17.
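A method for updating sequential pattern mining based on prestored information   Cited by: 2 (self-citations: 0, citations by others: 2)
When mining sequential patterns, the user needs to adjust (raise or lower) the minimum support many times before interesting sequential patterns can be obtained from the transaction database. This paper presents the PSI-seq algorithm, which uses prestored information to generate large sequences efficiently. It significantly reduces the computation of candidate sequences in each database scan and thereby improves mining efficiency.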

18.
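The variety of data collection methods now available makes it easy to acquire and store large amounts of data. Extracting useful knowledge and information from such large and heterogeneous data is the main task of data mining, and association rules are an important branch of the field. This paper addresses the problem of updating and maintaining association rules after a new data set is added to the transaction database and proposes an incremental updating algorithm for association rules.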

19.
In the present scenario of the global economy and the World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining tasks such as mining association rules and correlations. The FP-tree is a versatile data structure used for mining frequent patterns. It is a compact representation of a transaction database that contains the frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of the database at a time and adding it to the FP-tree of the initial (original) database. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database for incremental mining. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees of consecutive durations to obtain an FP-tree for the FP-Growth algorithm. BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from the database. BIT_FPGrowth takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the runtime of BIT_FPGrowth grows much more slowly than that of the other algorithms and is the lowest among them.

20.
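A new generalized association rule mining algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
A novel generalized association rule mining algorithm, GARL, is proposed. The algorithm scans the sequence of database transactions continuously and generates all frequent itemsets after at most two scans. During the first scan of the database, it can give the user feedback and allows the user to adjust the minimum support. The algorithm can process transaction sequences continuously and can be used for online data mining over the Internet.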
