首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 125 毫秒
1.
随着现实待挖掘数据库规模不断增长,系统可使用的内存成为用FP-GROWTH算法进行关联规则挖掘的瓶颈.为了摆脱内存的束缚,对大规模数据库中的数据进行关联规则挖掘,基于磁盘的关联规则挖掘成为重要的研究方向.对此,改进原始的FP-TREE数据结构,提出了一种新颖的基于磁盘表的DTRFP-GROWTH(disk table resident FP-TREE growth)算法.该算法利用磁盘表存储FP-TREE,降低内存使用,在传统FP-GROWTH算法占用过多内存、挖掘工作无法进行时,以独特的磁盘表存储FP-TREE技术,减少内存使用,能够继续完成挖掘工作,适合空间性能优先的场合.不仅如此,该算法还将关联规则挖掘和关系型数据库整合,克服了基于文件系统相关算法效率较低、开发难度较大等问题.在真实数据集上进行了验证实验以及性能分析.实验结果表明,在内存空间有限的情况下,DTRFP-GROWTH算法是一种有效的基于磁盘的关联规则挖掘算法.  相似文献   

2.
Commercial recommender systems use various data mining techniques to make appropriate recommendations to users during online, real-time sessions. Published algorithms focus more on the discrete user ratings instead of binary results, which hampers their predictive capabilities when usage data is sparse. The system proposed in this paper, e-VZpro, is an association mining-based recommender tool designed to overcome these problems through a two-phase approach. In the first phase, batches of customer historical data are analyzed through association mining in order to determine the association rules for the second phase. During the second phase, a scoring algorithm is used to rank the recommendations online for the customer. The second phase differs from the traditional approach and an empirical comparison between the methods used in e-VZpro and other collaborative filtering methods including dependency networks, item-based, and association mining is provided in this paper. This comparison evaluates the algorithms used in each of the above methods using two internal customer datasets and a benchmark dataset. The results of this comparison clearly show that e-VZpro performs well compared to dependency networks and association mining. In general, item-based algorithms with cosine similarity measures have the best performance.  相似文献   

3.
一种关联规则增量式挖掘算法研究   总被引:1,自引:0,他引:1  
刘造新 《计算机时代》2012,(3):20-21,24
现有关联规则更新算法都是基于支持度-置信度框架而提出的,仅针对大于最小支持度闭值的频繁项集进行挖掘。为了提高告警关联规则的完整性和准确性,在相关度AARSC算法基础上,提出了一种增量式挖掘UAARSC算法(Updating-AARSC)。该算法对增量计算进行了改进,可以发现频繁和非频繁告警序列间的关联规则。  相似文献   

4.
空间关联规则的双向挖掘   总被引:9,自引:0,他引:9  
空间数据库中关联规则挖掘不仅需要考虑关系元组属性之间的关系——纵向关系,更需要挖掘元组之间的关系——横向关系,如相邻、相交、重叠等。本文通过分析空间数据库的存储模式,借鉴事务数据库关联规则的挖掘方法,对空间关联规则进行完整定义,并对规则的兴趣度度量进行探讨。根据挖掘的方向将空间数据挖掘归纳为纵向挖掘、横向挖掘、双向挖掘。在双向挖掘中,提出一种新算法,该算法根据挖掘任务进行约束,缩小挖掘空间,然后通过空间计算将空间关系转化为非空间关系,经过多次循环,获取非空间项集,进而挖掘出空间关联规则。据此提出空间数据双向挖掘工作流程,并通过实例进行了验证。  相似文献   

5.
王雪蓉  万年红 《计算机应用》2011,31(9):2421-2425
传统的协同过滤推荐算法基于互联网模式单纯从某个角度研究电子商务推荐问题,推荐质量明显不高。为改善推荐效果,提高推荐系统的伸缩性和实用价值,基于研究云模式的用户行为相似性度量公式、用户行为等级函数、关联规则函数,定义关联聚类方法,改进相应算法,提出一种云模式用户行为关联聚类的协同过滤推荐算法。最后使用MovieLens和阿里巴巴的云测试数据进行局部实验与全局实验,并对各种算法的实验结果进行对比分析。实验结果表明,该算法推荐效果明显优于传统算法,具有较强的伸缩性和较高的实用价值。  相似文献   

6.
为了挖掘可疑通信的行为模式,定位发生了可疑通信行为的上网账户,本文首先分析了可疑通信行为特点。然后针对已有关联规则挖掘算法不能同时满足多层次数据挖掘和加权关联规则挖掘的问题,分析对比两种典型的基本关联规则算法,以FP-tree为基础,提出了ML-WFP多层次加权关联规则挖掘算法。针对算法中数据项权重的确定问题,由用户设置数据项间的重要性比较关系,借鉴模糊一致矩阵的概念,利用模糊层次分析法计算数据项的权重。最后将该算法应用于可疑通信行为的挖掘。实验测试结果表明可疑通信行为挖掘方案合理有效。  相似文献   

7.
针对现有关联分类技术的不足,提出了一种适用于关联分类的增量更新算法IUAC。该算法是基于频繁模式树挖掘和更新关联规则的,并使用一种树形结构来存储最终用于分类的关联规则。同时,增加了对分类规则的约束条件,进一步控制了用于分类的关联规则的数量。最后,对算法整体进行了分析和讨论。  相似文献   

8.
在数据挖掘的关联规则挖掘算法中,传统的频繁模式挖掘算法需要用户指定项集的最小支持度。引入Top-k模式挖掘概念的改进算法虽然无需指定最小支持度,但仍需指定阈值k。针对上述问题,对传统挖掘算法进行改进,提出一种新的频繁模式挖掘算法(TNFP- growth)。该算法无需指定最小支持度或阈值,按照支持度降序排列进行模式挖掘,有序地返回频繁模式给用户。实验结果证明,该算法的执行效率更高,具有更强的伸缩性。  相似文献   

9.
Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe a same phenomenon. This can have many undesirable effects such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, (3) and rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three different ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets, which have different characteristics and represent three kinds of data. In many cases, results show that CMRules is faster and has a better scalability for low support thresholds than CMDeo. Lastly, we report a successful application of the algorithm in a tutoring agent.  相似文献   

10.
基于GEP的多层关联规则挖掘算法及其应用   总被引:1,自引:1,他引:0  
为了在Web使用挖掘中挖掘网站服务器日志数据库的热点Web页面访问集及发现其关联规则,提出了一种新的基于GEP(gene expression programming,基因表达式编程)的适用于挖掘多层关联规则的算法.将泛化技术应用于GEP作为它的适应性函数度量,引入GEP强大的自搜索功能,进化到较优的种群后,再利用传统的支持度一置信度的方法在子数据库的多个层及层间挖掘频繁项及关联规则.该算法改进了传统多层关联规则挖掘框架,实验结果表明了该算法在大数据库中的有效性和高效性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号