Similar literature
 20 similar documents found (search time: 250 ms)
1.
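Maradbcm: an association rule mining algorithm based on the double-base cooperating mechanism   Total citations: 9, self-citations: 1, citations by others: 9
Association rules are an important kind of pattern in data mining, and Apriori is the typical algorithm for mining them. Apriori has certain drawbacks, however: it searches the database globally, and applying a support threshold when generating large itemsets can delete meaningful rules. Maradbcm is a new association rule mining algorithm proposed on the basis of research into the inner mechanism of KDD, and it can overcome these drawbacks. After briefly describing the double-base cooperating mechanism and the Maradbcm algorithm, the paper applies the algorithm to the mushroom database, and the results show that it is effective. This demonstrates the important role and influence of inner-mechanism research on the mainstream development of KDD, and offers a completely new path for research on knowledge discovery systems as a whole.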

2.
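An automatic video classification method integrating data mining   Total citations: 1, self-citations: 0, citations by others: 1
To address the low prediction accuracy of automatic video classification, an automatic video classification method that integrates data mining techniques is proposed. First, the video is segmented to build a video attribute database. Then decision trees and classification association rules are each used to mine the video attribute database, extracting a set of decision tree classification rules and a set of classification association rules. Finally, a rule-set merging and pruning algorithm merges these two sets of classification prediction rules into a final video classification rule set with higher accuracy. Experiments verify that decision tree classification rules and classification association rules are consistent in their classification predictions. They also show that the merged rule set predicts video classes more accurately than either rule set used alone.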

3.
Association rule mining is one of the data mining techniques used to discover information that represents associations among data. Some data appear infrequently in a database yet are highly associated with specific other data. This paper proposes a technique for mining such significant rare data by introducing a second support when discovering association rules. We show that the proposed approach performs better than standard association rule techniques.

4.
In an electronic commerce (EC) environment, effectively mining useful transaction information is an important issue in designing marketing strategy for most enterprises. In particular, the relationships between different databases (e.g., the transaction and online browsing databases) may hold unknown, potential business intelligence knowledge. This study addresses two important issues in mining association rules for EC applications. The first is the discovery of generalized fuzzy association rules in the transaction database. The second is the discovery of association rules from the web usage data and the large itemsets identified in the transaction database. A cluster-based fuzzy association rules (CBFAR) mining architecture is proposed to address these two issues simultaneously. The study makes three contributions: (a) an efficient fuzzy association rule miner based on cluster-based fuzzy-sets tables is presented to identify all the large fuzzy itemsets; (b) this approach requires less contrast to generate large itemsets; (c) a fuzzy rule mining approach is used to compute the confidence values for discovering the relationships between the transaction database and the browsing information database. Finally, a simulated example in an EC environment is provided to demonstrate the rationality and feasibility of the proposed approach.

5.
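Zeng Qinghua, Wang Wenguo. Microcomputer Development (《微机发展》), 2007, 17(7): 236-239
The discovery of association rules is an important problem in data mining, but it handles only discrete data. To solve the "sharp boundary" problem that arises when partitioning continuous quantitative attributes, fuzzy partitioning is adopted so that data transition smoothly between intervals. Because intrusion detection systems (IDS) place only modest demands on training data, the paper proposes a new algorithm that uses a hash linked list to improve fuzzy association rule mining. During mining it uses equivalence classes to find frequent itemsets quickly, avoiding repeated database scans and a large amount of repeated computation and checking. A worked example on an intrusion detection system shows the algorithm's advantages in improving the ability to identify intrusion data.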
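The fuzzy partitioning idea in this abstract can be illustrated with a small sketch. The abstract does not say which membership functions the authors use, so the triangular shape, the function names `triangular` and `fuzzy_support`, and the breakpoints below are all assumptions chosen for illustration. The point is only that a value near an interval edge belongs partly to each neighbouring fuzzy set instead of flipping abruptly from one crisp interval to the next.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a and c and peak b.

    Hypothetical helper: the abstract does not specify the membership shape.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_support(values, a, b, c):
    """Fuzzy support of one fuzzy set: the mean membership over all records."""
    return sum(triangular(v, a, b, c) for v in values) / len(values)


# A value of 2.5 is "half in" the set peaking at 5, rather than in or out.
half = triangular(2.5, 0, 5, 10)
support = fuzzy_support([2.5, 5, 7.5, 10], 0, 5, 10)
```

Summing memberships instead of counting crisp hits is one common way to define fuzzy support; other definitions exist, and the paper may use a different one.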

6.
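Based on an analysis of the characteristics of data mining in a scientific data grid environment, a scientific data mining grid service framework is proposed. The scientific data mining grid service provides, in the form of a grid service, a data mining solution for the scientific data grid environment. Compared with traditional data mining systems, it has many advantages and is better suited to scientific data grids and scientific database environments. It has already been put into practical use in several databases. It offers not only simple query and retrieval functions but also statistical data analysis and knowledge discovery, further raising the level of scientific data grid services.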

7.
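Application of an association rule mining method to customer analysis   Total citations: 1, self-citations: 0, citations by others: 1
Data mining is a flourishing research frontier in database systems and database applications. Apriori, one of the algorithms for association rule mining, is among the most influential algorithms for mining the frequent itemsets of Boolean association rules. This paper discusses the implementation details of the Apriori algorithm and how it is applied in the telecommunications industry, and, by analyzing real data, offers suggestions for increasing telecom business volume.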
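For readers unfamiliar with Apriori, the following is a minimal sketch of the textbook level-wise procedure (join, prune, count). It is not the implementation discussed in the paper, whose details the abstract does not give. The function name `apriori` and the use of an absolute support count as `min_support` are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count here (an assumption for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one more pass over the data for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_support=3)
```

Each level needs a full pass over the transactions, which is the repeated-scan cost that several of the other abstracts on this page set out to reduce.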

8.
Association rule mining is one of the most popular data analysis methods for discovering associations within data. Association rule mining algorithms have been applied to various datasets because of their practical usefulness. Little attention has been paid, however, to how association mining techniques can be applied to questionnaire data. Therefore, this paper first identifies the various data types that may appear in a questionnaire. Then, we introduce the questionnaire data mining problem and define the rule patterns that can be mined from questionnaire data. A unified approach is developed based on fuzzy techniques so that all the different data types can be handled in a uniform manner. After that, an algorithm is developed to discover fuzzy association rules from the questionnaire dataset. Finally, we evaluate the performance of the proposed algorithm, and the results indicate that our method is capable of finding interesting association rules that would never have been found by previous mining algorithms.

9.
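Among association rule mining algorithms, Apriori generates many candidate sets because it scans the database repeatedly. Repeated scans easily incur I/O overhead and lead to low mining efficiency. Matrix-based association rule algorithms do not delete infrequent itemsets during mining, so many scans are wasted and the gain in mining efficiency is not significant. This paper proposes an improved association rule mining algorithm based on a matrix and a sorted index. It first deletes unneeded transactions and items, obtains the frequent 2-itemsets through matrix multiplication and a lookup table, and then uses the sorted index to obtain the remaining frequent k-itemsets. Compared with the matrix-based association rule algorithm and Apriori, the proposed algorithm can look up frequent itemsets directly while scanning the database, and it is feasible and efficient when many frequent itemsets are produced or when the database must be updated dynamically. Experiments show that the proposed matrix sorted-index algorithm reduces memory usage and I/O overhead, improves mining efficiency, and scales well.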
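The matrix multiplication step mentioned above can be sketched as follows. If M is a 0/1 transaction-by-item matrix, entry (i, j) of the product of M's transpose with M counts the transactions that contain both item i and item j, so the frequent 2-itemsets can be read off the product directly. This is a generic illustration under that assumption. The paper's lookup table and sorted index are not described in the abstract and are not reproduced here, and the name `frequent_pairs` is hypothetical.

```python
def frequent_pairs(matrix, items, min_support):
    """Find frequent 2-itemsets from a 0/1 transaction-by-item matrix.

    matrix[t][i] is 1 if transaction t contains items[i], else 0.
    min_support is an absolute count (an assumption for this sketch).
    """
    n = len(items)
    # co[i][j] is entry (i, j) of M^T M: the number of transactions
    # that contain both items[i] and items[j].
    co = [
        [sum(row[i] * row[j] for row in matrix) for j in range(n)]
        for i in range(n)
    ]
    # Only the upper triangle is needed, since the product is symmetric.
    return {
        (items[i], items[j]): co[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if co[i][j] >= min_support
    }


m = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]
pairs = frequent_pairs(m, ["a", "b", "c"], min_support=2)
```

The diagonal of the product gives the single-item supports, so items below the threshold could be dropped before forming pairs, in the spirit of the pruning the abstract describes.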

10.
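Association rule information in a database is one form of knowledge representation. Mining negative association rules is an important research topic in mining association information from databases and has a wide range of applications. Existing mining methods cannot obtain all the negative association rules in a database. To extract all of them, the approach (1) scans the database to build a database frequent pattern tree, DFP-tree (Database Frequent Pattern tree); (2) obtains all minimal infrequent itemsets, ASI, from a pruned DFP-tree; (3) takes the upward closure of the maximal frequent itemsets in ASI to obtain all infrequent itemsets; and (4) on this basis extracts negative association rules, using correlation as one of the rule interestingness measures. Theory and experiments demonstrate the correctness and efficiency of the algorithm.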

11.
Existing parallel algorithms for association rule mining have a large inter-site communication cost or require a large amount of space to maintain the local support counts of a large number of candidate sets. This study proposes a de-clustering approach for distributed architectures, which eliminates the inter-site communication cost for most of the influential association rule mining algorithms. To de-cluster the database into similar partitions, an efficient algorithm is developed to approximate the shortest spanning path (SSP) to link transaction data together. The SSP obtained is then used to evenly de-cluster the transaction data into subgroups. The proposed approach guarantees that all subgroups are similar to each other and to the original group. Experimental results show that data size and the number of items are the only two factors that determine the performance of de-clustering. Additionally, based on the approach, most of the influential association rule mining algorithms can be implemented in a distributed architecture to obtain a drastic increase in speed without losing any frequent itemsets. Furthermore, the data distribution in each de-clustered participant is almost the same as that of a single site, which implies that the proposed approach can be regarded as a sampling method for distributed association rule mining. Finally, the experimental results show that the original inadequate mining results can be improved to an almost perfect level.

12.
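He Youquan. Computer Engineering (《计算机工程》), 2006, 32(15): 87-89
Existing data mining methods fall roughly into two classes: those that generate candidate itemsets and those that do not. Mining with candidate itemsets is represented by the Apriori algorithm, which generates a large number of candidate itemsets and scans the database repeatedly, so its mining efficiency is low and it is unsuitable for large databases. Mining without candidate itemsets is represented by the FP-T method, but it cannot mine association rules at multiple concept levels simultaneously, and for large databases with a very large number of item IDs it cannot build the "tree" structure, which also limits its use. This paper introduces the FP-T principle into the concurrent mining of multi-level association rules. By constructing a pointer table of special node links, concurrent, multi-level mining of very large databases can be achieved. The work offers important guidance for automating logistics system information and for other data mining applications.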

13.
Association rules form one of the most widely used techniques to discover correlations among attributes in a database. So far, some efficient methods have been proposed to obtain these rules with respect to an optimal goal, such as maximizing the number of large itemsets and interesting rules, or the values of support and confidence for the discovered rules. This paper first introduces optimized fuzzy association rule mining in terms of three important criteria: strongness, interestingness and comprehensibility. Then, it proposes multi-objective Genetic Algorithm (GA) based approaches for discovering these optimized rules. The optimization technique for a given criterion may take one of two forms. The first tries to determine the appropriate fuzzy sets of quantitative attributes in a prespecified rule, which is also called a certain rule. The second deals with finding both uncertain rules and their appropriate fuzzy sets. Experimental results on a real data set show the effectiveness and applicability of the proposed approach.

14.
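Algorithms for mining and updating rules in shared-nothing parallel transaction database systems   Total citations: 1, self-citations: 0, citations by others: 1
Association rules are an important research topic in data mining. This paper proposes SNPMAR, a fast association rule mining algorithm for shared-nothing parallel transaction database systems (SNPDBS for short). It also considers how to update association rules in an SNPDBS efficiently after the minimum support changes, and proposes SNPIUA, an effective association rule updating algorithm.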

15.
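A new dynamic, incremental mining algorithm for multiple frequent itemsets in large databases is proposed. Using the previous mining results and the detailed data of newly added item IDs, it can effectively mine frequent itemsets and the quantitative proportional relationships among item IDs, giving merchants and logistics systems information to guide them and helping them avoid wrong decisions. The work offers important guidance for automating logistics systems and for other data mining applications.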

16.
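An algorithm for mining generalized association rules containing negative items   Total citations: 3, self-citations: 0, citations by others: 3
Zhang Yufang, Peng Yan, Liu Jun, Chen Minghao. Computer Engineering and Design (《计算机工程与设计》), 2006, 27(20): 3904-3908, 3934
A traditional association rule is an implication of the form A→B that reflects associations among positive items. It cannot reflect negative associations hidden in the data. By introducing negative items into the expression, the traditional association rule is extended to a generalized association rule containing both positive and negative items. The paper introduces the concept of generalized association rules and proves the related properties and theorems. It proposes MGPNFP, a frequent-pattern-tree-based algorithm for mining generalized association rules that mix positive and negative items, analyzes its performance, and compares its advantages with those of existing algorithms for mining association rules with negative items.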
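As a short worked illustration of why negative items need no extra database pass, the support and confidence of a rule with a negated consequent follow from quantities that positive-rule mining already computes. The notation below is an assumption, since the abstract gives no formulas: supp(X) is the fraction of transactions containing every item of X, and ¬B marks transactions that do not contain B. These are standard identities, not the paper's own theorems.

```latex
\begin{align}
  \operatorname{supp}(A \cup \neg B) &= \operatorname{supp}(A) - \operatorname{supp}(A \cup B), \\
  \operatorname{conf}(A \Rightarrow \neg B)
    &= \frac{\operatorname{supp}(A) - \operatorname{supp}(A \cup B)}{\operatorname{supp}(A)}
     = 1 - \operatorname{conf}(A \Rightarrow B).
\end{align}
```

For example, if supp(A) = 0.5 and supp(A ∪ B) = 0.1, then supp(A ∪ ¬B) = 0.4 and conf(A ⇒ ¬B) = 0.8.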

17.
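Association rule mining often produces a large number of rules, which makes it very hard for users to analyze and use them. To help users with exploratory analysis, a distance-based correlation optimization method for association rules is proposed. Starting from a mathematical analysis of the values taken by the formula that defines association rule correlation, the method uses structural differences in correlation among rules to mine more potentially correlated rules, including both positive and negative association rules. Experimental results show that the method is effective and reliable.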

18.
In the field of data mining, an important issue for association rule generation is frequent itemset discovery, which is the key factor in implementing association rule mining. Therefore, this study considers the user's assigned constraints in the mining process. Constraint-based mining enables users to concentrate on mining itemsets that are of interest to them, which improves the efficiency of mining tasks. In addition, in the real world, users may prefer recording more than one attribute and setting multi-dimensional constraints. Thus, this study intends to solve the multi-dimensional constraints problem for association rule generation. The ant colony system (ACS) is one of the newest meta-heuristics for combinatorial optimization problems, and this study uses the ant colony system to mine a large database to find the association rules effectively. If this system can consider multi-dimensional constraints, the association rules will be generated more effectively. Therefore, this study proposes a novel approach of applying the ant colony system to extract association rules from the database, taking the multi-dimensional constraints into account. The results on a real case, the National Health Insurance Research Database, show that the proposed method is able to provide more condensed rules than the Apriori method. The computational time is also reduced.

19.
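Cui Jian, Li Qiang, Yang Longpo. Computer Science (《计算机科学》), 2011, 38(4): 216-220
To further address the high CPU time and frequent I/O operations incurred when mining association rules from large transaction databases, an improved association rule mining algorithm based on a vertical data layout, called VARMLDb, is presented. The algorithm first partitions the database into several partitions that fit in memory, then combines a directed acyclic graph with the vertical data form diffset (difference set) to store and compute frequent itemsets. This greatly reduces the memory needed to store intermediate results and solves the low efficiency of traditional vertical mining algorithms on dense databases, making the algorithm suitable for association rule mining in large dense databases. The algorithm borrows the strengths of the CARMA algorithm and needs only two database scans to complete the mining process. Experimental results show that the algorithm is correct and that VARMLDb is efficient on large dense databases.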
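The diffset idea referred to above can be sketched in a few lines. In a vertical layout, each itemset stores the transaction ids it is missing from, relative to its prefix, instead of the ids it occurs in. On dense data these difference sets tend to be much smaller than full id lists. The sketch follows the generic diffset recurrence. It does not reproduce VARMLDb's partitioning or its directed acyclic graph, which the abstract does not detail, and the names `item_diffsets` and `join` are hypothetical.

```python
def item_diffsets(transactions):
    """Return (diffsets, supports) for single items.

    The diffset of item x is the set of transaction ids that do NOT contain x.
    """
    all_tids = set(range(len(transactions)))
    items = sorted({i for t in transactions for i in t})
    diff = {
        i: all_tids - {tid for tid, t in enumerate(transactions) if i in t}
        for i in items
    }
    support = {i: len(all_tids) - len(diff[i]) for i in items}
    return diff, support


def join(diff_px, sup_px, diff_py):
    """Combine itemsets PX and PY (same prefix P) into PXY.

    d(PXY) = d(PY) - d(PX), and support(PXY) = support(PX) - |d(PXY)|.
    """
    d = diff_py - diff_px
    return d, sup_px - len(d)


db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
diff, sup = item_diffsets(db)
d_ab, sup_ab = join(diff["a"], sup["a"], diff["b"])
d_ac, sup_ac = join(diff["a"], sup["a"], diff["c"])
```

Supports are obtained by set difference and subtraction alone, without rescanning the transactions, which is what makes the representation attractive for dense databases.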

20.
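To address the "sharp boundary" problem in traditional data mining, the paper combines fuzzy theory with association rule mining. Building on an improved version of the traditional Apriori algorithm and on multi-level association rule mining methods, it proposes a fuzzy multi-level association rule mining algorithm. The basic concepts of fuzzy multi-level association rule mining are defined and the algorithm is described in detail. Finally, the algorithm is implemented in Visual FoxPro 6.0, and experiments mining a transaction database show that it is effective.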
