Similar Documents
20 similar documents found; search took 22 ms
1.
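Horizontally Weighted Association Rule Mining Based on the Apriori Algorithm   Total citations: 19, self-citations: 2, citations by others: 19
Association rule mining can discover interesting associations or correlations among itemsets in large amounts of data, and it has been widely applied in many fields. Many algorithms for discovering association rules have been proposed, but they all assume that every data item is equally important to a rule. In practice, however, users tend to favor the items they find most interesting or consider most important, so it is necessary to strengthen the influence of these items on the rules while weakening the influence of items that users care little about or consider unimportant. To this end, this paper formulates the problem of horizontally weighted association rules and, by improving the Apriori algorithm, presents a solution to the problem together with an efficient algorithm, New_Apriori.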
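The abstract does not define New_Apriori's weighting, so the sketch below uses one common convention as an assumption: the weighted support of an itemset X is the mean weight of its items times the fraction of transactions containing X. Because that quantity is not downward-closed, the sketch runs plain level-wise Apriori on ordinary support first and only then filters by weighted support. The function names and the `weights` mapping are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic level-wise Apriori; returns {frozenset: support fraction}."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    current = [frozenset([i]) for i in {i for t in tx for i in t}]
    frequent, k = {}, 1
    while current:
        counts = {c: sum(1 for t in tx if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def weighted_frequent_itemsets(transactions, weights, min_support, min_wsupport):
    """Hypothetical weighting: wsupport(X) = mean item weight of X * support(X)."""
    result = {}
    for itemset, supp in apriori(transactions, min_support).items():
        wsupp = sum(weights[i] for i in itemset) / len(itemset) * supp
        if wsupp >= min_wsupport:
            result[itemset] = wsupp
    return result
```

For example, with weights `{"milk": 0.9, "bread": 0.5, "beer": 0.2}`, a low-weight item such as `beer` can be frequent by count yet still be dropped by the weighted threshold.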

2.
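A New Weighted Association Rule Model   Total citations: 5, self-citations: 3, citations by others: 5
Association rule mining can discover hidden relationships among itemsets in large amounts of data and has been widely applied in many fields. Many association rule mining algorithms have been proposed, and they generally treat every data item as equally important. In reality, however, items usually differ in importance: decision makers tend to prioritize high-profit items and overlook low-profit ones. This paper analyzes problems in the existing literature on weighted association rules, proposes a new weighted association rule model, and presents MWFI, an efficient algorithm for mining weighted frequent itemsets.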

3.
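Yu Xue, Zhang Haonan. Application Research of Computers (计算机应用研究), 2020, 37(4): 977-981, 985
Social recommendation based on matrix factorization incorporates user trust relationships to improve learning accuracy, but it ignores how association information among items affects user interests during factorization. To address this, we first improve the item-similarity computation by incorporating user participation, and then build a matrix factorization recommendation model constrained by two regularization terms: an item-association regularizer and a trusted-user regularizer. This lets the optimization of the latent feature matrices reflect the important influence of item associations on recommendation. Experiments on two datasets with different sparsity levels show that, compared with mainstream matrix factorization models, the proposed dual-regularized model improves rating-prediction accuracy on sparse datasets and noticeably alleviates the user cold-start problem.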
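A minimal stochastic-gradient sketch of a dual-regularized factorization, assuming a loss of the form: squared rating error + L2 penalty + alpha · Σ‖p_u − p_v‖² over trusted users v + beta · Σ sim(i, j)‖q_i − q_j‖² over associated items j. This is an illustrative form, not the paper's exact objective; the trust lists, the precomputed `item_sim` lists (which the paper derives from a participation-aware similarity), and all hyperparameters are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_dual_reg_mf(ratings, n_users, n_items, trust, item_sim,
                      k=4, lr=0.02, lam=0.01, alpha=0.05, beta=0.05,
                      epochs=200, seed=0):
    """ratings: [(user, item, rating)]; trust: {u: [v, ...]};
    item_sim: {i: [(j, similarity), ...]} (hypothetical precomputed associations)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                # Trust regularizer: pull p_u toward the factors of trusted users.
                trust_grad = sum(P[u][f] - P[v][f] for v in trust.get(u, []))
                # Item-association regularizer: pull q_i toward similar items' factors.
                assoc_grad = sum(s * (Q[i][f] - Q[j][f]) for j, s in item_sim.get(i, []))
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - lam * pu - alpha * trust_grad)
                Q[i][f] += lr * (err * pu - lam * qi - beta * assoc_grad)
    return P, Q


def rmse(ratings, P, Q):
    return (sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
```

Setting `alpha` or `beta` to zero drops the corresponding constraint, which is a convenient way to compare the single- and dual-regularized variants on the same data.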

4.
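Existing sequential recommendation methods based on the multi-interest framework learn a user's multiple interest representations only from the user's recent interaction sequence and ignore the association information among items in the dataset. To address this, we propose IAMIRec (item associations aware multi-interest sequential recommendation method), a multi-interest sequential recommendation method enhanced with item associations. It first computes item association sets and the corresponding item association matrix from the users' interaction sequences in the dataset, then models each user's recent interaction sequence with multi-head self-attention guided by the item association matrix, and finally uses the multi-interest framework to learn multiple interest vectors for the user and make top-N recommendations. The method was tested and analyzed on three datasets, and IAMIRec outperformed related methods on recall, NDCG (normalized discounted cumulative gain), and hit rate. The results show that IAMIRec achieves better recommendation performance and that introducing item association information effectively enhances users' multi-interest representations.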
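The abstract does not specify how the item association matrix is computed. As a hedged illustration of the first step only, the sketch below assumes associations come from co-occurrence within a small window of each interaction sequence, row-normalized so each item's associations sum to 1. The window size and the normalization are assumptions; the attention and multi-interest stages are not shown.

```python
from collections import defaultdict


def item_association_matrix(sequences, window=2):
    """Sparse association matrix {a: {b: weight}} from windowed co-occurrence."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for idx, a in enumerate(seq):
            # Count each later item within `window` steps as co-occurring with `a`.
            for b in seq[idx + 1: idx + 1 + window]:
                if a != b:
                    counts[a][b] += 1
                    counts[b][a] += 1
    # Row-normalize so each item's association weights sum to 1.
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix
```

An item's association set can then be read off as the keys of its row, for example the top few by weight.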

5.
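Cross-Level Association Rule Mining Based on a Multidimensional Data Model   Total citations: 3, self-citations: 0, citations by others: 3
Multilevel association rules are association rules defined over a concept hierarchy. They describe associations among data items at different levels of abstraction, and associations at different levels carry different guidance value. However, most of the multilevel association rules discussed so far are limited to associations among items at the same level of abstraction. To address this, this paper extends and improves the existing FP-Tree algorithm and implements a multilevel association mining algorithm that can mine associations among items both within the same level and across different levels of abstraction: the cross-level association rule mining algorithm FP-Tree*. In addition, before running the algorithm, the paper takes the characteristics of multilevel association mining into account to improve the existing data storage structure, proposing to encode transaction items as character sequences, which greatly simplifies data preprocessing.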
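To show why character-sequence encoding simplifies preprocessing, here is a small sketch assuming one character per hierarchy level (for example, `"121"` is a level-3 item whose ancestors are `"1"` and `"12"`). Generalizing an item is then just taking prefixes, and cross-level pairs can be counted directly. This illustrates the encoding idea only; it is not the FP-Tree* algorithm, and the encoding scheme itself is an assumption.

```python
from collections import Counter
from itertools import combinations


def ancestors(code):
    """All prefixes of a hierarchical code, e.g. '121' -> ['1', '12', '121']."""
    return [code[:k] for k in range(1, len(code) + 1)]


def cross_level_pair_support(transactions):
    """Support of every item pair, where items may come from any level."""
    counts = Counter()
    for t in transactions:
        expanded = set()
        for code in t:
            expanded.update(ancestors(code))
        for a, b in combinations(sorted(expanded), 2):
            # An item always co-occurs with its own ancestors, so skip those pairs.
            if not (b.startswith(a) or a.startswith(b)):
                counts[(a, b)] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}
```

Pairs such as `("11", "2")` mix levels 2 and 1, which is exactly what single-level mining would miss.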

6.
Given a user-specified minimum correlation threshold θ and a market-basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs with correlations above the threshold θ. However, when the numbers of items and transactions are large, the computation cost of this query can be very high. The goal of this paper is to provide computationally efficient algorithms to answer the all-strong-pairs correlation query. Specifically, we identify an upper bound of Pearson's correlation coefficient for binary variables. This upper bound is not only much cheaper to compute than Pearson's correlation coefficient, but also exhibits special monotone properties that allow many item pairs to be pruned without even computing their upper bounds. A two-step all-strong-pairs correlation query (TAPER) algorithm is proposed to exploit these properties in a filter-and-refine manner. Furthermore, we provide an algebraic cost model showing that, in data sets with Zipf-like or linear rank-support distributions, the computational savings from pruning are independent of, or improve with, the number of items. Experimental results on synthetic and real-world data sets exhibit similar trends and show that the TAPER algorithm can be an order of magnitude faster than brute-force alternatives. Finally, we demonstrate that the algorithmic ideas developed in the TAPER algorithm can be extended to efficiently compute negative correlation and the uncentered Pearson's correlation coefficient.
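A filter-and-refine sketch in the spirit of TAPER. For binary items with supp(A) ≥ supp(B), the φ (Pearson) coefficient is at most sqrt(supp(B)/supp(A)) · sqrt((1 − supp(A))/(1 − supp(B))), obtained by setting supp(A, B) to its maximum, supp(B). For a fixed A this bound shrinks as supp(B) decreases, so scanning partners in descending support order lets the inner loop stop at the first pair whose bound falls below θ. This is a simplified illustration rather than the paper's full algorithm; items with support 0 or 1 are dropped because φ is undefined for them.

```python
from itertools import combinations  # noqa: F401  (kept for brute-force comparison)
from math import sqrt


def supports(transactions):
    n = len(transactions)
    counts = {}
    for t in transactions:
        for i in set(t):
            counts[i] = counts.get(i, 0) + 1
    return {i: c / n for i, c in counts.items()}


def phi(transactions, a, b, s):
    """Exact phi coefficient of two binary items (the refine step)."""
    sab = sum(1 for t in transactions if a in t and b in t) / len(transactions)
    sa, sb = s[a], s[b]
    return (sab - sa * sb) / sqrt(sa * sb * (1 - sa) * (1 - sb))


def phi_upper(sa, sb):
    """Cheap upper bound of phi, assuming sa >= sb (the filter step)."""
    return sqrt(sb / sa) * sqrt((1 - sa) / (1 - sb))


def taper(transactions, theta):
    s = supports(transactions)
    items = sorted((i for i in s if 0 < s[i] < 1), key=lambda i: (-s[i], i))
    result = {}
    for x, a in enumerate(items):
        for b in items[x + 1:]:          # s[a] >= s[b] by the ordering
            if phi_upper(s[a], s[b]) < theta:
                break                    # bound only decreases further down the list
            corr = phi(transactions, a, b, s)
            if corr >= theta:
                result[(a, b)] = corr
    return result
```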

7.
In item promotion applications, there is a strong need for tools that can help to unlock the hidden profit within each individual customer’s transaction history. Discovering association patterns based on the data mining technique is helpful for this purpose. However, the conventional association mining approach, while generating “strong” association rules, cannot detect potential profit-building opportunities that can be exposed by “soft” association rules, which recommend items with looser but significant enough associations. This paper proposes a novel mining method that automatically detects hidden profit-building opportunities through discovering soft associations among items from historical transactions. Specifically, this paper proposes a relaxation method of association mining with a new support measurement, called soft support, that can be used for mining soft association patterns expressed with the “most” fuzzy quantifier. In addition, a novel measure for validating the soft-associated rules is proposed based on the estimated possibility of a conditioned quantified fuzzy event. The new measure is shown to be effective by comparison with several existing measures. A new association mining algorithm based on modification of the FT-Tree algorithm is proposed to accommodate this new support measure. Finally, the mining algorithm is applied to several data sets to investigate its effectiveness in finding soft patterns and content recommendation.

8.
Recent advances in educational technologies and the wide-spread use of computers in schools have fueled innovations in test construction and analysis. As the measurement accuracy of a test depends on the quality of the items it includes, item selection procedures play a central role in this process. Mathematical programming and the item response theory (IRT) are often used in automating this task. However, when the item bank is very large, the number of item combinations increases exponentially and item selection becomes more tedious. To alleviate the computational complexity, researchers have previously applied heuristic search and machine learning approaches, including neural networks, to solve similar problems. This paper proposes a novel approach that uses abductive network modeling to automatically identify the most-informative subset of test items that can be used to effectively assess the examinees without seriously degrading accuracy. Abductive machine learning automatically selects only effective model inputs and builds an optimal network model of polynomial functional nodes that minimizes a predicted squared error criterion. Using a training dataset of 1500 cases (examinees) and 45 test items, the proposed approach automatically selected only 12 items which classified an evaluation population of 500 cases with 91% accuracy. Performance is examined for various levels of model complexity and compared with that of statistical IRT-based techniques. Results indicate that the proposed approach significantly reduces the number of test items required while maintaining acceptable test quality.

9.
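Component-based, service-oriented software systems consist of loosely coupled, heterogeneous service components, each containing a large number of highly flexible configuration items. Complex dependencies among service components make their configuration items interrelated, so deploying, updating, or migrating a system is error-prone. For interrelated configuration items, changing one item requires changing the items associated with it; otherwise constraints are violated and the system fails. Analyzing configuration-item correlations is therefore essential to system reliability, but it requires cross-product domain knowledge. This paper proposes a service-consistent configuration method based on association mining. The method crawls sample configuration files to narrow the search to frequently changed configuration items, generates correlation coefficients for pairs of configuration items from the similarity of their names, values, and types, applies defined filtering rules to determine a set of candidate correlated pairs, and outputs a ranked list of configuration-item correlations for lookup. Typical application systems were deployed to evaluate the method, and the experimental results show that it detects configuration-item correlations accurately.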
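A hedged sketch of the scoring and filtering steps: each configuration-item pair gets a weighted combination of name similarity (here `difflib.SequenceMatcher.ratio`), value equality, and inferred-type equality; pairs below a threshold are filtered out and the rest are ranked. The weights, the type inference, and the single threshold rule are illustrative assumptions, not the paper's actual coefficients or filtering rules.

```python
from difflib import SequenceMatcher
from itertools import combinations


def infer_type(value):
    """Very rough type inference for a configuration value string."""
    for cast, name in ((int, "int"), (float, "float")):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "bool" if value.lower() in ("true", "false") else "str"


def pair_score(a, b, w_name=0.5, w_value=0.3, w_type=0.2):
    (name_a, val_a), (name_b, val_b) = a, b
    name_sim = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    value_sim = 1.0 if val_a == val_b else 0.0
    type_sim = 1.0 if infer_type(val_a) == infer_type(val_b) else 0.0
    return w_name * name_sim + w_value * value_sim + w_type * type_sim


def ranked_correlations(config, threshold=0.6):
    """config: {item name: value string}; returns [((name_a, name_b), score)]."""
    scored = [((a[0], b[0]), pair_score(a, b))
              for a, b in combinations(sorted(config.items()), 2)]
    candidates = [p for p in scored if p[1] >= threshold]  # hypothetical filtering rule
    return sorted(candidates, key=lambda p: p[1], reverse=True)
```

For instance, `db.port` and `app.db_port` with the same value score high, which flags them as items that probably must change together.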

10.
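An Improved Apriori Algorithm for Mining Association Rules   Total citations: 2, self-citations: 0, citations by others: 2
Association rule mining can discover interesting relationships among itemsets in large amounts of data and has been widely applied in many fields. Traditional association rule mining, however, rarely considers how important each data item is: these algorithms treat every item as equally important to a rule, so the mined results are often unsatisfactory. To mine more valuable rules, this paper proposes a weighted association rule algorithm that measures each item's importance by its frequency and profit, and then improves the classic Apriori algorithm accordingly. Finally, the improved algorithm is validated on an example; the results show that it is reasonable and effective and can mine more valuable information.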

11.
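An Efficient Method for Mining Multilevel and Generalized Association Rules   Total citations: 4, self-citations: 1, citations by others: 3
Mao Yuxing, Chen Tongbing, Shi Bole. Journal of Software (软件学报), 2011, 22(12): 2965-2980
Based on an in-depth study of categorical data, this paper proposes an efficient method for mining multilevel association rules. First, a domain knowledge-based item correlation model, DICM (domain knowledge-based item correlation model), is built from the domain knowledge of the categorical data, and the items are hierarchically clustered using this model. Next, the transaction database is reduced and partitioned according to the item clusters. Finally, the reduced and partitioned database is mapped into a compressed AFOPT tree structure, and frequent itemsets are mined by traversing the AFOPT tree instead of the original transaction database. Because the transaction database is smaller and the AFOPT structure is compressed, the method saves considerable I/O time and greatly improves the efficiency of multilevel association rule mining. Based on this method, the paper presents a top-down multilevel association rule mining algorithm, TD-CBP-MLARM, and a bottom-up one, BU-CBP-MLARM. The method is also extended to generalized association rule mining, yielding an efficient generalized association rule mining algorithm, CBP-GARM. Extensive experiments on randomly generated synthetic data show that the proposed multilevel and generalized association rule mining algorithms guarantee correct and complete frequent-itemset results and achieve better mining efficiency and scalability than the latest comparable algorithms.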

12.

Privacy preservation in distributed databases is an active area of research. With the advancement of technology, massive amounts of data are continuously being collected and stored in distributed database applications. Temporal associations and correlations among items in the large transactional datasets of a distributed database can help in many business decision-making processes. One such process is mining frequent itemsets and computing their association rules, which is a nontrivial task. In a typical situation, multiple parties may wish to collaborate to extract interesting global information, such as frequent associations, without revealing their respective data to each other. This is particularly useful in applications such as retail market basket analysis, medical research, and academia. In the proposed work, we aim to find frequent items and to develop a global association rule model based on the genetic algorithm (GA). The GA is used because of inherent features such as its robustness with respect to local maxima/minima and its domain-independent nature as a large-space search technique for finding exact or approximate solutions to optimization and search problems. For privacy preservation of the data, the concept of a trusted third party with two offsets is used. The data are first anonymized at the local party's end, and then the aggregation and global association are performed by the trusted third party. The proposed algorithms address various types of partitions, such as horizontal, vertical, and arbitrary.

13.
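In distributed data streams, correlation analysis between streams can reveal intrinsic relationships among the monitored objects. This paper proposes a basic-window-based method for computing correlation coefficients. The method first rewrites the correlation-coefficient formula in terms of factors that are suitable for basic-window aggregation, and then aggregates each factor using basic windows. Basic-window aggregation divides the data items in a window into a series of basic windows and computes each basic window separately. When the window slides, the aggregation for the new window can partly reuse the results from the previous window. Simulation experiments show that, compared with aggregating all the data in the window each time, the basic-window method effectively reduces the time needed to compute correlation coefficients of data streams.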
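One natural way to rewrite the Pearson coefficient into aggregable factors is r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)): the six sums n, Σx, Σy, Σx², Σy², Σxy can be computed per basic window and added or subtracted as basic windows enter or leave. The sketch below assumes a count-based sliding window made of `num_basic` basic windows of `basic_size` points; the paper's exact factorization may differ, and the class name is hypothetical.

```python
from collections import deque
from math import sqrt


class BasicWindowCorrelation:
    """Sliding-window Pearson correlation of two streams, aggregated per basic window."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size
        self.num_basic = num_basic
        self.windows = deque()        # factor sums of each basic window in the sliding window
        self.totals = [0.0] * 6       # n, Σx, Σy, Σx², Σy², Σxy over the sliding window
        self.buffer = []              # points of the basic window currently being filled

    def push(self, x, y):
        self.buffer.append((x, y))
        if len(self.buffer) < self.basic_size:
            return
        b = self.buffer
        sums = [float(len(b)),
                sum(p[0] for p in b), sum(p[1] for p in b),
                sum(p[0] ** 2 for p in b), sum(p[1] ** 2 for p in b),
                sum(p[0] * p[1] for p in b)]
        self.buffer = []
        self.windows.append(sums)
        self.totals = [t + s for t, s in zip(self.totals, sums)]
        if len(self.windows) > self.num_basic:
            # Reuse previous results: subtract only the expired basic window.
            old = self.windows.popleft()
            self.totals = [t - s for t, s in zip(self.totals, old)]

    def correlation(self):
        n, sx, sy, sxx, syy, sxy = self.totals
        denom = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
        return (n * sxy - sx * sy) / denom if denom else float("nan")
```

Each slide costs work proportional to one basic window instead of the whole window, which is the saving the abstract describes.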

14.
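To increase the difference in question ordering between candidates' papers in centrally organized online exams and to prevent neighboring candidates from copying from each other, this paper defines question position similarity and, based on it, proposes and implements a model that generates, from a single exam paper, multiple candidate papers (transformed papers) with different question orders. Compared with generating candidate papers by randomly permuting the questions, the results show that, when questions are grouped by type and difficulty, the model produces significantly greater differences in question order between neighboring candidates than random permutation does.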
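The abstract does not give the formula for position similarity, so the sketch below assumes the simplest version: the fraction of questions that appear at the same position in two papers. Questions are shuffled only within their (type, difficulty) groups, and for each new paper the least similar of several random candidates relative to the previously issued (neighboring) paper is kept. The greedy selection, the `tries` parameter, and all names are hypothetical.

```python
import random


def position_similarity(order_a, order_b):
    """Hypothetical measure: fraction of questions placed at the same position."""
    same = sum(1 for qa, qb in zip(order_a, order_b) if qa == qb)
    return same / len(order_a)


def shuffle_within_groups(groups, rng):
    """Shuffle questions inside each (type, difficulty) group; keep group order."""
    order = []
    for group in groups:
        block = list(group)
        rng.shuffle(block)
        order.extend(block)
    return order


def variant_papers(groups, count, tries=50, seed=0):
    rng = random.Random(seed)
    papers = [[q for g in groups for q in g]]          # the original paper
    while len(papers) < count:
        candidates = [shuffle_within_groups(groups, rng) for _ in range(tries)]
        # Keep the candidate least similar to the neighboring (previous) paper.
        best = min(candidates, key=lambda c: position_similarity(c, papers[-1]))
        papers.append(best)
    return papers
```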

15.
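Data mining is the process of discovering potentially useful knowledge or interesting patterns in databases. Research in data mining has mainly focused on association rule mining under a single minimum support, discovering associations among items in transaction databases. In real applications, however, items can have different minimum supports, because different items may be judged important by different criteria. This paper therefore proposes an algorithm for mining useful fuzzy association rules under a maximum-support constraint. Under this constraint, frequent itemsets are found by a level-wise iterative search. An example shows that the algorithm is easy to understand, meaningful, and efficient.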
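A sketch under two stated assumptions: fuzzy support is the mean, over rows, of the minimum membership of an itemset's fuzzy items, and the "maximum constraint" means an itemset's minimum support is the largest minimum support among its items. Under that constraint, any frequent itemset's subsets are also frequent (their support is no lower and their threshold no higher), which is what makes a level-wise search valid. Item names like `"A.high"` and the function names are hypothetical.

```python
from itertools import combinations


def fuzzy_support(itemset, rows):
    """Mean over rows of the minimum membership degree of the itemset's fuzzy items."""
    return sum(min(row.get(i, 0.0) for i in itemset) for row in rows) / len(rows)


def mine_max_constraint(rows, min_sup):
    """Level-wise search where minsup(X) = max(min_sup[i] for i in X)."""
    level = [frozenset([i]) for i in min_sup if fuzzy_support([i], rows) >= min_sup[i]]
    frequent, k = {}, 1
    while level:
        for x in level:
            frequent[x] = fuzzy_support(x, rows)
        k += 1
        cands = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = [c for c in cands
                 # Downward closure holds under the maximum constraint, so prune first.
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and fuzzy_support(c, rows) >= max(min_sup[i] for i in c)]
    return frequent
```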

16.
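For simple datasets, association rules can reveal the relationships among data. In many cases, however, the relationships hidden in the data cannot be obtained from association rules, and indirect association analysis is needed to determine how frequently itemsets occur in the transaction set and to mine the hidden relationships. Using concept hierarchies and a neighbor-based approach, this paper takes the frequent itemsets generated by an FP-tree, checks whether candidate associations satisfy the item-pair support condition, and uses these frequent itemsets to mine indirect associations among transactions. It identifies the underlying patterns of indirect transaction associations and constructs an indirect association mining algorithm that does not depend on the mediator condition.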
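For reference, the sketch below shows the classic mediator-based notion of indirect association that this work relaxes: items a and b rarely co-occur (their pair support is below `pair_max`), yet each co-occurs often with a common mediator m (both pair supports reach `mediator_min`). It uses single-item mediators and omits any dependence test, so it is a simplified baseline, not the paper's mediator-independent algorithm; the thresholds are hypothetical names.

```python
from itertools import combinations


def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def indirect_associations(transactions, pair_max, mediator_min):
    """Return (a, b, mediator) triples satisfying the classic indirect-association test."""
    tx = [frozenset(t) for t in transactions]
    items = sorted({i for t in tx for i in t})
    found = []
    for a, b in combinations(items, 2):
        # Item-pair support condition: a and b must rarely appear together.
        if support(frozenset({a, b}), tx) >= pair_max:
            continue
        for m in items:
            if m in (a, b):
                continue
            if (support(frozenset({a, m}), tx) >= mediator_min
                    and support(frozenset({b, m}), tx) >= mediator_min):
                found.append((a, b, m))
    return found
```

With baskets where `coke` and `pepsi` never appear together but both often appear with `chips`, the pair is reported with `chips` as its mediator.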

17.
Modern production planning and inventory control have been developed to handle more practical and more complicated circumstances, such as studying supply chains instead of a single stock point, or multiple correlated items instead of a single item. In this paper, we discuss how to classify inventory items that are correlated with each other by using the concept of the ‘cross-selling effect’. Historically, ABC classification has been used to aggregate inventory items because the number of items is so large that it is computationally infeasible to set stock and service control guidelines for each individual item. A fundamental principle of ABC classification is to rank all inventory items by a notion of profit based on historical transactions. The difficulty is that the profit of one item comes not only from its own sales but also from its influence on the sales of other items, and vice versa, i.e., the ‘cross-selling effect’. We previously developed a classification approach for inventory items that uses association rules to deal with the ‘cross-selling effect’, and found that it can yield a very different classification from traditional ABC classification. However, the ‘cross-selling effect’ may be considered in different ways. In this paper, a new approach to inventory classification based on loss rules is presented. The lost profit of an item/itemset with the ‘cross-selling effect’ is discussed and defined as the criterion for evaluating an item's importance; based on it, new algorithms for classifying inventory items and for discovering the maximum-profit item selection are presented. A simple example is used to explain the new algorithm, and a large number of empirical experiments, on both a real database collected from a Japanese convenience store and a downloaded benchmark database, are conducted to evaluate effectiveness and utility.
The results show that the proposed approach provides good insight into the cross-selling effect among items and is applicable to large transaction databases.
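For contrast with the loss-rule criterion, here is a minimal sketch of the traditional ABC baseline the abstract describes: rank items by profit and assign classes by cumulative profit share. The 80% and 95% cut points and the "share before this item" convention are common choices assumed here; the paper's lost-profit criterion with cross-selling effects is not reproduced.

```python
def abc_classify(profits, a_cut=0.8, b_cut=0.95):
    """Traditional ABC: rank by profit, classify by cumulative share before each item."""
    total = sum(profits.values())
    classes, cumulative = {}, 0.0
    for item in sorted(profits, key=profits.get, reverse=True):
        share_before = cumulative / total
        if share_before < a_cut:
            classes[item] = "A"
        elif share_before < b_cut:
            classes[item] = "B"
        else:
            classes[item] = "C"
        cumulative += profits[item]
    return classes
```

Because each item is ranked only by its own profit, an item with low direct profit that drives sales of others can land in class C, which is exactly the weakness the cross-selling view addresses.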

18.
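Web Information Extraction Based on Hidden Markov Models   Total citations: 1, self-citations: 1, citations by others: 0
Liu Yaqing, Chen Rong. Computer Engineering (计算机工程), 2009, 35(18): 25-27
To address the "missing item" and "out-of-order item" problems in Web information extraction, this paper proposes a Web information extraction method based on hidden Markov models (HMMs). A Web document is parsed into an extended DOM tree; the information items to be extracted are mapped to states, and their paths in the extended DOM tree are mapped to the vocabulary (observation symbols); an inductive algorithm is then used to construct the HMM. Experimental results show that the method achieves better extraction performance.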
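Once such an HMM is built, labeling a page reduces to finding the most likely state sequence for its sequence of DOM paths, which is standard Viterbi decoding. The sketch below works in log space and uses a small probability floor for unseen paths; the states, paths, and probabilities in the usage check are made up for illustration, and the paper's inductive construction of the model is not shown.

```python
from math import log


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-9):
    """Most likely state (information-item) sequence for a list of DOM-path observations."""
    def lp(p):
        return log(max(p, floor))  # floor avoids log(0) for unseen paths or transitions

    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (lp(start_p[s]) + lp(emit_p[s].get(observations[0], 0.0)), [s])
            for s in states}
    for obs in observations[1:]:
        nxt = {}
        for s in states:
            score, path = max((best[prev][0] + lp(trans_p[prev][s]), best[prev][1])
                              for prev in states)
            nxt[s] = (score + lp(emit_p[s].get(obs, 0.0)), path + [s])
        best = nxt
    return max(best.values())[1]
```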

19.
Association mining is a well-explored topic applied in various fields. In this article, associations among genes are identified from microarray gene expression data. A methodology called Fuzzy Correlated Association Mining (FCAM) is developed to identify associations among genes whose expression patterns change significantly from the normal state to the diseased state. This idea helps predict disease-mediating genes along with their altered associations. The proposed methodology involves generating fuzzy gene sets, constructing fuzzy items, computing the fuzzy support of fuzzy items and the fuzzy correlation coefficient of a pair of fuzzy items, generating associations, and identifying associations altered from the normal to the diseased state. The novel contributions of this article are the concept of finding the fuzzy correlation between two groups of items, generating altered associations among items (groups of items), and ranking these items (groups of items) by importance. The effectiveness of the methodology is demonstrated on five gene expression data sets dealing with human lung cancer, colon cancer, sarcoma, breast cancer, and leukemia. As a result, several genes, such as IGFBP3, ERBB2, TP53, HBB, KRAS, PTEN, CALCA, and CDKN2A, have been identified as important genes that may mediate the development of the cancers considered here. For comparison, we considered 11 existing association rule mining algorithms. The results are validated in terms of gene–gene interactions, functional enrichment, biochemical pathways, and the NCBI database.

20.
One fundamental problem for visualizing frequent itemsets and association rules is how to present a long border of frequent itemsets in an itemset lattice. Another problem comes from the lack of an effective visual metaphor to represent many-to-many relationships. This work proposes an approach for visualizing frequent itemsets and many-to-many association rules by a novel use of parallel coordinates. An association rule is visualized by connecting items in the rule, one item on each parallel coordinate, with continuous polynomial curves. In the presence of item taxonomy, each coordinate can be used to visualize an item taxonomy tree which can be expanded or shrunk by user interaction. This user interaction introduces a border, which separates displayable itemsets from nondisplayable ones, in the generalized itemset lattice. Only those itemsets that are both frequent and displayable are considered to be displayed. This approach of visualizing frequent itemsets and association rules has the following features: 1) It is capable of visualizing many-to-many rules and itemsets with many items. 2) It is capable of visualizing a large number of itemsets or rules by displaying only those ones whose items are selected by the user. 3) The closure properties of frequent itemsets and association rules are inherently supported such that the implied ones are not displayed. Usefulness of this approach is demonstrated through examples.
