A Spark-based Frequent Patterns Mining Algorithm for Uncertain Datasets

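Cite this article:	YANG Yang, DING Jiaman, LI Haibin, JIA Lianyin, YOU Jinguo, JIANG Ying. A Spark-based Frequent Patterns Mining Algorithm for Uncertain Datasets[J]. Information and Control, 2019, 48(3): 257-264.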

Authors:	YANG Yang, DING Jiaman, LI Haibin, JIA Lianyin, YOU Jinguo, JIANG Ying

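Affiliation:	Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China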

Funding:	Supported by the National Natural Science Foundation of China (51467007, 61562054, 61462050)

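Abstract:	Improving the performance of frequent pattern mining on massive uncertain datasets is an active research topic. Most traditional algorithms use a single measure, such as expectation, probability, or weight, as the support of an itemset, and in a big-data setting the algorithms that consider both probability and weight support struggle to remain efficient. To address this, the paper proposes UWEFP, a Spark-based frequent pattern mining algorithm for uncertain datasets. First, to account for both the probability and the weight of each item, it computes the maximum probability-weight value of every 1-itemset and prunes on that value. Then, to reduce repeated scans of the dataset, it exploits the strengths of the Spark framework with a novel UWEFP-tree structure, which has the characteristics of an FP-tree and is used to build and mine the pattern tree. Finally, the algorithm is validated experimentally on UCI datasets in a Spark environment. The results show that the method improves efficiency while preserving the mining results.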
Keywords:	uncertain data; data mining; frequent patterns; Spark
Received:	2018-07-25
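
The abstract does not define the "maximum probability-weight value", so the following is only a minimal sketch of what a 1-itemset pruning step could look like under stated assumptions. It assumes that the expected weighted support of an itemset is the mean weight of its items times the sum, over transactions, of the product of its items' existential probabilities. Under that assumption, no itemset containing a given item can exceed the largest item weight times the sum of that item's probabilities, so items whose bound falls below the threshold can be dropped. The names `prune_items`, `weights`, and `min_support` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def prune_items(transactions, weights, min_support):
    """Return the items whose assumed upper bound reaches min_support.

    transactions: list of dicts mapping item -> existential probability
    weights: dict mapping item -> weight
    """
    max_weight = max(weights.values())
    prob_sum = defaultdict(float)
    for transaction in transactions:
        for item, prob in transaction.items():
            prob_sum[item] += prob
    # Assumed bound (not taken from the paper): any itemset containing
    # `item` has expected weighted support <= max_weight * prob_sum[item].
    return {item for item, total in prob_sum.items()
            if max_weight * total >= min_support}

# Toy uncertain dataset: each transaction lists items with probabilities.
transactions = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.7, "c": 0.2},
    {"b": 0.6, "c": 0.1},
]
weights = {"a": 0.5, "b": 1.0, "c": 0.4}
kept = prune_items(transactions, weights, min_support=1.0)
```

In this toy dataset item `c` has a probability sum of about 0.3, so its bound stays below the threshold of 1.0 and only `a` and `b` survive.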
A Spark-based Frequent Patterns Mining Algorithm for Uncertain Datasets

YANG Yang, DING Jiaman, LI Haibin, JIA Lianyin, YOU Jinguo, JIANG Ying. A Spark-based Frequent Patterns Mining Algorithm for Uncertain Datasets[J]. Information and Control, 2019, 48(3): 257-264.

Authors:	YANG Yang, DING Jiaman, LI Haibin, JIA Lianyin, YOU Jinguo, JIANG Ying

Affiliation:	Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

Abstract:	In recent years, improving the performance of frequent pattern mining on massive uncertain datasets has become an active research topic. Most traditional algorithms consider only a single measure for the support of an itemset (expectation, probability, or weight), while algorithms that consider both probability and weight struggle to remain efficient on big data. We therefore propose UWEFP, a Spark-based algorithm that mines frequent patterns from uncertain datasets according to expected weight. To account for both the probabilities and the weights of items, UWEFP first calculates the maximum probability-weight value of each 1-itemset and prunes on that value. A novel UWEFP-tree structure, which has the characteristics of an FP-tree and exploits the advantages of the Spark framework, is then used to build and mine the pattern tree with fewer scans of the dataset. Finally, the algorithm is evaluated on UCI datasets in a Spark environment. The experimental results show that the proposed algorithm improves efficiency while preserving the mining results.

Keywords:	uncertain data; data mining; frequent patterns; Spark
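
To illustrate why a partitioned framework such as Spark suits this kind of support counting, here is a rough sketch in plain Python (no Spark dependency) of the map and reduce pattern: each partition produces partial sums for the candidate itemsets, and the partial sums are merged by addition. It is not the paper's UWEFP-tree procedure. The weight of an itemset is assumed to be the mean of its item weights, and all function names are hypothetical.

```python
from functools import reduce
from math import prod

def partial_supports(partition, candidates):
    """Map step: expected support of each candidate within one partition."""
    sums = {c: 0.0 for c in candidates}
    for transaction in partition:
        for c in candidates:
            # An item missing from a transaction has probability 0.
            sums[c] += prod(transaction.get(item, 0.0) for item in c)
    return sums

def merge(left, right):
    """Reduce step: add the partial sums key by key."""
    return {c: left[c] + right[c] for c in left}

def expected_weighted_support(partitions, candidates, weights):
    totals = reduce(merge, (partial_supports(p, candidates) for p in partitions))
    # Assumption: the itemset weight is the mean weight of its items.
    return {c: totals[c] * sum(weights[i] for i in c) / len(c)
            for c in candidates}

# Two partitions of a toy uncertain dataset (item -> probability).
partitions = [
    [{"a": 0.5, "b": 0.5}],
    [{"a": 1.0, "b": 1.0}, {"a": 1.0}],
]
weights = {"a": 0.5, "b": 1.0}
candidates = [("a",), ("a", "b")]
support = expected_weighted_support(partitions, candidates, weights)
```

Because the merge is a plain addition, the partitions can be processed in any order and on different workers, which is the property a Spark `reduce` or `reduceByKey` relies on.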
This article is indexed in VIP and other databases.