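Massive electricity consumption data analysis method based on cloud computing and an improved K-means algorithm
Cite this article: ZHANG Chengchang, ZHANG Huayu, LUO Jianchang, HE Feng. Massive electricity consumption data analysis method based on cloud computing and an improved K-means algorithm[J]. Journal of Computer Applications, 2018, 38(1): 159-164. DOI: 10.11772/j.issn.1001-9081.2017071660
Authors: ZHANG Chengchang  ZHANG Huayu  LUO Jianchang  HE Feng
Affiliations: 1. College of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funding: Science and Technology Foundation of China Electric Power Research Institute (XXB51201603155); Science and Technology Foundation of State Grid Beijing Economic and Technology Research Institute (15JS191).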
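Abstract: To address the low mining efficiency and large data volume of residential electricity consumption data, an analysis method for massive electricity consumption data based on cloud computing and an improved K-means algorithm was studied. Because the initial cluster centers and the value of K are difficult to determine in the traditional K-means algorithm, a density-based improved K-means algorithm was proposed. First, the product of the sample density, the reciprocal of the average distance between samples within a cluster, and the inter-cluster distance was defined as the weight product, and the cluster centers were determined one by one with the maximum weight product method, which improved clustering accuracy. Then, the improved algorithm was parallelized on the MapReduce model, which improved clustering efficiency. Finally, a mining and analysis experiment on massive power data was carried out using the electricity consumption data of 400 households in a residential community. Taking the household as the unit, each user's peak-hour power consumption rate, load rate, valley load coefficient, and percentage of flat-period electricity consumption were extracted to build the feature vector for clustering; similar user types were clustered, and the behavioral characteristics of each user type were analyzed. Experimental results on a Hadoop cluster show that the proposed improved K-means algorithm runs stably and reliably and achieves a good clustering effect.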

Keywords: electricity consumption data  cloud computing  improved K-means algorithm  MapReduce model  parallelization
Received: 2017-07-04
Revised: 2017-08-21
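
The density-based initialization summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: density is taken as the number of samples within a neighborhood radius (defaulting to the mean pairwise distance), the intra-cluster term is the reciprocal of the mean distance to those neighbors, the inter-cluster term is the distance to the nearest already-chosen center, and `k` is passed in rather than derived by the algorithm. The names `euclid` and `weight_product_centers` are hypothetical and are not taken from the paper.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weight_product_centers(points, k, radius=None):
    """Pick k initial centers by successively maximizing a weight product.

    Assumed definitions (illustrative, not taken from the paper):
      density[i]  = number of other samples within `radius` of sample i
      compact[i]  = 1 / mean distance from sample i to those neighbors
      separation  = distance from sample i to the nearest chosen center
    """
    n = len(points)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 points and 1 <= k <= len(points)")
    d = [[euclid(p, q) for q in points] for p in points]
    if radius is None:
        # Assumption: use the mean pairwise distance as the radius.
        pair_sum = sum(d[i][j] for i in range(n) for j in range(i + 1, n))
        radius = pair_sum / (n * (n - 1) / 2)

    eps = 1e-12  # guards against division by zero for duplicate points
    density, compact = [], []
    for i in range(n):
        neigh = [d[i][j] for j in range(n) if j != i and d[i][j] <= radius]
        density.append(len(neigh))
        mean_d = sum(neigh) / len(neigh) if neigh else 0.0
        compact.append(1.0 / (mean_d + eps))

    # First center: no other center exists yet, so only density and
    # compactness contribute to the weight.
    chosen = [max(range(n), key=lambda i: density[i] * compact[i])]
    while len(chosen) < k:
        def weight(i):
            separation = min(d[i][c] for c in chosen)
            return density[i] * compact[i] * separation
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(max(candidates, key=weight))
    return [points[i] for i in chosen]
```

On two well-separated groups of points, this sketch places one center in each group, because the separation term penalizes candidates close to a center that has already been chosen. The centers it returns are only starting points for the usual K-means iterations.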

Massive data analysis of power utilization based on improved K-means algorithm and cloud computing
ZHANG Chengchang,ZHANG Huayu,LUO Jianchang,HE Feng. Massive data analysis of power utilization based on improved K-means algorithm and cloud computing[J]. Journal of Computer Applications, 2018, 38(1): 159-164. DOI: 10.11772/j.issn.1001-9081.2017071660
Authors: ZHANG Chengchang  ZHANG Huayu  LUO Jianchang  HE Feng
Affiliation: 1. College of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: Mining residential electricity consumption data is hampered by low mining efficiency and large data volume. To address this, an analysis method for massive power utilization data based on an improved K-means algorithm and cloud computing was studied. Because the initial cluster centers and the value of K are difficult to determine in the traditional K-means algorithm, a density-based improved K-means algorithm was proposed. Firstly, the product of the sample density, the reciprocal of the average distance between the samples in a cluster, and the distance between clusters was defined as the weight product; the initial centers were determined successively by the maximum weight product method, which improved clustering accuracy. Secondly, the improved K-means algorithm was parallelized on the MapReduce model, which improved clustering efficiency. Finally, a mining experiment on massive power utilization data was carried out using the electricity data of 400 households. Taking the household as the unit, the peak-hour electricity consumption rate, load rate, valley load coefficient, and percentage of power utilization during normal (flat-rate) hours were calculated to build the feature vector for clustering; similar user types were then clustered, and the behavioral characteristics of each user type were analyzed. Experimental results on a Hadoop cluster show that the improved K-means algorithm runs stably and efficiently and achieves good clustering results.
Keywords: power utilization data  cloud computing  improved K-means algorithm  MapReduce model  parallelization
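
The MapReduce parallelization mentioned in the abstract can be pictured with the following single-process Python sketch of one common way to split K-means into map and reduce steps. It is an assumption about the general pattern, not the authors' implementation: it does not use the Hadoop API, the shuffle phase is simulated with a dictionary, and the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical.

```python
import math
from collections import defaultdict


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_map(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda c: euclid(point, centers[c]))
    return idx, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points emitted under one center index."""
    total, count = None, 0
    for vec, n in values:
        if total is None:
            total = list(vec)
        else:
            total = [t + v for t, v in zip(total, vec)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centers):
    """One map -> shuffle -> reduce round, simulated in a single process."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)
    # A center that attracted no points is kept unchanged.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centers[i])
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Repeat rounds until no center moves by more than `tol`."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        shift = max(euclid(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers
```

In a real cluster each mapper would process one split of the data and a combiner would pre-aggregate the partial sums before the shuffle; the sketch keeps only the data flow. Given two well-separated groups and one starting center in each, `kmeans` moves the centers to the two group means.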