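Research on Microblog Public Opinion Analysis Based on an Improved K-means Algorithm
Cite this article: XIE Xiu-juan, LI Xiang-ju, MO Ling-fei. Research on microblog public opinion analysis based on an improved K-means algorithm[J]. Computer Engineering & Science, 2018, 40(1): 155-158
Authors: XIE Xiu-juan, LI Xiang-ju, MO Ling-fei
Affiliations: 1. Department of Computer Engineering, Southeast University Chengxian College; 2. School of Instrument Science and Engineering, Southeast University
Funding: Philosophy and Social Science Research Fund of Jiangsu Higher Education Institutions (2016SJD880186); Jiangsu Province Modern Educational Technology Research Project (2016-R-46509); National Key Technology R&D Program of China during the 12th Five-Year Plan Period (2013BAJ05B02-2)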
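Abstract: If an isolated point is chosen as an initial cluster center, K-means clustering can easily become trapped in a local optimum. To avoid this, we propose a density-based method for selecting the initial cluster centers of the K-means clustering algorithm. The method first computes the average similarity between each data object and all other objects, treats objects whose average similarity exceeds a fixed threshold as core objects, and then selects the core objects that are least similar to one another as the initial cluster centers. Using a self-built Sina Weibo crawler, we collected several thousand posts from different categories. After word segmentation, preprocessing, and term-weight calculation, the posts were clustered with the improved K-means algorithm. Compared with the traditional K-means algorithm, the improved algorithm gives more stable precision and recall and a shorter average clustering time. Experimental results show that it achieves higher accuracy and stability in microblog clustering, which helps discover trending public opinion in large volumes of microblog data.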
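The center-selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the similarity measure, the threshold value, or how "least similar to one another" is operationalized. The sketch therefore assumes cosine similarity over sparse term-weight vectors and a caller-supplied threshold. It also assumes a greedy farthest-first rule: start from the core object with the highest average similarity, then repeatedly add the core object whose highest similarity to the already chosen centers is lowest. The names `cosine` and `select_initial_centers` are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def select_initial_centers(docs: List[Vector], k: int, threshold: float) -> List[int]:
    """Return indices of k initial centers chosen among core objects.

    Assumptions (not given in the abstract): cosine similarity, a fixed
    caller-supplied threshold, and a greedy farthest-first selection rule.
    """
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two objects")
    # Step 1: average similarity of each object to every other object.
    avg_sim = [
        sum(cosine(docs[i], docs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Step 2: core objects exceed the threshold; isolated points are dropped.
    core = [i for i in range(n) if avg_sim[i] > threshold]
    if len(core) < k:
        raise ValueError("fewer core objects than k; lower the threshold")
    # Step 3 (assumed greedy rule): begin with the densest core object, then
    # add the core object whose highest similarity to chosen centers is lowest.
    centers = [max(core, key=lambda i: avg_sim[i])]
    while len(centers) < k:
        candidates = [i for i in core if i not in centers]
        centers.append(
            min(candidates, key=lambda i: max(cosine(docs[i], docs[c]) for c in centers))
        )
    return centers


if __name__ == "__main__":
    docs = [
        {"football": 1.0, "match": 1.0},
        {"football": 1.0, "goal": 1.0},
        {"football": 1.0, "match": 1.0, "goal": 1.0},
        {"stock": 1.0, "market": 1.0},
        {"stock": 1.0, "price": 1.0},
        {"stock": 1.0, "market": 1.0, "price": 1.0},
        {"recipe": 1.0},  # isolated point: shares no terms with any other object
    ]
    print(select_initial_centers(docs, k=2, threshold=0.1))  # -> [2, 3]
```

In the toy data, the last object shares no terms with any other, so its average similarity is 0 and it never becomes a center. With k = 2 the sketch picks one center from each topic. The farthest-first rule is only one plausible reading of "least similar to one another".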

Keywords: microblog; clustering center; K-means clustering algorithm; density
Received: 2016-02-22
Revised: 2018-01-25

Microblogging opinion analysis based on an improved K-means algorithm
XIE Xiu-juan, LI Xiang-ju, MO Ling-fei. Microblogging opinion analysis based on an improved K-means algorithm[J]. Computer Engineering & Science, 2018, 40(1): 155-158
Authors: XIE Xiu-juan, LI Xiang-ju, MO Ling-fei
Affiliation: (1. Department of Computer Engineering, Southeast University Chengxian College, Nanjing 210000; 2. School of Instrument Science and Engineering, Southeast University, Nanjing 210000, China)
Abstract: Selecting isolated points as initial cluster centers can cause K-means clustering results to fall into a local optimum. To avoid this, we propose a density-based method for selecting the initial cluster centers of the K-means clustering algorithm. The method first calculates the average similarity between each data object and all other objects, and treats objects whose average similarity exceeds a fixed threshold as core objects. The core objects that are least similar to one another are then taken as the initial cluster centers. We built a crawler for Sina Weibo to collect thousands of posts of different types. After word segmentation, preprocessing, and weight calculation, we clustered the posts with the improved K-means algorithm. Compared with the traditional K-means algorithm, our method yields more stable precision and recall, and its average clustering time is shorter. Experimental results show that the improved algorithm achieves higher accuracy and stability in microblog clustering and can help discover trending public opinion in large volumes of microblog data.
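The clustering pipeline described above can be sketched as follows. This uses stated assumptions rather than the authors' code. The abstract names a "weight calculation" step without giving a formula, so the sketch assumes smoothed TF-IDF weights, idf(t) = ln((1 + N) / (1 + df(t))) + 1. It also assumes that posts are already word-segmented and preprocessed into token lists. Each post is assigned to the center with the highest cosine similarity, and each center is recomputed as the mean of its members. The initial center indices are passed in as `init_idx`; this is where a density-based selection would plug in. All function names (`tfidf`, `cosine`, `mean_vector`, `kmeans`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight vector


def tfidf(token_lists: List[List[str]]) -> List[Vector]:
    """Smoothed TF-IDF weights (assumed scheme) for pre-segmented posts."""
    n = len(token_lists)
    # Document frequency: number of posts that contain each term.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors: List[Vector] = []
    for tokens in token_lists:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def mean_vector(members: List[Vector]) -> Vector:
    """Component-wise mean of sparse vectors."""
    acc: Dict[str, float] = {}
    for v in members:
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w / len(members)
    return acc


def kmeans(docs: List[Vector], init_idx: List[int], max_iter: int = 100) -> List[int]:
    """K-means with cosine assignment, starting from the given center indices."""
    centers = [dict(docs[i]) for i in init_idx]
    labels = [-1] * len(docs)
    for _ in range(max_iter):
        # Assignment step: most similar center by cosine similarity.
        new_labels = [
            max(range(len(centers)), key=lambda c: cosine(d, centers[c]))
            for d in docs
        ]
        if new_labels == labels:
            break  # converged: no post changed cluster
        labels = new_labels
        # Update step: mean of members; an empty cluster keeps its old center.
        for c in range(len(centers)):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels


if __name__ == "__main__":
    posts = [
        ["football", "match"], ["football", "goal"], ["football", "match", "goal"],
        ["stock", "market"], ["stock", "price"], ["stock", "market", "price"],
    ]
    print(kmeans(tfidf(posts), init_idx=[2, 3]))  # -> [0, 0, 0, 1, 1, 1]
```

Cosine similarity is a common choice for sparse text vectors because it ignores post length. The paper's actual distance measure and stopping criterion may differ.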
Keywords: microblog; clustering center; K-means clustering algorithm; density