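Study on Density Clustering of Massive Chinese Short Message Text
Cite this article: ZHOU Hong, LIU Jin-ling. Study on Mass Chinese Short Message Text Density Clustering[J]. Computer Engineering, 2010, 36(22): 81-82
Authors: ZHOU Hong  LIU Jin-ling
Affiliation: (Faculty of Computer Engineering, Huaiyin Institute of Technology, Huaian 223003, Jiangsu, China)
Abstract: Based on the characteristics of short message text, a density-based method for clustering Chinese short messages is presented. The method partitions high-density regions of the text data into clusters and builds a seed queue, sorted in ascending order of reachable similarity, to store the short message texts awaiting expansion. It selects objects that are reachable at a large similarity threshold, that is, it quickly locates text objects in dense regions so that higher-density clusters are completed first. Experimental results show that the method is roughly 10 times more efficient than K-means.

Keywords: density  neighborhood  short message text  clustering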

Study on Mass Chinese Short Message Text Density Clustering
ZHOU Hong, LIU Jin-ling. Study on Mass Chinese Short Message Text Density Clustering[J]. Computer Engineering, 2010, 36(22): 81-82
Authors: ZHOU Hong  LIU Jin-ling
Affiliation: (Faculty of Computer Engineering, Huaiyin Institute of Technology, Huaian 223003, China)
Abstract: Based on the characteristics of short message text, this paper presents a density-based clustering method for Chinese short messages. The method partitions high-density regions of the text data into clusters and builds a seed queue, sorted in ascending order of reachable similarity, to store the short message texts awaiting expansion. Messages are processed in a specific order: objects reachable at a larger similarity threshold are selected first, so that text objects in dense regions are located quickly and higher-density clusters are completed first. Experimental results show that the method is about 10 times more efficient than K-means.
Keywords: density  cluster  neighborhood  short message text  clustering
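
The abstract gives only an outline of the method, so the following is a minimal sketch of the general idea rather than the authors' algorithm: a DBSCAN-style expansion in which the seed queue is a heap that always yields the most similar pending message first. The character-bigram Jaccard similarity, the parameter names `min_sim` and `min_pts`, and the function names are all assumptions introduced here for illustration; the paper's actual similarity measure, thresholds, and queue handling are not stated in this record.

```python
import heapq


def bigrams(text):
    """Character bigrams (assumed stand-in for real Chinese word segmentation)."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def density_cluster(messages, min_sim=0.5, min_pts=2):
    """Return one cluster label per message; -1 marks noise.

    A message is a core object if at least `min_pts` other messages
    have similarity >= `min_sim` to it (hypothetical parameters).
    """
    feats = [bigrams(m) for m in messages]
    n = len(messages)

    def neighbors(i):
        out = []
        for j in range(n):
            if j != i:
                s = jaccard(feats[i], feats[j])
                if s >= min_sim:
                    out.append((s, j))
        return out

    labels = [-1] * n
    visited = [False] * n
    cluster_id = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            continue  # not a core object; may still join a cluster as a border
        labels[i] = cluster_id
        # Seed queue: negated similarity so the heap pops the densest link first.
        seeds = [(-s, j) for s, j in nbrs]
        heapq.heapify(seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if labels[j] == -1:
                labels[j] = cluster_id
            if visited[j]:
                continue
            visited[j] = True
            nbrs_j = neighbors(j)
            if len(nbrs_j) >= min_pts:  # only core objects expand the cluster
                for s, k in nbrs_j:
                    if labels[k] == -1:
                        heapq.heappush(seeds, (-s, k))
        cluster_id += 1
    return labels
```

Calling `density_cluster(list_of_messages, min_sim=0.5, min_pts=2)` returns a list of integer labels aligned with the input. The pairwise similarity scan here is quadratic in the number of messages; a version aimed at truly massive data would need an index or candidate filter, which this sketch does not attempt.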
This article is indexed in VIP, Wanfang Data, and other databases.