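A New Clustering Algorithm for Evolving Text Streams (一种新的演化文本流聚类算法)
Cite this article: 邓维维, 彭宏. 一种新的演化文本流聚类算法[J]. 计算机科学 (Computer Science), 2007, 34(9): 125-127.
Authors: 邓维维 (DENG Wei-Wei), 彭宏 (PENG Hong)
Affiliation: School of Computer Science, South China University of Technology, Guangzhou 510641
Funding: National Natural Science Foundation of China; Natural Science Foundation of Guangdong Province
Abstract: Data stream clustering, a branch of clustering, has become an active research topic in data mining. Although quite a few data stream algorithms have appeared, most of them target low-dimensional numeric data, and high-dimensional text streams have received little study. Building on the traditional data stream clustering framework, this paper proposes a new text micro-cluster structure that is better suited to text clustering. It also divides the online micro-clusters into potential micro-clusters and outlier micro-clusters, which improves the algorithm's ability to adapt to outliers. Experiments show that the algorithm is more effective than other text stream clustering algorithms.

Keywords: clustering; data stream; text stream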

An Algorithm for Clustering Evolving Text Data Stream with Outliers
DENG Wei-Wei, PENG Hong. An Algorithm for Clustering Evolving Text Data Stream with Outliers[J]. Computer Science, 2007, 34(9): 125-127.
Authors:DENG Wei-Wei  PENG Hong
Affiliation: Computer Science Department, South China University of Technology, Guangzhou 510641
Abstract: As a branch of clustering, data stream clustering has become a hot spot in data mining. Although many stream clustering algorithms exist, most of them are suited to low-dimensional numeric data, and few are designed for high-dimensional text streams. Based on the traditional stream clustering framework, this paper proposes a novel online micro-cluster structure that is better suited to clustering text. The online micro-clusters are further divided into potential and outlier micro-clusters, which helps when outliers appear frequently in the stream. Experiments show that the algorithm processes text streams more effectively than other text stream clustering algorithms.
Keywords:Clustering  Data stream  Text stream
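The abstract does not give the details of the micro-cluster structure, so the following is only a minimal sketch of the general idea it describes: online micro-clusters for text, kept in separate potential and outlier pools. Every name and parameter here (`MicroCluster`, `TextStreamClusterer`, `sim_threshold`, `promote_weight`, `lam`), the use of cosine similarity over term frequencies, the exponential decay, and the promotion rule are assumptions made for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class MicroCluster:
    """Hypothetical text micro-cluster: decayed term sums plus a decayed weight."""

    def __init__(self, doc, now):
        self.term_sums = Counter(doc)  # sparse summary of member documents
        self.weight = 1.0              # decayed number of member documents
        self.last_update = now

    def add(self, doc, now, lam):
        # Fade the old contents (assumed exponential decay), then absorb the document.
        factor = 2.0 ** (-lam * (now - self.last_update))
        for term in self.term_sums:
            self.term_sums[term] *= factor
        self.weight = self.weight * factor + 1.0
        self.term_sums.update(doc)
        self.last_update = now


class TextStreamClusterer:
    """Keeps potential and outlier micro-clusters in separate pools (assumed design)."""

    def __init__(self, sim_threshold=0.5, promote_weight=3.0, lam=0.01):
        self.sim_threshold = sim_threshold
        self.promote_weight = promote_weight
        self.lam = lam
        self.potential = []
        self.outlier = []

    def insert(self, tokens, now):
        doc = Counter(tokens)
        # Try the potential pool first, then the outlier pool.
        for pool in (self.potential, self.outlier):
            best = max(pool, key=lambda c: cosine(c.term_sums, doc), default=None)
            if best is not None and cosine(best.term_sums, doc) >= self.sim_threshold:
                best.add(doc, now, self.lam)
                # An outlier micro-cluster that gathers enough weight is promoted.
                if pool is self.outlier and best.weight >= self.promote_weight:
                    self.outlier.remove(best)
                    self.potential.append(best)
                return best
        # No sufficiently similar micro-cluster: start a new outlier micro-cluster.
        fresh = MicroCluster(doc, now)
        self.outlier.append(fresh)
        return fresh
```

In this sketch a document that matches nothing never disturbs the potential pool; it only starts an outlier micro-cluster, which is promoted if similar documents keep arriving. That is one plausible reading of how the potential/outlier split could improve robustness to outliers.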
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.