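A Web Document Clustering Algorithm Based on Swarm Intelligence
Cite this article: Wu Bin, Fu Weipeng, Zheng Yi, Liu Shaohui, Shi Zhongzhi. A Web document clustering algorithm based on swarm intelligence[J]. Journal of Computer Research and Development (计算机研究与发展), 2002, 39(11): 1429-1435
Authors: Wu Bin (吴斌), Fu Weipeng (傅伟鹏), Zheng Yi (郑毅), Liu Shaohui (刘少辉), Shi Zhongzhi (史忠植)
Affiliations: 1. Open Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876
2. Open Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Funding: Supported by the National Natural Science Foundation of China (60073019, 90104021) and the Key Project of the Beijing Natural Science Foundation (4011003)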
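Abstract: The swarm intelligence clustering model is applied to document clustering, and a Web document clustering algorithm based on swarm intelligence is proposed. First, Web documents are represented with the vector space model, and a text feature set is obtained by conventional methods such as stop-word removal and feature-term reduction rules. The document vectors are then distributed at random on a plane and clustered with a swarm-intelligence-based clustering method. Finally, the clustering results are collected from the plane by a recursive algorithm. To make the algorithm more practical, it is combined with the k-means algorithm to form a hybrid clustering algorithm. Experimental comparisons show that the swarm-intelligence-based Web document clustering algorithm has good clustering characteristics: it groups the Web documents related to one topic into a single cluster fairly completely and accurately.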

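Keywords: swarm intelligence; Web document clustering algorithm; self-organizing clustering; swarm similarity; Internet; information retrieval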
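
The representation, random placement, and recursive collection steps named in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the stop-word list, the raw term-frequency weighting, the grid size, and the function names `to_vector`, `scatter`, and `collect_clusters` are all hypothetical. The abstract does not specify the feature-reduction rules or the adjacency criterion used when collecting clusters, so feature reduction is omitted and clusters are taken to be groups of touching grid cells.

```python
import random
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given in the abstract.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for"}


def to_vector(text, stop_words=STOP_WORDS):
    """Return a bag-of-words term-frequency vector with stop words removed."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return Counter(tokens)


def scatter(n_docs, width, height, seed=0):
    """Assign each document a distinct random cell on a width x height grid."""
    if n_docs > width * height:
        raise ValueError("grid too small for the number of documents")
    rng = random.Random(seed)
    cells = [(x, y) for x in range(width) for y in range(height)]
    return rng.sample(cells, n_docs)


def collect_clusters(positions):
    """Group documents whose cells touch (8-neighbourhood) by recursive flood fill.

    positions[i] is the (x, y) cell of document i; returns lists of document indices.
    """
    occupied = {pos: i for i, pos in enumerate(positions)}
    seen = set()

    def visit(cell, cluster):
        seen.add(cell)
        cluster.append(occupied[cell])
        x, y = cell
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nxt = (x + dx, y + dy)
                if nxt in occupied and nxt not in seen:
                    visit(nxt, cluster)

    clusters = []
    for cell in occupied:
        if cell not in seen:
            cluster = []
            visit(cell, cluster)
            clusters.append(sorted(cluster))
    return clusters


docs = ["The ants of the colony", "Clustering of Web documents", "Ants and clustering"]
vectors = [to_vector(d) for d in docs]
positions = scatter(len(docs), width=10, height=10)
```

In the full algorithm the ants would rearrange `positions` before `collect_clusters` is called; calling it directly on the random layout only shows the mechanics of the collection step. The recursive form is fine for small grids, but Python's recursion limit would call for an explicit stack on large, dense layouts.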

A CLUSTERING ALGORITHM BASED ON SWARM INTELLIGENCE FOR WEB DOCUMENTS
Abstract: Owing to its flexibility, robustness, and self-organization, swarm intelligence has been applied in a variety of areas. A clustering algorithm based on swarm intelligence (CSI) for Web documents is proposed. First, Web documents, represented in the vector space model with a reduced document feature set, are randomly projected onto a plane. Clustering analysis is then conducted with a method derived from a basic model of how ant colonies organize their cemeteries. Artificial ants perform random walks on the plane and pick up or drop the projected data items with a probability obtained from the swarm similarity within a local region through a probability conversion function. Clusters form visibly on the plane through the collective actions of the ant colony, without any central control. Finally, the clustering results are collected from the plane by a recursive algorithm, and each cluster center is labeled with its most heavily weighted feature. A hybrid clustering algorithm, CSIM, is also proposed by combining CSI with the k-means algorithm. CSIM inherits the prominent properties of both swarm intelligence and k-means and offsets the weaknesses of the two techniques. Experimental results and a comparison with other document clustering methods show that the swarm-intelligence-based Web document clustering algorithm has good clustering performance: Web documents focusing on one subject are clustered together fairly completely and accurately.
Keywords: swarm intelligence; document clustering; self-organizing clustering; swarm similarity
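
The pick-up and drop decision described in the abstract can be illustrated with a short Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption: the swarm similarity is taken as the average of `1 - d / alpha` over the items in the ant's local region (a form common in ant-based clustering), the distance `d` is cosine distance, the probability conversion function is a sigmoid with slope `c`, and the names `swarm_similarity`, `pick_probability`, and `drop_probability` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def swarm_similarity(item, neighbors, alpha=0.5):
    """Assumed form: mean of (1 - d / alpha) over the items in the local region."""
    if not neighbors:
        return 0.0
    total = sum(1.0 - cosine_distance(item, n) / alpha for n in neighbors)
    return total / len(neighbors)


def sigmoid(x, c=5.0):
    """Assumed probability conversion function with slope parameter c."""
    return 1.0 / (1.0 + math.exp(-c * x))


def pick_probability(f, c=5.0):
    """An unloaded ant is more likely to pick up an item that fits its region poorly."""
    return 1.0 - sigmoid(f, c)


def drop_probability(f, c=5.0):
    """A loaded ant is more likely to drop its item among similar items."""
    return sigmoid(f, c)


# Toy vectors: doc_a resembles doc_b and shares no terms with doc_c.
doc_a = {"ant": 2.0, "colony": 1.0}
doc_b = {"ant": 2.0, "colony": 1.0}
doc_c = {"stock": 1.0, "market": 3.0}

f_similar = swarm_similarity(doc_a, [doc_b])
f_dissimilar = swarm_similarity(doc_a, [doc_c])
```

With these assumed forms, an item surrounded by similar items gets a high drop probability and a low pick-up probability, which is the positive feedback that lets clusters grow without central control. The parameters `alpha` and `c` control how sharply similarity is converted into action and would need tuning on real data.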
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.