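A Document Clustering Method Based on Term Clusters and Association Rules
Cite this article: XU Jian-min, CHENG Yue-peng, XIN Li-jun. A document clustering method based on term clusters and association rules[J]. Computer Engineering and Applications, 2007, 43(5): 178-181, 188
Authors: XU Jian-min, CHENG Yue-peng, XIN Li-jun
Affiliations: College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002; Library, Hebei University, Baoding, Hebei 071002
Funding: National Natural Science Foundation of China; Hebei Province Science and Technology Research and Development Program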
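Abstract: A new document clustering method based on term clusters and association rules is proposed. The documents in the collection are first segmented into words, and term clusters are formed according to the average mutual information between terms. Documents are then represented in the vector space model over these term clusters, association rules are used to mine initial document clusters, and cluster analysis of these initial clusters yields the final document clusters. Experimental results show that, compared with traditional clustering methods, the approach runs faster and achieves clearly better clustering effectiveness and clustering quality.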
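The abstract does not give the paper's exact formula for average mutual information (AMI), its clustering criterion, or its term weighting, so the following Python sketch is only one hedged reading of the first two steps. It assumes the standard AMI between two binary "term occurs in document" variables, a hypothetical AMI `threshold`, an added positive-association check (AMI alone is also high for terms that never co-occur), single-link grouping of terms, and raw term counts per cluster as the document vector. All function names (`average_mutual_information`, `positively_associated`, `term_clusters`, `cluster_vector`) are illustrative, not the authors' code.

```python
import math
from itertools import combinations


def average_mutual_information(docs, a, b):
    """AMI between the binary occurrence variables of terms a and b.

    docs is a list of sets of terms, one set per segmented document.
    Sums P(x, y) * log(P(x, y) / (P(x) * P(y))) over x, y in {0, 1};
    empty cells are skipped because 0 * log 0 is taken as 0.
    """
    n = len(docs)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for d in docs:
        counts[(int(a in d), int(b in d))] += 1
    p_a = {x: (counts[(x, 0)] + counts[(x, 1)]) / n for x in (0, 1)}
    p_b = {y: (counts[(0, y)] + counts[(1, y)]) / n for y in (0, 1)}
    ami = 0.0
    for (x, y), c in counts.items():
        if c:
            p_xy = c / n
            ami += p_xy * math.log(p_xy / (p_a[x] * p_b[y]))
    return ami


def positively_associated(docs, a, b):
    """Hypothetical extra check: a and b co-occur more often than chance."""
    n = len(docs)
    n_a = sum(a in d for d in docs)
    n_b = sum(b in d for d in docs)
    n_ab = sum(a in d and b in d for d in docs)
    return n_ab * n > n_a * n_b  # integer form of P(a, b) > P(a) * P(b)


def term_clusters(docs, threshold):
    """Single-link grouping: join two terms whose AMI >= threshold."""
    terms = sorted(set().union(*docs))
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if (positively_associated(docs, a, b)
                and average_mutual_information(docs, a, b) >= threshold):
            parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


def cluster_vector(doc_tokens, clusters):
    """Represent one document by how many of its tokens fall in each term cluster."""
    index = {t: i for i, cluster in enumerate(clusters) for t in cluster}
    vec = [0] * len(clusters)
    for t in doc_tokens:
        if t in index:
            vec[index[t]] += 1
    return vec


docs_tokens = [
    ["neural", "network", "training"],
    ["neural", "network", "layer"],
    ["stock", "market", "price"],
    ["stock", "market", "trade"],
]
docs = [set(tokens) for tokens in docs_tokens]
clusters = term_clusters(docs, threshold=0.5)
vectors = [cluster_vector(tokens, clusters) for tokens in docs_tokens]
print(clusters)
print(vectors)
```

On this toy input, "neural"/"network" and "stock"/"market" each always co-occur (AMI = ln 2 ≈ 0.69 nats) and are merged, while the remaining terms stay as singleton clusters, so each document vector has one dimension per term cluster rather than per term.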

Keywords: term cluster; association rules; document clustering; Web mining; vector space model
Article ID: 1002-8331(2007)05-0178-04
Revised: 2006-05-01

Document clustering approach based on term clustering and association rules
XU Jian-min, CHENG Yue-peng, XIN Li-jun. Document clustering approach based on term clustering and association rules[J]. Computer Engineering and Applications, 2007, 43(5): 178-181, 188
Authors: XU Jian-min, CHENG Yue-peng, XIN Li-jun
Affiliation: 1. Mathematics and Computer College, Hebei University, Baoding, Hebei 071002, China; 2. Hebei University Library, Baoding, Hebei 071002, China
Abstract: This paper proposes a new document clustering approach based on term clustering and association rules. First, the documents in the collection are segmented into words; term clusters are then constructed according to the AMI (Average Mutual Information) between terms, and each document is represented in the VSM (Vector Space Model) over these term clusters. Association rules are then used to mine initial document clusters, and cluster analysis of these initial clusters produces the final document clustering. Experimental results show that, compared with traditional document clustering methods, this approach runs faster and achieves clearly better clustering performance and quality.
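The abstract does not say how association rules are turned into initial clusters or which cluster analysis produces the final result. As one hedged reading, the Python sketch below treats each document as a transaction whose items are the term clusters it contains, mines frequent itemsets Apriori-style (the first stage of association-rule mining; rule confidences are not computed here), takes the documents supporting each frequent itemset as an initial cluster, and then greedily merges initial clusters whose Jaccard overlap reaches a hypothetical `min_overlap`. The parameters `min_support`, `max_size` and `min_overlap`, and all function names, are assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Apriori-style level-wise search over sets of items (term clusters).

    Returns {frozenset(itemset): frozenset(ids of supporting documents)}
    for every itemset supported by at least min_support (>= 1) documents.
    """
    def support(itemset):
        return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

    current = {}
    for item in sorted(set().union(*transactions)):
        docs = support(frozenset([item]))
        if len(docs) >= min_support:
            current[frozenset([item])] = docs

    result, size = {}, 1
    while current:
        result.update(current)
        if size == max_size:
            break
        size += 1
        # join step: unions of two frequent itemsets that grow by exactly one item
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        nxt = {}
        for cand in candidates:
            # prune step: every (size - 1)-subset must itself be frequent
            if all(frozenset(s) in current for s in combinations(cand, size - 1)):
                docs = support(cand)
                if len(docs) >= min_support:
                    nxt[cand] = docs
        current = nxt
    return result


def initial_clusters(transactions, min_support, max_size=3):
    """Each distinct supporting-document set of a frequent itemset is one initial cluster."""
    freq = frequent_itemsets(transactions, min_support, max_size)
    return sorted(set(freq.values()), key=sorted)


def merge_clusters(clusters, min_overlap=0.5):
    """Greedily merge the most-overlapping pair (Jaccard >= min_overlap) until none is left."""
    clusters = [set(c) for c in clusters]
    while True:
        best, pair = min_overlap, None
        for i, j in combinations(range(len(clusters)), 2):
            union = len(clusters[i] | clusters[j])
            jaccard = len(clusters[i] & clusters[j]) / union if union else 0.0
            if jaccard >= best:
                best, pair = jaccard, (i, j)
        if pair is None:
            return sorted(sorted(c) for c in clusters)
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] |= merged


# Toy transactions: the term clusters present in each of six documents.
transactions = [
    {"ml", "nn"}, {"ml", "nn"}, {"ml", "nn", "gpu"},
    {"fin", "stock"}, {"fin", "stock"}, {"fin"},
]
initial = initial_clusters(transactions, min_support=2)
final = merge_clusters(initial, min_overlap=0.5)
print(initial)
print(final)
```

In this sketch a document that contains no frequent itemset ends up in no cluster; a complete method would need an extra step to assign such documents, for example to the nearest final cluster in the vector space model.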
Keywords: term clustering; association rules; document clustering; Web mining; vector space model
This article is indexed in CNKI, VIP, Wanfang Data and other databases.