
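Document Categorisation Using a Graph Model Based on Clustering Analysis (基于聚类分析的图模型文档分类)
Cite this article: Meng Haidong, Liu Xiaorong. Document categorisation using a graph model based on clustering analysis [J]. Computer Applications and Software (计算机应用与软件), 2012(1): 171-174, 229.
Authors: Meng Haidong (孟海东), Liu Xiaorong (刘小荣)
Affiliation: School of Information Engineering, Inner Mongolia University of Science and Technology (内蒙古科技大学信息工程学院)
Abstract: To address the problem that the traditional vector space model treats feature terms in isolation, the method first reduces the feature dimensionality by combining the χ² statistic with feature clustering, then uses a graph model to capture the associations between words, and finally tests document classification with the KNN method. The algorithm increases the contribution of rare words to classification, strengthens the classification effect of associated words, and reduces the dimensionality of the document vectors. Experiments show that the algorithm improves the accuracy and recall of classification.

Keywords: clustering analysis; graph model; document classification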

DOCUMENT CATEGORISATION USING GRAPH MODEL BASED ON CLUSTERING ANALYSIS
Meng Haidong, Liu Xiaorong. DOCUMENT CATEGORISATION USING GRAPH MODEL BASED ON CLUSTERING ANALYSIS [J]. Computer Applications and Software, 2012(1): 171-174, 229.
Authors: Meng Haidong, Liu Xiaorong
Affiliation: School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, Inner Mongolia, China
Abstract: To address the problem that the traditional vector space model handles feature terms in isolation, this paper first performs feature reduction by combining the χ² statistic with feature clustering, then uses a graph model to establish correlation information between words, and finally applies the KNN method in a document classification test. The algorithm increases the contribution of rare words to classification, enhances the classification effect of correlated words, and reduces the dimensionality of the document vectors. Experiments indicate that the algorithm improves the accuracy and recall of classification.
Keywords: Clustering analysis; Graph model; Document categorisation
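
The abstract names the pipeline stages but gives no formulas, so the following is only a minimal sketch of the two stages that have a standard textbook form: χ² term scoring and KNN voting. It assumes the usual 2x2 contingency form of the χ² statistic, χ²(t, c) = N(AD - CB)² / ((A + C)(B + D)(A + B)(C + D)), and cosine similarity for KNN; neither choice is confirmed by the abstract. The feature clustering and graph model steps are omitted because the abstract does not specify them. All function names (`chi_square`, `select_terms`, `vectorize`, `knn_predict`) and the toy corpus are hypothetical.

```python
from collections import Counter
import math


def chi_square(a, b, c, d):
    """Chi-square statistic of a term/category pair from a 2x2 table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no evidence
    return n * (a * d - c * b) ** 2 / denom


def select_terms(docs, labels, top_k):
    """Keep the top_k terms ranked by their best chi-square score."""
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        best = 0.0
        for cat in set(labels):
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cat and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cat and term in s)
            c = sum(1 for y in labels if y == cat) - a
            d = sum(1 for y in labels if y != cat) - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:top_k]


def vectorize(doc, terms):
    """Term-frequency vector restricted to the selected terms."""
    keep = set(terms)
    return Counter(t for t in doc if t in keep)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_vecs, train_labels, query, k):
    """Majority vote among the k training vectors most similar to query."""
    sims = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: -pair[0],
    )
    votes = Counter(y for _, y in sims[:k])
    return votes.most_common(1)[0][0]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
]
labels = ["sport", "sport", "finance", "finance"]

terms = select_terms(docs, labels, top_k=4)
train_vecs = [vectorize(doc, terms) for doc in docs]
prediction = knn_predict(train_vecs, labels, vectorize(["goal", "team", "bank"], terms), k=3)
print(terms, prediction)
```

In this sketch, terms that appear in every document of one category and in none of the other score highest, so the reduced vocabulary keeps the most category-specific words before the KNN vote.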
This article is indexed in CNKI and other databases.