
A Document Clustering Algorithm Based on Word Co-occurrence (一种基于词共现的文档聚类算法)

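Cite this article:	CHANG Peng, FENG Nan, MA Hui. A document clustering algorithm based on word co-occurrence[J]. Computer Engineering (计算机工程), 2012, 38(2): 213-214.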

Authors:	CHANG Peng, FENG Nan, MA Hui

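Affiliations:	1. College of Management and Economics, Tianjin University, Tianjin 300072; Information and Network Center, Tianjin University, Tianjin 300072. 2. College of Management and Economics, Tianjin University, Tianjin 300072. 3. Department of Management Engineering, Tianjin Institute of Urban Construction, Tianjin 300384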

Funding:	National Natural Science Foundation of China (Grant No. 70901054)

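Abstract:	To address the information loss that arises when representing the topics of a text, a document clustering algorithm based on word co-occurrence is proposed. Words that frequently co-occur across the document set are used to build a topic-vector representation model for documents. The model is applied in a hierarchical clustering algorithm, and clustering entropy is used to find the optimal partition of the hierarchy, so that the topical relationships between documents are reflected accurately. Experimental results show that the algorithm produces better results than other phrase-based hierarchical document clustering algorithms.
Keywords:	document clustering; document model; word co-occurrence; document similarity; clustering gain
Received:	2011-07-05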
Document Clustering Algorithm Based on Word Co-occurrence

CHANG Peng, FENG Nan, MA Hui. Document Clustering Algorithm Based on Word Co-occurrence[J]. Computer Engineering, 2012, 38(2): 213-214.

Authors:	CHANG Peng, FENG Nan, MA Hui

Affiliation:	1a. School of Management; 1b. Information and Network Center, Tianjin University, Tianjin 300072, China; 2. Department of Management Engineering, Tianjin Institute of Urban Construction, Tianjin 300384, China

Abstract:	This paper presents a document clustering algorithm based on word co-occurrence to address the information loss that occurs when the topics of a text are represented. It uses the words that frequently co-occur across the document set to build a topic-vector representation model for documents, applies this model in a hierarchical clustering algorithm, and uses clustering entropy to find the best partition of the hierarchy, so that the topical relationships between documents are reflected accurately. Experimental results show that the algorithm outperforms other phrase-based hierarchical document clustering algorithms.

Keywords:	document clustering; document model; word co-occurrence; document similarity; clustering gain
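
The pipeline outlined in the abstract (frequent co-occurring words, topic vectors, hierarchical clustering, choice of the best level) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the vector weighting, the linkage rule, or the clustering-entropy formula. The sketch therefore assumes binary vectors over frequent word pairs, merges the two clusters whose centroids are most similar by cosine, and scores each level with a simple stand-in criterion (the sum over clusters of (size - 1) times the squared distance between the cluster centroid and the global centroid). All function names and the `min_support` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support=2):
    """Word pairs that co-occur in at least `min_support` documents."""
    counts = Counter()
    for words in docs:
        counts.update(combinations(sorted(set(words)), 2))
    return sorted(pair for pair, n in counts.items() if n >= min_support)


def topic_vector(words, pairs):
    """Binary vector: 1.0 where both words of a frequent pair occur in the document."""
    present = set(words)
    return [1.0 if a in present and b in present else 0.0 for a, b in pairs]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def level_score(clusters, vectors):
    """Stand-in criterion (an assumption, not the paper's clustering entropy)."""
    c0 = centroid(vectors)
    total = 0.0
    for members in clusters:
        cj = centroid([vectors[i] for i in members])
        total += (len(members) - 1) * sum((a - b) ** 2 for a, b in zip(cj, c0))
    return total


def cluster(texts, min_support=2):
    """Agglomerative clustering; returns the level with the highest score."""
    docs = [t.lower().split() for t in texts]
    pairs = frequent_pairs(docs, min_support)
    vectors = [topic_vector(d, pairs) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    best, best_score = [c[:] for c in clusters], level_score(clusters, vectors)

    def similarity(ab):
        a, b = ab
        return cosine(centroid([vectors[i] for i in clusters[a]]),
                      centroid([vectors[i] for i in clusters[b]]))

    while len(clusters) > 1:
        # Merge the two clusters whose centroids are most similar (a < b).
        a, b = max(combinations(range(len(clusters)), 2), key=similarity)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        score = level_score(clusters, vectors)
        if score > best_score:
            best, best_score = [c[:] for c in clusters], score
    return best
```

On a toy input of four short documents, two sharing the words "apple banana fruit" and two sharing "engine wheel car", `cluster` returns `[[0, 1], [2, 3]]`: the shared word pairs become the vector dimensions, and the stand-in score peaks at the two-cluster level.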
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.