
Border distance based multi-vector document clustering method (基于边界距离的多向量文本聚类方法)

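Cite this article:	CAI Dong-feng, WANG Zhi-chao, JI Duo, ZHANG Gui-ping. Border distance based multi-vector document clustering method[J]. Computer Engineering and Applications (计算机工程与应用), 2008, 44(3): 198-201.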

Authors:	蔡东风 (CAI Dong-feng), 王智超 (WANG Zhi-chao), 季铎 (JI Duo), 张桂平 (ZHANG Gui-ping)

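Affiliation:	Natural Language Processing Research Laboratory, Shenyang Institute of Aeronautical Engineering (沈阳航空工业学院), Shenyang 110034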

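Funding:	National High Technology Research and Development Program of China (863 Program); Key Project of Science and Technology Research of the Ministry of Education of China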

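Abstract:	Document clustering is an important research topic in natural language processing, with its main applications in fields such as information retrieval and Web mining. Its key issues are document representation and the clustering algorithm. Building on hierarchical clustering, this paper proposes a new border-distance-based hierarchical clustering algorithm. It takes the distance between the border (edge) sample points of two clusters as the inter-cluster distance, which makes effective use of the clusters' boundary information and improves the accuracy of the inter-cluster distance computation. To account for the different contributions that features of different parts of speech make to a document, documents are represented with a multi-vector model. Experiments on different document collections show that the border-distance-based multi-vector document clustering algorithm achieves good performance.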
Keywords:	distance computation; document representation; multi-vector; document clustering
Article ID:	1002-8331(2008)03-0198-04
Revised:	August 1, 2007
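
The border-distance idea can be illustrated with a small agglomerative clustering sketch. The abstracts state only that the inter-cluster distance is computed from the documents at the borders of the two clusters (the English abstract specifies an average of those distances). They do not say how border documents are selected. This sketch therefore assumes a hypothetical rule: the border of cluster A with respect to cluster B consists of the `k` members of A nearest to B's centroid. Euclidean distance over dense vectors, the parameter `k`, and all function names are likewise assumptions for illustration, not the authors' implementation.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def border_points(cluster, other, k):
    """Hypothetical border rule: the k members of `cluster` nearest to the centroid of `other`."""
    c = centroid(other)
    return sorted(cluster, key=lambda p: math.dist(p, c))[:k]


def border_distance(a, b, k=2):
    """Average Euclidean distance over all pairs of border points of clusters a and b."""
    border_a = border_points(a, b, k)
    border_b = border_points(b, a, k)
    total = sum(math.dist(p, q) for p in border_a for q in border_b)
    return total / (len(border_a) * len(border_b))


def agglomerative(points, n_clusters, k=2):
    """Start from singletons and repeatedly merge the two clusters with the smallest border distance."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: border_distance(clusters[ij[0]], clusters[ij[1]], k),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters


# Toy "documents" as 2-D vectors: two well-separated groups of three.
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
result = agglomerative(docs, n_clusters=2)
for cluster in result:
    print(cluster)
```

Compared with single linkage (minimum over all pairs) or average linkage (mean over all pairs), this variant averages only over pairs of border documents, so interior documents far from the other cluster do not dilute the distance. The naive loop recomputes every pairwise distance at each merge, which is fine for a toy example but would need caching for a real corpus.
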
Border distance based multi-vector document clustering method

CAI Dong-feng, WANG Zhi-chao, JI Duo, ZHANG Gui-ping. Border distance based multi-vector document clustering method[J]. Computer Engineering and Applications, 2008, 44(3): 198-201.

Authors:	CAI Dong-feng, WANG Zhi-chao, JI Duo, ZHANG Gui-ping

Affiliation:	Natural Language Processing Research Laboratory, Shenyang Institute of Aeronautical Engineering, Shenyang 110034, China

Abstract:	Document clustering is an important task in natural language processing and is widely applicable in areas such as information retrieval and Web mining. Document representation and the clustering algorithm are the key issues in document clustering. To improve the accuracy of inter-cluster distance calculation, this paper puts forward a novel border-distance-based document clustering approach, which takes the average of the distances between documents at the borders of two clusters as the distance between that pair of clusters, thereby exploiting the clusters' border information. To account for the different contributions of different kinds of terms, documents are represented by multiple vectors. Experimental results on different corpora show that the proposed approach outperforms other widely used hierarchical clustering methods.

Keywords:	distance computation; document representation; multi-vector; document clustering
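
The multi-vector representation can be sketched as follows. The abstracts say only that terms of different kinds (parts of speech) contribute differently to a document; the POS groups (`n`, `v`, `other`), their weights, raw term-frequency weighting, and combining per-group cosine similarities as a weighted sum are hypothetical choices made here for illustration. Input documents are assumed to be already segmented and POS-tagged as `(term, pos)` pairs.

```python
import math
from collections import Counter, defaultdict

# Hypothetical POS groups and weights (they sum to 1); not taken from the paper.
WEIGHTS = {"n": 0.6, "v": 0.3, "other": 0.1}


def to_multi_vector(tagged_tokens, weights=WEIGHTS):
    """Split a POS-tagged document into one term-frequency vector per POS group."""
    vectors = defaultdict(Counter)
    for term, pos in tagged_tokens:
        group = pos if pos in weights else "other"
        vectors[group][term] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse Counter vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def multi_vector_similarity(d1, d2, weights=WEIGHTS):
    """Weighted sum of per-group cosine similarities between two multi-vector documents."""
    return sum(
        w * cosine(d1.get(g, Counter()), d2.get(g, Counter()))
        for g, w in weights.items()
    )


a = to_multi_vector([("cluster", "n"), ("text", "n"), ("compute", "v"), ("quickly", "d")])
b = to_multi_vector([("cluster", "n"), ("text", "n"), ("merge", "v")])
print(multi_vector_similarity(a, b))  # noun vectors match; verb and "other" vectors do not
```

Because the weights sum to 1, the combined similarity stays in [0, 1], so `1 - multi_vector_similarity(...)` could serve as the document distance fed into a hierarchical clustering step.
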
This article is indexed in CNKI, Wanfang Data, and other databases.