
Research on Chinese Word Clustering (中文词聚类研究)

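Cite this article:	HU He-ping, ZENG Qing-rui, LU Song-feng. Research on Chinese Word Clustering (中文词聚类研究)[J]. Computer Engineering & Science (计算机工程与科学), 2006, 28(1): 122-124.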

Authors:	HU He-ping (胡和平), ZENG Qing-rui (曾庆锐), LU Song-feng (路松峰)

Affiliation:	School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074

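Abstract:	Word clustering is an important foundational step in automatic language processing. Chinese word clustering is hampered mainly by scarce and low-quality training data, which degrades clustering results. To address this obstacle, this paper proposes a word clustering algorithm oriented to Chinese. The algorithm uses the similarity of words' context distributions as its distance measure. It then analyzes the shortcomings of clustering Chinese words by the distance measure alone, introduces the concept of a word's near space (Word-Near-Space), and clusters according to that concept, so that clustering relies on the intrinsic semantics of words without the number or size of the classes having to be specified. Finally, the algorithm takes the clustering result as the basis for recomputing similarity and performs iterative EM clustering, which markedly improves the result. Experiments show that the algorithm effectively overcomes the quantity and quality problems of Chinese training data and yields good clustering results.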
Keywords:	Chinese words; word clustering; word near space (Word-Near-Space); EM algorithm
Article ID:	1007-130X(2006)01-0122-03
Revision date:	November 12, 2004
Research on Chinese Word Clustering

HU He-ping, ZENG Qing-rui, LU Song-feng. Research on Chinese Word Clustering[J]. Computer Engineering & Science, 2006, 28(1): 122-124.

Authors:	HU He-ping, ZENG Qing-rui, LU Song-feng

Abstract:	Word clustering is an important foundational task in automatic language processing. To address the scarcity and low quality of training data, the main obstacle to Chinese word clustering, this paper presents a Chinese-oriented algorithm. First, the context similarity of a word is used as the distance measure. Second, the limitations of relying on the distance measure alone are analyzed. Then the concept of Word-Near-Space is put forward, which lets word clustering proceed without fixing the total number of classes in advance. Finally, the context similarity is recalculated from the classes produced by clustering, and the above steps are repeated until the algorithm converges, consistent with the EM criterion. Experiments show that the algorithm effectively overcomes these two obstacles and produces good clustering results.

Keywords:	Chinese word clustering; Word-Near-Space; EM algorithm
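
The loop described in the abstract (context similarity, grouping, then recomputing similarity from the resulting classes) can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not define Word-Near-Space, so the sketch substitutes a simple similarity threshold with connected components, and it uses hard cluster assignments rather than a full probabilistic EM. The function names, the cosine measure, the window size, the threshold of 0.45, and the toy pre-segmented corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, label, window=1):
    """Count (offset, labelled neighbour) pairs around every word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for off in range(-window, window + 1):
                j = i + off
                if off != 0 and 0 <= j < len(sent):
                    vectors[word][(off, label(sent[j]))] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, threshold):
    """Assumed stand-in for Word-Near-Space: link words whose similarity
    reaches the threshold and return each word's connected component."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(set)
    for w in words:
        groups[find(w)].add(w)
    return {w: frozenset(groups[find(w)]) for w in words}

def iterate_clustering(sentences, threshold=0.45, max_rounds=10):
    """Re-cluster until stable, describing contexts by the previous
    round's clusters instead of by raw words."""
    assignment = None
    for _ in range(max_rounds):
        if assignment is None:
            label = lambda w: w                    # first round: raw words
        else:
            label = lambda w, a=assignment: a[w]   # later rounds: cluster labels
        new_assignment = cluster(context_vectors(sentences, label), threshold)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment

# Hypothetical, already word-segmented toy corpus.
sentences = [
    ["猫", "吃", "鱼"],
    ["狗", "吃", "肉"],
    ["猫", "喝", "水"],
    ["狗", "喝", "水"],
]
clusters = iterate_clustering(sentences)
for group in sorted(set(clusters.values()), key=sorted):
    print(sorted(group))
```

On this toy corpus the first pass keeps 吃 and 喝 apart because their raw right-hand contexts differ. Once contexts are described by cluster labels, the two verbs merge, and in the following round 水 joins 鱼 and 肉. Number and size of the groups are never specified, only the threshold, which loosely mirrors the abstract's point that the class count need not be fixed in advance.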
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.