Statistical Chinese Chunking Model Based on Word Clustering Features

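Cite this article:	SUN Guang-lu, WANG Xiao-long, LIU Bing-quan, GUAN Yi. Statistical Chinese Chunking Model Based on Word Clustering Features[J]. Acta Electronica Sinica, 2008, 36(12): 2450-2453.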

Authors:	SUN Guang-lu, WANG Xiao-long, LIU Bing-quan, GUAN Yi

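Affiliations:	1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; 2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China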

Funding:	National Natural Science Foundation of China (No. 60435020, No. 60673037); National 863 Program of China (No. 2006AA01Z197, No. 2007AA01Z172)

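Abstract:	An entropy-based hierarchical word clustering algorithm is proposed, and the word clusters it generates are used as features in a Chinese chunking model. Based on information-entropy theory, the clustering algorithm takes the words in a Chinese chunking corpus and their chunk tags as its basic information and applies binary hierarchical clustering to form word clusters that carry a degree of syntactic function. An optimization algorithm is designed to reduce clustering time. The word-cluster features replace the traditional part-of-speech features in the chunking model, named entity and factoid recognition modules are introduced, and on this basis a Chinese chunking system based on the maximum entropy Markov model is constructed. Experiments show that the proposed algorithm improves clustering efficiency and that the resulting word-cluster features effectively improve the performance of the Chinese chunking system.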
Keywords:	word clustering; information entropy; Chinese chunking; syntactic function
Received:	2007-06-04
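The abstract outlines the clustering step but does not give its exact entropy criterion or its speed-up. The Python sketch below is one plausible reading under stated assumptions, not the authors' algorithm. Each word is represented by its counts over chunk tags (a BIO-style tag set such as `B-NP`/`I-NP` is assumed). At each step, the two clusters whose merge least increases the count-weighted entropy of the tag distribution are merged, which is equivalent to losing the least mutual information between cluster and chunk tag. The pair scan is naive and would be slow on a real vocabulary. `token_features` is a hypothetical feature template showing how cluster IDs could stand in for POS tags in an MEMM-style chunker. All function names and the toy counts are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def weighted_entropy(counts: Counter) -> float:
    """Total count times the Shannon entropy (bits) of a chunk-tag distribution."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return total * h


def merge_cost(a: Counter, b: Counter) -> float:
    """Increase in count-weighted entropy from merging clusters a and b.

    Mathematically >= 0 (entropy is concave). It equals the corpus token count
    times the drop in mutual information I(cluster; chunk tag) caused by the merge.
    """
    return weighted_entropy(a + b) - weighted_entropy(a) - weighted_entropy(b)


def cluster_words(word_tag_counts, num_clusters):
    """Greedy bottom-up binary clustering of words by chunk-tag distribution.

    word_tag_counts maps each word to a Counter of its chunk tags.
    Returns (clusters, merges): the word lists of the remaining clusters and a
    (left_id, right_id, new_id) record of every binary merge, which together
    describe the binary hierarchy.
    """
    items = sorted(word_tag_counts.items())
    clusters = {i: ([w], Counter(tags)) for i, (w, tags) in enumerate(items)}
    next_id = len(clusters)
    merges = []
    while len(clusters) > max(num_clusters, 1):
        # Naive scan of all pairs each step; the paper's speed-up is not modeled.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: merge_cost(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        left_words, left_counts = clusters.pop(left)
        right_words, right_counts = clusters.pop(right)
        clusters[next_id] = (left_words + right_words, left_counts + right_counts)
        merges.append((left, right, next_id))
        next_id += 1
    return [sorted(words) for words, _ in clusters.values()], merges


def token_features(words, i, word2cluster, prev_tag):
    """Hypothetical MEMM-style feature template for token i.

    Cluster IDs in a +/-1 window take the place of POS-tag features;
    prev_tag is the previous chunk tag that an MEMM conditions on.
    """
    def cluster_of(k):
        if 0 <= k < len(words):
            return word2cluster.get(words[k], "UNK")
        return "PAD"

    return [
        f"w0={words[i]}",
        f"c-1={cluster_of(i - 1)}",
        f"c0={cluster_of(i)}",
        f"c+1={cluster_of(i + 1)}",
        f"t-1={prev_tag}",
    ]


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    counts = {
        "苹果": Counter({"B-NP": 8, "I-NP": 2}),  # "apple"
        "香蕉": Counter({"B-NP": 7, "I-NP": 3}),  # "banana"
        "吃": Counter({"B-VP": 9, "I-VP": 1}),    # "eat"
        "喜欢": Counter({"B-VP": 8, "I-VP": 2}),  # "like"
    }
    clusters, merges = cluster_words(counts, 2)
    print(clusters)  # [['苹果', '香蕉'], ['吃', '喜欢']]
```

Recording the merges, rather than only the final clusters, keeps the binary hierarchy, so clusters could also be read off at a coarser or finer cut of the tree.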
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.