Research on new word detection

Cite this article:	ZHONG Jiang, GENG Shenghua, DONG Gaofeng. Research on new word detection[J]. Digital Communication, 2013(2): 1-5.

Authors:	ZHONG Jiang (钟将), GENG Shenghua (耿升华), DONG Gaofeng (董高峰)

Affiliation:	College of Computer Science, Chongqing University, Chongqing 400030

Funding:	Supported by the Young Scientists Fund of the National Natural Science Foundation of China (61103114)

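Abstract:	Automatic Chinese word segmentation is the foundation of Chinese information processing. At present, one of the main difficulties in automatic Chinese word segmentation is the automatic recognition of new words, especially new words that are not proper nouns. Automatic new word recognition is also very important for compiling Chinese dictionaries. This paper proposes a new method for recognizing out-of-vocabulary words. The method scores candidate new words with a combination of three measures (mutual information, information entropy, and word frequency) and adds a garbage-string filtering mechanism on top of them, which substantially improves the accuracy and recall of new word recognition.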
Keywords:	new word; mutual information; information entropy; word frequency; garbage string
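The abstract names the three scoring measures but not their formulas. As a rough illustration only, the sketch below uses one commonly used formulation, which may differ from the paper's: the frequency of a character n-gram, its internal cohesion as the minimum pointwise mutual information over every binary split, and its boundary freedom as the smaller of its left- and right-neighbour character entropies. The function names, the thresholds `min_freq`, `min_pmi` and `min_entropy`, and the rule that a candidate must pass all three are assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def ngram_counts(text, max_len):
    """Count every character n-gram of length 1..max_len in text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def entropy(neighbours):
    """Shannon entropy (natural log) of a neighbour-character Counter; 0 if empty."""
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in neighbours.values())


def score_candidates(text, max_len=4, min_freq=2, min_pmi=1.0, min_entropy=0.5):
    """Return {candidate: (freq, pmi, boundary_entropy)} for accepted candidates.

    Hypothetical acceptance rule: all three thresholds must be met.
    """
    counts = ngram_counts(text, max_len)
    total = len(text)  # simplification: one normaliser for every n-gram length

    def prob(s):
        return counts[s] / total

    # Record the characters seen immediately left and right of each n-gram (n >= 2).
    left, right = defaultdict(Counter), defaultdict(Counter)
    for n in range(2, max_len + 1):
        for i in range(len(text) - n + 1):
            w = text[i:i + n]
            if i > 0:
                left[w][text[i - 1]] += 1
            if i + n < len(text):
                right[w][text[i + n]] += 1

    accepted = {}
    for w, freq in counts.items():
        if len(w) < 2 or freq < min_freq:
            continue
        # Internal cohesion: PMI of the weakest binary split of w.
        pmi = min(math.log(prob(w) / (prob(w[:k]) * prob(w[k:])))
                  for k in range(1, len(w)))
        # Boundary freedom: the less varied of the two neighbour distributions.
        boundary = min(entropy(left[w]), entropy(right[w]))
        if pmi >= min_pmi and boundary >= min_entropy:
            accepted[w] = (freq, pmi, boundary)
    return accepted
```

For example, in the toy string `"甲北京乙丙北京丁戊北京己庚北京辛"` only `北京` is accepted: it occurs four times, its PMI is ln 4, and it has four distinct left and four distinct right neighbours, so both boundary entropies are also ln 4. Taking the minimum over splits and over the two sides means a candidate is judged by its weakest point.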
Research on new word detection

ZHONG Jiang, GENG Shenghua, DONG Gaofeng. Research on new word detection[J]. Digital Communication, 2013(2): 1-5.

Authors:	ZHONG Jiang, GENG Shenghua, DONG Gaofeng

Affiliation:	(College of Computer Science, Chongqing University, Chongqing 400030,P.R. China)

Abstract:	Automatic Chinese word segmentation is the basis of Chinese information processing. Currently, a major problem in automatic Chinese word segmentation is new word identification, which is also important for compiling Chinese dictionaries. This paper presents a new method for new word identification. It combines three measures (mutual information, information entropy, and word frequency) and adds a garbage-string filtering mechanism. The method greatly improves the accuracy and recall of new word identification.

Keywords:	new word; mutual information; information entropy; word frequency; garbage string
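The abstract mentions a garbage-string filtering mechanism but does not say which strings count as garbage. Purely for illustration, the sketch below assumes three hypothetical rules: a candidate is discarded if it contains punctuation, digits, symbols or whitespace; if it starts or ends with a common function character such as 的 or 了; or if it is already in an existing lexicon and so is not new. `STOP_EDGE_CHARS`, `is_garbage` and `filter_candidates` are invented names, and the character list is an assumption.

```python
import unicodedata

# Hypothetical set of high-frequency function characters; a string that begins or
# ends with one of them is treated as a fragment straddling a word boundary.
STOP_EDGE_CHARS = frozenset("的了是在和与着")


def is_garbage(candidate):
    """Hypothetical garbage-string test for a candidate new word."""
    if not candidate:
        return True
    # Unicode general categories P*, N*, Z*, S*: punctuation, numbers, separators, symbols.
    if any(unicodedata.category(ch)[0] in "PNZS" for ch in candidate):
        return True
    return candidate[0] in STOP_EDGE_CHARS or candidate[-1] in STOP_EDGE_CHARS


def filter_candidates(candidates, lexicon=frozenset()):
    """Drop garbage strings and strings already in the lexicon (i.e. not new)."""
    return [c for c in candidates if c not in lexicon and not is_garbage(c)]
```

With `lexicon={"新词"}`, filtering `["北京", "的北京", "北京了", "北京，", "2013年", "新词"]` keeps only `北京`. A rule-based filter like this is cheap to run after statistical scoring, because frequency and cohesion alone can still admit strings that straddle a word boundary.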
This article is indexed in CNKI and other databases.