Survey of Chinese New Words Identification

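Cite this article:	ZHANG Hai-jun, SHI Shu-min, ZHU Chao-yong, HUANG He-yan. Survey of Chinese New Words Identification[J]. Computer Science, 2010, 37(3): 6-10.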

Authors:	ZHANG Hai-jun, SHI Shu-min, ZHU Chao-yong, HUANG He-yan

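Affiliations:	1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; Department of Computer Science, Xinjiang Normal University, Urumqi 830054; 2. Research Center of Computer and Language Information Engineering, Chinese Academy of Sciences, Beijing 100097; 3. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027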

Funding:	Supported by the National Natural Science Foundation of China (60672149) and a Key Project of the National 863 Program (2006AA010109)

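Abstract:	New word identification is a key technology in Chinese information processing. It comprises two main tasks: extracting and filtering candidate strings, and guessing the part of speech (POS) of new words. Chinese has no explicit symbol marking word boundaries, so any sequence of adjacent characters may form a word, which makes extracting and filtering new words very difficult. Because no prior knowledge or statistical data is available for new words, POS guessing for them has long been a technical bottleneck in Chinese POS tagging. This paper analyzes in detail the state of research on Chinese new word identification, focuses on the methods for candidate new word extraction and POS guessing and their main open problems, and concludes with an outlook on future research directions.
Keywords:	new word identification; unknown words; candidate strings; training corpus; POS guessing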
Received:	2009-04-30
Revised:	2009-07-24
Survey of Chinese New Words Identification

ZHANG Hai-jun, SHI Shu-min, ZHU Chao-yong, HUANG He-yan. Survey of Chinese New Words Identification[J]. Computer Science, 2010, 37(3): 6-10.

Authors:	ZHANG Hai-jun, SHI Shu-min, ZHU Chao-yong, HUANG He-yan

Affiliation:	School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; Research Center of Computer and Language Information Engineering, Chinese Academy of Sciences, Beijing 100097, China; Department of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China

Abstract:	New Words Identification (NWI) is a key technology in the field of Chinese information processing. NWI mainly comprises two tasks: extracting and filtering new word candidates, and guessing the part of speech (POS) of new words. Since Chinese has no specific symbol to mark word boundaries, any adjacent characters may compose a word, which creates many obstacles for NWI. Moreover, because prior knowledge and statistical data are not available, new words POS guessing has become the technological...

Keywords:	New words identification; Unknown words; Candidate string; Training corpus; POS guessing
This article is indexed in CNKI, Wanfang Data, and other databases.