Chinese Named Entity Recognition Incorporating Global Word Boundary Features (融合全局词语边界特征的中文命名实体识别方法)

Cite this article:	LIU Bingyang, WU Dayong, LIU Xinran, CHENG Xueqi. 融合全局词语边界特征的中文命名实体识别方法 [J]. 中文信息学报 (Journal of Chinese Information Processing), 2017, 31(2): 86-91.

Authors:	LIU Bingyang (刘冰洋), WU Dayong (伍大勇), LIU Xinran (刘欣然), CHENG Xueqi (程学旗)

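Affiliations:	1. Research Center of Web Data Science and Engineering, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100190, China; 3. National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), Beijing 100029, China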

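Funding:	National Natural Science Foundation of China (61232010, 61100083); National 973 Program (2012CB316303); National 863 Program (2012AA011003); National Key Technology R&D Program (2012BAH46B04); National Security Special Project (2013A140)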

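Abstract:	Supervised character-level sequence labeling models are commonly used for Chinese named entity recognition. In practical applications, we found that word boundary errors are one of the main factors limiting the performance of such models: boundary errors account on average for 47.5% of all erroneous results. This paper introduces global word boundary features into an averaged perceptron model, which raises the F-score for person, location, and organization name recognition by 0.04 on average and lowers the share of boundary errors among erroneous results.
Keywords:	named entity recognition; character sequence labeling; global features; word boundary features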
Chinese Named Entity Recognition Incorporating Global Word Boundary Features

LIU Bingyang, WU Dayong, LIU Xinran, CHENG Xueqi. Chinese Named Entity Recognition Incorporating Global Word Boundary Features[J]. Journal of Chinese Information Processing, 2017, 31(2): 86-91.

Authors:	LIU Bingyang, WU Dayong, LIU Xinran, CHENG Xueqi

Affiliation:	Research Center of Web Data & Engineering, Institute of Computing Technology, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, Beijing 100190, China; CNCERT/CC, Beijing 100029, China

Abstract:	Supervised character sequence labeling is a popular approach to Chinese named entity recognition (NER). In practice, such models suffer from word boundary errors, which account for roughly 47.5% of all errors. This paper incorporates global word boundary features into an averaged perceptron model. Experiments indicate that the F-value for recognizing person names, location names, and organization names improves by 0.04 on average, and that the proportion of boundary errors among all errors decreases.

Keywords:	named entity recognition; sequence labeling; global feature; word boundary feature
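
The abstract's central measurement, the share of errors that are boundary errors, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes per-character BIO tags (for example `B-PER`, `I-PER`, `O`) and assumes that a "boundary error" means a wrong predicted span that overlaps a gold span of the same entity type; the paper's exact definition may differ. The function names `bio_to_spans` and `boundary_error_share` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-character BIO tags into entity spans.

    Stray I- tags that do not continue an open span are ignored.
    """
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def boundary_error_share(gold: List[Span], pred: List[Span]) -> float:
    """Fraction of wrong predicted spans that are boundary errors.

    Assumed definition: a wrong span is a boundary error if it overlaps
    a gold span of the same type (right entity, wrong edges).
    """
    gold_set = set(gold)
    wrong = [p for p in pred if p not in gold_set]
    if not wrong:
        return 0.0

    def overlaps(p: Span, g: Span) -> bool:
        return p[2] == g[2] and p[0] < g[1] and g[0] < p[1]

    boundary = sum(1 for p in wrong if any(overlaps(p, g) for g in gold))
    return boundary / len(wrong)


# Toy sentence of five characters: 张 三 去 北 京
gold_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
pred_tags = ["B-PER", "O", "O", "B-ORG", "I-ORG"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
share = boundary_error_share(gold_spans, pred_spans)
print(gold_spans, pred_spans, share)
```

In this toy case the truncated person name counts as a boundary error, while the location mislabeled as an organization counts as a type error, so half of the wrong spans are boundary errors.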
