
Recognition of Person Names Based on Reliability (一种基于可信度的人名识别方法)

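Cite as:	LUO Zhi-yong, SONG Rou. Recognition of Person Names Based on Reliability [J]. Journal of Chinese Information Processing (中文信息学报), 2005, 19(3): 68-72, 86.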

Authors:	LUO Zhi-yong (罗智勇), SONG Rou (宋柔)

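Affiliations:	1. College of Computer Science, Beijing University of Technology, Beijing 100022; 2. Institute of Language Information Processing, Beijing Language and Culture University, Beijing 100083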

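Funding:	National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program); Science and Technology Fund of the Ministry of Education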

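Abstract:	Proper name recognition is an important factor in the accuracy of automatic Chinese word segmentation, and one of its main difficulties. Taking person name recognition as an example, this paper analyzes the shortcomings in probability estimation of the currently popular proper name recognition methods based on corpora and statistical language models. Building on a combination of rules and statistics, it proposes a reliability-based method for person name recognition, together with an incremental model training method that overcomes the limited size of manually annotated corpora. In tests on People's Daily (《人民日报》) text from January 1998 and December 2000 (about 3.79 million characters in total), the reliability-based method gives somewhat better recognition results than the traditional probability estimation method.
Keywords:	computer application; Chinese information processing; automatic word segmentation; person name recognition; statistical methods; reliability
Article ID:	1003-0077(2005)03-0067-06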
Recognition of Person Names Based on Reliability

LUO Zhi-yong, SONG Rou. Recognition of Person Names Based on Reliability [J]. Journal of Chinese Information Processing, 2005, 19(3): 68-72, 86.

Authors:	LUO Zhi-yong, SONG Rou

Affiliation:	1. College of Computer Science, Beijing University of Technology, Beijing 100022, China; 2. College of Information Science, Beijing Language and Culture University, Beijing 100083, China

Abstract:	Recognition of proper nouns is one of the most important parts of a word segmentation system for modern Chinese. This paper first analyzes the shortcomings of traditional proper noun recognition methods based on statistical language models and other corpus-based models. Second, we put forward a recognition strategy for person names based on reliability. We also train the model with a bootstrapping method that is not limited by the size of the manually tagged corpus. A large-scale test on a real corpus shows that this method successfully resolves the problem of mis-estimation of candidate proper nouns in traditional methods. In addition, our method is comparable to traditional corpus-based methods.

Keywords:	computer application; Chinese information processing; word segmentation; recognition of person names; statistical method; reliability
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.