
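Chinese personal name recognition based on multilevel thresholds (基于多级阈值的中文人名识别)
Cite this article: YU Zu-bo, GAO Qing-shi, MA Jian-jun. Chinese personal name recognition based on multilevel threshold[J]. Computer Engineering and Applications (计算机工程与应用), 2007, 43(33): 1-3
Authors: YU Zu-bo (余祖波)  GAO Qing-shi (高庆狮)  MA Jian-jun (马建军)
Affiliations: Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116023; Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116023; Institute of Intelligence, Linguistics and Computer Science, University of Science and Technology Beijing, Beijing 100083
Funding: National High Technology Research and Development Program of China (863 Program)
Abstract: Based on statistics over a large-scale sample base of personal names, this paper studies the character usage patterns of surnames and given names across the various kinds of Chinese personal names. Through statistical analysis of a large-scale corpus, it obtains, for each surname character, the probability that the character is used as an actual surname in real text, together with its contextual patterns. For Han Chinese names on the one hand, and for ethnic minority names and transliterated names on the other, it proposes the concepts of multilevel surname thresholds and multilevel first-character thresholds, respectively, and determines the thresholds with the 3σ rule. Experimental results show that the multilevel-threshold model for Chinese personal name recognition is effective.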

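Keywords: natural language processing  unknown word recognition  Chinese personal name recognition  multilevel threshold  3σ rule
Article ID: 1002-8331(2007)33-0001-03
Revised: 2007-08-01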

Chinese personal name recognition based on multilevel threshold
YU Zu-bo, GAO Qing-shi, MA Jian-jun. Chinese personal name recognition based on multilevel threshold[J]. Computer Engineering and Applications, 2007, 43(33): 1-3
Authors: YU Zu-bo  GAO Qing-shi  MA Jian-jun
Affiliation: 1. Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116023, China 2. Institute of Intelligence, Linguistics and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
Abstract: Based on a large-scale database of personal names, this paper presents the patterns governing surname characters and given-name characters across all kinds of Chinese personal names. Using statistics gathered from a large-scale corpus, it also gives the probability that each surname character is actually used as a surname, along with its contextual patterns. For Han Chinese personal names, a multilevel surname threshold is proposed. To recognize the personal names of Chinese ethnic minorities and transliterated personal names, a multilevel threshold on the first character of the name is proposed as well. These thresholds are chosen by the 3σ rule. The results show that the multilevel threshold model is effective in recognizing Chinese personal names.
Keywords: natural language processing  unknown word recognition  Chinese personal name recognition  multilevel threshold  3σ rule
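
The abstract says only that the thresholds are chosen by the 3σ rule and gives no formula. The Python sketch below shows one possible reading, and it is an assumption rather than the authors' procedure: cut-offs are placed at μ - kσ for k = 1, 2, 3 over the per-character surname probabilities, and each character is assigned the first level whose cut-off it reaches. The function names `three_sigma_levels` and `level_of` and the sample probabilities are hypothetical, made up for illustration only.

```python
import statistics


def three_sigma_levels(probabilities):
    """Return cut-offs at mu - k*sigma for k = 1, 2, 3, clipped to [0, 1].

    ASSUMPTION: this placement of the levels is illustrative; the source
    does not state how the 3-sigma rule maps to the individual thresholds.
    """
    mu = statistics.mean(probabilities)
    sigma = statistics.pstdev(probabilities)  # population standard deviation
    return [min(1.0, max(0.0, mu - k * sigma)) for k in (1, 2, 3)]


def level_of(p, cutoffs):
    """Return the index of the first cut-off that p reaches (0 = strictest).

    If p is below every cut-off, return len(cutoffs).
    """
    for level, cutoff in enumerate(cutoffs):
        if p >= cutoff:
            return level
    return len(cutoffs)


# Hypothetical probabilities that a character is used as a surname in text.
sample_probabilities = [0.2, 0.4, 0.6, 0.8]
cutoffs = three_sigma_levels(sample_probabilities)
levels = [level_of(p, cutoffs) for p in sample_probabilities]
```

Under this reading, a character at a stricter level could be accepted as a surname on weaker contextual evidence, while one at a looser level would need more supporting context. That pairing of level and context is also an assumption about how multilevel thresholds might be used.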
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.