A method of gender discrimination based on character feature of Chinese names
Original title: 基于中文人名用字特征的性别判定方法
Cite as: YU Jiang-de, ZHAO Hong-dan, ZHENG Bo-ju, YU Zheng-tao. A method of gender discrimination based on character feature of Chinese names[J]. Journal of Shandong University (Engineering Science) (山东大学学报(工学版)), 2014, 44(1): 13-18.
Authors: YU Jiang-de  ZHAO Hong-dan  ZHENG Bo-ju  YU Zheng-tao
Affiliation: 1. School of Computer and Information Engineering, Anyang Normal University, Anyang 455000, Henan, China; 2. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, Yunnan, China
Funding: National Natural Science Foundation of China (60863011); Basic and Frontier Technology Research Program of Henan Province (112300410182)
Abstract: The characters used in Chinese names discriminate strongly between genders. Building on this, the paper proposes a method that uses a naïve Bayes classifier to determine the gender associated with a Chinese name. For each name, the first character (Zi1), the second character (Zi2), and the combination of the first and second characters (Zi1Zi2) serve as distinguishing features, and naïve Bayes classification assigns the name to a gender. Training and testing used 10-fold cross-validation on a corpus of 412,775 Chinese names, comparing the accuracy obtained with different feature combinations: Zi1, Zi2, Zi1+Zi2, Zi1+Zi1Zi2, Zi2+Zi1Zi2, and Zi1+Zi2+Zi1Zi2 (all distinguishing features). The average accuracies were 72.75%, 86.92%, 88.84%, 87.37%, 89.35%, and 90.06%, respectively; the best average accuracy, 90.06%, was obtained with all distinguishing features.

Keywords: naïve Bayes classification  Chinese names  character feature  feature combination  gender discrimination  distinguishing feature
Received: 2013-06-28
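
The approach summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `NameGenderNB`, the helpers `features` and `cross_val_accuracy`, the additive smoothing parameter `alpha`, the handling of one-character names, the interleaved fold split, and the six toy names with their labels are all assumptions made for illustration. The abstract specifies only the three features (Zi1, Zi2, Zi1Zi2), the naïve Bayes classifier, and 10-fold cross-validation.

```python
import math
from collections import Counter, defaultdict

ALL_SLOTS = ("Zi1", "Zi2", "Zi1Zi2")


def features(name):
    """Map a name to its three features: first char, second char, and the pair."""
    zi1 = name[0]
    # Assumption: a one-character name gets an empty string as its second character.
    zi2 = name[1] if len(name) > 1 else ""
    return {"Zi1": zi1, "Zi2": zi2, "Zi1Zi2": zi1 + zi2}


class NameGenderNB:
    """Categorical naive Bayes over the name features (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing; an assumption, not from the abstract
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # (label, slot) -> value counts
        self.vocab = defaultdict(set)            # slot -> values seen in training

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for slot, value in features(name).items():
                self.feat_counts[(label, slot)][value] += 1
                self.vocab[slot].add(value)
        return self

    def predict(self, name, slots=ALL_SLOTS):
        """Return the label with the highest log posterior using the chosen slots."""
        total = sum(self.class_counts.values())
        feats = features(name)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for slot in slots:
                count = self.feat_counts[(label, slot)][feats[slot]]
                # One extra bucket in the denominator covers unseen values.
                denom = n + self.alpha * (len(self.vocab[slot]) + 1)
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_val_accuracy(names, labels, slots=ALL_SLOTS, k=10):
    """Average accuracy over k folds; fold i holds out samples i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        held_out = set(range(i, len(names), k))
        if not held_out:
            continue
        train = [(n, l) for j, (n, l) in enumerate(zip(names, labels))
                 if j not in held_out]
        clf = NameGenderNB().fit(*zip(*train))
        hits = sum(clf.predict(names[j], slots) == labels[j] for j in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / len(accuracies)


# Toy data invented for illustration; it is not drawn from the paper's corpus.
toy_names = ["建国", "志强", "建军", "秀英", "丽娟", "秀兰"]
toy_labels = ["M", "M", "M", "F", "F", "F"]

clf = NameGenderNB().fit(toy_names, toy_labels)
print(clf.predict("建强"), clf.predict("秀娟"))
print(cross_val_accuracy(toy_names, toy_labels, slots=("Zi1", "Zi2"), k=3))
```

Passing a different `slots` tuple to `predict` or `cross_val_accuracy` selects one of the feature combinations compared in the abstract, for example `("Zi1",)` alone or `("Zi2", "Zi1Zi2")`. On six toy names the resulting accuracy is not meaningful; the reported figures come from the 412,775-name corpus.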
This article is indexed in CNKI and other databases.