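Cite this article:XIONG Dan,LU Qin,LUO Fengzhu,SHI Dingxu,ZHAO Tiancheng.A Corpus-Based Study of Personal Names and Terms of Address in Ming and Qing Novels[J].Journal of Chinese Information Processing,2015,29(1):19-27.
Authors:XIONG Dan  LU Qin  LUO Fengzhu  SHI Dingxu  ZHAO Tiancheng
Affiliation:1. Department of Computing, The Hong Kong Polytechnic University, Hong Kong;
2. Department of Chinese Linguistics & Literature, Yuan Ze University, Taiwan;
3. Department of Chinese & Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong
Abstract:In natural language processing and its applications, personal names and terms of address are important named entities and a key part of information processing. From the perspective of named entity recognition and information extraction, and on the basis of annotating a corpus of four classical novels of the Ming and Qing dynasties, this paper constructs a classification and annotation scheme that treats names, courtesy and art names (字号), and terms of address as named entities. Personal names and terms of address are broadly divided into simple and compound types; compound types are further divided, according to their internal components and the way those components combine, into fixed, appositive, affiliation-nested, and flexibly nested types. Drawing on complete statistics from the corpus, the paper compares the types of personal names and terms of address and shows how each of the four novels uses them.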

Keywords:named entity annotation; classification of personal names and terms of address; corpus construction

A Corpus-Based Study on Personal Names and Terms of Address in Chinese Classical Novels
XIONG Dan,LU Qin,LUO Fengzhu,SHI Dingxu,ZHAO Tiancheng.A Corpus-Based Study on Personal Names and Terms of Address in Chinese Classical Novels[J].Journal of Chinese Information Processing,2015,29(1):19-27.
Authors:XIONG Dan  LU Qin  LUO Fengzhu  SHI Dingxu  ZHAO Tiancheng
Affiliation:1. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China;
2. Department of Chinese Linguistics & Literature, Yuan Ze University, Taiwan, China;
3. Department of Chinese & Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Abstract:Personal names and terms of address are important named entities, and recognizing them is an essential task in natural language processing. This paper presents a classification and annotation scheme for personal names and terms of address, developed from the perspective of named entity recognition and information extraction on a corpus of four Chinese classical novels. Personal names and terms of address are categorized into simple types and compound types. Compound types are further categorized into four subtypes: fixed expressions, appositive constructions, subordinate constructions of affiliation, and other subordinate constructions. The paper also presents a comparative analysis of these types and of the characteristics of the four novels, based on full statistics of the annotated corpus.
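
The two-level classification described in the abstract (simple versus compound, with four compound subtypes) could be represented roughly as follows. This is only an illustrative sketch, not the authors' annotation format: the names `EntityType`, `Annotation`, and `count_by_type` are hypothetical, and the sample strings `NAME_A` and `TITLE_B` are placeholders rather than data from the corpus.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class EntityType(Enum):
    """Categories named in the abstract; the identifiers are assumptions."""
    SIMPLE = "simple"
    FIXED = "fixed expression"
    APPOSITIVE = "appositive construction"
    SUBORDINATE_AFFILIATION = "subordinate construction of affiliation"
    SUBORDINATE_OTHER = "other subordinate construction"


@dataclass
class Annotation:
    """One annotated name or term of address (hypothetical record layout)."""
    text: str
    entity_type: EntityType
    # Component annotations of a compound entity; empty for simple ones.
    parts: List["Annotation"] = field(default_factory=list)

    @property
    def is_compound(self) -> bool:
        # Every category other than SIMPLE is a compound subtype.
        return self.entity_type is not EntityType.SIMPLE


def count_by_type(annotations: List[Annotation]) -> Dict[EntityType, int]:
    """Count top-level annotations per category (nested parts are not counted)."""
    return dict(Counter(a.entity_type for a in annotations))


# Placeholder data, not taken from the four novels.
name = Annotation("NAME_A", EntityType.SIMPLE)
title = Annotation("TITLE_B", EntityType.SIMPLE)
appositive = Annotation("TITLE_B NAME_A", EntityType.APPOSITIVE, parts=[title, name])

sample = [name, title, appositive]
counts = count_by_type(sample)
```

Keeping the components of a compound entity in `parts` is one way to preserve the nesting that the compound subtypes imply; per-type counts like those in `counts` are the kind of figure a comparison across novels would start from.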