汉语委婉语语言资源建设
引用本文:张辰麟,王明文,谭亦鸣,肖文艳.汉语委婉语语言资源建设[J].中文信息学报,2020,34(8):32-40.
作者姓名:张辰麟  王明文  谭亦鸣  肖文艳
作者单位:1.江西师范大学 计算机信息工程学院,江西 南昌 330022;
2.东南大学 网络空间安全学院,江苏 南京 211189
基金项目:国家自然科学基金(61876074)
摘    要:委婉语是语言交流中不可或缺的交际手段,委婉语研究一直是语言学界的热门话题之一,但在自然语言处理领域,尚未有委婉语相关研究。该文借助现有纸质词典,基于语料库检索和专家人工判别的方式,初步构建了规模为63 000余条语料的汉语委婉语语言资源;并根据自然语言处理的相关任务需求,结合词典释义对委婉语进行分类。该文提出了利用同类委婉语的上下文语境辅助进行标注的方法。经过实验,对简单语义委婉语的语义判别准确率达89.71%,对语义复杂的兼类委婉语的语义判别准确率达74.65%,初步验证了利用计算机辅助人工标注构建委婉语语言资源的可行性。

关 键 词:委婉语  语义辨析  语言资源构建  

Construction of Chinese Euphemism Resources
ZHANG Chenlin, WANG Mingwen, TAN Yiming, XIAO Wenyan. Construction of Chinese Euphemism Resources[J]. Journal of Chinese Information Processing, 2020, 34(8): 32-40.
Authors:ZHANG Chenlin  WANG Mingwen  TAN Yiming  XIAO Wenyan
Affiliation:1.School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China;
2.School of Cyberspace Security, Southeast University, Nanjing, Jiangsu 211189, China
Abstract:Euphemism is an indispensable means of communication and has long been one of the most actively studied topics in linguistics, yet it has received almost no attention in the natural language processing community. Drawing on existing printed dictionaries, and combining corpus retrieval with manual judgment by experts, this paper builds an initial Chinese euphemism language resource of more than 63,000 corpus entries. Euphemisms are classified at the semantic level according to the dictionary definitions and the needs of related natural language processing tasks. The paper also proposes an annotation-assistance method that uses the contexts of euphemisms in the same category. In experiments, semantic discrimination reached an accuracy of 89.71% for euphemisms with simple semantics and 74.65% for semantically complex euphemisms that belong to more than one category, which gives preliminary support for the feasibility of building euphemism language resources through computer-assisted manual annotation.
Keywords:euphemism  semantic discrimination  language resource construction  
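
The abstract mentions using the contexts of same-category euphemisms to assist annotation, but does not describe the procedure. The following is only an illustrative sketch of that general idea, not the authors' method: it pools already-labeled, pre-segmented contexts into one bag-of-words profile per label and suggests the label whose profile is most similar (by cosine similarity) to a new context. The labels `death` and `literal`, the function names, and all example sentences are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(labeled_contexts):
    """Pool the context tokens of all examples that share a label."""
    profiles = {}
    for label, tokens in labeled_contexts:
        profiles.setdefault(label, Counter()).update(tokens)
    return profiles


def suggest_label(context_tokens, profiles):
    """Return the most similar label and all scores, for a human annotator to confirm."""
    query = Counter(context_tokens)
    scores = {label: cosine(query, profile) for label, profile in profiles.items()}
    return max(scores, key=scores.get), scores


# Hypothetical, pre-segmented toy data (not taken from the paper's corpus).
LABELED = [
    ("death", ["爷爷", "去年", "因病", "去世", "全家", "悲痛"]),
    ("death", ["老人", "昨夜", "安详", "离世", "家属", "悲痛"]),
    ("literal", ["他", "刚", "出门", "去", "公司", "上班"]),
    ("literal", ["客人", "吃完", "饭", "回", "公司"]),
]

profiles = build_profiles(LABELED)

# "走了" can mean "left" (literal) or "passed away" (euphemistic).
label, scores = suggest_label(["老人", "昨夜", "走了", "家属", "十分", "悲痛"], profiles)
print(label, scores)
```

In a real annotation workflow the suggestion would only rank candidates for an expert to confirm, and a stronger context representation would likely replace raw token counts.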