
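A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese (基于词典信息的先秦汉语全文词义标注方法研究)
Cite as: ZHANG Yingjie, LI Bin, CHEN Jiajun, CHEN Xiaohe. A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese [J]. Journal of Chinese Information Processing, 2012, 26(3): 65-72.
Authors: ZHANG Yingjie  LI Bin  CHEN Jiajun  CHEN Xiaohe
Affiliations: 1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093;
2. Research Center for Language Informatics, Nanjing Normal University, Nanjing, Jiangsu 210097
Funding: Pre-Qin Literature Lexical Knowledge Mining project (2010JDXM023); Project 211 "Pre-Qin Chinese Lexical Statistics and Knowledge Retrieval"; National Social Science Foundation of China (10&ZD117, 10CYY021, 08BYY054)
Abstract: Word sense disambiguation is a fundamental task in natural language processing, and ancient Chinese information processing also urgently needs deep semantic annotation. This paper targets pre-Qin ancient Chinese, a special kind of language material for which training corpora and semantic resources are scarce. It uses Hanyu Da Cidian 2.0 (《汉语大词典2.0》) as the knowledge source: the dictionary's entry definitions serve as the sense classes, and the example sentences listed under each sense serve as the training corpus. A semi-supervised method based on support vector machines (SVM) is then used to sense-tag the full text of the Zuo Zhuan (《左传》). We randomly selected 22 words, varying in frequency and in number of senses, for manual checking; the average accuracy reached 67%. The method can be widely applied to sense tagging of ancient Chinese where training corpora are lacking. It can supply initial results at the starting stage of full-text sense tagging, provide a good base dataset for manual sense annotation, remedy incomplete definitions in traditional dictionaries, and further enrich the research materials for the history of the Chinese language.

Keywords: word sense disambiguation  sense tagging  ancient Chinese  natural language processing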
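
The core idea in the abstract, treating each dictionary sense as a class and its example sentences as training data, can be illustrated with a small sketch. This is not the authors' implementation: the paper uses a semi-supervised SVM, whereas the code below substitutes a much simpler character-overlap scorer so that it runs with the standard library alone. The sense labels and example sentences in `SENSES` are invented toy entries, not taken from the dictionary, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy sense inventory for the word "兵".
# Invented for illustration; NOT copied from any dictionary.
SENSES = {
    "weapon": ["執兵而立", "兵刃既接"],
    "troops": ["率兵伐之", "發兵救之"],
}


def features(sentence, target):
    """Bag of context characters, excluding the target word itself."""
    return Counter(ch for ch in sentence if ch != target)


def train(senses, target):
    """Build one character profile per sense from its example sentences."""
    profiles = {}
    for sense, examples in senses.items():
        profile = Counter()
        for example in examples:
            profile.update(features(example, target))
        profiles[sense] = profile
    return profiles


def tag(sentence, target, profiles):
    """Pick the sense whose profile shares the most context characters.

    Stand-in for the SVM classifier described in the abstract.
    Ties are broken by the order of the senses in `profiles`.
    """
    feats = features(sentence, target)

    def score(sense):
        return sum(min(n, profiles[sense][ch]) for ch, n in feats.items())

    return max(profiles, key=score)


profiles = train(SENSES, "兵")
print(tag("發兵伐鄭", "兵", profiles))
```

A real system along the lines of the abstract would replace `tag` with a trained classifier and would add the semi-supervised step, whose details the abstract does not specify.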

A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese
ZHANG Yingjie, LI Bin, CHEN Jiajun, CHEN Xiaohe. A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese [J]. Journal of Chinese Information Processing, 2012, 26(3): 65-72.
Authors: ZHANG Yingjie  LI Bin  CHEN Jiajun  CHEN Xiaohe
Affiliation: 1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China;
2. Research Center for Language Informatics, Nanjing Normal University, Nanjing, Jiangsu 210097, China
Abstract: Word Sense Disambiguation (WSD) is a basic task of Natural Language Processing, including the processing of ancient Chinese documents. In this paper we focus on the specific field of analyzing pre-Qin ancient Chinese documents. Considering the shortage of training data and semantic resources, we employ a semi-supervised machine learning method to perform all-word WSD of Zuo Zhuan and use Chinese Dictionary v2.0 as the knowledge resource. We randomly select 22 words of different frequencies and sense counts to evaluate the proposed method. On the selected words, our method achieves an average accuracy of 67%, which is significantly higher than the baseline method of selecting the most frequent sense. This method is promising for sense tagging of ancient Chinese documents when no training data is available. It also provides a raw sense tagging result for human correction, enriching traditional dictionaries, which usually suffer from insufficient word sense entries.
Keywords: word sense disambiguation  sense tagging  ancient Chinese  natural language processing
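
The evaluation described in the abstract, an average accuracy over a sample of words compared against a most-frequent-sense baseline, can be sketched as follows. The sketch rests on two assumptions the abstract does not confirm: that the average is an unweighted mean of per-word accuracies, and that the baseline's most frequent sense is counted from the gold labels. The words, sense labels, and predictions in `results` are hypothetical placeholders, not data from the paper.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of occurrences whose predicted sense matches the gold sense."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mfs_baseline(gold):
    """Predict the most frequent gold sense for every occurrence.

    Assumption: sense frequency is counted from the gold labels themselves.
    """
    most_frequent = Counter(gold).most_common(1)[0][0]
    return [most_frequent] * len(gold)


def average_accuracy(per_word):
    """Unweighted mean of per-word accuracies (assumed averaging scheme)."""
    scores = [accuracy(gold, pred) for gold, pred in per_word.values()]
    return sum(scores) / len(scores)


# Hypothetical placeholder data: word -> (gold senses, system predictions).
results = {
    "word_a": (["s1", "s1", "s2", "s2"], ["s1", "s1", "s2", "s1"]),
    "word_b": (["s1", "s2"], ["s1", "s2"]),
}

# Same gold labels, with the baseline's predictions swapped in.
baseline = {w: (gold, mfs_baseline(gold)) for w, (gold, _) in results.items()}

print("system:  ", average_accuracy(results))
print("baseline:", average_accuracy(baseline))
```

Averaging per word rather than per occurrence keeps high-frequency words from dominating the score, which matters when the sample is chosen to vary in frequency.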
This article is indexed in CNKI, Wanfang Data, and other databases.