Generating Chinese named entity data from parallel corpora
Authors:Ruiji Fu  Bing Qin  Ting Liu
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract:Annotating training corpora for named entity recognition (NER) is costly but necessary for supervised NER approaches. This paper presents a general framework for generating large-scale NER training data from parallel corpora. In our method, we first apply a high-performance NER system to one side of a bilingual corpus. We then project the named entity (NE) labels to the other side according to the word-level alignments. Finally, we propose several strategies for selecting high-quality auto-labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
Keywords:named entity recognition  Chinese named entity  training data generating  parallel corpora  
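
The projection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-token label scheme (an entity type such as PER or LOC, or O for non-entities), and the single filtering rule shown here are assumptions made for illustration. The abstract does not specify which selection strategies the paper uses.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]  # (source token index, target token index)


def project_labels(src_labels: List[str], alignments: Alignment, tgt_len: int) -> List[str]:
    """Copy each source token's entity label to the target tokens aligned with it.

    Unaligned target tokens stay "O". If several source tokens align to the
    same target token, the last non-"O" label wins (a simplification).
    """
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignments:
        label = src_labels[src_idx]
        if label != "O":
            tgt_labels[tgt_idx] = label
    return tgt_labels


def all_entities_aligned(src_labels: List[str], alignments: Alignment) -> bool:
    """Hypothetical selection rule: keep a sentence pair only if every
    source-side entity token is aligned to at least one target token."""
    aligned_src = {src_idx for src_idx, _ in alignments}
    return all(
        idx in aligned_src
        for idx, label in enumerate(src_labels)
        if label != "O"
    )


if __name__ == "__main__":
    # English side, labeled by an existing NER system (labels assumed here).
    en_tokens = ["Obama", "visited", "Beijing"]
    en_labels = ["PER", "O", "LOC"]
    # Chinese side, word-segmented; alignments would come from a word aligner.
    zh_tokens = ["奥巴马", "访问", "了", "北京"]
    alignments = [(0, 0), (1, 1), (2, 3)]

    if all_entities_aligned(en_labels, alignments):
        zh_labels = project_labels(en_labels, alignments, len(zh_tokens))
        print(list(zh_tokens[i] + "/" + zh_labels[i] for i in range(len(zh_tokens))))
```

In this sketch a sentence pair whose source entities are only partly aligned is discarded rather than repaired, which trades corpus size for label quality.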
This article is indexed in SpringerLink and other databases.
The original abstract and full text are available from Frontiers of Computer Science.