
A Comparative Study on the Automatic Extraction of Two-character Word from Ancient Chinese (古汉语双字词自动获取方法的比较与分析)

Cite this article:	DUAN Lei, HAN Fang, SONG Jihua. A Comparative Study on the Automatic Extraction of Two-character Word from Ancient Chinese [J]. Journal of Chinese Information Processing (中文信息学报), 2012, 26(4): 34-43.

Authors:	DUAN Lei, HAN Fang, SONG Jihua

Affiliation:	College of Computer Science, Beijing Normal University, Beijing 100875, China

Abstract:	Automatic word extraction has significant research value in natural language generation, computational lexicography, syntactic parsing, and corpus linguistics. This paper addresses the automatic extraction of two-character words from ancient Chinese, using the full text of the Records of the Grand Historian (《史记》) as the corpus. It applies three statistical methods, based on frequency, mutual information, and hypothesis testing, to extract two-character words, and compares the results in detail against manual annotations. It evaluates the strengths, weaknesses, and reliability of each method, and offers corresponding solutions for two-character word extraction from ancient Chinese under different application settings.

Keywords:	Chinese information processing; ancient Chinese; Records of the Grand Historian; two-character word; statistical model
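
The three families of statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas or thresholds, so the sketch assumes the textbook definitions of pointwise mutual information and of the t-test for collocations, and the t-test is only one possible choice of hypothesis test. The function name `score_bigrams` and the sample string are hypothetical. As a further simplification, every adjacent character pair counts as a candidate, so pairs that span punctuation or sentence boundaries are not filtered out.

```python
import math
from collections import Counter


def score_bigrams(text):
    """Score every adjacent character pair in `text`.

    Returns {bigram: (frequency, pmi, t_score)}.
    Assumed textbook definitions, not taken from the paper:
      pmi = log2( P(xy) / (P(x) * P(y)) )
      t   = (P(xy) - P(x) * P(y)) / sqrt(P(xy) / N)
    where N is the number of bigram positions and the sample
    variance is approximated by P(xy).
    """
    chars = [c for c in text if not c.isspace()]
    unigrams = Counter(chars)
    bigrams = Counter(a + b for a, b in zip(chars, chars[1:]))
    n_uni = len(chars)
    n_bi = max(n_uni - 1, 1)

    scores = {}
    for bigram, freq in bigrams.items():
        p_xy = freq / n_bi
        p_x = unigrams[bigram[0]] / n_uni
        p_y = unigrams[bigram[1]] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        t_score = (p_xy - p_x * p_y) / math.sqrt(p_xy / n_bi)
        scores[bigram] = (freq, pmi, t_score)
    return scores


# Toy string for illustration only; it is not the corpus used in the paper.
sample = "太史公曰天下太史公曰"
scores = score_bigrams(sample)

# Rank candidates by raw frequency; swap the index to 1 or 2
# to rank by mutual information or by t-score instead.
ranked = sorted(scores.items(), key=lambda kv: kv[1][0], reverse=True)
for bigram, (freq, pmi, t_score) in ranked[:3]:
    print(bigram, freq, round(pmi, 2), round(t_score, 2))
```

Each score ranks candidates differently: raw frequency favors common pairs regardless of how strongly their characters are associated, mutual information tends to reward rare pairs whose characters seldom occur apart, and the t-score sits between the two. That difference is why a comparison against manual annotation is needed to choose among them.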
This article is indexed in CNKI, Wanfang Data, and other databases.