
Automatic Identification of Chinese Maximum Noun Phrase Based on Statistics and Rules

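Cite this article:	DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie. Automatic Identification of Chinese Maximum Noun Phrase Based on Statistics and Rules[J]. Journal of Chinese Information Processing, 2008, 22(6): 110-115.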

Authors:	DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie

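Affiliation:	Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China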

Funding:	National High Technology Research and Development Program of China (863 Program)

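Abstract:	Based on an analysis of the characteristics of Chinese maximum noun phrases, this paper proposes an automatic identification method that combines statistics and rules. A feature set is selected by experimenting with different combinations of words and parts of speech, and a conditional random field (CRF) identification model is trained on these features. The misidentified results are then analyzed, and a rule base built from the boundary information and internal structure information of maximum noun phrases is used to post-process the model output, compensating for the incomplete knowledge acquired by the machine learning model. Experimental results show that combining statistics and rules is effective for identifying maximum noun phrases; the system achieves an F-score of 90.2% in the open test.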
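Keywords:	computer application; Chinese information processing; conditional random fields; maximum noun phrase; rule-based post-processing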
Automatic Identification of Chinese Maximum Noun Phrase Based on Statistics and Rules

DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie. Automatic Identification of Chinese Maximum Noun Phrase Based on Statistics and Rules[J]. Journal of Chinese Information Processing, 2008, 22(6): 110-115.

Authors:	DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie

Affiliation:	Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China

Abstract:	By analyzing the characteristics of Chinese maximum noun phrases, this research proposes an automatic identification method for Chinese maximum noun phrases based on statistics and rules. First, a feature set is selected empirically from combinations of words and parts of speech, and a conditional random field (CRF) model is trained for automatic identification. Then a rule base is constructed from the boundary information and internal structure knowledge of maximum noun phrases and used in a post-processing module. The experimental results show that the method is effective for identifying Chinese maximum noun phrases, with an F-score of 90.2% in the open test.

Keywords:	computer application; Chinese information processing; conditional random fields; maximum noun phrase; rule-based post-processing
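
The abstract describes a two-stage pipeline: a CRF tagger over word and part-of-speech features, followed by rule-based post-processing of the predicted phrases. The record does not give the feature templates, the tag scheme, or the rules themselves, so the following is only a minimal sketch under stated assumptions: a B/I/O tag scheme, a context window of two tokens, a single made-up boundary rule (`trim_right_boundary`), and illustrative POS tags (`p`, `c`). All function names are hypothetical, and the CRF training step is omitted; the tag sequence is assumed to come from some already trained tagger.

```python
from typing import Dict, FrozenSet, List, Tuple

Token = Tuple[str, str]   # (word, part-of-speech tag)
Span = Tuple[int, int]    # half-open token span [start, end)


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Word and POS features for token i within a +/- window context.

    The window size and the single combined POS feature are assumptions,
    not the feature set selected in the paper.
    """
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        word, pos = sent[j] if 0 <= j < len(sent) else ("<PAD>", "<PAD>")
        feats[f"w[{off}]"] = word
        feats[f"p[{off}]"] = pos
    # One combined feature: POS of the current token joined with the next one.
    next_pos = sent[i + 1][1] if i + 1 < len(sent) else "<PAD>"
    feats["p[0]|p[1]"] = f"{sent[i][1]}|{next_pos}"
    return feats


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a B/I/O tag sequence into half-open phrase spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # close the phrase that was open
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def trim_right_boundary(
    sent: List[Token],
    spans: List[Span],
    bad_end_pos: FrozenSet[str] = frozenset({"p", "c"}),
) -> List[Span]:
    """Hypothetical boundary rule: a phrase may not end in a preposition
    or conjunction, so such tokens are trimmed from the right edge."""
    fixed: List[Span] = []
    for start, end in spans:
        while end > start and sent[end - 1][1] in bad_end_pos:
            end -= 1
        if end > start:
            fixed.append((start, end))
    return fixed


def f_score(pred: List[Span], gold: List[Span]) -> float:
    """Balanced F-score over exactly matching spans."""
    correct = len(set(pred) & set(gold))
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In use, `token_features` would be called for every position of a tagged sentence to build the CRF input, the tagger's B/I/O output would go through `bio_to_spans`, and `trim_right_boundary` would then adjust the predicted spans before scoring them against gold spans with `f_score`.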
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.