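Automatic Identification of Chinese Maximum Noun Phrases Combining Statistics and Rules (统计和规则相结合的汉语最长名词短语自动识别)
Cite this article: DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie. Automatic Identification of Chinese Maximum Noun Phrases Combining Statistics and Rules [J]. Journal of Chinese Information Processing (中文信息学报), 2008, 22(6): 110-115.
Authors: DAI Cui (代翠), ZHOU Qiao-li (周俏丽), CAI Dong-feng (蔡东风), YANG Jie (杨洁)
Affiliation: Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034
Funding: National High Technology Research and Development Program of China (863 Program)
Abstract: Based on an analysis of the characteristics of Chinese maximum noun phrases, this paper proposes an automatic identification method that combines statistics and rules. A feature set is selected by experimenting with different combinations of words and parts of speech, and a conditional random field (CRF) identification model is trained on these features. The misidentified results are then analyzed, and a rule base built from the boundary information and internal structure information of maximum noun phrases is used to post-process the identification results, compensating for the incomplete knowledge acquired by the machine learning model. Experimental results show that combining statistics and rules is effective for identifying maximum noun phrases; the system achieves an F-score of 90.2% in the open test.

Keywords: computer application; Chinese information processing; conditional random fields; maximum noun phrase; rule-based post-processing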

Automatic Identification of Chinese Maximum Noun Phrase Based on Statistics and Rules
DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie. Automatic Identification of Chinese Maximum Noun Phrase Based on Statistics and Rules [J]. Journal of Chinese Information Processing, 2008, 22(6): 110-115.
Authors: DAI Cui, ZHOU Qiao-li, CAI Dong-feng, YANG Jie
Affiliation: Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China
Abstract: Based on an analysis of the characteristics of Chinese maximum noun phrases, this paper proposes an automatic identification method that combines statistics and rules. First, a feature set is selected empirically from combinations of words and parts of speech, and a conditional random fields (CRF) model is trained for automatic identification. Then a rule base is constructed from the boundary information and internal structure of maximum noun phrases and applied in a post-processing module. Experimental results show that the method is effective for identifying Chinese maximum noun phrases, achieving an F-score of 90.2% in the open test.
Keywords: computer application; Chinese information processing; conditional random fields; maximum noun phrase; rule-based post-processing
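
The pipeline outlined in the abstract (word and part-of-speech features for a CRF tagger, then rule-based correction of the predicted phrases, then F-score evaluation) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the B/I/O tagging scheme, the window size, the feature names, the single boundary rule in `trim_right_boundary`, and the POS tags in `BAD_FINAL_POS` are all hypothetical, and the CRF training step itself is omitted.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]   # (word, part-of-speech tag)
Span = Tuple[int, int]    # half-open token span [start, end)


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Word, POS, and combined word/POS features in a +/-window around token i.

    A CRF toolkit would consume one such feature dict per token.
    """
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        word, pos = sent[j] if 0 <= j < len(sent) else ("<PAD>", "<PAD>")
        feats[f"w[{off}]"] = word
        feats[f"p[{off}]"] = pos
        feats[f"wp[{off}]"] = f"{word}/{pos}"
    return feats


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a B/I/O tag sequence into half-open (start, end) spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical rule: a phrase should not end in a preposition or conjunction.
BAD_FINAL_POS = {"p", "c"}


def trim_right_boundary(sent: List[Token], spans: List[Span]) -> List[Span]:
    """Post-processing rule: shrink spans whose last token has a disallowed POS."""
    fixed: List[Span] = []
    for start, end in spans:
        while end > start and sent[end - 1][1] in BAD_FINAL_POS:
            end -= 1
        if end > start:
            fixed.append((start, end))
    return fixed


def f_score(pred: List[Span], gold: List[Span]) -> float:
    """Balanced F-score over exactly matching spans."""
    pred_set, gold_set = set(pred), set(gold)
    correct = len(pred_set & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sent = [("我", "r"), ("喜欢", "v"), ("中国", "ns"),
            ("的", "u"), ("文化", "n"), ("和", "c")]
    predicted_tags = ["B", "O", "B", "I", "I", "I"]  # stand-in for CRF output
    gold = [(0, 1), (2, 5)]

    spans = bio_to_spans(predicted_tags)
    print("before rules:", spans, f_score(spans, gold))
    spans = trim_right_boundary(sent, spans)
    print("after rules: ", spans, f_score(spans, gold))
```

In this toy example the stand-in tagger output over-extends the second phrase to include a trailing conjunction, and the rule trims it back, which is the kind of boundary correction a post-processing rule base is meant to provide.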
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.