Automatic Extraction of Focus Based on Frequent Dependency Subtree Patterns
Citation: TIAN Weidong, YU Yongyong. Automatic Extraction of Focus Based on Frequent Dependency Subtree Patterns[J]. Journal of Chinese Information Processing, 2016, 30(3): 133-142.
Authors: TIAN Weidong, YU Yongyong
Affiliation: School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China
Funding: National High Technology Research and Development Program of China (863 Program) (2012AA011005); National Natural Science Foundation of China (61273292)
Abstract: A conditional random field (CRF) model tags the focus (head word) of a question by capturing statistical features of various aspects of the focus, but it does not fully exploit the deeper statistical relationships among those features. This paper uses the dependency tree structure of Chinese questions: it mines the probabilistic relationships among the feature dimensions of the focus that are latent in question dependency trees, and uses them as evidence for extracting the focus correctly. The main steps are mining frequent dependency subtree patterns to generate the corresponding statistical rules, using a CRF model for initial focus tagging, and using the statistical rules derived from the frequent dependency subtree patterns to correct the initial tags. The method is a typical objective method grounded strictly in corpus statistics, and its tagging shows good stability, adaptability, and robustness. Experimental results show that the method improves the focus tagging accuracy of the CRF model by about 3% on average.
Keywords: focus; dependency tree; conditional random field (CRF); frequent subtree pattern
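
The three steps named in the abstract (mine patterns into rules, take an initial tag, correct it with the rules) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest possible "subtree", a single dependency edge described by (head POS, relation, dependent POS), whereas the paper mines general frequent subtrees. The initial tag is passed in as a plain index standing in for the CRF output, and the names `Token`, `edge_pattern`, `mine_rules`, `correct`, the thresholds, and the English toy sentences are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str   # part-of-speech tag
    head: int  # index of the governing token, -1 for the root
    rel: str   # dependency relation to the head


def edge_pattern(sent, i):
    """One-edge subtree around token i: (head POS, relation, own POS)."""
    tok = sent[i]
    head_pos = "ROOT" if tok.head < 0 else sent[tok.head].pos
    return (head_pos, tok.rel, tok.pos)


def mine_rules(corpus, min_support=2, min_conf=0.8):
    """corpus: list of (sentence, focus_index) pairs.

    Returns {pattern: confidence} for patterns that occur at least
    min_support times and mark the focus in at least min_conf of them.
    """
    seen, as_focus = Counter(), Counter()
    for sent, focus in corpus:
        for i in range(len(sent)):
            p = edge_pattern(sent, i)
            seen[p] += 1
            if i == focus:
                as_focus[p] += 1
    return {p: as_focus[p] / n for p, n in seen.items()
            if n >= min_support and as_focus[p] / n >= min_conf}


def correct(sent, initial_focus, rules):
    """Keep the initial tag if a rule supports it; otherwise move the tag
    to the token whose pattern has the highest rule confidence."""
    if edge_pattern(sent, initial_focus) in rules:
        return initial_focus
    scored = [(rules[edge_pattern(sent, i)], -i)
              for i in range(len(sent)) if edge_pattern(sent, i) in rules]
    if not scored:
        return initial_focus  # no rule applies, trust the initial tagger
    _, neg_i = max(scored)    # ties broken in favour of the earliest token
    return -neg_i


# Toy training corpus (hypothetical); the focus is the noun being asked about.
corpus = [
    ([Token("which", "DT", 1, "det"), Token("city", "NN", 2, "nsubj"),
      Token("hosts", "VB", -1, "root"), Token("expo", "NN", 2, "dobj")], 1),
    ([Token("what", "DT", 1, "det"), Token("river", "NN", 2, "nsubj"),
      Token("crosses", "VB", -1, "root"), Token("Paris", "NNP", 2, "dobj")], 1),
]
rules = mine_rules(corpus)

question = [Token("which", "DT", 1, "det"), Token("country", "NN", 2, "nsubj"),
            Token("exports", "VB", -1, "root"), Token("coffee", "NN", 2, "dobj")]
# Suppose the initial tagger wrongly picked "coffee" (index 3).
corrected = correct(question, 3, rules)
```

In this toy run the only rule that survives the thresholds is the subject-noun edge, so the wrongly tagged object is moved to "country". A fuller version would replace `edge_pattern` with a frequent subtree miner and weigh rule confidence against the CRF's own confidence before overriding a tag.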