
Research on a Question Focus Extraction Method Based on Frequent Dependency Subtree Patterns

Cite this article:	田卫东, 虞勇勇. 基于频繁依存子树模式的中心词提取方法研究[J]. 中文信息学报, 2016, 30(3): 133-142.

Author names:	田卫东 (TIAN Weidong), 虞勇勇 (YU Yongyong)

Author affiliation:	School of Computer and Information, Hefei University of Technology (合肥工业大学), Hefei, Anhui 230009

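Funding:	National High Technology Research and Development Program of China (863 Program) (2012AA011005); National Natural Science Foundation of China (61273292)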

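Abstract (translated from the Chinese):	Conditional random field (CRF) models tag the question focus by capturing statistical features of the focus along various dimensions, but they fail to fully exploit the deeper statistical relationships among those features. Using the dependency tree structure of Chinese questions, this paper mines the statistical probability relationships among the multidimensional focus features contained in question dependency trees, providing a basis for extracting the focus correctly. Among other steps, the method mines frequent dependency subtree patterns to generate corresponding statistical rule patterns, uses a CRF model for initial focus tagging, and uses the statistical rules derived from the frequent dependency subtree patterns to correct the tags. The method is a typical objective approach built on a rigorously compiled statistical corpus, and its tagging shows good stability, adaptability, and robustness. Experimental results show that the method improves the focus tagging accuracy of the CRF model by about 3%.
Keywords (translated from the Chinese):	question focus; dependency tree; conditional random field; frequent subtree pattern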
Automatic Extraction of Focus Based on Frequent Dependency Subtree Patterns

TIAN Weidong, YU Yongyong. Automatic Extraction of Focus Based on Frequent Dependency Subtree Patterns[J]. Journal of Chinese Information Processing, 2016, 30(3): 133-142.

Authors:	TIAN Weidong, YU Yongyong

Affiliation:	School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China

Abstract:	Although the Conditional Random Field (CRF) model can automatically tag the focus of a question, it does not exploit some of the deeper relationships among focus features, which noticeably impairs focus recognition. This paper proposes a focus recognition method based on frequent dependency subtree patterns of Chinese questions. The method mines the probabilities of the relationships among the various dimensions of focus features hidden in a dependency tree corpus to improve recognition accuracy. Its main steps include mining frequent dependency subtree patterns to generate the corresponding statistical rules, using the CRF model for initial focus annotation, and using the statistical rules derived from the frequent dependency subtree patterns to correct the initial annotation. Experimental results show that the proposed method improves accuracy by about 3% on average compared with the CRF model.

Keywords:	focus; dependency tree; CRF; frequent subtree pattern
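
The abstract describes a two-stage pipeline: initial focus tagging by a CRF, followed by correction with statistical rules mined from frequent dependency subtree patterns. The sketch below illustrates only the general idea and is not the authors' implementation. It makes several simplifying assumptions: each "pattern" is a single head-dependent edge rather than a full subtree, the CRF output is simulated by a hand-supplied index, and the toy corpus, the thresholds `min_support` and `min_conf`, and the names `edge_pattern`, `mine_rules`, and `correct` are all hypothetical.

```python
from collections import Counter

# A token is (word, pos, head_index, deprel); head_index -1 marks the root.
# All sentences below are invented for illustration.

def edge_pattern(sentence, i):
    """Single-edge pattern for token i: (own POS, relation to head, head POS)."""
    _, pos, head, rel = sentence[i]
    head_pos = sentence[head][1] if head >= 0 else "ROOT"
    return (pos, rel, head_pos)

def mine_rules(corpus, min_support=2, min_conf=0.8):
    """Keep patterns that are frequent and that usually mark the focus.

    corpus: list of (sentence, gold_focus_index) pairs.
    Returns {pattern: estimated P(focus | pattern)}.
    """
    seen, as_focus = Counter(), Counter()
    for sentence, focus in corpus:
        for i in range(len(sentence)):
            p = edge_pattern(sentence, i)
            seen[p] += 1
            if i == focus:
                as_focus[p] += 1
    return {p: as_focus[p] / n for p, n in seen.items()
            if n >= min_support and as_focus[p] / n >= min_conf}

def correct(sentence, initial_focus, rules):
    """Keep the initial tag if a rule supports it; otherwise move the tag
    to the token whose pattern has the highest rule confidence, if any."""
    if edge_pattern(sentence, initial_focus) in rules:
        return initial_focus
    scored = [(rules[edge_pattern(sentence, i)], i)
              for i in range(len(sentence))
              if edge_pattern(sentence, i) in rules]
    return max(scored)[1] if scored else initial_focus

corpus = [
    ([("which", "DT", 1, "det"), ("city", "NN", 2, "nsubj"),
      ("hosts", "VB", -1, "root"), ("games", "NN", 2, "dobj")], 1),
    ([("which", "DT", 1, "det"), ("river", "NN", 2, "nsubj"),
      ("crosses", "VB", -1, "root"), ("Paris", "NR", 2, "dobj")], 1),
]
rules = mine_rules(corpus)

question = [("which", "DT", 1, "det"), ("country", "NN", 2, "nsubj"),
            ("exports", "VB", -1, "root"), ("tea", "NN", 2, "dobj")]
# Pretend the CRF wrongly tagged token 3 ("tea") as the focus.
corrected = correct(question, 3, rules)
```

In this toy run the only surviving rule is the subject-noun pattern, so the simulated wrong tag on "tea" is moved to "country". A real system would mine multi-node subtrees from a parsed corpus and take the initial tags from a trained CRF.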
