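Chinese Semantic Role Labeling Based on Feature Combination (基于特征组合的中文语义角色标注)
Cite this article: LI Shi-Qi, ZHAO Tie-Jun, LI Han-Jing, LIU Peng-Yuan, LIU Shui. Chinese Semantic Role Labeling Based on Feature Combination [J]. Journal of Software (软件学报), 2011, 22(2): 222-232.
Authors: LI Shi-Qi (李世奇), ZHAO Tie-Jun (赵铁军), LI Han-Jing (李晗静), LIU Peng-Yuan (刘鹏远), LIU Shui (刘水)
Affiliations: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
2. Institute of Computational Linguistics, Peking University, Beijing 100871, China
Funding: National Natural Science Foundation of China (60736014, 60803094, 60773069, 60903063)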
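Abstract: This paper proposes a semantic role labeling (SRL) method based on feature combination and support vector machines (SVM). The method takes syntactic constituents as the basic labeling units. First, high-performing features are selected from existing parsing-based SRL systems to form a basic feature set. A statistics-based feature combination method is then proposed: according to the distribution of each combined feature over positive and negative examples, it uses the ratio of between-class distance to within-class distance as a statistic to measure the combined feature's effect on classification, and it retains the combined features that classify well. Finally, classification experiments with SVM on the Chinese PropBank (CPB) corpus show that, with this feature combination method, the overall F-score of SRL reaches 91.81%, an improvement of nearly 2%.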

Keywords: semantic role labeling; natural language processing; support vector machine; feature combination
Received: 2009/10/29
Revised: 2010/1/20

Chinese Semantic Role Labeling Based on Feature Combination
LI Shi-Qi, ZHAO Tie-Jun, LI Han-Jing, LIU Peng-Yuan and LIU Shui. Chinese Semantic Role Labeling Based on Feature Combination[J]. Journal of Software, 2011, 22(2): 222-232.
Authors: LI Shi-Qi, ZHAO Tie-Jun, LI Han-Jing, LIU Peng-Yuan and LIU Shui
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (first, second, third, and fifth authors); 2. Institute of Computational Linguistics, Peking University, Beijing 100871, China (fourth author)
Abstract: This paper proposes a semantic role labeling (SRL) approach for Chinese based on feature combination and support vector machines (SVM). The approach takes the syntactic constituent as the labeling unit. First, a basic feature set is defined by selecting the high-performance features of existing parsing-based SRL systems. Then, a statistics-based method is proposed to construct a combined feature set derived from the basic feature set. According to the distribution of each combined feature in positive and negative instances, the ratio of between-class distance to within-class distance is used to measure how well the feature supports classification, and the combined features with high ratios are added to the combined feature set. Finally, experiments with an SVM classifier on the Chinese PropBank (CPB) corpus show that SRL with the feature combination method achieves an F-score of 91.81%, nearly 2% higher than the traditional method.
Keywords: semantic role labeling; natural language processing; support vector machine; feature combination
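
The selection criterion named in the abstract, the ratio of between-class to within-class distance, can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below assumes a Fisher-style ratio computed on a binary indicator for each combined feature value; that choice, the pairwise combination scheme, the threshold, and the feature names (`phrase_type`, `position`, `voice`) are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import fmean, pvariance


def combine(instance, names):
    """Join the values of the named basic features into one combined value."""
    return "+".join(f"{n}={instance[n]}" for n in names)


def fisher_ratio(pos, neg, eps=1e-9):
    """Assumed statistic: squared mean gap over summed class variances."""
    between = (fmean(pos) - fmean(neg)) ** 2
    within = pvariance(pos) + pvariance(neg)
    return between / (within + eps)  # eps avoids division by zero


def score_combinations(instances, labels, basic_features):
    """Score every pairwise combined value as a 0/1 indicator feature.

    Both classes must be non-empty, otherwise fmean raises StatisticsError.
    """
    scores = {}
    for pair in combinations(basic_features, 2):
        values = [combine(inst, pair) for inst in instances]
        for value in set(values):
            pos = [float(v == value) for v, y in zip(values, labels) if y]
            neg = [float(v == value) for v, y in zip(values, labels) if not y]
            scores[value] = fisher_ratio(pos, neg)
    return scores


# Toy data: four candidate constituents, two of which are true arguments.
instances = [
    {"phrase_type": "NP", "position": "before", "voice": "active"},
    {"phrase_type": "NP", "position": "before", "voice": "active"},
    {"phrase_type": "NP", "position": "after", "voice": "active"},
    {"phrase_type": "PP", "position": "before", "voice": "active"},
]
labels = [True, True, False, False]

scores = score_combinations(instances, labels, ["phrase_type", "position", "voice"])

# Keep only combined features whose ratio clears a hypothetical threshold.
kept = [value for value, ratio in scores.items() if ratio >= 2.0]
print(kept)
```

On this toy data only the combination of `phrase_type=NP` with `position=before` separates the two classes perfectly, so it is the only one retained; in a real system the retained combinations would be appended to the basic feature set before training the SVM.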
This article is indexed in CNKI, Wanfang Data, and other databases.