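Research on Feature Selection Metric for Predicate Identification (谓词自动识别中的特征选择度量研究)
Cite this article:ZHANG Yi-hao, JIN Peng. Research on Feature Selection Metric for Predicate Identification[J]. Computer Engineering & Science (计算机工程与科学), 2012, 34(9): 188-192
Authors:ZHANG Yi-hao (张宜浩)  JIN Peng (金澎)
Affiliations:1. School of Computer Science, Leshan Teachers’ College, Leshan, Sichuan 614004
2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers’ College, Leshan, Sichuan 614004
Funding:Research project funded by the Sichuan Provincial Department of Education; National Natural Science Foundation of China; Leshan Teachers’ College Research and Innovation Team Building Program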
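Abstract:Automatic predicate identification is an important part of shallow parsing. This paper proposes an automatic predicate identification method based on the support vector machine (SVM) classification algorithm, and concentrates on two steps of feature construction: feature selection based on information gain, and a feature-word metric based on TongYiCiCiLin (同义词词林). The information gain method selects the features with the greatest influence on classification and so reduces the feature dimensionality. The TongYiCiCiLin metric maps feature words to deeper semantic concepts, which strengthens the expressive power of the features and emphasizes the relevance of attribute features to the model. Experiments on a small corpus show that the best F-Score for predicate identification reaches 84.0%, which is 4.6% higher than the F-Score obtained with no processing of the data. The results indicate that the new feature selection and feature metric methods are very effective for predicate identification and can greatly improve classifier performance.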

Keywords:predicate identification  feature selection  TongYiCiCiLin (同义词词林)  information gain  support vector machine
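
The abstract names information gain as the feature selection criterion but does not give the computation. As a minimal sketch of the standard definition, IG(Y; X) = H(Y) - Σ_v P(X = v) · H(Y | X = v), the following ranks discrete features by how much they reduce the entropy of the predicate / non-predicate label. The function names, the feature names `pos_tag` and `prev_is_adverb`, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(feature_table, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(
        feature_table,
        key=lambda name: information_gain(feature_table[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: 1 = predicate, 0 = not a predicate.
labels = [1, 1, 0, 0]
feature_table = {
    "pos_tag": ["v", "v", "n", "n"],   # separates the two classes completely
    "prev_is_adverb": [1, 0, 1, 0],    # independent of the label in this sample
}
```

Calling `select_top_k(feature_table, labels, 1)` keeps only `pos_tag` on this toy sample. How many features the authors retained, and which threshold they used, is not stated in the abstract.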

Research on Feature Selection Metric for Predicate Identification
ZHANG Yi-hao, JIN Peng. Research on Feature Selection Metric for Predicate Identification[J]. Computer Engineering & Science, 2012, 34(9): 188-192
Authors:ZHANG Yi-hao    JIN Peng
Affiliation:1. School of Computer Science, Leshan Teachers’ College, Leshan 614004, China; 2. Laboratory of Intelligent Information Processing and Application, Leshan Teachers’ College, Leshan 614004, China
Abstract:Predicate identification is an important research topic in shallow parsing. This paper proposes a predicate identification method based on the support vector machine (SVM) classification algorithm. The focus is on two steps of feature construction: feature selection with information gain, and a metric for feature words based on TongYiCiCiLin. The information gain method selects the features that have the greatest impact on the classification model, which reduces the dimensionality of the feature vector. TongYiCiCiLin maps feature words to deeper semantic concepts, which enhances the representational power of the features and emphasizes the correlation between the features and the model. Experiments on a relatively small corpus show that the best F-Score for predicate identification reaches 84.0%, an increase of 4.6% over the F-Score obtained without any processing of the data. The results show that the new feature selection and feature representation methods are effective for predicate identification and can greatly improve classifier performance.
Keywords:predicate identification  feature selection  TongYiCiCiLin  information gain  support vector machine
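
The abstract says that TongYiCiCiLin maps feature words to deeper semantic concepts, without describing the mapping itself. One plausible reading, sketched below, replaces each word with the prefix of its thesaurus code at a chosen level, so that synonyms share one feature value. The sketch assumes the five-level code layout of the extended edition of the thesaurus (for example `Aa01A01=`); the function name, the choice of level 3, the first-sense simplification, and every entry in `toy_lexicon` are hypothetical and not taken from the paper or from the real resource.

```python
# Cumulative code length at each of the five levels of a Cilin-style code,
# e.g. "Aa01A01=" -> "A", "Aa", "Aa01", "Aa01A", "Aa01A01".
LEVEL_PREFIX_LENGTH = {1: 1, 2: 2, 3: 4, 4: 5, 5: 7}


def concept_feature(word, lexicon, level=3):
    """Map a word to its concept code truncated at `level`.

    Words missing from the lexicon are returned unchanged, so they remain
    usable as plain surface-form features.
    """
    codes = lexicon.get(word)
    if not codes:
        return word
    # A word can have several senses; using the first one is a simplification.
    return codes[0][:LEVEL_PREFIX_LENGTH[level]]


# Hypothetical toy lexicon: these codes are invented for illustration and are
# NOT entries from the real TongYiCiCiLin resource.
toy_lexicon = {
    "提出": ["Hc01A01="],
    "提议": ["Hc01A02="],
    "描述": ["Hi10B01="],
}

features = [concept_feature(w, toy_lexicon) for w in ["提出", "提议", "描述", "未知"]]
```

In this toy lexicon, `提出` and `提议` collapse to the same level-3 code, which is the kind of generalization the abstract credits with strengthening the features. A coarser level merges more words at the cost of finer distinctions.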
This article is indexed in CNKI, 万方数据 (Wanfang Data), and other databases.