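Chinese Personal Social Relation Extraction Based on Verbs, Nouns, and CHI Feature Selection
Cite this article: Zeng Hui, Tang Jiali, Xiong Liyan, Huang Xiaohui. Chinese Personal Social Relation Extraction Based on Verbs, Nouns, and CHI Feature Selection[J]. Application Research of Computers (计算机应用研究), 2017, 34(6)
Authors: Zeng Hui, Tang Jiali, Xiong Liyan, Huang Xiaohui
Affiliation: School of Information Engineering, East China Jiaotong University, Nanchang 330013 (all four authors)
Funding: Research on Methods and Applications of Social Relation Mining from Cross-Web-Page Multi-Source Heterogeneous Data (National Natural Science Foundation of China: 61363072); Research and Application of Key Technologies in Science and Technology Project Review (Jiangxi Provincial Department of Science and Technology, Scientific and Technological Achievement Transfer and Transformation Program: 509100955024); Research on Key Technologies of Tensor-Based Evolutionary Clustering in Heterogeneous Information Networks (National Natural Science Foundation of China: 61562027); Jiangxi Social Science "Twelfth Five-Year" Planning Project (15XW12); Jiangxi Provincial Department of Education Project (150494)
Abstract: To address the scarcity of annotated corpora for Chinese personal social relations and the overly coarse classification of such relations, this paper annotates eight main types of personal social relation using a simple procedure. To reduce the dimensionality of the feature vectors effectively, avoid the curse of dimensionality, and remove as many noisy features as possible so as to improve the accuracy of relation extraction, the paper proposes a feature selection method that combines verb and noun extraction with the χ2 statistic (CHI), and uses TF-IDF to compute feature weights. In experiments with an SVM classifier, both the F-score and the accuracy improve. To make full use of the data set when testing the feature selection method, K-fold cross-validation is used to check its effectiveness; the experiments show that the classification model produced by this method has strong discriminative and generalization ability.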

Keywords: personal relation extraction; personal relation annotation; feature selection; CHI; SVM classifier
Received: 2016-04-20
Revised: 2017-04-11

Personal social relation extraction in Chinese based on feature selection of CHI, verb and noun
Hui Zeng,Jiali Tang,Liyan Xiong and Xiaohui Huang. Personal social relation extraction in Chinese based on feature selection of CHI, verb and noun[J]. Application Research of Computers, 2017, 34(6)
Authors:Hui Zeng  Jiali Tang  Liyan Xiong  Xiaohui Huang
Affiliation: School of Information Engineering, East China Jiaotong University (all four authors)
Abstract: To address the scarcity of labeled Chinese corpora for social relations and the overly coarse classification of personal social relations, eight main types of personal social relation are labeled using a simple method. To reduce the dimensionality of the feature vectors effectively, avoid the curse of dimensionality, and remove noisy features so as to improve the accuracy of relation extraction, a feature selection method is proposed that combines verb and noun selection with the chi-square statistic (CHI), and TF-IDF is used to compute the weights of the feature items. After feature selection, the proposed method is tested with an SVM classifier, and both the F-score and the accuracy improve. To make full use of the data set when testing this feature selection method, its validity is checked with k-fold cross-validation. Experimental results show that the classification model generated by this method has strong discriminative and generalization ability.
Keywords: social relation extraction; social relation labeling; feature selection; CHI; SVM classifier
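
The abstract names the ingredients of the feature pipeline (verb and noun selection, CHI ranking, TF-IDF weighting) but not the exact formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes the documents have already been segmented and filtered to verbs and nouns upstream, uses the textbook chi-square statistic over a 2x2 term/class contingency table, scores each term by its maximum over classes, and uses an unsmoothed log IDF. The function names, the toy tokens, and the class labels are all hypothetical.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Textbook chi-square statistic for a 2x2 term/class table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A term present in every document (or in none) carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, top_k):
    """Rank terms by their maximum chi-square over all classes (an assumption)."""
    doc_sets = [set(d) for d in docs]
    class_counts = Counter(labels)
    n_docs = len(docs)
    scores = {}
    for term in set().union(*doc_sets):
        df = sum(1 for s in doc_sets if term in s)
        best = 0.0
        for cls, n_cls in class_counts.items():
            n11 = sum(1 for s, y in zip(doc_sets, labels) if y == cls and term in s)
            n10 = df - n11
            n01 = n_cls - n11
            n00 = n_docs - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def tfidf_vector(doc, features, docs):
    """TF-IDF weights of one document over the selected features."""
    tf = Counter(doc)
    n_docs = len(docs)
    vec = []
    for term in features:
        df = sum(1 for d in docs if term in d)
        idf = math.log(n_docs / df) if df else 0.0
        vec.append(tf[term] * idf)
    return vec


# Hypothetical toy corpus: tokens already restricted to verbs and nouns.
docs = [
    ["marry", "wife", "city"],
    ["wife", "husband", "city"],
    ["teach", "student", "city"],
    ["student", "teacher", "city"],
]
labels = ["spouse", "spouse", "teacher_student", "teacher_student"]

features = select_features(docs, labels, top_k=2)
vectors = [tfidf_vector(d, features, docs) for d in docs]
```

In this toy corpus the token "city" appears in every document, so its chi-square score is zero and it is dropped, which is the noise-removal effect the abstract describes. The resulting vectors could then be passed to any SVM implementation and evaluated with k-fold cross-validation.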