Domain-Specific Concept Discovery Based on Conditional Random Fields and Information Entropy
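Citation: Fu Yao, Wan Jing, Xing Lidong. 基于条件随机场与信息熵的特定领域概念发现 [Domain-specific concept discovery based on conditional random fields and information entropy][J]. 计算机应用研究 (Application Research of Computers), 2020, 37(3): 708-711, 730.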
Authors: Fu Yao, Wan Jing, Xing Lidong
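Affiliations: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029; Institute of Automation, Chinese Academy of Sciences, Beijing 100190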
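Abstract: To address the problem of automatically recognizing existing concepts and discovering new concepts within a specific domain, this paper proposes an extraction method based on conditional random fields (CRF) and information entropy. A CRF predicts the boundaries of concept words in text; the predicted concepts are compared with those in a dictionary to filter out candidate new concepts and locate their approximate positions. Mutual information and left/right entropy are then used to judge, respectively, the internal cohesion of a concept within its concept window and the freedom of its boundaries, thereby discovering new domain concepts. Experiments show that this method performs better for concept discovery than CRF alone: the accuracy of concept discovery improves by 20.06% for the character-based model and by 46.54% for the word-based model.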

Keywords: concept recognition; new concept discovery; conditional random field; information entropy; specific domain
Received: 2018-08-17
Revised: 2020-01-30
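The abstract describes the first stage only at a high level: CRF tags mark concept boundaries, and predicted spans that are not already in the dictionary become new-concept candidates with an approximate location. The Python sketch below illustrates that filtering step under assumptions not stated in the source: the CRF output is a character-level B/I/O tag sequence (the paper's actual tag set is not given here), the dictionary is a plain set of strings, and the "concept window" is modeled as the predicted span widened by a hypothetical `window` margin on each side. Training the CRF itself is not shown; the tags are taken as given.

```python
from typing import Dict, List, Set, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn a B/I/O tag sequence into (start, end, text) spans; end is exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new B closes the previous span
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I with no preceding B
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return [(s, e, "".join(tokens[s:e])) for s, e in spans]


def new_concept_candidates(tokens: List[str], tags: List[str],
                           dictionary: Set[str], window: int = 2) -> List[Dict]:
    """Keep predicted spans missing from the dictionary and attach a widened
    'concept window' around each (the `window` margin is hypothetical)."""
    candidates = []
    for s, e, text in decode_bio(tokens, tags):
        if text in dictionary:
            continue  # an existing concept, not a new one
        lo = max(0, s - window)
        hi = min(len(tokens), e + window)
        candidates.append({"text": text, "span": (s, e), "window": (lo, hi)})
    return candidates
```

For example, if the characters of 采用条件随机场和信息熵 are tagged so that 条件随机场 and 信息熵 are predicted spans, and the dictionary contains only 条件随机场, the single candidate returned is 信息熵 with span (8, 11) and window (6, 11). The window is what a later scoring stage would examine.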

New words discovery method based on CRF and information entropy in specific domain
Fu Yao, Wan Jing and Xing Lidong. New words discovery method based on CRF and information entropy in specific domain[J]. Application Research of Computers, 2020, 37(3): 708-711, 730.
Authors: Fu Yao, Wan Jing and Xing Lidong
Affiliations: Beijing University of Chemical Technology; Institute of Automation, Chinese Academy of Sciences
Abstract: Aiming at the problem of automatically identifying existing concepts and discovering new concepts in a specific field, this paper proposes a new-word discovery method based on conditional random fields (CRF) and information entropy. The method uses a CRF to predict the boundaries of concept words in text, compares the predictions with the existing concepts in a dictionary to select candidate new concepts, and finds their approximate locations in the text. It then uses mutual information and left/right entropy to judge, respectively, the internal cohesion and the boundary freedom of each candidate within its concept window, thereby discovering new domain concepts. Experiments show that the proposed method performs better for concept discovery than CRF alone: accuracy improves by 20.06% for the character-based model and by 46.54% for the word-based model.
Keywords: concept recognition; new concept discovery; conditional random field; information entropy; specific field
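The second stage scores each candidate with mutual information (internal cohesion) and left/right entropy (boundary freedom). The abstract does not give the paper's exact formulas, so the following is a minimal sketch of one common formulation, assuming: probabilities are estimated as raw substring counts divided by corpus length, cohesion is the minimum pointwise mutual information over all binary splits of the candidate, and boundary freedom is the smaller of the left and right neighbor-character entropies. The thresholds in the hypothetical `is_concept` helper are placeholders, not values from the paper.

```python
import math
from collections import Counter


def count_occurrences(text: str, s: str) -> int:
    """Count possibly overlapping occurrences of s in text."""
    count, i = 0, text.find(s)
    while i != -1:
        count += 1
        i = text.find(s, i + 1)
    return count


def cohesion(text: str, w: str) -> float:
    """Minimum pointwise mutual information over all binary splits of w.
    Probabilities are approximated as count / len(text) (an assumption)."""
    if len(w) < 2:
        raise ValueError("cohesion needs a candidate of length >= 2")
    n = len(text)
    c_w = count_occurrences(text, w)
    if c_w == 0:
        return float("-inf")  # candidate never occurs in the corpus
    p_w = c_w / n
    scores = []
    for k in range(1, len(w)):
        # each part occurs at least as often as w itself, so both are > 0
        p_a = count_occurrences(text, w[:k]) / n
        p_b = count_occurrences(text, w[k:]) / n
        scores.append(math.log(p_w / (p_a * p_b)))
    return min(scores)


def entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a frequency table."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def boundary_freedom(text: str, w: str) -> float:
    """Smaller of the left and right entropies of characters adjacent to w."""
    left, right = Counter(), Counter()
    i = text.find(w)
    while i != -1:
        if i > 0:
            left[text[i - 1]] += 1
        j = i + len(w)
        if j < len(text):
            right[text[j]] += 1
        i = text.find(w, i + 1)
    if not left or not right:
        return 0.0  # assumption: no neighbors on one side means no freedom
    return min(entropy(left), entropy(right))


def is_concept(text: str, w: str, min_cohesion: float = 0.5,
               min_freedom: float = 0.5) -> bool:
    """Hypothetical decision rule; both thresholds are placeholders."""
    return (cohesion(text, w) >= min_cohesion
            and boundary_freedom(text, w) >= min_freedom)
```

On the toy corpus `"abXabYab"`, the candidate `"ab"` has cohesion ln(8/3) and boundary freedom ln 2, because its neighbors on each side are split evenly between two different characters, so it passes. The candidate `"bX"` has the same cohesion but occurs only once with a single fixed neighbor on each side, so its boundary freedom is 0 and it is rejected. Taking the minimum over splits and over both sides means a candidate must be internally tight and free at both boundaries.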
This article is indexed in Wanfang Data and other databases.