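Named Entity Recognition Algorithm Based on Active Learning (基于主动学习的命名实体识别算法)
Cite this article: ZHANG Cen-fang. Named Entity Recognition Algorithm Based on Active Learning[J]. Computer and Modernization (计算机与现代化), 2021, 0(7): 18-22. DOI: 10.3969/j.issn.1006-2475.2021.07.004
Author: ZHANG Cen-fang (张岑芳)
Affiliation: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094
Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX19_0054)
Abstract (translated from the Chinese record): The goal of named entity recognition is to identify the boundaries and categories of entity mentions in text. Training a named entity recognition model usually requires a large number of labeled samples. This paper implements an effective selection algorithm that picks, from a large pool of samples, those suited to updating the model, reducing the annotation workload. Five groups of comparative experiments verify that an effective selection algorithm yields a better sample set and makes annotation more targeted. Through experiments designed on a Weibo (microblog) dataset...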

Keywords (translated from the Chinese record): named entity recognition  active learning  deep learning  Bi-LSTM
Received: 2021-08-02

Named Entity Recognition Algorithm Based on Active Learning
ZHANG Cen-fang. Named Entity Recognition Algorithm Based on Active Learning[J]. Computer and Modernization, 2021, 0(7): 18-22. DOI: 10.3969/j.issn.1006-2475.2021.07.004
Authors:ZHANG Cen-fang
Abstract: The purpose of named entity recognition is to identify the boundaries and categories of entities in text. Training a named entity recognition model usually requires a large number of labeled samples. This paper implements an effective selection algorithm that picks, from a large pool of samples, those suitable for model updates, reducing the amount of labeling required. Five sets of comparison experiments verify that an effective selection algorithm yields a better sample set and makes annotation more targeted. Experiments designed on a microblog dataset verify that the active learning algorithm can select more appropriate sample sets from a large amount of Internet text, which effectively reduces the cost of manual labeling. Two models are used to extract entity boundaries and to classify entities: a sequence labeling model extracts the position of each entity in the sequence, and an entity classification model classifies the labeled spans. Active learning is used to train on the unlabeled dataset. The training method is evaluated on two datasets. Experiments on the Weibo dataset show that the algorithm can learn text features from the unlabeled dataset. Results on the MSRA dataset show that once the proportion of the pre-training dataset exceeds 40%, the model's F1 score on the test set stabilizes at about 90%, close to the result obtained with the full dataset, indicating that the model has some ability to extract features from unlabeled data.
Keywords: named entity recognition  active learning  deep learning  Bi-LSTM
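
The abstract describes choosing, from a large unlabeled pool, the samples most useful for updating the model, but it does not name the selection criterion. As an illustration only, the sketch below uses least-confidence sampling, a common active learning heuristic that is assumed here and is not taken from the paper. The function names (`sequence_confidence`, `select_for_labeling`) and the toy probabilities are hypothetical; in practice the per-token tag distributions would come from the sequence labeling model.

```python
import math
from typing import List, Sequence

# Per-token tag distributions for one sentence: one probability list per token.
TokenProbs = Sequence[Sequence[float]]


def sequence_confidence(token_probs: TokenProbs) -> float:
    """Length-normalised log-probability of the greedy (per-token argmax) tags.

    Values are <= 0; values closer to 0 mean the model is more confident.
    An empty sentence scores 0.0, so it is never preferred for labeling.
    """
    if not token_probs:
        return 0.0
    log_prob = sum(math.log(max(dist)) for dist in token_probs)
    return log_prob / len(token_probs)


def select_for_labeling(pool_probs: Sequence[TokenProbs], budget: int) -> List[int]:
    """Return indices of the `budget` least-confident sentences in the pool."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: sequence_confidence(pool_probs[i]),
    )
    return ranked[:budget]


# Toy pool: three two-token sentences with three possible tags each (made-up numbers).
pool = [
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],  # model is confident
    [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # model is uncertain
    [[0.60, 0.20, 0.20], [0.70, 0.20, 0.10]],  # in between
]
print(select_for_labeling(pool, budget=2))  # [1, 2]
```

In an active learning loop, the selected sentences would be sent for manual annotation, added to the labeled set, and the model retrained before the next selection round. Length normalisation is one design choice among several; without it, longer sentences would tend to be selected simply because they accumulate more log-probability mass.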
This article is indexed in Wanfang Data (万方数据) and other databases.