Found 20 similar documents. Search time: 0 ms.
1.
To extract music entities scattered across Web pages quickly and accurately, and building on a thorough analysis of the characteristics of named entities in the music domain, this work proposes a Chinese music entity recognition method that combines rules with statistics, and implements a music named entity recognition system. Testing shows that the system achieves high precision and recall.
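The rule-plus-statistics idea above can be illustrated with a minimal sketch: a gazetteer lookup combined with a contextual pattern rule. The gazetteer entries, cue words, and function name here are all illustrative assumptions, not the paper's actual rules.

```python
import re

# Hypothetical mini-gazetteer of known music entities (illustrative only).
GAZETTEER = {"Moonlight Sonata", "Bohemian Rhapsody"}

# A simple contextual rule: quoted text following a cue word such as
# "song" or "album" is treated as a candidate music entity.
CUE_PATTERN = re.compile(r'(?:song|album|track)\s+"([^"]+)"')

def extract_music_entities(text):
    """Combine dictionary lookup with a contextual pattern rule."""
    found = set()
    for name in GAZETTEER:               # rule 1: exact gazetteer match
        if name in text:
            found.add(name)
    for m in CUE_PATTERN.finditer(text):  # rule 2: contextual pattern
        found.add(m.group(1))
    return found
```

A statistical component would then score these candidates; the sketch covers only the rule side.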
2.
This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic main type and subtype named entity recognition. We give an overview of existing named entity annotation schemes, and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. This was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main type named entities achieves a micro-averaged F-score of 84.91 %, and is publicly available, along with the corpus and annotations.
3.
Current Chinese named entity recognition models make errors on entities with nested structure and cannot recognize them accurately. Span-based methods can find nested entities, but they frequently generate spans that contain no entity and cannot clearly delimit span boundaries, which burdens the model. To address this, a Chinese nested NER model based on lexicon fusion and span boundary detection is proposed. The model uses multi-word fusion to enhance text features: a purpose-built injection module merges the information of the multiple lexicon words associated with each character in the target sentence and fuses it into BERT, yielding more complete contextual information and better span representations. A span boundary detection module is then added, which delimits span boundaries by using a perceptron classifier to predict each span's first and last characters. Experiments on public datasets show that the model effectively improves recognition accuracy.
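The span-based framing above can be sketched in a few lines: enumerate all candidate spans (so nested spans can coexist), then keep only those whose first and last tokens look like entity boundaries. The probability inputs and threshold are assumed for illustration; they stand in for the paper's learned boundary classifier.

```python
def enumerate_spans(tokens, max_len=4):
    """Enumerate all candidate spans up to max_len tokens. A span-based
    nested-NER model scores each (start, end) pair independently, so
    overlapping and nested spans can both survive."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans

def filter_by_boundary(spans, start_probs, end_probs, threshold=0.5):
    """Toy boundary-detection filter: keep a span only if the (assumed,
    externally supplied) probabilities that its first token starts an
    entity and its last token ends one both clear a threshold."""
    return [(s, e) for (s, e) in spans
            if start_probs[s] >= threshold and end_probs[e - 1] >= threshold]
```

Pruning boundary-implausible spans is exactly what reduces the "spans with no entity" burden the abstract describes.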
4.
5.
To address the missing image-text semantics and ambiguous multimodal representations in existing multimodal named entity recognition methods, a multimodal NER method with image-text semantic enhancement is proposed. Several pre-trained models extract text features, character features, regional visual features, image keywords, and visual labels, describing the semantics of the image-text data comprehensively. A Transformer with cross-modal attention mines the complementary semantic relations between image and text features to guide feature fusion, producing semantically completed text representations and semantically enhanced multimodal representations. Boundary detection, entity-type detection, and NER are integrated into a multi-task label decoder that performs fine-grained semantic decoding of the input features to improve the semantic accuracy of predictions; this decoder jointly decodes the text and multimodal representations to obtain globally optimal predicted labels. Extensive experiments on the Twitter-2015 and Twitter-2017 benchmarks show average F1 improvements of 1.00% and 1.41% respectively, indicating strong named entity recognition capability.
6.
Product named entity recognition in Chinese text
There are many expressive and structural differences between product names and general named entities such as person names, location names and organization names. To date, there has been little research on product named entity recognition (NER), which is crucial and valuable for information extraction in the field of market intelligence. This paper focuses on product NER (PRO NER) in Chinese text. First, we describe our efforts on data annotation, including well-defined specifications, data analysis and development of a corpus with annotated product named entities. Second, a hierarchical hidden Markov model-based approach to PRO NER is proposed and evaluated. Extensive experiments show that the proposed method outperforms the cascaded maximum entropy model and obtains promising results on the data sets of two different electronic product domains (digital and cell phone).
Feifan Liu
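At decoding time, the HMM approach above reduces to Viterbi search over tag sequences. A minimal flat-HMM sketch (the paper uses a *hierarchical* HMM; this is only the standard special case, with toy probabilities assumed for illustration):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state (tag) sequence for obs under an HMM.
    Unknown observations get a tiny smoothing probability."""
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 1e-9) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Emission is constant in the previous state, so it can be
            # applied after choosing the best predecessor.
            best_prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
            V[t][s] = (V[t - 1][best_prev] * trans_p[best_prev][s]
                       * emit_p[s].get(obs[t], 1e-9))
            back[t][s] = best_prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

With product tags, e.g. "O" vs "PROD", this recovers the tag path that a cascaded classifier would have to build greedily.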
7.
A classifier ensembling approach is considered for the biomedical named entity recognition task. A vote-based classifier selection scheme is developed whose search complexity lies between static classifier selection and real-valued, class-dependent weighting approaches. Assuming that the reliability of each classifier's predictions differs among classes, the proposed approach selects classifiers by taking their individual votes into account. A wide set of classifiers, each based on a different feature set and modeling parameter setting, is generated for this purpose. A genetic algorithm is developed to label the predictions of these classifiers as reliable or not. During testing, the votes labeled as reliable are combined using weighted majority voting. The classifier ensemble formed by the proposed scheme surpasses the full-object F-score of the best individual classifier by 2.75%, the highest score achieved on the data set considered.
8.
9.
Some recent character-based named entity recognition (NER) models cannot fully exploit word information, while lattice-structured models that do use word information may degrade into word-based models and suffer segmentation errors. To address these problems, a Transformer-based python NER model is proposed to encode character-word information. First, word information is bound to the characters at each word's start or end; then three different strategies encode the word information into fixed-size representations via the Transformer; finally, a conditional random field (CRF) decodes the output, avoiding the segmentation errors incurred when obtaining word-boundary information and improving batch training speed. Experimental results on the python dataset show that the proposed model's F1 is 2.64 percentage points higher than the Lattice-LSTM model while training in roughly a quarter of the time, indicating that it prevents model degradation, speeds up batch training, and better recognizes python named entities.
10.
To reduce the heavy manual feature-engineering work of traditional named entity recognition, distributed vector representations of a military-domain corpus are obtained through unsupervised training, and a bidirectional LSTM recurrent neural network is applied to military-domain NER. The model is extended and improved with combined character-word input vectors and an attention mechanism, further improving military-domain entity recognition. Experimental results show that the proposed method successfully recognizes military-domain named entities, reaching an F-score of 87.38% on the test corpus.
11.
12.
13.
To address the difficulty of recognizing key entity information in the police-incident domain, a BERT-based neural network model, BERT-BiLSTM-Attention-CRF, is proposed for recognizing and extracting the relevant named entities, and entity annotation guidelines are designed for the different case types. The model uses BERT pre-trained word vectors in place of static vectors trained with Skip-gram, CBOW, and similar methods, improving representational power and resolving the word-boundary problem that arises when Chinese corpora are trained at the character level; it also augments the classic NER architecture BiLSTM-CRF with an attention mechanism. BERT-BiLSTM-Attention-CRF reaches 91% accuracy on the test set, 7% above the CRF++ baseline model and above the 86% of BiLSTM-CRF; F1 values for entities such as person names, loss amounts, and handling measures all exceed 0.87.
14.
Recognizing and disambiguating bio-entity names (genes, proteins, cells, etc.) are very challenging tasks: biological databases can be outdated, names may not be normalized, abbreviations are used, syntax and word order vary, and so on. The same bio-entity may therefore be written in different ways, making search a key obstacle, since much relevant literature containing those entities may never be found. Consequently, the same protein mention must be searched for under different names, or a previously used protein name may be reused for a new protein with completely different features, so named-entity recognition methods are required. In this paper, we present a bio-entity recognition model that combines different classification methods and incorporates simple pre-processing for recognizing bio-entities (genes and proteins). Linguistic pre-processing and the feature representation used for training and testing are observed to positively affect the overall performance of the method, showing promising results. Unlike some state-of-the-art methods, the approach requires no additional knowledge bases or special-purpose post-processing tasks, which makes it more appealing. Experiments showing the promise of the model compared to other state-of-the-art methods are discussed.
15.
Chinese named entity recognition is widely used in many important domains. A transformation-based learning algorithm for Chinese NER is proposed with the aim of improving precision and recall. The core idea of transformation-based learning is to begin by applying simple initial conclusions to the problem and then, at each step, apply transformations, each time selecting the best transformation and re-applying it to the problem; the algorithm stops when the selected transformations no longer modify the data within sufficient scope. Methods for acquiring the algorithm's rule templates and constraint files are presented, forming a complete model for Chinese NER; experiments with this model obtain good results.
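The loop described above is the classic Brill-style transformation-based learning cycle; a minimal sketch follows, assuming a toy rule form (trigger word, from-tag, to-tag) rather than the paper's actual rule templates.

```python
def apply_rule(tags, tokens, rule):
    """A rule is (trigger_word, from_tag, to_tag): retag any token equal
    to trigger_word whose current tag is from_tag."""
    word, frm, to = rule
    return [to if tok == word and tag == frm else tag
            for tok, tag in zip(tokens, tags)]

def tbl_select(tokens, tags, gold, candidate_rules):
    """Greedily pick the candidate rule that most reduces errors against
    the gold tags; stop when no rule helps -- the 'algorithm stops when
    transformations no longer change the data' step from the abstract."""
    errors = lambda t: sum(a != b for a, b in zip(t, gold))
    chosen = []
    while True:
        best = min(candidate_rules,
                   key=lambda r: errors(apply_rule(tags, tokens, r)))
        if errors(apply_rule(tags, tokens, best)) >= errors(tags):
            break  # no transformation improves the labeling
        tags = apply_rule(tags, tokens, best)
        chosen.append(best)
    return tags, chosen
```

Real systems generate the candidate rules from templates and score them on a full training corpus; the control flow is the same.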
16.
Named entity recognition is part of lexical analysis in natural language processing and a foundation for computers to correctly understand natural language. To strengthen the model's recognition of named entities, this work uses the pre-trained model BERT (bidirectional encoder representation from transformers) as the model's embedding layer and, because fine-tuning BERT places high demands on computing resources, applies BERT with frozen embedding parameters, building a BERT-BiLSTM-CRF model. Two improvements are then tested on this model. First, a self-attention layer is added; experiments show the addition brings no clear gain in recognition. Second, the number of BERT embedding layers is reduced; experiments show that a moderate reduction improves NER accuracy while saving overall training time. With 9 embedding layers, F1 rises to 94.79% on the MSRA Chinese dataset and reaches 68.82% on the Weibo Chinese dataset.
17.
18.
To address the pre-trained model BERT's lack of lexical information, and building on the semi-supervised entity-enhanced minimum-mean-square-error pre-training model, a Chinese NER model based on knowledge-base entity-enhanced BERT, OpenKG + Entity Enhanced BERT + CRF, is proposed. First, documents are downloaded from the Chinese general-purpose encyclopedia knowledge base CN-DBpedia and entities are extracted with the Jieba Chinese segmenter to expand the entity dictionary; next, the dictionary's entities are embedded into BERT for pre-training, and the resulting word vectors are fed into a bidirectional long short-term memory network (BiLSTM) for feature extraction; finally, the output is corrected by a conditional random field (CRF). The model is validated on the CLUENER 2020 and MSRA datasets against Entity Enhanced BERT Pre-training, BERT+BiLSTM, ERNIE, and BiLSTM+CRF. Its F1 exceeds the four baselines on the two datasets by 1.63 and 1.1, 3.93 and 5.35, 2.42 and 4.63, and 6.79 and 7.55 percentage points respectively, showing that the model's overall NER performance is effectively improved and its F1 surpasses all baselines.
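The first step above, matching text against an entity dictionary mined from a knowledge base, can be sketched with forward maximum matching. The dictionary contents and the character-level matching here are illustrative assumptions; the paper uses Jieba segmentation over CN-DBpedia documents.

```python
def forward_max_match(text, entity_dict, max_len=15):
    """Forward maximum matching: at each position, take the longest
    substring (up to max_len characters) found in the entity dictionary,
    a toy stand-in for entities mined from a knowledge base."""
    i, found = 0, []
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + l] in entity_dict:
                found.append((i, text[i:i + l]))
                i += l
                break
        else:
            i += 1  # no dictionary entry starts here; advance one char
    return found
```

Matched mentions would then be embedded alongside BERT's subword inputs, as the abstract describes.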
19.
A web-based Bengali news corpus for named entity recognition
The rapid development of language resources and tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper. A web crawler retrieves the web pages in Hyper Text Markup Language (HTML) format from the news archive. At present, the corpus contains approximately 34 million wordforms. Named Entity Recognition (NER) systems based on pattern-based shallow parsing, with or without linguistic knowledge, have been developed using a part of this corpus. The NER system that uses linguistic knowledge performs better, yielding highest F-Scores of 75.40%, 72.30%, 71.37%, and 70.13% for person, location, organization, and miscellaneous names, respectively.
Sivaji Bandyopadhyay
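Pattern-based shallow parsing of the kind the abstract mentions can be sketched with cue-word patterns. The patterns below are hypothetical English stand-ins (the actual system works on Bengali with its own cue lists), meant only to show the mechanism.

```python
import re

# Hypothetical cue patterns: a title before capitalized words suggests
# a person; a capitalized word before a place cue suggests a location.
PERSON = re.compile(
    r'\b(?:Mr\.|Dr\.|Prof\.)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)')
LOCATION = re.compile(
    r'\b([A-Z][a-z]+)\s+(?:district|city|village)\b')

def pattern_ner(text):
    """Shallow, pattern-driven entity spotting: no full parse, just
    local cue-word context."""
    return {"person": PERSON.findall(text),
            "location": LOCATION.findall(text)}
```

Adding linguistic knowledge (suffix lists, gazetteers) on top of such patterns is what lifts the F-scores reported above.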
20.
In the biomedical domain, named entity recognition methods that represent semantics with static word vectors have limited accuracy. To address this, a model combining the pre-trained language model BERT with a BiLSTM is proposed for biomedical NER. First, BERT performs semantic extraction to produce dynamic word vectors, with part-of-speech and chunking features added to improve model precision; next, the word vectors are fed into the BiLSTM for further training to capture contextual features; finally, a CRF performs sequence decoding and outputs the highest-probability result. The model achieves an average F1 of 89.45% on the BC4CHEMD, BC5CDR-chem, and NCBI-disease datasets. The experimental results show that the proposed model effectively improves the accuracy of biomedical named entity recognition.
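Nearly every abstract in this list reports an F1 value; for NER these are conventionally computed at the span level with exact matching. A minimal sketch of that metric (spans here are assumed to be (start, end, type) tuples):

```python
def span_f1(gold_spans, pred_spans):
    """Exact-match span-level precision/recall/F1: a predicted entity
    counts only if its boundaries and type both match a gold entity."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                       # exact matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Note that an off-by-one boundary error scores zero under this metric, which is why boundary detection modules recur throughout these papers.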