Similar Articles (18 results)
1.
Clinical term normalization is an indispensable part of medical statistics. In practice, a single standard clinical term may have several colloquial, non-standard descriptions, and for applications such as the construction of clinical knowledge bases, normalizing these descriptions is an unavoidable problem. This paper focuses on Chinese clinical term normalization, i.e., mapping non-standard textual descriptions of Chinese clinical terms to the standard terms in a given clinical terminology. Although deep discriminative models have achieved some success in normalizing medical terms with simple textual structure, such as disease and drug names, the descriptions to be normalized in the Chinese clinical setting frequently exhibit missing information and "one-to-many" mappings, so a discriminative model alone cannot recover the complete semantics and therefore underperforms. This paper casts clinical term normalization as a translation task: a deep generative model generates the core semantics of the description text to produce a candidate set of standard terms, and a BERT-based semantic similarity algorithm then re-ranks the candidates to yield the final standard term. The method was evaluated on the shared-task data of the 5th China Conference on Health Information Processing (CHIP2019) and achieved strong results.
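A minimal sketch of the generation step described above, assuming a hypothetical seq2seq checkpoint ("my-clinical-norm-model" is a placeholder, not a released model) fine-tuned on (description, standard term) pairs; beam search yields the candidate set that a BERT re-ranker would then score:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint: assumed to be fine-tuned on
# (description, standard term) pairs; not a published model.
tokenizer = AutoTokenizer.from_pretrained("my-clinical-norm-model")
model = AutoModelForSeq2SeqLM.from_pretrained("my-clinical-norm-model")

def generate_candidates(description: str, k: int = 5) -> list[str]:
    """Beam-search decode k candidate standard terms for one description."""
    inputs = tokenizer(description, return_tensors="pt")
    outputs = model.generate(
        **inputs, num_beams=k, num_return_sequences=k, max_new_tokens=32
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```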

2.
The descriptions of clinical terms in electronic medical records are diverse and non-standard, which hinders the analysis and use of medical data, so research on clinical term normalization has significant practical value. At present, clinical term normalization in Chinese medical institutions is mostly done manually, which is inefficient and costly. This paper proposes a BERT-based clinical term normalization method: the Jaccard similarity algorithm selects candidate terms from the standard terminology, and a BERT model matches the original term against the candidates to produce the normalized result. The method achieves 90.04% accuracy on the dataset of the CHIP2019 clinical term normalization shared task. The experimental results show that the method is effective for clinical term normalization.
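A minimal sketch of the Jaccard recall step, assuming character-level sets (the abstract does not state the granularity); the top-k candidates would then be passed to the BERT matcher:

```python
def jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity, a common choice for Chinese terms."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def recall_candidates(mention: str, standard_terms: list[str], k: int = 10) -> list[str]:
    """Return the k standard terms most similar to the mention."""
    return sorted(standard_terms, key=lambda t: jaccard(mention, t), reverse=True)[:k]

# Example: recall candidates for a colloquial procedure description.
candidates = recall_candidates(
    "胃镜下胃息肉切除", ["胃息肉切除术", "胃大部切除术", "结肠息肉切除术"]
)
```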

3.
With the arrival of the digital era of Internet healthcare, health data are growing massively. To address the term-standardization problem of heterogeneous data in integrated medical data applications, this paper proposes a medical term alignment technique that computes semantic similarity with PubMedBERT. It uses a domain-specific pretrained model, enhances semantic information with an abbreviation-expansion method, and is compared with traditional similarity models as well as BERT (Bidirectional Encoder Representations from Transformers) and its variants. Experiments on the test corpus show that abbreviation expansion improves the TOP1 accuracy of the PubMedBERT pretrained model by 18.79%, and the PubMedBERT model reaches TOP1, TOP3, TOP5, and TOP10 accuracies of 78.49%, 85.69%, 87.44%, and 89.54%, respectively, outperforming all compared models. The method offers an intelligent solution for medical term alignment.
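A hedged sketch of similarity scoring with PubMedBERT via Hugging Face transformers; the checkpoint id, mean pooling, and the tiny abbreviation table are illustrative assumptions rather than the paper's exact setup:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Checkpoint id is assumed; mean pooling is one common sentence-embedding choice.
MODEL = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

# Toy abbreviation table; the paper's expansion resource is not specified here.
ABBREVIATIONS = {"MI": "myocardial infarction", "HTN": "hypertension"}

def embed(term: str) -> torch.Tensor:
    """Expand known abbreviations, then mean-pool the last hidden states."""
    term = ABBREVIATIONS.get(term, term)
    inputs = tokenizer(term, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

def similarity(a: str, b: str) -> float:
    return torch.cosine_similarity(embed(a), embed(b), dim=0).item()
```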

4.
The 5th China Conference on Health Information Processing (CHIP2019) organized three shared tasks on Chinese clinical information processing; Task 1 was clinical term normalization. Its goal was the semantic normalization of real surgical-procedure entities mined from Chinese electronic medical records. All original procedure terms in the evaluation dataset come from real medical data and were annotated against the surgical-procedure vocabulary of 《ICD9-2017协和临床版》 (ICD9-2017 Union Clinical Version). A total of 56 teams registered, and 20 teams ultimately submitted 47 runs. Accuracy was the final evaluation metric, and the best submission reached 94.83%.

5.
Medical term normalization, an important means of resolving entity ambiguity, is widely used in knowledge graph construction. Because the medical domain involves large numbers of specialized terms and complex expressions, traditional matching models often fail to reach high accuracy. This paper proposes a two-stage model, semantic recall followed by precise ranking, to improve medical term normalization. In the semantic-recall stage, a semantic representation model, CL-BERT, is built from improved supervised contrastive learning and RoBERTa-wwm; CL-BERT generates semantic vectors for entities, and a candidate set of standard terms is recalled by cosine similarity between vectors. In the precise-ranking stage, a semantic matching model is built from T5 with prompt tuning, and FGM adversarial training is applied during model training; the matching model then ranks the original term against the candidate set to obtain the final standard term. Experiments on the public CCKS2019 dataset yield an F1 of 0.9206, showing that the proposed two-stage model performs well and offers a new approach to medical term normalization.
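FGM adversarial training, mentioned above, is commonly implemented by perturbing the embedding weights along the gradient direction; a generic PyTorch sketch (epsilon and the embedding parameter name are conventional defaults, not the paper's values):

```python
import torch

class FGM:
    """Fast Gradient Method: perturb embedding weights along the gradient,
    run a second forward/backward pass, then restore the weights."""

    def __init__(self, model: torch.nn.Module, epsilon: float = 1.0,
                 emb_name: str = "embeddings.word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0:
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Usage inside a training step:
#   loss.backward()          # gradients on clean inputs
#   fgm.attack()             # add perturbation to embeddings
#   loss_adv = model(batch).loss
#   loss_adv.backward()      # accumulate adversarial gradients
#   fgm.restore()            # remove perturbation
#   optimizer.step(); optimizer.zero_grad()
```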

6.
Traditional approaches to term normalization based on template matching, hand-crafted features, or semantic matching often suffer from low mapping accuracy and alignment difficulties. Given that terms in medical text are colloquial and expressed in diverse ways, this paper uses a multi-strategy recall module and an entailment-scoring ranking module to improve medical term normalization. The multi-strategy recall module combines recall based on the Jaccard coefficient, TF-IDF, and historical recall; the entailment-scoring module uses RoBERTa-wwm-ext as the scoring model. The approach is validated, for the first time, on a Chinese dataset annotated by medical professionals against the SNOMED CT standard. Experiments show that the method performs well in practical medical term normalization and has good generalization and practical value.
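A minimal sketch of the TF-IDF recall strategy using scikit-learn; character n-grams are an assumption made here to sidestep Chinese word segmentation, and the (1, 2) range is an illustrative choice:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy standard-term list; in practice this is the full terminology.
standard_terms = ["胃息肉切除术", "胃大部切除术", "结肠息肉切除术"]
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
term_matrix = vectorizer.fit_transform(standard_terms)

def tfidf_recall(mention: str, k: int = 10) -> list[str]:
    """Rank standard terms by TF-IDF cosine similarity to the mention."""
    scores = cosine_similarity(vectorizer.transform([mention]), term_matrix)[0]
    ranked = scores.argsort()[::-1][:k]
    return [standard_terms[i] for i in ranked]
```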

7.
Medical entity normalization aims to map non-standard terms in electronic medical records, patients' chief complaints, and other text to unified, standard medical entities. Given the small annotated corpora and low degree of standardization typical of medical text, this paper proposes an ensemble learning framework based on multi-model collaboration. By establishing "cooperation and competition" among models, the framework combines the strengths of character-level, semantic-level, and other normalization methods. Specifically, knowledge distillation is used for collaborative learning, drawing effective features from each model, and a competition mechanism combines the entity-normalization results of the individual models to keep the candidate set diverse. In the CHIP-CDN 2021 medical entity normalization task, the proposed method reached an F1 of 73.985% on the blind test set, ranking second among 255 teams including Baidu BDKG, Ant Financial Antins, and AIspeech. Further experiments show that the method effectively normalizes terms in medical text.
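Knowledge distillation as used in the collaborative-learning step is typically a temperature-softened KL term; a generic sketch, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soften both distributions with a temperature and minimize their KL
    divergence, so the student absorbs the teacher's output behaviour."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```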

8.
In recent years, with the wide application of deep learning to machine reading comprehension, the field has developed rapidly. For semantic understanding and reasoning in machine reading comprehension, this paper proposes a bilinear-function attention model over a bidirectional long short-term memory (Bi-LSTM) network, which extracts the semantics of the passage, the question, and the candidate answers and selects the correct answer. Tested on CET-4 and CET-6 listening transcripts, word-by-word sequential input achieves roughly 2% higher accuracy than sentence-by-sentence input; moreover, adding a multi-layer attention-transfer reasoning structure on top of the base model improves accuracy by roughly 8%.
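One plausible reading of the bilinear attention function over Bi-LSTM outputs is scoring each passage position t as q^T W h_t; a sketch under that assumption:

```python
import torch
import torch.nn as nn

class BilinearAttention(nn.Module):
    """Score each passage position against the question with a bilinear form
    q^T W h_t, then attend over the Bi-LSTM outputs; a generic sketch."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, passage: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        # passage: (batch, seq_len, dim); question: (batch, dim)
        scores = torch.bmm(passage, self.W(question).unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)                        # (batch, seq_len)
        return torch.bmm(weights.unsqueeze(1), passage).squeeze(1)   # (batch, dim)
```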

9.
Question answering for reading comprehension on the Gaokao Chinese exam is harder than ordinary reading-comprehension QA, and training data for this task are scarce, so current deep learning methods do not answer well. To address these problems, this paper proposes a candidate answer-sentence extraction method for Gaokao reading comprehension that fuses BERT semantic representations. First, an improved MMR algorithm filters the passages; second, a fine-tuned BERT model encodes sentence semantics; third, a softmax classifier extracts candidate answer sentences; finally, the PageRank algorithm re-ranks the output. On reading-comprehension questions from the past ten years of the Beijing Gaokao Chinese exam, the method reaches a recall of 61.2% and an accuracy of 50.1%, verifying its effectiveness.
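For reference, plain MMR over sentence vectors looks as follows; the paper uses an improved variant whose modifications the abstract does not detail:

```python
import numpy as np

def mmr(doc_vecs: np.ndarray, query_vec: np.ndarray,
        lambda_: float = 0.7, k: int = 5) -> list[int]:
    """Maximal Marginal Relevance: greedily pick sentences that are relevant
    to the query yet dissimilar to those already picked."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    relevance = [cos(v, query_vec) for v in doc_vecs]
    selected: list[int] = []
    while len(selected) < min(k, len(doc_vecs)):
        best, best_score = -1, -np.inf
        for i in range(len(doc_vecs)):
            if i in selected:
                continue
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            score = lambda_ * relevance[i] - (1 - lambda_) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```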

10.
张晶  曹存根  王石 《计算机科学》2012,39(7):170-174
Translating Chinese terms and out-of-vocabulary words is an important problem in machine translation and cross-language retrieval, and such translations are hard to obtain from existing dictionaries. This paper proposes a method that automatically acquires English translations of Chinese terms from web pages through a search engine. Using partial translation information of a term, three query patterns are constructed, and a multi-feature translation extraction method is proposed. To address the low accuracy and numerous distractor candidates of traditional methods, three verification models, analogy-based alignment verification, bilingual alignment-degree verification, and word-formation verification, are proposed to validate candidate translations effectively. Experimental results show that the acquired bilingual translation pairs are highly accurate, with a TOP1 accuracy of 97.4% and a TOP3 accuracy of 98.3%.

11.
This paper explores how to build a normalization method for surgical-procedure terms. It first analyzes the characteristics of the procedure-term normalization dataset, then surveys related normalization methods, and finally builds a procedure-term normalization model that combines the surveyed techniques with the dataset's characteristics. The model fuses text-similarity ranking with BERT-based matching. In the surgical-procedure term normalization shared task of the 2019 China Conference on Health Information Processing (CHIP2019), it reached an accuracy of 88.35% on the validation set and 88.51% on the test set, ranking 5th among all participating teams.

12.
An information retrieval system has to retrieve all and only those documents that are relevant to a user query, even if index terms and query terms are not matched exactly. However, term mismatches between index terms and query terms have been a serious obstacle to the enhancement of retrieval performance. In this article, we discuss automatic term normalization between words and phrases in text corpora and their application to a Korean information retrieval system. We perform three new types of term normalizations: transliterated word normalization, noun phrase normalization, and context-based term normalization. Transliterated words are normalized into equivalence classes by using contextual similarity to alleviate lexical term mismatches. Then, noun phrases are normalized into phrasal terms by segmenting compound nouns as well as normalizing noun phrases. Moreover, context-based terms are normalized by using a combination of mutual information and word context to establish word similarities. Next, unsupervised clustering is done by using the K-means algorithm and cooccurrence clusters are identified to alleviate semantic term mismatches. These term normalizations are used in both the indexing and the retrieval system. The experimental results show that our proposed system can alleviate three types of term mismatches and can also provide the appropriate similarity measurements. As a result, our system can improve the retrieval effectiveness of the information retrieval system.
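A minimal sketch of the unsupervised clustering step with scikit-learn's KMeans; the random vectors stand in for the co-occurrence/context vectors the paper builds from the corpus:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative stand-in data: real term vectors would come from mutual
# information and word-context statistics over the corpus.
terms = ["computer", "laptop", "keyboard", "mouse", "screen", "monitor"]
term_vectors = np.random.rand(len(terms), 50)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(term_vectors)
clusters: dict[int, list[str]] = {}
for term, label in zip(terms, kmeans.labels_):
    clusters.setdefault(int(label), []).append(term)
# Terms in the same cluster are treated as one equivalence class at
# indexing and retrieval time.
```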

13.
Clinical term normalization maps any term written by a physician to the corresponding standard term in a standard terminology. The standard terms are numerous and highly similar to one another, and zero-shot and few-shot cases arise, which makes term normalization highly challenging. Based on the dataset provided in shared task 1 of the China Conference on Health Information Processing (CHIP 2019), this paper designs and implements a clinical term normalization system based on ranking BERT entailment scores. The system consists of four modules: data preprocessing, BERT entailment scoring, BERT-based prediction of the number of standard terms, and logistic-regression re-ranking. With accuracy as the evaluation metric, the final result of 0.94825 ranked first in shared task 1.
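A toy sketch of the logistic-regression re-ranking module; the feature layout (entailment score plus cheap surface features) and the toy training rows are our illustration, not the system's actual features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per (mention, candidate) pair:
# [entailment_score, char_jaccard, length_ratio]
X_train = np.array([
    [0.93, 0.80, 0.95],   # correct standard term
    [0.41, 0.30, 0.60],   # distractor
    [0.12, 0.10, 0.40],   # distractor
])
y_train = np.array([1, 0, 0])

reranker = LogisticRegression().fit(X_train, y_train)

def rerank(pair_features: np.ndarray) -> np.ndarray:
    """Return candidate indices sorted by probability of being the standard term."""
    probs = reranker.predict_proba(pair_features)[:, 1]
    return probs.argsort()[::-1]
```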

14.
In problems that deal with multiple sources of linguistic information, we can find problems defined in contexts where the linguistic assessments are expressed in linguistic term sets with different granularity of uncertainty and/or semantics (multigranular linguistic contexts). Different approaches have been developed to manage this type of context, which unify the multigranular linguistic information in a unique linguistic term set for easy management of the information. This normalization process can produce a loss of information and hence a lack of precision in the final results. In this paper, we shall present a type of multigranular linguistic context, which we shall call linguistic hierarchy term sets, such that when we deal with multigranular linguistic information assessed in these structures, we can unify the information without loss. To do so, we shall use the 2-tuple linguistic representation model. Afterwards we shall develop a linguistic decision model dealing with multigranular linguistic contexts and apply it to a multi-expert decision-making problem.
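For reference, the 2-tuple representation the abstract relies on is the standard Herrera-Martínez model, in which a symbolic aggregation result β ∈ [0, g] over a term set S = {s_0, ..., s_g} is translated without loss into a term plus a symbolic offset:

```latex
% Standard 2-tuple translation function (Herrera & Martínez):
\[
\Delta(\beta) = (s_i, \alpha), \qquad
i = \operatorname{round}(\beta), \qquad
\alpha = \beta - i \in [-0.5, 0.5),
\]
% with the inverse recovering the aggregation value exactly:
\[
\Delta^{-1}(s_i, \alpha) = i + \alpha = \beta .
\]
```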

15.
Centroid-based categorization is one of the most popular algorithms in text classification. In this approach, normalization is an important factor in improving the performance of a centroid-based classifier when documents in the text collection have quite different sizes and/or the numbers of documents in classes are unbalanced. In the past, most researchers applied document normalization, e.g., document-length normalization, while some considered a simple kind of class normalization, so-called class-length normalization, to solve the imbalance problem. However, there is no intensive work that clarifies how these normalizations affect classification performance and whether there are any other useful normalizations. The purpose of this paper is threefold: (1) to investigate the effectiveness of document- and class-length normalizations on several data sets, (2) to evaluate a number of commonly used normalization functions, and (3) to introduce a new type of class normalization, called term-length normalization, which exploits term distribution among documents in the class. The experimental results show that a classifier with the weight-merge-normalize approach (class-length normalization) performs better than one with the weight-normalize-merge approach (document-length normalization) for data sets with unbalanced numbers of documents in classes, and is quite competitive for those with balanced numbers of documents. Among normalization functions, normalization based on term weighting performs better than the others on average. Term-length normalization is useful for improving classification accuracy. The combination of term- and class-length normalizations outperforms pure class-length normalization, pure term-length normalization, and no normalization by margins of 4.29%, 11.50%, and 30.09%, respectively.
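One plausible reading of the two orderings compared above, sketched with plain vector-length normalization (the paper evaluates several normalization functions beyond this one):

```python
import numpy as np

def centroid_normalize_merge(class_doc_vectors: np.ndarray) -> np.ndarray:
    """weight-normalize-merge: length-normalize each document vector first,
    then average (document-length normalization)."""
    norms = np.linalg.norm(class_doc_vectors, axis=1, keepdims=True)
    return (class_doc_vectors / np.maximum(norms, 1e-9)).mean(axis=0)

def centroid_merge_normalize(class_doc_vectors: np.ndarray) -> np.ndarray:
    """weight-merge-normalize: sum the raw document vectors first, then
    length-normalize the class vector (class-length normalization)."""
    merged = class_doc_vectors.sum(axis=0)
    return merged / max(np.linalg.norm(merged), 1e-9)
```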

16.
Many well-known probabilistic information retrieval models have shown promise for use in document ranking, especially BM25. Nevertheless, it is observed that the control parameters in BM25 usually need to be adjusted to achieve improved performance on different data sets; additionally, the assumption in BM25 on the bag-of-words model prevents its direct utilization of rich information that lies at the sentence or document level. Inspired by the above challenges with respect to BM25, we first propose a new normalization method on the term frequency in BM25 (called BM25QL in this paper); in addition, the method is incorporated into CRTER2, a recent BM25-based model, to construct CRTER2QL. Then, we incorporate topic modeling and word embedding into BM25 to relax the assumption of the bag-of-words model. In this direction, we propose a topic-based retrieval model, TopTF, for BM25, which is then further incorporated into the language model (LM) and the multiple aspect term frequency (MATF) model. Furthermore, an enhanced topic-based term frequency normalization framework, ETopTF, based on embedding is presented. Experimental studies demonstrate the great effectiveness and performance of these methods. Specifically, on all tested data sets and in terms of the mean average precision (MAP), our proposed models, BM25QL and CRTER2QL, are comparable to BM25 and CRTER2 with the best b parameter value; the TopTF models significantly outperform the baselines, and the ETopTF models could further improve the TopTF in terms of the MAP.
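For reference, the standard BM25 term-frequency normalization that BM25QL revisits; k_1 and b are the control parameters whose data-set sensitivity motivates the paper:

```latex
\[
\mathrm{BM25}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
\frac{tf(t, d)\,(k_1 + 1)}
     {tf(t, d) + k_1 \left(1 - b + b \,\dfrac{|d|}{\mathrm{avgdl}}\right)}
\]
```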

17.
Probabilistic linguistic preference relation (PLPR) provides an effective and flexible tool with which preference degrees of decision-makers can be captured when they vacillatingly express linguistic preference values among several linguistic terms. Individual consistency and group consensus are two important research topics of PLPRs in group decision making (GDM). Considering the problems associated with these two topics, this study proposes a novel GDM framework with consistency-driven and consensus-driven optimization models based on a personalized normalization method for managing complete and incomplete PLPRs. First, existing limitations of the traditional normalization method for probabilistic linguistic term sets (PLTSs) managing ignorance information are specifically discussed. Given the potential valuable information hidden in PLTSs, a personalized normalization method is newly proposed through a two-stage decision-making process with a comprehensive fusion mechanism. Then, based on the proposed normalization method for PLTSs, consistency-driven optimization models that aim to minimize the overall adjustment amount of a PLPR are constructed to improve consistency. Moreover, the developed models are extended to improve consistency and estimate the missing values of an incomplete PLPR. Subsequently, a consensus-driven optimization model that aims to maximize group consensus by adjusting experts' weights is constructed to support the consensus-reaching process. Finally, an illustrative example, followed by some comparative analyses, is presented to demonstrate the application and advantages of the proposed approach.

18.
This paper presents a new shape prior-based implicit active contour model for image segmentation. The paper proposes an energy functional including a data term and a shape prior term. The data term, inspired by the region-based active contour approach, evolves the contour based on the region information of the image to segment. The shape prior term, defined as the distance between the evolving shape and a reference shape, constrains the evolution of the contour with respect to the reference shape. In particular, we represent shapes via geometric moments and use a shape normalization procedure, which takes the affine transformation into account, to align the evolving shape with the reference one. In this way, we can directly calculate the shape transformation instead of solving a set of coupled partial differential equations as in the gradient descent approach. In addition, we represent the level-set function in the proposed energy functional as a linear combination of continuous basis functions expressed on a B-spline basis. This allows fast convergence to the segmentation solution. Experimental results on synthetic, real, and medical images show that the proposed model is able to extract object boundaries even in the presence of clutter and occlusion.
