Tibetan Medical Entity Recognition Based on Tibetan BERT

Citation: Zhu Yajun, Yongcuo, Nima Zhaxi. Tibetan medical entity recognition based on Tibetan BERT[J]. Computer and Modernization (计算机与现代化), 2023, 0(1): 43-48

Authors: Zhu Yajun (朱亚军), Yongcuo (拥措), Nima Zhaxi (尼玛扎西)

Funding: Key Special Project of the National Key R&D Program, Ministry of Science and Technology of China (2017YFB1402200); Independent Research Project of the Science and Technology Innovation Base of the Tibet Autonomous Region (XZ2021JR002G); Tibet University Graduate "High-Level Talent Training Program" Project (2019-GSP-S118)

Received: 2023-03-02

Abstract: Character embeddings of Tibetan medicine text are important for Tibetan medical entity recognition, but high-quality Tibetan language models are currently lacking. Taking the structural characteristics of Tibetan into account, we train a syllable-based Tibetan BERT model on general Tibetan news text and use it to build a BERT-BiLSTM-CRF model. The model first uses Tibetan BERT to learn character embeddings of Tibetan medicine text, which strengthens their ability to represent Tibetan characters and their context. A BiLSTM layer then extracts further dependencies between characters in the text. Finally, a CRF layer enforces the validity of the predicted label sequence. Experimental results show that initializing the character embeddings with the Tibetan BERT model improves Tibetan medical entity recognition, reaching an F1 score of 96.18%.
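The model operates on syllables. In written Tibetan, a tsheg mark (U+0F0B) separates syllables and a shad (U+0F0D) ends a clause. The following is a minimal sketch of a syllable splitter, together with a helper that converts entity spans into per-syllable BIO tags for sequence labeling. It rests on several assumptions: the helper names, the BIO scheme and the `ENT` label are hypothetical, and the abstract does not describe the paper's actual tokenizer or tag set.

```python
import re

# Tibetan punctuation treated as syllable delimiters (Unicode Tibetan block).
TSHEG = "\u0f0b"     # intersyllabic tsheg
NB_TSHEG = "\u0f0c"  # non-breaking tsheg
SHAD = "\u0f0d"      # shad, clause/sentence delimiter

# A run of delimiters (or whitespace) separates two syllables.
_DELIMS = re.compile(f"[{TSHEG}{NB_TSHEG}{SHAD}\\s]+")


def split_syllables(text: str) -> list[str]:
    """Split Tibetan text into syllables (hypothetical helper)."""
    return [s for s in _DELIMS.split(text) if s]


def to_bio(n: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """Turn entity spans [start, end) over syllable indices into BIO tags."""
    tags = ["O"] * n
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Usage: "bod sman" (Tibetan medicine); each syllable is closed by a tsheg.
text = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0fa8\u0f53\u0f0b"
syllables = split_syllables(text)
tags = to_bio(len(syllables), [(0, 2, "ENT")])  # hypothetical entity type
print(tags)  # ['B-ENT', 'I-ENT']
```

In a BERT-based tagger, each syllable would then correspond to one input unit and one label position.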

Keywords: Tibetan; Tibetan medicine; named entity recognition (NER); BERT; bidirectional long short-term memory (BiLSTM)
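The abstract says that the CRF layer enforces the validity of the label sequence. In a BiLSTM-CRF, the BiLSTM assigns each position a score for every tag (emissions), and the CRF adds tag-to-tag transition scores. Decoding then picks the highest-scoring sequence with the Viterbi algorithm. Below is a plain-Python illustration that rests on assumptions the paper does not state:

- a BIO tag scheme with a hypothetical entity type `ENT`;
- hand-written scores in place of trained BERT-BiLSTM outputs;
- hard masking of invalid transitions (such as `I-ENT` after `O`) on top of the learned transition scores.

```python
import math


def allowed(prev: str, cur: str) -> bool:
    """BIO constraint: an I-X tag may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in (f"B-{cur[2:]}", cur)
    return True


def viterbi(emissions: list[list[float]], tags: list[str],
            trans: list[list[float]]) -> list[str]:
    """Return the best-scoring valid tag sequence.

    emissions[t][j]: score of tag j at position t (e.g. from a BiLSTM).
    trans[i][j]:     score of moving from tag i to tag j (learned by a CRF).
    Invalid BIO transitions are skipped, so they can never be selected.
    """
    n = len(tags)
    # A sequence may not start with an I- tag.
    score = [-math.inf if tags[j].startswith("I-") else emissions[0][j]
             for j in range(n)]
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            best_i, best = 0, -math.inf
            for i in range(n):
                if allowed(tags[i], tags[j]) and score[i] + trans[i][j] > best:
                    best_i, best = i, score[i] + trans[i][j]
            new_score.append(best + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best-scoring final tag.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    return [tags[k] for k in reversed(path)]


TAGS = ["O", "B-ENT", "I-ENT"]
ZERO_TRANS = [[0.0] * 3 for _ in TAGS]
# A greedy per-position argmax would pick I-ENT twice (an invalid sequence);
# constrained Viterbi returns a valid one instead.
emissions = [[0.0, 0.0, 5.0], [0.0, 0.0, 3.0]]
print(viterbi(emissions, TAGS, ZERO_TRANS))  # ['B-ENT', 'I-ENT']
```

During training, a CRF also learns `trans` from data and normalizes over all sequences. This sketch covers only the decoding step.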