
BERT-Based Named Entity Recognition Method for Legal Documents in Theft Cases
Citation: LI Chunnan, WANG Lei, SUN Yuanyuan, LIN Hongfei. 基于BERT的盗窃罪法律文书命名实体识别方法[J]. Journal of Chinese Information Processing (中文信息学报), 2021, 35(8): 73-81.
Authors: LI Chunnan  WANG Lei  SUN Yuanyuan  LIN Hongfei
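Affiliations: 1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China;
2. People's Procuratorate of Jinzhou, Jinzhou, Liaoning 121000, China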
Funding: National Key Research and Development Program of China under the 13th Five-Year Plan (2018YFC0830603)
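Abstract: Named entity recognition (NER) for legal documents is a key, foundational task in smart justice. Current NER methods for legal documents have two problems: their entity definitions are not closely tied to judicial practice, and traditional word embeddings cannot handle polysemy. To address these problems, this paper proposes a new scheme for defining named entities in legal texts and builds LegalCorpus, a named entity corpus of legal texts drawn from letters of proposal for prosecution. It also proposes an NER method for legal documents based on BERT-ON-LSTM-CRF (Bidirectional Encoder Representations from Transformers-Ordered Neuron-Long Short Term Memory Networks-Conditional Random Field). The method first uses the pre-trained language model BERT to dynamically generate a semantic vector for each character from its context, which serves as the model input. It then applies ON-LSTM to model the input both sequentially and hierarchically to extract text features. Finally, a CRF obtains the optimal tag sequence. In experiments on LegalCorpus, the proposed method reaches an F1 score of 86.09%, 7.8% higher than that of the baseline model Lattice LSTM. The results show that the method can effectively recognize named entities in legal documents.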
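The abstract names ON-LSTM as the layer that models the input "sequentially and hierarchically" but does not give its update rule. Below is a minimal pure-Python sketch of one ON-LSTM step. It follows the ordered-neurons formulation of Shen et al. (ICLR 2019) and is not the authors' implementation. It assumes each gate's pre-activation (W x_t + U h_{t-1} + b) has already been computed. It omits the chunk-size trick used in the original ON-LSTM. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(xs: Sequence[float]) -> List[float]:
    """Cumulative softmax: a soft, non-decreasing 'step' that ends at 1."""
    out, running = [], 0.0
    for p in softmax(xs):
        running += p
        out.append(running)
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def on_lstm_step(
    c_prev: Sequence[float],
    f_pre: Sequence[float],
    i_pre: Sequence[float],
    o_pre: Sequence[float],
    g_pre: Sequence[float],
    mf_pre: Sequence[float],
    mi_pre: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """One ON-LSTM update from precomputed (hypothetical) gate pre-activations."""
    f = [sigmoid(x) for x in f_pre]              # standard forget gate
    i = [sigmoid(x) for x in i_pre]              # standard input gate
    o = [sigmoid(x) for x in o_pre]              # output gate
    g = [math.tanh(x) for x in g_pre]            # candidate cell state
    master_f = cumax(mf_pre)                     # non-decreasing, last value 1
    master_i = [1.0 - v for v in cumax(mi_pre)]  # non-increasing, last value 0
    c, h = [], []
    for k in range(len(c_prev)):
        overlap = master_f[k] * master_i[k]      # omega_t: where both master gates are active
        f_hat = f[k] * overlap + (master_f[k] - overlap)
        i_hat = i[k] * overlap + (master_i[k] - overlap)
        c_k = f_hat * c_prev[k] + i_hat * g[k]
        c.append(c_k)
        h.append(o[k] * math.tanh(c_k))
    return c, h
```

Take all pre-activations set to zero and `c_prev = [9, 9, 9]`. The master forget and input gates become [1/3, 2/3, 1] and [2/3, 1/3, 0], and the step returns a cell state of about [2, 5, 9]. The highest-ranked neuron keeps its history untouched, while lower-ranked neurons are overwritten more aggressively. This ordering is what lets ON-LSTM encode a hierarchy in addition to a sequence.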

Keywords: BERT  legal documents  named entity recognition  ordered neurons
Received: 2020-02-22

BERT Based Named Entity Recognition for Legal Texts on Theft Cases
LI Chunnan,WANG Lei,SUN Yuanyuan,LIN Hongfei.BERT Based Named Entity Recognition for Legal Texts on Theft Cases[J].Journal of Chinese Information Processing,2021,35(8):73-81.
Authors:LI Chunnan  WANG Lei  SUN Yuanyuan  LIN Hongfei
Affiliation:1.School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China;2.People's Procuratorate of Jinzhou, Jinzhou, Liaoning 121000, China
Abstract: Legal named entity recognition (LNER) is a fundamental task in the field of smart judiciary. This paper presents a new definition scheme for LNER and builds LegalCorpus, a corpus of letters of proposal for prosecution. It proposes a novel BERT-based NER model for legal texts, named BERT-ON-LSTM-CRF (Bidirectional Encoder Representations from Transformers-Ordered Neuron-Long Short Term Memory Networks-Conditional Random Field). The model uses BERT to dynamically obtain a semantic vector for each character from its context. ON-LSTM then extracts text features by modeling the input sequence and its hierarchy. Finally, a CRF decodes the text features to obtain the optimal tag sequence. Experiments show that the proposed model achieves an F1 score of 86.09%, 7.8% higher than the best baseline, Lattice-LSTM.
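In the final stage, the CRF decodes the text features into the optimal tag sequence. In a linear-chain CRF this is typically done with the Viterbi algorithm. Below is a minimal pure-Python sketch of that decoding step only; CRF training is not shown. It assumes that per-character emission scores (for example, projected from the ON-LSTM outputs) and a transition score matrix are already available. The BIO tag set, scores and penalty value are hypothetical illustrations, not the paper's entity scheme or trained parameters.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Optional[Sequence[float]] = None,
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag-index path and its total score.

    emissions[t][k]:   score of tag k at character t
    transitions[p][k]: score of moving from tag p to tag k
    start[k]:          score of starting the sequence with tag k
    """
    num_tags = len(transitions)
    if start is None:
        start = [0.0] * num_tags
    score = [start[k] + emissions[0][k] for k in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for cur in range(num_tags):
            # best previous tag for reaching tag `cur` at position t
            prev = max(range(num_tags), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[prev] + transitions[prev][cur] + emissions[t][cur])
            bp.append(prev)
        score = new_score
        backpointers.append(bp)
    best = max(range(num_tags), key=lambda k: score[k])
    best_score = score[best]
    path = [best]
    for bp in reversed(backpointers):  # follow backpointers back to position 0
        path.append(bp[path[-1]])
    path.reverse()
    return path, best_score


if __name__ == "__main__":
    TAGS = ["O", "B-PER", "I-PER"]    # hypothetical BIO tag set
    NEG = -1e4                        # large penalty for invalid transitions
    transitions = [[0.0, 0.0, NEG],   # O -> I-PER is not allowed
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]]
    start = [0.0, 0.0, NEG]           # a sequence cannot start with I-PER
    emissions = [[1.0, 0.9, 0.0],     # made-up scores for two characters
                 [0.0, 0.0, 2.0]]
    path, score = viterbi_decode(emissions, transitions, start)
    print([TAGS[k] for k in path], score)
```

With these made-up scores, a per-character argmax would choose O followed by I-PER, which is an invalid BIO sequence. Viterbi decoding instead returns B-PER, I-PER, because the transition scores rule out O to I-PER. This is the usual motivation for putting a CRF rather than an independent softmax on top of the feature extractor.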
Keywords:BERT  legal text  named entity recognition  ON-LSTM  