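Triple Extraction Model for Legal Texts
Cite this article: CHEN Yanguang, WANG Lei, SUN Yuanyuan, WANG Zhizheng, ZHANG Shuchen. Triple Extraction Model for Legal Texts[J]. Computer Engineering, 2021, 47(5): 277-284.
Authors: CHEN Yanguang  WANG Lei  SUN Yuanyuan  WANG Zhizheng  ZHANG Shuchen
Affiliations: 1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024; 2. The Third Procuratorial Department, People's Procuratorate of Liaoning Province, Shenyang 110033
Abstract: The publicly available criminal judgment documents on China Judgments Online contain important legal information, but they are usually recorded in natural language, which machines cannot readily interpret. To convert unstructured criminal judgment texts into structured triples, this paper constructs a judicial triple extraction model for legal texts. Triple extraction is treated as a two-stage pipeline: a pretrained Bidirectional Encoder Representations from Transformers (BERT) model first performs named entity recognition, and the recognition results are then passed to the relation extraction stage to obtain the corresponding triple representations, thereby extracting information from unstructured criminal judgment texts. Experimental results show that, on a manually annotated dataset of criminal judgments, the model improves the F1 score by 28.1 percentage points over a combined model based on recurrent neural networks, achieving better triple extraction performance.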

Keywords: named entity recognition  relation extraction  pretrained language model  Transformer encoder  pipeline structure
Received: 2020-03-11
Revised: 2020-05-08

Triple Extraction Model for Legal Texts
CHEN Yanguang, WANG Lei, SUN Yuanyuan, WANG Zhizheng, ZHANG Shuchen. Triple Extraction Model for Legal Texts[J]. Computer Engineering, 2021, 47(5): 277-284.
Authors: CHEN Yanguang  WANG Lei  SUN Yuanyuan  WANG Zhizheng  ZHANG Shuchen
Affiliation: 1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; 2. The Third Procuratorial Department, People's Procuratorate of Liaoning Province, Shenyang 110033, China
Abstract: The publicly available criminal judgment documents on China Judgments Online contain important legal information. However, the documents are usually written in natural language and are difficult for machines to understand. This paper proposes a triple extraction model for legal texts that transforms unstructured natural-language text into structured triples. The extraction process is modeled as a two-stage pipeline. The pretrained Bidirectional Encoder Representations from Transformers (BERT) model is first used for Named Entity Recognition (NER), and the recognition results are then applied to relation extraction to obtain the corresponding triple representation, completing the information extraction for unstructured criminal judgment texts. Experimental results on a manually labeled dataset of criminal judgments show that the F1 score of the proposed model is 28.1 percentage points higher than that of a combined model based on recurrent neural networks, demonstrating better triple extraction performance.
Keywords: Named Entity Recognition (NER)  relation extraction  pretrained language model  Transformer encoder  pipeline structure
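
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary-based `toy_ner` and the rule-based `toy_relation_classifier` are hypothetical stand-ins for the BERT-based NER and relation extraction stages, and the entity labels, the relation name, and the example sentence are all invented for illustration. The sketch only shows how stage-one entities feed stage-two pair classification, and how a triple-level F1 score could be computed under an exact-match assumption.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def toy_ner(sentence: str) -> List[Entity]:
    """Stage 1 stand-in: dictionary lookup instead of a BERT tagger."""
    lexicon = {"Zhang San": "PERSON", "heroin": "DRUG"}  # hypothetical labels
    found = []
    for surface, label in lexicon.items():
        start = sentence.find(surface)
        if start != -1:
            found.append(Entity(surface, label, start, start + len(surface)))
    return sorted(found, key=lambda e: e.start)


def toy_relation_classifier(sentence: str, head: Entity, tail: Entity) -> Optional[str]:
    """Stage 2 stand-in: one hand-written rule instead of a BERT classifier."""
    between = sentence[head.end:tail.start]
    if head.label == "PERSON" and tail.label == "DRUG" and "sold" in between:
        return "sells"  # hypothetical relation name
    return None


def extract_triples(
    sentence: str,
    ner: Callable[[str], List[Entity]] = toy_ner,
    classify: Callable[[str, Entity, Entity], Optional[str]] = toy_relation_classifier,
) -> List[Triple]:
    """Run NER first, then classify every ordered entity pair."""
    entities = ner(sentence)
    triples = []
    for head, tail in permutations(entities, 2):
        relation = classify(sentence, head, tail)
        if relation is not None:
            triples.append((head.text, relation, tail.text))
    return triples


def triple_f1(predicted: List[Triple], gold: List[Triple]) -> float:
    """F1 over exact-match triples (an assumed evaluation protocol)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = "Zhang San sold heroin to a buyer."
    print(extract_triples(sentence))
```

Because the second stage only sees entity pairs produced by the first, any entity missed by NER cannot appear in a triple; this error propagation is the usual trade-off of a pipeline design compared with joint extraction.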
This article is indexed in VIP, Wanfang Data, and other databases.