Event Argument Extraction Method Based on Knowledge Distillation and Model Ensemble
Cite this article: WANG Shihao, WANG Zhongqing, LI Shoushan, ZHOU Guodong. Event Argument Extraction Method Based on Knowledge Distillation and Model Ensemble[J]. Computer Engineering, 2022, 48(7): 97-103. DOI: 10.19678/j.issn.1000-3428.0061790
Authors: WANG Shihao  WANG Zhongqing  LI Shoushan  ZHOU Guodong
Affiliation: School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Funding: National Natural Science Foundation of China (61806137, 61702518); Natural Science Research General Program of Jiangsu Higher Education Institutions (18KJB520043)
Abstract: Existing advanced event argument extraction methods typically use BERT as the encoder, but BERT's enormous parameter count reduces efficiency and prevents the model from running on devices with limited computational resources. To address this problem, this paper proposes an Event Argument Extraction method based on knowledge Distillation and model Ensemble (EAEDE), which distills an event argument extraction teacher model into two different student models and then ensembles the two students. First, a teacher model using BERT and a Graph Convolutional Network (GCN) is constructed, together with two student models that use a single-layer Convolutional Neural Network (CNN) and a single-layer Long Short-Term Memory (LSTM) network, respectively. During distillation, knowledge is first transferred from the teacher's intermediate-layer vectors to the students' via the Mean Squared Error (MSE) loss; the classification layer is then distilled, using the MSE loss and the Cross-Entropy (CE) loss so that the students learn both the knowledge in the teacher's classification layer and the knowledge in the gold labels. Finally, the two student models are ensembled by weighted averaging to obtain the final model and further improve extraction performance. Experiments on the ACE2005 English dataset show that the proposed method improves the F1 score of the student models by an average of 5.05 percentage points, while reducing inference time by 90.85% and parameter count by 99.25% relative to the teacher model.
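The two-stage distillation objective described above can be sketched as follows. This is illustrative PyTorch-style code, not taken from the paper; names such as proj and the weighting factor alpha are assumptions introduced for the example.

    # Illustrative two-stage distillation losses (PyTorch). Stage 1 matches the
    # student's intermediate-layer vectors to the teacher's with an MSE loss;
    # stage 2 distills the classification layer with an MSE loss on the logits
    # plus a CE loss on the gold labels. `proj` and `alpha` are assumptions.
    import torch
    import torch.nn as nn

    mse = nn.MSELoss()
    ce = nn.CrossEntropyLoss()

    def intermediate_loss(student_hidden, teacher_hidden, proj):
        # Stage 1: align hidden vectors; `proj` (e.g., an nn.Linear) maps the
        # student's hidden size to the teacher's when the two sizes differ.
        return mse(proj(student_hidden), teacher_hidden.detach())

    def classification_loss(student_logits, teacher_logits, labels, alpha=0.5):
        # Stage 2: learn from the teacher's classification layer (MSE on
        # logits) and from the gold labels (CE); `alpha` balances the terms.
        soft = mse(student_logits, teacher_logits.detach())
        hard = ce(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

Detaching the teacher's outputs keeps gradients from flowing into the (frozen) teacher, so only the student is updated.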

Keywords: event argument extraction  knowledge distillation  model ensemble  pre-trained language model  model compression
Received: 2021-05-31
Revised: 2021-09-16
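For the final step of the method, the weighted-average ensemble of the two distilled students at inference time might look as follows. Again this is an illustrative sketch: the weight w and the use of softmax probabilities are assumptions, since the abstract only states that the students are combined by weighted averaging.

    # Illustrative weighted-average ensemble of the two distilled students.
    # The weight `w` and the use of softmax probabilities are assumptions.
    import torch

    @torch.no_grad()
    def ensemble_predict(cnn_student, lstm_student, inputs, w=0.5):
        # Average the two students' class distributions, then pick the
        # highest-probability argument role for each candidate.
        p_cnn = torch.softmax(cnn_student(inputs), dim=-1)
        p_lstm = torch.softmax(lstm_student(inputs), dim=-1)
        return (w * p_cnn + (1.0 - w) * p_lstm).argmax(dim=-1)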
