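Malware Name Recognition in Tweets Based on an Enhanced BiLSTM-CRF Model
Cite this article: GU Xue-mei, LIU Jia-yong, CHENG Peng-sen, HE Xiang. Malware Name Recognition in Tweets Based on an Enhanced BiLSTM-CRF Model[J]. Computer Science, 2020, 47(2): 245-250.
Authors: GU Xue-mei  LIU Jia-yong  CHENG Peng-sen  HE Xiang
Affiliations: School of Cybersecurity, Sichuan University, Chengdu 610000; School of Cybersecurity, Sichuan University, Chengdu 610000; Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
Abstract: Recognizing malware names in tweets is hampered by short, informal text, a single entity category, and entity ambiguity. To address these problems, an entity recognition method based on BERT-BiLSTM-Self-attention-CRF is proposed to identify malware names in tweets automatically. Building on the BiLSTM-CRF model, the method uses BERT to encode the contextual information of each word, which improves the contextual semantic quality of the word embeddings and strengthens the model's semantic disambiguation ability. In addition, a Self-attention mechanism learns the relations between words and the structural features of the sentence, and the resulting weighted representation helps decode entities of a single category, improving the recognition of malware name entities. Experiments on a purpose-built labeled dataset of tweets containing malware name entities show that the proposed method achieves better performance, with precision, recall, and F1 of 86.38%, 84.73%, and 85.55%, respectively; compared with the baseline BiLSTM-CRF model, the F1 score improves by 12.61%.

Keywords: Malware name recognition  Entity disambiguation  Dynamic word embedding  Class imbalance  Importance weighting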
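
The abstract says that a Self-attention layer produces a weighted representation of each word from its relations to the other words in the sentence. The following is a minimal sketch of that idea only, assuming plain scaled dot-product attention with no learned projections; the paper's actual attention formulation is not given here, and the function names and toy vectors (stand-ins for BiLSTM outputs) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with identity projections.

    `states` holds one equal-length vector per token (an assumed stand-in
    for BiLSTM hidden states). Returns (contexts, weights), where
    weights[i][j] is how strongly token i attends to token j and
    contexts[i] is the attention-weighted sum of all token vectors.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in states:
        # Similarity of this token to every token in the sentence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in states]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of token vectors, computed one dimension at a time.
        contexts.append([sum(w * vec[j] for w, vec in zip(row, states))
                         for j in range(dim)])
    return contexts, weights


# Toy hidden states for a 3-token tweet (values invented for illustration).
toy_states = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contexts, weights = self_attention(toy_states)
```

In the pipeline the abstract describes, representations of this kind would be passed on to the CRF layer for tag decoding; a real implementation would normally use learned query, key, and value projections in a deep learning framework.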

Malware Name Recognition in Tweets Based on Enhanced BiLSTM-CRF Model
GU Xue-mei,LIU Jia-yong,CHENG Peng-sen,HE Xiang.Malware Name Recognition in Tweets Based on Enhanced BiLSTM-CRF Model[J].Computer Science,2020,47(2):245-250.
Authors:GU Xue-mei  LIU Jia-yong  CHENG Peng-sen  HE Xiang
Affiliation:(School of Cybersecurity,Sichuan University,Chengdu 610000,China;Key Laboratory of Network Assessment Technology,Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100093,China)
Abstract: Malware name recognition on Twitter must cope with short, informal text, a single entity category, and entity ambiguity. To address these problems, this paper proposes an entity recognition method based on BERT-BiLSTM-Self-attention-CRF that automatically recognizes malware names in tweets. Building on the BiLSTM-CRF model, BERT is used to encode context information, which improves the contextual semantic quality of the word embeddings and enhances the model's semantic disambiguation ability. In addition, a Self-attention mechanism learns the long-range relations between words and the sentence structure, and the resulting weighted representation improves the recognition of a single entity category. To evaluate the proposed method, this paper constructs a labeled dataset of tweets that contain malware name entities. Experimental results show that the proposed method achieves better performance, attaining 86.38% precision, 84.73% recall, and an 85.55% F-score. It outperforms the baseline BiLSTM-CRF model, improving the F-score by 12.61%.
Keywords:Malware name recognition  Entity disambiguation  Dynamic word embedding  Class imbalance  Importance weighting
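
As a quick consistency check on the reported metrics, the F-score can be recomputed from the stated precision and recall. This sketch assumes the F-score is the standard F1, the harmonic mean of precision and recall, which matches the "F1" wording in the Chinese abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 86.38  # percent, as stated in the abstract
reported_recall = 84.73     # percent, as stated in the abstract

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 85.55
```

The recomputed value agrees with the 85.55% F-score given in the abstract.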
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).