
Malware Name Recognition in Tweets Based on Enhanced BiLSTM-CRF Model

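Cite this article:	GU Xue-mei, LIU Jia-yong, CHENG Peng-sen, HE Xiang. Malware Name Recognition in Tweets Based on Enhanced BiLSTM-CRF Model[J]. Computer Science, 2020, 47(2): 245-250.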

Authors:	GU Xue-mei, LIU Jia-yong, CHENG Peng-sen, HE Xiang

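Affiliations:	School of Cybersecurity, Sichuan University, Chengdu 610000, China; Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China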

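Abstract:	Recognizing malware names in tweets is hard because tweets are short and informal, the task has a single entity category, and entity names are often ambiguous. To address these problems, this paper proposes an entity recognition method based on BERT-BiLSTM-Self-attention-CRF that automatically recognizes malware names in tweets. Building on the BiLSTM-CRF model, BERT encodes the context of each word, which improves the contextual semantic quality of the word embeddings and strengthens the original model's ability to disambiguate meaning. In addition, a self-attention mechanism learns the (long-range) relations between words and sentence-structure features, and the resulting weighted representation assists in decoding the single entity category, improving recognition of malware name entities. Experiments on a labeled tweet dataset, constructed for this work, that contains malware name entities show that the proposed method performs better, achieving a precision of 86.38%, a recall of 84.73%, and an F1 score of 85.55%. Compared with the baseline BiLSTM-CRF model, the F1 score improves by 12.61%.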
Keywords:	malware name recognition; entity disambiguation; dynamic word embedding; class imbalance; importance weighting
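The self-attention step described in the abstract can be illustrated with a minimal sketch of scaled dot-product self-attention over per-token hidden states, such as BiLSTM outputs. For brevity it assumes that queries, keys and values are the hidden states themselves (no learned projection matrices), which may differ from the paper's actual attention layer; the functions `softmax`, `dot` and `self_attention` and the toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(hidden: List[Vector]) -> Tuple[List[Vector], List[List[float]]]:
    """Scaled dot-product self-attention with Q = K = V = hidden (assumed, no learned weights)."""
    d = len(hidden[0])
    scale = math.sqrt(d)
    outputs: List[Vector] = []
    weights: List[List[float]] = []
    for query in hidden:
        # Attention weights of this token over every token in the sentence.
        row = softmax([dot(query, key) / scale for key in hidden])
        weights.append(row)
        # Weighted sum of all hidden states: the "weighted representation".
        outputs.append([sum(w * h[c] for w, h in zip(row, hidden)) for c in range(d)])
    return outputs, weights


# Toy hidden states for a 3-token sentence (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended, attn = self_attention(states)
```

Each output vector is a weighted mix of every token's state, so a token's representation can draw on words anywhere in the tweet; in the full model such reweighted vectors would feed the CRF layer.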
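To make the CRF decoding step concrete, here is a minimal Viterbi decoder of the kind a linear-chain CRF layer uses at inference time. It combines per-token emission scores (in the described model these would come from the BiLSTM and self-attention layers) with tag-to-tag transition scores and returns the highest-scoring tag sequence. The BIO tag set with a single `MAL` type, the function `viterbi_decode`, and all scores are hypothetical and hand-set rather than learned; training (the CRF forward algorithm) is not shown.

```python
from typing import List, Tuple

NEG_INF = -1e4  # large penalty used to forbid a transition


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
    start: List[float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the best tag path and its score for a linear-chain CRF.

    emissions[i][t]: score of tag t at token i
    transitions[p][t]: score of moving from tag p to tag t
    start[t]: score of starting the sentence with tag t
    """
    n_tags = len(tags)
    # score[t]: best score of any path ending in tag t at the current token.
    score = [s + e for s, e in zip(start, emissions[0])]
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, bp = [], []
        for cur in range(n_tags):
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            bp.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur] + emit[cur])
        score = new_score
        backpointers.append(bp)
    # Follow the backpointers from the best final tag.
    last = max(range(n_tags), key=lambda t: score[t])
    best_score = score[last]
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[i] for i in path], best_score


TAGS = ["O", "B-MAL", "I-MAL"]
# Forbid O -> I-MAL and starting a sentence with I-MAL (BIO constraints).
TRANS = [[0.0, 0.0, NEG_INF], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
START = [0.0, 0.0, NEG_INF]
# Hypothetical emission scores for a 3-token tweet fragment.
EMIT = [[2.0, 0.1, 0.1], [0.2, 1.0, 1.2], [1.5, 0.0, 0.3]]

best_tags, best = viterbi_decode(EMIT, TRANS, START, TAGS)
```

A greedy per-token argmax would tag the second token `I-MAL` directly after `O`, which is an invalid BIO sequence; the penalized `O -> I-MAL` transition lets Viterbi pick the valid path `O, B-MAL, O` (score 4.5) instead.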
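As a consistency check, assuming the standard definition of F1 as the harmonic mean of precision P and recall R, the reported figures agree with each other:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 86.38 \times 84.73}{86.38 + 84.73} = \frac{14637.95}{171.11} \approx 85.55\%
```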
This article is indexed in VIP, Wanfang Data, and other databases.
