
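Sensitive information detection method based on language-model word embeddings and an attention mechanism
Cite this article: HUANG Cheng, ZHAO Qianrui. Sensitive information detection method based on language-model word embeddings and an attention mechanism[J]. Journal of Computer Applications, 2022, 42(7): 2009-2014.
Authors: HUANG Cheng  ZHAO Qianrui
Affiliation: School of Cyber Science and Engineering, Sichuan University, Chengdu 610065
Funding: National Natural Science Foundation of China (61902265)
Abstract: Traditional sensitive information detection methods, such as keyword character matching and phrase-level sentiment analysis, suffer from low accuracy and poor generalization. To address these problems, a sensitive information detection method based on language-model word embeddings and an attention mechanism (A-ELMo) is proposed. First, fast trie matching is performed to minimize comparisons against useless characters, which greatly improves query efficiency. Second, an Embedding from Language Model (ELMo) model is built for context analysis, and dynamic word vectors are used to fully represent contextual features, which gives the method high scalability. Finally, an attention mechanism is incorporated to strengthen the model's ability to recognize sensitive features, further raising the detection rate for sensitive information. Experiments on a real dataset drawn from multiple network data sources show that the accuracy of the proposed method is 13.3 percentage points higher than that of the phrase-level sentiment analysis-based method and 43.5 percentage points higher than that of the keyword matching-based method, which verifies the advantages of the proposed method in strengthening sensitive feature recognition and improving the sensitive information detection rate.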

Keywords: sensitive information  Embedding from Language Model (ELMo)  context analysis  attention mechanism  trie
Received: 2021-05-27
Revised: 2021-08-27
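
The trie fast-matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `TrieNode` and `Trie`, the method `find_all`, and the sample keywords are all hypothetical, chosen only to show how a trie stops comparing characters as soon as no keyword shares the current prefix.

```python
class TrieNode:
    """One node of the keyword trie (hypothetical helper, not from the paper)."""
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = {}   # maps a character to the next TrieNode
        self.is_end = False  # True if a keyword ends at this node


class Trie:
    """Keyword dictionary supporting prefix-pruned scanning of a text."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        if not word:
            return  # ignore empty keywords
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def find_all(self, text):
        """Return (start, end, keyword) for every keyword occurrence in text."""
        hits = []
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                node = node.children.get(text[end])
                if node is None:
                    break  # no keyword has this prefix, so stop comparing
                if node.is_end:
                    hits.append((start, end + 1, text[start:end + 1]))
        return hits


# Illustrative keywords only; the paper's actual dictionary is not given here.
trie = Trie(["leak", "leaked", "key"])
print(trie.find_all("the key leaked"))
```

In this sketch the inner loop abandons a start position at the first character that leaves the trie, so most positions cost only one dictionary lookup. In the pipeline described by the abstract, a match of this kind precedes the context analysis step.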

Sensitive information detection method based on attention mechanism-based ELMo
Cheng HUANG,Qianrui ZHAO.Sensitive information detection method based on attention mechanism-based ELMo[J].Journal of Computer Applications,2022,42(7):2009-2014.
Authors:Cheng HUANG  Qianrui ZHAO
Affiliation:School of Cyber Science and Engineering,Sichuan University,Chengdu,Sichuan 610065,China
Abstract:To address the low accuracy and poor generalization of traditional sensitive information detection methods, such as those based on keyword character matching and phrase-level sentiment analysis, a sensitive information detection method based on Attention mechanism-based Embedding from Language Model (A-ELMo) was proposed. Firstly, fast trie matching was performed to minimize unnecessary character comparisons, thereby greatly improving query efficiency. Secondly, an Embedding from Language Model (ELMo) was constructed for context analysis, and dynamic word vectors were used to fully represent contextual features, achieving high scalability. Finally, an attention mechanism was incorporated to strengthen the model's ability to identify sensitive features and further improve the detection rate of sensitive information. Experiments were carried out on a real dataset composed of multiple network data sources. The results show that the accuracy of the proposed method is 13.3 percentage points higher than that of the phrase-level sentiment analysis-based method and 43.5 percentage points higher than that of the keyword matching-based method, verifying the advantages of the proposed method in enhancing the identification of sensitive features and improving the detection rate of sensitive information.
Keywords:sensitive information  Embedding from Language Model (ELMo)  context analysis  attention mechanism  trie tree  
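
The role of the attention mechanism described in the abstract can be sketched as a weighted pooling over per-token vectors. The abstract does not give the attention formulation used in A-ELMo, so the following is an assumption: plain dot-product attention with a single query vector. The functions `softmax` and `attention_pool`, the query, and the toy 2-dimensional vectors (standing in for ELMo contextual embeddings) are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vectors, query):
    """Dot-product attention pooling (assumed form, not taken from the paper).

    token_vectors: list of equal-length float lists, one per token.
    query: float list of the same length, playing the role of a learned
           "sensitivity" direction.
    Returns (weights, pooled), where pooled is the weighted sum of the tokens.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, token_vectors))
        for i in range(dim)
    ]
    return weights, pooled


# Toy stand-ins for contextual embeddings of three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, pooled = attention_pool(tokens, query=[2.0, 0.0])
print(weights, pooled)
```

Tokens whose vectors align with the query receive larger weights, so they dominate the pooled sentence representation that a downstream classifier would consume. In a trained model the query would be learned rather than fixed by hand.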
