
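Sensitive information detection method based on language-model word embeddings and an attention mechanism

Cite this article:	HUANG Cheng, ZHAO Qianrui. Sensitive information detection method based on language-model word embeddings and an attention mechanism[J]. Journal of Computer Applications, 2022, 42(7): 2009-2014.

Authors:	HUANG Cheng, ZHAO Qianrui

Affiliation:	School of Cyber Science and Engineering, Sichuan University, Chengdu 610065

Funding:	National Natural Science Foundation of China (61902265)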

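Abstract:	To address the low accuracy and poor generalization of traditional sensitive information detection methods, such as keyword character matching and phrase-level sentiment analysis, a sensitive information detection method based on language-model word embeddings and an attention mechanism (A-ELMo) is proposed. First, fast trie matching is performed to minimize comparisons against irrelevant characters, which greatly improves query efficiency. Second, an Embeddings from Language Models (ELMo) model is built for context analysis, and dynamic word vectors are used to represent contextual features fully, which gives the method high scalability. Finally, an attention mechanism is incorporated to strengthen the model's recognition of sensitive features, further raising the detection rate for sensitive information. In experiments on a real-world dataset assembled from multiple network data sources, the proposed method improves accuracy by 13.3 percentage points over the phrase-level sentiment analysis method and by 43.5 percentage points over the keyword matching method, which supports its advantage in recognizing sensitive features and detecting sensitive information.
Keywords:	sensitive information; language-model word embeddings; context analysis; attention mechanism; trie
Received:	2021-05-27
Revised:	2021-08-27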
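
The trie-matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `TrieNode` and `KeywordTrie`, the method `find_all`, and the sample keywords are all hypothetical, chosen only to show why a trie avoids comparing text against keywords that cannot match.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of the trie; children are keyed by a single character."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False  # True if a keyword ends at this node


class KeywordTrie:
    """Hypothetical keyword matcher built on a plain character trie."""

    def __init__(self, keywords: List[str]) -> None:
        self.root = TrieNode()
        for word in keywords:
            self.insert(word)

    def insert(self, word: str) -> None:
        if not word:
            return  # ignore empty keywords
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def find_all(self, text: str) -> List[Tuple[int, str]]:
        """Return (start_index, keyword) for every keyword occurrence in text."""
        hits: List[Tuple[int, str]] = []
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                node = node.children.get(text[end])
                if node is None:
                    break  # no keyword continues with this character
                if node.is_end:
                    hits.append((start, text[start:end + 1]))
        return hits


trie = KeywordTrie(["leak", "password", "pass"])
print(trie.find_all("my password leaked"))
# [(3, 'pass'), (3, 'password'), (12, 'leak')]
```

The inner loop stops as soon as the current character has no child in the trie, so most start positions are rejected after a single lookup. That early exit is the source of the efficiency gain the abstract attributes to trie matching.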
Sensitive information detection method based on attention mechanism-based ELMo

Cheng HUANG, Qianrui ZHAO. Sensitive information detection method based on attention mechanism-based ELMo[J]. Journal of Computer Applications, 2022, 42(7): 2009-2014.

Authors:	Cheng HUANG, Qianrui ZHAO

Affiliation:	School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China

Abstract:	To address the low accuracy and poor generalization of traditional sensitive information detection methods, such as those based on keyword character matching and phrase-level sentiment analysis, a sensitive information detection method based on attention mechanism-based Embeddings from Language Models (A-ELMo) was proposed. Firstly, fast trie matching was performed to significantly reduce comparisons against irrelevant characters, thereby greatly improving query efficiency. Secondly, an Embeddings from Language Models (ELMo) model was constructed for context analysis, and dynamic word vectors were used to fully represent contextual features and achieve high scalability. Finally, an attention mechanism was incorporated to strengthen the model's ability to identify sensitive features and further improve the detection rate of sensitive information. Experiments were carried out on a real dataset composed of multiple network data sources. The results show that the accuracy of the proposed method is 13.3 percentage points higher than that of the phrase-level sentiment analysis-based method and 43.5 percentage points higher than that of the keyword matching-based method, verifying that the proposed method has advantages in identifying sensitive features and improving the detection rate of sensitive information.

Keywords:	sensitive information; Embeddings from Language Models (ELMo); context analysis; attention mechanism; trie
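
The abstract does not specify the form of attention used, so the following is only a sketch under an assumption: simple dot-product attention pooling over per-token vectors. The function names `softmax` and `attention_pool`, the query vector, and the toy 2-dimensional vectors (standing in for contextual ELMo embeddings) are all hypothetical. It shows how attention weights let tokens that align with a learned "sensitive" direction dominate the sentence representation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    token_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Assumed dot-product attention: weight each token by its match to query.

    Returns (weights, pooled), where pooled is the weighted sum of the
    token vectors and weights sum to 1.
    """
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return weights, pooled


# Toy stand-ins for contextual token embeddings (not real ELMo output).
tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
query = [1.0, 0.0]  # hypothetical learned query vector
weights, pooled = attention_pool(tokens, query)
print(weights)  # the third token receives the largest weight
print(pooled)
```

In a trained model the query vector and the token embeddings would be learned, and the pooled vector would feed a classifier. Here they are fixed by hand only to make the weighting visible.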
