
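Sensitive Information Detection Method for Documents Based on Document Smoothing and Query Expansion
Cite this article: SU Yingbin, DU Xuehui, XIA Chuntao, LI Haihua. Sensitive information detection approach for documents based on document smoothing and query expansion[J]. Journal of Computer Applications, 2014, 34(9): 2639-2644.
Authors: SU Yingbin, DU Xuehui, XIA Chuntao, LI Haihua
Affiliations: 1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001; 2. Information Engineering University, Zhengzhou 450001; 3. Department of Computer Science and Technology, Henan Industry and Trade Vocational College, Zhengzhou 450001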
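Abstract: Because office terminals risk leaking sensitive information, detecting sensitive information in the documents stored on them is important. Existing detection methods, however, have two problems: context-independent indexes model documents inaccurately, and query semantics are not expanded sufficiently. To address these problems, a context-based document index smoothing algorithm is first proposed to build an index that retains as much document information as possible. The query semantic expansion algorithm is then improved so that it uses the sensitivity of concepts in a domain ontology to widen the scope of sensitive information detection appropriately. Finally, document smoothing and query expansion are integrated into a language model, on which a document sensitive information detection method is built. Four methods using different index mechanisms, query keyword expansion algorithms and detection models were compared. The proposed algorithm achieved a recall of 0.798, a precision of 0.786 and an F value of 0.792 in document sensitive information detection, clearly outperforming the compared algorithms on every metric. The results indicate that the algorithm detects sensitive information more effectively than the compared methods.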
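The abstract does not give the formulas for the context-based smoothing or the sensitivity-weighted expansion, so the following is only a minimal sketch of the general idea under stated assumptions. It substitutes standard Dirichlet smoothing for the paper's context-based smoothing, and uses a simple threshold rule for ontology expansion. `ONTOLOGY`, its sensitivity weights, `threshold` and `mu` are all hypothetical illustration values, not taken from the paper. Documents are ranked by a weighted query log-likelihood, where expanded concepts contribute in proportion to their assumed sensitivity.

```python
import math
from collections import Counter

# Hypothetical toy ontology (not from the paper): each concept maps to
# related concepts and an assumed sensitivity weight in [0, 1].
ONTOLOGY = {
    "missile": {"warhead": 0.9, "launcher": 0.7},
    "budget": {"funding": 0.6},
}


def expand_query(terms, ontology, threshold=0.5):
    """Weight original terms 1.0 and add related concepts whose
    assumed sensitivity is at least `threshold`, weighted by it."""
    weighted = {t: 1.0 for t in terms}
    for t in terms:
        for related, sensitivity in ontology.get(t, {}).items():
            if sensitivity >= threshold:
                # Never lower the weight of a term already in the query.
                weighted[related] = max(weighted.get(related, 0.0), sensitivity)
    return weighted


def score(doc_tokens, weighted_query, collection, mu=2000.0):
    """Weighted query log-likelihood under a Dirichlet-smoothed document
    model: p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = sum(collection.values())
    vocab = len(collection)
    log_score = 0.0
    for w, weight in weighted_query.items():
        # Add-one collection model with one extra bucket for unseen words,
        # so p(w|C) > 0 even for terms absent from every document.
        p_c = (collection.get(w, 0) + 1) / (total + vocab + 1)
        p_d = (doc_counts[w] + mu * p_c) / (doc_len + mu)
        log_score += weight * math.log(p_d)
    return log_score


# Toy corpus (hypothetical) and ranking for the query "missile".
docs = {
    "d1": "the warhead and launcher test report".split(),
    "d2": "quarterly funding plan for the canteen".split(),
    "d3": "weather report for the weekend".split(),
}
collection = Counter(tok for toks in docs.values() for tok in toks)
query = expand_query(["missile"], ONTOLOGY)
ranking = sorted(docs, key=lambda d: score(docs[d], query, collection, mu=10.0), reverse=True)
print(query)    # {'missile': 1.0, 'warhead': 0.9, 'launcher': 0.7}
print(ranking)  # ['d1', 'd3', 'd2']
```

No document contains the literal query term "missile". Document d1 still ranks first only because expansion adds "warhead" and "launcher". This illustrates why insufficient semantic expansion can miss sensitive content. Smoothing keeps unseen terms from zeroing out a document's score.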

Keywords: sensitive information; document smoothing; semantic expansion; language model; detection method
Received: 2014-04-09
Revised: 2014-05-28

Sensitive information detection approach for documents based on document smoothing and query expansion
SU Yingbin, DU Xuehui, XIA Chuntao, LI Haihua. Sensitive information detection approach for documents based on document smoothing and query expansion[J]. Journal of Computer Applications, 2014, 34(9): 2639-2644.
Authors: SU Yingbin, DU Xuehui, XIA Chuntao, LI Haihua
Affiliations: 1. Information Engineering University, Zhengzhou Henan 450001, China
2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou Henan 450001, China
3. Department of Computer Science and Technology, Henan Industry and Trade Vocational College, Zhengzhou Henan 450001, China
Abstract: Detecting sensitive information in terminal documents has become extremely important because of the potential risk of sensitive information leakage. Two problems needed to be resolved: imprecise document models caused by context-free indexes, and inadequate semantic expansion. First, a context-sensitive document smoothing algorithm was proposed to build a document index that retains much more document information. Second, semantic expansion was improved by incorporating the sensitivity of concepts in the domain ontology, so as to widen the detection range of sensitive information. Finally, document smoothing and query expansion were integrated into the language model, and a sensitive information detection approach based on that model was proposed. Comparative experiments were run on four approaches using different index mechanisms, query expansion algorithms and detection models. The recall, precision and F-measure of the proposed approach were 0.798, 0.786 and 0.792 respectively, clearly better than those of the compared algorithms on every indicator. The experimental results show that the proposed approach detects sensitive information more effectively than the compared approaches.
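The three reported figures can be cross-checked. The abstract does not state which F-measure variant was used. Assuming the balanced F1 (β = 1, the harmonic mean of precision and recall), the reported precision of 0.786 and recall of 0.798 do reproduce the reported F value of 0.792. The helper name `f_measure` is illustrative only.

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives
    the harmonic mean of precision and recall (F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
f1 = f_measure(precision=0.786, recall=0.798)
print(round(f1, 3))  # 0.792
```

Exactly, F1 = 2 × 0.786 × 0.798 / (0.786 + 0.798) ≈ 0.79195, which rounds to the published 0.792.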
Keywords: sensitive information; document smoothing; semantic expansion; language model; detection approach