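Keyword Filtering Technology Based on Binary Sort Tree and Chinese Word Segmentation
Cite this article: YE Min, FAN Jin-feng. Keyword Filtering Technology Based on Binary Sort Tree and Chinese Word Segmentation[J]. Electric Power Information Technology, 2011, 9(7): 15-18.
Authors: YE Min  FAN Jin-feng
Affiliations: 1. Institute of Information and Communication, State Grid Electric Power Research Institute, Nanjing, Jiangsu 210003, China
2. IT Office, State Grid Corporation of China, Beijing 100031, China
Abstract: Preventing the leakage of sensitive and important documents is an important part of information security work in the electric power industry. A binary sort tree is used to pre-sort the basic phrase dictionary and the filtering keywords; maximum suffix matching is used to segment the text string under inspection into Chinese words; and the segmented words are then checked against the keyword binary sort tree, so that sensitive keywords are detected securely and efficiently. Performance analysis and testing show that the technique achieves good results in both performance and accuracy.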
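The abstract relies on a binary sort tree (binary search tree) to hold both the basic phrase dictionary and the filtering keywords. A minimal Python sketch of such a tree follows. It is an illustration under stated assumptions, not the authors' implementation: the class and method names (`BinarySortTree`, `insert`, `contains`, `in_order`, `balanced`) are hypothetical, words are ordered by Python's default string comparison (Unicode code point order), and `balanced` is an assumed way of using a pre-sorted word list. It inserts medians first so that lookups stay close to logarithmic.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional

# Hypothetical names; word ordering is Python's code-point string comparison (assumption).


@dataclass
class Node:
    """A node of the binary sort tree, keyed by a word."""
    word: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class BinarySortTree:
    """Sketch of a binary sort tree holding dictionary phrases or keywords."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Optional[Node] = None
        self.max_len = 0  # length of the longest stored word
        for w in words:
            self.insert(w)

    @classmethod
    def balanced(cls, words: Iterable[str]) -> "BinarySortTree":
        """Pre-sort the unique words, then insert medians first to keep the tree balanced."""
        ordered: List[str] = sorted(set(words))
        tree = cls()

        def add_range(lo: int, hi: int) -> None:
            if lo > hi:
                return
            mid = (lo + hi) // 2
            tree.insert(ordered[mid])
            add_range(lo, mid - 1)
            add_range(mid + 1, hi)

        add_range(0, len(ordered) - 1)
        return tree

    def insert(self, word: str) -> None:
        """Insert a word: smaller words go left, larger words go right."""
        self.max_len = max(self.max_len, len(word))
        if self.root is None:
            self.root = Node(word)
            return
        node = self.root
        while word != node.word:  # an equal word is a duplicate and is ignored
            if word < node.word:
                if node.left is None:
                    node.left = Node(word)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(word)
                    return
                node = node.right

    def contains(self, word: str) -> bool:
        """Walk down from the root; costs O(height) string comparisons."""
        node = self.root
        while node is not None:
            if word == node.word:
                return True
            node = node.left if word < node.word else node.right
        return False

    def in_order(self) -> Iterator[str]:
        """Yield all words in ascending order (iterative in-order traversal)."""
        stack: List[Node] = []
        node = self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.word
            node = node.right


if __name__ == "__main__":
    keywords = BinarySortTree.balanced(["绝密", "机密", "秘密"])
    print(keywords.contains("绝密"))  # True
    print(keywords.contains("公开"))  # False
```

Inserting an already sorted list through repeated `insert` calls would turn the tree into a chain with linear-time lookups, which is why the sketch builds from medians; the abstract does not say how the original system keeps its trees balanced.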

Keywords: binary sort  Chinese word segmentation  keyword filtering  information security

Research on Keyword Filtering Technology Based on Binary Sort Tree and Chinese Word Segmentation
YE Min, FAN Jin-feng. Research on Keyword Filtering Technology Based on Binary Sort Tree and Chinese Word Segmentation[J]. Electric Power Information Technology, 2011, 9(7): 15-18.
Authors:YE Min  FAN Jin-feng
Affiliation: 1. Institute of Information and Communication, State Grid Electric Power Research Institute, Nanjing 210003, China; 2. IT Office, State Grid Corporation of China, Beijing 100031, China
Abstract: Preventing the leakage of sensitive and important documents is an important information security task in the electric power industry. This paper presents a keyword filtering method based on a binary sort tree and Chinese word segmentation. The method first uses binary sort trees to sort the basic phrases and the filtering keywords, then segments the candidate Chinese text with the postfix maximum match algorithm, and finally checks the segmented words for sensitive keywords against the keyword binary sort tree. Experimental results show that the method performs well in both performance and accuracy.
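The segmentation and checking steps can be sketched as follows. This is a hedged illustration rather than the authors' algorithm: it assumes that postfix (maximum suffix) matching means backward maximum matching, i.e. scanning from the end of the text and taking the longest known word that ends at the current position, falling back to a single character. For brevity, Python sets stand in for the phrase and keyword binary sort trees; the function names and sample words are hypothetical.

```python
from typing import List, Set

# Hypothetical helper names; Python sets stand in for the binary sort trees (assumption).


def backward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Segment text right-to-left, always taking the longest lexicon word that
    ends at the current position; unknown characters become one-char tokens."""
    words: List[str] = []
    end = len(text)
    while end > 0:
        # Try the longest candidate suffix first, shrinking one char at a time.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # tokens were collected from the end of the string
    return words


def find_sensitive(text: str, dictionary: Set[str], keywords: Set[str]) -> List[str]:
    """Segment with dictionary + keywords, then keep only tokens that are keywords."""
    lexicon = dictionary | keywords
    max_len = max([1] + [len(w) for w in lexicon])  # at least 1 so the loop always advances
    return [w for w in backward_max_match(text, lexicon, max_len) if w in keywords]


if __name__ == "__main__":
    dictionary = {"南京", "南京市", "长江", "大桥", "文件", "资料"}
    keywords = {"市长", "绝密"}
    text = "南京市长江大桥"
    print("市长" in text)  # True: a plain substring search would flag "mayor"
    print(backward_max_match(text, dictionary | keywords, 3))  # ['南京市', '长江', '大桥']
    print(find_sensitive(text, dictionary, keywords))  # []
    print(find_sensitive("这份绝密文件", dictionary, keywords))  # ['绝密']
```

The sample shows why segmenting before checking can reduce false alarms: a plain substring search finds 市长 ("mayor") inside 南京市长江大桥 ("Nanjing Yangtze River Bridge"), while backward maximum matching yields 南京市 / 长江 / 大桥 and reports no keyword.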
Keywords:binary sort  Chinese word segmentation  keyword filtering  information security
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.