
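Variable-length gram based file type identification algorithm
Cite this article: CAO Ding, LUO Jun-yong, YIN Mei-juan. Variable-length gram based file type identification algorithm[J]. Journal of Computer Applications (计算机应用), 2011, 31(7): 1894-1897. DOI: 10.3724/SP.J.1087.2011.01894
Authors: CAO Ding (曹鼎)  LUO Jun-yong (罗军勇)  YIN Mei-juan (尹美娟)
Affiliation: Institute of Information Engineering, Information Engineering University, Zhengzhou 450002
Abstract: Quickly and accurately determining the true type of a file is important for protecting computer information security. After analyzing the problems in existing file type identification algorithms based on binary content, this paper proposes using variable-length grams to describe the statistical features of files, and designs a feature evaluation function that combines the divergence, stability, and conditional width of grams in structured files, so that effective features are selected more accurately. The algorithm does not depend on the structure or key identifiers of any specific file type, so it applies more broadly. Experiments show that the algorithm effectively improves the precision and recall of file type identification.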

Keywords: file type identification   variable-length gram   gram frequency distribution   file type fingerprint   feature selection
Received: 2011-01-21
Revised: 2011-03-02

Variable length gram based file type identification algorithm
CAO Ding, LUO Jun-yong, YIN Mei-juan. Variable length gram based file type identification algorithm[J]. Journal of Computer Applications, 2011, 31(7): 1894-1897. DOI: 10.3724/SP.J.1087.2011.01894
Authors:CAO Ding  LUO Jun-yong  YIN Mei-juan
Affiliation:Institute of Information Engineering, Information Engineering University, Zhengzhou, Henan 450002, China
Abstract:Fast and accurate identification of the true type of an arbitrary file is very important in information security. To address the problems of current content-based file type identification algorithms, variable-length grams are introduced to describe the statistical characteristics of a file's binary content, and a new evaluation function combining gram divergence, stability, and conditional width is adopted for feature selection on structured file types. The algorithm does not rely on the structure or keywords of any specific file type, which allows it to be applied more widely. Experimental results show that the proposed approach improves the precision and recall of file type identification.
Keywords:file type identification   variable length gram   gram frequency distribution   fileprints   feature selection
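
The abstract does not give the evaluation function itself, so the following is only a minimal sketch of the general idea of variable-length gram statistics, not the authors' method. It counts byte grams over a range of lengths and ranks them by the fraction of sample files in which they appear. The function names, the length range, and the use of document frequency as a ranking score are all assumptions made for illustration; the paper's divergence, stability, and conditional width measures are not reproduced here.

```python
from collections import Counter


def gram_counts(data, min_len=1, max_len=3):
    """Count every byte gram of length min_len..max_len in one file's content."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return counts


def document_frequency(samples, min_len=1, max_len=3):
    """Fraction of sample files in which each gram occurs at least once."""
    df = Counter()
    for data in samples:
        # Each gram is counted once per file, however often it repeats.
        df.update(gram_counts(data, min_len, max_len).keys())
    return {gram: n / len(samples) for gram, n in df.items()}


def fingerprint(samples, top_k=5, min_len=1, max_len=3):
    """Hypothetical fingerprint: the top_k grams by document frequency.

    Assumption: document frequency stands in for the paper's evaluation
    function, which the abstract names but does not define.
    """
    df = document_frequency(samples, min_len, max_len)
    # Highest frequency first, then longer grams, then byte order for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return [gram for gram, _ in ranked[:top_k]]
```

For example, `fingerprint([b"%PDF-1", b"%PDF-2"], top_k=1, min_len=4, max_len=4)` selects the 4-byte gram shared by both samples. A real feature selector would also need to compare a gram's frequency across different file types, which this sketch leaves out.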
This article is indexed in CNKI, Wanfang Data, and other databases.