
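Block Cipher Algorithm Identification Scheme Based on a Hybrid Gradient Boosting Decision Tree and Logistic Regression Model
Cite this article: YUAN Ke, HUANG Yabing, DU Zhanfei, LI Jiabao, JIA Chunfu. Block Cipher Algorithm Identification Scheme Based on Hybrid Gradient Boosting Decision Tree and Logistic Regression Model[J]. Journal of Sichuan University (Engineering Science Edition), 2022, 54(4): 218-227.
Authors: YUAN Ke  HUANG Yabing  DU Zhanfei  LI Jiabao  JIA Chunfu
Affiliations: Henan University, Henan University, Henan University, Henan University, Nankai University
Funding: National Key Research and Development Program of China (No.2018YFA0704703); National Natural Science Foundation of China (No.61802111, No.61972073, No.61972215); Key Research and Development and Promotion Special Project of Henan Province (No.222102210062); Basic Research Program of Key Scientific Research Projects of Higher Education Institutions of Henan Province (No.22A413004); National Undergraduate Innovation Training Program (No.202110475072)
Abstract: In cryptographic algorithm identification, the growing number of cryptographic algorithms, the increasing complexity of ciphertext data, and the rising interference between data have degraded the identification accuracy and stability of single-layer identification schemes. To address these problems, a hybrid gradient boosting decision tree and logistic regression model is proposed, and a block cipher algorithm identification scheme is built on this model. In the scheme, a gradient boosting decision tree model is first trained on the original ten sets of features; the trees it learns are then used to construct new features; the new features are one-hot encoded; and finally they are added to the original features to train a logistic regression model that makes the predictions. In the ciphertext-only setting, identification is studied for five typical block ciphers: AES, 3DES, Blowfish, CAST and RC2. With the ciphertext size and all other experimental conditions held equal, the scheme reaches an identification accuracy of up to 70% for binary classification and up to 32% for five-class classification. This is higher than the 52.5% and 27.2% of the scheme based on a single gradient boosting decision tree and the 45% and 25.6% of the scheme based on a single logistic regression model, and significantly better than the random-guessing rates of 50% for binary and 20% for five-class classification. The experimental results show that, for binary and five-class identification of block ciphers at the same ciphertext length, the scheme achieves higher classification accuracy than the other identification schemes. Identification accuracy also fluctuates as the ciphertext length changes; the proposed scheme shows the smallest fluctuation, is the least affected, and is the most stable.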

Keywords: cryptographic algorithm identification  machine learning  ensemble learning  gradient boosting decision tree  logistic regression
Received: 2021/4/18
Revised: 2022/3/27

Block Cipher Algorithm Identification Scheme Based on Hybrid Gradient Boosting Decision Tree and Logistic Regression Model
YUAN Ke, HUANG Yabing, DU Zhanfei, LI Jiabao, JIA Chunfu. Block Cipher Algorithm Identification Scheme Based on Hybrid Gradient Boosting Decision Tree and Logistic Regression Model[J]. Journal of Sichuan University (Engineering Science Edition), 2022, 54(4): 218-227.
Authors: YUAN Ke  HUANG Yabing  DU Zhanfei  LI Jiabao  JIA Chunfu
Affiliation: School of Computer and Info. Eng., Henan Univ., Kaifeng 475004; Henan Province Eng. Research Center of Spatial Info. Processing, Kaifeng 475004, China; School of Computer and Info. Eng., Henan Univ., Kaifeng 475004; College of Cybersecurity, Nankai Univ., Tianjin 300350, China
Abstract: As the number of cryptographic algorithms grows, ciphertext data becomes more complex, and interference between data increases, single-layer schemes for cryptographic algorithm identification lose accuracy and stability. To address this, a hybrid gradient boosting decision tree and logistic regression model (HGBDTLR) is proposed, and a block cipher algorithm identification scheme is constructed on top of it. In this scheme, a gradient boosting decision tree model is first trained on the original ten sets of features, and the trees it learns are then used to construct new features. The new features are one-hot encoded and appended to the original features, and a logistic regression model is trained on the combined features to make predictions. In the ciphertext-only scenario, identification experiments were carried out on five typical block ciphers: AES, 3DES, Blowfish, CAST and RC2. With the ciphertext size and all other experimental conditions held equal, the scheme reached an identification accuracy of up to 70% for binary classification and up to 32% for five-class classification. These figures exceed the 52.5% and 27.2% obtained by the scheme based on a single gradient boosting decision tree and the 45% and 25.6% obtained by the scheme based on a single logistic regression model, and they are significantly better than the 50% and 20% expected from random guessing. The experimental results show that, for ciphertext files of the same size, the scheme identifies block ciphers more accurately than existing identification schemes on both the binary and the five-class task. Identification accuracy fluctuates as the ciphertext file size changes, and among the compared schemes this one has the smallest fluctuation range, is the least affected, and is the most stable.
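The feature-construction step described in the abstract (tree leaves turned into one-hot features and appended to the original features) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `make_stump`, `one_hot_leaves`, `hybrid_features`, the two hand-written depth-1 trees, and the sample vector are hypothetical stand-ins, not the authors' trained model, their ten feature sets, or their data. In a real run the trees would come from a trained gradient boosting model.

```python
from typing import Callable, List, Sequence

# A "tree" here is any function that maps a feature vector to a leaf index.
Tree = Callable[[Sequence[float]], int]


def make_stump(feature: int, threshold: float) -> Tree:
    """Hypothetical depth-1 tree: leaf 0 if x[feature] <= threshold, else leaf 1."""
    def stump(x: Sequence[float]) -> int:
        return 0 if x[feature] <= threshold else 1
    return stump


def one_hot_leaves(x: Sequence[float], trees: Sequence[Tree],
                   leaves_per_tree: Sequence[int]) -> List[float]:
    """One-hot encode the leaf that x reaches in each tree, tree by tree."""
    encoded: List[float] = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        leaf = tree(x)
        encoded.extend(1.0 if i == leaf else 0.0 for i in range(n_leaves))
    return encoded


def hybrid_features(x: Sequence[float], trees: Sequence[Tree],
                    leaves_per_tree: Sequence[int]) -> List[float]:
    """Original features followed by the one-hot leaf features."""
    return list(x) + one_hot_leaves(x, trees, leaves_per_tree)


# Placeholder trees (assumption); a real pipeline would use trained GBDT trees.
trees = [make_stump(0, 0.5), make_stump(1, 0.3)]
leaves_per_tree = [2, 2]

sample = [0.2, 0.9]  # made-up feature vector, not data from the paper
combined = hybrid_features(sample, trees, leaves_per_tree)
print(combined)  # [0.2, 0.9, 1.0, 0.0, 0.0, 1.0]
```

The combined vector is what a logistic regression model would then be trained on. With scikit-learn, one plausible way to build the same pipeline is to take leaf indices from `GradientBoostingClassifier.apply`, encode them with `OneHotEncoder`, and fit `LogisticRegression` on the concatenated matrix; whether the authors used that library is not stated in the abstract.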
Keywords: cryptographic algorithm identification  machine learning  ensemble learning  gradient boosting decision tree  logistic regression