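Application of Ensemble Learning to Missing-Data Imputation for Small-Sample Equipment Trials
Cite this article: Ma Liang, Guo Liqiang, Liu Bingjie, Yang Jing. Application of Ensemble Learning to Missing-Data Imputation for Small-Sample Equipment Trials[J]. Computer Measurement & Control, 2022, 30(8): 116-121.
Authors: Ma Liang  Guo Liqiang  Liu Bingjie  Yang Jing
Affiliations: Naval Submarine Academy, Naval Submarine Academy, ,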
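Abstract: To address the limited volume of equipment test data and the tendency of such data to contain missing values, a regression imputation method based on ensemble learning is proposed. Random forest and XGBoost serve as the regressors. By setting a fast-filling baseline and a feature-importance evaluation strategy, the method improves data-subset reconstruction and the iterative partitioning of the training and test sets, and the Optuna framework is used to optimize the regressors' hyperparameters automatically. The method is validated on launch-trial data for a certain type of missile. The results show that regression imputation with ensemble learning clearly outperforms traditional statistical imputation as well as KNN and BP neural networks: the coefficient of determination (R²) stays above 0.95 at each of the missing ratios considered, so the method can effectively handle missing data in small-sample equipment trials. The KEEL public datasets are also used to verify the method's generalizability.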

Keywords: small-sample trials  ensemble learning  random forest  XGBoost  data imputation
Received: 2021/11/16
Revised: 2022/3/15
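
The coefficient of determination used as the quality measure in the abstract can be illustrated with a small evaluation sketch. It hides a fraction of a fully observed column, imputes the hidden entries, and scores the imputed values against the true ones. The toy column, the 20% missing ratio, and the helper names (`r_squared`, `mask_values`, `mean_impute`) are assumptions made for illustration and do not come from the paper. Mean imputation stands in for the "traditional statistical imputation" baseline.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mask_values(column, missing_ratio, seed=0):
    """Hide a fraction of a complete column.

    Returns the masked column (None marks a hidden value) and the
    sorted list of hidden indices.
    """
    rng = random.Random(seed)
    n_hidden = max(1, round(len(column) * missing_ratio))
    hidden = sorted(rng.sample(range(len(column)), n_hidden))
    hidden_set = set(hidden)
    masked = [None if i in hidden_set else v for i, v in enumerate(column)]
    return masked, hidden


def mean_impute(masked):
    """Statistical baseline: replace every hidden value with the observed mean."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


# Hypothetical toy data: a fully observed column with values 1.0 .. 20.0.
column = [float(i) for i in range(1, 21)]
masked, hidden = mask_values(column, missing_ratio=0.2)
imputed = mean_impute(masked)

# Score only the entries that were hidden, against their true values.
score = r_squared([column[i] for i in hidden], [imputed[i] for i in hidden])
print(f"R^2 of mean imputation on hidden entries: {score:.3f}")
```

Because mean imputation predicts the same constant for every hidden entry, its R² on those entries can never exceed zero. That is one reason a regression-based imputer that reaches R² above 0.95 is a substantial improvement over a statistical fill.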

Application of Ensemble Learning for Interpolating Missing Data from Small Sample Trials of Equipment
Abstract: Because equipment test data are limited in volume and prone to missing values, we propose a regression imputation method based on ensemble learning. Random Forest and XGBoost are used as the regressors for imputing the missing data. A fast-filling baseline and a feature-importance assessment strategy are set in order to improve data-subset reconstruction and the iterative partitioning of the training and test sets, and the regressors' hyperparameters are optimized automatically with the Optuna framework. The method is validated on launch-trial data for a certain type of missile. The results show that regression imputation with ensemble learning is significantly better than traditional statistical imputation as well as KNN and BP neural networks. R² remains above 0.95 across the different missing proportions, so the method can effectively solve the problem of missing data in small-sample equipment tests. In addition, we validate the generalizability of the method on the KEEL public test datasets.
Keywords: Small sample trials of equipment  Ensemble Learning  Random Forests  XGBoost  Data interpolation
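
The regression imputation idea summarized in the abstract can be sketched with scikit-learn's `RandomForestRegressor`. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not spell out the fast-filling baseline or the feature-importance ordering, so the sketch assumes column means as the quick fill and processes columns from fewest to most missing values. The function name `rf_impute` and the synthetic data are hypothetical. For each column, rows where the value is observed form the training set and rows where it is missing form the set to predict.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def rf_impute(X, n_estimators=50, random_state=0):
    """Impute NaNs column by column with a random forest regressor.

    Assumed scheme (not confirmed by the abstract): quick-fill every
    NaN with its column mean, then refit one column at a time,
    starting with the column that has the fewest missing values.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Quick-fill baseline so the regressor always sees complete features.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    for j in np.argsort(missing.sum(axis=0)):
        rows = missing[:, j]
        if not rows.any():
            continue  # nothing to impute in this column
        features = np.delete(filled, j, axis=1)
        model = RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state
        )
        # Training set: rows where column j is observed.
        model.fit(features[~rows], filled[~rows, j])
        # Prediction set: rows where column j is missing.
        filled[rows, j] = model.predict(features[rows])
    return filled


# Hypothetical synthetic data: column 2 depends on columns 0 and 1.
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, 200)
x1 = rng.uniform(0.0, 1.0, 200)
X_full = np.column_stack([x0, x1, 2.0 * x0 + 3.0 * x1])

# Hide 10 values in column 0 and 20 values in column 2 (disjoint rows).
X_missing = X_full.copy()
hidden_rows = rng.choice(200, size=30, replace=False)
X_missing[hidden_rows[:10], 0] = np.nan
X_missing[hidden_rows[10:], 2] = np.nan

X_imputed = rf_impute(X_missing)
score = r2_score(X_full[hidden_rows[10:], 2], X_imputed[hidden_rows[10:], 2])
print(f"R^2 on hidden entries of column 2: {score:.3f}")
```

Swapping in an XGBoost regressor or tuning `n_estimators` and tree depth with Optuna, as the abstract describes, would replace only the model construction line. The fill, ordering, and train/predict split stay the same.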