
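Fraud detection model generalization performance improvement and interpretability study based on ADASYN-SFS-RF
Cite this article: Wang Wanmin, Zhi Luping. Fraud detection model generalization performance improvement and interpretability study based on ADASYN-SFS-RF[J]. Application Research of Computers (计算机应用研究), 2022, 39(12).
Authors: Wang Wanmin (汪万敏), Zhi Luping (智路平)
Affiliation: Business School, University of Shanghai for Science and Technology (both authors)
Funding: National Natural Science Foundation of China (71801150); Shanghai Municipal People's Government Decision-Making Consultation Research Project (2022-Z-J07)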
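Abstract: Fraud in the industry takes many forms, is carried out covertly, and produces extremely imbalanced data. To address these problems, this study uses the ADASYN (adaptive synthetic sampling approach for imbalanced learning) algorithm, which adaptively shifts the classification decision boundary toward difficult instances, to augment the data and mitigate the overfitting caused by imbalanced data. A sequential forward search strategy based on random forest then selects the optimal feature subset for fraud detection, which reduces the influence of the noisy data added by ADASYN on the determination of the classification boundary. A fraud detection model is built on this basis, and LIME is used to give local explanations of the model's detection results, increasing the model's practical value. Experiments show that the model can better overcome the tendency of traditional fraud detection models to misclassify majority-class samples and helps improve the efficiency with which the industry identifies transaction fraud. In addition, LIME effectively explains random samples detected by the model, which makes it easier for decision-makers to analyze the detection results empirically and provides clear value for early warning and decision support.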

Keywords: fraud detection  random forest  ADASYN  LIME  feature selection
Received: 2022/5/10
Revised: 2022/11/17

Fraud detection model generalization performance improvement and interpretability study based on ADASYN-SFS-RF
Wang Wanmin and Zhi Luping. Fraud detection model generalization performance improvement and interpretability study based on ADASYN-SFS-RF[J]. Application Research of Computers, 2022, 39(12).
Authors:Wang Wanmin and Zhi Luping
Affiliation:Business School, University of Shanghai for Science and Technology
Abstract:Fraud in the industry takes many forms, is carried out covertly, and produces extremely imbalanced data. To address these problems, this paper adopts the ADASYN algorithm, which adaptively shifts the classification decision boundary toward difficult instances, to augment the data and mitigate the over-fitting caused by imbalanced data. A sequential forward search strategy based on random forest selects the optimal feature subset for fraud detection, which reduces the influence of the noisy data added by ADASYN on the determination of the classification boundary. A fraud detection model is built on this basis, and LIME is used to give local interpretations of the model's detection results, improving the model's practical value. Experiments show that the model can better overcome the tendency of traditional fraud detection models to misclassify majority-class samples and helps improve the efficiency of transaction fraud identification in the industry. In addition, LIME effectively explains random samples detected by the model, which makes it easier for decision-makers to analyze the detection results empirically and provides clear value for early warning and decision support.
Keywords:fraud detection  random forest  ADASYN  LIME  feature selection
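
The adaptive oversampling step mentioned in the abstract can be illustrated with a small sketch. The code below follows the commonly published formulation of ADASYN (minority samples with more majority-class neighbours receive more synthetic points) and uses only the Python standard library. It is not the authors' implementation: the function name `adasyn_oversample`, the parameters `k`, `beta` and `seed`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def adasyn_oversample(minority, majority, k=5, beta=1.0, seed=0):
    """Return synthetic minority points (illustrative ADASYN-style sketch).

    minority, majority: lists of equal-length numeric tuples.
    k: number of nearest neighbours (assumed parameter name).
    beta: desired balance level; 1.0 aims for a fully balanced set.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")

    # Number of synthetic samples to generate in total.
    n_needed = int((len(majority) - len(minority)) * beta)
    if n_needed <= 0:
        return []

    rng = random.Random(seed)
    # Minority points come first, so index i in `data` is minority[i].
    data = [(x, 1) for x in minority] + [(x, 0) for x in majority]

    # Difficulty ratio r_i: share of majority points among the k nearest
    # neighbours of each minority point (searched over the whole data set).
    ratios = []
    for i, x in enumerate(minority):
        neighbours = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: math.dist(x, data[j][0]),
        )[:k]
        ratios.append(sum(1 for j in neighbours if data[j][1] == 0) / k)

    total = sum(ratios)
    if total == 0:
        # No minority point has majority neighbours; nothing is "difficult".
        return []

    synthetic = []
    for i, x in enumerate(minority):
        # Harder points (larger r_i) get a larger share of the new samples.
        g_i = round(ratios[i] / total * n_needed)
        minority_neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        for _ in range(g_i):
            z = minority[rng.choice(minority_neighbours)]
            lam = rng.random()
            # Linear interpolation between x and a minority neighbour.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data: one minority point sits inside a majority cluster.
minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
majority = [(1.1, 1.0), (1.0, 1.1), (0.9, 1.0), (1.0, 0.9),
            (1.1, 1.1), (0.9, 0.9), (1.2, 1.0), (1.0, 1.2)]
new_points = adasyn_oversample(minority, majority, k=2, seed=42)
print(len(new_points), "synthetic minority samples generated")
```

In this toy case only the minority point at (1.0, 1.0) has majority-class neighbours, so all synthetic samples are interpolated from it. In practice, a maintained library implementation combined with a feature-selection step and a random forest classifier would likely be used instead of a hand-written routine like this one.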