
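Study on Financial Fraud Account Detection Based on Imbalanced Datasets
Cite this article: LÜ Fang, TANG Fenghe, HUANG Junheng, WANG Bailing. Study on Financial Fraud Account Detection Based on Imbalanced Datasets[J]. Computer Engineering, 2021, 47(6): 312-320. DOI: 10.19678/j.issn.1000-3428.0058006
Authors: LÜ Fang  TANG Fenghe  HUANG Junheng  WANG Bailing
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China; 2. Research Institute of Cyberspace Security, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China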
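Abstract (translated from the Chinese, truncated in the source): For imbalanced financial datasets, a bank fraud account detection framework, iForest-SMOTE, is proposed. Based on the dynamic transaction characteristics of accounts, account transaction behavior features are extracted along the statistical, temporal, and supervision-information dimensions. To address the cross-region sample synthesis problem that the oversampling technique ADASYN exhibits on financial account datasets, a dataset-balancing preprocessing strategy based on the iForest algorithm is proposed, in which iForest performs mixed sampling on the data, while removing...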

Keywords: isolation forest  imbalanced classification  fraud account detection  random forest  feature mining
Received: 2020-04-08
Revised: 2020-06-19

Study on Financial Fraud Account Detection Based on Imbalanced Datasets
LÜ Fang, TANG Fenghe, HUANG Junheng, WANG Bailing. Study on Financial Fraud Account Detection Based on Imbalanced Datasets[J]. Computer Engineering, 2021, 47(6): 312-320. DOI: 10.19678/j.issn.1000-3428.0058006
Authors: LÜ Fang  TANG Fenghe  HUANG Junheng  WANG Bailing
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China; 2. Research Institute of Cyberspace Security, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China
Abstract: This paper proposes iForest-SMOTE, a framework for detecting bank accounts involved in fraud that is designed for imbalanced financial datasets. Based on the dynamic transaction characteristics of the accounts, transaction behavior features are extracted along three dimensions: statistical information, temporal information, and supervision information. A dataset-balancing preprocessing strategy is then proposed to address the cross-region sample synthesis problem that the oversampling technique ADASYN faces on financial account datasets. The strategy uses the iForest algorithm for mixed sampling of the data, which removes the majority of noisy data and makes it easier for the classifier to learn the minority class. On this basis, a random forest classifier is designed to detect the accounts involved in financial fraud. Experimental results on datasets of actual financial account transactions show that iForest-SMOTE has a clear advantage in recall and accuracy over ADASYN, SMOTE, and other sampling techniques, and that its F-value is at least 2.13 percentage points higher than that of the other algorithms.
Keywords: isolation forest  imbalanced classification  fraud account detection  random forest  feature mining
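The abstract names two building blocks, isolation-forest filtering and SMOTE oversampling, without giving their details. The following is only a minimal sketch of how such a pipeline might be arranged, assuming that the isolation step drops the most easily isolated majority-class samples and that SMOTE then interpolates between minority-class neighbors. The function names (`path_length`, `isolation_filter`, `smote`) and the parameters (`contamination`, `n_trees`, `k`) are hypothetical and are not taken from the paper. The isolation score is a simplified random-split path length, not a full iForest implementation, and the random forest classification stage is omitted.

```python
import math
import random


def path_length(x, data, rng, depth=0, max_depth=8):
    """Number of random axis-aligned splits needed to isolate x within data.

    Simplified: stops at max_depth, or when the randomly chosen feature
    has no spread, without the usual iForest leaf-size correction.
    """
    if depth >= max_depth or len(data) <= 1:
        return depth
    f = rng.randrange(len(x))
    lo = min(p[f] for p in data)
    hi = max(p[f] for p in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as x.
    same_side = [p for p in data if (p[f] < split) == (x[f] < split)]
    return path_length(x, same_side, rng, depth + 1, max_depth)


def isolation_filter(majority, contamination=0.1, n_trees=50, seed=0):
    """Drop the `contamination` fraction of majority samples that are
    isolated fastest (shortest average path), treating them as noise."""
    rng = random.Random(seed)
    avg = [
        sum(path_length(x, majority, rng) for _ in range(n_trees)) / n_trees
        for x in majority
    ]
    n_keep = len(majority) - int(len(majority) * contamination)
    # Longest average path first, so the hardest-to-isolate points are kept.
    order = sorted(range(len(majority)), key=lambda i: avg[i], reverse=True)
    return [majority[i] for i in sorted(order[:n_keep])]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

In this sketch, a balanced training set would be the output of `isolation_filter` on the majority class together with the minority samples and the output of `smote`. Because each synthetic point lies on a segment between two nearby minority samples, filtering outliers first is one plausible way to keep those segments from crossing regions dominated by the other class.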
This article is indexed in Wanfang Data and other databases.