
WKAG: Fraud Detection Method for Imbalanced Medical Insurance Data
Cite this article: WU Wenlong, ZHOU Xi, WANG Yi, WANG Baoquan. WKAG: Fraud Detection Method for Imbalanced Medical Insurance Data[J]. Computer Engineering and Applications, 2021, 57(9): 247-254.
Authors: WU Wenlong  ZHOU Xi  WANG Yi  WANG Baoquan
Affiliation: 1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China 2. University of Chinese Academy of Sciences, Beijing 100049, China 3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
Funding: STS Program of the Chinese Academy of Sciences; Youth Innovation Promotion Association of the Chinese Academy of Sciences; Autonomous Region Tianshan Youth Program
Abstract: Medical insurance fraud detection is of urgent practical importance. Current work relies mainly on machine learning methods but faces two major problems: (1) the data are heavily imbalanced, with fraud samples making up only a very small proportion, which degrades detection performance; (2) feature selection and construction depend too heavily on domain business knowledge, which makes it difficult to guarantee that the features are effective. To address these problems, this paper proposes WKAG, a fraud detection method for imbalanced medical insurance data. The method uses WGAN-KDE (Wasserstein Generative Adversarial Network-Kernel Density Estimation) to mitigate the data imbalance, an Auto-Encoder to extract deep hidden features from the data, and a Gradient Boosted Decision Tree (GBDT) to detect medical insurance fraud. The method is validated on multiple public data sets and on a real medical insurance business data set. The results show that WKAG can serve as an effective method for detecting medical insurance fraud.

Keywords: generative adversarial network  imbalanced dataset  auto-encoder feature representation  medical insurance fraud detection  ensemble learning

This article is indexed in Wanfang Data and other databases.
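
The rebalancing idea named in the abstract can be illustrated with a small sketch. This is not the authors' WGAN-KDE: it omits the WGAN entirely and only draws synthetic minority rows from a plain Gaussian kernel density estimate, and the auto-encoder and GBDT stages are left out as well. The function names (`kde_oversample`, `rebalance`), the bandwidth value, and the toy data are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]


def kde_oversample(minority: Sequence[Row], n_new: int,
                   bandwidth: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Draw n_new synthetic rows from a Gaussian KDE fitted to `minority`.

    Sampling from a Gaussian KDE amounts to picking one stored sample
    uniformly at random and adding isotropic Gaussian noise whose standard
    deviation equals the bandwidth. (Assumption: a fixed, hand-picked
    bandwidth; the paper's KDE settings are not given in the abstract.)
    """
    if n_new > 0 and not minority:
        raise ValueError("cannot oversample an empty minority class")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        synthetic.append([x + rng.gauss(0.0, bandwidth) for x in base])
    return synthetic


def rebalance(X: Sequence[Row], y: Sequence[int], minority_label: int = 1,
              bandwidth: float = 0.1, seed: int = 0
              ) -> Tuple[List[Row], List[int]]:
    """Append synthetic minority rows until both classes have equal size."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = kde_oversample(minority, n_new, bandwidth, seed)
    return list(X) + new_rows, list(y) + [minority_label] * n_new


# Toy data (hypothetical): 8 normal claims, 2 fraudulent claims.
X = [[0.0, 0.0]] * 8 + [[1.0, 1.0], [1.2, 0.9]]
y = [0] * 8 + [1, 1]
X_bal, y_bal = rebalance(X, y)
print(y_bal.count(0), y_bal.count(1))  # 8 8
```

In a fuller pipeline along the lines the abstract describes, the rebalanced rows would then be passed through a learned feature extractor and on to a gradient-boosted classifier; those stages are beyond this sketch.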