Identification Method of “Professional Whistleblower” Based on Class Imbalance Problem
Cite as: YI Chengqi, HUANG Qianqian, WANG Congyu, ZHANG Hecan, JIN Xiaokun, WANG Jiandong. Identification Method of “Professional Whistleblower” Based on Class Imbalance Problem[J]. Computer Engineering and Applications (《计算机工程与应用》), 2019, 55(14): 1-7.
Authors: YI Chengqi (易成岐)  HUANG Qianqian (黄倩倩)  WANG Congyu (王从余)  ZHANG Hecan (张何灿)  JIN Xiaokun (靳晓锟)  WANG Jiandong (王建冬)
Affiliations: 1. Department of Big Data Development, State Information Center, Beijing 100045, China; 2. Department of Psychology, Tsinghua University, Beijing 100084, China; 3. School of Software and Microelectronics, Peking University, Beijing 102600, China; 4. School of Mathematical Sciences, Peking University, Beijing 100871, China

Abstract: “Professional whistleblowers” have troubled market regulators for many years, and they increasingly operate in organized groups, at large scale, with growing professionalism, and at younger ages. Government departments mostly identify them by manual review, which consumes substantial labor. This paper applies bootstrapping, a statistical resampling technique, together with text, time, and whistleblower-attribute features to identify “professional whistleblowers” accurately while addressing the over-fitting problem that arises with class-imbalanced data. Experimental results show that bootstrapping resampling yields higher recognition accuracy than oversampling and undersampling, and that optimizing the data features with correlation-based feature selection (CFS) combined with the BestFirst search strategy achieves higher computational efficiency while maintaining accuracy. The method is validated on real-world data from the national 12358 price regulation platform. Finally, the paper compares the complaint and reporting behaviors of “professional whistleblowers” and normal consumers.

Keywords: professional whistleblower; class imbalance; feature selection; data driven; 12358 price regulation platform
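
The abstract names bootstrapping resampling as the way the class imbalance is handled but gives no procedural detail. The following is a minimal sketch of one common reading of that idea, drawing the same number of records from each class with replacement, and is not the authors' implementation. The function name `bootstrap_balance`, the per-class sample size, and the toy data (95 "normal" and 5 "professional" records) are all assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_balance(samples, labels, seed=0):
    """Return a class-balanced resample drawn with replacement.

    Hypothetical helper: each class contributes the same number of
    draws, so the minority class is repeated and the majority class
    is thinned. The abstract does not specify these details.
    """
    rng = random.Random(seed)

    # Group the records by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Assumed choice: keep the total size close to the original by
    # drawing len(samples) // number_of_classes records per class.
    n_per_class = len(samples) // len(by_class)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # rng.choices samples with replacement (the bootstrap step).
        out_x.extend(rng.choices(xs, k=n_per_class))
        out_y.extend([y] * n_per_class)
    return out_x, out_y


# Toy imbalanced data: record ids 0..94 are "normal", 95..99 "professional".
samples = list(range(100))
labels = ["normal"] * 95 + ["professional"] * 5

bx, by = bootstrap_balance(samples, labels)
print(Counter(by))
```

On the toy data each class ends up with 50 records. In practice such a resample would be applied to the training split only, so that evaluation still reflects the original class ratio.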
This article is indexed in Wanfang Data and other databases.