18 similar documents found.
1.
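Fraud in this industry takes many forms, is carried out covertly, and produces extremely imbalanced data. To address these problems, the study uses the ADASYN (adaptive synthetic sampling approach for imbalanced learning) algorithm to augment the data by adaptively shifting the classification decision boundary toward difficult instances, which counters the overfitting caused by imbalanced data. A sequential forward search strategy based on random forests then selects the optimal feature subset for fraud detection, reducing the influence that the noisy samples added by ADASYN have on locating the classification boundary. A fraud detection model is built on this basis, and LIME is used to give local explanations of its detection results, increasing the model's practical value. Experiments show that the model largely overcomes the tendency of traditional fraud detection models to misclassify majority-class samples and helps the industry identify fraudulent transactions more efficiently. In addition, LIME gives an effective breakdown of randomly chosen samples flagged by the model, so that decision-makers can analyze the detection results empirically; this provides clear early-warning and decision-support value.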
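A minimal sketch of the standard ADASYN weighting rule, in plain Python for illustration; it is not the study's implementation, and the helper name `adasyn` and the parameters `beta` and `k` are assumptions. Each minority sample receives a share of the synthetic budget proportional to the fraction of majority-class points among its k nearest neighbours, so harder samples near the class boundary are oversampled more, and new points are interpolated toward minority neighbours.

```python
import math
import random

def adasyn(minority, majority, beta=1.0, k=5, seed=0):
    """ADASYN-style oversampling sketch (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    points = list(minority) + list(majority)
    labels = [1] * len(minority) + [0] * len(majority)
    # Total synthetic budget needed to reach balance level beta.
    g_total = int((len(majority) - len(minority)) * beta)
    if g_total <= 0 or len(minority) < 2:
        return []
    # Difficulty ratio: share of majority points among the k nearest neighbours.
    ratios = []
    for i, x in enumerate(minority):
        nn = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(x, points[j]))[:k]
        ratios.append(sum(1 for j in nn if labels[j] == 0) / k)
    total = sum(ratios)
    if total == 0:
        return []  # no minority sample lies near the majority class
    synthetic = []
    for i, (x, r) in enumerate(zip(minority, ratios)):
        g_i = round(r / total * g_total)  # harder samples get more synthetic points
        # Interpolate only toward minority-class neighbours.
        min_nn = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))[:k]
        for _ in range(g_i):
            z = minority[rng.choice(min_nn)]
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic
```

In a toy set where one minority point sits inside a majority cluster, that point receives the largest share of the synthetic budget.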
2.
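Imbalanced datasets are the most common problem in credit assessment, and a single classifier performs poorly on imbalanced data. To address this, a credit assessment method combining the majority weighted minority oversampling technique (MWMOTE) with random forest (MWMOTE-RF) is proposed. First, MWMOTE increases the number of minority-class samples during preprocessing; then random forest, a supervised machine learning algorithm, classifies the resulting, more balanced dataset. With the area under the receiver operating characteristic curve (AUC) as the evaluation metric, simulation experiments on the German credit card dataset from the UCI Machine Learning Repository and on a company's auto loan default dataset show that, on the same datasets, MWMOTE-RF improves AUC by 18% and 20% over random forest and naive Bayes, respectively. Random forest was also combined with the synthetic minority oversampling technique (SMOTE) and with adaptive synthetic sampling (ADASYN); MWMOTE-RF improves AUC over these two combinations by 1.47% and 2.34%, respectively, which verifies the effectiveness of the proposed method and its improvement of classifier performance.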
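The AUC metric used above equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting one half. The sketch below computes it directly from that definition in O(n·m) time; it is an illustration rather than the authors' evaluation code, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney form); labels are 1 (positive) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` gives 0.75: three of the four positive/negative pairs are ranked correctly.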
3.
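Current classification algorithms perform poorly on imbalanced datasets. To address this, supervised and unsupervised learning are combined in a centroid-based undersampling method, ICIKMDS. In real applications, some data are hard to obtain, or different types of data naturally differ in quantity, which makes datasets imbalanced: for example, the ratio of patients to healthy people in disease detection, or of fraudulent to legitimate users in credit card fraud. The proposed method handles this imbalance well. It first obtains initial centroids by computing Euclidean distances between samples and then applies k-means clustering to the majority-class samples, which balances the class distribution and effectively improves classifier performance. With the proposed method, the classifier's accuracy on the minority class of the test set is far higher than with random undersampling or SMOTE, while its accuracy on the whole test set is almost the same as that of the other algorithms.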
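A rough sketch of centroid-based undersampling, assuming the majority class is replaced by its k-means centroids with k equal to the minority-class size. The abstract only says that initial centroids come from Euclidean distances between samples; the farthest-first initialization used here is an assumption, and `kmeans` and `centroid_undersample` are hypothetical names.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with farthest-first initial centroids (assumed initialization)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from all chosen centroids.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        new = []
        for i, cl in enumerate(clusters):
            if cl:
                new.append(tuple(sum(col) / len(cl) for col in zip(*cl)))
            else:
                new.append(centroids[i])  # keep the old centroid of an empty cluster
        if new == centroids:
            break  # converged
        centroids = new
    return centroids

def centroid_undersample(majority, minority):
    """Replace the majority class by as many centroids as there are minority samples."""
    k = min(len(minority), len(majority))
    return kmeans(majority, k), list(minority)
```

Each centroid summarizes a region of the majority class, so the reduced set keeps the majority's overall shape while matching the minority's size.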
4.
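Traditional intrusion detection models perform poorly on high-dimensional, imbalanced data. To address this, an intrusion detection model combining the adaptive synthetic oversampling algorithm (ADASYN) with an improved stacked denoising autoencoder (SDA) is proposed. ADASYN oversamples the data. The SDA deep learning model is improved with the Adam optimizer and Dropout regularization to extract low-dimensional, highly robust integrated features, and intrusions are then identified with a softmax classifier. Experimental results show that, compared with the SDA, AE-DNN, and MSVM models, the ADASYN-SDA model improves average accuracy, detection rate, and false alarm rate to some extent.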
5.
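Credit card fraud detection involves large sample sizes, high computational complexity, and extremely imbalanced data. To address these problems, a convolutional neural network (CNN) is applied to large-scale credit card transaction data. To handle the extreme imbalance of the transaction data, the K-means algorithm is used for clustering and combined with the support vector machine synthetic minority oversampling technique (SVMSMOTE) to increase the number of minority-class samples, yielding a KM-SVMSMOTE-CNN credit card transaction fraud prediction model. Validation on the credit card fraud data published on the Kaggle platform shows that the KM-SVMSMOTE-CNN fusion model substantially improves the overall recognition rate for credit card fraud.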
6.
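To address the "curse of dimensionality" and imbalanced classification in web spam detection, a binary classifier algorithm based on immune clonal feature selection and an undersampling (US) ensemble is proposed. First, undersampling splits the majority class of the training set into multiple sample sets, each about the size of the minority class, and each set is merged with the minority samples to form a balanced training subset. Next, an immune clonal algorithm selects multiple optimal feature subsets, and the balanced subsets are projected onto these feature subsets to generate multiple views of the balanced data. Finally, random forest (RF) classifiers classify the test samples, and simple voting determines each test sample's final class. Experiments on the WEBSPAM UK-2006 dataset show that, for web spam detection, this ensemble classifier improves accuracy, F1-measure, AUC, and other metrics by more than 11% compared with random forest and its Bagging and AdaBoost ensembles; compared with the best results of other studies, it improves the F1-measure by 2% and achieves the best AUC.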
7.
8.
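To address imbalanced classification and the "curse of dimensionality" in web spam detection, a binary classifier algorithm based on random forest (RF) and an undersampling ensemble is proposed. First, undersampling splits the majority class of the training set into multiple subsets, each of which is merged with the minority-class set to form a balanced training subset. A random forest classifier is then trained on each subset. Finally, the random forest classifiers classify the test set, and voting determines each test sample's final class. Experiments on the WEBSPAM UK-2006 dataset show that, for web spam detection, this ensemble classifier outperforms random forest and its Bagging and AdaBoost ensembles, improving accuracy, F1-measure, and area under the ROC curve (AUC) by at least 14%, 13%, and 11%, respectively. Compared with the results of the winning team of the Web Spam Challenge 2007, it improves the F1-measure by at least 1% and achieves the best AUC.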
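The split, train, and vote procedure of an undersampling ensemble can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest base learner; that substitution and all function names are assumptions for illustration only.

```python
import math
import random
from collections import Counter

def nearest_centroid_fit(samples):
    """Stand-in base learner (NOT a random forest): one centroid per class."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append(x)
    return {y: tuple(sum(col) / len(xs) for col in zip(*xs)) for y, xs in by_class.items()}

def nearest_centroid_predict(model, x):
    return min(model, key=lambda y: math.dist(x, model[y]))

def undersampling_ensemble(majority, minority, seed=0):
    """Split the majority class into minority-sized chunks; train one model per balanced subset."""
    if not minority:
        raise ValueError("minority class must be non-empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)
    models = []
    for start in range(0, len(shuffled), size):
        chunk = shuffled[start:start + size]
        subset = [(x, 0) for x in chunk] + [(x, 1) for x in minority]
        models.append(nearest_centroid_fit(subset))
    return models

def vote(models, x):
    """Majority vote over the ensemble (ties go to the label counted first)."""
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Every base model sees all minority samples but only a slice of the majority, so no single model is dominated by the majority class.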
9.
10.
11.
Every year, billions of euros are lost worldwide to credit card fraud, forcing financial institutions to continuously improve their fraud detection systems. In recent years, several studies have proposed the use of machine learning and data mining techniques to address this problem. However, most studies used some sort of misclassification measure to evaluate the different solutions and do not take into account the actual financial costs associated with the fraud detection process. Moreover, when constructing a credit card fraud detection model, extracting the right features from the transactional data is crucial. This is usually done by aggregating the transactions in order to observe the spending behavioral patterns of the customers. In this paper we expand the transaction aggregation strategy and propose a new set of features based on analyzing the periodic behavior of transaction times using the von Mises distribution. Then, using a real credit card fraud dataset provided by a large European card processing company, we compare state-of-the-art credit card fraud detection models and evaluate how the different sets of features affect the results. Including the proposed periodic features in the methods yields an average increase in savings of 13%.
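One way to turn transaction times into a periodic feature, sketched under stated assumptions: times are hours on a 24-hour clock, the von Mises concentration κ is estimated with the Best and Fisher approximation, and the feature is the fitted density at a new time divided by its peak value, exp(κ(cos(θ − μ) − 1)). This is not necessarily the paper's exact feature definition, and `von_mises_fit` and `periodic_score` are hypothetical names.

```python
import math

def von_mises_fit(hours):
    """Fit mean direction and concentration to times of day (hours in [0, 24))."""
    angles = [2 * math.pi * h / 24 for h in hours]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mu = math.atan2(s, c)
    # Mean resultant length, capped below 1 so kappa stays finite.
    r = min(math.hypot(c, s), 0.999)
    # Best and Fisher approximation of kappa from r.
    if r < 0.53:
        kappa = 2 * r + r ** 3 + 5 * r ** 5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
    mean_hour = (mu * 24 / (2 * math.pi)) % 24
    return mean_hour, kappa

def periodic_score(hour, mean_hour, kappa):
    """pdf(hour) / pdf(mean_hour): 1.0 at the usual time, near 0 for unusual times."""
    delta = 2 * math.pi * (hour - mean_hour) / 24
    return math.exp(kappa * (math.cos(delta) - 1))
```

Because the fit works on angles, transactions at 23:00 and 01:00 average to midnight rather than to noon, which is the reason for using a circular distribution here.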
12.
Wu Jian (吴剑). Computer Measurement & Control (《计算机测量与控制》), 2018, 26(4): 167-170
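Traditional algorithms for detecting medical insurance information fraud run slowly and inefficiently and cannot protect the security of patients' medical insurance information. To solve this problem, a random forest-based algorithm is used to detect medical insurance information fraud in unstable networks. Hybrid sampling extracts data under unstable conditions and establishes a sampling mechanism for classifying imbalanced data. Random forest computation is then iterated, base classifiers are built using majority voting, and anomalous data are screened on that basis; the resulting model applies the algorithm to medical insurance information fraud detection. Comparative experiments were designed to verify the algorithm's effectiveness, and the results show that the random forest-based algorithm has a shorter running time and higher efficiency.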
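Hybrid sampling can be as simple as undersampling the majority class and oversampling the minority class toward a common size. The sketch below assumes random sampling on both sides and a default target halfway between the two class sizes; the abstract specifies neither, and `hybrid_resample` is a hypothetical name.

```python
import random

def hybrid_resample(majority, minority, target=None, seed=0):
    """Random undersampling of the majority plus random oversampling (with replacement) of the minority."""
    rng = random.Random(seed)
    if target is None:
        target = (len(majority) + len(minority)) // 2  # assumed: meet in the middle
    # Undersample: keep a random subset of the majority class without replacement.
    reduced_majority = rng.sample(list(majority), min(target, len(majority)))
    # Oversample: duplicate random minority samples until the target size is reached.
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    boosted_minority = list(minority) + extra
    return reduced_majority, boosted_minority
```

Meeting in the middle discards less majority data than pure undersampling and duplicates fewer minority samples than pure oversampling.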
13.
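Fraudulent accounts on blockchains threaten transaction security. To address this, a machine learning-based model for detecting fraudulent accounts and analyzing their features is proposed. Real on-chain Ethereum data, after feature extraction, serve as the model's data source. Different machine learning methods are compared to find the best model, which is then trained iteratively to obtain the best prediction model, and SHAP values are introduced to analyze the data features. Experimental results show that the XGBoost-based fraudulent account detection model achieves 0.205, 0.084, and 0.833 on RMSE, MAE, and R², respectively, outperforming the other compared models. Combined with SHAP values, the model identifies the key factors for predicting fraudulent accounts, providing decision support for blockchain transaction security.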
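For reference, the three reported metrics can be computed as follows when 0/1 fraud labels are compared with predicted scores. This is a generic sketch, not the authors' evaluation code, and `regression_metrics` is a hypothetical name; R² is undefined when all true labels are identical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true values and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)            # sum of squared errors
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # zero if all labels are equal
    r2 = 1 - sse / ss_tot
    return rmse, mae, r2
```

For labels `[0, 0, 1, 1]` and scores `[0, 0.5, 1, 1]`, this gives RMSE 0.25, MAE 0.125, and R² 0.75.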
14.
Sanjeev Jha, Montserrat Guillen, J. Christopher Westland. Expert Systems with Applications, 2012, 39(16): 12650-12657
Credit card fraud costs consumers and the financial industry billions of dollars annually. However, there is a dearth of published literature on credit card fraud detection. In this study we employed a transaction aggregation strategy to detect credit card fraud. We aggregated transactions to capture consumer buying behavior prior to each transaction and used these aggregations for model estimation to identify fraudulent transactions. We used real-life credit card transaction data from an international credit card operation for transaction aggregation and model estimation.
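Transaction aggregation of this kind can be sketched as follows: for each transaction, summarize the same card's earlier transactions within a look-back window. The `(card_id, time_in_hours, amount)` record layout, the 24-hour window, and the chosen aggregates (count and total amount) are assumptions for illustration, and `aggregate_features` is a hypothetical name.

```python
from collections import defaultdict

def aggregate_features(transactions, window_hours=24.0):
    """For each transaction, count and sum the same card's transactions in the preceding window."""
    history = defaultdict(list)  # card_id -> [(time, amount), ...] seen so far
    features = []
    for card, t, amount in sorted(transactions, key=lambda tx: tx[1]):
        # Only strictly earlier transactions inside [t - window, t) are aggregated.
        past = [(pt, pa) for pt, pa in history[card] if t - window_hours <= pt < t]
        features.append((card, t, len(past), sum(pa for _, pa in past)))
        history[card].append((t, amount))
    return features
```

Because only earlier transactions are aggregated, each feature row describes the buying behavior that preceded the transaction, matching the "prior to each transaction" framing above.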
15.
Fraud detection mechanisms support the successful identification of fraudulent system transactions performed through security flaws within deployed technology frameworks while maintaining optimal levels of service delivery and a minimal number of false alarms. Knowledge discovery techniques have been widely applied in fraud detection for data analysis and for training supervised learning algorithms to extract fraudulent account behaviour from static data sets. However, the escalating costs associated with fraud have continued to drive a migration towards increasingly proactive methods of fraud detection, which screen transactional data in real time and detect ambiguous user behaviour before a transaction completes. This shift in data processing from after data storage to before it significantly reduces the time available to evaluate newly arriving system requests and produce an accurate fraud decision, demanding increasingly robust and intelligent user profiling technologies to support advanced fraud detection. This paper provides a comprehensive survey of existing research into account signatures, an innovative account profiling technology which maintains a statistical representation of normal account usage for rapid recalculation in real time. Fraud detection architectures, processing models and applications to date are critically examined and evaluated with respect to their proactive capabilities for detecting fraud within streaming financial data. Discussion is also presented on the challenges which remain within the proactive profiling of account behaviour and on future research directions within the signature domain.
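An account signature in this sense can be approximated by a running profile that is updated in constant time per transaction. The sketch below assumes an exponentially weighted mean and variance of transaction amounts and scores a new amount by its distance from the mean in standard deviations; the class name `AccountSignature` and the `alpha` value are hypothetical.

```python
import math

class AccountSignature:
    """Running, exponentially weighted profile of one account's transaction amounts."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest transaction (assumed value)
        self.mean = None
        self.var = 0.0

    def score(self, amount):
        """Deviation from the profile in standard deviations (0.0 while no spread is known)."""
        if self.mean is None or self.var == 0.0:
            return 0.0
        return abs(amount - self.mean) / math.sqrt(self.var)

    def update(self, amount):
        """O(1) update of the exponentially weighted mean and variance."""
        if self.mean is None:
            self.mean = amount
            return
        diff = amount - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
```

In a real-time setting, `score` would be called on an incoming request before `update` folds the approved transaction into the profile.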
16.
The design of an efficient credit card fraud detection technique is, however, particularly challenging because of two striking characteristics of the data: class imbalance and a non-stationary environment. These issues in credit card datasets keep machine learning algorithms from performing well in detecting fraud. Research in credit card fraud detection has focused on detecting fraudulent transactions by analyzing the concepts of normality and abnormality. The balancing strategy designed in this paper can facilitate classification and retrieval problems in this domain. We consider the classification problem in a supervised learning scenario by creating a contrast vector for each customer based on the customer's historical behavior. The proposed model is evaluated on a real credit card dataset provided by FICO and is found to perform significantly better than other state-of-the-art classifiers.
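A contrast vector of this kind might be computed as per-feature z-scores of the current transaction against the customer's own history. The abstract does not give the paper's exact construction, so this definition and the name `contrast_vector` are assumptions for illustration.

```python
import math

def contrast_vector(history, current):
    """Per-feature z-scores of `current` relative to the customer's past transactions."""
    n = len(history)
    vec = []
    for j, value in enumerate(current):
        column = [row[j] for row in history]
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)  # population std
        # A feature that never varied for this customer contributes 0.
        vec.append(0.0 if std == 0 else (value - mean) / std)
    return vec
```

Expressing each transaction relative to the customer's own baseline lets one classifier compare customers whose typical spending differs widely.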
17.
In today's technological society, advances in media and communication networks have created various new means of committing fraud. One typical fraud is the ATM phone scam. What ATM phone scams have in common is that they lure victims into using financial institutions or ATMs to transfer their money into fraudulent accounts. Regardless of the type of fraud, fraudsters can only collect victims' money through fraudulent accounts. Therefore, it is very important to identify the signs of such accounts and to detect fraudulent accounts based on these signs in order to reduce victims' losses. This study applied Bayesian classification and association rules to identify the signs of fraudulent accounts and the patterns of fraudulent transactions. Detection rules were developed from the identified signs and applied to the design of a fraudulent account detection system. Empirical verification showed that this system can identify fraudulent accounts at an early stage and can provide a reference for financial institutions.
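Association rules such as "account attributes imply fraud" are usually screened by support and confidence. The sketch below computes both for a single rule; the helper name `rule_stats` and the attribute names in the example records are invented for illustration.

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule `antecedent -> consequent` over a list of item sets."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_a = sum(1 for r in records if antecedent <= r)                   # records matching the antecedent
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)   # records matching both sides
    support = n_ac / len(records)
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence
```

For example, if two of three accounts tagged `{"new", "night_atm"}` are also tagged `"fraud"` in a four-record set, the rule has support 0.5 and confidence 2/3.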
18.
Developing fraud management policies and fraud detection systems is a vital capability for financial institutions seeking to minimise the effect of fraud on customer service delivery, bottom-line financial losses, and the organisation's brand image and reputation. Rapidly changing attacks on real-time financial service platforms continue to demonstrate fraudsters' ability to actively re-engineer their methods in response to ad hoc security protocol deployments, and highlight the distinct gap between the speed of transaction execution within streaming financial data and the fraud technology frameworks that safeguard the platform. This paper presents the design of FFML, a rule-based policy modelling language and encompassing architecture for facilitating the conceptual-level expression and implementation of proactive fraud controls within multi-channel financial service platforms. It demonstrates how a domain-specific language can abstract the financial platform into a data-stream-based information model, reducing policy modelling complexity and deployment latencies through an innovative policy mapping language usable by both expert and non-expert users. FFML is part of a comprehensive suite of assistive tools and knowledge-based systems developed to support fraud analysts' daily work: designing new high-level fraud management policies, mapping them into executable code of the underlying application programming interface, and deploying active monitoring and compliance functionality within the financial platform.
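To illustrate the general idea of declarative fraud policies evaluated against a transaction stream, the sketch below represents each policy as a named predicate plus an action. It does not reproduce FFML syntax or architecture, which the abstract does not show; the `Rule` type, the transaction field names, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over one incoming transaction
    action: str                        # e.g. "block" or "review"

def evaluate(rules, transaction):
    """Return (rule name, action) for every rule the incoming transaction triggers."""
    return [(r.name, r.action) for r in rules if r.condition(transaction)]

# Hypothetical multi-channel policies; field names and thresholds are illustrative only.
rules = [
    Rule("large_foreign_atm",
         lambda t: t["channel"] == "atm" and t["country"] != t["home_country"] and t["amount"] > 1000,
         "block"),
    Rule("new_payee_online",
         lambda t: t["channel"] == "online" and t["new_payee"],
         "review"),
]
```

Keeping policies as data separate from the evaluation loop is what allows new controls to be deployed without changing the stream-processing code, which is the kind of latency reduction the abstract describes.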