Similar literature
1.
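To address the varied forms and concealed operation of industry fraud, together with extremely imbalanced data distributions, this study uses the ADASYN (adaptive synthetic sampling approach for imbalanced learning) algorithm to augment the data by adaptively shifting the classification decision boundary toward difficult instances, thereby addressing the overfitting caused by imbalanced data. A random-forest-based sequential forward search strategy selects the optimal feature subset for fraud detection, reducing the effect that noisy samples added by ADASYN have on determining the classification boundary. A fraud detection model is then built, and LIME is used to give local explanations of its detection results, increasing the model's practical value. Experiments show that the model largely overcomes the tendency of traditional fraud detection models to misclassify majority-class samples and helps improve the efficiency with which the industry identifies transaction fraud. In addition, using LIME to interpret randomly chosen samples flagged by the model makes it easier for decision makers to analyze the detection results empirically, providing clear value for early warning and decision support.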
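
The adaptive step that ADASYN performs can be sketched roughly as follows. This is a minimal, standard-library illustration of the published ADASYN idea (more synthetic points for minority samples whose neighbourhoods are dominated by the majority class), not the implementation used in the study above; the function name `adasyn_sketch` and its parameters are hypothetical.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=2, beta=1.0, seed=0):
    """Generate synthetic minority points, more of them near the majority class."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_min = len(minority_pts)
    n_maj = len(X) - n_min
    total = int((n_maj - n_min) * beta)  # G: synthetic samples wanted overall
    if total <= 0 or n_min < 2:
        return []

    # r_i: fraction of majority-class points among the k nearest neighbours of x_i.
    ratios = []
    for x in minority_pts:
        others = [j for j in range(len(X)) if X[j] is not x]
        nearest = sorted(others, key=lambda j: math.dist(x, X[j]))[:k]
        ratios.append(sum(y[j] != minority for j in nearest) / k)
    norm = sum(ratios)
    if norm == 0:
        return []  # no minority point borders the majority class

    synthetic = []
    for x, r in zip(minority_pts, ratios):
        # g_i: this point's share of G; rounding means the sum may differ slightly from G.
        g = round(r / norm * total)
        mates = sorted((p for p in minority_pts if p is not x),
                       key=lambda p: math.dist(x, p))[:k]
        for _ in range(g):
            z = rng.choice(mates)   # a nearby minority sample
            lam = rng.random()      # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic
```

In this sketch a minority point surrounded only by other minority points receives no synthetic neighbours, while a point sitting next to the majority class receives most of them, which is the "adaptive" behaviour the abstract refers to.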

2.
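田臣, 周丽娟. 《计算机应用》 (Journal of Computer Applications), 2019, 39(6): 1707-1712
To address the imbalanced datasets that are most common in credit assessment and the limited performance of a single classifier on imbalanced data, a credit assessment method combining the Majority Weighted Minority Oversampling TEchnique with random forest (MWMOTE-RF) is proposed. First, MWMOTE is used during data preprocessing to increase the number of minority-class samples; then, the random forest algorithm, a supervised machine learning algorithm, is used to classify and predict on the resulting, more balanced dataset. Using the area under the receiver operating characteristic curve (AUC) as the evaluation metric, simulation experiments on the German credit card dataset from the UCI Machine Learning Repository and on a company's auto loan default dataset show that, on the same datasets, MWMOTE-RF improves AUC by 18% and 20% over random forest and naive Bayes, respectively. Furthermore, compared with random forest combined with the Synthetic Minority Oversampling TEchnique (SMOTE) and with adaptive synthetic sampling (ADASYN), MWMOTE-RF improves AUC by 1.47% and 2.34%, respectively, which verifies the effectiveness of the proposed method and its improvement of classifier performance.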
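
Because AUC is the metric used for comparison here, a small sketch of how it can be computed may help. The version below uses the rank interpretation of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half); the function name `auc` and the toy data are illustrative assumptions, not taken from the paper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but would normally be replaced by a sort-based computation on large datasets.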

3.
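To address the unsatisfactory performance of current classification algorithms on imbalanced datasets, a centroid-based undersampling method, ICIKMDS, is proposed that combines supervised and unsupervised learning. In real applications, some data are hard to obtain, or different types of data differ inherently in quantity, which leads to imbalanced dataset distributions, for example the ratio of patients to healthy people in disease detection, or of fraudulent to legitimate users in credit card fraud. The proposed method addresses this imbalance: it first obtains initial centroids by computing the Euclidean distances between samples, then clusters the majority-class samples with the k-means algorithm so that the imbalanced dataset becomes more evenly distributed, improving classifier performance. With the proposed method, classification accuracy on the minority class of the test set is far higher than with random undersampling and SMOTE, while accuracy on the whole test set is almost the same as with the other algorithms.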
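
A rough sketch of centroid-based undersampling with k-means is given below. The abstract does not say exactly how ICIKMDS derives its initial centroids from the Euclidean distances, so the farthest-point initialisation used here is an assumption, as are the function name `kmeans_undersample` and its parameters.

```python
import math


def kmeans_undersample(majority, k, iters=20):
    """Replace the majority class by k cluster centroids (requires k <= len(majority))."""
    # Assumed initialisation: start from the first point, then repeatedly add the
    # point farthest (in Euclidean distance) from the centroids chosen so far.
    centroids = [majority[0]]
    while len(centroids) < k:
        centroids.append(
            max(majority, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in majority:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Choosing `k` equal to the number of minority-class samples would give a balanced training set in which every majority "sample" summarises one cluster.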

4.
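To address the poor detection performance of traditional intrusion detection models on high-dimensional, imbalanced data, an intrusion detection model combining the adaptive synthetic sampling algorithm (ADASYN) with an improved stacked denoising autoencoder (SDA) is proposed. ADASYN is used to oversample the data. The SDA deep learning model is improved with the Adam optimization algorithm and Dropout regularization to extract low-dimensional, highly robust integrated features. Intrusion detection is then performed with a softmax classifier. Experimental results show that, compared with the SDA, AE-DNN, and MSVM models, the ADASYN-SDA model achieves some improvement in average accuracy, detection rate, and false alarm rate.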

5.
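To address the large sample size, high computational complexity, and extremely imbalanced data distribution in credit card fraud detection, a convolutional neural network (CNN) is applied to large-scale credit card transaction data for fraud detection. To handle the extreme imbalance of the transaction data, the K-means algorithm is used for clustering and is combined with the support vector machine synthetic minority oversampling technique (SVMSMOTE) to increase the number of minority-class samples, yielding a KM-SVMSMOTE-CNN model for predicting credit card transaction fraud. The model is validated on credit card fraud data published on the Kaggle platform; experimental results show that the KM-SVMSMOTE-CNN fusion model substantially improves the overall recognition rate of credit card fraud detection.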

6.
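To address the "curse of dimensionality" and imbalanced classification in web spam detection, a binary classifier algorithm based on immune clonal feature selection and an undersampling (US) ensemble is proposed. First, undersampling is used to sample the majority class of the training set into several subsets, each similar in size to the minority class, and each is merged with the minority-class samples to form several balanced training subsets. Then, an immune clonal algorithm is designed to select several optimal feature subsets, and the balanced subsets are projected onto these feature subsets to generate multiple views of the balanced data. Finally, random forest (RF) classifiers classify the test samples, and simple voting determines the final class of each test sample. Experimental results on the WEBSPAM UK-2006 dataset show that, when applied to web spam detection, the ensemble classifier improves accuracy, F1-measure, AUC, and other metrics by more than 11% compared with random forest and its Bagging and AdaBoost ensembles; compared with the best other published results, it improves F1-measure by 2% and achieves the best AUC.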

7.
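范莹, 计华, 张化祥. 《计算机应用》 (Journal of Computer Applications), 2008, 28(5): 1204-1207
A new ensemble classifier algorithm based on fuzzy clustering is proposed. The algorithm uses fuzzy clustering to derive the distribution characteristics of the training samples and, on that basis, assigns each sample a weight that determines its probability of being sampled. The classifier trained on the sampled data is used to adjust the sampling probabilities of the training set, and new classifiers are generated in turn until a given accuracy is reached. The ensemble algorithm was tested on several standard UCI datasets and compared with Bagging and AdaBoost; experimental results show that the new algorithm is more robust and achieves higher classification accuracy.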

8.
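To address imbalanced classification and the "curse of dimensionality" in web spam detection, a binary classifier algorithm based on random forest (RF) and an undersampling ensemble is proposed. First, undersampling is used to sample the majority class of the training set into several subsets, each of which is merged with the minority-class samples to form several balanced training subsets. Then, a random forest classifier is trained on each training subset. Finally, the random forest classifiers classify the test set, and voting determines the final class of each test sample. Experiments on the WEBSPAM UK-2006 dataset show that, for web spam detection, the ensemble classifier outperforms random forest and its Bagging and AdaBoost ensembles, improving accuracy, F1-measure, and area under the ROC curve (AUC) by at least 14%, 13%, and 11%, respectively. Compared with the results of the winning team of the Web Spam Challenge 2007, the ensemble classifier improves F1-measure by at least 1% and achieves the best AUC.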
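
The three steps described above (split the majority class, train one model per balanced subset, vote) can be sketched as follows. To stay self-contained, the sketch swaps the random forest for a deliberately simple nearest-centroid base learner; that substitution, and the names `train_ensemble` and `predict`, are assumptions made only for illustration.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def train_ensemble(majority, minority, seed=0):
    """Split the majority class into minority-sized chunks; fit one model per chunk."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
    # Stand-in base learner: a pair (majority centroid, minority centroid).
    # The paper trains a random forest on each balanced subset instead.
    return [(centroid(chunk), centroid(minority)) for chunk in chunks]


def predict(models, x):
    """Majority vote: 1 = minority class (e.g. spam), 0 = majority class."""
    votes = sum(
        1 for c_maj, c_min in models if math.dist(x, c_min) < math.dist(x, c_maj)
    )
    return 1 if votes * 2 > len(models) else 0
```

Every model sees all minority samples but only a slice of the majority class, so no majority data is discarded across the ensemble as a whole.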

9.
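程险峰, 李军, 李雄飞. 《计算机工程》 (Computer Engineering), 2011, 37(13): 147-149
For the problem of learning from imbalanced data, a classification algorithm based on undersampling is proposed. The majority-class examples are undersampled, retaining those located near the classification boundary. With AUC as the optimization objective, the most suitable neighborhood radius is chosen so that the data become balanced; a Bayesian classifier is trained on the undersampled examples, and AUC is used to evaluate classifier performance. Experimental results on simulated data and UCI datasets show that the algorithm is effective.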
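
One way "retaining majority examples near the boundary" might be operationalised is a radius filter: keep a majority example only if some minority example lies within a given radius of it. This reading is an assumption, since the abstract does not spell out the rule, and the AUC-driven search for the radius is omitted; `keep_boundary_majority` is a hypothetical name.

```python
import math


def keep_boundary_majority(majority, minority, radius):
    """Keep majority points that have at least one minority point within `radius`."""
    return [
        p for p in majority
        if any(math.dist(p, q) <= radius for q in minority)
    ]
```

Growing the radius retains more majority examples, so the radius acts as the knob that trades class balance against how much of the majority class is kept.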

10.
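After analyzing the shortcomings of the traditional support vector machine (SVM) when learning from imbalanced data, an improved SVM algorithm is proposed. It uses adaptive synthetic (ADASYN) sampling to partially resample the dataset and increase the number of minority-class samples; assigns different weights to different sample points to weaken the influence of noise on the training result; and trains with a cost-sensitive SVM to mitigate the hyperplane shift caused by imbalanced data. Six imbalanced datasets from the UCI repository were selected for testing. Experimental results show that the improved SVM outperforms the other algorithms on every dataset and achieves a good balance between minority-class and majority-class accuracy.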

11.
Every year billions of Euros are lost worldwide due to credit card fraud, forcing financial institutions to continuously improve their fraud detection systems. In recent years, several studies have proposed the use of machine learning and data mining techniques to address this problem. However, most studies used some sort of misclassification measure to evaluate the different solutions, and do not take into account the actual financial costs associated with the fraud detection process. Moreover, when constructing a credit card fraud detection model, it is very important to extract the right features from the transactional data. This is usually done by aggregating the transactions in order to observe the spending behavioral patterns of the customers. In this paper we expand the transaction aggregation strategy, and propose to create a new set of features based on analyzing the periodic behavior of the time of a transaction using the von Mises distribution. Then, using a real credit card fraud dataset provided by a large European card processing company, we compare state-of-the-art credit card fraud detection models, and evaluate how the different sets of features have an impact on the results. By including the proposed periodic features into the methods, the results show an average increase in savings of 13%.
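
Time of day is periodic, so an ordinary average misleads: the arithmetic mean of 23:00 and 01:00 is noon, not midnight. The sketch below computes the circular mean hour and the mean resultant length, the sample statistics from which a von Mises distribution is usually fitted. It is an illustration of the general idea only; it does not fit the concentration parameter or reproduce the paper's features, and `periodic_time_features` is a hypothetical name.

```python
import math


def periodic_time_features(hours):
    """Return (circular mean hour in [0, 24], mean resultant length in [0, 1])."""
    angles = [2 * math.pi * h / 24 for h in hours]  # map the 24 h clock to a circle
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    mean_hour = mean_angle * 24 / (2 * math.pi)
    resultant = math.hypot(s, c)  # near 1: times tightly clustered; near 0: spread out
    return mean_hour, resultant
```

A new transaction whose time lies far from a customer's circular mean, when the resultant length is high, is unusual for that customer; this is the kind of signal a periodic feature can expose.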

12.
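Traditional algorithms for detecting medical insurance information fraud suffer from long running times and low efficiency and cannot safeguard patients' medical insurance information. To solve this problem, a random-forest-based algorithm is used to detect medical insurance information fraud in unstable networks. Hybrid sampling extracts data under unstable conditions, and a sampling mechanism for the imbalanced-data classification algorithm is established; iterative random forest computation is performed, base classifiers are built with majority voting, and abnormal data are screened on that basis; the model is then used to apply the algorithm to medical insurance fraud detection. Comparative experiments were designed to verify the algorithm's effectiveness. The results show that the random-forest-based algorithm has a short running time and high efficiency.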

13.
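To address the security problems that fraudulent accounts on blockchains pose for transactions, a machine-learning-based model for detecting fraudulent accounts and analyzing their features is proposed. Features are extracted from real on-chain Ethereum data and used as the model's data source; different machine learning methods are compared to find the best model, which is trained iteratively to obtain the best predictive model, and SHAP values are introduced to analyze the data features. Experimental results show that the XGBoost-based fraudulent account detection model reaches 0.205, 0.084, and 0.833 on RMSE, MAE, and R2, respectively, outperforming the other models compared; combined with SHAP values, it identifies the key factors for predicting fraudulent accounts, providing a decision reference for blockchain transaction security.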
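
For reference, the three metrics reported above can be computed from their standard definitions as in the sketch below. The function name `regression_metrics` and the toy values in the usage note are illustrative and unrelated to the paper's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot                        # undefined if y_true is constant
    return rmse, mae, r2
```

For example, with true labels `[0, 1, 1, 0]` and predictions `[0, 1, 0.5, 0.5]`, the MAE is 0.25 and R^2 is 0.5. Lower RMSE and MAE and higher R^2 indicate a better fit.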

14.
Credit card fraud costs consumers and the financial industry billions of dollars annually. However, there is a dearth of published literature on credit card fraud detection. In this study we employed a transaction aggregation strategy to detect credit card fraud. We aggregated transactions to capture consumer buying behavior prior to each transaction and used these aggregations for model estimation to identify fraudulent transactions. We use real-life data of credit card transactions from an international credit card operation for transaction aggregation and model estimation.
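
Transaction aggregation can be illustrated with a small sketch: for each transaction, summarise the same customer's activity in a preceding time window. The 24-hour window, the two features (count and total amount), and the tuple layout are assumptions chosen for illustration; the study's actual aggregates are not given in the abstract.

```python
from collections import defaultdict


def aggregate_prior(transactions, window_hours=24):
    """transactions: (customer_id, time_in_hours, amount) tuples, sorted by time.

    Returns one feature dict per transaction describing that customer's
    activity in the `window_hours` before it (the transaction itself excluded).
    """
    history = defaultdict(list)  # customer_id -> list of (time, amount)
    features = []
    for customer, t, amount in transactions:
        prior = [a for pt, a in history[customer] if t - window_hours <= pt < t]
        features.append({"count": len(prior), "total": sum(prior)})
        history[customer].append((t, amount))
    return features
```

Each feature row uses only information available before the transaction it describes, which keeps the aggregates usable as model inputs at decision time.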

15.
Fraud detection mechanisms support the successful identification of fraudulent system transactions performed through security flaws within deployed technology frameworks while maintaining optimal levels of service delivery and a minimal number of false alarms. Knowledge discovery techniques have been widely applied in fraud detection for data analysis and training of supervised learning algorithms to support the extraction of fraudulent account behaviour within static data sets. Escalating costs associated with fraud, however, have continued to drive the migration towards increasingly proactive methods of fraud detection, to support the real-time screening of transactional data and detection of ambiguous user behaviour prior to transaction completion. This shift in data processing from post to pre data storage significantly reduces the available time within which to evaluate newly arriving system requests and produce an accurate fraud decision, demanding increasingly robust and intelligent user profiling technologies to support advanced fraud detection. This paper provides a comprehensive survey of existing research into account signatures, an innovative account profiling technology which maintains a statistical representation of normal account usage for rapid recalculation in real-time. Fraud detection architectures, processing models and applications to date are critically examined and evaluated with respect to their proactive capabilities for detection of fraud within streaming financial data. Discussion is also presented on challenges which remain within the proactive profiling of account behaviour and future research directions within the signature domain.
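
As a loose illustration of a statistical profile that can be recalculated quickly as transactions stream in, the sketch below keeps an exponentially weighted mean and variance of transaction amounts per account. It is an assumption about what a signature could look like in its simplest form, not a design taken from the surveyed work; the class name `AccountSignature` and the smoothing factor `alpha` are hypothetical.

```python
import math


class AccountSignature:
    """Exponentially weighted mean/variance of one account's transaction amounts."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight of the newest observation
        self.mean = None
        self.var = 0.0

    def score(self, amount):
        """How many (weighted) standard deviations `amount` is from the profile."""
        if self.mean is None or self.var == 0.0:
            return 0.0
        return abs(amount - self.mean) / math.sqrt(self.var)

    def update(self, amount):
        """Constant-time update, so the profile can follow a live stream."""
        if self.mean is None:
            self.mean = amount
            return
        diff = amount - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
```

Scoring a request before updating the profile with it mirrors the pre-storage screening the abstract describes: the decision is made from past behaviour only, and the signature is refreshed afterwards.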

16.
The design of an efficient credit card fraud detection technique is particularly challenging because of the two most striking characteristics of the data: imbalance and a non-stationary environment. These issues limit the ability of machine learning algorithms to detect fraud well. Research in credit card fraud detection has focused on detecting fraudulent transactions by analyzing the concepts of normality and abnormality. The balancing strategy designed in this paper can facilitate classification and retrieval problems in this domain. In this paper, we consider the classification problem in a supervised learning scenario by creating a contrast vector for each customer based on that customer's historical behavior. The proposed model is evaluated on a real credit card dataset provided by FICO, and it is found to perform significantly better than other state-of-the-art classifiers.

17.
In today’s technological society there are various new means to commit fraud due to the advancement of media and communication networks. One typical fraud is the ATM phone scam. What ATM phone scams have in common is that they lure victims into using financial institutions or ATMs to transfer their money into fraudulent accounts. Regardless of the types of fraud used, fraudsters can only collect victims’ money through fraudulent accounts. Therefore, it is very important to identify the signs of such fraudulent accounts and to detect fraudulent accounts based on these signs, in order to reduce victims’ losses. This study applied Bayesian Classification and Association Rule to identify the signs of fraudulent accounts and the patterns of fraudulent transactions. Detection rules were developed based on the identified signs and applied to the design of a fraudulent account detection system. Empirical verification supported that this fraudulent account detection system can successfully identify fraudulent accounts in early stages and is able to provide reference for financial institutions.

18.
Developing fraud management policies and fraud detection systems is a vital capability for financial institutions towards minimising the effect of fraud upon customer service delivery, bottom line financial losses and the adverse impact on the organisation’s brand image and reputation. Rapidly changing attacks in real-time financial service platforms continue to demonstrate fraudsters’ ability to actively re-engineer their methods in response to ad hoc security protocol deployments, and highlight the distinct gap between the speed of transaction execution within streaming financial data and corresponding fraud technology frameworks that safeguard the platform. This paper presents the design of FFML, a rule-based policy modelling language and encompassing architecture for facilitating the conceptual level expression and implementation of proactive fraud controls within multi-channel financial service platforms. It is demonstrated how a domain specific language can be used to abstract the financial platform into a data stream based information model to reduce policy modelling complexity and deployment latencies through an innovative policy mapping language usable by both expert and non-expert users. FFML is part of a comprehensive suite of assistive tools and knowledge-based systems developed to support fraud analysts’ daily work of designing new high level fraud management policies, mapping into executable code of the underpinning application programming interface and deployment of active monitoring and compliance functionality within the financial platform.
