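Macro viruses are widely used in advanced persistent threats (APTs). Because they can be mutated cheaply and flexibly, traditional anti-virus systems based on virus rule libraries struggle to counter them effectively. This paper proposes a method for detecting metamorphic macro viruses based on gradient boosting decision trees. Guided by the experience of virus experts, the method performs large-scale feature engineering, builds fine-grained models of metamorphic macro viruses through lexical analysis, and trains the model on a massive number of samples. Experiments show that, in enterprise user networks, the method can accurately detect propagating...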


In today’s connected world there is far more data than we could imagine. The number of network users is increasing day by day, and a large number of social networks keep users connected all the time. These social networks give users complete freedom to post content of political, commercial or entertainment value. Some of this data may be sensitive and can therefore have a considerable impact on society. The trustworthiness of data is important on public social networking sites such as Facebook and Twitter. Because of their large user bases and openness, spam messages can spread easily through these networks. Spam detection is a technique for identifying such data and marking it as false. Many machine learning approaches have been proposed to detect spam in social networks. The efficiency of any spam detection algorithm is determined by its cost and its accuracy. To improve spam detection in social networks, this study proposes using statistical features modelled with a supervised boosting approach, stochastic gradient boosting, to evaluate English-language Twitter data sets. The performance of the proposed model is evaluated using simulation results.
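
As a rough illustration of the stochastic gradient boosting idea named in this abstract, the sketch below fits each weak learner on a random subsample of the training set rather than on all of it. Everything concrete here is an assumption made for illustration and not taken from the paper: the single made-up feature (links per tweet), the toy labels, the use of one-split stumps with squared loss, and the parameter names `rounds`, `rate` and `subsample`.

```python
import random

def fit_stump(xs, residuals):
    """Find the single threshold on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # every sampled x is identical: fall back to a constant
        m = sum(residuals) / len(residuals)
        return (xs[0], m, m)
    return best[1:]

def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, rate, stumps = model
    return base + rate * sum(lm if x <= t else rm for t, lm, rm in stumps)

def fit_sgb(xs, ys, rounds=100, rate=0.1, subsample=0.5, seed=0):
    """Gradient boosting where each round sees only a random subsample."""
    rng = random.Random(seed)
    model = (sum(ys) / len(ys), rate, [])
    k = max(2, int(subsample * len(xs)))
    for _ in range(rounds):
        idx = rng.sample(range(len(xs)), k)  # sampling without replacement
        sx = [xs[i] for i in idx]
        # For squared loss the negative gradient is the plain residual.
        res = [ys[i] - predict(model, xs[i]) for i in idx]
        model[2].append(fit_stump(sx, res))
    return model

# Hypothetical toy data: links per tweet, label 1 = spam, 0 = legitimate.
links = [0, 0, 1, 1, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = fit_sgb(links, labels)
score_low, score_high = predict(model, 0), predict(model, 7)
```

Scores above 0.5 would be read as spam in this toy setup. The subsampling both reduces the cost of each round and injects randomness that tends to reduce overfitting, which is the usual motivation for the stochastic variant.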


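Modern processors typically have only 4 to 8 built-in performance counters, yet they can monitor up to thousands of clock-cycle-level performance events. These events can easily generate large volumes of data, referred to as processor performance big data. Extracting valuable information from this data, however, poses many challenges. This paper proposes a method for analyzing processor performance data that builds performance models by iteratively applying the gradient boosted regression tree algorithm and ranks the performance events of cloud computing workloads by importance, thereby guiding performance tuning of cloud computing platforms.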

Many payment service providers now use discounts and other marketing strategies to promote their products. This gives rise to people who deliberately exploit such promotions for financial gain. Known as ‘scalper parties’ or ‘econnoisseurs’, they can constitute an underground industry. In this paper, we show how machine learning can assist in identifying abnormal scalper transactions. We introduce the basic methods of Decision Tree and Boosting Tree and show how these classification methods can be applied to the detection of abnormal transactions. We also introduce a graph computing method that implicitly describes the characteristics of people and merchants through node correlation, in order to mine deep features. Because of the large volume of data, we partition the computation into blocks, reducing the data to a series of segments and thereby lowering computational and memory requirements. Compared with other work on abnormal transaction detection, we pay more attention to creating and using portraits of merchants or individuals to assist decision-making. After data analysis and model building, we find that focusing on a single transaction or a single day does not yield a comprehensive set of characteristics, whereas many characteristics can be obtained by examining the transactions of a person or a merchant over a period of time. After GBDT (Gradient Boosting Decision Tree) based classification prediction and analysis, we conclude that there is a clear distinction between abnormal trading shops and conventional shops, which facilitates the clustering of abnormal merchants. By filtering transaction data along multiple dimensions, multiple sub-graphs can be obtained. After hierarchical clustering, the abnormal trading group is mined and classified according to its features. Finally, we build a scoring model and apply it to the big data platform of one of China’s largest payment service providers to help enterprises identify abnormal trading groups and specific marketing strategies.

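As a security defense technology that protects networks from attack, the network intrusion detection system plays a very important role in safeguarding computer systems and networks. Machine learning has been widely applied to intrusion detection, including the multi-class classification problem with imbalanced data, and is more intelligent and accurate than traditional methods. This paper studies improvements to existing multi-class methods for network intrusion detection and proposes one that incorporates a random forest model for feature transformation and uses a gradient boosting decision tree model for classification...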

Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population, and it is both expensive and difficult to cure. This research paper presents machine learning models to predict heart failure. The fundamental idea is to compare the accuracy of various Machine Learning (ML) algorithms and boosting algorithms in order to improve prediction accuracy. Supervised algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF) and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost are also used to improve prediction, together with Artificial Neural Networks (ANN). This research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and scikit-learn are used for ML. TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy among the classifiers, 95%. KNN obtained the second-highest accuracy, 93.33%. XGBoost reached a satisfactory accuracy of 91.67%; SVM, CatBoost and ANN each had an accuracy of 90%; and LR had an accuracy of 88.33%.

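Owing to its high accuracy and interpretability, the gradient boosting decision tree (GBDT) algorithm is widely applied to classification, regression, ranking and other problems. With the explosive growth of data volume, distributed GBDT algorithms have become a research hotspot. Although a number of distributed GBDT implementations already exist, they perform poorly on high-dimensional features and multi-class tasks, because the data-parallel strategy they adopt requires transmitting gradient histograms, and this transmission becomes the performance bottleneck when features are high-dimensional and classes are many. Studying parallel strategies for GBDT that are better suited to high-dimensional features and multi-class tasks is therefore of great significance and value. This paper first compares the data-parallel and feature-parallel strategies and proves theoretically that feature parallelism is better suited to high-dimensional and multi-class scenarios. Based on this analysis, it proposes FP-GBDT, a feature-parallel distributed GBDT algorithm. FP-GBDT designs an efficient distributed dataset transposition algorithm that converts a dataset originally partitioned by row into a column-partitioned representation; when building gradient histograms, it uses a sparsity-aware method to speed up construction; and when splitting tree nodes, it uses a bitmap compression method to transmit the placement information of data samples, reducing communication overhead. Extensive experiments compare distributed GBDT under different parallel strategies. They first verify the effectiveness of the optimizations proposed in FP-GBDT, then compare FP-GBDT with XGBoost, confirming on multiple datasets that FP-GBDT is effective in high-dimensional and multi-class scenarios, with a speedup of up to 6 times.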
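
The bitmap idea mentioned for node splitting can be pictured with a minimal sketch. Under feature parallelism, only the worker that owns the chosen split feature knows which samples go to the left child, so it has to tell the other workers. The code below assumes the simplest possible encoding, one bit per sample packed eight to a byte; the abstract does not describe FP-GBDT's actual format, and the function names, feature values and threshold are all hypothetical.

```python
def pack_placement(goes_left):
    """Pack one left/right flag per sample into bytes, 8 flags per byte."""
    out = bytearray((len(goes_left) + 7) // 8)
    for i, flag in enumerate(goes_left):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_placement(bitmap, n_samples):
    """Recover the per-sample flags on the receiving worker."""
    return [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(n_samples)]

# Hypothetical split: the owning worker evaluates "value <= 1.0" on its column.
feature_values = [0.2, 1.5, 0.7, 3.1, 0.1, 2.2, 0.9, 1.1, 0.4, 5.0]
goes_left = [v <= 1.0 for v in feature_values]

bitmap = pack_placement(goes_left)
restored = unpack_placement(bitmap, len(feature_values))
```

For these ten samples the message is 2 bytes, whereas sending the placement as a list of 4-byte sample indices or flags would take tens of bytes; the saving scales linearly with the number of samples in the node.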

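For the common fault types of power electronic circuits, this paper proposes a fault diagnosis method that uses principal component analysis (PCA) to extract fault-information features of the circuit state and a gradient boosting decision tree (GBDT) to classify them. It first discusses the steps of PCA feature extraction and the classification principle of GBDT, then studies the fault diagnosis workflow built on them, and finally carries out modeling, simulation and verification on a three-phase bridge rectifier circuit. The experimental results show that, compared with other methods, this method achieves higher diagnostic accuracy and better generalization across samples in a low-dimensional space.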

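Machine learning has already shown great value in quantitative analysis across many disciplines. Using the sklearn machine learning library, this paper takes as its subject the neologisms recorded in the New Word Survey Report published in 2015 by the National Institute of Korean Language, and builds a feature matrix for the vocabulary according to the classification criteria that appear in the report. Several machine learning algorithms are then applied for feature selection, ultimately identifying the factors that most strongly affect comprehension of the meaning of Korean neologisms. The experimental results show that if the word is...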

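This paper first analyzes the properties of the sample confidence measure used by the noise-reduction ensemble algorithm and explains why this function is fundamentally unsuited to multi-class problems. It then designs a more targeted confidence measure and, based on it, proposes an enhanced noise-reduction parameter ensemble algorithm. As a result, the discriminative Bayesian network parameter learning algorithm both suppresses the effect of noise effectively and avoids overfitting of the classifier, extending the application of discriminative Bayesian network classifiers trained with ensemble learning to multi-class problems. Finally, the experimental results and their statistical hypothesis tests confirm that the classifiers produced by this algorithm perform significantly better than those obtained with current ensemble Bayesian network parameter learning methods.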

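As the insurance industry flourishes, insurance fraud is becoming increasingly serious. Auto insurance fraud has long been the hardest-hit area of insurance fraud and bears heavily on the development of the insurance industry. Auto insurance fraud detection has therefore long been a hot research topic for scholars in China and abroad. Given that China lags behind in motor vehicle insurance fraud detection technology, and that research from abroad has rarely modeled and analyzed Chinese auto insurance business data effectively, this paper presents the first literature survey of research on applying machine learning models to auto insurance fraud detection, systematically summarizing more than twenty years of work. It introduces the auto insurance fraud process and describes how expert systems and intelligent claims systems fit into fraud detection; reviews specific research progress on machine learning models for auto insurance fraud detection, first abroad and then in China, with a high-level comparison; builds representative machine learning models on high-quality auto insurance data from the past five years provided by a Chinese auto insurer, with comprehensive testing and analysis; and discusses future research directions for auto insurance fraud detection.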

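In e-commerce recommender systems, Internet "information overload" makes it difficult to pinpoint user interests and provide accurate brand recommendations. To address this problem, this paper mines the user behavior logs of an e-commerce site in depth, extracts multiple features that discriminate users' purchasing behavior toward product brands, and feeds these features into the gradient boosted regression tree algorithm to build a user interest preference model that improves recommendation precision. Experimental results show that, even when the data are sparse, the algorithm still identifies users' brand preferences well and is markedly more accurate than other traditional recommendation and classification algorithms.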

MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, MultiBoosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution.

Ribonucleic acid (RNA) hybridization is widely used in popular RNA simulation software in bioinformatics. However, because of the exponential computational complexity of combinatorial problems, it is challenging to decide within an acceptable time whether a specific RNA hybridization is effective. We introduce a machine learning based technique to address this problem. The machine learning (ML) models tested in the training phase include algorithms based on the boosted tree (BT), random forest (RF), decision tree (DT) and logistic regression (LR), and the corresponding models are obtained. Given the RNA molecular coding training and testing sets, the trained models are applied to predict the classification of RNA hybridization results. The experimental results show that, under the strong constraint condition, the optimal predictive accuracies are 96.2%, 96.6%, 96.0% and 69.8% for the RF, BT, DT and LR-based approaches, respectively, compared with traditional representative methods. Furthermore, the average computational efficiencies of the RF, BT, DT and LR-based approaches are 208 679, 269 756, 184 333 and 187 458 times higher than that of the existing approach, respectively. Given an RNA design, the BT-based approach demonstrates high computational efficiency and better predictive accuracy in determining the biological effectiveness of molecular hybridization.

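Short-term traffic flow prediction is an important component of traffic flow modeling and plays an important role in the management and control of urban road traffic. However, the prediction accuracy of common time series models (such as ARIMA) and random forest (RF) models is limited by the residuals produced during model construction and by the input variables. To address this problem, this paper proposes a short-term traffic prediction model based on gradient boosted regression trees to predict traffic speed. First, the model introduces the Huber loss function to handle model residuals. Second, the input variables account for the influence of correlated spatial factors from adjacent locations and of temporal factors on the predicted cross-section. During training, the model corrects its residuals by continually adjusting the weights of the weak learners, thereby improving prediction accuracy. Experiments use traffic speed data from an urban expressway, and the proposed model is compared with the ARIMA and random forest models using metrics such as MSE and MAPE. The results show that the proposed model has the best prediction accuracy, verifying its effectiveness for short-term traffic flow prediction.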
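
To make the role of the Huber loss concrete, the sketch below shows the loss and its negative gradient, which is the quantity each new tree is fitted to in gradient boosting. The threshold `delta` and the example speeds are assumptions chosen for illustration; the abstract does not state which value the authors used or how they chose it.

```python
def huber_loss(y, pred, delta=1.0):
    """Quadratic for small residuals, linear beyond delta."""
    r = y - pred
    if abs(r) <= delta:
        return 0.5 * r * r
    return delta * (abs(r) - 0.5 * delta)

def huber_negative_gradient(y, pred, delta=1.0):
    """Pseudo-residual used as the next tree's target: the residual, clipped to +/-delta."""
    r = y - pred
    if abs(r) <= delta:
        return r
    return delta if r > 0 else -delta

# Hypothetical speeds in km/h: one ordinary reading and one incident outlier.
delta = 5.0
ordinary = huber_negative_gradient(60.0, 59.5, delta)  # small residual passes through
outlier = huber_negative_gradient(12.0, 55.0, delta)   # residual of -43 is clipped
outlier_loss = huber_loss(12.0, 55.0, delta)
```

With squared loss the outlier would contribute a pseudo-residual of -43 and dominate the next tree; with the Huber loss it contributes only -5, which is why this loss is commonly chosen when residuals contain occasional large errors.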

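Diagnosing anomalies in the on-orbit operation of satellites and their payloads is an important support for efficient and safe satellite operation, and developing intelligent, efficient satellite anomaly detection methods is one of the research focuses of satellite ground systems. Against the application background of the satellite missions in China's Strategic Priority Program on Space Science, and according to the data characteristics and anomaly patterns of space science satellites, this work applies the principle of the gradient boosting decision tree (GBDT) to build a satellite engineering...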

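This paper describes the threat that denial of service (DoS) may pose to DNS and proposes an intrusion detection system (IDS) that can detect and classify different types of DoS attacks against DNS. The system consists of a statistical preprocessor and a machine learning (ML) engine. Three neural network classifiers and a support vector machine were evaluated on a simulated network. The results show that the back-propagation (BP) neural network engine, with 99% accuracy, outperforms the other types of classifiers.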

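Wheelsets guide the train and bear and transmit loads as it runs, and wear of their treads and flanges significantly affects both the operational safety of metro trains and the service life of rails. Based on the wheel wear mechanism of metro trains and an analysis of the characteristics of wheel dimension data, this paper builds a flange thickness wear prediction model for the profile parameter of flange thickness using the gradient boosting decision tree algorithm. The model is validated on the data of an arbitrarily selected wheelset. The results show that the GBDT-based wheelset wear prediction model has good prediction accuracy and can predict the trend range of flange thickness change 1 to 6 months ahead, a relatively long horizon, which can guide metro maintenance departments in moving wheelset maintenance from condition-based to preventive maintenance.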

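This paper considers the problem of eye state detection and proposes a scheme that combines Gabor filtering with a fuzzy support vector machine (FSVM). Gabor wavelets are first used to extract features from face images, yielding eye feature images; then, in the feature space, an eye state classifier is designed by combining the FSVM with a ternary decision tree. Experimental results on the AR face database show that the algorithm achieves good classification performance.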

The rapid progress of the Internet has exposed networks to an increased number of threats. Intrusion detection technology can effectively protect network security against malicious attacks. In this paper, we propose a ReliefF-P-Naive Bayes and softmax regression (RP-NBSR) model based on machine learning for network attack detection, to lower the false detection rate and raise the F1 score for unknown intrusion behavior. In the proposed model, the Pearson correlation coefficient is introduced to compensate for the ReliefF feature selection algorithm's lack of correlation analysis between features, and a ReliefF-Pearson correlation coefficient (ReliefF-P) algorithm is proposed. Then, the ReliefF-P algorithm is used to preprocess the UNSW-NB15 dataset to remove irrelevant features and obtain a new feature subset. Finally, a naïve Bayes and softmax regression (NBSR) classifier is constructed by cascading the naïve Bayes classifier and the softmax regression classifier, and an attack detection model based on RP-NBSR is established. The experimental results on the UNSW-NB15 dataset show that the attack detection model based on RP-NBSR has a lower false detection rate and a higher F1 score than other detection models.
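
One plausible way to picture how a Pearson correlation check can complement ReliefF is sketched below: features are visited in order of their relevance weight, and a feature is dropped when it is almost linearly redundant with one already kept. This is an assumption about the general idea, not the paper's ReliefF-P procedure. The ReliefF weights are taken as given inputs, and the feature names, values and the 0.9 cut-off are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(va * vb)

def select_features(columns, weights, max_abs_r=0.9):
    """Keep features by descending weight, skipping near-duplicates of kept ones."""
    kept = []
    for name in sorted(columns, key=lambda n: weights[n], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept

# Hypothetical traffic features and hypothetical relevance weights.
columns = {
    "duration": [1, 2, 3, 4, 5],
    "bytes_sent": [2, 4, 6, 8, 10],  # exactly 2 * duration, so redundant
    "hop_limit": [5, 1, 4, 2, 3],
}
weights = {"duration": 0.8, "bytes_sent": 0.6, "hop_limit": 0.3}

selected = select_features(columns, weights)
```

Here `bytes_sent` is discarded despite its fairly high relevance weight, because it carries no information beyond `duration`, while the weakly correlated `hop_limit` survives.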

