20 similar documents found.
1.
2.
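Drug combinations play an increasingly important role in treating complex diseases, particularly cancer. Using the targets of a drug combination as seed nodes, a random walk with restart is run on a drug-protein heterogeneous network; the converged probability distribution serves as the feature vector of the combination, and a gradient boosting decision tree (GBDT) model is trained on these features to predict new drug combinations. Evaluation on a standard drug-combination dataset shows that the method outperforms seven other typical classifiers and conventional boosting algorithms, and that the heterogeneous-network features markedly improve every classifier, raising the AUC from 0.528 to 0.909.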
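The random walk with restart described in this abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the toy network, the node names, and the restart probability of 0.3 are all assumptions, and the graph is assumed to have no isolated nodes.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probabilities of a walk that jumps
    back to the seed nodes with probability `restart` at every step.

    adj   : dict mapping each node to the list of its neighbours
    seeds : nodes the walk restarts from (here: targets of a drug combination)
    """
    seed_set = set(seeds)
    nodes = sorted(adj)
    for node in nodes:
        if not adj[node]:
            raise ValueError(f"isolated node not supported in this sketch: {node}")
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass evenly over the neighbours of u.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy drug-protein network (undirected, stored as adjacency lists).
network = {
    "drugA": ["P1", "P2"],
    "drugB": ["P2"],
    "P1": ["drugA", "P3"],
    "P2": ["drugA", "drugB", "P3"],
    "P3": ["P1", "P2"],
}
features = random_walk_with_restart(network, seeds=["P1", "P2"])
```

The resulting dictionary, read in a fixed node order, would play the role of the feature vector that is passed to the classifier.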
3.
Insider trading is a criminal activity in which nonpublic information is used to trade in the stock market. In recent years, it has become the major illegal activity in China's stock market. In this study, an approach combining GBDT (Gradient Boosting Decision Tree) and DE (Differential Evolution) is proposed to identify insider trading activities from data on relevant indicators. First, insider trading samples from 2007 to 2017 and corresponding non-insider trading samples are collected. Next, a GBDT is trained, with its initial parameters optimized by DE. Finally, out-of-sample cases are classified by the trained GBDT–DE model and its performance is evaluated. The experimental results show that the proposed method performs best at identifying insider trading with a time window of 90 days, indicating that the indicators computed over a 90-day window are relatively more useful. Additionally, under all three time window lengths, the relative importance results show that several indicators are consistently crucial for insider trading identification. Furthermore, the proposed approach significantly outperforms other benchmark methods, demonstrating that it could be applied as an intelligent system to improve identification accuracy and efficiency for insider trading regulation in China's stock market.
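To make the parameter search concrete, the following is a minimal sketch of differential evolution (the common rand/1/bin variant, which is an assumption; the abstract does not say which variant was used). The objective `stand_in_validation_error` is a hypothetical placeholder for "train a GBDT with these parameters and return its validation error"; the parameter names and bounds are likewise illustrative.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.8, cr=0.9, seed=0):
    """Minimise `objective` over the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(hi, max(lo, value))  # keep inside the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def stand_in_validation_error(params):
    """Hypothetical smooth surrogate for a GBDT's validation error."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 4.0) / 10.0) ** 2


# Assumed search space: learning rate in [0.01, 0.5], tree depth in [1, 10].
search_space = [(0.01, 0.5), (1.0, 10.0)]
best_params, best_error = differential_evolution(stand_in_validation_error,
                                                 search_space)
```

In a real setting the depth would be rounded to an integer before training, and each objective call would be far more expensive than this surrogate.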
4.
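To cope with the large volume and complexity of thermal power plant data, a Spark-based parallel regression algorithm is adopted, addressing the long running time and low accuracy of traditional regression models for predicting power-supply coal consumption. This paper uses one year of operating data from a power plant, collected on a big-data platform. The data are cleaned and preprocessed (outlier screening, missing-value imputation), the operating conditions are tested for steadiness, and healthy data under steady conditions are selected for analysis. Grey relational analysis is then used to select the 12 features with the highest relational degree for predicting power-supply coal consumption. After parameter tuning, Spark-based parallel regression models using random forests and gradient boosting decision trees are built, and the experimental results are compared and summarized. The results show that both the random forest and the gradient boosting decision tree regression models predict power-supply coal consumption well, but the random forest regression model is relatively more accurate.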
5.
6.
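Federated learning addresses the data-silo problem in machine learning. However, the parties' datasets may differ considerably in sample space and feature space, which lowers the prediction accuracy of the federated model. To address this, a federated learning method based on differentially private knowledge transfer is proposed. The method uses boundary-expanding locality-sensitive hashing to compute the similarity between instances held by different parties and weights instances by similarity during training, achieving instance-based federated transfer learning. Throughout this process the instances themselves need not be revealed to other parties, which prevents direct privacy leakage. To reduce indirect leakage during knowledge transfer, a differential privacy mechanism is introduced that perturbs the gradient data transmitted between parties, protecting the privacy of the knowledge transfer process. Theoretical analysis shows that the knowledge transfer process satisfies ε-differential privacy. The method is implemented on the XGBoost gradient boosting tree model, and experiments show that, compared with no knowledge transfer, it reduces the federated model's test error by more than 6% on average.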
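The gradient perturbation step can be illustrated with the Laplace mechanism. The abstract does not state which mechanism or which clipping rule the authors use, so everything below is an assumption: each per-instance gradient is clipped to `[-clip_bound, clip_bound]`, so adding or removing one instance changes the sum by at most `clip_bound`, and Laplace noise with scale `clip_bound / epsilon` then gives ε-differential privacy for that single released sum.

```python
import random


def clip(value, bound):
    """Clamp a scalar gradient to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def private_gradient_sum(gradients, clip_bound, epsilon, rng=random):
    """Release a clipped gradient sum perturbed with Laplace noise.

    The sensitivity of the clipped sum to one instance is `clip_bound`
    (assumed add/remove-one-instance neighbouring datasets).
    """
    if clip_bound <= 0 or epsilon <= 0:
        raise ValueError("clip_bound and epsilon must be positive")
    clipped_sum = sum(clip(g, clip_bound) for g in gradients)
    scale = clip_bound / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped_sum + noise


# Hypothetical per-instance gradients held by one party.
noisy = private_gradient_sum([0.5, -2.0, 3.0], clip_bound=1.0, epsilon=0.5,
                             rng=random.Random(0))
```

A smaller `epsilon` means more noise and stronger privacy; repeated releases consume privacy budget, which this sketch does not track.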
7.
8.
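Decision Tree Classification of Network Traffic  Cited by: 1 (self-citations: 1, citations by others: 1)
Application identification and traffic classification are prerequisites for network management, security, and research. With the rapid growth of networks and the steady emergence of new applications, classification techniques based on transport-layer port numbers and deep packet inspection can no longer meet demand. This paper verifies that the statistical characteristics of network traffic can effectively distinguish applications, proposes a supervised traffic classification method based on the C4.5 decision tree classifier, and discusses two improvements: boosting and feature selection. Experiments show that the C4.5 classifier has moderate training complexity, high accuracy, and fast classification. Boosting further improves accuracy at the cost of substantially longer training time and slightly slower classification, while feature selection speeds up classification at a slight cost in accuracy.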
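C4.5 chooses split attributes by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes it for a categorical attribute; the flow features and application labels are made-up illustrative data, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalises attributes with many values
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical flows: a discretised mean packet size and the application label.
packet_size = ["small", "small", "large", "large"]
application = ["dns", "dns", "web", "web"]
ratio = gain_ratio(packet_size, application)
```

C4.5 also handles continuous attributes by searching for a threshold; that step is omitted here.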
9.
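Existing encrypted traffic detection techniques lack privacy protection for data and models, which not only violates privacy laws and regulations but can also cause serious leakage of sensitive information. This work studies an encrypted traffic detection model based on the gradient boosting decision tree (GBDT) algorithm and, combining it with differential privacy, designs and implements a privacy-preserving encrypted traffic detection system. Malicious traffic from DDoS attacks and port scans is detected on the CICIDS2017 dataset, and the system's performance...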
10.
12.
In one-class classification, the low variance directions in the training data carry crucial information to build a good model of the target class. Boundary-based methods like One-Class Support Vector Machine (OSVM) preferentially separate the data from outliers along the large variance directions. On the other hand, retaining only the low variance directions can result in sacrificing some initial properties of the original data and is not desirable, especially in the case of limited training samples. This paper introduces a Covariance-guided One-Class Support Vector Machine (COSVM) classification method which emphasizes the low variance projectional directions of the training data without compromising any important characteristics. COSVM improves upon the OSVM method by controlling the direction of the separating hyperplane through incorporation of the estimated covariance matrix from the training data. Our proposed method is a convex optimization problem resulting in one global optimum solution which can be solved efficiently with the help of existing numerical methods. The method also keeps the principal structure of the OSVM method intact, and can be implemented easily with the existing OSVM libraries. Comparative experimental results with contemporary one-class classifiers on numerous artificial and benchmark datasets demonstrate that our method results in significantly better classification performance.
13.
MultiBoosting: A Technique for Combining Boosting and Wagging  Cited by: 12 (self-citations: 0, citations by others: 12)
MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, MultiBoosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution.
14.
15.
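A gravity-based outlier detection algorithm is proposed. It mines the outliers hidden in a dataset by jointly considering how factors such as the density around each data object and the distances between data objects affect the definition of an outlier. The concepts and techniques underlying the algorithm are presented, the algorithm is described in detail, and experiments are run on real data. The experiments show that the algorithm scales well with the dimensionality of the dataset, identifies outliers effectively, and also reflects each object's degree of isolation within the dataset.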
16.
Minimizing False Positives of a Decision Tree Classifier for Intrusion Detection on the Internet  Cited by: 1 (self-citations: 0, citations by others: 1)
Satoru Ohta, Ryosuke Kurebayashi, Kiyoshi Kobayashi. Journal of Network and Systems Management, 2008, 16(4): 399-419
Machine learning or data mining technologies are often used in network intrusion detection systems. An intrusion detection
system based on machine learning utilizes a classifier to infer the current state from the observed traffic attributes. The
problem with learning-based intrusion detection is that it leads to false positives and so incurs unnecessary additional operation
costs. This paper investigates a method to decrease the false positives generated by an intrusion detection system that employs
a decision tree as its classifier. The paper first points out that the information-gain criterion used in previous studies
to select the attributes in the tree-constructing algorithm is not effective in achieving low false positive rates. Instead
of the information-gain criterion, this paper proposes a new function that evaluates the goodness of an attribute by considering
the significance of error types. The proposed function can successfully choose an attribute that suppresses false positives
from the given attribute set and the effectiveness of using it is confirmed experimentally. This paper also examines the more
trivial leaf rewriting approach to benchmark the proposed method. The comparison shows that the proposed attribute evaluation
function yields better solutions than the leaf rewriting approach.
17.
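Application of an Improved Decision Tree Algorithm to Acquiring Potential Customers  Cited by: 1 (self-citations: 0, citations by others: 1)
In corporate marketing, targeting potential customers can save considerable expense and increase profit. This paper applies an improved decision tree algorithm that incorporates boosting to mine and predict potential customer groups, and proposes a reasonable, practical data mining workflow for acquiring potential customers to guide marketing decisions. The experimental results show that the method has good theoretical and practical value.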
18.
19.
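The deep decision tree transfer learning boosting method (DTrBoost) can effectively transfer knowledge from a single supervised source domain to a target domain, but it cannot handle unsupervised transfer with multiple source domains. To address this, an unsupervised transfer learning boosting method with optimized weights under multi-source-domain distributions is proposed. The main idea is to compute a KL value for each source domain from its distribution relative to the target domain, select by comparison a suitable number of samples from the different source domains to train classifiers, and assign pseudo-labels to the target-domain samples. Finally, learning weights are assigned according to each source domain's KL distance, and the labeled source samples are trained together with the pseudo-labeled target domain in an ensemble to obtain the final result. Comparative experiments show that the proposed algorithm achieves better classification accuracy and adapts across datasets: the classification error rate falls by 2.4% on average, and by more than 6% on the marketing dataset, where it performs best.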
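The idea of weighting source domains by their KL distance to the target can be sketched as follows. The abstract gives neither the direction of the divergence nor the weighting formula, so both are assumptions here: the sketch uses KL(target || source) over discrete histograms and hypothetical weights proportional to exp(-KL), so that closer sources receive larger weights.

```python
from math import exp, log


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def source_weights(source_hists, target_hist):
    """Map each source domain to a normalised weight that shrinks with KL distance.

    The exp(-KL) rule is an illustrative choice, not the paper's formula.
    """
    raw = {name: exp(-kl_divergence(target_hist, hist))
           for name, hist in source_hists.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical feature histograms for a target domain and two source domains.
target = [0.5, 0.5]
sources = {"source_A": [0.5, 0.5], "source_B": [0.9, 0.1]}
weights = source_weights(sources, target)
```

Here `source_A` matches the target distribution exactly and therefore receives the larger weight.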
20.
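Short-term traffic flow prediction is an important part of traffic flow modeling and plays an important role in managing and controlling urban road traffic. However, the accuracy of common time-series models (such as ARIMA) and random forest (RF) models is limited by the residuals produced when building the model and by the input variables. To address this, a short-term traffic prediction model based on gradient boosting regression trees is proposed to predict traffic speed. First, the model adopts the Huber loss function to handle model residuals. Second, the input variables account for the spatial and temporal correlations between the predicted road section and its adjacent sections. During training, the model corrects its residuals by continually adjusting the weights of the weak learners, improving prediction accuracy. Experiments on traffic speed data from an urban expressway, comparing the model with ARIMA and random forest using metrics such as MSE and MAPE, show that the proposed model is the most accurate, confirming its effectiveness for short-term traffic flow prediction.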
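The Huber loss mentioned in this abstract is quadratic for small residuals and linear for large ones, which limits the influence of outlying speed readings. The sketch below shows the loss and the negative gradient that each new tree in a gradient boosting model would be fitted to; the threshold `delta=1.0` and the sample values are assumptions for illustration.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss of a single residual r = y_true - y_pred."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2               # quadratic near zero
    return delta * (magnitude - 0.5 * delta)     # linear in the tails


def huber_negative_gradient(y_true, y_pred, delta=1.0):
    """Negative gradient of the Huber loss with respect to the prediction.

    This is the pseudo-residual a boosting stage is trained on: the raw
    residual when it is small, and a clipped value of +/- delta otherwise.
    """
    residual = y_true - y_pred
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta


# Hypothetical observed and predicted speeds (km/h) for three time steps.
observed = [62.0, 58.5, 20.0]
predicted = [61.5, 59.0, 55.0]
pseudo_residuals = [huber_negative_gradient(y, f) for y, f in zip(observed, predicted)]
```

The third reading is an outlier, so its pseudo-residual is clipped to `-delta` instead of dominating the fit as it would under squared error.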