1.
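To improve building energy consumption prediction, an ensemble regression model (RF-GBDT) based on a random forest feature selection algorithm is proposed. The random forest feature selection algorithm processes the original dataset to produce an optimal feature subset, and the gradient boosting decision tree algorithm combines six basic machine learning algorithms into an ensemble regression model that takes the optimal feature subset as its input. Using RMSE and R² as evaluation metrics, the ensemble model's predictions are compared with those of traditional ensemble models and of single machine learning algorithms. The experimental results show that the RF-GBDT ensemble predicts substantially better than any single algorithm.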
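The two evaluation metrics named in this abstract can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions of RMSE and R²; the function names are chosen here for illustration and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical energy readings and model predictions, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted), r2_score(observed, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is how the comparison between the ensemble and the single models would be read.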
2.
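Regression models such as linear regression, SVR, and most multivariate regression trees cannot directly use categorical attributes in regression analysis. To address this, a decision tree node splitting method that can combine attributes of multiple types is proposed. By defining the center of a sample set on a categorical attribute and the distance from a sample to that center, the method lets categorical attributes take part in sample clustering just as numerical attributes do, which yields a partition of the sample set. Next, for the decision tree produced by this method, the paper also...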
3.
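Off-exchange margin financing is a high-risk financing method, and timely identification and monitoring of suspicious off-exchange margin financing helps protect investors' legitimate rights and the stability of the securities market. To this end, a method for identifying margin-financing accounts based on an improved XGBoost machine learning algorithm is proposed. By analyzing the business logic of off-exchange margin financing, a system of feature indicators strongly related to the identification algorithm is constructed, and recall is adopted as the key metric in view of the characteristics of this behavior. With the constructed identification algorithm...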
4.
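Drug combinations play an increasingly important role in the treatment of complex diseases, especially cancer. A random walk with restart is run on a drug-protein heterogeneous network with the targets of the drug combination as the initial nodes; the converged probability distribution serves as the feature vector of the drug combination, and a gradient boosting decision tree model is trained on these vectors to predict new drug combinations. Performance evaluation on a standard drug combination dataset shows that the method outperforms seven other typical classifiers and traditional boosting algorithms, and that the features derived from the heterogeneous network markedly improve every classifier, with the AUC rising from 0.528 to 0.909.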
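The random walk with restart step can be sketched as follows. This is a minimal illustration on a small undirected graph given as an adjacency matrix, not the paper's heterogeneous drug-protein network; the restart probability of 0.3, the tolerance, and the function name are assumptions made here for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol.

    adj   : square adjacency matrix as a list of lists
    seeds : indices of the initial nodes (here standing in for drug targets)
    """
    n = len(adj)
    # Column-normalize the adjacency matrix so each column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Restart distribution: uniform over the seed nodes.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = random_walk_with_restart(path_graph, seeds=[0])
print(features)
```

The returned vector is the converged visiting probability of every node, which is the kind of vector the abstract describes feeding to the classifier as a feature representation.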
5.
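To address the channel redundancy caused by attaching many electrodes in electromyography (EMG) recognition, a method for selecting the optimal channel combination based on a gradient boosting decision tree (GBDT) model is proposed. First, five commonly used features are manually extracted from the EMG signal of each channel, and a GBDT model is used to generate new implicit features. Second, the original and new features under every channel combination are combined and used to train another GBDT model that predicts the motion recognition rate of each combination. Finally, the optimal channel combination is selected for online control experiments. The experimental results show that the optimal channel combination achieves a high offline recognition rate and high online control accuracy, and enables accurate real-time control of a robotic hand. When the EMG recognition system is used repeatedly for online control experiments, choosing the optimal channel combination reduces the number of electrodes that must be attached and the information redundancy and interference introduced by superfluous channels, which improves the practicality and robustness of the system.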
8.
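Real-time performance is an important issue in augmented reality, and training and recognition with traditional methods often take a great deal of time. An augmented reality system based on fast extraction of image feature points and random tree classification is proposed and implemented. After training on one or a very small number of original images containing the marker, the system quickly and efficiently recognizes the marker in images newly captured by the camera, computes the marker's spatial position coordinates, and effectively composites the real scene with virtual objects. The system greatly reduces the time spent on recognition and compositing, and still performs well under large changes in viewing angle and scale.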
9.
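To improve the accuracy of computer-aided diagnosis (CAD) for breast cancer, a support vector machine method with feature weighting by the Gini index under a random forest model (RFG-SVM) is proposed. The method uses the Gini index under the random forest model to measure the importance of each feature to the classification result, constructs a support vector machine whose kernel function operates on weighted feature vectors, and applies it to breast cancer diagnosis. Theoretical analysis and experimental validation show that the method improves classification performance over the traditional support vector machine (SVM), that its results are competitive with the latest methods, and that it has advantages in medical diagnosis applications.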
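The idea of a kernel over weighted feature vectors can be sketched as below. The abstract does not give the kernel's exact form, so this assumes a Gaussian (RBF) kernel in which each squared feature difference is scaled by a normalized importance weight; the function names and the example importances are hypothetical.

```python
import math


def normalize_importances(importances):
    """Scale non-negative importance scores so that they sum to 1."""
    total = sum(importances)
    return [v / total for v in importances]


def weighted_rbf_kernel(x, z, weights, gamma=1.0):
    """Assumed form: K(x, z) = exp(-gamma * sum_k w_k * (x_k - z_k)^2)."""
    sq_dist = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical Gini-based importances for three features.
weights = normalize_importances([2.0, 1.0, 1.0])
print(weighted_rbf_kernel([0.0, 1.0, 2.0], [1.0, 1.0, 0.0], weights))
```

Under this assumed form, a feature with weight 0 has no influence on the similarity between two samples, while a heavily weighted feature dominates it.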
10.
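The amount of water added to cut tobacco during primary processing plays an important role in processing quality, and many factors affect it. To quantify how strongly each factor influences the moisture of the raw cut tobacco, several tree-based machine learning algorithms are trained on historical raw-tobacco moisture data from the Mianyang Cigarette Factory, and their results are compared. The analysis shows that prediction accuracy differs between models, and that on the available data the extreme gradient boosting tree achieves the highest prediction accuracy. Using extreme gradient...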
11.
José M. Cadenas M. Carmen Garrido Raquel Martínez 《Expert systems with applications》2013,40(16):6241-6252
Today, feature selection is an active research area in machine learning. The main idea of feature selection is to choose a subset of the available features by eliminating features with little or no predictive information, as well as redundant features that are strongly correlated. There are many approaches to feature selection, but most of them work only with crisp data. So far, few approaches can work directly with both crisp and low quality (imprecise and uncertain) data. We therefore propose a new feature selection method that can handle both crisp and low quality data. The proposed approach is based on a Fuzzy Random Forest and integrates filter and wrapper methods into a sequential search procedure that improves the classification accuracy of the selected features. The approach consists of the following main steps: (1) scaling and discretization of the feature set, and feature pre-selection using the discretization process (filter); (2) ranking of the pre-selected features using the Fuzzy Decision Trees of a Fuzzy Random Forest ensemble; and (3) wrapper feature selection using a Fuzzy Random Forest ensemble based on cross-validation. The efficiency and effectiveness of the approach are demonstrated through several experiments on both high dimensional and low quality datasets. The approach performs well (in classification accuracy and in the number of features selected) and behaves well both on high dimensional datasets (microarray datasets) and on low quality datasets.
12.
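This paper addresses the prediction of the resident population of hollow villages under the rural revitalization strategy, in order to support national decision-making on rural development, construction, and governance. It applies the GBDT regression algorithm to electricity, meteorological, and other data to predict the resident population of hollow villages. Feature importance analysis selects the five features most strongly correlated with the resident population, and model training and prediction on these features produce the population estimates. After data preprocessing, the paper uses 5-fold cross-validation, dividing the dataset into training, cross-validation, and prediction sets in a 3:1:1 ratio. Once the predictions are obtained, their accuracy is checked using mean squared error and R², together with visualization. The validation results show that the GBDT-regression-based algorithm predicts the resident population of hollow villages well.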
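One way to read the 3:1:1 division alongside 5-fold cross-validation is to cut the data into five folds and, in each round, use three folds for training, one for validation, and one for prediction. The abstract does not state how the folds are assigned, so the rotation scheme and the function name below are assumptions made for illustration.

```python
def rotating_splits(n_samples, n_folds=5):
    """Yield (train, validation, test) index lists in a 3:1:1 fold ratio.

    Fold k is held out for prediction, fold k + 1 (wrapping around) for
    validation, and the remaining three folds are used for training.
    """
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        val_k = (k + 1) % n_folds
        test = folds[k]
        val = folds[val_k]
        train = sorted(i for j, fold in enumerate(folds)
                       if j not in (k, val_k) for i in fold)
        yield train, val, test


for train, val, test in rotating_splits(10):
    print(len(train), len(val), len(test))
```

With this scheme every sample is used for prediction exactly once across the five rounds, and the three subsets never overlap within a round.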
13.
Deep ensemble learning with non-equivalent costs of fault severities for rolling bearing diagnostics
Effective fault diagnostics on rolling bearings is vital to ensuring safe and reliable operations of industrial equipment. In recent years, enabled by Machine Learning (ML) algorithms, data-based fault diagnostics approaches have been steadily developed as promising solutions to support industries. However, each ML algorithm exhibits some shortcomings that limit its applicability in practice. To tackle this issue, in this paper, Deep Learning (DL) and Ensemble Learning (EL) algorithms are integrated as a novel Deep Ensemble Learning (DEL) approach. In the DEL approach, the training requirements for the DL algorithm are alleviated, and the accuracy of fault condition classification is enhanced by the EL algorithm. The DEL approach comprises the following critical steps: (i) Convolutional Neural Networks (CNNs) are constructed to pre-process vibration signals of rolling bearings and extract fault-related preliminary features efficiently; (ii) decision trees are designed to optimise the extracted features by quantifying the importance of their contribution to the faults of rolling bearings; (iii) the EL algorithm, which is enabled by a Gradient Boosting Decision Tree (GBDT) algorithm and a Non-equivalent Cost Logistic Regression (NCLR) algorithm, is developed for fault condition classification with optimised non-equivalent costs assigned to different fault severities. Case studies demonstrate that the DEL approach is superior to some other comparative ML approaches. The industrial applicability of the DEL approach is showcased via the case studies and analyses.
14.
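As network application services diversify and traffic encryption technology continues to develop, encrypted traffic identification has become a major challenge in network security. Traditional traffic identification techniques such as deep packet inspection cannot identify encrypted traffic effectively, whereas identification techniques based on machine learning perform well. This paper therefore proposes an encrypted traffic classification model that combines the gradient boosting decision tree (GBDT) algorithm with the logistic regression (LR) algorithm, tunes hyperparameters with Bayesian optimization (BO), and uses time-related flow features to distinguish ordinary encrypted traffic from VPN-encrypted traffic. The model achieves an overall identification accuracy above 90% and identifies traffic better than other commonly used classification models.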
16.
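As living standards rise, people have surplus funds to invest once basic needs are met, and more and more of them are turning to the stock market, which provides capital for its development. In a complex stock market, however, finding the best stocks has become an urgent problem, one that concerns not only investors but also researchers in stock prediction. A grid search algorithm is used to optimize the parameters of the XGBoost model, yielding the GS-XGBoost financial prediction model, which is then applied to short-term stock prediction. The experimental data are the daily closing prices of Ping An of China, China State Construction, CRRC, iFlytek, and Sany Heavy Industry from April 2005 to December 28, 2018. In the experimental comparison, the GS-XGBoost model gives better predictions than the original XGBoost model, the GBDT model, and the SVM model on all three evaluation metrics: MSE, RMSE, and MAE. This verifies that the GS-XGBoost financial prediction model fits better in short-term stock prediction.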
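Grid search itself is simple to sketch: evaluate every combination of candidate parameter values and keep the one with the lowest error. The snippet below is a generic, library-free illustration; the parameter names and the `toy_error` function are hypothetical stand-ins for an XGBoost model's validation MSE, which the abstract does not detail.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score), minimizing score_fn over the full grid.

    param_grid : dict mapping parameter name -> list of candidate values
    score_fn   : callable taking a params dict and returning an error to minimize
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_error(params):
    """Hypothetical error surface standing in for a model's validation MSE."""
    return (params["max_depth"] - 4) ** 2 + (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_error)
print(best_params, best_score)
```

In practice `score_fn` would train the model with the given parameters and return a validation error such as MSE, so the cost of the search grows with the product of the candidate list lengths.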
17.
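Decision tree algorithms build the tree recursively, which makes training inefficient, and an over-split decision tree may overfit. This paper therefore proposes a model decision tree algorithm. An incomplete decision tree is first grown recursively on the training set using the Gini index; a simple classification model then classifies the impure pseudo-leaf nodes (nodes that are not leaf nodes and whose samples do not all belong to the same class) to produce the final decision tree. Compared with the original decision tree algorithm, the resulting model decision tree trains more efficiently with little or no loss of accuracy. Experiments on standard datasets show that the proposed model decision tree is clearly faster than the decision tree algorithm and has some resistance to overfitting.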
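The Gini index used to grow the tree, and the purity check that separates pure nodes from impure pseudo-leaf nodes, can be sketched as follows. These are the standard definitions of Gini impurity and the weighted impurity of a binary split; the function names are illustrative and not taken from the paper.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left_labels, right_labels):
    """Size-weighted Gini impurity of a binary split; lower is better."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n * gini_impurity(left_labels)
            + len(right_labels) / n * gini_impurity(right_labels))


def is_impure(labels):
    """A node is impure when its samples span more than one class."""
    return gini_impurity(labels) > 0.0


print(gini_impurity(["a", "a", "b", "b"]))
print(split_gini(["a", "a"], ["b", "b"]))
```

A node where `is_impure` is true but tree growth has been stopped early is the kind of node that, according to the abstract, is handed to a simple classifier instead of being split further.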
18.
Developed by the United States Green Building Council, Leadership in Energy and Environmental Design (LEED) is a credit-based rating system that provides third-party verification for green buildings. Selection of target credits is important yet challenging for LEED managers because various factors such as target certification grade level and building features need to be considered on a case-by-case basis. Local climatic factors could affect the selection of green building technologies and hence the target credits, but currently there is no research suggesting target LEED credits based on climatic factors. This paper presents a methodology for the selection of target LEED credits based on project information and climatic factors. This study focuses on projects certified with LEED for Existing Buildings (LEED-EB). Information on 912 projects and their surrounding climatic circumstances was collected and studied. 55 classification models for 47 LEED-EB credits were then constructed and optimized using three classification algorithms: Random Forests, AdaBoost Decision Tree, and Support Vector Machine (SVM). The results showed that Random Forests performed the best in most of the 55 classification models. With a combination of the three algorithms, the trained classification models were used to develop a web-based decision support system for LEED credit selection. The system was tested using 20 recently certified LEED projects, and the results showed that our system had an accuracy of 82.56%.
20.
Solving the feature selection problem is considered an important issue when addressing data from real applications that contain a large number of features. However, not all of these features are important; therefore, the redundant features must be removed because they affect the accuracy of the data representation and introduce time complexity into the analysis of these data. For these reasons, the feature selection problem is considered an NP-complete nonlinearly constrained optimization problem. The rough set (RS) and neighborhood rough set (NRS) are the most powerful methods used to solve the feature selection problem; however, both approaches suffer from high time complexity. To avoid these limitations, we combined the RS and NRS with a new metaheuristic algorithm called the runner-root algorithm (RRA). The spirit of the RRA originated from real-life plants called running plants, which have roots and runners that spread the plants in search of minerals and water resources through their root and runner development. To validate the proposed algorithm, several UCI Machine Learning Repository datasets are used to compute the performance of our algorithm employing two effective classifiers, the random forest and the K-nearest neighbor, in addition to some other measures for the performance evaluation. The experimental results illustrate that the proposed algorithm is superior to the state-of-the-art metaheuristic algorithms in terms of the performance measures. Additionally, the NRS increases the performance of the proposed method more than the RS as an objective function.