Similar Literature
 Found 20 similar articles (search time: 15 ms)
1.
Gestational Diabetes Mellitus (GDM) is a condition characterized by some degree of glucose intolerance with onset or first recognition during pregnancy. Over the past few decades, numerous investigations have addressed the early identification of GDM. Machine Learning (ML) methods have proven to be efficient prediction techniques with a significant advantage over statistical models. In this view, the paper presents an ensemble of ML-based GDM prediction and classification models. The presented model involves three steps: preprocessing, classification, and ensemble voting. First, the input medical data are preprocessed in four stages, namely format conversion, class labeling, replacement of missing values, and normalization. Next, four ML models, Logistic Regression (LR), k-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF), are used for classification. Finally, the RF, LR, KNN, and SVM classifiers are integrated through a voting classifier to perform the final classification. To investigate the proficiency of the proposed model, the authors conducted an extensive set of simulations and examined the results from several aspects. In particular, the ensemble model outperformed the classical ML models, with a precision of 94%, recall of 94%, accuracy of 94.24%, and F-score of 94%.
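The preprocessing and voting steps described in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say how missing values are replaced or which voting rule is used, so mean imputation and hard (majority) voting are assumed here, and all function names are hypothetical.

```python
from collections import Counter


def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def majority_vote(predictions):
    """Hard voting: one label list per classifier in, one label per sample out.

    Ties go to the label that appears first among the classifiers' votes.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical glucose readings with one missing value.
glucose = min_max_normalize(fill_missing_with_mean([90, None, 150, 120]))

# Hypothetical per-sample labels from LR, KNN, SVM, and RF.
final = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]])
```

With four voters a tie is possible, which is one reason a soft-voting rule or an odd number of classifiers is often preferred in practice.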

2.
Rational operating parameters of a TBM (Tunnel Boring Machine) are the key to efficient and safe tunnel construction. Machine learning (ML) has become the main method for predicting these parameters. Grid search and optimization algorithms such as Particle Swarm Optimization (PSO) are often used to find the hyperparameters of ML models, but they suffer from excessive run time and low accuracy. To construct ML models efficiently and improve the accuracy of the prediction models, a BPSO (Beetle antennae search Particle Swarm Optimization) algorithm is proposed. Based on the PSO algorithm, the concept of BAS (Beetle Antennae Search) is integrated into the update process of each individual particle, which improves the random search capability. The convergence of the BPSO algorithm is discussed in terms of inhomogeneous recursive equations and characteristic roots. Then, based on the proposed BPSO prototype, a hybrid ML model, BPSO-XGBoost (eXtreme Gradient Boosting), is proposed. The model is applied to the Hangzhou Central Park tunnel project to predict the rotational speed of the screw conveyor. Finally, the model is compared with existing methods. The experimental results show that the BPSO-based model outperforms other traditional ML methods. BPSO-XGBoost predicts the speed more accurately than PSO-XGBoost and BPSO-RandomForest. It is also verified that the hyperparameters optimized by BPSO are better than those optimized by the original PSO. The overall prediction performance of the models ranks as follows: BPSO-XGBoost > PSO-XGBoost > BPSO-RF > PSO-RF. The models have good engineering application value.
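For orientation, the baseline that BPSO modifies is the standard PSO velocity and position update. The sketch below shows plain PSO only; it does not implement the beetle-antennae step, whose details are not given in the abstract. The inertia and acceleration constants, the search bound, and the sphere test function are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimize `objective` with plain PSO (no BAS step)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimize(sphere, dim=2)
```

In a hyperparameter search, `objective` would instead train a model with the candidate settings and return its validation error.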

3.
Cardiotocography (CTG) represents the fetus's health inside the womb during labor. However, assessment of its readings can be a highly subjective process that depends on the expertise of the obstetrician. Digital signals from fetal monitors provide parameters such as fetal heart rate, contractions, and accelerations. Objective: This paper aims to classify CTG readings in an imbalanced set of healthy, suspected, and pathological fetus readings. Method: We perform two sets of experiments. First, we employ five classifiers, Random Forest (RF), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM), without over-sampling to classify CTG readings into three categories: healthy, suspected, and pathological. Second, we employ an ensemble of the above classifiers with over-sampling. We use a random over-sampling technique to balance the CTG records used to train the ensemble models. We use 3602 CTG readings to train the ensemble classifiers and 1201 records to evaluate them. The outputs of these classifiers are then fed into a soft voting classifier to obtain the most accurate results. Results: Each classifier is evaluated on accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic curve (AUROC). The results reveal that the XGBoost, LGBM, and CatBoost classifiers yielded 99% accuracy. Conclusion: Using ensemble classifiers over a balanced CTG dataset improves detection accuracy compared with previous studies and with our first experiment. A soft voting classifier then compensates for the weaknesses of any individual classifier to yield superior performance for the overall model.
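The two ingredients named in this abstract, random over-sampling and soft voting, can be illustrated with a small standard-library sketch. The function names and toy data are hypothetical, and the abstract does not specify implementation details such as the sampling seed or how probabilities are weighted, so an unweighted average is assumed.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def soft_vote(prob_rows):
    """Average the class-probability vectors that several classifiers
    produce for one sample and return the index of the most probable class."""
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / len(prob_rows)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])


# Toy imbalanced data: 4 healthy, 1 suspected, 1 pathological.
X = [[1], [2], [3], [4], [5], [6]]
y = ["healthy"] * 4 + ["suspected", "pathological"]
X_bal, y_bal = random_oversample(X, y)

# Probabilities for (healthy, suspected, pathological) from three classifiers.
winner = soft_vote([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]])
```

Over-sampling should be applied to the training split only; duplicating records before the train/test split would leak copies of training samples into the evaluation set.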

4.
Heart failure is now widespread throughout the world. Heart disease affects approximately 48% of the population. The disease is expensive and difficult to cure. This research paper presents machine learning models to predict heart failure. The fundamental idea is to compare the accuracy of various Machine Learning (ML) algorithms and to use boosting algorithms to improve prediction accuracy. Supervised algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR) are considered to achieve the best results. Boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost, along with Artificial Neural Networks (ANN), are also used to improve prediction. This research also focuses on data visualization to identify patterns, trends, and outliers in a massive data set. Python and scikit-learn are used for ML. TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy among the classifiers, 95%. KNN obtained the second-highest accuracy, 93.33%. XGBoost had a satisfactory accuracy of 91.67%; SVM, CatBoost, and ANN had an accuracy of 90%; and LR had 88.33% accuracy.

5.
COVID-19 has made predicting the growth of a pandemic critical for determining how to combat the disease and track its progression. COVID-19 data form a time-series dataset that can be projected using different methodologies. This work therefore aims to gauge the spread and severity of the outbreak over time. Furthermore, data analytics and Machine Learning (ML) techniques are employed to gain a broader understanding of virus infections. We have simulated, adjusted, and fitted several statistical time-series forecasting models, linear ML models, and nonlinear ML models. Examples of these models are Logistic Regression, Lasso, Ridge, ElasticNet, Huber Regressor, Lasso Lars, Passive Aggressive Regressor, K-Neighbors Regressor, Decision Tree Regressor, Extra Trees Regressor, Support Vector Regression (SVR), AdaBoost Regressor, Random Forest Regressor, Bagging Regressor, AutoRegression, MovingAverage, Gradient Boosting Regressor, Autoregressive Moving Average (ARMA), Auto-Regressive Integrated Moving Average (ARIMA), SimpleExpSmoothing, Exponential Smoothing, Holt-Winters, Simple Moving Average, Weighted Moving Average, Croston, and naive Bayes. Furthermore, the suggested methodology includes the development and evaluation of ensemble models built on top of the best-performing statistical and ML-based prediction methods. A third stage in the proposed system examines three different implementations to determine which model delivers the best performance. The best method is then used for future forecasts, yielding the most accurate and dependable predictions.
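Two of the simplest forecasters in the list above, the simple moving average and simple exponential smoothing, can be written directly. This is a minimal sketch with hypothetical function names and toy case counts; the window length and smoothing factor are assumptions, not values from the paper.

```python
def simple_moving_average(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window


def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * value + (1 - alpha) * level.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


daily_cases = [10, 12, 14, 16]  # toy data
sma_forecast = simple_moving_average(daily_cases, window=2)
ses_forecast = simple_exp_smoothing(daily_cases, alpha=0.5)
```

Both forecasts lag behind a steadily rising series, which is why trend-aware methods such as Holt-Winters or ARIMA also appear among the candidates.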

6.
Detection of credit/debit card fraud, loan default, and similar cases is achievable with Machine Learning (ML) algorithms, which can learn from previous fraud trends or historical data and spot the same patterns in current or future transactions. In almost all datasets, fraudulent cases are scarce compared with non-fraudulent observations, which makes fraudulent transactions difficult to detect. The most effective way to prevent loan default is to identify non-performing loans as early as possible. Machine learning algorithms are emerging as adept at handling such data given sufficient computing power. In this paper, the performance of different machine learning algorithms, namely Decision Tree, Random Forest, linear regression, and Gradient Boosting, is compared for the detection and prediction of fraud cases using instances of loan fraud. Model performance is further assessed with the confusion matrix and with accuracy, precision, recall, and F1-score, along with Receiver Operating Characteristic (ROC) curves.
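The metrics listed in this abstract all follow from the four cells of the binary confusion matrix. The sketch below, with a hypothetical function name and toy labels (1 = fraud), shows why accuracy alone can mislead when fraud cases are scarce.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: 8 legitimate loans, 2 fraudulent; the model misses one fraud case.
y_true = [0] * 8 + [1, 1]
y_pred = [0] * 8 + [1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Here accuracy is 0.9 even though half of the fraud cases are missed (recall 0.5), which is why recall and F1 are reported alongside accuracy on imbalanced data.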

7.
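Outlier detection tasks usually lack labeled data, and outliers make up only a small fraction of the whole dataset. Compared with other data mining tasks, outlier detection is therefore more difficult, and no single algorithm suits all scenarios. Combining diverse model ensembles with the idea of active learning, an active-learning-based ensemble outlier detection method, OMAL (Outlier Mining based on Active Learning), is proposed. Under the guidance of the active learning framework, and based on a comparative analysis of various base learners, three unsupervised models (statistics-based, similarity-based, and subspace-partition-based) are selected as base learners. The data that the base learners judge to lie on the boundary between outlier and normal are merged and presented to human experts for labeling, so as to maximize the information gained from expert feedback. Samples are then drawn from the labeled dataset and from the dataset produced by voting among the base learners, a supervised binary classification model is trained with GBM (Gradient Boosting Machine), and this model is applied to the full dataset to obtain the final mining result. Experiments show that the proposed method achieves a clear improvement in AUC, runs efficiently, and has good practical value.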

8.
Accurate diagnosis of Lung Cancer Disease (LCD) is essential for providing timely treatment to lung cancer patients. Artificial Neural Networks (ANNs) are a Machine Learning (ML) technique used on both large-scale and small datasets. In this paper, an ensemble of Weight Optimized Neural Network with Maximum Likelihood Boosting (WONN-MLB) for LCD in big data is analyzed. The proposed method is split into two stages: feature selection and ensemble classification. In the first stage, the essential attributes are selected with an integrated Newton–Raphson Maximum Likelihood and Minimum Redundancy (MLMR) preprocessing model to minimize classification time. In the second stage, a Boosted Weighted Optimized Neural Network Ensemble Classification algorithm is applied to classify patients using the selected attributes, which improves diagnostic accuracy and minimizes the false positive rate. Experimental results demonstrate that the proposed approach achieves a better false positive rate, higher prediction accuracy, and reduced delay in comparison with conventional techniques.

9.
Accurate prediction of electricity consumption is essential for giving decision-makers actionable insights into the volume of and potential trends in future energy consumption, so that resources can be managed efficiently. A single model might not be sufficient to handle the linear and non-linear problems that occur in electricity consumption prediction. Moreover, such models cannot be applied in practice because they are either not interpretable or generalize poorly. In this paper, a stacking ensemble model for short-term electricity consumption is proposed. We experimented with machine learning and deep learning models such as Random Forests, Long Short-Term Memory, Deep Neural Networks, and Evolutionary Trees as base models. Based on the experimental observations, two different ensemble models are proposed, in which the predictions of the base models are combined using Gradient Boosting and Extreme Gradient Boosting (XGB). The proposed ensemble models were tested on a standard dataset that contains around 500,000 electricity consumption values, measured at periodic intervals over a span of 9 years. Experimental validation revealed that the proposed ensemble model built on XGB reduces the training time of the second layer of the ensemble by a factor of close to 10 compared with the state of the art, and is also more accurate. An average reduction of approximately 39% was observed in the root mean square error (RMSE).

10.
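Bian Lingzhi (卞凌志), Wang Zhijie (王直杰). Journal of Computer Applications (《计算机应用》), 2021, 41(9): 2539-2544
Credit risk is one of the main financial risks faced by commercial banks, and traditional credit scoring methods based on statistical learning cannot make effective use of existing feature learning methods, so their prediction accuracy is low. To solve this problem, an enhanced multi-dimensional multi-grained cascade forest method is proposed for building credit scoring models. Borrowing the idea of residual learning, a multi-dimensional multi-grained cascade residual forest (grcForest) model is built, which greatly increases the number of extracted features. In addition, multi-dimensional multi-grained scanning is used to extract as many features of the raw data as possible, improving the efficiency of feature extraction. The experimental results of each model are evaluated with metrics such as AUC (Area Under Curve) and accuracy, and the proposed model is compared with existing statistical learning and machine learning algorithms on four different credit scoring datasets. The AUC of the proposed model is on average 1.13% higher than that of the Light Gradient Boosting Machine (LightGBM) method and 1.44% higher than that of the eXtreme Gradient Boosting (XGBoost) method. The experimental results show that the proposed model gives the best prediction performance.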
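Since AUC is the headline metric here, a small reference computation may help. The sketch uses the pairwise (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and toy scores are hypothetical, and the quadratic loop is meant for clarity, not for large datasets.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy credit data: 1 = default, 0 = repaid; scores are predicted default risk.
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because AUC depends only on the ranking of scores, it is unaffected by the decision threshold, which makes it convenient for comparing credit scoring models.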

11.
Effective fault diagnostics on rolling bearings is vital to ensuring safe and reliable operations of industrial equipment. In recent years, enabled by Machine Learning (ML) algorithms, data-based fault diagnostics approaches have been steadily developed as promising solutions to support industries. However, each ML algorithm exhibits some shortcomings that limit its applicability in practice. To tackle this issue, in this paper, Deep Learning (DL) and Ensemble Learning (EL) algorithms are integrated into a novel Deep Ensemble Learning (DEL) approach. In the DEL approach, the training requirements for the DL algorithm are alleviated, and the accuracy of fault condition classification is enhanced by the EL algorithm. The DEL approach comprises the following critical steps: (i) Convolutional Neural Networks (CNNs) are constructed to pre-process vibration signals of rolling bearings and extract fault-related preliminary features efficiently; (ii) decision trees are designed to optimise the extracted features by quantifying the importance of their contribution to the faults of rolling bearings; (iii) the EL algorithm, which is enabled by a Gradient Boosting Decision Tree (GBDT) algorithm and a Non-equivalent Cost Logistic Regression (NCLR) algorithm, is developed for fault condition classification with optimised non-equivalent costs assigned to different fault severities. Case studies demonstrate that the DEL approach is superior to some other comparative ML approaches. The industrial applicability of the DEL approach is showcased via the case studies and analyses.

12.
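To address the randomness and uncertainty that short-term power load exhibits over time, a fusion model based on a weighted grey relational projection algorithm and Bagging-Blending is proposed. First, the weighted grey relational projection algorithm is used to analyze the factors influencing power load (such as weather, temperature, humidity, and day type) in order to select historical load features. On this basis, each of the single models SVR (support vector regression), KNN (K-nearest neighbor), GRU (gated recurrent unit), XGBoost (eXtreme Gradient Boosting), LightGBM (light gradient boosting machine), and CatBoost (categorical features gradient boosting) is embedded into the Bagging ensemble algorithm to improve model stability and generalization. At the same time, the Pearson correlation coefficient is used to analyze the correlation among the single models. Then, since the models observe the data space from different angles, a Blending model is used to fuse the weakly correlated models. Finally, the approach is validated on the ISO New England power load data for the New England region. Compared with traditional single models (SVR, GRU) and other fusion models (Bagging-XGBoost, optimally weighted GRU-XGBoost), the proposed fusion model has stronger generalization ability and higher stability and prediction accuracy.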
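The correlation analysis used to decide which models to blend can be illustrated with a plain Pearson coefficient. The sketch assumes the coefficient is computed on each model's prediction errors, which the abstract does not state, and the error values and the helper `least_correlated_pair` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def least_correlated_pair(errors_by_model):
    """Return the pair of model names whose errors have the smallest |r|."""
    return min(
        combinations(errors_by_model, 2),
        key=lambda pair: abs(pearson(errors_by_model[pair[0]],
                                     errors_by_model[pair[1]])),
    )


# Hypothetical prediction errors of three models on four validation points.
errors = {
    "svr": [1, 2, 3, 4],
    "gru": [2, 4, 6, 8.5],
    "xgb": [1, -1, -1, 1],
}
pair = least_correlated_pair(errors)
```

Models whose errors are weakly correlated tend to make different mistakes, so blending them usually reduces variance more than blending near-duplicates.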

13.
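Ye Zhiyu (叶志宇), Feng Aimin (冯爱民), Gao Hang (高航). Journal of Computer Applications (《计算机应用》), 2019, 39(12): 3434-3439
To address the problem that ensemble learning models such as the Light Gradient Boosting Machine (LightGBM) mine the data only once and can neither refine the granularity of data mining automatically nor obtain more of the latent internal correlations in the data through deeper mining, a deep LightGBM ensemble learning model is proposed, consisting of two parts: a sliding window and deepening. First, the sliding window lets the ensemble learning model refine the data mining granularity automatically, so that it mines the latent internal correlations in the data more deeply, and also gives the model a degree of representation learning ability. Then, building on the sliding window, the deepening step further improves the model's representation learning ability. Finally, the dataset is processed with feature engineering. Experiments on the Google Store dataset show that the prediction accuracy of the proposed deep ensemble learning model is 6.16 percentage points higher than that of the original ensemble learning model. The proposed method can refine the data mining granularity automatically and thus obtain more latent information from the dataset; moreover, compared with traditional deep neural networks, the deep LightGBM ensemble learning model is a non-neural-network deep model with fewer parameters and better interpretability.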
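A sliding window over a feature vector, the generic mechanism this abstract relies on, can be sketched as follows. How the paper sizes the window, strides it, or feeds the resulting slices to LightGBM is not described in the abstract, so the function name, window size, and stride are assumptions for illustration only.

```python
def sliding_windows(features, window, stride=1):
    """Split one feature vector into overlapping sub-vectors of length `window`."""
    last_start = len(features) - window
    return [features[i:i + window] for i in range(0, last_start + 1, stride)]


row = [1, 2, 3, 4, 5]                      # one sample with five features
fine = sliding_windows(row, window=3)       # every contiguous triple
coarse = sliding_windows(row, window=3, stride=2)
```

Each sub-vector can then be scored by a model of its own, and the scores concatenated as new features for the next layer, which is the usual way a sliding window adds representation learning to a tree ensemble.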

14.
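Chen Hailong (陈海龙), Yang Chang (杨畅), Du Mei (杜梅), Zhang Yingyu (张颖宇). Journal of Computer Applications (《计算机应用》), 2022, 42(7): 2256-2264
To address the problem that dataset imbalance degrades model predictions in credit risk assessment, a credit risk prediction model is proposed that combines the Borderline Adaptive Synthetic Minority Oversampling TEchnique (BA-SMOTE) with FLLightGBM, an algorithm that uses the Focal Loss function to improve the loss function of LightGBM. First, building on Borderline-SMOTE, an adaptive idea and a new interpolation scheme are introduced so that each borderline minority-class sample generates a different number of new samples and the new samples lie closer to the original minority-class samples, thereby balancing the dataset. Second, the Focal Loss function is used to improve the loss function of the LightGBM algorithm, and the improved algorithm is trained on the new dataset to obtain the final BA-SMOTE-FLLightGBM model, which combines the BA-SMOTE method and the FLLightGBM algorithm. Finally, credit risk prediction is performed on the Lending Club dataset. Experimental results show that, compared with the imbalanced classification algorithms RUSBoost, CUSBoost, KSMOTE-AdaBoost, and AK-SMOTE-Catboost, the proposed model improves clearly on both G-mean and AUC, by 9.0% to 31.3% and 5.0% to 14.1% respectively. These results verify that the proposed model predicts defaults better in credit risk assessment.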
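For reference, the binary Focal Loss in its commonly cited form is $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class. The sketch below evaluates that formula and the G-mean metric. The values gamma = 2 and alpha = 0.25 are common defaults assumed here, not settings taken from the paper, and wiring the loss into LightGBM as a custom objective would additionally need its gradient and Hessian, which are not shown.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p is the predicted probability of class 1, y is the true label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


easy = focal_loss(0.9, 1)   # confident, correct prediction
hard = focal_loss(0.6, 1)   # uncertain prediction on the same class
```

The factor $(1 - p_t)^{\gamma}$ shrinks the loss of well-classified samples, so training concentrates on the hard, typically minority-class, cases; with gamma = 0 the expression reduces to alpha-weighted cross-entropy.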

15.
Machine Learning - Boosting combines weak (biased) learners to obtain effective learning algorithms for classification and prediction. In this paper, we show a connection between boosting and...

16.
17.
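To address the problems of traditional house price appraisal methods, such as a single data source, over-reliance on subjective experience, and idealized assumptions about the factors considered, an intelligent appraisal method based on multi-source data and ensemble learning is proposed. First, a feature set is constructed from multi-source data, and the optimal feature subset is extracted using the Pearson correlation coefficient and sequential forward selection. Then, based on the constructed features, multiple Light Gradient Boosting Machines (LightGBM) are combined using the Bagging ensemble strategy, and the model is tuned with a Bayesian optimization algorithm. Finally, the method is applied to the house price appraisal problem to achieve intelligent appraisal. Experiments on a real house price dataset show that, compared with traditional models such as Support Vector Machine (SVM) and random forest, the new model with ensemble learning and Bayesian optimization improves appraisal accuracy by 3.15%, and 84.09% of its appraisals have a percentage error within 10%. This indicates that the proposed model is well suited to house price appraisal and yields more accurate results.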
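The second headline figure, the share of appraisals whose percentage error falls within 10%, is easy to compute. The sketch below assumes the error is measured relative to the actual price, which is the usual convention but is not spelled out in the abstract; the function name and prices are hypothetical.

```python
def share_within_tolerance(actual, predicted, tolerance=0.10):
    """Fraction of predictions whose relative error |pred - actual| / |actual|
    is at most `tolerance`."""
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(p - a) / abs(a) <= tolerance
    )
    return hits / len(actual)


# Toy prices: relative errors are 5%, 15%, about 3.3%, and 0%.
actual_prices = [100, 200, 300, 400]
appraised = [105, 230, 290, 400]
share = share_within_tolerance(actual_prices, appraised)
```

Unlike an average error, this metric reports how often an appraisal is usable, so a few large misses do not dominate it.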

19.
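Huang Xiaoxiang (黄晓祥), Hu Yongmei (胡咏梅), Wu Dan (吴丹), Ren Lijie (任力杰). Journal of Computer Applications (《计算机应用》), 2021, 41(10): 3082-3088
Carotid stenosis, increased Carotid Intima-Media Thickness (CIMT), and carotid plaques can lead to stroke. To enable large-scale preliminary screening for stroke, an improved Variational AutoEncoder (VAE) based on medical data is proposed to identify and predict abnormal carotid arteries. First, since the medical data contain missing values, a hybrid of K-Nearest Neighbor (KNN), mean, and mode imputation (MKNN) and the improved VAE are used to fill in the missing data and obtain a complete dataset, widening the applicability of the data. Next, the feature attributes are analyzed and the features are ranked by importance. Then, four supervised learning methods, Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting Tree (XGBT), are combined with a Genetic Algorithm (GA) to build abnormal carotid artery identification models. Finally, a semi-supervised model for predicting abnormal carotid arteries is built on the improved VAE. Compared with the baseline models, the semi-supervised model based on the improved VAE performs markedly better, reaching a sensitivity of 0.8938, a specificity of 0.9272, an F1 score of 0.9105, and a classification accuracy of 0.9105. The experimental results show that the semi-supervised model can be used to identify abnormal carotid arteries and can thus serve as a tool for identifying people at high risk of stroke, helping to prevent and reduce the occurrence of stroke.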

20.
In recent years, smart healthcare, artificial intelligence (AI)-aided diagnostics, and automated surgical robots are just a few of the innovations that have emerged and gained popularity with the advent of Healthcare 4.0. Such technologies are powered by machine learning (ML) and deep learning (DL), which are well suited to disease diagnosis, pattern identification, treatment prescription, and disease forecasting, such as stroke prediction and cancer prediction. Nevertheless, AI, ML, and DL-based systems need large amounts of data to train effectively and provide the desired outcomes. This need raises concerns about data privacy, security, communication overhead, regulatory compliance, and so forth. Federated learning (FL) is a technology that protects data security and privacy by limiting data sharing and using the model information of distributed systems to enhance performance. However, existing approaches are traditionally verified on pre-established datasets that fail to capture real-life applicability. Therefore, this study proposes an AI-enabled stroke prediction architecture consisting of FL based on the artificial neural network (ANN) model, using data from actual stroke cases. This architecture can be implemented on healthcare-based wearable devices (WD) for real-time use, as it is effective, precise, and computationally affordable. To continuously enhance the performance of the global model, the proposed FL-based architecture aggregates the optimizer weights of many clients using a fifth-generation (5G) communication channel. The performance of the proposed FL-based architecture is then studied on multiple parameters such as accuracy, precision, recall, bit error rate, and spectral noise. Its accuracy is 5% to 10% higher than that of traditional approaches.
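The aggregation step at the heart of federated learning can be sketched as a sample-weighted average of client parameters, in the style of the well-known FedAvg scheme. This is an assumption about the aggregation rule: the abstract only says that client weights are aggregated, and the function name, the flat parameter lists, and the client sizes are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by the
    number of local training samples it holds."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical wearable devices with 1 and 3 local samples.
global_weights = federated_average(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
```

Only the parameter vectors travel over the network; the raw patient records stay on each device, which is the privacy argument made in the abstract.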
