Similar Articles
Found 20 similar articles (search time: 114 ms)
1.
Heart failure is now widespread throughout the world, and heart disease affects approximately 48% of the population. Curing the disease is both expensive and difficult. This paper presents machine learning models to predict heart failure. The fundamental idea is to compare the correctness of various machine learning (ML) algorithms and to apply boosting algorithms to improve the models' prediction accuracy. Supervised algorithms such as K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF) and Logistic Regression (LR) are considered, and boosting algorithms such as Extreme Gradient Boosting (XGBoost) and CatBoost are used alongside Artificial Neural Networks (ANN) to improve prediction. The work also uses data visualization to identify patterns, trends and outliers in a large dataset. Python and scikit-learn are used for the ML models; TensorFlow and Keras, with Python, are used to train the ANN. The DT and RF algorithms achieved the highest accuracy among the classifiers, 95%; KNN obtained the second-highest accuracy of 93.33%; XGBoost reached 91.67%; SVM, CatBoost and the ANN reached 90%; and LR reached 88.33%.
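The classifier comparison described above can be reproduced in outline with scikit-learn. The paper's heart dataset is not available here, so a synthetic dataset stands in for it; this is a minimal sketch, not the authors' pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the heart-failure data (13 clinical-style features).
X, y = make_classification(n_samples=600, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
# Fit each model and record its held-out accuracy.
accuracies = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
              for name, m in models.items()}
```

The relative ranking on synthetic data will not match the paper's figures; the point is only the comparison loop.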

2.
Congestion on China's expressways remains severe, and traffic flow prediction is a key component of intelligent transportation systems: accurate forecasts enable efficient traffic management and relieve congestion. To this end, a multi-channel traffic flow prediction method that accounts for spatio-temporal correlation (MCST-Transformer) is proposed. First, the Transformer structure is used to extract the intrinsic patterns of different data sources; a spatial-correlation module then mines the correlation features between them; finally, channel attention integrates and optimizes the global information. Using expressway data from Guangdong Province, the method achieved high-accuracy flow prediction for 92 toll stations over a two-hour horizon. The results show that MCST-Transformer outperforms traditional machine learning methods as well as several attention-based time-series models: at a 120 min prediction horizon its MAPE is 5.1% lower than Bayesian regression, and its overall MAPE is 0.5% lower than deep learning baselines such as Seq2Seq-Att and Seq2Seq, indicating that the multi-channel design distinguishes the characteristics of different data sources and thereby yields better predictions.
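MAPE, the error metric used above to compare the forecasting models, can be computed as follows; the flow values in the example are illustrative, not from the paper's data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# e.g. observed vs. forecast vehicle counts per interval at a toll station
error = mape([100, 200, 400], [110, 190, 400])  # → 5.0
```

A 5.1% MAPE reduction, as reported against Bayesian regression, means this quantity drops by 5.1 percentage points.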

3.
A hybrid-intelligence fault diagnosis system for spacecraft
For spacecraft tracking, telemetry and control management, a hybrid intelligent diagnosis technique based on expert systems (ES), case-based reasoning (CBR) and fault trees (FT) is studied. A bidirectional hybrid fault-tree inference mechanism is used to localize and predict spacecraft faults, while the k-nearest-neighbor (KNN) retrieval strategy of the case-based reasoner adopts the multi-sensory clustering algorithm (MSA), which is simple, practical and converges readily. The spacecraft expert system based on CBR and FT (SESCF) uses two fusion modes: CBR and FT run independently of each other, while the expert system is loosely coupled to both. To improve inference efficiency, a nonlinear transformation method, combined with the specific inference procedures, is proposed to convert telemetry into semantic information. Tests on the power supply and distribution subsystem of a satellite confirm the effectiveness of SESCF diagnosis: compared with the expert system alone, SESCF achieves higher diagnostic accuracy and reliability, and its nonlinear transformation method proves simple, practical and fault-tolerant in spacecraft fault diagnosis.
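The k-nearest-neighbor case retrieval at the heart of the CBR component can be sketched as follows. The telemetry feature vectors and fault labels below are hypothetical, purely for illustration; the SESCF's actual case representation is not given in the abstract:

```python
import numpy as np

# Hypothetical case base: each row is a telemetry feature vector with a
# known fault label from a previously diagnosed incident.
case_features = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
case_labels = ["power-bus fault", "power-bus fault",
               "sensor fault", "sensor fault"]

def retrieve(query, k=3):
    """k-NN case retrieval: majority fault label of the k closest cases."""
    dists = np.linalg.norm(case_features - np.asarray(query), axis=1)
    nearest = [case_labels[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)

diagnosis = retrieve([0.85, 0.15])  # → "power-bus fault"
```

The paper's MSA-based retrieval replaces this plain Euclidean search with a clustering scheme; the retrieve-then-vote structure is the same.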

4.
Rolling bearings are essential components of rotating machinery; any bearing failure can bring down the machine or even the entire system, causing major economic losses and wasted time, so rolling-bearing faults must be diagnosed promptly and accurately. To address the strong influence of model parameters on diagnosis accuracy in traditional extreme learning machines, a rolling-bearing fault diagnosis method based on a Bayesian-optimized deep kernel extreme learning machine is proposed. First, autoencoders are combined with a kernel extreme learning machine to build a deep kernel extreme learning machine (DKELM) model. Second, Bayesian optimization (BO) is used to tune the DKELM hyperparameters so that the combined classification error on the training and validation sets is minimized. The test set is then fed into the trained BO-DKELM for fault diagnosis. Finally, the method is validated on the Case Western Reserve University bearing fault dataset, achieving a diagnosis accuracy of 99.6%; compared with traditional intelligent algorithms such as deep belief networks and convolutional neural networks, the proposed method attains higher fault diagnosis accuracy.

5.
Zhang Hongpo, Cheng Ning, Zhang Yang, Li Zhanbo. Applied Intelligence (2021) 51(7): 4503-4514

A label flipping attack is a poisoning attack that flips the labels of training samples to reduce the classification performance of a model; robustness measures how well machine learning algorithms withstand such adversarial attacks. The Naive Bayes (NB) algorithm is a noise-tolerant, robust machine learning technique that performs well on problems such as document classification and spam filtering. Here we propose two novel label flipping attacks to evaluate the robustness of NB under label noise. For three datasets in the spam classification domain, Spambase, TREC 2006c and TREC 2007, the attack goal is to increase the false negative rate of NB under label noise without affecting normal mail classification. Our evaluation shows that at a noise level of 20%, the false negative rate on Spambase and TREC 2006c increases by about 20%, and the test error on TREC 2007 rises to nearly 30%. We compared the classification accuracy of five classic machine learning algorithms (random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), and NB) and two deep learning models (AlexNet, LeNet) under the proposed label flipping attacks. The experimental results show that both forms of label noise apply to a variety of classification models and effectively reduce their accuracy.
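A random label flipping attack, the simplest baseline of the attack family described above (the paper's two attacks are targeted, which this sketch does not reproduce), can be simulated against Naive Bayes as follows; synthetic data stands in for the spam corpora:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

def flip_labels(y, rate, seed=0):
    """Randomly flip `rate` of the binary training labels (0 <-> 1)."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

# Accuracy on clean test data, trained with clean vs. 20%-flipped labels.
clean_acc = GaussianNB().fit(X_tr, y_tr).score(X_te, y_te)
noisy_acc = GaussianNB().fit(X_tr, flip_labels(y_tr, 0.20)).score(X_te, y_te)
```

On spam data a targeted attack would concentrate the flips on spam samples to drive up the false negative rate specifically.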


6.
We set out in this study to review a vast amount of recent literature on machine learning (ML) approaches to predicting financial distress (FD), including supervised, unsupervised and hybrid supervised–unsupervised learning algorithms. Four supervised ML models, the traditional support vector machine (SVM), the recently developed hybrid associative memory with translation (HACT), hybrid GA-fuzzy clustering and extreme gradient boosting (XGBoost), were compared in prediction performance to the unsupervised classifier deep belief network (DBN) and the hybrid DBN-SVM model, with a total of sixteen financial variables selected from the financial statements of publicly listed Taiwanese firms as inputs to the six approaches. Our empirical findings, covering the 2010–2016 sample period, demonstrate that among the four supervised algorithms, XGBoost provided the most accurate FD prediction. Moreover, the hybrid DBN-SVM model generated more accurate forecasts than either the SVM or the DBN classifier in isolation.

7.
Exploiting the massive clinical data of the healthcare domain to support medical decision making is a core technology of smart healthcare and an inevitable trend in its development. Medical decision support mainly covers disease risk prediction and intelligent disease diagnosis: drawing on accumulated clinical records and multiple data sources acquired in real time, various machine learning algorithms classify a patient's disease type or predict the risk of disease. Starting from the concept and methodological framework of medical decision support, this paper surveys, by disease category, the machine learning diagnosis and prediction methods currently in use, highlights their characteristics and differences, and analyzes open challenges and future directions.

8.
Ribonucleic acid (RNA) hybridization is widely used in popular RNA simulation software in bioinformatics. However, limited by the exponential computational complexity of combinatorial problems, it is challenging to decide, within an acceptable time, whether a specific RNA hybridization is effective. We hereby introduce a machine learning based technique to address this problem. The machine learning (ML) models tested in the training phase include algorithms based on the boosted tree (BT), random forest (RF), decision tree (DT) and logistic regression (LR), and the corresponding models are obtained. Given the RNA molecular coding training and testing sets, the trained models are applied to predict the classification of RNA hybridization results. The experimental results show optimal predictive accuracies of 96.2%, 96.6%, 96.0% and 69.8% for the RF, BT, DT and LR-based approaches, respectively, under the strong constraint condition, compared with traditional representative methods. Furthermore, the average computation efficiency of the RF, BT, DT and LR-based approaches is 208,679, 269,756, 184,333 and 187,458 times higher, respectively, than that of the existing approach. Given an RNA design, the BT-based approach demonstrates high computational efficiency and better predictive accuracy in determining the biological effectiveness of molecular hybridization.

9.
Intelligent classification of mechanical faults based on support vector machines
The shortage of fault samples is one of the main obstacles to making fault diagnosis intelligent. The support vector machine (SVM) is a machine learning algorithm grounded in statistical learning theory (SLT) that classifies well even with very few training samples, offering a new route toward intelligent fault diagnosis. This paper introduces the SVM classification algorithm and, taking rolling-bearing fault classification as an example, explores its application to fault diagnosis, comparing it with a BP neural network classifier. The results show that with few samples the SVM classifies better than the BP neural network.

10.
To improve transformer fault diagnosis accuracy, a diagnosis method based on an improved salp swarm algorithm (SSA) optimizing an MDS-SVM model is proposed. First, multidimensional scaling (MDS) extracts features from the 20-dimensional transformer fault data, reducing the sparsity and multicollinearity of the high-dimensional data. Second, the salp swarm algorithm (SSA) is introduced and improved by adding a trust mechanism and mutation to accelerate and strengthen convergence; its superiority is verified in optimization tests against the original SSA, PSO, GWO and β-GWO algorithms. Finally, the improved SSA jointly optimizes the MDS target dimensionality and the support vector machine (SVM) parameters to build a new fault diagnosis model. Its diagnosis accuracy is analyzed and compared with SVM models optimized by common algorithms and with BP neural network (BPNN), K-nearest neighbor (KNN) and random forest (RF) diagnosis models; the results show that the improved-SSA MDS-SVM transformer fault diagnosis model is more accurate than the other models and generalizes well.
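The MDS-plus-SVM portion of the pipeline can be sketched with scikit-learn. Note that scikit-learn's `MDS` offers no out-of-sample `transform`, so this illustration embeds the whole dataset once and scores the SVM by cross-validation; the synthetic features stand in for the 20-dimensional transformer fault data, and the fixed hyperparameters replace the SSA search:

```python
from sklearn.datasets import make_classification
from sklearn.manifold import MDS
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic 20-dim, 3-class data standing in for transformer fault features.
X, y = make_classification(n_samples=200, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# MDS embeds the 20-dim features into 5 dims (dimensionality would be one of
# the quantities the improved SSA searches over in the paper).
X_low = MDS(n_components=5, random_state=0).fit_transform(X)

# SVM on the low-dimensional embedding, scored by 5-fold cross-validation.
score = cross_val_score(SVC(C=10, gamma="scale"), X_low, y, cv=5).mean()
```

In the paper, `n_components`, `C` and `gamma` are optimized jointly by the improved SSA instead of being fixed.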

11.
Exascale computer systems are so large that the total number of anomalies and faults grows accordingly, making them harder to detect and diagnose; a more accurate and efficient real-time maintenance fault diagnosis system is therefore urgently needed to provide comprehensive real-time anomaly and fault detection, fault diagnosis, and fault prediction for the hardware. Traditional diagnosis systems suffer from low execution efficiency and high anomaly false-alarm rates when diagnosing systems of tens of thousands of nodes, and their coverage of both anomaly detection and fault diagnosis is insufficient. This work studies anomaly and fault detection, fault diagnosis and fault prediction techniques, analyzes their principles and applicability, and, guided by the engineering requirements of an exascale high-performance computer, designs a maintenance fault diagnosis system that meets exascale needs. A scalable edge diagnosis architecture is designed around the structure of the maintenance system; fault detection, diagnosis and prediction algorithms are derived by fusing HPC system knowledge and expert knowledge with mathematical statistics and machine learning, and prediction models are built for dedicated scenarios. Experiments show that the system scales well, completing fault diagnosis of a 100,000-node system within 10 s; compared with a traditional diagnosis system, the false-alarm rate of one anomaly-detection metric drops from 3.3% to nearly zero, hardware fault detection coverage rises from 90.2% to over 96%, hardware fault diagnosis coverage rises from 71% to about 94%, and faults in several important application scenarios are predicted fairly accurately.

12.
This paper provides a systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets. The review covers 74 software fault prediction papers in 11 journals and several conference proceedings. According to the review results, the usage percentage of public datasets increased significantly, and the usage percentage of machine learning algorithms increased slightly, since 2005. In addition, method-level metrics are still the most dominant metrics in the fault prediction research area, and machine learning algorithms are still the most popular methods for fault prediction. Researchers working on software fault prediction should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics, however, is below acceptable levels; they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.

13.
The assessment of promotional sales with models constructed by machine learning techniques is arousing interest due, among other reasons, to the current economic situation, which leads to a more complex environment of simultaneous and concurrent promotional activities. An operative model diagnosis procedure was proposed in the companion paper, which can be readily used both for agile decision making on the architecture and implementation details of the machine learning algorithms and for differential benchmarking among models. In this paper, a detailed example of model analysis is presented for two representative databases with different promotional behaviour, namely a non-seasonal category (milk) and a heavily seasonal category (beer). The performance of four well-known machine learning techniques of increasing complexity is analyzed in detail: k-Nearest Neighbours, General Regression Neural Networks, the Multilayer Perceptron (MLP), and Support Vector Machines (SVM) are differentially compared. The present paper evaluates these techniques along the experiments described for both categories when applying the methodological findings obtained in the companion paper. We conclude that some elements included in the architecture are not essential for good performance of machine learning promotional models, such as the semiparametric nature of the kernel in SVM models, whereas others can be strongly dependent on the database, such as the convenience of multiple-output models in MLP regression schemes. Additionally, the specific behaviour of certain categories and product ranges determines the need to establish suitable, specific procedures for better prediction and feature extraction.

14.
Air quality is closely related to concentrations of gaseous pollutants, and the prediction of gaseous pollutant concentration plays a decisive role in regulating plant and vehicle emissions. Due to the non-linear and chaotic characteristics of gas concentration series, traditional models may not easily capture the complex time-series patterns. In this study, the Gaussian Process Mixture (GPM) model, which adopts a hidden-variable posterior hard-cut (HC) iterative learning algorithm, is first applied to the prediction of gaseous pollutant concentration in order to improve prediction performance. The algorithm learns iteratively and uses maximum a posteriori (MAP) estimation to achieve an optimal grouping of samples, which effectively improves expectation–maximization (EM) learning in the GPM. The empirical results of the GPM model reveal improved accuracy in gaseous pollutant concentration prediction compared with the kernel regression (K-R), minimax probability machine regression (MPMR), linear regression (L-R) and Gaussian Process (GP) models. Furthermore, GPM with various learning algorithms, namely the HC algorithm, leave-one-out cross validation (LOOCV), and variational algorithms, is also examined in this study. The results show that GPM with HC learning achieves superior performance compared with the other learning algorithms.
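A plain Gaussian process regressor, one of the baselines the GPM is compared against, can be fit to a noisy series as follows; the synthetic sine series merely stands in for real pollutant-concentration data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic noisy "concentration" series over a time-like axis.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 80)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

# RBF captures the smooth trend; WhiteKernel absorbs observation noise.
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01),
    random_state=0,
).fit(X, y)
y_hat, y_std = gp.predict(X, return_std=True)
rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
```

The GPM extends this single GP to a mixture of GPs, with the HC algorithm assigning each sample to one mixture component.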

15.
《Knowledge》 2000, 13(4): 207-214
This paper describes a machine learning method, called Regression on Feature Projections (RFP), for predicting a real-valued target feature given the values of multiple predictive features. In RFP, training is based on simply storing the projections of the training instances on each feature separately. The prediction of the target value for a query point is obtained through two averaging procedures executed sequentially. The first averaging process finds the individual predictions of the features by using the K-Nearest Neighbor (KNN) algorithm; the second combines the predictions of all features. During the first averaging step, each feature is associated with a weight in order to determine the prediction ability of that feature at the local query point. The weights found for each local query point are used in the second prediction step and give the method an adaptive, context-sensitive nature. We have compared RFP with KNN and rule-based regression algorithms. Results on real datasets show that RFP achieves better or comparable accuracy and is faster than both KNN and rule-based regression algorithms.
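The core of RFP, a per-feature one-dimensional k-NN prediction followed by a combination step, can be sketched as below. For simplicity the per-feature predictions are combined with uniform weights rather than the local, query-dependent weights the paper learns:

```python
import numpy as np

def rfp_predict(X_train, y_train, x_query, k=3):
    """Simplified RFP: average of per-feature 1-D k-NN predictions."""
    per_feature_preds = []
    for j in range(X_train.shape[1]):
        # Distance measured on the projection onto feature j alone.
        d = np.abs(X_train[:, j] - x_query[j])
        per_feature_preds.append(y_train[np.argsort(d)[:k]].mean())
    # Second averaging step (uniform weights in this sketch).
    return float(np.mean(per_feature_preds))

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
estimate = rfp_predict(X, y, np.array([2.5, 25.0]))  # → 2.0
```

Replacing `np.mean` with a weighted average, where the weight reflects each feature's local prediction ability, recovers the adaptive behaviour described in the abstract.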

16.
Background: Software fault prediction is the process of developing models that software practitioners can use in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults. Method: In this study we perform a systematic review of studies in the literature, from January 1991 to October 2013, that use machine learning techniques for software fault prediction. We assess the performance capability of the machine learning techniques in existing research, compare their performance with statistical techniques and with other machine learning techniques, and summarize their strengths and weaknesses. Results: We identified 64 primary studies and seven categories of machine learning techniques. The results demonstrate the capability of machine learning techniques for classifying a module or class as fault-prone or not fault-prone; models using machine learning techniques to estimate software fault proneness outperform traditional statistical models. Conclusion: Based on the results of the systematic review, we conclude that machine learning techniques are capable of predicting software fault proneness and can be used by software practitioners and researchers. However, the application of machine learning techniques in software fault prediction is still limited, and more studies should be carried out to obtain well formed and generalizable results. We provide future guidelines for practitioners and researchers based on the results obtained in this work.

17.
Predicting the fault-proneness labels of software program modules is an emerging software quality assurance activity, and the quality of datasets collected from previous software versions affects the performance of fault prediction models. In this paper, we propose an outlier detection approach that uses metrics thresholds and class labels to identify class outliers. We evaluate our approach on public NASA datasets from the PROMISE repository. Experiments reveal that this novel outlier detection method improves the performance of robust software fault prediction models based on the Naive Bayes and Random Forest machine learning algorithms.
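A threshold-based class-outlier rule of the kind described above can be sketched as follows; the two metrics and their thresholds here are hypothetical placeholders (e.g. lines of code and cyclomatic complexity), not the values derived in the paper:

```python
import numpy as np

def class_outliers(X, y, thresholds):
    """Flag instances labeled fault-prone (y == 1) whose metric values all
    fall below the thresholds, i.e. the metrics contradict the label."""
    below_all = (X < thresholds).all(axis=1)
    return below_all & (y == 1)

# Hypothetical module metrics: [LOC, cyclomatic complexity].
thresholds = np.array([50.0, 10.0])
X = np.array([[10.0, 2.0],     # small, simple module labeled faulty -> outlier
              [120.0, 15.0],   # large, complex module labeled faulty -> kept
              [30.0, 4.0]])    # small module labeled clean -> kept
y = np.array([1, 1, 0])        # 1 = fault-prone

mask = class_outliers(X, y, thresholds)
X_clean, y_clean = X[~mask], y[~mask]   # training set with outliers removed
```

The cleaned set would then be fed to the Naive Bayes or Random Forest predictors, as in the paper's experiments.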

18.
As an essential part of hydraulic transmission systems, hydraulic piston pumps have a significant role in many state-of-the-art industries. Thus, it is important to implement accurate and effective fault diagnosis of hydraulic piston pumps. Owing to the heavy reliance of shallow machine learning models on the expertise and experience of engineers, fault diagnosis based on deep models has attracted significant attention from academia and industry. To construct a deep model with good performance, it is necessary and challenging to tune the hyperparameters (HPs). Since many existing methods focus on manual tuning and use common search algorithms, it is meaningful to explore more intelligent algorithms that can automatically optimize the HPs. In this paper, Bayesian optimization (BO) is employed for adaptive HP learning, and an improved convolutional neural network (CNN) is established for fault feature extraction and classification in a hydraulic piston pump. First, acoustic signals are transformed into time–frequency distributions by a continuous wavelet transform. Second, a preliminary CNN model is built by setting initial HPs. The range of each HP to be optimized is identified. Third, BO is employed to select the optimal combination of HPs. An improved model called CNN-BO is constructed. Finally, the diagnostic efficiency of CNN-BO is analyzed using a confusion matrix and t-distributed stochastic neighbor embedding. The classification performance of different models is compared. It is found that CNN-BO has a higher accuracy and better robustness in fault diagnosis for a hydraulic piston pump. This research will provide a basis for ensuring the reliability and safety of the hydraulic pump.

19.
The investigation of the accuracy of methods employed to forecast agricultural commodity prices is an important area of study, and the development of effective models is necessary. Regression ensembles can be used for this purpose: an ensemble is a set of combined models which act together to forecast a response variable with lower error. The general contribution of this work is to explore the predictive capability of regression ensembles by comparing ensembles among themselves, as well as with approaches that use a single model (reference models), to forecast agribusiness prices one month ahead. Monthly time series of the price paid to producers in the state of Paraná, Brazil for a 60 kg bag of soybean (case study 1) and wheat (case study 2) are used. The ensembles bagging (random forests, RF), boosting (gradient boosting machine, GBM, and extreme gradient boosting machine, XGB), and stacking (STACK) are adopted; the support vector machine for regression (SVR), multilayer perceptron neural network (MLP) and K-nearest neighbors (KNN) serve as reference models. Performance measures such as mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE) are used for model comparison, and Friedman and Wilcoxon signed-rank tests are applied to evaluate the models' absolute percentage errors (APE). On the test sets, MAPE lower than 1% is observed for the best ensemble approaches: the XGB/STACK (Least Absolute Shrinkage and Selection Operator-KNN-XGB-SVR) and RF models showed the best short-term forecasting performance for case studies 1 and 2, respectively, with statistically smaller APE than the reference models. Approaches based on boosting are consistent, providing good results in both case studies; ranked by performance, the models are XGB, GBM, RF, STACK, MLP, SVR and KNN. We conclude that the ensemble approach yields statistically significant gains, reducing prediction errors for the price series studied, and recommend ensembles for forecasting agricultural commodity prices one month ahead, since their more assertive performance increases model accuracy and reduces decision-making risk.

20.
Digital transformation (DT) is the process of combining digital technologies with sound business models to generate great value for enterprises. DT intertwines with customer requirements, domain knowledge, and theoretical and empirical insights for value propagation. Studies of DT are growing rapidly and heterogeneously, covering aspects of product design, engineering, production, and life-cycle management, due to the fast, market-driven industrial development under Industry 4.0. Our work addresses the challenge of understanding DT trends by presenting a machine learning (ML) approach to topic modeling for reviewing and analyzing advanced DT technology research and development. A systematic review process is developed over the comprehensive DT literature in manufacturing systems and engineering (99 articles). Six dominant topics are identified: smart factory, sustainability and product-service systems, construction digital transformation, public infrastructure-centric digital transformation, techno-centric digital transformation, and business model-centric digital transformation. The study also contributes by adopting and demonstrating ML-based topic modeling for intelligent, systematic bibliometric analysis, particularly for unveiling advanced engineering research trends through the domain literature.
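ML-based topic modeling of a document collection can be illustrated with LDA in scikit-learn; the paper's exact algorithm may differ, and the tiny corpus below merely stands in for the 99 reviewed articles:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus: short pseudo-abstracts echoing the paper's topic themes.
docs = [
    "smart factory automation sensors production",
    "sustainability product service systems lifecycle",
    "business model value proposition digital platform",
    "smart factory robots production line sensors",
    "business model revenue digital services platform",
    "sustainability recycling product service lifecycle",
]

# Bag-of-words term counts, then LDA with 3 latent topics.
tf = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(tf)
doc_topics = lda.transform(tf)   # each row: a document's topic mixture
```

With a real corpus, inspecting `lda.components_` against the vectorizer's vocabulary yields the top words per topic, which is how dominant themes like "smart factory" are named.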
