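Data loss and corruption caused by hard disk failures inflict heavy losses on enterprises and users, so hard disk failure prediction has drawn close attention from both academia and industry, and many machine-learning-based prediction methods have emerged. However, because the sample data and performance metrics used by these models differ, the relative merits of the methods cannot be evaluated fairly. To address this, a machine-learning-based evaluation platform for hard disk failure detection is built. On this unified experimental platform, the failure prediction performance of random forest, logistic regression, multilayer perceptron neural network, decision tree, naive Bayes, extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), and AdaBoost models is compared, mainly on the same sample set and under the same performance metric; the predictions of each model on sample sets of different sizes are also compared. The experimental results show that the random forest and GBDT models not only achieve high prediction accuracy but also generalize well across sample sets of different sizes.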
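
A like-for-like comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform described in the abstract: the F1 score stands in for the unspecified shared metric, and `compare_models`, the two threshold rules, and the toy samples are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Sample = Sequence[float]
Model = Callable[[Sample], int]  # returns 1 for "will fail", 0 otherwise


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 of the positive (failure) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compare_models(
    models: Dict[str, Model],
    samples: List[Sample],
    labels: List[int],
    metric: Callable[[List[int], List[int]], float] = f1_score,
) -> Dict[str, float]:
    """Score every model on the same samples with the same metric."""
    return {
        name: metric(labels, [model(x) for x in samples])
        for name, model in models.items()
    }


# Hypothetical stand-ins for trained models: each thresholds one feature.
toy_models: Dict[str, Model] = {
    "rule_a": lambda x: int(x[0] > 50),
    "rule_b": lambda x: int(x[1] > 5),
}
# Hypothetical samples with two made-up disk health features each.
toy_samples = [(80, 9), (10, 0), (60, 1), (5, 7), (90, 8), (20, 2)]
toy_labels = [1, 0, 1, 0, 1, 0]

scores = compare_models(toy_models, toy_samples, toy_labels)
```

Calling `compare_models` again on a slice of the samples (for example `toy_samples[:3]`) gives the second kind of comparison the abstract mentions, the same model on sample sets of different sizes.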

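To enable rapid emergency response to elevator entrapment failures, shorten on-site troubleshooting time, and shift troubleshooting from reliance on manual experience toward data-supported intelligent diagnosis, a model for predicting the causes of elevator failures is built with the gradient boosting decision tree (GBDT) algorithm. After data cleaning and feature extraction, the model is trained on elevator failure data accumulated in Nanjing from 2015 to 2020. Comparison of the predictions with the true values shows that the real-time prediction accuracy for the top three failure causes reaches 81%, and the evaluation metrics are better than those of comparable machine learning algorithms. The GBDT model is suitable for prediction problems such as elevator entrapment failures, where data are sparse and features are not distinctive.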

A survey of static software fault prediction methods
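Static software fault prediction predicts faults from metric information extracted from project data, to guide the allocation of testing and verification resources. This survey summarizes static software fault prediction methods from two perspectives: available metric data and prediction models. The available metrics include method-level, class-level, component-level, file-level, and process-level metrics; the prediction models fall into two classes, machine learning and statistical methods. The survey also summarizes open problems that need further study, such as performance evaluation measures, the availability of metric data, and the effect of fault classification on fault prediction.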

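Because shared bikes are highly mobile and their use is highly random, predicting the short-term demand for shared bikes in a city quickly and accurately is of great importance. Five machine learning methods (random forest, extremely randomized trees, support vector machine, artificial neural network, and XGBoost) are applied to data from the bike-sharing program in Washington, USA, to analyze the influence of time factors, weather factors, and other factors on bike demand and to predict short-term demand. The simulation results show that the influences on bike demand...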

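To overcome the time-performance limitation of recurrent neural networks, which cannot be trained in parallel, this work studies the application of the Transformer architecture to trend prediction for ship operation and designs a more efficient failure prediction method around the Transformer. For data preprocessing, a sliding-window method is designed to screen stationary data, with good results. For fault diagnosis, polynomial regression trained on historical data achieves a high score on the test set. For trend prediction, the Transformer architecture produces fairly accurate predictions after training and shows a significant performance advantage over a GRU network. Overall, the proposed method has some practical value.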
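
The sliding-window screening step could look roughly like the sketch below. The abstract does not state its stationarity criterion, so a simple one is assumed here: a window counts as stationary when its population standard deviation stays at or below a threshold. The function name `stationary_windows`, the parameters `width` and `max_std`, and the sample signal are hypothetical.

```python
from statistics import pstdev
from typing import List, Tuple


def stationary_windows(
    series: List[float], width: int, max_std: float
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of windows whose std is at most max_std.

    The window slides one sample at a time; `end` is exclusive.
    """
    kept = []
    for start in range(0, len(series) - width + 1):
        window = series[start:start + width]
        # Assumed criterion: low spread inside the window means "stationary".
        if pstdev(window) <= max_std:
            kept.append((start, start + width))
    return kept


# Hypothetical sensor trace: flat, then a ramp, then flat at a new level.
signal = [10.0, 10.1, 9.9, 10.0, 14.0, 18.0, 18.1, 17.9, 18.0]
segments = stationary_windows(signal, width=4, max_std=0.2)
```

Only the windows that lie entirely inside a flat stretch survive; every window that overlaps the ramp is rejected. A stricter test (for example comparing means of adjacent windows) would slot into the same loop.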

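Machine-learning-based software defect prediction is an effective way to improve software reliability. It uses the statistical characteristics of software modules to predict the number of defects a module may contain, or whether the module is defect-prone. With such predictions, a software development organization can concentrate its limited resources on defect-prone modules and thereby effectively improve product quality. Many research results on machine-learning-based software defect prediction have appeared in recent years; this article surveys the main recent results in the field and classifies them according to the characteristics of each method.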

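To address the insufficient accuracy and cumbersome procedure of traditional analysis methods for detecting arc faults in DC power supply systems, DC arc fault detection is cast as a binary classification problem and machine learning is introduced. Data for the normal state and the arc state are obtained from DC arc experiments; four features, including the mean current, are extracted from the time domain, and three features, including the standard deviation of the high-frequency components, are extracted from the frequency domain. A support vector machine (SVM) is trained on the extracted features, and the resulting model classifies the test dataset with an accuracy of 94.483%. The results show that the proposed method detects DC arc faults effectively, improves detection accuracy, involves few steps, and is easy to deploy.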

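Angiotensin-converting enzyme inhibitors (ACEIs) are important in the treatment of hypertension. Starting from a dataset of candidate small molecules built from a database of structurally complex compounds, molecular docking is used to screen samples from the dataset for building classification models. Support vector machine, K-nearest neighbor, decision tree, random forest, and Bayesian methods are each used to build models that classify potential angiotensin-converting enzyme inhibitors and non-inhibitors. A comparison of the results shows that the support vector machine has a higher prediction rate than the other methods, with an overall prediction rate of 82.4% and a correlation coefficient of 0.653. The study indicates that the support vector machine performs well in virtual screening for angiotensin-converting enzyme inhibitors.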

Predicting resource consumption and run time of computational workloads is crucial for efficient resource allocation, or cost and energy optimization. In this paper, we evaluate various machine learning techniques to predict the execution time of computational jobs. For experiments we use datasets from two application areas: scientific workflow management and data processing in the ALICE experiment at CERN. We apply a two-stage prediction method and evaluate its performance. Other evaluated aspects include: (1) comparing performance of global (per-workflow) versus specialized (per-job) models; (2) impact of prediction granularity in the first stage of the two-stage method; (3) using various feature sets, feature selection, and feature importance analysis; (4) applying symbolic regression in addition to classical regressors. Our results provide new valuable insights on using machine learning techniques to predict the runtime behavior of computational jobs.

李洪亮  张弄  孙婷  李想 《计算机应用》2022,42(6):1649-1655
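An analysis of performance interference among jobs in distributed machine learning shows that the interference is caused by uneven allocation of GPU resources, such as memory overload and bandwidth contention. A mechanism for quickly predicting performance interference between jobs is therefore designed and implemented; it adaptively predicts the degree of interference from the given GPU parameters and job types. First, the GPU parameters and interference rates of running distributed machine learning jobs are obtained experimentally, and the effect of each kind of parameter on performance interference is analyzed. Second, GPU parameter-interference rate models are built with several prediction techniques, and their interference-rate errors are analyzed. Finally, an adaptive interference-rate prediction algorithm is built that, for a given device environment and job set, automatically selects the prediction model with the smallest error and predicts the interference rate quickly and accurately. Experiments are designed and analyzed with five commonly used neural network jobs on two kinds of GPU devices. The results show that the proposed adaptive interference prediction (AIP) mechanism completes prediction-model selection and interference prediction quickly without any prior assumptions, taking less than 300 s, with interference-rate prediction errors of 2% to 13%; it can be applied to scenarios such as job scheduling and load balancing.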
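
The final step, automatically choosing the candidate model with the smallest error, can be illustrated with a short sketch. It is not the AIP algorithm itself: mean absolute error is an assumed error measure, each candidate takes a single made-up GPU parameter, and `select_predictor`, the two candidates, and the measurements are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]  # GPU parameter -> predicted interference rate


def mean_abs_error(model: Predictor, data: List[Tuple[float, float]]) -> float:
    """Average absolute gap between predicted and observed interference rate."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


def select_predictor(
    candidates: Dict[str, Predictor], data: List[Tuple[float, float]]
) -> Tuple[str, float]:
    """Return the name and error of the candidate with the smallest error."""
    errors = {name: mean_abs_error(model, data) for name, model in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]


# Hypothetical measurements: (GPU memory utilisation, observed interference rate).
measurements = [(0.2, 0.05), (0.5, 0.125), (0.8, 0.2)]
# Hypothetical candidate models standing in for the fitted predictors.
candidates: Dict[str, Predictor] = {
    "constant": lambda u: 0.125,
    "linear": lambda u: 0.25 * u,
}

best_name, best_error = select_predictor(candidates, measurements)
```

Re-running `select_predictor` with measurements from a different device or job set is what makes the choice adaptive in this sketch: the winner can change whenever the data do.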

Failure prediction is the task of forecasting whether a material system of interest will fail at a specific point of time in the future. This task attains significance for strategies of industrial maintenance, such as predictive maintenance. For solving the prediction task, machine learning (ML) technology is increasingly being used, and the literature provides evidence for the effectiveness of ML-based prediction models. However, the state of recent research and the lessons learned are not well documented. Therefore, the objective of this review is to assess the adoption of ML technology for failure prediction in industrial maintenance and synthesize the reported results. We conducted a systematic search for experimental studies in peer-reviewed outlets published from 2012 to 2020. We screened a total of 1,024 articles, of which 34 met the inclusion criteria. We focused on understanding the datasets analyzed, the preprocessing to generate features, and the training and evaluation of prediction models. The results reveal (1) a broad range of systems and domains addressed, (2) the adoption of up-to-date approaches to preprocessing and training, (3) some lack of performance evaluation mitigating the overfitting problem, and (4) considerable heterogeneity in the reporting of experimental designs and results. We identify opportunities for future research and suggest ways to facilitate the comparison and integration of evidence obtained from single studies.

Multimodal machine learning (MML) aims to understand the world from multiple related modalities. It has attracted much attention as multimodal data have become increasingly available in real-world applications. MML has been shown to perform better than single-modal machine learning, since multiple modalities contain more information and can complement each other. However, fusing the modalities is a key challenge in MML. Unlike previous work, we further consider the side-information, which reflects the situation and influences the fusion of the modalities. We recover the multimodal label distribution (MLD) by leveraging the side-information; the MLD represents the degree to which each modality contributes to describing the instance. Accordingly, a novel framework named multimodal label distribution learning (MLDL) is proposed to recover the MLD and to fuse the modalities under its guidance, learning an in-depth understanding of the joint feature representation. Moreover, two versions of MLDL are proposed to deal with sequential data. Experiments on multimodal sentiment analysis and disease prediction show that the proposed approaches perform favorably against state-of-the-art methods.

韩敏  王新迎 《控制理论与应用》2013,30(11):1467-1472
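Multivariate chaotic time series are strongly nonlinear, which makes it difficult to build a mathematical model for accurate prediction. To address this, a weighted extreme learning machine prediction algorithm is proposed. First, the phase space of the multivariate chaotic time series is reconstructed, and different weights are applied to the input data in phase space according to their influence on the prediction error. Then, a support vector extreme learning machine prediction model is proposed that combines the kernel-mapping expressiveness of support vector machines with the one-step fast training of extreme learning machines, so it is simple to train and generalizes well. The computational complexity of the proposed algorithm is proportional to the cube of the number of training samples, so it is suited to stationary time series with 10^2 to 10^3 samples. Simulation results on the Lorenz chaotic time series and on the annual sunspot and annual Yellow River runoff chaotic time series demonstrate the effectiveness of the proposed algorithm.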
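
Phase-space reconstruction is commonly done by time-delay embedding, and the sketch below shows that standard technique for a single variable. Whether the paper uses exactly this construction, and how it picks the embedding dimension and delay, is not stated in the abstract; `delay_embed`, `dim`, and `tau` are illustrative names.

```python
from typing import List


def delay_embed(series: List[float], dim: int, tau: int) -> List[List[float]]:
    """Time-delay embedding of a scalar series.

    Row for time t is [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)].
    The first (dim - 1) * tau samples have no complete history and are skipped.
    """
    first = (dim - 1) * tau
    return [
        [series[t - k * tau] for k in range(dim)]
        for t in range(first, len(series))
    ]


# Toy series; each embedded row becomes one input vector for the predictor.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
vectors = delay_embed(x, dim=3, tau=2)
```

For a multivariate series, one common approach is to embed each variable separately and concatenate the rows that share the same time index, which is an assumption here rather than something the abstract specifies.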

We present a comparative study on the most popular machine learning methods applied to the challenging problem of customer churn prediction in the telecommunications industry. In the first phase of our experiments, all models were applied and evaluated using cross-validation on a popular, public domain dataset. In the second phase, the performance improvement offered by boosting was studied. To determine the most efficient parameter combinations, we performed a series of Monte Carlo simulations for each method and for a wide range of parameters. Our results demonstrate clear superiority of the boosted versions of the models over the plain (non-boosted) versions. The best overall classifier was the SVM-POLY using AdaBoost, with accuracy of almost 97% and F-measure over 84%.

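Traffic flow prediction is an important basis for traffic guidance and management in intelligent transportation systems, providing reliable data support and a scientific basis for decisions in urban traffic management and planning. Because traffic flow data are incremental streaming data updated in real time, the prediction model has to be rebuilt every time the historical dataset is updated, which consumes considerable computing resources and running time. A traffic flow prediction model based on an improved online sequential extreme learning machine (IOS-ELM) is therefore proposed. By constructing an enhanced feature mapping for newly arrived data, it generates a dynamically updated feature representation space for traffic flow and so updates the short-term traffic flow prediction model dynamically. The model is evaluated on traffic flow data from Yuanda 1st Road (远大一路) in Changsha. The experimental results show that IOS-ELM outperforms the baseline prediction models (MLP, ELM, OS-ELM, and SVR) in both NRMSE and MAPE, while its prediction time is short enough to provide a degree of real-time capability, meeting the need for accurate real-time prediction of urban road traffic flow.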
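
The two error measures named in this abstract can be computed as in the sketch below. MAPE follows its usual definition. NRMSE has several normalisations in use, and the abstract does not say which one applies, so dividing the RMSE by the range of the observed values is an assumption; the flow values are made up.

```python
from math import sqrt
from typing import List


def nrmse(actual: List[float], predicted: List[float]) -> float:
    """RMSE divided by the range of the actual values (assumed normalisation)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return sqrt(mse) / (max(actual) - min(actual))


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical observed and predicted vehicle counts per interval.
flow_actual = [100.0, 200.0, 400.0]
flow_pred = [110.0, 190.0, 380.0]

score_nrmse = nrmse(flow_actual, flow_pred)
score_mape = mape(flow_actual, flow_pred)
```

Lower is better for both measures. MAPE is undefined when an observed flow is zero, so intervals with no traffic would need to be excluded or handled separately.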

Parameter server (PS), the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems often depend on memory implementations. With memory constraints, machine learning (ML) developers cannot train large-scale ML models in their rather small local clusters. Moreover, renting large-scale cloud servers is always economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirement of large-scale machine learning tasks by storing high dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between the problem of inconsistent parameter versions (staleness) and that of inconsistent execution progress (straggler). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.

In recent years, financial distress prediction (FDP), also known as corporate failure prediction or bankruptcy prediction, has gained significant importance due to its impact on organizations, especially during unexpected events like pandemics and wars. Machine learning (ML) models have emerged as innovative and essential tools in predicting financial distress, leveraging the ever-increasing volume of databases and computing power. This study utilizes bibliographic techniques to contribute to the field's literature review to address the disorganized nature of the existing literature on FDP, reduce confusion, and provide clarity to domain researchers. These techniques enable identifying the progress of articles published over the years, influential authors, and highly cited articles. Additionally, the study examines crucial aspects of data preprocessing, such as missing data, imbalanced data, feature selection, and outliers, as they significantly impact the robustness and performance of ML models. Furthermore, it discusses essential models employed in FDP, focusing on recent advancements that represent promising trends. In conclusion, this study contributes to the field by uncovering novel trends and proposing possible directions for advancing FDP research. These findings will guide researchers, practitioners, and stakeholders in their quest for improved prediction and decision-making in financial distress.

