Similar documents (20 results found)
1.
Customer churn has emerged as a critical issue for customer relationship management and customer retention in the telecommunications industry; churn prediction is therefore necessary and valuable for retaining customers and reducing losses. Moreover, high predictive accuracy and good interpretability of the results are two key measures of a classification model. Many studies have shown that single-model classification methods may not be good enough to achieve satisfactory results. To obtain more accurate predictions, we present a novel hybrid model-based learning system that integrates supervised and unsupervised techniques for predicting customer behaviour. The system combines a modified k-means clustering algorithm with a classic rule-induction technique (FOIL). Three sets of experiments were carried out on telecom datasets. The first verifies that weighted k-means clustering leads to better data partitioning; the second evaluates the classification results and compares them to other well-known modelling techniques; the last compares the proposed hybrid-model system with several other recently proposed hybrid classification approaches. We also performed a comparative study on a set of benchmarks obtained from the UCI repository. All the results show that the hybrid model-based learning system is very promising and outperforms the existing models.
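As a rough illustration of the clustering stage, here is a minimal sketch of k-means with per-feature weights in the distance function. The paper's actual weighting scheme and its FOIL rule-induction stage are not reproduced, and the deterministic initialisation (first k points) is an assumption for brevity.

```python
def weighted_kmeans(points, weights, k, iters=20):
    """Sketch of weighted k-means: feature weights scale each dimension's
    contribution to the squared distance used for cluster assignment."""
    # deterministic init for illustration; a real system would seed randomly
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum(w * (a - c) ** 2 for w, a, c in zip(weights, p, cen))
                     for cen in centers]
            clusters[dists.index(min(dists))].append(p)
        # recompute each center as the mean of its cluster (keep old if empty)
        centers = [[sum(xs) / len(cl) for xs in zip(*cl)] if cl else cen
                   for cl, cen in zip(clusters, centers)]
    return centers, clusters
```

Raising a feature's weight makes disagreements in that feature dominate the assignment, which is how a weighted partitioning can differ from plain k-means on the same data.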

2.
Background

Software fault prediction is the process of developing models that software practitioners can use in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults.

Method

In this study we perform a systematic review of studies published between January 1991 and October 2013 that use machine learning techniques for software fault prediction. We assess the performance capability of machine learning techniques in existing research, compare machine learning techniques with statistical techniques and with other machine learning techniques, and summarize their strengths and weaknesses.

Results

We identified 64 primary studies and seven categories of machine learning techniques. The results demonstrate the capability of machine learning techniques for classifying a module or class as fault prone or not fault prone. Models using machine learning techniques to estimate software fault proneness outperform traditional statistical models.

Conclusion

Based on the results of the systematic review, we conclude that machine learning techniques are capable of predicting software fault proneness and can be used by software practitioners and researchers. However, their application in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results. We provide guidelines for practitioners and researchers based on the results obtained in this work.

3.
To survive in today's telecommunication business it is imperative to identify customers who are inclined to move to a competitor. Customer churn prediction has therefore become an essential issue in the telecommunication business, where a reliable churn predictor is regarded as priceless. This paper employs data mining classification techniques, including decision trees, artificial neural networks, K-nearest neighbors, and support vector machines, and compares their performance. Using data from an Iranian mobile company, we not only evaluated and compared these techniques against one another, but also drew a parallel between several prominent data mining software packages. After analyzing the techniques' behavior and identifying their strengths, we propose a hybrid methodology that considerably improves the values of several evaluation metrics. The results show that Recall and Precision above 95% are readily achievable. In addition, a new methodology for extracting influential features from the dataset is introduced and evaluated.
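The paper reports Recall and Precision above 95%; as a reminder of what those figures measure on churn data, a minimal generic sketch (the label name `churn` is illustrative, not from the paper):

```python
def precision_recall(y_true, y_pred, positive="churn"):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```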

4.
In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines, and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to accurately classify some e-learning students whereas another may succeed, three decision schemes that combine the results of the three techniques in different ways were also tested. The method was examined in terms of overall accuracy, sensitivity, and precision, and its results were found to be significantly better than those reported in the relevant literature.
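The simplest of such decision schemes is a majority vote over the classifiers' outputs. The paper's three schemes differ in detail; this sketch only conveys the combining idea, with a fallback to the first classifier's answer when no majority exists (an assumption for illustration):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of per-student labels per classifier.
    Returns the majority label per student, else the first classifier's label."""
    combined = []
    for votes in zip(*predictions):
        winner, count = Counter(votes).most_common(1)[0]
        combined.append(winner if count > len(votes) // 2 else votes[0])
    return combined
```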

5.
A business incurs far higher costs when attempting to win new customers than when retaining existing ones. As a result, much research has been invested in ways of identifying customers at high risk of churning. However, customer retention efforts have also been consuming large amounts of organisational resources. In response to these issues, the next generation of churn management should focus on accuracy. A variety of churn management techniques have been developed to meet these requirements. The focus of this paper is to review some of the most popular techniques identified in the literature for building a customer churn management platform. The advantages and disadvantages of the identified techniques are discussed, and a discussion of future research directions is offered.

6.
Failure prediction is the task of forecasting whether a material system of interest will fail at a specific point of time in the future. This task attains significance for strategies of industrial maintenance, such as predictive maintenance. For solving the prediction task, machine learning (ML) technology is increasingly being used, and the literature provides evidence for the effectiveness of ML-based prediction models. However, the state of recent research and the lessons learned are not well documented. Therefore, the objective of this review is to assess the adoption of ML technology for failure prediction in industrial maintenance and synthesize the reported results. We conducted a systematic search for experimental studies in peer-reviewed outlets published from 2012 to 2020. We screened a total of 1,024 articles, of which 34 met the inclusion criteria. We focused on understanding the datasets analyzed, the preprocessing used to generate features, and the training and evaluation of prediction models. The results reveal (1) a broad range of systems and domains addressed, (2) the adoption of up-to-date approaches to preprocessing and training, (3) a lack of performance-evaluation practices that mitigate the overfitting problem, and (4) considerable heterogeneity in the reporting of experimental designs and results. We identify opportunities for future research and suggest ways to facilitate the comparison and integration of evidence obtained from single studies.

7.
As churn management is a major task for companies seeking to retain valuable customers, the ability to predict customer churn is essential. In the literature, neural networks have shown their applicability to churn prediction. Hybrid data mining techniques, which combine two or more techniques, have also been shown to outperform many single techniques across a number of domain problems. This paper considers two hybrid models for churn prediction that combine two different neural network techniques: back-propagation artificial neural networks (ANN) and self-organizing maps (SOM). The hybrid models are ANN combined with ANN (ANN + ANN) and SOM combined with ANN (SOM + ANN). In each hybrid model, the first technique performs data reduction by filtering out unrepresentative training data; the remaining representative data are then used to create the prediction model based on the second technique. To evaluate the performance of these models, three kinds of testing set are considered: the general testing set and two fuzzy testing sets based on the data filtered out by the first technique of each hybrid model (ANN and SOM, respectively). The experimental results show that the two hybrid models outperform the single-neural-network baseline in terms of prediction accuracy and Type I and II errors over all three kinds of testing set. In addition, the ANN + ANN hybrid model performs significantly better than the SOM + ANN hybrid model and the ANN baseline.
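The two-stage structure (filter unrepresentative training data with a first model, then retrain on what remains) can be sketched as follows, with a nearest-centroid classifier standing in for both neural networks. That substitution is purely an assumption for brevity; the paper uses ANN and SOM at these stages.

```python
def centroid_fit(X, y):
    """Per-class mean vectors."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [sum(v) / len(pts) for v in zip(*pts)] for c, pts in groups.items()}

def centroid_predict(model, x):
    """Nearest class centroid by squared distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def two_stage(X, y):
    """Stage 1 flags training points it misclassifies as unrepresentative;
    stage 2 is fitted only on the remaining representative points."""
    stage1 = centroid_fit(X, y)
    kept = [(xi, yi) for xi, yi in zip(X, y) if centroid_predict(stage1, xi) == yi]
    return centroid_fit([xi for xi, _ in kept], [yi for _, yi in kept])
```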

8.
Customer retention in telecommunication companies is one of the most important issues in customer relationship management, and customer churn prediction is a major instrument of customer retention. Churn prediction aims at identifying potential churning customers. Traditional approaches to identifying potential churners are based only on customers' personal information, without considering the relationships among customers. However, the subscribers of telecommunication companies are connected to other customers, and network properties among people may affect churn. For this reason, we propose a new churn prediction procedure that examines the communication patterns among subscribers and models a propagation process, based on call detail records, which transfers churn information from churners to non-churners. A fast and effective propagation process is achieved through community detection and by setting the initial energy of churners (the amount of information transferred) differently according to churn date or centrality. The proposed procedure was evaluated by comparing the performance of prediction models trained with a social network feature against those trained with traditional personal features.
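The propagation step can be sketched as churn "energy" spreading over the call graph: each round, every node passes a fraction of its energy, split evenly among its neighbours. The retention ratio, round count, and the community-detection and initial-energy details from the paper are simplifications/assumptions here.

```python
def propagate(graph, energy, rounds=2, ratio=0.5):
    """graph: node -> list of neighbours; energy: node -> initial churn energy.
    Each round a node keeps (1 - ratio) of its energy and shares the rest."""
    for _ in range(rounds):
        nxt = {n: e * (1 - ratio) for n, e in energy.items()}
        for n, nbrs in graph.items():
            share = energy.get(n, 0.0) * ratio / len(nbrs) if nbrs else 0.0
            for m in nbrs:
                nxt[m] = nxt.get(m, 0.0) + share
        energy = nxt
    return energy
```

Non-churners that accumulate high energy after propagation are the ones flagged as likely to churn next.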

9.
The support vector machine (SVM) is currently state-of-the-art for classification tasks because of its ability to model nonlinearities. Its main drawback, however, is that it generates "black box" models: it does not reveal the knowledge learnt during training in a human-comprehensible form. The process of converting such opaque models into transparent ones is known as rule extraction. In this paper we propose a hybrid approach for extracting rules from SVMs for customer relationship management (CRM) purposes. The approach consists of three phases: (i) SVM-RFE (SVM recursive feature elimination) is employed to reduce the feature set; (ii) the dataset with reduced features is used to build an SVM model, and the support vectors are extracted; (iii) rules are then generated from the support vectors using a Naive Bayes Tree (NBTree). The dataset analyzed in this study concerns churn prediction for bank credit card customers (Business Intelligence Cup 2004) and is highly unbalanced, with 93.24% loyal and 6.76% churned customers. We further employed several standard balancing approaches and extracted rules from the balanced data. The empirical results show that the proposed hybrid outperformed all other techniques tested. Because the reduced-feature dataset is used, the approach also extracts shorter rules, improving the comprehensibility of the system. The generated rules can act as an early-warning expert system for bank management.
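The recursive feature elimination in phase (i) follows a simple skeleton: repeatedly drop the lowest-scoring feature until the target size is reached. In SVM-RFE the score for feature j is |w_j| from a linear SVM retrained each round; the pluggable `score` callback below is a stand-in for that retraining step, not the paper's implementation.

```python
def rfe(features, score, n_keep):
    """Recursive feature elimination skeleton.
    score(kept) -> {feature: importance}, recomputed after each removal."""
    kept = list(features)
    while len(kept) > n_keep:
        ranked = score(kept)
        kept.remove(min(kept, key=ranked.get))
    return kept
```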

10.
Context

One of the most important factors in the development of a software project is the quality of its requirements. Erroneous requirements, if not detected early, may cause serious problems such as substantial additional costs, failure to meet the expected objectives, and delays in delivery dates. For these reasons, great effort must be devoted in requirements engineering to ensuring that a project's requirements are of high quality. One aim of this discipline is the automatic processing of requirements to assess their quality; this is a complex task, however, because the quality of requirements depends largely on the interpretation of experts and on the needs and demands of the project at hand.

Objective

The objective of this paper is to assess the quality of requirements automatically, emulating the assessment that a project's quality expert would make.

Method

The proposed methodology is based on learning from standard metrics that represent the characteristics an expert takes into consideration when judging requirements to be of good or bad quality. Using machine learning techniques, a classifier is trained on requirements previously classified by the expert and is then used to classify newly provided requirements.

Results

We present two representations of the methodology, corresponding to two balancing situations of the requirements learning corpus, and evaluate both representations in terms of accuracy and efficiency. The paper demonstrates the reliability of the methodology in a case study with requirements provided by the Requirements Working Group of the INCOSE organization.

Conclusions

A methodology is presented that evaluates the quality of requirements written in natural language, emulating the quality assessment an expert would provide for new requirements, with an average accuracy of 86.1%.

11.
Disks are an important medium for storing data, so improving disk reliability and data availability is of great significance. Modern disks generally support the SMART protocol, which monitors a disk's internal working state. We apply machine learning methods to analyse disk SMART information and thereby predict disk failures. The methods used are back-propagation neural networks, decision trees, support vector machines, and naive Bayes, validated and analysed on real disk SMART data. On these data, the effectiveness of the different machine learning methods is compared. The results show that the decision tree achieves the best prediction rate, while the support vector machine has the lowest false alarm rate.

12.
Software effort estimation accuracy is a key factor in effectively planning, controlling, and delivering a successful software project within budget and schedule. Both overestimation and underestimation are key challenges for future software development, hence there is a continuous need for greater accuracy in software effort estimation. Researchers and practitioners are striving to identify which machine learning estimation technique gives more accurate results based on evaluation measures, datasets, and other relevant attributes, yet authors of related research are often unaware of previously published results of machine learning effort estimation techniques. The main aim of this study is to help researchers determine which machine learning techniques yield promising effort estimation accuracy for software development. In this article, the performance of ensemble and solo machine learning techniques is investigated on publicly and non-publicly available datasets using the two most commonly used accuracy evaluation metrics. We used the systematic literature review methodology proposed by Kitchenham and Charters, which includes searching for the most relevant papers, applying quality assessment (QA) criteria, extracting data, and drawing results. We evaluated the state-of-the-art accuracy performance of 35 selected studies (17 ensemble, 18 solo) using the mean magnitude of relative error and PRED(25) as a set of reliable accuracy metrics, in order to answer the research questions stated in this study. We found that machine learning techniques are the most frequently used building blocks of ensemble effort estimation (EEE) techniques. The results of this study revealed that EEE techniques usually yield more promising estimation accuracy than solo techniques.
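The two accuracy metrics used in the review are straightforward to compute from actual and estimated effort values; a minimal sketch:

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def pred(actual, predicted, level=0.25):
    """PRED(25): fraction of estimates whose relative error is at most 25%."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)
```

Lower MMRE and higher PRED(25) both indicate better estimation accuracy, which is why the two are usually reported together.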

13.
A survey of the status and development of customer churn prediction
Based on the historical development of customer churn prediction research and its degree of intelligence, this survey divides the field into three stages: prediction methods based on traditional statistics, methods based on artificial intelligence, and methods based on statistical learning theory. By analysing the problems remaining at each stage, directions for future research are proposed.

14.

Context

Software defect prediction studies usually build models using within-company data; very few have focused on prediction models trained with cross-company data. Models built on within-company data are difficult to employ in practice when local data repositories are lacking. Recently, transfer learning has attracted increasing attention for building classifiers in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?

Objective

In this paper, we consider the cross-company defect prediction scenario, in which source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a fast and highly effective prediction model.

Method

Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB) that uses the information of all the relevant features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into weights on the training data. The defect prediction model is then built on these weighted data.
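A sketch of the weighting step, based on our reading of TNB (details may differ from the paper's implementation): for each training instance, count how many of its attribute values fall inside the [min, max] range observed in the cross-company test data, and turn that similarity s into a weight s / (k - s + 1)^2, where k is the number of attributes. The weighted instances then feed a weighted Naive Bayes model, which is not shown here.

```python
def tnb_weights(train, test):
    """Weight each training instance by how many of its k attribute values
    fall within the per-attribute [min, max] range of the test data."""
    k = len(train[0])
    lo = [min(col) for col in zip(*test)]
    hi = [max(col) for col in zip(*test)]
    weights = []
    for x in train:
        s = sum(1 for j, v in enumerate(x) if lo[j] <= v <= hi[j])
        weights.append(s / (k - s + 1) ** 2)
    return weights
```

Training instances that look nothing like the target company's data end up with weight close to zero, so they barely influence the model.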

Results

This article presents a theoretical analysis of the compared methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and requires less runtime than state-of-the-art methods.

Conclusion

We conclude that when there are too few local training data to train good classifiers, useful knowledge from training data with a different distribution can help at the feature level. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce the cost and increase the effectiveness of the software testing process.

15.
Accurate churn prediction helps an enterprise improve customer retention, grow its user base, and increase profit. Most existing churn prediction models are single models or simple fusions of several models, which do not fully exploit the advantages of multi-model ensembles. Borrowing the bootstrap-sampling idea of random forests, we propose an improved Stacking ensemble method and apply it to churn prediction on a real data set. Experiments on a validation set show that the proposed method outperforms classic Stacking ensembles of the same structure on churn F1 score, recall, and prediction accuracy; with an appropriate ensemble structure, it can even exceed the best performance of its base classifiers.
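The core mechanism (each base learner of a Stacking ensemble fitted on a bootstrap resample, its predictions becoming meta-features for a second-level model) can be sketched as below. The `fit`/`predict` callbacks and the omitted meta-learner step are placeholders, not the paper's design.

```python
import random

def bootstrap_stacking_features(X, y, learners, fit, predict, seed=0):
    """Fit each base learner on a bootstrap resample of (X, y) and return,
    per training instance, the vector of base-learner predictions that a
    second-level (meta) model would be trained on."""
    rng = random.Random(seed)
    n = len(X)
    meta = []
    for learner in learners:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = fit(learner, [X[i] for i in idx], [y[i] for i in idx])
        meta.append([predict(model, x) for x in X])
    # transpose: row i holds every base learner's prediction for instance i
    return [list(col) for col in zip(*meta)]
```

Resampling differently per base learner decorrelates the learners, which is the random-forest idea the paper borrows.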

16.
Large-area land-cover monitoring scenarios involving large volumes of data are becoming more prevalent in remote sensing applications, so there is a pressing need for increased automation in the change-mapping process. The objective of this research is to compare the performance of three machine learning algorithms (MLAs), namely two classification tree software routines (S-Plus and C4.5) and an artificial neural network (ARTMAP), in the context of mapping land-cover modifications in northern and southern California study sites between 1990/91 and 1996. Comparisons were based on several criteria: overall accuracy, sensitivity to data set size and variation, and noise. ARTMAP produced the most accurate maps overall (about 84%) for the two study areas and was the most resistant to training data deficiencies. The change map generated using ARTMAP has accuracy similar to a human-interpreted map produced by the U.S. Forest Service for the southern study area. ARTMAP appears to be robust and accurate for automated, large-area change monitoring, as it performed equally well across the diverse study areas with minimal human intervention in the classification process.

17.
To address the limitations of data mining methods in telecom customer churn prediction, we propose combining information fusion with data mining, building churn prediction models at the data, feature, and decision levels. Churn prediction indicators are first determined; customers are then partitioned according to differences in their distribution in feature space, yielding customer groups with distinct characteristics. Different algorithms are used to build churn prediction models for the different groups, fusion weights for the models are obtained with an ant colony algorithm, and the final prediction is the weighted combination of each model's output. Experimental results show that the information-fusion-based churn prediction model indeed outperforms traditional models.
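The decision-level step is a weighted combination of the per-group models' churn scores. In the paper the weights are learned with an ant colony algorithm; in this sketch they are simply given, and the combination is a weighted average (an assumption about the exact fusion rule).

```python
def fuse(probabilities, weights):
    """probabilities: one list of per-customer churn scores per model.
    Returns the weight-normalised weighted average per customer."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, probs)) / total
            for probs in zip(*probabilities)]
```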

18.
To build a successful customer churn prediction model, a classification algorithm should be chosen that fulfills two requirements: strong classification performance and a high level of model interpretability. In recent literature, ensemble classifiers have demonstrated superior performance in a multitude of applications and data mining contests. However, their increased complexity results in models that are often difficult to interpret. In this study, GAMensPlus, an ensemble classifier based on generalized additive models (GAMs) that reconciles performance and interpretability, is presented and evaluated in the context of churn prediction modeling. The recently proposed GAMens, based on bagging, the random subspace method, and semi-parametric GAMs as constituent classifiers, is extended with two instruments for model interpretability: generalized feature importance scores and bootstrap confidence bands for smoothing splines. In an experimental comparison on data sets from six real-life churn prediction projects, the competitive performance of the proposed algorithm against a set of well-known benchmark algorithms is demonstrated in terms of four evaluation metrics. Further, the technique's ability to deliver valuable insight into the drivers of customer churn is illustrated in a case study on data from a European bank. First, it is shown how the generalized feature importance scores allow the analyst to identify the relative importance of churn predictors as a function of the criterion used to measure the quality of the model's predictions. Second, the ability of GAMensPlus to identify nonlinear relationships between predictors and churn probabilities is demonstrated.
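Criterion-dependent feature importance is commonly computed by permutation: shuffle one feature column at a time and measure how much the chosen quality criterion drops. GAMensPlus derives its generalized importance scores differently (from the ensemble's constituent GAMs), so the sketch below only conveys the general idea of scoring predictors against a pluggable criterion.

```python
import random

def permutation_importance(X, y, predict, criterion, seed=0):
    """Score each feature as the drop in `criterion` (e.g. accuracy)
    when that feature's column is shuffled across instances."""
    rng = random.Random(seed)
    base = criterion([predict(x) for x in X], y)
    scores = []
    for j in range(len(X[0])):
        col = [x[j] for x in X]
        rng.shuffle(col)
        Xp = [list(x[:j]) + [col[i]] + list(x[j + 1:]) for i, x in enumerate(X)]
        scores.append(base - criterion([predict(x) for x in Xp], y))
    return scores
```

Swapping in a different criterion (AUC, top-decile lift, ...) changes the ranking, which mirrors the paper's point that importance depends on how prediction quality is measured.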

19.
A simple support vector machine (SSVM) is applied to customer churn prediction in order to improve the predictive ability of machine learning methods. In a case study on churn prediction for a foreign telecom company, the method is compared with the nearest neighbor algorithm (NPA): it achieves accuracy close to NPA's while requiring far less time, with a far smaller growth in time, making it an effective method for studying customer churn prediction.

20.
Accurate prediction of coal and gas outbursts is a necessary precondition and guarantee for safe coal mine production. To improve the accuracy of coal and gas outburst prediction models, an improved extreme learning machine (ELM) prediction model is proposed. First, kernel principal component analysis (KPCA) is used to reduce the dimensionality of the outburst-related indicators and extract principal component sequences from the indicator data. The sequences are divided into training and validation samples. In the training stage, a memetic algorithm combining global and local search uses the training samples to optimise the input weights and hidden-layer biases of the ELM, yielding the best prediction model. Finally, the validation samples are fed to the best model to predict outburst intensity. A case study shows that the model can effectively predict coal and gas outburst intensity, with higher prediction accuracy than BP, SVM, ELM, and KPCA-ELM models.
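For reference, a minimal ELM fits in a few lines: random input weights and hidden-layer biases, a tanh hidden layer, and output weights solved by least squares. The memetic optimisation of the input weights and the KPCA preprocessing described above are deliberately omitted; this is only the baseline the paper improves on.

```python
import numpy as np

def elm_fit(X, y, hidden=20, seed=0):
    """Baseline ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))   # random input weights
    b = rng.normal(size=hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                      # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Because only the output weights are trained, fitting is a single linear solve, which is what makes ELM fast and why the paper optimises the random input weights separately.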
