Similar literature
20 similar documents found.
1.
This paper presents a novel rough-based feature selection method for gene expression data analysis. It can find the relevant features without requiring the number of clusters to be known a priori and identify centers that approximate the correct ones. We introduce a prediction scheme that combines the rough-based feature selection method with a radial basis function neural network. To further examine the effect of different feature selection methods and classifiers on this prediction process, we use naive Bayes and the linear support vector machine as classifiers, and compare the performance with other feature selection methods, including information gain and principal component analysis. We demonstrate the performance on several published datasets, and the results show that the proposed method can achieve a high classification accuracy.

2.
Automobile users experiencing soft failures often delay reporting warranty claims until the coverage is about to expire. This results in a customer rush near the warranty expiration limit, producing ‘spikes’ in warranty claims towards the end of the warranty period and thereby introducing a bias into the dataset. At the same time, manufacturing/assembly defects, in addition to usage-related failures, lead to ‘spikes’ in warranty claims near the beginning of the warranty period. When such data are used to capture field failures for feedback on product quality/reliability, product or reliability engineers may obtain a distorted picture of reality. Although several authors of reliability studies based on automobile warranty data have addressed the well-recognized issues of incomplete and unclean warranty data, the issue of ‘spikes’ has not received much attention. In this article, we address the issue of ‘spikes’ in the presence of incomplete and unclean warranty data and provide a methodology to arrive at component-level empirical hazard plots from such automobile warranty data.

3.
Warranty data are a rich source of information for feedback on product reliability. However, two-dimensional automobile warranties that include both time and mileage limits pose two interesting and challenging problems in reliability studies. First, warranty data are restricted to the failures reported within warranty coverage, and such incompleteness can lead to inaccurate estimates of the field failure rate or hazard rate. Second, factors such as inexact time/mileage data and vaguely reported failures in a warranty claim make warranty data unclean, which can suppress the inherent failure pattern. In this paper we discuss two parameter estimation methods that address the incompleteness issue. We use a simulation-based experiment to study these estimation methods under departures from normality and varying amounts of truncation. Using a life cycle model of the vehicle, we also highlight and explore issues that lead to warranty data not being very clean. We then propose a five-step methodology to arrive at meaningful component-level empirical hazard plots from incomplete and unclean warranty data.
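To make the term "empirical hazard" concrete, the following is a minimal sketch of a plain life-table estimate: claims in a month divided by the units still at risk at the start of that month. It is not the five-step methodology of the abstract and applies none of its corrections for incompleteness or unclean data. The function name `empirical_hazard`, its parameters, and the choice to drop a unit from the risk set after its first claim are assumptions made for illustration.

```python
def empirical_hazard(claims_by_month, units_at_start, withdrawals_by_month=None):
    """Life-table style hazard estimate per month in service.

    Assumption (not from the abstract): a unit leaves the risk set after
    its first claim, or when it is withdrawn (for example, it crosses the
    mileage limit and is no longer observable).
    """
    if withdrawals_by_month is None:
        withdrawals_by_month = [0] * len(claims_by_month)

    at_risk = units_at_start
    hazards = []
    for claims, withdrawn in zip(claims_by_month, withdrawals_by_month):
        # Hazard for this month: claims relative to units still observable.
        hazards.append(claims / at_risk if at_risk > 0 else 0.0)
        at_risk -= claims + withdrawn
    return hazards
```

Passing a non-empty `withdrawals_by_month` shows how units leaving coverage early shrink the risk set, which is one reason a naive estimate from raw claim counts can mislead.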

4.
This paper proposes a general mixture model framework for automobile warranty data that includes parameters for product field performance, the manufacturing and assembly process, and dealer preparation process. The model fits warranty claims as a mixture of manufacturing or assembly defects (quality problems) and usage related failures (reliability problems). The model also estimates the fraction of vehicles containing a manufacturing or assembly defect when leaving the assembly plant. This parameter measures the quality of the entire vehicle production process, i.e. component manufacturing and final assembly. The model also measures the proportion of manufacturing or assembly defects repaired by the automobile dealer prior to customer delivery. This conditional probability quantifies the ability of the vehicle preparation process to identify and repair defects prior to customer delivery. To apply the model to field failure or warranty data, the practitioner must identify parametric distributions for each of the two failure processes. To demonstrate the model, this paper develops a Weibull-Uniform mixture for manufacturer supplied warranty claim data.
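As a rough illustration of a two-component mixture, the sketch below evaluates a mixture cumulative distribution function built from a uniform and a Weibull component. The abstract does not say which distribution models which failure process or how the mixture is parameterised. Assigning the uniform component to defect-related claims, and the names `p_defect`, `upper`, `shape`, and `scale`, are assumptions made here, not the paper's specification.

```python
import math


def weibull_cdf(t, shape, scale):
    """Weibull CDF: 1 - exp(-(t / scale) ** shape) for t > 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def uniform_cdf(t, upper):
    """CDF of a uniform distribution on [0, upper]."""
    if t <= 0:
        return 0.0
    return min(t / upper, 1.0)


def mixture_cdf(t, p_defect, upper, shape, scale):
    """Probability of a claim by time t under the assumed mixture.

    p_defect is the (hypothetical) mixing weight of the uniform component;
    the remaining weight goes to the Weibull component.
    """
    return (p_defect * uniform_cdf(t, upper)
            + (1.0 - p_defect) * weibull_cdf(t, shape, scale))
```

With a small `upper`, the uniform component concentrates its probability early in service, which is one simple way to represent an excess of early claims on top of a usage-driven Weibull curve.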

5.
In 2018, 1.76 million people worldwide died of lung cancer. Most of these deaths are due to late diagnosis, and early-stage diagnosis significantly increases the likelihood of a successful treatment for lung cancer. Machine learning is a branch of artificial intelligence that allows computers to quickly identify patterns within complex and large datasets by learning from existing data. Machine-learning techniques have been improving rapidly and are increasingly used by medical professionals for the successful classification and diagnosis of early-stage disease. They are widely used in cancer diagnosis. In particular, machine learning has been used in the diagnosis of lung cancer due to the benefits it offers doctors and patients. In this context, we performed a study on machine-learning techniques to increase the classification accuracy of lung cancer with 32 × 56 sized numerical data from the Machine Learning Repository web site of the University of California, Irvine. In this study, the precision of the classification model was increased by the effective employment of pre-processing methods instead of direct use of classification algorithms. Nine datasets were derived with pre-processing methods and six machine-learning classification methods were used to achieve this improvement. The study results suggest that the accuracy of the k-nearest neighbors algorithm is superior to random forest, naïve Bayes, logistic regression, decision tree, and support vector machines. The performance of pre-processing methods was assessed on the lung cancer dataset. The most successful pre-processing methods were Z-score (83% accuracy) for normalization methods, principal component analysis (87% accuracy) for dimensionality reduction methods, and information gain (71% accuracy) for feature selection methods.
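For readers unfamiliar with the Z-score normalization mentioned above, here is a minimal sketch of the standard transformation applied to a single feature column. The function name `z_score` is illustrative, and using the population standard deviation is an assumption, since the abstract does not say which variant the study used.

```python
from statistics import mean, pstdev


def z_score(column):
    """Rescale one numeric feature column to zero mean and unit variance.

    Uses the population standard deviation (an assumption). A constant
    column is mapped to zeros to avoid dividing by zero.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]
```

Distance-based classifiers such as k-nearest neighbors are sensitive to feature scale, which is a plausible reason why normalization changes their accuracy.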

6.
Warranty data contain valuable information on product field reliability and customer behaviors. Most previous studies on analysis of warranty data implicitly assume that all failures within the warranty period are reported and recorded. However, the failed-but-not-reported (FBNR) phenomenon is quite common for a product whose price is not very high. Ignoring the FBNR phenomenon leads to an overestimate of product reliability based on field return data or an overestimate of warranty cost based on lab data or tracking data. Being an indicator of customer satisfaction, the FBNR proportion provides valuable managerial insights. In this study, statistical inference for the FBNR phenomenon as well as the field lifetime distribution is described. We first propose a flexible FBNR function to model the time-dependent FBNR behavior. Then, a framework for data analysis is developed. In the framework, both semiparametric and parametric approaches are used to jointly analyze warranty claim data and supplementary tracking data from a follow-up of selected customers. The FBNR problem in the tracking data is minimal and thus the data can be used to effectively decouple the FBNR information from the warranty claim data. The proposed methods are illustrated with an example. Supplementary materials for this article are available online.

7.
Metabolomics experiments involve the simultaneous detection of a high number of metabolites, leading to large multivariate datasets, and computer-based applications are required to extract relevant biological information. A high-throughput metabolic fingerprinting approach based on ultra performance liquid chromatography (UPLC) and high resolution time-of-flight (TOF) mass spectrometry (MS) was developed for the detection of wound biomarkers in the model plant Arabidopsis thaliana. High-dimensional data were generated and analysed with chemometric methods. Machine learning classification algorithms are also promising tools for deciphering complex metabolic phenotypes, but their application remains scarcely reported in this research field. The present work proposes a comparative evaluation of a set of diverse machine learning schemes in the context of metabolomic data with respect to their ability to provide a deeper insight into the metabolite network involved in the wound response. Standalone classifiers, i.e. J48 (decision tree), kNN (instance-based learner), SMO (support vector machine), multilayer perceptron and RBF network (neural networks) and Naive Bayes (probabilistic method), or combinations of classification and feature selection algorithms, such as Information Gain, RELIEF-F, Correlation Feature-based Selection and SVM-based methods, are concurrently assessed, and cross-validation resampling procedures are used to avoid overfitting. This study demonstrates that machine learning methods represent valuable tools for the analysis of UPLC-TOF/MS metabolomic data. In addition, remarkable performance was achieved, while the models' stability showed their robustness and interpretability potential. The results drew attention to both temporal and spatial metabolic patterns in the context of stress signalling and highlighted relevant biomarkers not evidenced with standard data treatment.

8.
Warranty claims are not always due to product failures. They can also be caused by two types of human factors. On the one hand, consumers might claim warranty due to misuse and/or failures caused by various human factors. Such claims might account for more than 10% of all reported claims. On the other hand, consumers might not be bothered to claim warranty for failed items that are still under warranty, or they may claim warranty after they have experienced several intermittent failures. These two types of human factors can affect warranty claim costs. However, research in this area has received rather little attention. In this paper, we propose three models to estimate the expected warranty cost when the two types of human factors are included. We consider two types of failures: intermittent and fatal failures, which might result in different claim patterns. Consumers might report claims after a fatal failure has occurred, and upon intermittent failures they might report claims after a number of failures have occurred. Numerical examples are given to validate the results derived.

9.
《中国工程学刊》2012,35(1):80-92
Using machine learning algorithms for early prediction of the signs and symptoms of breast cancer is in demand nowadays. One of these algorithms is the K-nearest neighbor (KNN), which uses a technique for measuring the distance among data. The performance of KNN depends on the number of neighboring elements known as the K value. This study involves the exploration of KNN performance by using various distance functions and K values to find an effective KNN. Wisconsin breast cancer (WBC) and Wisconsin diagnostic breast cancer (WDBC) datasets from the UC Irvine machine learning repository were used as our main data sources. Experiments with each dataset were composed of three iterations. The first iteration of the experiment was without feature selection. The second one was the L1-norm based selection from the model, which used the linear support vector classifier feature selection, and the third iteration was with Chi-square-based feature selection. Numerous evaluation metrics like accuracy, receiver operating characteristic (ROC) curve with the area under curve (AUC) and sensitivity, etc., were used for the assessment of the implemented techniques. The results indicated that the technique involving the Chi-square-based feature selection achieved the highest accuracy with the Canberra or Manhattan distance functions for both datasets. The optimal K values for these distance functions ranged from 1 to 9. This study indicated that with the appropriate selection of the K value and a distance function in KNN, the Chi-square-based feature selection for the WBC datasets gives the highest accuracy rate as compared with the existing models.

Abbreviations: KNN: K-nearest neighbor; Chi2: Chi-square; WBC: Wisconsin breast cancer
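The role of the distance function and the K value in KNN can be sketched with a small, dependency-free implementation. This is an illustrative sketch, not the study's code. The names `manhattan`, `canberra`, and `knn_predict` are chosen here, the treatment of a 0/0 term in the Canberra distance as zero is a common convention assumed for this sketch, and feature selection is left out entirely.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Canberra distance: sum of |x - y| / (|x| + |y|) over coordinates."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:  # assumed convention: a 0/0 term contributes nothing
            total += abs(x - y) / denom
    return total


def knn_predict(train_X, train_y, query, k=3, distance=manhattan):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Swapping `distance=canberra` for the default, or varying `k`, changes only which neighbours vote. That is the search space the abstract describes exploring.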

10.
The mechanisms used to understand and reduce warranty costs often focus exclusively on the analysis of product failures. However, warranty costs can also be incurred by events such as support calls that do not involve a product failure. We describe a method used by a major electronics manufacturer to understand warranty costs by modeling warranty events. Furthermore, event modeling that uses a time‐dependent warranty event rate instead of the more standard average rate of failure allows the improved prediction of warranty costs and better accruals. As a result of the modeling, warranty engineers can help product managers more accurately predict the costs associated with removing certain warranty events, changing warranty policies and offering extended warranties for their electronic products. Copyright © 2003 John Wiley & Sons, Ltd.

11.
An automobile with over 7000 parts is a highly complex product. In spite of employing the best quality and reliability practices during product development, manufacturing, and assembly, unexpected failures during the warranty period do occur and cost automobile companies billions of dollars annually in warranty alone. Warranty coverage for an automobile is generally stated in terms of mileage (in miles) and time (in months or years). The coverage expires when either of the two limits is crossed. Any change in warranty coverage also influences warranty cost significantly. However, changes made to warranty coverage are often market driven. In either case, a company needs to plan for maintaining a large cash reserve to pay for the warranty services on its products. In this paper, we present a simple method to assess the impact of new time/mileage warranty limits on the number and cost of warranty claims for components/sub-systems of a new product. We highlight the use of mileage accumulation rates of a population of vehicles to arrive at claims per thousand vehicles sold with new time/mileage warranty limits. We also discuss the bias in warranty cost estimates that may result from using cumulative cost per repair information. We recommend the use of incremental cost per repair, especially when populations with different mileage accumulation rates are under consideration. Application examples are included to illustrate the use of the proposed methodology.
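The rule that coverage ends when either limit is crossed can be illustrated with a short sketch. It assumes each vehicle accumulates mileage at a constant rate in miles per month, which is a simplification introduced here. The function names and the example limits are hypothetical and are not taken from the paper.

```python
def months_covered(time_limit_months, mileage_limit, miles_per_month):
    """Months in service until a vehicle leaves two-dimensional coverage.

    Assumes a constant mileage accumulation rate (a simplification).
    Coverage ends at whichever limit is reached first.
    """
    if miles_per_month <= 0:
        return float(time_limit_months)
    return min(float(time_limit_months), mileage_limit / miles_per_month)


def fraction_still_covered(rates, month, time_limit_months, mileage_limit):
    """Share of a vehicle population still under warranty at a given month."""
    covered = sum(
        1 for rate in rates
        if months_covered(time_limit_months, mileage_limit, rate) >= month
    )
    return covered / len(rates)
```

For example, under a hypothetical 36-month/36,000-mile policy, a vehicle driven 2,000 miles per month leaves coverage after 18 months. This is why the distribution of accumulation rates matters when estimating claims per thousand vehicles under new limits.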

12.
Prediction of cardiovascular disease (CVD) is a critical challenge in the area of clinical data analysis. In this study, an efficient heart disease prediction is developed based on optimal feature selection. Initially, the data pre‐processing process is performed using data cleaning, data transformation, missing values imputation, and data normalisation. Then the decision function‐based chaotic salp swarm (DFCSS) algorithm is used to select the optimal features in the feature selection process. Then the chosen attributes are given to the improved Elman neural network (IENN) for data classification. Here, the sailfish optimisation (SFO) algorithm is used to compute the optimal weight value of IENN. The combination of DFCSS–IENN‐based SFO (IESFO) algorithm effectively predicts heart disease. The proposed (DFCSS–IESFO) approach is implemented in the Python environment using two different datasets such as the University of California Irvine (UCI) Cleveland heart disease dataset and CVD dataset. The simulation results proved that the proposed scheme achieved a high‐classification accuracy of 98.7% for the CVD dataset and 98% for the UCI dataset compared to other classifiers, such as support vector machine, K‐nearest neighbour, Elman neural network, Gaussian Naive Bayes, logistic regression, random forest, and decision tree.
Inspec keywords: cardiovascular system, medical diagnostic computing, feature extraction, regression analysis, data mining, learning (artificial intelligence), Bayes methods, neural nets, support vector machines, diseases, pattern classification, data handling, decision trees, cardiology, data analysis, feature selection
Other keywords: efficient heart disease prediction‐based, optimal feature selection, improved Elman‐SFO, cardiovascular disease, clinical data analysis, data pre‐processing process, data cleaning, data transformation, values imputation, data normalisation, decision function‐based chaotic salp swarm algorithm, optimal features, feature selection process, improved Elman neural network, data classification, sailfish optimisation algorithm, optimal weight value, DFCSS–IENN‐based SFO algorithm, DFCSS–IESFO, California Irvine Cleveland heart disease dataset, CVD dataset, high‐classification accuracy

13.
Nowadays, the amount of web data is increasing rapidly, which presents a serious challenge to web monitoring. Text sentiment analysis, an important research topic in the area of natural language processing, is a crucial task in web monitoring. The accuracy of traditional text sentiment analysis methods might be degraded when dealing with mass data. Deep learning has been a hot research topic in artificial intelligence in recent years. By now, several research groups have studied the sentiment analysis of English texts using deep learning methods. In contrast, relatively few works have so far considered Chinese text sentiment analysis in this direction. In this paper, a method for analyzing Chinese text sentiment is proposed based on the convolutional neural network (CNN) in order to improve the analysis accuracy. The feature values of the CNN after the training process are nonuniformly distributed. To overcome this problem, a method for normalizing the feature values is proposed. Moreover, the dimensions of the text features are optimized through simulations. Finally, a method for updating the learning rate in the training process of the CNN is presented in order to achieve better performance. Experimental results on typical datasets indicate that the proposed method achieves higher accuracy than traditional supervised machine learning methods, e.g., the support vector machine method.

14.
《技术计量学》2013,55(2):148-159
Assessment of risk due to product failure is important both for purposes of finance (e.g., warranty costs) and safety (e.g., potential loss of human life). In many applications a prediction of the number of future failures is an important input to such an assessment.

Usually the field-data response used to make predictions of future failures is the number of weeks (or another unit of real time) in service. Use-rate information usually is not available (automobile warranty data are an exception, where both weeks in service and number of miles driven are available for units returned for warranty repair). With new technology, however, sensors and smart chips are being installed in many modern products ranging from computers and printers to automobiles and aircraft engines. Thus the coming generations of field data for many products will provide information on how the product was used and the environment in which it was used. This article was motivated by the need to predict warranty returns for a product with multiple failure modes. For this product, cycles-to-failure/use-rate information was available for those units that were connected to the network. We show how to use a cycles-to-failure model to compute predictions and prediction intervals for the number of warranty returns. We also present prediction methods for units not connected to the network. To provide insight into the reasons that use-rate models provide better predictions, we also present a comparison of asymptotic variances comparing the cycles-to-failure and time-to-failure models. This article has supplementary material online.

15.
Time series classification (TSC) has attracted considerable attention in the machine learning and data mining community and has many successful applications, such as fault detection and product identification in the process of building a smart factory. However, achieving both efficiency and accuracy in classification remains challenging because time series are complex and multi-dimensional. This paper presents a new approach for time series classification based on convolutional neural networks (CNN). The proposed method contains three parts: short-time gap feature extraction, multi-scale local feature learning, and global feature learning. In the process of short-time gap feature extraction, large kernel filters are employed to extract the features within the short-time gap from the raw time series. Then, a multi-scale feature extraction technique is applied in the process of multi-scale local feature learning to obtain detailed representations. A global convolution operation with a very large stride is used to obtain a robust and global feature representation. The comprehensive features used for classification are a fusion of short-time gap feature representations, local multi-scale feature representations, and global feature representations. To test the efficiency of the proposed method, named multi-scale feature fusion convolutional neural networks (MSFFCNN), we designed and trained MSFFCNN on several public sensor, device, and simulated control time series data sets. The comparative studies indicate that the proposed MSFFCNN outperforms the alternatives, and we also provide a detailed analysis of the proposed MSFFCNN.

16.
Warranty claims and supplementary data contain useful information about product quality and reliability. Analysing such data can therefore be of benefit to manufacturers in identifying early warnings of abnormalities in their products, providing useful information about failure modes to aid design modification, estimating product reliability for deciding on warranty policy and forecasting future warranty claims needed for preparing fiscal plans. In the last two decades, considerable research has been conducted in warranty data analysis (WDA) from several different perspectives. This article attempts to summarise and review the research and developments in WDA with emphasis on models, methods and applications. It concludes with a brief discussion on current practices and possible future trends in WDA. Copyright © 2012 John Wiley & Sons, Ltd.

17.
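Applying intrusion detection algorithms directly to raw data makes intrusion detection analysis very inefficient. To address this problem, an intrusion detection method based on principal component analysis is proposed. The method extracts the relevant information from network connections and decodes it, compares the decoded connection records with known connection record data to identify changes in the records and the principal components of the record distribution, and finally combines machine learning methods with principal component analysis to perform intrusion detection. Experimental results show that, when applied to various KDD99 intrusion detection datasets, the method effectively reduces learning time, shrinks the representation space of the datasets, and improves intrusion detection efficiency.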

18.
This paper proposes a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike the SVM-RFE method, at each step, the proposed approach computes the feature ranking score from a statistical analysis of weight vectors of multiple linear SVMs trained on subsamples of the original training data. We tested the proposed method on four gene expression datasets for cancer classification. The results show that the proposed feature selection method selects better gene subsets than the original SVM-RFE and improves the classification accuracy. A Gene Ontology-based similarity assessment indicates that the selected subsets are functionally diverse, further validating our gene selection method. This investigation also suggests that, for gene expression-based cancer classification, average test error from multiple partitions of training and test sets can be recommended as a reference of performance quality.

19.
Lung cancer is a leading cause of cancer‐related death worldwide. Early diagnosis of cancer has been shown to be greatly helpful for curing the disease effectively. Microarray technology provides a promising approach to exploiting gene profiles for cancer diagnosis. In this study, the authors propose a gene expression programming (GEP)‐based model to predict lung cancer from microarray data. The authors use two gene selection methods to extract the significant lung cancer related genes, and accordingly propose different GEP‐based prediction models. Prediction performance evaluations and comparisons between the authors’ GEP models and three representative machine learning methods, support vector machine, multi‐layer perceptron and radial basis function neural network, were conducted thoroughly on real microarray lung cancer datasets. Reliability was assessed by cross‐data set validation. The experimental results show that the GEP model using fewer feature genes outperformed the other models in terms of accuracy, sensitivity, specificity and area under the receiver operating characteristic curve. It is concluded that the GEP model is a better solution to lung cancer prediction problems.
Inspec keywords: lung, cancer, medical diagnostic computing, patient diagnosis, genetic algorithms, feature selection, learning (artificial intelligence), support vector machines, multilayer perceptrons, radial basis function networks, reliability, sensitivity analysis
Other keywords: lung cancer prediction, cancer‐related death, cancer diagnosis, gene profiles, gene expression programming‐based model, gene selection, GEP‐based prediction models, prediction performance evaluations, representative machine learning methods, support vector machine, multilayer perceptron, radial basis function neural network, real microarray lung cancer datasets, cross‐data set validation, reliability, receiver operating characteristic curve

20.
Burn-in and preventive maintenance (PM) are effective approaches to reduce the number of warranty claims and warranty cost during post-sale support. With harsher burn-in settings, early product defects can be removed, but at the same time product degradation is accelerated and more wear-out failures may be introduced. PM actions within warranty alleviate these negative effects. This paper proposes an optimal burn-in strategy for repairable products sold with a two-dimensional base warranty (BW) and an optional extended warranty (EW). Both performance-based and cost-based models incorporating PMs are developed to obtain optimal burn-in settings, including the burn-in duration and the burn-in usage rate, so as to minimise the expected number of warranty claims and total cost respectively. The impacts of different accelerated coefficients and PM degrees on the optimal burn-in strategy are analysed. In view of the performance and cost structures, we conduct numerical examples to illustrate the applicability of the proposed models. Practical implications from a sensitivity analysis for key parameters are also elaborated.
