13 similar documents found (search time: 0 ms)
1.
A comprehensive empirical evaluation of missing value imputation in noisy software measurement data    Cited by: 1 (self-citations: 0; citations by others: 1)
Jason Van Hulse  Journal of Systems and Software, 2008, 81(5): 691-708
The handling of missing values is a topic of growing interest in the software quality modeling domain. Data values may be absent from a dataset for numerous reasons, for example, the inability to measure certain attributes. As software engineering datasets are sometimes small in size, discarding observations (or program modules) with incomplete data is usually not desirable. Deleting data from a dataset can result in a significant loss of potentially valuable information. This is especially true when the missing data is located in an attribute that measures the quality of the program module, such as the number of faults observed in the program module during testing and after release. We present a comprehensive experimental analysis of five commonly used imputation techniques. This work also considers three different mechanisms governing the distribution of missing values in a dataset, and examines the impact of noise on the imputation process. To our knowledge, this is the first study to thoroughly evaluate the relationship between data quality and imputation. Further, our work is unique in that it employs a software engineering expert to oversee the evaluation of all of the procedures and to ensure that the results are not inadvertently influenced by poor quality data. Based on a comprehensive set of carefully controlled experiments, we conclude that Bayesian multiple imputation and regression imputation are the most effective techniques, while mean imputation performs extremely poorly. Although a preliminary evaluation has been conducted using Bayesian multiple imputation in the empirical software engineering domain, this is the first work to provide a thorough and detailed analysis of this technique. Our studies also demonstrate conclusively that the presence of noisy data has a dramatic impact on the effectiveness of imputation techniques.
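The gap between the techniques this study ranks can be seen on a toy dataset. The sketch below (NumPy only, with invented data, not the software measurement datasets used in the paper) contrasts mean imputation with regression imputation on a target column whose values are missing at random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 fully observed predictors and a target-like column with
# ~30% of its values missing completely at random (all invented).
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
mask = rng.random(n) < 0.3
y_obs = y.copy()
y_obs[mask] = np.nan

# Mean imputation: every missing entry gets the observed mean.
y_mean = y_obs.copy()
y_mean[mask] = np.nanmean(y_obs)

# Regression imputation: fit least squares on complete cases,
# then predict the missing entries from the observed predictors.
obs = ~mask
beta, *_ = np.linalg.lstsq(X[obs], y_obs[obs], rcond=None)
y_reg = y_obs.copy()
y_reg[mask] = X[mask] @ beta

rmse = lambda a: float(np.sqrt(np.mean((a[mask] - y[mask]) ** 2)))
print(rmse(y_mean), rmse(y_reg))
```

Mean imputation collapses all missing entries to a single value and destroys the attribute's relationship with the predictors, which is consistent with the poor performance the study reports for it.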
2.
Neural networks for the identification and control of blast furnace hot metal quality    Cited by: 8 (self-citations: 0; citations by others: 8)
The operation and control of blast furnaces poses a great challenge because of the difficult measurement and control problems associated with the unit. The measurement of hot metal composition with respect to silicon and sulfur is critical to the economic operation of blast furnaces, yet it requires spectrographic techniques that can be performed only off-line. An alternative technique for measuring these variables is a soft sensor based on neural networks. In the present work, a neural-network-based model has been developed and trained relating the output variables to a set of thirty-three process variables. The output variables include the quantity of the hot metal and slag as well as their composition with respect to all the important constituents. These process variables can be measured on-line, and hence the soft sensor can be used on-line to predict the output parameters. The soft sensor has been able to predict the variables with an error of less than 3%. A supervisory control system based on the neural network estimator and an expert system has been found to substantially improve hot metal quality with respect to silicon and sulfur.
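A soft sensor of this kind is, at its core, a regression network from on-line process variables to the lab-measured quality variable. The sketch below trains a one-hidden-layer network with plain gradient descent on synthetic data (8 invented inputs standing in for the paper's 33 process variables, with a target mimicking silicon content; neither the data nor the architecture is the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for on-line process variables and a
# silicon-content-like target (all invented for illustration).
X = rng.normal(size=(500, 8))
y = np.tanh(X[:, 0] - 0.5 * X[:, 1]) + 0.1 * X[:, 2]

# One hidden layer of tanh units, trained by full-batch gradient descent.
W1 = rng.normal(scale=0.5, size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=16);      b2 = 0.0
lr = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    gW2 = h.T @ err / len(y); gb2 = err.mean()
    gh = np.outer(err, W2) * (1 - h ** 2)     # backprop through tanh
    gW1 = X.T @ gh / len(y); gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

h = np.tanh(X @ W1 + b1)
pred = h @ W2 + b2
rel_err = np.mean(np.abs(pred - y)) / np.mean(np.abs(y))
print(f"mean relative error: {rel_err:.3f}")
```

The supervisory step in the paper then closes the loop: an expert system acts on the soft sensor's predictions instead of waiting for off-line spectrographic results.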
3.
In this paper, we propose new imputation methods for missing single nucleotide polymorphism (SNP) genotype data. The common objective of imputation methods is to minimize the loss of information caused by experimentally missing elements. Imputation of missing genotype data has generally used the major allele method, but this approach falls short of that objective: it generally produces high error rates of missing value estimation, since the characteristics of the genotype data are not considered beyond the structure of the given genotype data. In our methods, we use linkage disequilibrium and haplotype information to impute the missing SNP genotypes. Finally, we provide a comparative evaluation of our methods against major allele imputation at various randomized missing rates.
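The major-allele baseline this abstract criticizes is simple to state: at each SNP, replace every missing genotype with the most frequent observed genotype at that position. A minimal sketch (genotypes coded 0/1/2, -1 for missing, data invented for illustration; the paper's proposed methods additionally exploit linkage disequilibrium and haplotype structure, which this baseline ignores):

```python
import numpy as np

# Rows are individuals, columns are SNPs; -1 marks a missing genotype.
G = np.array([
    [0, 1, 2, -1],
    [0, -1, 2, 0],
    [1, 1, -1, 0],
    [0, 1, 2, 0],
])

def major_allele_impute(G):
    G = G.copy()
    for j in range(G.shape[1]):
        col = G[:, j]
        observed = col[col >= 0]
        # fill missing entries with the most frequent observed genotype
        values, counts = np.unique(observed, return_counts=True)
        col[col < 0] = values[np.argmax(counts)]
    return G

imputed = major_allele_impute(G)
print(imputed)
```

Because every missing entry in a column gets the same value regardless of the individual's other SNPs, the error rate is high whenever genotypes are correlated across nearby SNPs, which is exactly the structure the paper's methods exploit.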
4.
Incidents in the blast furnace strongly affect the stability and smoothness of the iron-making process. Thus far, diagnosis of furnace abnormalities in many iron works still relies mainly on the personal experience of individual workers. In this paper, principal component analysis (PCA)-based algorithms are developed to monitor the iron-making process and achieve early abnormality detection. Because the measurement data exhibit a non-normal distribution and a time-varying nature, two algorithms are proposed: a static convex hull-based PCA algorithm (SCHPCA), which replaces the traditional T²-based abnormality detection logic with convex hull-based detection logic, and its moving-window version, the moving window convex hull-based PCA algorithm (MWCHPCA). Both algorithms are tested on real process data to verify their effectiveness in the early abnormality detection of the iron-making process.
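For contrast with the convex-hull logic this paper proposes, the classical PCA + T² monitoring baseline it replaces can be sketched as follows (NumPy only; both the "normal operating" data and the injected fault are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "normal operating" data with correlated variables.
normal = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
mu, sd = normal.mean(axis=0), normal.std(axis=0)
Z = (normal - mu) / sd

# Principal components of the scaled training data.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
k = 2
P = Vt[:k].T                           # loadings (5 x 2)
lam = (S[:k] ** 2) / (len(Z) - 1)      # variances of the retained scores

def t2(x):
    """Hotelling T^2 of one sample in the retained subspace."""
    t = ((x - mu) / sd) @ P
    return float(np.sum(t ** 2 / lam))

# Empirical control limit from the training data.
threshold = np.quantile([t2(z) for z in normal], 0.99)

# A gross injected abnormality along the first principal direction.
fault = mu + 20 * sd * P[:, 0]
print(t2(fault) > threshold)
```

When the scores are not normally distributed, the elliptical T² boundary misfits the normal operating region; the paper's convex-hull boundary adapts to the actual shape of the training scores, and the moving-window variant additionally tracks time-varying behavior.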
5.
Radhakrishna Vangipuram Rajesh Kumar Gunupudi Veereswara Kumar Puligadda Janaki Vinjamuri 《Expert Systems》2020,37(5):e12556
Anomaly and attack detection in the internet of things (IoT) is one of the prime challenges in the domain and requires immediate attention: anomalies and attacks in an IoT environment, such as scan, malicious operation, denial of service, spying, data type probing, wrong setup, and malicious control, can lead to the failure of an IoT system. Datasets generated in an IoT environment usually have missing values, and their presence makes standard classifiers unsuitable for the classification task. This article introduces (a) a novel technique for imputing missing data values, (b) a classifier based on feature transformation, and (c) an imputation measure for computing the similarity between any two instances, which can also be used as a general similarity measure. The performance of the proposed classifier is studied on imputed datasets obtained by applying Kmeans, F-Kmeans, and the proposed imputation method. Experiments are also conducted by applying existing and proposed classifiers to the dataset imputed with the proposed technique. The experimental study uses an open-source dataset, the distributed smart space orchestration system dataset, publicly available from Kaggle, and results are validated using the Wilcoxon non-parametric statistical test. The proposed approach outperforms existing classifiers when imputation is performed with the F-Kmeans and Kmeans techniques. With the proposed imputation and classification technique, accuracies for the scan, malicious operation, denial of service, spying, data type probing, and wrong setup attack classes are 100%, and 99% for the malicious control class.
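A Kmeans-style imputation like the ones used for comparison in the article can be sketched in a few lines: cluster the records on their fully observed features, then fill each missing value with its cluster's mean for that attribute. The code below uses toy two-cluster data and plain NumPy Lloyd iterations; it is a simplification for illustration, not the article's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two well-separated clusters; the last column has ~20% missing entries.
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
mask = rng.random(100) < 0.2
X_miss = X.copy()
X_miss[mask, 2] = np.nan

F = X_miss[:, :2]                     # fully observed features
C = F[[0, 50]]                        # init: one centre from each block
for _ in range(10):                   # plain Lloyd (k-means) iterations
    d = np.linalg.norm(F[:, None] - C[None], axis=2)
    labels = d.argmin(axis=1)
    C = np.array([F[labels == i].mean(axis=0) for i in range(2)])

# Fill each missing value with its cluster's observed mean.
X_imp = X_miss.copy()
for i in range(2):
    members = (labels == i) & ~mask
    X_imp[(labels == i) & mask, 2] = X_miss[members, 2].mean()

err = np.abs(X_imp[mask, 2] - X[mask, 2]).mean()
print(f"mean absolute imputation error: {err:.3f}")
```

Because the two clusters differ strongly in the missing attribute, cluster-conditional means recover the missing values far better than a single global mean would.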
6.
Describes the anti-surge control scheme and the constant blast volume and constant blast pressure control schemes implemented with the AC800F system in a blast furnace blower control system.
7.
8.
袁静 (Yuan Jing)  《自动化与仪器仪表》(Automation & Instrumentation), 2009, (6): 51-52
Focuses on the application characteristics of pit-type and pitless rail weighbridges at blast furnaces, the influence of the special environment of the blast furnace casthouse on the weighbridge, and points to note during procurement, installation, and maintenance.
9.
Jian Tang Zhixiang Chen Ada Waichee Fu David W. Cheung 《Knowledge and Information Systems》2007,11(1):45-84
Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principles and practical implementations lay a foundation for important applications such as credit card fraud detection, discovery of criminal behavior in e-commerce, and computer intrusion detection. In this paper, we first present a unified model for several existing outlier detection schemes, and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of matching users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns are of comparable density to the outliers. We then introduce a connectivity-based scheme that improves the effectiveness of the density-based scheme when a pattern itself is of similar density to an outlier. We compare density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features where each of them is more effective than the other. Finally, connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power, and implementation-free metrics.
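The distance-based versus density-based distinction the paper formalizes can be illustrated numerically: score each point by its k-NN distance (distance-based), and by the ratio of its k-NN distance to that of its neighbours (a simplified LOF-style density score). Both flag the single injected outlier in this toy example; they disagree when clusters of different densities coexist. The data and scoring details below are illustrative, not the paper's exact formulations:

```python
import numpy as np

rng = np.random.default_rng(4)

# A tight 2-D cluster plus one far-away point at index 60.
cluster = rng.normal(0, 0.3, size=(60, 2))
outlier = np.array([[3.0, 3.0]])
X = np.vstack([cluster, outlier])
k = 5

# Pairwise distances; column 0 of the sorted rows is the self-distance 0.
D = np.linalg.norm(X[:, None] - X[None], axis=2)
knn_dist = np.sort(D, axis=1)[:, k]              # distance-based score
neighbours = np.argsort(D, axis=1)[:, 1:k + 1]   # k nearest, excluding self

# Density-based score: my k-NN distance relative to my neighbours'.
density_score = knn_dist / knn_dist[neighbours].mean(axis=1)

print(int(np.argmax(knn_dist)), int(np.argmax(density_score)))  # → 60 60
```

A cluster point scores near 1 on the density ratio while the outlier scores far above 1; the connectivity-based scheme the paper introduces targets the regime where this ratio fails because pattern and outlier densities are comparable.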
Jian Tang received an MS degree from the University of Iowa in 1983, and a PhD from the Pennsylvania State University in 1988, both from the Department of Computer Science. He joined the Department of Computer Science, Memorial University of Newfoundland, Canada, in 1988, where he is currently a professor. He has visited a number of research institutions to conduct research on a variety of topics relating to the theory and practice of database management and systems. His current research interests include data mining, e-commerce, XML, and bioinformatics.
Zhixiang Chen is an associate professor in the Computer Science Department, University of Texas-Pan American. He received his PhD in computer science from Boston University in January 1996, and BS and MS degrees in software engineering from Huazhong University of Science and Technology. He also studied at the University of Illinois at Chicago. He taught at Southwest State University from Fall 1995 to September 1997, and at Huazhong University of Science and Technology from 1982 to 1990. His research interests include computational learning theory, algorithms and complexity, intelligent Web search, information retrieval, and data mining.
Ada Waichee Fu received her BSc degree in computer science from the Chinese University of Hong Kong in 1983, and MSc and PhD degrees in computer science from Simon Fraser University, Canada, in 1986 and 1990, respectively. She worked at Bell Northern Research in Ottawa, Canada, from 1989 to 1993 on a wide-area distributed database project, and joined the Chinese University of Hong Kong in 1993. Her research interests are XML data, time series databases, data mining, content-based retrieval in multimedia databases, and parallel and distributed systems.
David Wai-lok Cheung received the MSc and PhD degrees in computer science from Simon Fraser University, Canada, in 1985 and 1989, respectively. He also received the BSc degree in mathematics from the Chinese University of Hong Kong. From 1989 to 1993, he was a member of scientific staff at Bell Northern Research, Canada. Since 1994, he has been a faculty member of the Department of Computer Science at the University of Hong Kong. He is also the Director of the Center for E-Commerce Infrastructure Development. His research interests include data mining, data warehousing, XML technology for e-commerce, and bioinformatics. Dr. Cheung was the Program Committee Chairman of the Fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2001) and Program Co-Chair of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2005). Dr. Cheung is a member of the ACM and the IEEE Computer Society.
10.
Proposes an image-processing-based method for the automatic detection and recognition of vehicles travelling illegally in the wrong direction. Using median filtering, frame statistics, frame counting, boundary detection, and other image-processing methods, motion features of wrong-way vehicles in video images are extracted and recognized, achieving automatic detection of wrong-way violations. Experimental results show that the method is effective.
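One step of such a pipeline, median filtering over time to build a background and thresholding the frame difference to obtain a motion mask, can be sketched as follows (synthetic frames; direction-of-travel logic and real video decoding are omitted):

```python
import numpy as np

rng = np.random.default_rng(5)

# Nine synthetic greyscale "frames" of low-amplitude noise.
frames = [rng.integers(0, 10, size=(32, 32)).astype(float) for _ in range(9)]

# Per-pixel temporal median gives a background free of moving objects.
background = np.median(np.stack(frames), axis=0)

# Inject a bright 4x4 "vehicle" into the current frame.
current = frames[-1].copy()
current[10:14, 5:9] += 200

# Threshold the difference from the background to get the motion mask.
motion_mask = np.abs(current - background) > 50
print(motion_mask.sum())   # → 16, the 4x4 injected blob
```

On real video, the same mask would be tracked across frames, and the sign of the tracked displacement along the lane axis would decide whether a vehicle is travelling in the wrong direction.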
11.
This paper proposes a novel subspace approach towards identification of optimal residual models for process fault detection and isolation (PFDI) in a multivariate continuous-time system. We formulate the problem in terms of the state space model of the continuous-time system. The motivation for such a formulation is that the fault gain matrix, which links the process faults to the state variables of the system under consideration, is always available no matter how the faults vary with time; in the discrete-time state space model, by contrast, the fault gain matrix is only available when the faults follow some known function of time within each sampling interval. To isolate faults, the fault gain matrix is essential. We develop subspace algorithms in the continuous-time domain to directly identify the residual models from sampled noisy data without separate identification of the system matrices. Furthermore, the proposed approach can also be extended towards the identification of the system matrices if they are needed. The newly proposed approach is applied to a simulated four-tank system, where a small leak from any tank is successfully detected and isolated. For comparison, we also apply discrete-time residual models to the tank system for detection and isolation of leaks. It is demonstrated that the continuous-time PFDI approach is practical and has better performance than the discrete-time PFDI approach.
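Independent of how the residual model is identified, residual-based detection itself works as follows: an observer reconstructs the state from inputs and outputs, and the output residual departs from zero once a fault enters the dynamics. The discrete-time sketch below, with a hand-picked observer gain, illustrates only this mechanism; the paper's contribution is identifying such residual models directly from sampled data in the continuous-time domain:

```python
import numpy as np

# Stable 2-state discrete-time system with a Luenberger observer.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.3]])   # observer gain; A - L C has eigenvalues 0.7, 0.5

x = np.zeros((2, 1))           # true plant state
x_hat = np.zeros((2, 1))      # observer state
residuals = []
for k in range(100):
    u = np.array([[1.0]])
    y = C @ x
    r = y - C @ x_hat                      # output residual
    residuals.append(float(r[0, 0]))
    x_hat = A @ x_hat + B @ u + L @ r      # observer update
    fault = np.array([[0.5], [0.0]]) if k >= 60 else np.zeros((2, 1))
    x = A @ x + B @ u + fault              # plant update; fault enters at k = 60

print(max(abs(v) for v in residuals[:60]), max(abs(v) for v in residuals[60:]))
```

Before the fault, the residual is exactly zero because plant and observer share the same dynamics and initial state; after k = 60 it settles at a nonzero value, and thresholding it detects the fault.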
12.
13.
Identification, prediction and detection of the process fault in a cement rotary kiln by locally linear neuro-fuzzy technique    Cited by: 1 (self-citations: 0; citations by others: 1)
In this paper, we use a nonlinear system identification method to predict and detect process faults in a cement rotary kiln. After selecting proper inputs and outputs, an input-output model is identified for the plant. To capture the various operating points of the kiln, a locally linear neuro-fuzzy (LLNF) model is used, trained by the LOLIMOT algorithm, an incremental tree-structured algorithm. Using this method, we obtain three distinct models for the normal and faulty situations in the kiln: one model for the normal condition of the kiln with a 15-minute prediction horizon, and two models for the two faulty situations with a 7-minute prediction horizon. Finally, we detect these faults in validation data. The data used in this study were collected from the White Saveh Cement Company.
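The core of an LLNF model is a validity-weighted sum of local linear models, yhat(x) = Σ_i φ_i(x)(a_i x + b_i), with normalized Gaussian validity functions φ_i; LOLIMOT grows this partition incrementally by splitting the worst-fitting region. The sketch below fixes two partitions by hand on an invented piecewise-linear target, purely to illustrate the model structure, not the paper's kiln models:

```python
import numpy as np

# Invented piecewise-linear target with a kink at x = 0.
x = np.linspace(-1, 1, 200)
y = np.where(x < 0, 2 * x + 1, -x + 1)

# Two hand-placed Gaussian validity functions, normalized to sum to 1.
centers, width = np.array([-0.5, 0.5]), 0.4
phi = np.exp(-0.5 * ((x[:, None] - centers) / width) ** 2)
phi /= phi.sum(axis=1, keepdims=True)

# Fit each local linear model by weighted least squares, then blend.
Xd = np.column_stack([x, np.ones_like(x)])   # regressors [x, 1]
pred = np.zeros_like(x)
for i in range(2):
    sw = np.sqrt(phi[:, i])
    beta = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)[0]
    pred += phi[:, i] * (Xd @ beta)

print(float(np.mean(np.abs(pred - y))))
```

Each local model captures one linear regime and the smooth validity functions blend them near the kink; LOLIMOT automates the choice of the number, placement, and width of these partitions.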