27 results found.
1.
The problem of missing values in software measurement data used in empirical analysis has led to the proposal of numerous potential solutions. Imputation procedures, for example, have been proposed to "fill in" the missing values with plausible alternatives. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two datasets with dramatically different properties were used, with missing values injected according to three different missingness mechanisms (MCAR, MAR, and NI). We consider the occurrence of missing values in multiple attributes and compare three procedures: Bayesian multiple imputation, k-nearest neighbor imputation, and mean imputation. We also examine the relationship between noise in the dataset and the performance of the imputation techniques, which had not been addressed previously. Our comprehensive experiments demonstrate conclusively that Bayesian multiple imputation is an extremely effective imputation technique.
Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the Empirical Software Engineering and Data Mining and Machine Learning Laboratories. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, machine learning, and statistical modeling. He has published more than 300 refereed papers in these areas. He is a member of the IEEE, IEEE Computer Society, and IEEE Reliability Society. He was the Program Chair and General Chair of the IEEE International Conference on Tools with Artificial Intelligence in 2004 and 2005, respectively. He has served on the technical program committees of various international conferences, symposia, and workshops, has served as North American Editor of the Software Quality Journal, and is on the editorial boards of the journals Software Quality and Fuzzy Systems.

Jason Van Hulse received the Ph.D. degree in Computer Engineering from the Department of Computer Science and Engineering at Florida Atlantic University in 2007, the M.A. degree in Mathematics from Stony Brook University in 2000, and the B.S. degree in Mathematics from the University at Albany in 1997. His research interests include data mining and knowledge discovery, machine learning, computational intelligence, and statistics. He has published numerous peer-reviewed research papers in various conferences and journals, and is a member of the IEEE, IEEE Computer Society, and ACM. He has worked in the data mining and predictive modeling field at First Data Corp. since 2000, and is currently Vice President, Decision Science.
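To make the comparison concrete, the following minimal Python sketch injects MCAR missingness into synthetic numeric data and scores mean, k-nearest-neighbor, and an iterative Bayesian-ridge imputer by RMSE on the held-out true cells. It is not the paper's experimental setup: scikit-learn's IterativeImputer with sample_posterior=True is only a loose, single-draw stand-in for Bayesian multiple imputation, and the dataset is invented.

```python
# Sketch: compare imputers under MCAR missingness on synthetic correlated data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X_true = rng.multivariate_normal([0, 0, 0],
                                 [[1, .8, .5], [.8, 1, .6], [.5, .6, 1]], 500)

X_miss = X_true.copy()
mask = rng.random(X_miss.shape) < 0.2        # MCAR: every cell equally likely missing
X_miss[mask] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "kNN (k=5)": KNNImputer(n_neighbors=5),
    "iterative (BayesianRidge)": IterativeImputer(sample_posterior=True,
                                                  random_state=0),
}
for name, imp in imputers.items():
    X_hat = imp.fit_transform(X_miss)
    rmse = np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
    print(f"{name:28s} RMSE on imputed cells: {rmse:.3f}")
```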
2.
Transient disturbances in process measurements compromise the accuracy of some methods for plant-wide oscillation analysis. This paper presents a method to remove such transients while maintaining the dynamic features of the original measurement. The method is based on a nearest-neighbors imputation technique: it replaces the removed transient with an estimate derived from the time series of the whole measurement. The method is demonstrated on experimental and industrial case studies. The results demonstrate the efficacy of the method and of the recommended parameter values. Furthermore, inconsistency indices are proposed that facilitate automation of the method.
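The sketch below illustrates the general idea under simplifying assumptions (a univariate signal, a known transient location, Euclidean matching on fixed-length context windows); it is not the paper's algorithm, and the inconsistency indices are omitted.

```python
# Sketch: splice in the segment whose surrounding context best matches the gap's.
import numpy as np

def nn_replace(x, start, stop, context=20):
    """Replace x[start:stop] with the middle of the best-matching window."""
    gap = stop - start
    left, right = x[start - context:start], x[stop:stop + context]
    best_s, best_d = None, np.inf
    for s in range(context, len(x) - gap - context):
        if abs(s - start) < gap + 2 * context:   # skip windows overlapping the gap
            continue
        d = (np.sum((x[s - context:s] - left) ** 2)
             + np.sum((x[s + gap:s + gap + context] - right) ** 2))
        if d < best_d:
            best_s, best_d = s, d
    y = x.copy()
    y[start:stop] = x[best_s:best_s + gap]       # keep the signal's dynamic features
    return y

rng = np.random.default_rng(1)
t = np.linspace(0, 20 * np.pi, 2000)
sig = np.sin(t) + 0.05 * rng.standard_normal(t.size)
disturbed = sig.copy()
disturbed[900:960] += 3.0                        # injected transient disturbance
repaired = nn_replace(disturbed, 900, 960)
print(np.max(np.abs(repaired[900:960] - sig[900:960])))  # small residual error
```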
3.
Imputation through finite Gaussian mixture models
Imputation is a widely used method for handling missing data: it replaces missing values with plausible ones. Both parametric and nonparametric techniques are generally adopted for modelling incomplete data, and each has advantages and drawbacks: parametric techniques are parsimonious but depend on the assumed model, while nonparametric techniques are more flexible but require a large number of observations. The use of finite mixtures of multivariate Gaussian distributions for handling missing data is proposed, the main reason being that they allow the trade-off between parsimony and flexibility to be controlled. An experimental comparison with the widely used nearest-neighbour donor imputation is presented.
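A sketch of one way to impute from a fitted mixture, assuming enough complete rows to estimate the mixture by EM on them alone (the paper's estimation procedure may instead fit the mixture on the incomplete data directly): each missing block is replaced by the responsibility-weighted mixture of per-component conditional means.

```python
# Sketch: conditional-mean imputation from a finite Gaussian mixture.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def gmm_impute(X, n_components=3, seed=0):
    complete = ~np.isnan(X).any(axis=1)
    gmm = GaussianMixture(n_components, covariance_type="full",
                          random_state=seed).fit(X[complete])
    X_hat = X.copy()
    for i in np.where(~complete)[0]:
        m = np.isnan(X[i]); o = ~m
        # responsibility of each component given the observed coordinates
        w = np.array([gmm.weights_[k] *
                      multivariate_normal.pdf(X[i, o], gmm.means_[k][o],
                                              gmm.covariances_[k][np.ix_(o, o)])
                      for k in range(gmm.n_components)])
        w /= w.sum()
        # per-component conditional means E[x_m | x_o, component k]
        cond = [gmm.means_[k][m] + gmm.covariances_[k][np.ix_(m, o)]
                @ np.linalg.solve(gmm.covariances_[k][np.ix_(o, o)],
                                  X[i, o] - gmm.means_[k][o])
                for k in range(gmm.n_components)]
        X_hat[i, m] = w @ np.array(cond)
    return X_hat

rng = np.random.default_rng(0)
X_true = rng.multivariate_normal([0, 5], [[1, .7], [.7, 2]], 600)
X = X_true.copy()
hole = rng.random(600) < 0.2
X[hole, 1] = np.nan                              # MCAR gaps in one column
print(np.mean(np.abs(gmm_impute(X)[hole, 1] - X_true[hole, 1])))
```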
4.
Data plays a vital role as a source of information for organizations, especially in the information age. In practice one often encounters an imperfect database with missing entries, and results obtained from such a database may be biased or misleading. Imputing missing data has therefore been regarded as one of the major steps in data mining. The present research used different data mining methods to construct imputation models for different types of missing data. For continuous missing data, regression models and neural networks were used to build the imputation models; for categorical missing data, logistic regression, neural networks, C5.0, and CART were employed. The results showed that the regression model provided the best estimates of continuous missing data, while for categorical missing data the C5.0 model proved the best method.
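A minimal sketch in this spirit: a linear regression fills a continuous column and a CART-style tree fills a categorical one (scikit-learn has no C5.0, so DecisionTreeClassifier stands in), each trained on the rows where the target column is observed. Dataset and column names are invented for illustration.

```python
# Sketch: model-based imputation of one continuous and one categorical column.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

def impute_column(df, target, predictors, model):
    """Fit on rows where `target` is present, predict it where it is missing."""
    known = df[target].notna()
    if known.all():
        return df
    model.fit(df.loc[known, predictors], df.loc[known, target])
    df.loc[~known, target] = model.predict(df.loc[~known, predictors])
    return df

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["income"] = 3 * df.x1 - df.x2 + rng.normal(scale=0.1, size=200)
df["grade"] = np.where(df.x1 + df.x2 > 0, "A", "B")
df.loc[rng.random(200) < 0.15, "income"] = np.nan     # continuous gaps
df.loc[rng.random(200) < 0.15, "grade"] = None        # categorical gaps

df = impute_column(df, "income", ["x1", "x2"], LinearRegression())
df = impute_column(df, "grade", ["x1", "x2"], DecisionTreeClassifier(max_depth=3))
print(df.isna().sum())                                # all columns now complete
```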
5.
This study suggests a method for improving spatial consistency in the estimation of forest stand data. Traditional nearest-neighbor imputation can preserve between-variable consistency within a unit, but not between geographically nearby units, and the lack of spatial consistency may cause problems when the data are used for forestry planning or scenario analysis. In spatially consistent nearest-neighbor imputation, adjacent units are considered in the estimation. The first step of the method is a k-nearest-neighbor imputation. Secondly, starting from the initial imputation, an optimization algorithm, simulated annealing, is applied in order to reach certain spatial variation targets. The proposed method was tested in a case study where tree stem volume data were imputed to each unit (pixel) of forest stands, using satellite digital numbers as carrier data. The spatial variation measures used were between-pixel correlation and short-range variance. In addition, the accuracy of the estimated stand-level mean volume was used as a target in order to avoid drifts in mean volume during the optimization. The method was successful in three of the four stands, where it produced imputations corresponding exactly to the target spatial variation measures. In the fourth stand no exact solution could be found: the two spatial variation targets were reached, but the mean stem volume was slightly overestimated (375 m³ ha⁻¹ rather than 336 m³ ha⁻¹).
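The toy sketch below shows only the second, optimization step, reduced to one dimension: simulated annealing that swaps imputed pixel values so the lag-1 autocorrelation of a "stand" approaches a target; because swaps merely rearrange values, the mean volume is preserved automatically. The study's actual targets (between-pixel correlation, short-range variance, stand mean) and neighborhood structure are richer than this.

```python
# Sketch: simulated annealing toward a target spatial-variation measure.
import numpy as np

def lag1_corr(v):
    return np.corrcoef(v[:-1], v[1:])[0, 1]

def anneal(v, target_corr, iters=20000, t0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    v = v.copy()
    cost = abs(lag1_corr(v) - target_corr)
    for it in range(iters):
        temp = t0 * (1 - it / iters)             # linear cooling schedule
        i, j = rng.integers(0, len(v), 2)
        v[i], v[j] = v[j], v[i]                  # propose: swap two pixel values
        new_cost = abs(lag1_corr(v) - target_corr)
        if new_cost < cost or rng.random() < np.exp(-(new_cost - cost)
                                                    / max(temp, 1e-9)):
            cost = new_cost                      # accept the proposal
        else:
            v[i], v[j] = v[j], v[i]              # reject: undo the swap
    return v

pixels = np.random.default_rng(1).normal(200.0, 60.0, 400)  # kNN-imputed volumes
smoothed = anneal(pixels, target_corr=0.6)
print(lag1_corr(pixels), lag1_corr(smoothed))
print(np.isclose(pixels.mean(), smoothed.mean()))  # mean preserved by swapping
```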
6.
Numerous industrial and research databases include missing values. It is not uncommon to encounter databases that have up to half of their entries missing, making it very difficult to mine them using data analysis methods that require complete data. A common way of dealing with this problem is to impute (fill in) the missing values. This paper evaluates how the choice of imputation method affects the performance of classifiers that are subsequently used with the imputed data. The experiments focus on discrete data. The paper studies the effect of missing data imputation using five single imputation methods (a mean method, a hot-deck method, a Naïve Bayes method, and the latter two methods combined with a recently proposed imputation framework) and one multiple imputation method (a polytomous-regression-based method) on the classification accuracy of six popular classifiers (RIPPER, C4.5, k-nearest neighbor, support vector machines with polynomial and RBF kernels, and Naïve Bayes) on 15 datasets. The experimental study shows that imputation with the tested methods on average improves classification accuracy compared to classification without imputation. Although no imputation method is universally best, Naïve Bayes imputation gives the best results for the RIPPER classifier on datasets with a high proportion (40% and 50%) of missing data, polytomous regression imputation is best for the support vector machine with a polynomial kernel, and the imputation framework is superior for the support vector machine with an RBF kernel and for k-nearest neighbor. An analysis of imputation quality across varying amounts of missing data (between 5% and 50%) shows that all imputation methods except mean imputation reduce classification error for data with more than 10% missing values. Finally, some classifiers, such as C4.5 and Naïve Bayes, were found to be missing-data resistant, i.e., they can produce accurate classifications in the presence of missing data, while other classifiers, such as k-nearest neighbor, the SVMs, and RIPPER, benefit from imputation.
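As a concrete example of the simplest method family compared, here is a sketch of single hot-deck imputation for discrete data: each incomplete record copies its missing values from the complete donor that agrees with it on the most observed attributes. The donor-selection rule is a plain Hamming match, an assumption of this sketch rather than a detail from the paper.

```python
# Sketch: single hot-deck imputation for categorical data.
import numpy as np
import pandas as pd

def hot_deck(df):
    out = df.copy()
    complete = df.dropna()                       # the donor pool
    for i in df.index[df.isna().any(axis=1)]:
        row = df.loc[i]
        obs = row.notna()
        # Hamming similarity on observed attributes against all complete donors
        agree = (complete.loc[:, obs] == row[obs]).sum(axis=1)
        donor = complete.loc[agree.idxmax()]
        out.loc[i, ~obs] = donor[~obs]           # copy the donor's values
    return out

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, (200, 4)), columns=list("abcd")).astype("object")
df = df.mask(rng.random(df.shape) < 0.1)         # 10% missing completely at random
imputed = hot_deck(df)
print(imputed.isna().sum().sum())                # 0, given at least one complete donor
```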
7.
As more and more real-time spatio-temporal datasets become available at increasing spatial and temporal resolutions, the provision of high-quality, predictive information about spatio-temporal processes becomes an increasingly feasible goal. However, many sensor networks that collect spatio-temporal information are prone to failure, resulting in missing data. To complicate matters, the missing data are often not missing at random, and are characterised by long periods in which no data are observed. The performance of traditional univariate forecasting methods such as ARIMA models decreases with the length of the missing-data period because they do not have access to local temporal information. However, if spatio-temporal autocorrelation is present in a space-time series, then spatio-temporal approaches have the potential to offer better forecasts. In this paper, a non-parametric spatio-temporal kernel regression model is developed to forecast the future unit journey times of road links in central London, UK, under the assumption of sensor malfunction. Only the current traffic patterns of the upstream and downstream neighbouring links are used to inform the forecasts. The model's performance is compared with another form of non-parametric regression, k-nearest neighbours, which is also effective for forecasting under missing data. Both methods show promising forecasting performance, particularly in periods of high congestion.
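A sketch of the general flavour of such an estimator (not the paper's exact model): a Nadaraya-Watson kernel regression that forecasts the dead link's journey time from the current readings of its neighbouring links, weighting historical (neighbour state, observed outcome) pairs by a Gaussian kernel. Data, bandwidth, and horizon here are invented.

```python
# Sketch: kernel regression forecast from neighbouring links only.
import numpy as np

def kernel_forecast(neigh_hist, target_future, neigh_now, bandwidth=1.0):
    """neigh_hist: (n, d) past neighbour states; target_future: (n,) the value
    observed on the dead link one step after each state; neigh_now: (d,)."""
    d2 = np.sum((neigh_hist - neigh_now) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))       # Gaussian kernel weights
    return w @ target_future / w.sum()           # Nadaraya-Watson estimate

rng = np.random.default_rng(0)
up, down = rng.normal(60, 10, 500), rng.normal(55, 10, 500)
target = 0.5 * up + 0.4 * down + rng.normal(0, 3, 500)  # synthetic journey times
hist = np.column_stack([up, down])
print(kernel_forecast(hist[:-1], target[1:], hist[-1]))
```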
8.
The k nearest neighbors used by the k-nearest-neighbor imputation algorithm (kNNI) may contain noisy records. To address this, a new missing-value imputation algorithm, Mutual k-Nearest Neighbor Imputation (MkNNI), is proposed. A record is used to impute a missing value only if it is among the k nearest neighbors of the incomplete record and the incomplete record is, in turn, among its k nearest neighbors, which effectively prevents the noisy neighbors that kNNI may select from being used. Experimental results show that the imputation accuracy of MkNNI is, overall, better than that of kNNI.
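A sketch of the mutual-kNN idea, with the caveat that the exact mutuality test, tie-breaking, and the final aggregation below are assumptions of this illustration rather than details taken from the paper.

```python
# Sketch: mutual k-nearest-neighbor donor selection for imputation.
import numpy as np

def mknn_impute(X, k=5):
    X_hat = X.copy()
    complete = np.where(~np.isnan(X).any(axis=1))[0]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])
        d = np.sqrt(((X[complete][:, obs] - X[i, obs]) ** 2).sum(axis=1))
        knn_i = complete[np.argsort(d)[:k]]      # i's k nearest complete records
        donors = []
        for j in knn_i:
            # mutual test: is i at least as close to j as j's k-th complete
            # neighbour (distances on i's observed attributes)?
            dj = np.sqrt(((X[complete][:, obs] - X[j, obs]) ** 2).sum(axis=1))
            di = np.sqrt(((X[j, obs] - X[i, obs]) ** 2).sum())
            if di <= np.sort(dj)[k - 1]:
                donors.append(j)
        pool = donors or knn_i                   # fall back to plain kNN if empty
        X_hat[i, ~obs] = X[pool][:, ~obs].mean(axis=0)
    return X_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
X[rng.random(X.shape) < 0.05] = np.nan
print(np.isnan(mknn_impute(X)).sum())            # 0 remaining missing entries
```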
9.
符欲梅, 朱芳, 昝昕武. 《传感技术学报》(Chinese Journal of Sensors and Actuators), 2012, 25(12): 1706-1710
Data collected by bridge structural health monitoring systems are small-sample, nonlinear, and sequential in time. A support-vector-machine-based method for filling in incomplete data is therefore proposed. Building on an analysis of the autocorrelation of the data, support vector regression is applied with input vectors of an appropriately chosen dimension, and the missing data are predicted on this basis. Compared with a BP neural network used for the same task, the experimental results show the advantage of the support vector machine in filling in missing data from smaller samples, as well as its strong generalization ability.
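A sketch of the approach under stated assumptions: the embedding dimension p, which the paper selects from the autocorrelation analysis, is fixed at 5 here, and each missing sample is predicted by an RBF-kernel SVR from the p samples preceding it, with earlier fills feeding later ones.

```python
# Sketch: SVR-based time-series imputation from lagged inputs.
import numpy as np
from sklearn.svm import SVR

def svr_fill(series, missing_idx, p=5):
    """Fill series[missing_idx] from the p preceding samples via SVR."""
    X = np.array([series[t - p:t] for t in range(p, len(series))
                  if t not in missing_idx and not np.isnan(series[t - p:t]).any()])
    y = np.array([series[t] for t in range(p, len(series))
                  if t not in missing_idx and not np.isnan(series[t - p:t]).any()])
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)
    out = series.copy()
    for t in sorted(missing_idx):                # earlier fills feed later ones
        out[t] = model.predict(out[t - p:t].reshape(1, -1))[0]
    return out

t = np.arange(300, dtype=float)
deflection = np.sin(t / 10) + 0.02 * np.random.default_rng(0).standard_normal(300)
gaps = {150, 151, 152}
series = deflection.copy()
series[list(gaps)] = np.nan
print(np.abs(svr_fill(series, gaps)[list(gaps)] - deflection[list(gaps)]))
```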
10.
何云, 皮德常. 《计算机科学》(Computer Science), 2015, 42(11): 251-255, 283
Missing values frequently occur in gene expression data and hinder research on gene expression. A new similarity measure, the reduced relational grade, is proposed, and on this basis an iterative missing-value imputation algorithm based on the reduced relational grade, RKNNimpute, is presented. The reduced relational grade is a refinement of the grey relational grade: it achieves the same effect while significantly lowering the time complexity of the algorithm. RKNNimpute uses the reduced relational grade as its similarity measure and adds imputed genes to the candidate set of nearest-neighbor genes, filling in the remaining missing values iteratively, which improves both the imputation quality and the performance of the algorithm. Extensive experiments on time-series, non-time-series, and mixed gene expression datasets were carried out to evaluate RKNNimpute. The results show that the reduced relational grade is an efficient distance measure and that RKNNimpute outperforms conventional imputation algorithms.
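The sketch below uses the standard grey relational grade as the similarity measure; the paper's reduced relational grade is a cheaper variant whose exact form is not reproduced here. As the abstract describes, imputed genes are added back to the donor pool so that later rows can draw on them.

```python
# Sketch: iterative kNN imputation with a grey-relational similarity measure.
import numpy as np

def grey_grade(x, y, rho=0.5):
    """Grey relational grade between two NaN-free vectors (per-pair min/max,
    a common simplification of the textbook definition)."""
    delta = np.abs(x - y)
    if delta.max() == 0:
        return 1.0
    return np.mean((delta.min() + rho * delta.max()) / (delta + rho * delta.max()))

def rknn_style_impute(X, k=5):
    X_hat = X.copy()
    donors = list(np.where(~np.isnan(X).any(axis=1))[0])
    order = np.argsort(np.isnan(X).sum(axis=1))   # fewest gaps filled first
    for i in order:
        m = np.isnan(X_hat[i])
        if not m.any():
            continue
        grades = sorted(((grey_grade(X_hat[i, ~m], X_hat[j, ~m]), j)
                         for j in donors), reverse=True)
        top = [j for _, j in grades[:k]]
        X_hat[i, m] = X_hat[top][:, m].mean(axis=0)
        donors.append(i)                          # imputed gene joins the donor pool
    return X_hat

rng = np.random.default_rng(0)
G = rng.normal(size=(100, 8))                     # toy "gene x condition" matrix
G[rng.random(G.shape) < 0.1] = np.nan
print(np.isnan(rknn_style_impute(G)).sum())       # 0 remaining missing entries
```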