Similar literature
1.
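A biclustering-based method for imputing missing data   Cited by: 1 (self-citations: 0, citations by others: 1)
To address missing data in real-world datasets, a new biclustering-based imputation method is proposed. The algorithm exploits the property that the smaller the mean squared residue within a bicluster, the more similar its data, and recasts imputation as the problem of minimizing the mean squared residue of a specific bicluster, which yields predictions for the missing elements of the dataset. The minimization over a bicluster that contains missing data is then solved using the idea of finding the minimum of a quadratic function, and the approach is analyzed and proved mathematically. Finally, simulations are carried out; the experimental results on UCI datasets indicate that the proposed algorithm achieves high imputation accuracy.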
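The idea in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of the mean squared residue (each entry minus its row mean and column mean plus the overall mean, squared and averaged) and a single missing entry; the function names `mean_squared_residue` and `impute_by_min_msr` are hypothetical and are not taken from the paper. Because every residue is linear in the unknown value, the mean squared residue is a quadratic in that value, so its minimizer is the vertex of a parabola.

```python
def mean_squared_residue(block):
    """Mean squared residue (MSR) of a bicluster given as a list of rows."""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    total_mean = sum(row_means) / n_rows
    return sum(
        (block[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    ) / (n_rows * n_cols)


def impute_by_min_msr(block, i, j):
    """Return the value for block[i][j] that minimizes the MSR (assumed setup)."""
    def msr_with(x):
        trial = [list(row) for row in block]
        trial[i][j] = x
        return mean_squared_residue(trial)

    # MSR(x) = a*x^2 + b*x + c; recover a and b from three sample points.
    f_neg, f_zero, f_pos = msr_with(-1.0), msr_with(0.0), msr_with(1.0)
    a = (f_pos + f_neg) / 2.0 - f_zero
    b = (f_pos - f_neg) / 2.0
    return -b / (2.0 * a)  # vertex of the parabola


# Illustrative additive bicluster (row effect + column effect), one entry unknown.
bicluster = [
    [1.0, 2.0, 3.0],
    [2.0, None, 4.0],
    [4.0, 5.0, 6.0],
]
filled = impute_by_min_msr(bicluster, 1, 1)
```

For this perfectly additive toy block the minimizer is the value that makes the residue vanish, about 3.0. The sketch needs at least two rows and two columns, otherwise the quadratic coefficient is zero.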

2.
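Missing-value imputation is a highly challenging task in machine learning and data mining. Missing values in a data source have a substantial negative effect on the performance of learning algorithms and on the quality of what is learned. Existing imputation methods still fall short of users' needs. A missing-value imputation method based on grey system theory is proposed. It combines instance-based nonparametric fitting with grey-theory techniques and imputes the missing data repeatedly until the imputation results converge or meet the user's requirements. Experimental results show that the method outperforms the existing KNN imputation and ordinary mean substitution in both imputation quality and efficiency.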

3.
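Existing methods for imputing missing values in human resources data suffer from a large root mean square error and a low imputation hit rate. A data-mining-based method for imputing missing values in human resources data is proposed. The split Bregman iteration algorithm is used to remove noise from the human resources data, and hidden variables are mined from the data according to its time-series characteristics. Missing values are detected from these features. The FCMSI algorithm then imputes the missing values based on the detection results: the data are first filled using the average ratio method, the filled data are clustered with the fuzzy C-means algorithm, and the missing values are then imputed further on the basis of collaborative filtering. Experimental results show that the proposed method has a small root mean square error and a high imputation hit rate.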

4.
5.
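王凤梅 (Wang Fengmei), 胡丽霞 (Hu Lixia). 《计算机工程》 (Computer Engineering), 2012, 38(21): 53-55, 62
Missing data is a common problem in data mining and analysis, and simply deleting the instances that contain missing values can lead to unreliable decisions. To address the imputation problem, a missing-data imputation method based on nearest-neighbor rules is proposed. Rules are classified by the data items in the consequents of the association rules, the similarity between the classified rule items and the itemset containing missing values is computed, and the missing value is filled with the value of the most similar rule item. Experimental results show that the method achieves high imputation accuracy.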

6.
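To improve the accuracy of PM2.5 concentration prediction, a prediction method based on spatiotemporal fusion and missing-value imputation is proposed. Exploiting spatiotemporal correlation, the method takes historical meteorological and PM2.5 concentration data as input, uses a long short-term memory neural network and an artificial neural network to predict the PM2.5 level one hour ahead along the temporal and spatial dimensions, and fuses the predictions with a model tree. Because the dataset contains a large amount of consecutive missing data, the proposed imputation algorithm is used to support the prediction model and offset the adverse effect. Experimental results show that spatiotemporal fusion predicts better than a single model on a single dimension, and that the proposed imputation algorithm further reduces the prediction error.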

7.
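As data sources grow richer, data become ever easier to acquire, but their quality is hard to guarantee. Missing values are therefore widespread and hard to avoid in real datasets, and missing-value imputation has become one of the classic problems in data quality management. Most current imputation algorithms were designed for static data and do not suit dynamic data streams that arrive at high speed, and most existing algorithms do not consider data sparsity and heterogeneity together. This paper therefore proposes RIIM, a new online missing-value imputation algorithm based on independent models. The algorithm accounts for both sparsity and heterogeneity and combines the basic ideas of nearest-neighbor imputation and regression imputation. First, to handle the dynamic, real-time nature of the data, an efficient incremental update algorithm for the imputation model is proposed. Second, because neighbor search is time-consuming and the number of neighbors is hard to determine, an adaptive periodic update strategy for the optimal neighbors is proposed. Finally, extensive experiments on real datasets verify the effectiveness of the proposed algorithm.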

8.
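A combined method for imputing missing flight parameter data   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on the characteristics of flight parameter data, B-spline curve fitting and the least squares support vector machine are combined into a method for imputing missing flight parameter data. The method lets the two techniques complement each other by taking a weighted average of the imputation results of the individual methods, which improves the reliability of the algorithm and the accuracy of the imputation. Comparative experiments demonstrate the feasibility and applicability of the method.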

9.
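A missing-value imputation method based on support vector machines is proposed. The method distinguishes two cases: missing values of continuous attributes and missing values of categorical attributes. For continuous attributes, support vector regression predicts the missing values; for categorical attributes, support vector classification predicts them. Comparative experiments on several UCI datasets and the MINIT handwritten Arabic digit dataset show that the algorithm achieves a higher recovery rate than traditional mean imputation and imputation based on decision tree regression.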

10.
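In engineering practice, LTE systems usually adopt minimum mean square error (MMSE) channel estimation to achieve high-rate data transmission and good field reception performance. Because the conventional MMSE algorithm adapts poorly to multipath time-varying channels, an adaptive-parameter algorithm for adjusting the MMSE channel estimation coefficients is proposed. By estimating the channel's root mean square (RMS) delay spread and the signal-to-noise ratio, the algorithm adaptively adjusts the channel estimation parameters and generates near-optimal MMSE channel estimation coefficients for filtering. Simulation results show that the algorithm gives better channel estimation performance than MMSE channel estimation with fixed coefficients.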

11.
Numerous industrial and research databases include missing values. It is not uncommon to encounter databases that have up to a half of the entries missing, making it very difficult to mine them using data analysis methods that can work only with complete data. A common way of dealing with this problem is to impute (fill-in) the missing values. This paper evaluates how the choice of different imputation methods affects the performance of classifiers that are subsequently used with the imputed data. The experiments here focus on discrete data. This paper studies the effect of missing data imputation using five single imputation methods (a mean method, a Hot deck method, a Naïve-Bayes method, and the latter two methods with a recently proposed imputation framework) and one multiple imputation method (a polytomous regression based method) on classification accuracy for six popular classifiers (RIPPER, C4.5, K-nearest-neighbor, support vector machine with polynomial and RBF kernels, and Naïve-Bayes) on 15 datasets. This experimental study shows that imputation with the tested methods on average improves classification accuracy when compared to classification without imputation. Although the results show that there is no universally best imputation method, Naïve-Bayes imputation is shown to give the best results for the RIPPER classifier for datasets with high amount (i.e., 40% and 50%) of missing data, polytomous regression imputation is shown to be the best for support vector machine classifier with polynomial kernel, and the application of the imputation framework is shown to be superior for the support vector machine with RBF kernel and K-nearest-neighbor. The analysis of the quality of the imputation with respect to varying amounts of missing data (i.e., between 5% and 50%) shows that all imputation methods, except for the mean imputation, improve classification error for data with more than 10% of missing data. Finally, some classifiers such as C4.5 and Naïve-Bayes were found to be missing data resistant, i.e., they can produce accurate classification in the presence of missing data, while other classifiers such as K-nearest-neighbor, SVMs and RIPPER benefit from the imputation.

12.
Previous research has shown that method two-way with error for multiple imputation in test and questionnaire data produces small bias in statistical analyses. This method is based on a two-way ANOVA model of persons by items but it is improper from a Bayesian point of view. Proper two-way imputations are generated using data augmentation. Simulation results show that the resulting method two-way with data augmentation produces unbiased results in Cronbach's alpha, the mean of squares in ANOVA, the item means, and small bias in the mean test score and the factor loadings from principal components analysis. The data with imputed scores result in statistics having a slightly larger standard deviation than the original complete data. Method two-way with error produces results that are only slightly more biased, especially for low percentages of missingness. Thus, it may serve as an accurate approximation to the more involved method two-way with data augmentation.

13.
Microarray data are used in many biomedical experiments. They often contain missing values, which significantly affect statistical algorithms. Although a number of imputation algorithms have been proposed, they are limited in how effectively they exploit local and global information for estimation. It is necessary to develop more effective techniques to solve the data imputation problem. In this paper, we propose a theoretic framework of local weighted approximation for missing value estimation, based on the Taylor series approximation. Besides revealing that k-nearest neighbor imputation (KNNimpute) is a special case of the framework, we focus on the study of its linear case, local weighted linear approximation imputation (LWLAimpute), from theory to experiment. Experimental results show that LWLAimpute and its iterative version can achieve better performance than some existing imputation methods; the superiority becomes more significant as the level of missing values increases.
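As a point of reference for the KNNimpute baseline named in this abstract, a minimal sketch of plain k-nearest-neighbor imputation follows. It is not LWLAimpute. The inverse-distance weighting, the root-mean-square distance over shared columns, the function name `knn_impute`, and the toy matrix are all assumptions made for illustration.

```python
import math


def knn_impute(rows, target, col, k=2):
    """Estimate rows[target][col] from the k nearest rows that observe that column."""
    candidates = []
    for idx, row in enumerate(rows):
        if idx == target or row[col] is None:
            continue
        # Compare the two rows only on columns observed in both.
        shared = [c for c in range(len(row))
                  if c != col and row[c] is not None
                  and rows[target][c] is not None]
        if not shared:
            continue
        dist = math.sqrt(
            sum((row[c] - rows[target][c]) ** 2 for c in shared) / len(shared)
        )
        candidates.append((dist, row[col]))
    if not candidates:
        raise ValueError("no usable neighbor rows")
    nearest = sorted(candidates)[:k]
    # An exact match would get infinite weight, so average exact matches directly.
    exact = [value for dist, value in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / dist for dist, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Illustrative data: rows are genes, columns are experiments, None is missing.
expression = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.8],
    [5.0, 6.0, 9.0],
]
estimate = knn_impute(expression, target=1, col=2, k=2)
```

With `k=2` the distant fourth row is ignored, and the estimate falls between the two nearest rows' values, closer to the nearer one.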

14.
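Control and optimization of process systems require reliable process data. Measured process data contain random errors and gross errors, and data reconciliation can effectively reduce the error in the measurements and thereby improve the accuracy of process control and optimization. The advantages and disadvantages of the traditional least-squares data reconciliation method and the robust quasi-least-squares data reconciliation method are analyzed, and a combined least-squares and quasi-least-squares method is proposed. The method first uses the quasi-least-squares estimator to detect and remove gross errors and then uses the least-squares estimator to reconcile the data, combining the strengths of the two methods and making the reconciliation results more accurate. The combined method is applied to data reconciliation for linear and nonlinear systems, and a comparison of the results shows that it detects gross errors well and gives more accurate reconciliation results. Finally, the method is applied to data reconciliation for an air separation flowsheet in a real process system, and the results demonstrate its effectiveness.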

15.
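Predicting abnormal CDN bandwidth values and raising accurate alarms have long been a key difficulty in network operations. Building on the time-series LSTM (long short-term memory network), a new algorithmic framework, locally weighted regression in series with LSTM, is proposed and implemented. The framework constructs the dataset by time-series interpolation sampling, and a locally weighted algorithm is incorporated into a least-squares regression fitting model to produce an initial prediction; the prediction results are then chained in series with the LSTM...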

16.
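To ensure that the finite word length effects of a predictor in a specific application environment meet the performance requirements of the application system, a method combining theory and experiment is proposed for determining the arithmetic word length of the TDNLMS (two-dimensional normalized least mean square) adaptive predictor, with small target detection as the application background. The relationships between factors such as the step-size parameter, the input data word length, the image statistics, and the predictor's support region on the one hand, and the quantization errors of the predictor's weights and of the intermediate results of the iterative computation on the other, are analyzed, and the analysis is verified experimentally. Simulation results show that a finite-precision predictor designed with this method performs very close to the infinite-precision predictor in small target detection.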

17.
In this paper, the classical least squares (LS) and recursive least squares (RLS) for parameter estimation have been re-examined in the light of the present day computing capabilities. It has been demonstrated that for linear time-invariant systems, the performance of blockwise least squares (BLS) is always superior to that of RLS. In the context of parameter estimation for dynamic systems, the current computational capability of personal computers is more than adequate for BLS. However, for time-varying systems with abrupt parameter changes, standard blockwise LS may no longer be suitable due to its inefficiency in discarding "old" data. To deal with this limitation, a novel sliding window blockwise least squares approach with automatically adjustable window length triggered by a change detection scheme is proposed. Two types of sliding windows, rectangular and exponential, have been investigated. The performance of the proposed algorithm has been illustrated by comparing with the standard RLS and an exponentially weighted RLS (EWRLS) using two examples. The simulation results have conclusively shown that: (1) BLS has better performance than RLS; (2) the proposed variable-length sliding window blockwise least squares (VLSWBLS) algorithm can outperform RLS with forgetting factors; (3) the scheme has good tracking ability for abrupt parameter changes and can also ensure high accuracy of the parameter estimate at the steady state; and (4) the computational burden of VLSWBLS is completely manageable with the current computer technology. Even though the idea presented here is straightforward, it has significant implications for virtually all areas of application where RLS schemes are used.
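The contrast this abstract draws between a sliding window and a single block that never discards old data can be sketched as follows. This is a simplified illustration with a fixed-length rectangular window and a scalar model y = theta * u; the variable window length and the change detection scheme of VLSWBLS are not implemented, and the function name `sliding_window_ls` and the test signal are assumptions.

```python
from collections import deque


def sliding_window_ls(inputs, outputs, window):
    """Least-squares estimate of theta in y = theta * u over a sliding window."""
    recent = deque(maxlen=window)  # rectangular window: old samples drop out
    estimates = []
    for u_t, y_t in zip(inputs, outputs):
        recent.append((u_t, y_t))
        s_uu = sum(ui * ui for ui, _ in recent)
        s_uy = sum(ui * yi for ui, yi in recent)
        # Closed-form scalar least squares on the samples currently in the window.
        estimates.append(s_uy / s_uu if s_uu > 0.0 else 0.0)
    return estimates


# Noise-free illustrative data whose gain jumps from 2.0 to 5.0 at sample 10.
u = [1.0 + 0.1 * (t % 5) for t in range(20)]
y = [(2.0 if t < 10 else 5.0) * u_t for t, u_t in enumerate(u)]

windowed = sliding_window_ls(u, y, window=4)         # forgets old data
full_block = sliding_window_ls(u, y, window=len(u))  # keeps every sample
```

After the jump, the short window recovers the new gain once the old samples have left it, while the estimate over the full block stays a blend of the old and new gains.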

18.
In practical industrial applications, the key performance indicator (KPI)-related prediction and diagnosis are quite important for the product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial processes. As PLS is totally based on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS to deal with outliers and missing values simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of the KPI-related prediction and diagnosis on an industrial benchmark of Tennessee Eastman process.

19.
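A least-squares processing method for correlated datasets   Cited by: 4 (self-citations: 0, citations by others: 4)
Least-squares processing of data reduces to solving the linear system Ax = b, which has an optimal solution in the least-squares sense in every case (exactly determined, overdetermined, or underdetermined). This requires that the inverse of the correlation matrix of the data matrix A exist, that is, that AA^T in the underdetermined case or A^TA in the overdetermined case be of full rank. For rank-deficient AA^T or A^TA, this paper proposes computing the matrix pseudo-inverse by singular value decomposition, so that least-squares processing can handle correlated datasets. Compared with applying singular value decomposition directly to the data matrix A to obtain the least-squares solution of Ax = b, the proposed method only needs to decompose a symmetric square matrix of lower order, which makes high-dimensional data processing feasible on a microcomputer.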

20.
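To improve the early-warning ability of the traditional Z-Score financial early-warning model, this paper combines the strong search ability of an improved FOA algorithm with the Z-Score model and proposes a Z-Score financial early-warning model for listed companies based on the improved FOA algorithm. The improved FOA algorithm optimizes the parameters of the Z-Score model to reduce the root mean square error (RMSE) between the predicted and target values. The predicted values for the selected listed companies' financial data are compared with the target values and the accuracy is checked. The prediction accuracies of the traditional Z-Score model, the Z-Score model optimized by the basic FOA algorithm, and the Z-Score model optimized by the improved FOA algorithm are 65%, 70%, and 80%, respectively. The experiments show that the improved algorithm substantially raises the predictive ability of the Z-Score financial early-warning model and confirm the effectiveness of the algorithm.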
