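A Fuzzy k-Means Clustering Algorithm for Missing Data Imputation Based on a Dynamic Adaptive Data Window
Cite this article: Liao Zaifei, Lü Xinjie, Luo Xiongfei, Liu Wei, Wang Hongan. A Fuzzy k-Means Clustering Algorithm for Missing Data Imputation Based on a Dynamic Adaptive Data Window [J]. Journal of Computer Research and Development, 2009, 46(Z2).
Authors: Liao Zaifei  Lü Xinjie  Luo Xiongfei  Liu Wei  Wang Hongan
Affiliations: 1. Institute of Software, Chinese Academy of Sciences, Beijing 100190; Graduate University of Chinese Academy of Sciences, Beijing 100049
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190
Funding: National High Technology Research and Development Program of China (863 Program)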
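Abstract: Completeness is an important dimension of data quality. Because of the inherent uncertainty of data and the randomness and inaccuracy of data acquisition, real-world applications produce many datasets with two characteristics: 1) the data volume is very large; 2) the data are often incomplete and inaccurate. Partitioning a large-scale dataset into separate data windows is therefore an important processing technique. However, most research on missing data imputation ignores both these dataset characteristics and the use of windows, and with a fixed-size data window the accuracy and performance of an algorithm readily depend on the window size and on the distribution of data values within the window. Assuming that the data satisfy certain domain-specific constraints, the paper first proposes a new time-based dynamic adaptive data window detection algorithm and then, on top of this window, an improved fuzzy k-means clustering algorithm for imputing the missing values of incomplete data. Experiments show that, compared with other algorithms, the proposed approach adapts better to the characteristics of the dataset, performs well, and maintains accuracy.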

Keywords: missing data  fuzzy k-means  dynamic adaptive  data window  data quality

Missing Data Imputation: A Fuzzy k-means Clustering Algorithm over Dynamic Adaptive Data Window
Liao Zaifei, Lü Xinjie, Luo Xiongfei, Liu Wei, Wang Hongan. Missing Data Imputation: A Fuzzy k-means Clustering Algorithm over Dynamic Adaptive Data Window [J]. Journal of Computer Research and Development, 2009, 46(Z2).
Authors: Liao Zaifei  Lü Xinjie  Luo Xiongfei  Liu Wei  Wang Hongan
Abstract: Completeness is an important dimension of data quality. Owing to the inherent uncertainty of data, along with the randomness and imprecision of data acquisition, practical applications generate numerous datasets with the following features: 1) the data volume is often large-scale; 2) data items are often incomplete and inaccurate. A data window is therefore indispensable in large-scale dataset analysis. However, most related work ignores both the data window and the nature of the datasets. Moreover, with a fixed-size window, accuracy and performance are easily affected by the window size and by the distribution of data values. In this paper, the data are assumed to satisfy some domain-specific constraints. First, a novel time-based dynamic adaptive data window detection algorithm is presented; then an improved fuzzy k-means clustering algorithm is proposed for imputing the missing values of incomplete data items. Experiments show that the proposed algorithm tends to be more tolerant of the features of the dataset and can lead to better performance with accuracy guarantees.
Keywords: missing data  fuzzy k-means  dynamic adaptive  data window  data quality
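
The abstract names fuzzy k-means imputation inside a data window but does not give the details of the improved algorithm or of the adaptive window detection. As a rough illustration only, the following Python sketch shows a generic fuzzy k-means imputation over one already-chosen window. It is not the authors' method. The function name `fuzzy_kmeans_impute`, its parameters, the farthest-point initialisation, and the rule of fitting centers on complete rows only are all assumptions made for this example.

```python
import math


def fuzzy_kmeans_impute(rows, k=2, m=2.0, iters=50, eps=1e-12):
    """Fill None entries in one data window using fuzzy k-means centers.

    rows: list of equal-length lists of floats; None marks a missing value.
    Assumes at least k complete rows exist (a simplification for this sketch).
    """
    all_feats = range(len(rows[0]))
    complete = [r for r in rows if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k complete rows")

    def dist(row, center, feats):
        return math.sqrt(sum((row[j] - center[j]) ** 2 for j in feats))

    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(complete[0])]
    while len(centers) < k:
        far = max(complete,
                  key=lambda r: min(dist(r, c, all_feats) for c in centers))
        centers.append(list(far))

    def memberships(row):
        # Partial distance: compare only on the features the row actually has.
        feats = [j for j in all_feats if row[j] is not None]
        d = [dist(row, c, feats) for c in centers]
        hits = [1.0 if x < eps else 0.0 for x in d]
        if any(hits):  # row coincides with one or more centers
            return [h / sum(hits) for h in hits]
        inv = [x ** (-2.0 / (m - 1.0)) for x in d]
        return [v / sum(inv) for v in inv]

    # Standard fuzzy k-means updates, fitted on the complete rows only.
    for _ in range(iters):
        u = [memberships(r) for r in complete]
        for i in range(k):
            w = [ui[i] ** m for ui in u]
            total = sum(w)
            if total > 0:
                centers[i] = [
                    sum(wt * r[j] for wt, r in zip(w, complete)) / total
                    for j in all_feats
                ]

    # Impute each missing value as the membership-weighted mix of the centers.
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        u = memberships(r)
        filled.append([
            r[j] if r[j] is not None
            else sum(ui * c[j] for ui, c in zip(u, centers))
            for j in all_feats
        ])
    return filled


# Hypothetical window with two well-separated groups and two incomplete rows.
window = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
          [10.0, 10.1], [9.9, 10.0], [10.1, 9.9],
          [1.0, None], [None, 10.0]]
filled = fuzzy_kmeans_impute(window, k=2)
```

Because every row belongs to every cluster to some degree, an incomplete row is filled with a blend of the cluster centers rather than with the value from a single hard-assigned cluster. How the window boundaries themselves would be chosen and adapted over time is left out here, since the abstract does not specify it.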
This article is indexed in Wanfang Data and other databases.