
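Research on an Imputation Method for Incomplete Data Based on Tuple Similarity
Cite this article: WANG Jun-lu, WANG Ling, WANG Yan, SONG Bao-yan. Research on an Imputation Method for Incomplete Data Based on Tuple Similarity[J]. Computer Science (计算机科学), 2017, 44(2): 98-102, 106
Authors: WANG Jun-lu  WANG Ling  WANG Yan  SONG Bao-yan
Affiliations (in author order): School of Information, Liaoning University, Shenyang 110036; School of Information, Liaoning University, Shenyang 110036; School of Information, Liaoning University, Shenyang 110036 and School of Information Science and Engineering, Northeastern University, Shenyang 110819; School of Information, Liaoning University, Shenyang 110036
Funding: This work was supported by the National Natural Science Foundation of China (61472169, 61472072), the National Key Technology R&D Program of China (2012BAF13B08), the Preliminary Research Special Project of the National "973" Key Basic Research Development Program (2014CB360509), the Liaoning Province Science Public Welfare Research Fund (2015003003), and the Liaoning University Research Fund (Science and Technology) (LDQN2015001).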
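Abstract: With the development of the Internet and information technology, missing and corrupted data have become increasingly common. As data collection shifts from manual to machine-based processes, unstable storage media and omissions during network transmission make the problem more severe. Large numbers of missing values in a database not only seriously degrade the quality of user queries but also compromise the correctness of data mining and data analysis results, which in turn misleads decision making. At present there is no reasonably general method for imputing missing data; most strategies handle only one type of missing value. Therefore, for the complex case in which different missing types occur together in incomplete data, an incomplete data imputation method based on tuple similarity (IATS) is proposed. Data mining is used to extract weighted association rules from the incomplete data set, and normal missing data are imputed according to these rules. For abnormal missing data, a data recommendation algorithm is introduced: a recommendation screening strategy computes tuple similarity and performs the corresponding imputation, which considerably improves the effective utilization of data and the quality of user query results. Experiments show that IATS achieves better accuracy while maintaining the imputation rate.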

Keywords: Massive data  Missing type  Weighted association rules  Tuple similarity
Received: 2015-10-12
Revised: 2016-01-21

Missing Data Imputation Approach Based on Tuple Similarity
WANG Jun-lu,WANG Ling,WANG Yan and SONG Bao-yan. Missing Data Imputation Approach Based on Tuple Similarity[J]. Computer Science, 2017, 44(2): 98-102, 106
Authors:WANG Jun-lu  WANG Ling  WANG Yan  SONG Bao-yan
Affiliation:School of Information,Liaoning University,Shenyang 110036,China,School of Information,Liaoning University,Shenyang 110036,China,School of Information,Liaoning University,Shenyang 110036,China;School of Information Science and Engineering,Northeastern University,Shenyang 110819,China and School of Information,Liaoning University,Shenyang 110036,China
Abstract: With the development of the Internet and information technology, data loss, corruption, and related problems are becoming more and more common. In particular, as data collection moves from manual to machine-based processes, unstable storage media and transmission omissions make missing data more severe. A large number of missing values in a database not only seriously affects query quality but also affects the accuracy of data mining and data analysis results. At present there is no general method for dealing with missing data; most strategies address only one type of missing value. Therefore, for the complex situation in which different missing types appear in incomplete data at the same time, this paper proposes a missing data imputation approach based on tuple similarity (IATS). Weighted association rules are extracted from the incomplete data set by data mining, and normal missing data are imputed according to these rules. For abnormal missing data, the paper introduces a data recommendation algorithm, using a recommendation screening strategy to compute tuple similarity and perform the corresponding imputation, which greatly improves the effective utilization of data and the quality of user query results. The experimental results show that the IATS strategy achieves better accuracy while maintaining the filling ratio.
Keywords: Massive data  Missing type  Weighted association rules  Tuple similarity
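
The abstract does not give the IATS similarity formula or its screening strategy, so the following is only an illustrative sketch of the general idea of similarity-based imputation, not the authors' algorithm. It assumes categorical attributes, defines similarity as the fraction of jointly observed attributes on which two tuples agree, and copies a missing value from the most similar tuple that has it. The names `tuple_similarity`, `impute_by_similarity`, and the `min_similarity` threshold are hypothetical, and the weighted association rule stage is omitted.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]  # None marks a missing attribute value


def tuple_similarity(a: Row, b: Row) -> float:
    """Fraction of attributes observed in both tuples that hold equal values.

    This matching ratio is an assumed stand-in; the paper's measure may differ.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def impute_by_similarity(rows: Sequence[Row],
                         min_similarity: float = 0.5) -> List[List[Optional[str]]]:
    """Fill each missing cell from the most similar tuple that observes it.

    A cell stays None when no donor reaches min_similarity (hypothetical
    threshold standing in for a screening step).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            best_sim, best_val = 0.0, None
            for k, other in enumerate(rows):
                # A donor must be a different tuple with attribute j observed.
                if k == i or other[j] is None:
                    continue
                sim = tuple_similarity(row, other)
                if sim > best_sim:
                    best_sim, best_val = sim, other[j]
            if best_val is not None and best_sim >= min_similarity:
                filled[i][j] = best_val
    return filled


if __name__ == "__main__":
    data = [
        ["a", "x", "1"],
        ["a", "x", None],
        ["b", "y", "2"],
        ["b", None, "2"],
    ]
    for r in impute_by_similarity(data):
        print(r)
```

Similarity is always computed against the original rows, so a value filled earlier in the pass never acts as evidence for a later fill. Raising `min_similarity` trades filling ratio for accuracy, which is the same trade-off the abstract reports on.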
