
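Realistic Data Generation Algorithm Based on Non-Temporal Attribute Correlation (基于非时间属性关联的数据逼真生成算法)
Cite this article: ZHANG Rui, XIAO Ru-Liang, NI You-Cong, DU Xin, CAI Sheng-Zhen. Realistic Data Generation Algorithm Based on Non-Temporal Attribute Correlation [J]. Computer Systems & Applications (计算机系统应用), 2018, 27(2): 30-36.
Authors: ZHANG Rui  XIAO Ru-Liang  NI You-Cong  DU Xin  CAI Sheng-Zhen
Affiliation (all authors): College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117; Fujian Provincial Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117
Funding: Major Project of the Fujian Provincial Science and Technology Plan (2016H6007); Fuzhou City-University Cooperation Project (2016-G-40)
Abstract: This paper proposes a realistic data generation algorithm based on non-temporal attribute correlation. The algorithm addresses the difficulty of constructing correlations between non-temporal attributes when developing data generators, and it has practical value for generating simulated data in big data benchmarking. First, the two key non-temporal attributes are extracted from the data set, and a twofold frequency count is performed on each of them. Then, the maximal information coefficient is computed from the two sets of counts to assess the correlation, and a stretched exponential distribution is fitted to build the correlation model. Finally, constraints are constructed from the model parameters, and data are generated in a two-dimensional matrix subject to these constraints. Experimental results show that the algorithm can effectively simulate the data characteristics of real data sets.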

Keywords: realistic data generation  correlation  maximal information coefficient  stretched exponential distribution  attribute correlation
Received: 2017/5/2
Revised: 2017/5/19

Table Data Simulation Generating Algorithm Based on Not-Temporal Attribute
ZHANG Rui, XIAO Ru-Liang, NI You-Cong, DU Xin and CAI Sheng-Zhen. Table Data Simulation Generating Algorithm Based on Not-Temporal Attribute [J]. Computer Systems & Applications, 2018, 27(2): 30-36.
Authors:ZHANG Rui  XIAO Ru-Liang  NI You-Cong  DU Xin and CAI Sheng-Zhen
Affiliation (all authors): College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China; Fujian Provincial Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, China
Abstract: A table data simulation generation algorithm based on non-temporal attribute correlation is proposed. The algorithm addresses the difficulty of building correlations between non-temporal attributes when developing big data simulation generators, and it is useful for generating simulated data in big data benchmarking. First, we extract the two key non-temporal attributes from the data set and compute twofold frequency statistics for them. Then, based on the statistical results, we calculate the maximal information coefficient (MIC) to measure the dependence between the two variables, fit the relationship with a stretched exponential (SE) distribution, and build the correlation model. Finally, we generate data in a two-dimensional matrix constrained by this model. The experimental results show that the algorithm can effectively reproduce the data characteristics of the real data set.
Keywords:data simulation generator  correlation  the maximal information coefficient (MIC)  the stretched exponential distribution (SE)  attribute correlation
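
The abstract names the steps (frequency statistics, a stretched exponential fit, constrained generation) without giving formulas, so the following is only a minimal sketch of one piece, not the authors' implementation. It assumes the commonly used rank-ordered form of the stretched exponential, y_r^c = b - a ln(r), where y_r is the frequency of the value at rank r; the function names, the grid search over c, and the sampling step are all hypothetical choices made for illustration. The MIC computation and the two-attribute matrix constraint are not covered.

```python
import math
import random
from collections import Counter


def rank_frequencies(values):
    """Count each distinct value and return the counts in descending order."""
    return sorted(Counter(values).values(), reverse=True)


def fit_stretched_exponential(freqs, c_grid=None):
    """Fit y_r ** c = b - a * ln(r) to frequencies at ranks r = 1..n.

    For a fixed exponent c the model is linear in ln(r), so a and b come
    from ordinary least squares. The exponent c is picked by a grid search
    (an assumption of this sketch) that minimises the squared error on the
    original frequency scale.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit")
    if c_grid is None:
        c_grid = [k / 100 for k in range(5, 101, 5)]
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    best = None
    for c in c_grid:
        ys = [f ** c for f in freqs]
        mean_y = sum(ys) / len(ys)
        slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
        intercept = mean_y - slope * mean_x
        err = sum(
            (max(intercept + slope * x, 0.0) ** (1 / c) - f) ** 2
            for x, f in zip(xs, freqs)
        )
        if best is None or err < best[0]:
            best = (err, c, -slope, intercept)
    _, c, a, b = best
    return c, a, b


def sample_ranks(n_ranks, c, a, b, size, seed=None):
    """Draw ranks with probability proportional to the fitted frequencies."""
    weights = [
        max(b - a * math.log(r), 0.0) ** (1 / c) for r in range(1, n_ranks + 1)
    ]
    rng = random.Random(seed)
    return rng.choices(range(1, n_ranks + 1), weights=weights, k=size)
```

In use, `rank_frequencies` would be applied to one attribute column, the result passed to `fit_stretched_exponential`, and the fitted `(c, a, b)` passed to `sample_ranks` to draw synthetic ranks whose frequency profile follows the fitted curve. A nonlinear least-squares routine could replace the grid search if a numerical library is available.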