Not-temporal attribute correlation model to generate table data realistically
Cite this article: ZHANG Rui, XIAO Ruliang, NI Youcong, DU Xin. Not-temporal attribute correlation model to generate table data realistically[J]. Journal of Computer Applications, 2017, 37(9): 2684-2688. DOI: 10.11772/j.issn.1001-9081.2017.09.2684
Authors: ZHANG Rui  XIAO Ruliang  NI Youcong  DU Xin
Affiliation: 1. Faculty of Software, Fujian Normal University, Fuzhou, Fujian 350117, China; 2. Fujian Provincial Engineering Research Center of Public Service Big Data Mining and Application, Fuzhou, Fujian 350117, China
Funding: Major Project of the Fujian Provincial Science and Technology Plan (2016H6007); Fuzhou City-University Cooperation Project (2016-G-40).
Received: 2017-03-29
Revised: 2017-05-16
Abstract: Correlating attributes is a key difficulty when simulating tabular data. To address it, an H model is proposed that describes the correlation among non-temporal attributes in tabular data. First, the key attributes of the evaluating subject and the evaluated subject are extracted from the data set, and twofold frequency statistics yield four relation pairs over these key attributes. Then, the Maximum Information Coefficient (MIC) of each relation pair is calculated to assess its correlation, and each pair is fitted with a Stretched Exponential (SE) distribution. Finally, the data scales of the evaluating and evaluated subjects are set, the activity of the evaluating subjects and the popularity of the evaluated subjects are computed from the fitted relationships, and the two sides are linked by requiring the total activity to equal the total popularity, which gives the H model. Experimental results show that the H model effectively describes the correlation characteristics of non-temporal attributes in real data sets.
Keywords:data simulation  correlation  Maximum Information Coefficient (MIC)  Stretched Exponential (SE) distribution  attribute correlation  
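
The abstract gives no formulas, so the following is only a minimal sketch of two ideas it names: the twofold frequency statistics and the link between activity and popularity through equal totals. It assumes records are (user, item) pairs, with users as the evaluating subjects and items as the evaluated subjects. The function names `twofold_counts` and `rescale_to_total`, the toy data, and the largest-remainder rescaling are illustrative assumptions, not the authors' method. The MIC and SE fitting steps are omitted.

```python
from collections import Counter


def twofold_counts(keys):
    """First pass: occurrences per key. Second pass: how many keys share each count."""
    per_key = Counter(keys)                      # e.g. number of ratings made by each user
    count_of_counts = Counter(per_key.values())  # e.g. number of users having k ratings
    return per_key, dict(sorted(count_of_counts.items()))


def rescale_to_total(values, target_total):
    """Scale non-negative integers so that they sum to target_total.

    Assumption: largest-remainder rounding is used here only as one simple
    way to make the two totals agree; the paper may link them differently.
    """
    current = sum(values)
    if current == 0:
        raise ValueError("cannot rescale an all-zero list")
    exact = [v * target_total / current for v in values]
    floors = [int(x) for x in exact]
    shortfall = target_total - sum(floors)
    # Hand the leftover units to the entries with the largest fractional parts.
    order = sorted(range(len(values)), key=lambda i: exact[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


# Hypothetical toy records: one (user, item) pair per rating.
records = [("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
           ("u2", "i1"), ("u2", "i2"), ("u3", "i1")]

activity, activity_hist = twofold_counts(u for u, _ in records)
popularity, popularity_hist = twofold_counts(i for _, i in records)

# In real data every record adds one to a user's activity and one to an
# item's popularity, so both totals equal the number of records.
assert sum(activity.values()) == sum(popularity.values()) == len(records)

# Hypothetical generated values whose totals disagree before linking.
gen_activity = [5, 3, 2, 2]    # total 12
gen_popularity = [9, 4, 2]     # total 15
linked_popularity = rescale_to_total(gen_popularity, sum(gen_activity))
```

The equal-total constraint is what makes the generated table consistent: every generated row must belong to exactly one evaluating subject and one evaluated subject, so activity and popularity values drawn independently cannot both be honored unless their sums agree.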