
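Confidence Interval Method for Classification Usability Evaluation of Data Sets (数据集分类可用性评估的置信区间方法)
Cite this article: TAN Xun-tao (谈询滔), GU Yi-yi (顾依依), RUAN Tong (阮彤), YUAN Yu-bo (袁玉波). Confidence Interval Method for Classification Usability Evaluation of Data Sets [J]. Computer Science (计算机科学), 2019, 46(1): 78-85.
Authors: TAN Xun-tao (谈询滔), GU Yi-yi (顾依依), RUAN Tong (阮彤), YUAN Yu-bo (袁玉波)
Affiliation (all authors): Department of Computer Science and Engineering, East China University of Science and Technology (华东理工大学计算机科学与工程系), Shanghai 200237
Funding: This work was supported by the National Natural Science Foundation of China (61772201) and the Science and Technology Commission of Shanghai Municipality (16511101000, 17DZ11011003).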
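Abstract: How to effectively evaluate the usability of a training data set has long been a difficult problem that hinders the application of intelligent classification systems. For the data classification problem in machine learning, this paper proposes a method, based on interval analysis and information granulation, for evaluating the classification usability of a data set, that is, how separable the data set is. The method defines the data set under evaluation as a classification information system, introduces the concept of a classification confidence interval, and performs information granulation through interval analysis. Under this granulation strategy, a mathematical model of classification usability is defined, and methods are given for computing the classification usability of a single attribute and of the whole data set. Eighteen UCI standard data sets were selected as evaluation objects, classification usability results are reported for some of them, and three classifiers were run on the selected data sets. Analysis of these experimental results demonstrates the effectiveness and feasibility of the evaluation method.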

Keywords: Data usability; Classification system; Interval analysis; Information granulation; Classification usability
Received: 2018/6/8
Revised: 2018/7/14

Confidence Interval Method for Classification Usability Evaluation of Data Sets
TAN Xun-tao, GU Yi-yi, RUAN Tong and YUAN Yu-bo. Confidence Interval Method for Classification Usability Evaluation of Data Sets [J]. Computer Science, 2019, 46(1): 78-85.
Authors: TAN Xun-tao, GU Yi-yi, RUAN Tong and YUAN Yu-bo
Affiliation (all authors): Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Abstract: Effectively evaluating the usability of training data sets has long been a difficult problem that hinders the application of intelligent classification systems. Addressing data classification in machine learning, this paper proposes a method, based on interval analysis and information granulation, for evaluating the classification usability of a data set, that is, for measuring its separability. In this method, the data set is defined as a classification information system, the concept of a classification confidence interval is introduced, and information granulation is carried out by interval analysis. Under this granulation strategy, the paper defines a mathematical model of classification usability and gives methods for calculating the classification usability of a single attribute and of the whole data set. Eighteen UCI standard data sets were selected as evaluation objects, their classification usability results are reported, and three classifiers were applied to the same data sets. Analysis of the experimental results verifies the effectiveness and feasibility of the evaluation method.
Keywords: Data usability; Classification system; Interval analysis; Information granulation; Classification usability
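
The abstract names the ingredients of the method (a per-class confidence interval for each attribute, interval-based granulation, and a usability score per attribute and for the whole data set) but does not state the formulas. The following Python sketch is therefore only an illustration of the general idea, not the authors' model. Everything specific in it is an assumption: the interval is taken as mean ± k standard deviations per class, an attribute's score is the fraction of samples that fall inside their own class interval and no other, and the data set score is the plain average over attributes. The function names are hypothetical.

```python
from statistics import mean, pstdev


def class_intervals(values, labels, k=2.0):
    """Assumed interval per class: mean +/- k * population std of the attribute."""
    by_class = {}
    for v, y in zip(values, labels):
        by_class.setdefault(y, []).append(v)
    return {
        y: (mean(vs) - k * pstdev(vs), mean(vs) + k * pstdev(vs))
        for y, vs in by_class.items()
    }


def attribute_usability(values, labels, k=2.0):
    """Assumed score: share of samples covered only by their own class interval."""
    intervals = class_intervals(values, labels, k)
    hits = 0
    for v, y in zip(values, labels):
        # Granulate the sample: which class intervals contain its value?
        covering = [c for c, (lo, hi) in intervals.items() if lo <= v <= hi]
        if covering == [y]:
            hits += 1
    return hits / len(values)


def dataset_usability(rows, labels, k=2.0):
    """Return per-attribute scores and their average (an assumed aggregate)."""
    columns = list(zip(*rows))
    scores = [attribute_usability(col, labels, k) for col in columns]
    return scores, mean(scores)


if __name__ == "__main__":
    # Toy data: attribute 0 separates the classes, attribute 1 does not.
    rows = [(1.0, 5.0), (1.2, 5.1), (0.8, 4.9),
            (3.0, 5.0), (3.2, 5.1), (2.8, 4.9)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(dataset_usability(rows, labels))
```

On the toy data, the first attribute has disjoint class intervals and the second has identical ones, so the sketch rates the first as fully usable and the second as not usable at all. The paper's own definitions of the classification confidence interval and of the usability model should be taken from the full text.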