
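Unstable Cut-Points Based Sample Selection for Large Data Classification
Cite this article: WANG Xizhao, XING Sheng, ZHAO Shixin. Unstable Cut-Points Based Sample Selection for Large Data Classification[J]. Pattern Recognition and Artificial Intelligence, 2016, 29(9): 780-789.
Authors: WANG Xizhao (王熙照), XING Sheng (邢胜), ZHAO Shixin (赵士欣)
Affiliations: 1. College of Mathematics and Information Science, Hebei University, Baoding 071002. 2. School of Management, Hebei University, Baoding 071002. 3. College of Computer Science and Engineering, Cangzhou Normal University, Cangzhou 061001. 4. Department of Mathematics and Physics, Shijiazhuang Tiedao University, Shijiazhuang 050043
Funding: Supported by the National Natural Science Foundation of China (No.713710630) and the Shenzhen Science and Technology Plan Project (No.JCYJ20150324140036825)
Abstract: Traditional sample selection methods suffer from high computational complexity and long running times when compressing large data sets. To address this, the paper proposes a sample selection method based on unstable cut-points. Relying on the basic property that a convex function attains its extreme value at an endpoint of an interval, the method marks unstable cut-points to measure the degree to which each sample is an endpoint, and then selects the samples with a high endpoint degree, which avoids computing distances between samples. The method aims to compress the data set and improve computational efficiency without degrading classification accuracy. Experiments show that the method compresses data sets with a high class imbalance ratio markedly and is also strongly resistant to noise.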

Keywords: Large Data Classification; Sample Selection; Unstable Cut-Points; Decision Tree
Received: 2016-05-03

Unstable Cut-Points Based Sample Selection for Large Data Classification
WANG Xizhao, XING Sheng, ZHAO Shixin. Unstable Cut-Points Based Sample Selection for Large Data Classification[J]. Pattern Recognition and Artificial Intelligence, 2016, 29(9): 780-789.
Authors: WANG Xizhao, XING Sheng, ZHAO Shixin
Affiliation: 1. College of Mathematics and Information Science, Hebei University, Baoding 071002. 2. School of Management, Hebei University, Baoding 071002. 3. College of Computer Science and Engineering, Cangzhou Normal University, Cangzhou 061001. 4. Department of Mathematics and Physics, Shijiazhuang Tiedao University, Shijiazhuang 050043
Abstract: Traditional sample selection methods have high computational complexity and long running times when they are used to compress large data sets. To address this problem, this paper proposes a sample selection method based on unstable cut-points. A convex function attains its extreme value at an endpoint of an interval; based on this property, the endpoint degree of a sample is measured by marking the unstable cut-points of all attributes. The samples with a higher endpoint degree are selected, which avoids computing the distances between samples. Computational efficiency is thereby improved without affecting classification accuracy. The experimental results show that the proposed algorithm compresses large data sets with a high imbalance ratio effectively and is robust to noise.
Keywords: Large Data Classification; Sample Selection; Unstable Cut-Points; Decision Tree
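
The selection step described in the abstract can be sketched as follows. This is only an illustrative reading, not the authors' implementation. It assumes that a cut-point between two samples adjacent in the sorted order of an attribute is "unstable" when their class labels differ, and that a sample's endpoint degree is the number of unstable cut-points it borders, summed over all attributes. The function names `endpoint_degrees` and `select_samples` and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence


def endpoint_degrees(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Count, for each sample, the unstable cut-points it borders (all attributes)."""
    n = len(X)
    degrees = [0] * n
    if n == 0:
        return degrees
    for j in range(len(X[0])):
        # Sort sample indices by attribute j; no pairwise distances are needed.
        order = sorted(range(n), key=lambda i: X[i][j])
        for a, b in zip(order, order[1:]):
            # Assumed definition: the cut between two neighbours is unstable
            # when their labels differ. Tied attribute values are not treated
            # specially in this sketch.
            if y[a] != y[b]:
                degrees[a] += 1
                degrees[b] += 1
    return degrees


def select_samples(X: Sequence[Sequence[float]], y: Sequence[int],
                   threshold: int = 1) -> List[int]:
    """Return indices of samples whose endpoint degree is at least `threshold`."""
    degrees = endpoint_degrees(X, y)
    return [i for i, d in enumerate(degrees) if d >= threshold]


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [0, 0, 0, 1, 1, 1]
    # Only the two samples on either side of the class change are kept.
    print(select_samples(X, y))
```

Under these assumptions the cost is dominated by one sort per attribute, which is consistent with the abstract's point that distances between samples are never computed.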