Adaptive Stochastic Gradient Descent for Imbalanced Data Classification
Cite as: TAO Bing-mo, LU Shu-xia. Adaptive Stochastic Gradient Descent for Imbalanced Data Classification[J]. Computer Science, 2018, 45(Z6): 487-492.
Authors: TAO Bing-mo, LU Shu-xia
Affiliation: College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China; Hebei Province Key Laboratory of Machine Learning and Computational Intelligence, Baoding, Hebei 071002, China
Funding: Supported by the Natural Science Foundation of Hebei Province (F2015201185).
Abstract: For imbalanced data classification, traditional stochastic gradient descent applied to the standard support vector machine problem yields a biased solution and performs poorly. The adaptive stochastic gradient descent algorithm defines a distribution p and, when selecting an example for each iterative update, samples according to p rather than the uniform distribution; it also uses a smoothed hinge loss in the optimization problem. On an imbalanced training set, sampling under the uniform distribution means that the larger the imbalance ratio, the more often majority-class examples are selected, so the resulting classifier favors the majority class at the expense of the minority class. The distribution p largely resolves this problem. Ordinary stochastic gradient descent has no explicit stopping criterion, which makes deciding when to stop an important question, especially when training on large data sets. A stopping criterion is therefore set based on the classification accuracy on the training set or a subset of it; with suitably chosen parameters, the algorithm can stop within the early iterations, a behavior that is especially pronounced on medium and large data sets. Experiments on several imbalanced data sets demonstrate the effectiveness of the proposed algorithm.
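The three ingredients described above (class-aware sampling distribution p, smoothed hinge loss, and an accuracy-based stopping rule) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes p gives each class equal total probability mass, uses one common quadratic smoothing of the hinge loss, and uses a Pegasos-style step size; the function and parameter names (`adaptive_sgd_svm`, `check_every`, `target_acc`) are illustrative.

```python
import numpy as np

def smooth_hinge_grad(z):
    # Derivative w.r.t. the margin z of a quadratically smoothed hinge loss:
    # 0 for z >= 1, (1-z)^2/2 for 0 <= z < 1, 1/2 - z for z < 0.
    # (One common smoothing; the paper's exact form is not reproduced here.)
    if z >= 1.0:
        return 0.0
    if z >= 0.0:
        return z - 1.0
    return -1.0

def adaptive_sgd_svm(X, y, lam=0.01, epochs=20, check_every=1000,
                     target_acc=0.95, rng=None):
    """SGD for a linear SVM that samples examples from a class-balancing
    distribution p instead of uniformly, with an accuracy-based stop rule."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Assumption: p gives each class equal total mass, so minority-class
    # examples are drawn as often in aggregate as majority-class ones.
    classes, counts = np.unique(y, return_counts=True)
    class_w = {c: 1.0 / (len(classes) * k) for c, k in zip(classes, counts)}
    p = np.array([class_w[c] for c in y])
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.choice(n, size=n, p=p):   # non-uniform sampling via p
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos-style step size
            z = y[i] * (w @ X[i])
            g = lam * w + smooth_hinge_grad(z) * y[i] * X[i]
            w -= eta * g
            if t % check_every == 0:
                # Stopping criterion: accuracy on the training set.
                acc = np.mean(np.sign(X @ w) == y)
                if acc >= target_acc:
                    return w
    return w
```

In practice the accuracy check can be run on a fixed subset of the training set to keep its cost low on large data sets, as the abstract suggests.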

Keywords: Stochastic gradient descent; Non-uniform distribution; Stopping criterion; Support vector machine; Loss function
