Balance-sampling based light-weighted advertisement CTR prediction method
Citation: SHI Meng-yuan, GU Jin-ji. Balance-sampling based light-weighted advertisement CTR prediction method[J]. Application Research of Computers, 2014, 31(1): 33-36
Authors: SHI Meng-yuan, GU Jin-ji
Affiliation: 1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China; 2. Union R&D Department, Baidu China Co., Ltd., Shanghai 210203, China
Abstract: Targeted advertising systems such as Google AdSense and media.net have developed rapidly over the past ten years, and machine learning plays a significant role in predicting advertisement click-through rate (CTR) in these systems. However, the training data for CTR prediction models has been growing exponentially, which severely limits the scalability of the models: many useful features and more complex models cannot be added because of the sheer size of the training set. Borrowing the balanced sampling strategy from class-imbalance learning, this paper samples a small portion of the negative examples multiple times and combines the resulting models through ensemble learning, which shortens training time and improves prediction accuracy. Experiments show that with balanced sampling, CTR prediction quality improves and online computing resource consumption is reduced.

Keywords: click-through rate (CTR); machine learning; computational advertising; class imbalance learning
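The abstract only outlines the idea: draw several small samples of the negative (non-clicked) examples, train one model per sample together with the positives, and combine the models. The sketch below illustrates that idea under assumptions that do not come from the paper: the base learner is plain logistic regression fitted by SGD, the ensemble simply averages probabilities, and the averaged score is mapped back to the original click rate with the standard odds correction for negative down-sampling (a model trained after keeping only a fraction r of negatives overstates the odds of a click by a factor of 1/r). All function and parameter names (`balanced_ensemble`, `predict_ctr`, `neg_per_pos`, and so on) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# One training example: (feature vector, label) where 1 = click, 0 = no click.
Example = Tuple[List[float], int]
# A linear model: (weights, bias).
Model = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model: Model, x: Sequence[float]) -> float:
    """Click probability under a single linear model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(data: Sequence[Example], dim: int, epochs: int = 30,
                 lr: float = 0.1, seed: int = 0) -> Model:
    """Assumed base learner: logistic regression fitted with plain SGD on log loss."""
    rng = random.Random(seed)
    w, b = [0.0] * dim, 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            g = score((w, b), x) - y  # derivative of log loss w.r.t. the logit
            for j in range(dim):
                w[j] -= lr * g * x[j]
            b -= lr * g
    return w, b


def balanced_ensemble(data: Sequence[Example], dim: int, n_models: int = 5,
                      neg_per_pos: float = 1.0,
                      seed: int = 0) -> Tuple[List[Model], float]:
    """Train n_models learners, each on every positive plus a fresh random draw
    of negatives (neg_per_pos negatives per positive). Returns the models and
    the fraction of all negatives that each model saw."""
    pos = [ex for ex in data if ex[1] == 1]
    neg = [ex for ex in data if ex[1] == 0]
    if not pos or not neg:
        raise ValueError("need both clicked and non-clicked examples")
    k = min(len(neg), max(1, round(neg_per_pos * len(pos))))
    rng = random.Random(seed)
    models = [train_logreg(pos + rng.sample(neg, k), dim, seed=seed + m)
              for m in range(n_models)]
    return models, k / len(neg)


def predict_ctr(models: Sequence[Model], neg_rate: float,
                x: Sequence[float]) -> float:
    """Average the members' probabilities q, then undo the negative
    down-sampling: p = q / (q + (1 - q) / neg_rate)."""
    q = sum(score(m, x) for m in models) / len(models)
    return q / (q + (1.0 - q) / neg_rate)


if __name__ == "__main__":
    # Synthetic log: one binary feature; true CTR is 20% when on, 2% when off.
    rng = random.Random(7)
    log = []
    for _ in range(2000):
        x = 1.0 if rng.random() < 0.5 else 0.0
        log.append(([x], 1 if rng.random() < (0.2 if x else 0.02) else 0))
    models, rate = balanced_ensemble(log, dim=1)
    print(predict_ctr(models, rate, [1.0]), predict_ctr(models, rate, [0.0]))
```

Each member trains on roughly twice the number of positives instead of the full log, which is where the savings in training time would come from; the correction step in `predict_ctr` matters because, without it, the ensemble would report click rates inflated by the balanced sampling.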