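Microblog advertisement filtering method based on classification feature extension of latent Dirichlet allocation
Cite this article: XING Jinbiao, CUI Chaoyuan, SUN Bingyu, SONG Liangtu. Microblog advertisement filtering method based on classification feature extension of latent Dirichlet allocation[J]. Journal of Computer Applications, 2016, 36(8): 2257-2261.
Authors: XING Jinbiao, CUI Chaoyuan, SUN Bingyu, SONG Liangtu
Affiliation: 1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031; 2. School of Information Science and Technology, University of Science and Technology of China, Hefei 230026
Funding: National Key Technology R&D Program of China (2014BAD10B08); Science and Technology Research Program of Anhui Province (1401032010).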
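Abstract: Traditional microblog advertisement filtering methods ignore the effects of data sparseness in advertisement text, semantic information, and the background-domain features of advertisements. To address these problems, an advertisement filtering method based on Latent Dirichlet Allocation (LDA) classification feature extension is proposed. First, microblogs are divided into normal microblogs and advertising microblogs, and a separate LDA topic model is built for each class to predict the topic distribution of each short text; the words in the topics serve as the basis for feature extension. Second, during feature extension, background-domain features are extracted using the text category information so as to reduce their influence on text classification. Finally, the extended feature vectors are fed to the classifier, and advertisements are filtered according to the Support Vector Machine (SVM) classification results. Experimental results show that, compared with existing filtering methods based only on short text classification, precision improves by 4 percentage points on average. The method therefore extends text features effectively, reduces the influence of background-domain features, and is better suited to filtering microblog advertisements at large data volumes.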

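Keywords: advertisement filtering; Latent Dirichlet Allocation (LDA); short text classification; Support Vector Machine (SVM); feature extension
Received: 2016-01-15
Revised: 2016-03-11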

Microblog advertisement filtering method based on classification feature extension of latent Dirichlet allocation
XING Jinbiao, CUI Chaoyuan, SUN Bingyu, SONG Liangtu. Microblog advertisement filtering method based on classification feature extension of latent Dirichlet allocation[J]. Journal of Computer Applications, 2016, 36(8): 2257-2261.
Authors: XING Jinbiao, CUI Chaoyuan, SUN Bingyu, SONG Liangtu
Affiliation: 1. Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei Anhui 230031, China; 2. School of Information Science and Technology, University of Science and Technology of China, Hefei Anhui 230026, China
Abstract: Traditional microblog advertisement filtering methods neglect the impact of factors such as data sparseness, semantic information, and advertisement background characteristics. To address these issues, a new filtering method based on classification feature extension of Latent Dirichlet Allocation (LDA) is proposed. First, microblogs are divided into normal microblogs and advertising microblogs, and an LDA topic model is built for each class to infer the corresponding topic distribution; the words in the topics serve as the basis of feature extension. Second, background characteristics are extracted in conjunction with text category information during extension to reduce their impact on text classification. Finally, the extended feature vectors are used as the input of the classifier, and advertisements are filtered according to the results of Support Vector Machine (SVM) classification. In comparison experiments with methods based only on short text classification, the precision of the proposed method increased by 4 percentage points on average. The results indicate that the proposed method can effectively extend the text features and reduce the influence of background characteristics, and that it is better suited to filtering microblog advertisements at large data volumes.
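The feature-extension step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the per-class LDA models are already trained, that each topic is represented by a short list of its top words, and that "background" words are simply those appearing among the top topic words of both classes. The function names, the toy topics, and the 0.3 probability threshold are all hypothetical.

```python
from collections import Counter


def background_words(normal_topics, ad_topics):
    """Assumption: a word is 'background' if it is a top word in both classes."""
    normal_vocab = {word for topic in normal_topics for word in topic}
    ad_vocab = {word for topic in ad_topics for word in topic}
    return normal_vocab & ad_vocab


def extend_features(tokens, topic_probs, topics, background, threshold=0.3):
    """Add the top words of each sufficiently probable topic to the bag of words.

    tokens      -- words of the short text
    topic_probs -- inferred topic distribution of the text (one value per topic)
    topics      -- top words of each topic, in the same order as topic_probs
    background  -- words to skip because they do not separate the classes
    """
    features = Counter(tokens)
    for prob, topic_words in zip(topic_probs, topics):
        if prob < threshold:
            continue  # topic is too weakly associated with this text
        for word in topic_words:
            if word not in background:
                features[word] += 1
    return features


# Toy topics standing in for the top words of two trained per-class LDA models.
normal_topics = [["weather", "travel", "photo"], ["movie", "music", "phone"]]
ad_topics = [["discount", "sale", "phone"], ["coupon", "shipping", "order"]]

background = background_words(normal_topics, ad_topics)

# A short text and a made-up topic distribution over the advertising topics.
tokens = ["big", "sale", "today"]
extended = extend_features(tokens, [0.8, 0.2], ad_topics, background)
print(sorted(extended.items()))
```

In a full pipeline, the extended counts would be converted to fixed-length vectors and passed to an SVM classifier; that step is omitted here.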
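Keywords: advertisement filtering; Latent Dirichlet Allocation (LDA); short text classification; Support Vector Machine (SVM); feature extension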