Feature Screening and Analysis for Article Traffic Prediction
Cite this article: HU Bao-ling, LI Zhi-tao, ZHOU Yan. Feature Screening and Analysis for Article Traffic Prediction[J]. Communications Technology, 2020(4): 885-889.
Authors: HU Bao-ling  LI Zhi-tao  ZHOU Yan
Affiliation: South China Agricultural University, Guangzhou, Guangdong 510642, China
Abstract: This paper investigates the features needed to predict the read counts of WeChat official account articles. First, all articles published by the target account within a specified period are crawled, the time required for an article's read count to stabilize is estimated, and the data are cleaned. Second, features are screened using several techniques and data-processing methods, including a word segmentation model and a probabilistic topic model; features of each article and of its title and body text, such as word frequency, topic, and publication circumstances, are extracted to form 125 variables. Finally, hypothesis tests are used to examine the relationship between the extracted features and read counts, and the factors that influence article traffic are analyzed to give official accounts practical guidance.

Keywords: text data mining  Chinese word segmentation  probabilistic topic model  feature screening  hypothesis testing
This article is indexed in databases including VIP (维普).
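
The last step in the abstract, testing whether an extracted feature is related to read counts, can be illustrated with a small sketch. The abstract does not say which hypothesis test the authors used, so the two-sided permutation test below is an assumption chosen because it needs only the standard library. The function name `permutation_test` and the read counts are hypothetical and invented for illustration; they are not data from the paper.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign articles to the two groups at random.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical read counts, invented for illustration only:
# articles that have some binary feature versus articles that do not.
reads_with_feature = [5200, 4800, 6100, 5900, 5500, 6300, 5000, 5700]
reads_without_feature = [3100, 2900, 3500, 3300, 2700, 3600, 3000, 3400]

diff, p = permutation_test(reads_with_feature, reads_without_feature)
print(f"mean difference = {diff:.1f}, estimated p = {p:.4f}")
```

A small estimated p-value suggests that the feature and the read count are unlikely to be unrelated. With 125 candidate variables, as in the paper, a correction for multiple comparisons would usually be worth considering before drawing conclusions.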