Research on User Portrait of Improved Word Vector Model (改进词向量模型的用户画像研究)
Citation: CHEN Zeyu, HUANG Bo. Research on User Portrait of Improved Word Vector Model[J]. Computer Engineering and Applications (计算机工程与应用), 2020, 56(1): 180-184.
Authors: CHEN Zeyu (陈泽宇), HUANG Bo (黄勃)
Affiliation: 1. School of Electrical and Electronic Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China 2. Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control, Nanchang 330000, China
Funding: Open Fund of the Jiangxi Collaborative Innovation Center for Economic Crime Detection and Prevention and Control; National Natural Science Foundation of China
Abstract: User portrait technology can bring great commercial value to enterprises. For a user's historical query words, word vectors can capture the meaning of each query word at the semantic level. However, a word vector model generates the same vector for every occurrence of a word, so it cannot handle polysemy well. This paper therefore uses the LDA topic model to assign a topic to each query word, then feeds each query word together with its topic into a neural network model to learn topical word vectors. Finally, a random forest classification algorithm classifies users' basic attributes to build the user portrait. Experimental results show that this model achieves higher classification accuracy than the word vector model.

Keywords: user portrait; word vector; LDA topic model; random forest
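
The abstract's central step, pairing each query word with its topic before embedding, can be illustrated with a minimal sketch. It assumes the topic ids have already been produced by an LDA model, and it encodes each (word, topic) pair as a single pseudo-word such as `apple#0`. This encoding is one common way to build topical word vectors, not necessarily the one the paper uses. The function names, the sample queries, and the topic ids below are all hypothetical.

```python
def tag_with_topics(queries, topic_assignments):
    """Turn each (word, topic id) pair into a 'word#topic' pseudo-word.

    queries: list of queries, each a list of words.
    topic_assignments: parallel list of topic ids, one per word
    (assumed to come from a trained LDA model; hard-coded here).
    """
    tagged = []
    for words, topics in zip(queries, topic_assignments):
        if len(words) != len(topics):
            raise ValueError("each word needs exactly one topic id")
        tagged.append([f"{w}#{t}" for w, t in zip(words, topics)])
    return tagged


def build_vocab(tagged_queries):
    """Assign a distinct index to every pseudo-word, in order of first appearance."""
    vocab = {}
    for query in tagged_queries:
        for token in query:
            vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical query logs: "apple" appears in two different senses.
queries = [["apple", "phone", "price"], ["apple", "pie", "recipe"]]
# Hypothetical LDA output: topic 0 ~ electronics, topic 1 ~ cooking.
topic_assignments = [[0, 0, 0], [1, 1, 1]]

tagged = tag_with_topics(queries, topic_assignments)
vocab = build_vocab(tagged)
print(tagged)
print(vocab)
```

Because `apple#0` and `apple#1` occupy separate vocabulary entries, any embedding model trained on the tagged queries learns a separate vector for each sense. A plain word vector model would instead merge both senses into a single `apple` vector.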
This article is indexed in Wanfang Data (万方数据) and other databases.