     

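Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic
Cite this article: ZHAO Wei, DENG Ye-xun, ZHAO Jian-qiang, LI Wen-rui, HAN Bing, OU Rong-an. Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic[J]. Computer Technology and Development, 2021, 0(3)
Authors: ZHAO Wei  DENG Ye-xun  ZHAO Jian-qiang  LI Wen-rui  HAN Bing  OU Rong-an
Affiliations: Guangzhou Institute of Criminal Science and Technology; Xiamen Meiya Pico Information Co., Ltd.; Xidian University
Funding: Guangzhou Major Science and Technology Research Project (201903007).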
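Abstract: The Internet is an important medium for advertising, but it is also flooded with low-quality, fraudulent, illegal, and otherwise non-compliant advertisements that seriously pollute cyberspace. Effective identification of malicious advertisements is therefore important for building a safe and clean online environment. To meet the need to recognize illegal and non-compliant Chinese advertising content, Bert (bidirectional encoder representation from transformers) and Word2vec are used to extract character-level and word-level embedding features of the text, respectively. A CNN (convolutional neural network) performs deep extraction on Bert's high-level features, while the word-level feature vectors are fed into a bidirectional LSTM (long short-term memory) network to extract global semantics, and an attention mechanism strengthens these semantic features. The enhanced features are then fused with the Bert character-level features, making full use of the complementary semantic representation strengths of dynamic and static word vectors. On this basis, a Chinese advertisement recognition model based on enhanced semantics, CARES (Chinese advertisement text recognition based on enhanced semantic), is proposed. Experiments on a real-world dataset of social chat texts show that, compared with text classification models based on convolutional neural networks, recurrent neural networks, and similar architectures, CARES achieves the best classification performance and identifies advertising content in social chat text more accurately, reaching a recognition accuracy of 97.73%.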

Keywords: advertising text classification  semantic enhancement  feature fusion  pre-training  attention mechanism

Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic
ZHAO Wei, DENG Ye-xun, ZHAO Jian-qiang, LI Wen-rui, HAN Bing, OU Rong-an. Research on Chinese Advertisement Text Recognition Based on Enhanced Semantic[J]. Computer Technology and Development, 2021, 0(3)
Authors: ZHAO Wei  DENG Ye-xun  ZHAO Jian-qiang  LI Wen-rui  HAN Bing  OU Rong-an
Affiliation: (Guangzhou Institute of Criminal Science and Technology, Guangzhou 510030, China; Xiamen Meiya Pico Information Co., Ltd., Xiamen 361008, China; Xidian University, Xi’an 710071, China)
Abstract: The Internet is an important medium for advertising, but it is also flooded with low-quality, fraudulent, and illegal advertisements that seriously pollute cyberspace. Effective screening of malicious advertising is therefore of great significance for building a safe and clean network environment. We use Bert (bidirectional encoder representation from transformers) and Word2vec to extract character-level and word-level embedding features, respectively. A CNN (convolutional neural network) extracts deep features from the high-level Bert features, while the word-level feature vectors are fed into a bidirectional long short-term memory (LSTM) network to extract global semantics, and an attention mechanism strengthens these semantic features. The enhanced features are then fused with the Bert character-level features, making full use of the semantic representation advantages of both dynamic and static word vectors. On this basis, we propose a Chinese advertisement recognition model, CARES (Chinese advertisement text recognition based on enhanced semantic). Experiments on a real-world social chat text dataset show that, compared with other text classification models such as convolutional and recurrent neural networks, CARES achieves the best classification performance and recognizes advertising content in social chat text more accurately, with a recognition accuracy of 97.73%.
Keywords: advertising text classification  semantic enhancement  feature fusion  pre-training  attention mechanism
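The two-branch pipeline described in the abstract (Bert character features into a CNN; Word2vec word features into a BiLSTM followed by attention; fusion; classification) can be illustrated with a forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the Bert and Word2vec outputs are replaced by random matrices, all weights are random and untrained, and the layer sizes, the additive attention form, fusion by simple concatenation, and the two-class (advertisement / non-advertisement) softmax output are assumptions, since the abstract does not specify them. Function names such as `cares_forward` are hypothetical.

```python
import numpy as np

# Hypothetical CARES-style forward pass. Random, untrained weights; layer sizes,
# additive attention, and concatenation-based fusion are ASSUMPTIONS.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def conv1d_maxpool(x, kernels, bias):
    """Valid 1-D convolution over time, ReLU, then max-pooling over time.
    x: (T, d); kernels: (n_filters, k, d); bias: (n_filters,) -> (n_filters,)"""
    _, k, _ = kernels.shape
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])  # (T-k+1, k, d)
    feats = np.einsum("wkd,fkd->wf", windows, kernels) + bias             # (T-k+1, n_filters)
    return np.maximum(feats, 0.0).max(axis=0)

def lstm(x, W, U, b):
    """One-direction LSTM; returns all hidden states (T, h).
    Gate order in the stacked weights: input, forget, candidate, output."""
    h_dim = U.shape[1]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    outs = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:h_dim])
        f = sigmoid(z[h_dim:2 * h_dim])
        g = np.tanh(z[2 * h_dim:3 * h_dim])
        o = sigmoid(z[3 * h_dim:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

def bilstm(x, fwd, bwd):
    """Concatenate forward states with backward states re-aligned to time order."""
    h_f = lstm(x, *fwd)
    h_b = lstm(x[::-1], *bwd)[::-1]
    return np.concatenate([h_f, h_b], axis=1)  # (T, 2h)

def attention_pool(H, W_a, v):
    """Additive attention: scores = tanh(H W_a) v; weights = softmax(scores)."""
    alpha = softmax(np.tanh(H @ W_a) @ v)  # (T,)
    return alpha @ H, alpha                # weighted sum (2h,), weights (T,)

def init_lstm(d_in, h):
    return (rng.normal(0, 0.1, (4 * h, d_in)), rng.normal(0, 0.1, (4 * h, h)), np.zeros(4 * h))

def cares_forward(char_feats, word_feats, p):
    cnn_vec = conv1d_maxpool(char_feats, p["kernels"], p["conv_b"])  # character branch
    H = bilstm(word_feats, p["lstm_f"], p["lstm_b"])                 # word branch
    att_vec, alpha = attention_pool(H, p["W_a"], p["v"])             # semantic enhancement
    fused = np.concatenate([cnn_vec, att_vec])                       # assumed fusion
    return softmax(p["W_out"] @ fused + p["b_out"]), alpha

# Small illustrative sizes (a real Bert-base encoder would output 768-dim vectors).
D_CHAR, D_WORD, HID, N_FILT, K, N_CLASS = 32, 16, 8, 6, 3, 2
params = {
    "kernels": rng.normal(0, 0.1, (N_FILT, K, D_CHAR)),
    "conv_b": np.zeros(N_FILT),
    "lstm_f": init_lstm(D_WORD, HID),
    "lstm_b": init_lstm(D_WORD, HID),
    "W_a": rng.normal(0, 0.1, (2 * HID, 2 * HID)),
    "v": rng.normal(0, 0.1, 2 * HID),
    "W_out": rng.normal(0, 0.1, (N_CLASS, N_FILT + 2 * HID)),
    "b_out": np.zeros(N_CLASS),
}

# Stand-ins for encoder outputs: 20 characters (Bert), 12 words (Word2vec).
char_feats = rng.normal(size=(20, D_CHAR))
word_feats = rng.normal(size=(12, D_WORD))
probs, alpha = cares_forward(char_feats, word_feats, params)
```

Concatenation is used here only because it is the simplest way to combine the two branches; the abstract does not say which fusion operator CARES uses. In a trained model of this shape, the attention weights `alpha` would show which words the BiLSTM summary relies on most.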
This article is indexed in VIP (维普) and other databases.
