
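Text Clustering Algorithm Based on Weakly-Supervised Deep Learning and Its Application
Cite this article: Tan Min, Zhang Hongyuan, Zhang Haichao. Text Clustering Algorithm Based on Weakly-Supervised Deep Learning and Its Application[J]. Computer Applications and Software, 2019, 36(4): 171-177.
Authors: Tan Min  Zhang Hongyuan  Zhang Haichao
Affiliation: School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018 (all three authors)
Abstract: This paper studies text clustering based on user-click data. Using click data, each query text is represented as an image click feature map, on which a deep click model is trained. To cope with text noise, a weight that characterizes the reliability of each text is introduced, and a text clustering algorithm based on weakly-supervised deep learning is proposed to iteratively update the text weights and the deep model. The algorithm is applied to click-feature-based image recognition: by merging similar texts, a compact text-set click feature vector is built for each image, enabling efficient image recognition. Experiments on two public click datasets, Clickture-Dog and Clickture-Bird, show that representing query texts as image click feature maps effectively addresses the sparsity and discontinuity of the original click feature vectors and helps achieve strong recognition rates; the weakly-supervised deep clustering model not only helps learn powerful text representations but also effectively selects high-quality text data for training, further improving performance.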

Keywords: Image recognition  Deep clustering  User-click data  Query merging  Weakly-supervised learning

TEXT CLUSTERING ALGORITHM AND ITS APPLICATION BASED ON WEAKLY-SUPERVISED DEEP LEARNING
Tan Min, Zhang Hongyuan, Zhang Haichao. TEXT CLUSTERING ALGORITHM AND ITS APPLICATION BASED ON WEAKLY-SUPERVISED DEEP LEARNING[J]. Computer Applications and Software, 2019, 36(4): 171-177.
Authors: Tan Min  Zhang Hongyuan  Zhang Haichao
Affiliation: School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
Abstract: This paper studies text clustering based on user-click data. Using click data, each query text is represented as a smooth image-click-graph, on which a deep click model is trained. To deal with heavy noise in the clicked query-text set, a weight vector that measures the reliability of each query text is introduced, and a text clustering algorithm based on weakly-supervised training is proposed to iteratively update the weight vector and the deep model. The algorithm is applied to click-feature-based image recognition: after similar query texts are merged, a compact click-frequency vector is constructed for each image to achieve accurate image recognition. The method is verified on the public Clickture-Dog and Clickture-Bird datasets. The experimental results show that representing each query as an image-click-graph addresses the non-smoothness and sparseness of the original click vectors, which helps improve image recognition accuracy. Weakly-supervised deep learning not only helps learn powerful representations but also effectively selects high-quality queries, which further improves recognition performance.
Keywords: Image recognition  Deep clustering  User-click data  Query clustering  Weakly-supervised learning
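
The query-merging step mentioned in the abstract, where similar query texts are combined to give each image a compact click vector, can be illustrated with a minimal sketch. It assumes the cluster assignment of each query is already available (here a hand-written list, not the output of the paper's deep clustering model), and the function name `merge_query_clicks` and the toy click counts are hypothetical.

```python
import numpy as np

def merge_query_clicks(click_matrix, cluster_ids, n_clusters):
    """Collapse an (images x queries) click-count matrix into an
    (images x clusters) matrix by summing the columns of queries
    assigned to the same cluster.

    Assumption: cluster_ids[j] is the cluster index of query j,
    produced by some clustering step outside this sketch.
    """
    click_matrix = np.asarray(click_matrix, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    compact = np.zeros((click_matrix.shape[0], n_clusters))
    for k in range(n_clusters):
        # Boolean mask selects every query column that belongs to cluster k.
        compact[:, k] = click_matrix[:, cluster_ids == k].sum(axis=1)
    return compact

# Toy data: 2 images, 4 queries; queries 0 and 2 share a cluster, as do 1 and 3.
clicks = [[3, 0, 1, 0],
          [0, 2, 0, 5]]
compact = merge_query_clicks(clicks, cluster_ids=[0, 1, 0, 1], n_clusters=2)
print(compact)
```

In this toy case the four sparse query columns shrink to two denser cluster columns, which is the sense in which the merged vector is more compact than the original click vector.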
This article is indexed in VIP, Wanfang Data, and other databases.