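Unsupervised Document Representation Learning Based on Hierarchical Attention Model
Cite this article: OUYANG Wen-Jun, XU Lin-Li. Unsupervised Document Representation Learning Based on Hierarchical Attention Model[J]. Computer Systems & Applications, 2018, 27(9): 40-46.
Authors: OUYANG Wen-Jun, XU Lin-Li
Affiliation: School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China (both authors)
Funding: National Natural Science Foundation of China (61673364)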
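Abstract: Many natural language applications need to represent input text as a fixed-length vector. Existing techniques such as word embeddings and document representations provide feature representations for natural language tasks, but they do not account for differences in importance among the words in a sentence, and they also ignore differences in importance among the sentences in a document. This paper proposes a document representation model based on a hierarchical attention mechanism (HADR) that takes into account both the important sentences in a document and the important words in a sentence. Experimental results show that document representations that account for word importance and sentence importance perform better. On sentiment classification of documents (IMDB), the model achieves higher accuracy than the Doc2Vec and Word2Vec models.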

Keywords: document representation, word embeddings, attention, hierarchical, unsupervised learning, document classification
Received: 2018-01-17
Revised: 2018-02-09

Unsupervised Document Representation Learning Based on Hierarchical Attention Model
OUYANG Wen-Jun and XU Lin-Li. Unsupervised Document Representation Learning Based on Hierarchical Attention Model[J]. Computer Systems & Applications, 2018, 27(9): 40-46.
Authors: OUYANG Wen-Jun and XU Lin-Li
Affiliation: School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China (both authors)
Abstract: Many natural language applications need to represent input text as a fixed-length vector. Existing techniques such as word embeddings and document representations provide feature representations for natural language tasks, but they do not consider how the importance of each word varies within a sentence, and they also ignore how the importance of each sentence varies within a document. This study proposes a document representation model based on a hierarchical attention mechanism (HADR) that takes into account the important sentences in a document and the important words in a sentence. Experimental results show that document representations that take word importance and sentence importance into account perform better. On sentiment classification of documents (IMDB), the accuracy of this model is higher than that of the Doc2Vec and Word2Vec models.
Keywords: document representation, word embeddings, attention, hierarchical, unsupervised learning, document classification
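
The two-level weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' HADR implementation: the abstract does not give the scoring function or the unsupervised training objective, so the dot-product scoring, the names `attention_pool` and `document_vector`, and the fixed context vectors below are all assumptions made for illustration. In a trained model the word vectors and context vectors would be learned rather than set by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, context):
    """Return the attention-weighted average of `vectors` and the weights.

    Assumption: importance is scored as dot(vector, context); the paper
    may use a different scoring function.
    """
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def document_vector(doc, word_context, sentence_context):
    """Build a fixed-length document vector in two stages.

    doc: list of sentences, each a list of word vectors (lists of floats).
    Stage 1 pools the words of each sentence into a sentence vector;
    stage 2 pools the sentence vectors into the document vector.
    """
    sentence_vectors = [attention_pool(sent, word_context)[0] for sent in doc]
    return attention_pool(sentence_vectors, sentence_context)


# Toy input: two sentences with 2-dimensional word vectors (made-up values).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
doc_vec, sentence_weights = document_vector(
    doc, word_context=[1.0, 0.0], sentence_context=[0.0, 1.0]
)
print(doc_vec, sentence_weights)
```

With an all-zero context vector every item receives the same weight, so the pooling reduces to a plain average; a non-zero context is what lets some words or sentences count more than others.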