Short text representation learning based on semantic feature space context
Cite this article: TUO Ting, MA Hui-fang, WEI Jia-hui, LIU Hai-jiao. Short text representation learning based on semantic feature space context[J]. Computer Engineering & Science, 2019, 41(2): 378-384.
Authors: TUO Ting, MA Hui-fang, WEI Jia-hui, LIU Hai-jiao
Affiliations: (1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China; 2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
Funding: National Natural Science Foundation of China (61762078, 61363058); Research Project of the Guangxi Key Laboratory of Trusted Software (kx201705); Northwest Normal University "Student Innovation Capability Program" 2018 Support Project (CX2018Y048)
Abstract: Text representation is a basic task in natural language processing. To address the high dimensionality and sparsity of traditional short text representations, we propose a short text representation learning method based on semantic feature space context, called SFCR. Because the initial feature space has too many dimensions, we first compute the mutual information and co-occurrence relationships between terms, derive an initial similarity from them, and cluster the terms semantically; the cluster centers then represent the dimension-reduced semantic feature space. Next, working on the resulting clusters and taking the context information of the terms into account, we design three similarity measures that compute the similarity between the terms of the short text to be represented and the feature words of the feature space. These similarities form a text mapping matrix, which is used to learn the short text representation. Experimental results show that the proposed method reflects the semantic information of short texts well and produces reasonable and effective short text representations.

Keywords: semantic feature space; similarity calculation; text mapping matrix; short text representation
Received: 2017-10-24
Revised: 2019-02-25
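The abstract describes the SFCR pipeline only at a high level: it does not give the exact formula combining mutual information with co-occurrence, the clustering algorithm, or the three context similarity measures. The following Python sketch is therefore only an illustration of the overall shape, under loudly stated assumptions: normalized pointwise mutual information (NPMI) over document-level co-occurrence stands in for the initial term similarity; single-link clustering at a hypothetical threshold of 0.5 groups the terms; each cluster's medoid serves as its feature word; and one hypothetical context similarity (the mean similarity of a text term to a cluster's members) fills the text mapping matrix. All function names (`npmi_similarity`, `cluster_terms`, `medoid`, `mapping_matrix`, `text_vector`) are illustrative and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_similarity(docs):
    """Initial term similarity from document-level co-occurrence.

    ASSUMPTION: normalized PMI (range [-1, 1]) stands in for the paper's
    combination of mutual information and co-occurrence. Pairs that never
    co-occur are absent from the dict and are treated as 0 later.
    """
    n = len(docs)
    df = Counter()  # number of documents containing each term
    co = Counter()  # number of documents containing each (sorted) term pair
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        co.update(combinations(terms, 2))
    sim = {}
    for (a, b), c_ab in co.items():
        p_ab = c_ab / n
        pmi = math.log(p_ab / ((df[a] / n) * (df[b] / n)))
        denom = -math.log(p_ab)
        s = pmi / denom if denom > 0 else 1.0  # pair occurs in every document
        sim[(a, b)] = sim[(b, a)] = s
    return sorted(df), sim


def cluster_terms(vocab, sim, threshold=0.5):
    """HYPOTHETICAL clustering: single-link, i.e. connected components of the
    graph whose edges are term pairs with similarity >= threshold."""
    parent = {t: t for t in vocab}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), s in sim.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for t in vocab:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


def medoid(cluster, sim):
    """Cluster centre (feature word) = member with the largest summed
    similarity to the other members."""
    return max(cluster, key=lambda t: sum(sim.get((t, u), 0.0) for u in cluster if u != t))


def mapping_matrix(text, clusters, sim):
    """Rows: distinct terms of the text; columns: clusters (feature words).

    HYPOTHETICAL context similarity: entry (i, j) is the mean similarity of
    text term i to the members of cluster j (a term has similarity 1.0 to
    itself). The paper designs three measures of its own.
    """
    terms = list(dict.fromkeys(text))  # de-duplicate, keep order
    matrix = []
    for t in terms:
        row = []
        for c in clusters:
            vals = [1.0 if t == u else sim.get((t, u), 0.0) for u in c]
            row.append(sum(vals) / len(vals))
        matrix.append(row)
    return terms, matrix


def text_vector(matrix):
    """HYPOTHETICAL pooling: column-wise max gives one value per feature word."""
    return [max(col) for col in zip(*matrix)]


if __name__ == "__main__":
    docs = [
        ["apple", "banana", "fruit"], ["apple", "fruit"], ["banana", "fruit"],
        ["car", "engine"], ["car", "engine", "road"], ["road", "car"],
    ]
    vocab, sim = npmi_similarity(docs)
    clusters = cluster_terms(vocab, sim)
    features = [medoid(c, sim) for c in clusters]
    terms, matrix = mapping_matrix(["apple", "car"], clusters, sim)
    print(features)  # ['fruit', 'car']
    print(text_vector(matrix))
```

On this toy corpus the food-related and vehicle-related terms fall into two clusters whose medoids are `fruit` and `car`, so a text such as `["apple", "car"]` maps to a 2×2 matrix with one non-zero entry per row: the reduced feature space has as many dimensions as there are clusters, not as many as there are vocabulary terms. Pooling the matrix columns into a fixed-length vector is likewise an assumption made for the sketch.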
This article is indexed in Wanfang Data and other databases.