
Construction of a Knowledge Base of Semantically Similar Tibetan Words Based on Word Vectors
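Cite this article: LONG Congjun, ZHOU Maoke, LIU Huidan. Construction of a Knowledge Base of Semantically Similar Tibetan Words Based on Word Vectors[J]. Journal of Chinese Information Processing, 2020, 34(10): 33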
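Authors: LONG Congjun, ZHOU Maoke, LIU Huidan
Affiliations: 1. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081, China;
2. Graduate School, University of Chinese Academy of Social Sciences, Beijing 102488, China;
3. Institute of Software, Chinese Academy of Sciences, Beijing 100083, China
Funding: Innovation Project of the Chinese Academy of Social Sciences (2019MZSCX005); Collaborative Innovation Center for the Himalayan Region Project (ZFYJY201901009)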
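Abstract: Word vectors play an important role across natural language processing research. Approaching the topic from a linguistic perspective, this paper discusses the relationship between word vector techniques and linguistic theory and, building on the properties of word vectors, proposes using Tibetan word vectors to construct a knowledge base of semantically similar words. Starting from the Cilin (《词林》) thesaurus of Harbin Institute of Technology, the Chinese entries are translated into Tibetan with a Chinese-Tibetan bilingual dictionary, and the word vectors of the Tibetan equivalents are obtained. For each equivalent, the paper computes the difference between its word vector and the average word vector of its atomic word group (the finest-grained word group in Cilin), and, using several different difference values as cut-offs, automatically screens out words whose semantic similarity to the group is low. Word vectors are computed separately with the Tibetan word and with the Tibetan syllable as the unit. Comparing the automatically screened-out words with the results of manual screening shows a high degree of agreement, which indicates that word-vector computations agree closely with human linguistic intuition. Overall, the method helps improve the efficiency of building a Tibetan knowledge base of semantically similar words.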

Keywords: word vector; Tibetan; semantically similar word
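The screening step summarized in the abstract (compare each translation equivalent's vector with the average vector of its atomic word group, then flag words that fall too far away) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: vectors are plain Python lists, the "difference" is assumed to be cosine distance (1 minus cosine similarity) to the group's mean vector, the threshold value is arbitrary, and the function names (`flag_outliers`, `agreement`) and the words `w1` to `w4` are hypothetical. Agreement with manual screening is measured here with precision and recall, which is also an assumption; the paper's own consistency measure may differ.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise average of the vectors in one atomic word group."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def flag_outliers(group: Dict[str, Vector], threshold: float) -> Set[str]:
    """Return words whose distance from the group's mean vector exceeds threshold.

    Assumption: "difference" = 1 - cosine similarity to the mean vector.
    The mean is taken over all members, including the word being tested.
    """
    centroid = mean_vector(list(group.values()))
    return {w for w, v in group.items() if 1.0 - cosine(v, centroid) > threshold}


def agreement(auto: Set[str], manual: Set[str]) -> Tuple[float, float]:
    """Precision and recall of automatic flags against manual screening."""
    hits = len(auto & manual)
    precision = hits / len(auto) if auto else 1.0
    recall = hits / len(manual) if manual else 1.0
    return precision, recall


# Toy 2-D vectors for one hypothetical atomic word group.
group = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.95, 0.05],
    "w4": [0.0, 1.0],  # a stray dictionary translation
}
auto = flag_outliers(group, threshold=0.3)
print(sorted(auto))                # ['w4']
print(agreement(auto, {"w4"}))     # (1.0, 1.0)
```

Running the filter at several thresholds, as the abstract's "different difference values" suggests, trades precision for recall: a lower threshold flags more words for manual review. Whether vectors come from word-level or syllable-level training only changes the input dictionary, not this screening logic.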
