Research on a Similarity Algorithm for Tibetan Sentences (藏语句子相似度算法的研究)

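Cite this article:	安见才让. 藏语句子相似度算法的研究[J]. 中文信息学报, 2011, 25(4): 110-115.

Author:	安见才让 (Anjiancairang)

Affiliation:	School of Computer Science, Qinghai University of Nationalities, Xining, Qinghai 810007

Funding:	Supported by the National Social Science Fund of China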

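Abstract:	This paper proposes a method for computing the similarity of Tibetan sentences. A hashed-word inverted index, combined with a coarse-selection algorithm based on sentence-length similarity, quickly filters a set of candidate sentences out of the corpus; the hashed-word inverted index effectively speeds up lookup. A multi-strategy fine-selection algorithm based on word-form similarity and continuous-word-sequence similarity then measures how similar two Tibetan sentences are. Experimental results show that the algorithm is effective.
Keywords:	natural language processing; corpus; continuous word sequence; Tibetan; sentence similarity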
Research on Similarity Algorithm Tibetan Sentences

Anjiancairang. Research on Similarity Algorithm Tibetan Sentences[J]. Journal of Chinese Information Processing, 2011, 25(4): 110-115.

Authors:	Anjiancairang

Affiliation:	Computer Department, Qinghai University of Nationalities, Xining, Qinghai 810007, China

Abstract:	A method for computing the similarity of Tibetan sentences is proposed in this paper. The method uses an inverted index over a hashed vocabulary, together with a coarse-selection algorithm based on sentence length, to extract candidate sentences from the corpus rapidly; the inverted index over the hashed vocabulary speeds up the search effectively. A multi-strategy fine-selection algorithm, which combines word-form similarity and continuous-word-sequence similarity, then assesses how similar two Tibetan sentences are. The method is validated by experiments.

Keywords:	natural language processing; corpus; continuous word series; Tibetan language; sentence similarity
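
The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract gives no formulas, so the length-similarity, word-form-similarity and sequence-similarity definitions, the weight `alpha`, the threshold `min_len_sim`, and every function name below are assumptions. Sentences are assumed to be already segmented into non-empty lists of word tokens, and a Python `dict` stands in for the hashed-word inverted index.

```python
from collections import Counter, defaultdict


def build_inverted_index(corpus):
    """Map each word to the ids of the sentences containing it (hash-based lookup)."""
    index = defaultdict(set)
    for sid, words in enumerate(corpus):
        for w in words:
            index[w].add(sid)
    return index


def length_similarity(a, b):
    """Assumed formula: 1 - |len(a) - len(b)| / (len(a) + len(b))."""
    return 1.0 - abs(len(a) - len(b)) / (len(a) + len(b))


def coarse_select(query, corpus, index, min_len_sim=0.6):
    """Coarse stage: sentences sharing a word with the query and of similar length."""
    candidates = set()
    for w in query:
        candidates |= index.get(w, set())
    return sorted(sid for sid in candidates
                  if length_similarity(query, corpus[sid]) >= min_len_sim)


def word_form_similarity(a, b):
    """Assumed formula: 2 * (shared words, with multiplicity) / (len(a) + len(b))."""
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / (len(a) + len(b))


def longest_common_run(a, b):
    """Length of the longest contiguous word sequence shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, 1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def sequence_similarity(a, b):
    """Assumed formula: 2 * longest shared run / (len(a) + len(b))."""
    return 2.0 * longest_common_run(a, b) / (len(a) + len(b))


def rank(query, corpus, index, alpha=0.5, min_len_sim=0.6):
    """Fine stage: weighted mix of word-form and sequence similarity (alpha is assumed)."""
    scored = []
    for sid in coarse_select(query, corpus, index, min_len_sim):
        s = corpus[sid]
        score = (alpha * word_form_similarity(query, s)
                 + (1 - alpha) * sequence_similarity(query, s))
        scored.append((score, sid))
    # Highest score first; ties broken by sentence id.
    return sorted(scored, key=lambda t: (-t[0], t[1]))


# Placeholder tokens stand in for segmented Tibetan words.
corpus = [["a", "b", "c", "d"], ["d", "c", "b", "a"], ["a"], ["y", "z"]]
index = build_inverted_index(corpus)
ranking = rank(["a", "b", "c", "d"], corpus, index)
```

In this sketch the inverted index keeps the coarse stage from scanning the whole corpus, and the sequence term is what separates two sentences that contain the same words in a different order: sentence 1 above matches the query on every word but shares only runs of length one with it, so it scores lower than sentence 0.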
This article is indexed in Wanfang Data and other databases.