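A Chinese Index Structure Based on Bigram Two-Level Hashing
Cite this article: SUN De-cai, WANG Xiao-xia. A Chinese Index Structure Based on Bigram Two-Level Hashing[J]. Electronic Design Engineering, 2014, 0(12): 1-4
Authors: SUN De-cai, WANG Xiao-xia
Affiliation: Bohai University, Jinzhou, Liaoning 121013
Funding: National Natural Science Foundation of China (61173142); Liaoning Federation of Social Sciences 2014 Key Project on Liaoning Economic and Social Development (20141slktzdian-04)
Abstract: To speed up off-line string matching by building a fast index structure for Chinese, this paper proposes a Chinese index structure based on Bigrams and two-level hashing. The index processes Chinese characters in their GB2312 encoding, uses Chinese Bigrams as its vocabulary terms, and stores the vocabulary in a structure built on two-level hashing. Experimental data show that although the proposed index occupies more than twice the storage space of a word index, it matches more than 4 times as fast. The results indicate that the proposed index has a speed advantage in Chinese string matching.

Keywords: string matching; Chinese; inverted index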

A Chinese index structure based on Bigram and two level hashes
SUN De-cai,WANG Xiao-xia. A Chinese index structure based on Bigram and two level hashes[J]. Electronic Design Engineering, 2014, 0(12): 1-4
Authors: SUN De-cai, WANG Xiao-xia
Affiliation: Bohai University, Jinzhou 121013, China
Abstract: To speed up off-line string matching by constructing a high-speed index structure for Chinese, this paper proposes a new index structure based on Bigrams and two-level hashing. First, GB2312 code is employed to process Chinese text, and Bigrams are adopted as the vocabulary terms of the new index. Second, a two-level hashing scheme is designed as the structure of the vocabulary. Experimental data show that the new index matches more than 4 times as fast as a word index, although it consumes more than twice the space. The results demonstrate that the new index has a speed advantage in Chinese string matching.
Keywords: Bigram; string matching; Chinese; inverted index
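
The abstract names the ingredients of the index (GB2312 codes, Bigram vocabulary terms, a two-level hashed vocabulary) but not the hash functions or the table layout. The Python sketch below is one plausible reading, not the authors' implementation. It assumes that the first level is addressed directly by the GB2312 code of the Bigram's first character and that the second level is keyed by the second character. The names `gb2312_slot` and `BigramIndex` are hypothetical.

```python
GB_ROWS = 94  # GB2312 lays out double-byte characters in a 94 x 94 grid


def gb2312_slot(ch):
    """Map one double-byte GB2312 character to a slot in 0..8835, else None."""
    try:
        raw = ch.encode("gb2312")
    except UnicodeEncodeError:
        return None  # character is not in GB2312
    if len(raw) != 2:
        return None  # single-byte (ASCII) characters are skipped in this sketch
    return (raw[0] - 0xA1) * GB_ROWS + (raw[1] - 0xA1)


class BigramIndex:
    """Inverted index over character Bigrams (assumed layout, see lead-in)."""

    def __init__(self):
        # Level 1 (assumption): direct addressing by the first character.
        self.level1 = [None] * (GB_ROWS * GB_ROWS)

    def build(self, text):
        for pos in range(len(text) - 1):
            a = gb2312_slot(text[pos])
            b = gb2312_slot(text[pos + 1])
            if a is None or b is None:
                continue
            if self.level1[a] is None:
                self.level1[a] = {}  # Level 2 (assumption): keyed by second character
            self.level1[a].setdefault(b, []).append(pos)

    def postings(self, bigram):
        """Return the text positions at which the two-character Bigram starts."""
        a = gb2312_slot(bigram[0])
        b = gb2312_slot(bigram[1])
        if a is None or b is None or self.level1[a] is None:
            return []
        return self.level1[a].get(b, [])

    def search(self, pattern):
        """Return start positions of a pattern of two or more GB2312 characters."""
        if len(pattern) < 2:
            raise ValueError("pattern must contain at least one Bigram")
        candidates = set(self.postings(pattern[0:2]))
        for k in range(1, len(pattern) - 1):
            # A Bigram found at position p supports a match starting at p - k.
            candidates &= {p - k for p in self.postings(pattern[k:k + 2])}
            if not candidates:
                break
        return sorted(candidates)


index = BigramIndex()
index.build("中文索引结构是一种中文索引")
print(index.search("中文索引"))  # [0, 9]
```

A pattern is matched by intersecting the posting lists of its overlapping Bigrams after shifting each list by the Bigram's offset in the pattern, so no text scan is needed at query time. Storing one posting list per distinct Bigram, rather than per word, fits the space-for-speed trade-off the abstract reports, although the sketch makes no attempt to reproduce the measured ratios.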
This article is indexed in VIP (维普) and other databases.