
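Search Engine Based on Compressed Suffix Array Technology
Cite this article: YAO Quan-zhu, ZHANG Nan, YANG Zeng-hui, TIAN Yuan. Search Engine Based on Compressed Suffix Array Technology [J]. Computer Engineering, 2008, 34(10): 83-85.
Authors: YAO Quan-zhu  ZHANG Nan  YANG Zeng-hui  TIAN Yuan
Affiliation: School of Computer Science, Xi'an University of Technology, Xi'an 710048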
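Abstract: At present, the core module of a search engine (the indexer) generally uses an inverted file structure, which gives low precision for phrase queries. This paper introduces suffix arrays for full-text indexing. To overcome the large space cost of full-text indexing, it studies compressed suffix arrays, which reduce the size of the suffix array index to O(n) bits, and presents the steps for applying a compressed suffix array index together with pseudocode for its core operations. Comparative experiments show that an index based on compressed suffix arrays improves phrase-query precision by nearly 20% over a traditional inverted file index.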

Keywords: compressed suffix array  inverted file  suffix array  search engine
Article ID: 1000-3428(2008)10-0083-03
Revision date: May 25, 2007
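
The idea behind the abstract, indexing every suffix of the text so that a phrase is matched as an exact substring, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `build_suffix_array` and `find_phrase`, the sample text, and the naive sort-based construction are assumptions made for illustration, and the array here is uncompressed.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting whole suffixes; for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_phrase(text: str, sa: List[int], phrase: str) -> List[int]:
    """Return the sorted start positions where phrase occurs in text."""
    m = len(phrase)
    if m == 0:
        return []

    # Lower bound: first suffix whose m-character prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[first:lo] start with the phrase.
    return sorted(sa[first:lo])


# Hypothetical sample text, chosen only to demonstrate the calls.
text = "new york city is not york new"
sa = build_suffix_array(text)
hits = find_phrase(text, sa, "new york")
missing = find_phrase(text, sa, "york city is new")
```

Because every hit is an exact occurrence of the whole phrase, a query such as "york city is new", whose words all appear in the sample text but never in that order, returns no positions.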

Search Engine Technology Based on Compressed Suffix Array
YAO Quan-zhu, ZHANG Nan, YANG Zeng-hui, TIAN Yuan. Search Engine Technology Based on Compressed Suffix Array [J]. Computer Engineering, 2008, 34(10): 83-85.
Authors:YAO Quan-zhu  ZHANG Nan  YANG Zeng-hui  TIAN Yuan
Affiliation:(School of Computer Science, Xi’an University of Technology, Xi’an 710048)
Abstract: The core module of a search engine, the indexer, is usually based on an inverted file, which handles phrase search poorly (low hit rate). In this paper, suffix arrays (SA) are employed for full-text indexing. To overcome the large memory cost of full-text indexing, the compressed suffix array (CSA) is studied, which reduces the index size to O(n) bits. The paper presents the steps for using a CSA index and pseudocode for its core operations. Experiments show that, compared with an inverted file, this technique improves the hit rate for phrase queries by nearly 20%.
Keywords:Compressed Suffix Array(CSA)  inverted file  Suffix Array(SA)  search engine
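
The abstract does not say which CSA formulation the paper uses. As a rough illustration of why a suffix array can be compressed at all, the sketch below assumes the common formulation built on a successor function, here called `psi`, which maps the rank of each suffix to the rank of the suffix starting one character later. All names (`build_psi`, `recover_text`, `bucket_starts`) and the sample string with its `$` terminator are hypothetical, and `psi` is stored as a plain list rather than in compressed form.

```python
from bisect import bisect_right
from typing import List, Tuple


def build_psi(text: str) -> Tuple[List[int], List[str], List[int], int]:
    """Return (psi, alphabet, bucket_starts, rank of the full text)."""
    n = len(text)
    # Naive suffix array: start positions in lexicographic suffix order.
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    # psi[r] is the rank of the suffix that starts one character after
    # the suffix of rank r (wrapping around at the end of the text).
    psi = [rank[(sa[r] + 1) % n] for r in range(n)]
    # bucket_starts[k] counts the characters smaller than alphabet[k],
    # i.e. the first rank whose suffix begins with alphabet[k].
    alphabet = sorted(set(text))
    bucket_starts = []
    total = 0
    for ch in alphabet:
        bucket_starts.append(total)
        total += text.count(ch)
    return psi, alphabet, bucket_starts, rank[0]


def recover_text(psi: List[int], alphabet: List[str],
                 bucket_starts: List[int], start_rank: int) -> str:
    """Rebuild the text from psi and the character buckets alone."""
    out = []
    r = start_rank
    for _ in range(len(psi)):
        # The bucket containing rank r gives the suffix's first character.
        k = bisect_right(bucket_starts, r) - 1
        out.append(alphabet[k])
        r = psi[r]
    return "".join(out)


# Hypothetical sample; '$' is assumed to be a unique, smallest terminator.
psi, alphabet, bucket_starts, start_rank = build_psi("banana$")
restored = recover_text(psi, alphabet, bucket_starts, start_rank)
```

In this sketch `psi` is increasing inside each character bucket, and such increasing runs are what compressed encodings exploit. The text itself can also be rebuilt from `psi` and the bucket boundaries, so neither the original text nor the explicit suffix array has to be stored.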
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).