
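Storage and time optimization algorithm for the n-Gram/2L index structure
Cite this article: LIU Feng-chen, LIU Qing-wen, HU Yue, HUANG He. Storage and time optimization algorithm for the n-Gram/2L index structure [J]. 计算机工程与应用 (Computer Engineering and Applications), 2008, 44(5): 180-183.
Authors: LIU Feng-chen  LIU Qing-wen  HU Yue  HUANG He
Affiliations: 1. College of Software, Beijing University of Aeronautics and Astronautics, Beijing 100083
2. Department of Computer Science, Beijing University of Science and Technology, Beijing 100083
Funding: National High Technology Research and Development Program of China (863 Program)
Abstract: The index structure of the n-Gram/2L retrieval algorithm is improved by adding an index on document identifiers to the second-level inverted list, and a Zigzag-based retrieval algorithm, n-Gram/2LZ (n-Gram/2L on Zigzag join), is proposed. When retrieving and indexing documents with large data volumes, the algorithm keeps the properties of the original algorithm while further reducing index redundancy and index storage. The optimized query algorithm also lowers system overhead at query time and reduces the number of record accesses in the index, which improves query efficiency.

Keywords: algorithms  indexing  n-gram  inverted list
Article ID: 1002-8331(2008)05-0180-04
Received: 2007-06-05
Revised: 2007-08-13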

Space and time optimized algorithm of n-Gram/2L index structure
LIU Feng-chen, LIU Qing-wen, HU Yue, HUANG He. Space and time optimized algorithm of n-Gram/2L index structure [J]. Computer Engineering and Applications, 2008, 44(5): 180-183.
Authors:LIU Feng-chen  LIU Qing-wen  HU Yue  HUANG He
Affiliation: 1. College of Software, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
2. Department of Computer Science, Beijing University of Science and Technology, Beijing 100083, China
Abstract: This paper presents an improved n-Gram/2L index for text retrieval that adds a document-identifier index to the second-level inverted index, and proposes a retrieval algorithm based on Zigzag join, n-Gram/2LZ (n-Gram/2L on Zigzag join). When retrieving and indexing large volumes of data, the algorithm retains the advantages of the original n-Gram/2L algorithm while reducing the redundancy and storage of the document index. The optimized query algorithm also decreases system overhead during query processing and improves query efficiency by reducing repeated reads of the same record.
Keywords:algorithms  indexing  n-gram  inverted index
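
The abstract names Zigzag join as the basis of the query step but gives no details of it. As an illustration only, the following minimal Python sketch shows the generic zigzag-join idea on two sorted lists of document identifiers: each side skips ahead to the other side's current value instead of scanning record by record. The function name `zigzag_join` and the sample lists are assumptions made for this sketch; it is not the authors' n-Gram/2LZ algorithm and does not model the two-level index.

```python
from bisect import bisect_left


def zigzag_join(a, b):
    """Intersect two sorted lists of document ids (hypothetical helper).

    Whenever the current entries differ, the list holding the smaller
    value jumps forward (binary search) to the first entry that is not
    smaller than the other list's current value.
    """
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Both lists contain this document id.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Skip entries of a that cannot match b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Skip entries of b that cannot match a[i].
            j = bisect_left(b, a[i], j + 1)
    return result


if __name__ == "__main__":
    print(zigzag_join([1, 3, 5, 7, 9], [3, 4, 5, 10]))
```

Because each side skips over entries that cannot match, fewer records are read than in a plain linear merge, which is the kind of saving in record accesses that the abstract describes.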
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.