
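Text Retrieval Model Based on Counting Bloom Filter
Cite this article: FENG Jia-jun, WANG Xiao-lin, TIAN Qing. Text Retrieval Model Based on Counting Bloom Filter[J]. Computer Engineering, 2014(2): 58-61.
Authors: FENG Jia-jun  WANG Xiao-lin  TIAN Qing
Affiliation: College of Computer Science and Technology, Shandong University, Jinan 250101, China
Funding: Natural Science Foundation of Shandong Province (ZR2009GM021)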
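Abstract: Distributed text retrieval systems struggle to combine efficient data retrieval with low-cost index maintenance. To address this, a text retrieval model based on the counting Bloom filter, CBFTRM, is proposed. The model divides physical nodes into data nodes and index nodes, each organized as a structured P2P overlay network. Each data node stores document data and maintains the corresponding inverted index; it also computes a counting Bloom filter value from the keyword set in the inverted index and sends it to the corresponding index node. Each index node builds a search tree whose leaf nodes hold the feature information (including the filter values) of a subset of the data nodes and whose internal nodes hold the results of operations on those filter values, and it maintains the tree whenever a leaf node changes. Simulation results show that the model locates documents quickly, incurs little communication traffic for index maintenance, and achieves relatively high precision.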

Keywords: counting Bloom filter  search tree  structured P2P  text retrieval  inverted index
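
The counting Bloom filter that each data node computes from its keyword set can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `CountingBloomFilter`, the filter size, the number of hash functions, and the use of salted SHA-256 are all assumptions made for illustration, since the abstract does not specify them.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: an array of counters instead of single bits,
    so that keywords can be removed as well as added."""

    def __init__(self, size=1024, num_hashes=4):
        # size and num_hashes are illustrative defaults, not values from the paper.
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, keyword):
        # Derive num_hashes counter positions by salting a SHA-256 hash (assumed scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, keyword):
        # Increment every counter the keyword maps to.
        for pos in self._positions(keyword):
            self.counters[pos] += 1

    def remove(self, keyword):
        # Decrement only if the keyword appears to be present,
        # so that counters never drop below zero.
        if keyword not in self:
            return False
        for pos in self._positions(keyword):
            self.counters[pos] -= 1
        return True

    def __contains__(self, keyword):
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent".
        return all(self.counters[pos] > 0 for pos in self._positions(keyword))
```

A data node would call `add` for each keyword entering its inverted index and `remove` when a keyword leaves it; the counters are what allow deletion, which a plain bit-array Bloom filter does not support.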

Text Retrieval Model Based on Counting Bloom Filter
FENG Jia-jun, WANG Xiao-lin, TIAN Qing. Text Retrieval Model Based on Counting Bloom Filter[J]. Computer Engineering, 2014(2): 58-61.
Authors: FENG Jia-jun  WANG Xiao-lin  TIAN Qing
Affiliation: College of Computer Science and Technology, Shandong University, Jinan 250101, China
Abstract: Distributed text retrieval systems have difficulty achieving both high retrieval efficiency and low index maintenance cost. To address this problem, this paper proposes a Text Retrieval Model based on Counting Bloom Filter (CBFTRM). The model divides physical nodes into data nodes and index nodes, each of which is organized as a structured P2P overlay network. Each data node is responsible for storing documents and maintaining their inverted index. It also computes Counting Bloom Filter (CBF) values from the keywords in the inverted index and transmits them to the corresponding index node. Each index node builds a search tree and maintains it whenever a leaf node of the tree changes. The leaf nodes of the search tree hold the feature information of the data nodes (including their CBF values), and the internal nodes hold the results computed from those CBF values. Simulation results show that the model locates documents quickly, generates little traffic for index maintenance, and achieves relatively high precision.
Keywords: Counting Bloom Filter (CBF)  search tree  structured P2P  text retrieval  inverted index
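
The search tree kept by an index node can be sketched in the same hedged spirit. The abstract says only that internal nodes hold "results computed" from the filter values; the sketch below assumes that operation is an element-wise sum of the children's counters, which is one plausible choice and not confirmed by the source. The names `Node`, `make_filter`, `update_leaf`, and `search`, as well as `SIZE` and `NUM_HASHES`, are hypothetical.

```python
import hashlib

SIZE = 256       # counters per filter (illustrative, not from the paper)
NUM_HASHES = 3   # hash functions per keyword (illustrative)


def positions(keyword):
    """Counter positions for a keyword, derived from salted SHA-256 (assumed scheme)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()[:8], "big") % SIZE
        for i in range(NUM_HASHES)
    ]


def make_filter(keywords):
    """Build the counter array a data node would send to its index node."""
    counters = [0] * SIZE
    for kw in keywords:
        for pos in positions(kw):
            counters[pos] += 1
    return counters


class Node:
    """Leaf: one data node's filter. Internal node: merge of its children's filters."""

    def __init__(self, children=None, data_node_id=None, counters=None):
        self.children = children or []
        self.data_node_id = data_node_id
        self.parent = None
        for child in self.children:
            child.parent = self
        self.counters = counters if counters is not None else self.merge()

    def merge(self):
        # Assumed combining operation: element-wise sum of the children's counters.
        return [sum(col) for col in zip(*(c.counters for c in self.children))]

    def may_contain(self, keyword):
        return all(self.counters[p] > 0 for p in positions(keyword))


def update_leaf(leaf, keywords):
    """Replace a leaf's filter, then recompute every ancestor up to the root."""
    leaf.counters = make_filter(keywords)
    node = leaf.parent
    while node is not None:
        node.counters = node.merge()
        node = node.parent


def search(node, keywords):
    """Return ids of data nodes that may hold all keywords, pruning subtrees that cannot."""
    if not all(node.may_contain(kw) for kw in keywords):
        return []
    if not node.children:
        return [node.data_node_id]
    hits = []
    for child in node.children:
        hits.extend(search(child, keywords))
    return hits
```

Under this assumption, a query descends only into subtrees whose merged filter may contain every query keyword, and a change at one data node touches only the path from its leaf to the root. Results are candidates, since Bloom filters admit false positives.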
This article is indexed in CNKI, VIP (维普), and other databases.