Similar literature
1.
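The Inter-Relevant Successive Tree Model (互关联后继树模型) and Its Implementation   Cited by: 6 (self-citations: 0, citations by others: 6)
Full-text retrieval is at the core of text database research, and the first problem in full-text retrieval is the choice of retrieval model. This paper introduces a novel full-text retrieval model, the inter-relevant successive tree model, together with its implementation. It compares the model with the traditional inverted-list model and reports that the new model outperforms the inverted-list model in every respect.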

2.
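The core of a full-text retrieval system is its full-text index, and how dynamic the system is depends on how dynamically the index can be created and updated. Building on a study of the inter-relevant successive tree model and borrowing implementation ideas from operating systems and databases, this paper optimizes the model's storage structure to make index updates more flexible. It gives a detailed design of the structure and proposes operation algorithms based on it. Experiments show that the structure handles index updates well and meets the needs of applications whose data changes frequently.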

3.
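This paper introduces a new data model for full-text databases, the ternary inter-relevant successive tree, and discusses its application to storing and retrieving massive, exponentially growing volumes of unstructured information.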

4.
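Full-text retrieval is a very effective information retrieval technique. By analyzing the strengths and weaknesses of static indexing in full-text retrieval systems, as well as the factors that affect dynamic performance, this paper proposes a dynamic indexing technique based on the inter-relevant successive tree model. The technique solves the index update problem and improves the dynamic performance of the index without degrading query efficiency or other performance measures.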

6.
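An Improved Inter-Relevant Successive Tree Data Model   Cited by: 4 (self-citations: 1, citations by others: 3)
马科 (Ma Ke), 胡运发 (Hu Yunfa). 《计算机工程》 (Computer Engineering), 2003, 29(21): 70-72
This paper introduces a new full-text database model, the inter-relevant successive tree, and explains its significant advantages over other full-text database models for storing and retrieving unstructured information. It also discusses how to improve the model's performance in the face of massive, exponentially growing volumes of unstructured information.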

7.
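Addressing research on index structure models for full-text retrieval, this paper proposes and implements an index system with a well-designed storage structure, based on the ternary inter-relevant successive tree model. The system is used to implement several effective types of query.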

8.
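Research on a Compact Index Model Based on the Binary Inter-Relevant Successive Tree   Cited by: 1 (self-citations: 0, citations by others: 1)
The key problems in full-text retrieval are the index model and the algorithms for creating and searching the index. Based on the binary inter-relevant successive tree model, this paper proposes a practical compact successive-tree index model with ordered successor nodes (SIRST), and gives index creation and search algorithms for it. A comparison with the widely used inverted file model (IF) shows that the retrieval efficiency of SIRST is far higher than that of IF, and that the advantage of SIRST in index creation efficiency grows as the text collection gets larger.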

9.
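Organization of Index Files in Network-Oriented Full-Text Retrieval   Cited by: 5 (self-citations: 0, citations by others: 5)
To improve the efficiency of full-text retrieval over the network, the content of Web pages must be analyzed, a full-text index built, and the index structure organized efficiently. This paper discusses the organizational structure of the index and how to implement it, and analyzes the performance of different organization methods.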

10.
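On a multi-core processor platform, the creation algorithm of the inter-relevant successive tree index model is improved and optimized using OpenMP directives. A comparison with the unoptimized serial program shows that parallelizing the program improves its performance on a multi-core platform.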

11.
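Design and Implementation of a Full-Text Retrieval System   Cited by: 4 (self-citations: 0, citations by others: 4)
Based on an analysis and study of full-text retrieval technologies, this paper proposes and implements a practical full-text retrieval system, UFRS. It handles Chinese and English documents, can be extended to other languages, and supports several index storage schemes as well as distributed retrieval. The storage layer, the lexical and syntactic analysis layer, and the core system interface layer are discussed in turn. Finally, a distributed deployment scheme for the system is given.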

12.
Our research extends the bit-sliced signature organization by introducing a partial evaluation approach for queries. The partial evaluation approach minimizes the response time by using a subset of the on-bits of the query signature. A new signature file optimization method, Partially evaluated Bit-Sliced Signature File (P-BSSF), for multi-term query environments using the partial evaluation approach is introduced. The analysis shows that, with a 14% increase in space overhead, P-BSSF provides a query processing time improvement of more than 85% for multi-term query environments with respect to the best performance of the bit-sliced signature file (BSSF) method. Under the sequentiality assumption of disk blocks, P-BSSF provides a desirable response time of 1 second for a database size of one million records with a 28% space overhead. Due to partial evaluation, the desirable response time is guaranteed for queries with several terms.
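The partial evaluation idea in this abstract can be illustrated with a minimal Python sketch. It is not the P-BSSF method itself: the signature width `F`, the bits per term `M`, the hash-based `term_bits` function, and the rule of simply taking the first `max_slices` on-bits are all assumptions made for illustration, whereas the paper chooses the subset to minimize response time.

```python
import hashlib

F = 64  # signature width in bits (assumed value)
M = 3   # hash positions tried per term (assumed value)

def term_bits(term):
    """Bit positions a term switches on (hypothetical hash scheme)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return sorted({digest[i] % F for i in range(M)})

def build_slices(records):
    """Build F bit slices; bit r of slice b is on if record r sets bit b."""
    slices = [0] * F
    for r, terms in enumerate(records):
        for t in terms:
            for b in term_bits(t):
                slices[b] |= 1 << r
    return slices

def query(slices, records, terms, max_slices=None):
    """Return (matching record ids, number of slices read).

    With max_slices set, only a subset of the query's on-bits is
    evaluated, so fewer slices are read but more false drops may
    survive; the final check against the records removes them.
    """
    on_bits = sorted({b for t in terms for b in term_bits(t)})
    used = on_bits if max_slices is None else on_bits[:max_slices]
    candidates = (1 << len(records)) - 1  # start with every record
    for b in used:
        candidates &= slices[b]
    matches = [r for r in range(len(records))
               if (candidates >> r) & 1 and set(terms) <= set(records[r])]
    return matches, len(used)

records = [["index", "tree"], ["signature", "file"], ["index", "file"]]
slices = build_slices(records)
full, full_reads = query(slices, records, ["index", "file"])
partial, partial_reads = query(slices, records, ["index", "file"], max_slices=2)
```

Both calls return the same matching records; the partial call reads at most two slices and relies on the final record check to discard any extra false drops.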

13.
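An Inverted Index Update Strategy Based on Extendible Hashing   Cited by: 5 (self-citations: 0, citations by others: 5)
吴恒山 (Wu Hengshan), 刘兴字 (Liu Xingzi), 左琼 (Zuo Qiong). 《计算机工程》 (Computer Engineering), 2004, 30(8): 83-84, F003
This paper proposes a new inverted index update strategy based on an extendible hash table, which gives the inverted index good scalability. The strategy supports document insertion and deletion while maintaining high query efficiency and space utilization. On this basis, incremental and real-time updates of the inverted index are implemented.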

14.
HIRMA provides an integrated environment for querying any full-text document base system with natural language sentences, obtaining a set of documents relevant to the query. It also supports hypertextual navigation of the document base. The system uses content-based document representation and retrieval methods.

In this paper the representation framework as well as the retrieval and navigation algorithms used by HIRMA are described. Coverage and portability across application domains are supported by the lexical acquisition system ARIOSTO, which provides the lexical knowledge and processing methods needed to extract the semantic representation of document content from raw text.


15.
Text retrieval systems require an index to allow efficient retrieval of documents at the cost of some storage overhead. This paper proposes a novel full-text indexing model for Chinese text retrieval based on the concept of the adjacency matrix of a directed graph. With this indexing model, retrieval systems need to keep only the indexing data, rather than both the indexing data and the original text data as traditional retrieval systems do. In addition, occurrences of an index term are identified by the labels of the so-called s-strings in which the index term appears, rather than by its positions as in traditional indexing models. Consequently, the overall space cost of the system can be reduced drastically while retrieval efficiency remains satisfactory. Experiments over several real-world Chinese text collections are carried out to demonstrate the effectiveness and efficiency of this model. In addition to Chinese, the proposed indexing model is also effective and efficient for text retrieval in other Oriental languages, such as Japanese and Korean. It is especially useful for digital library application areas where storage resources are very limited (e.g., e-books and CD-based text retrieval systems).

16.
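With the rapid growth of the Internet, finding the information one needs in a sea of information is like looking for a needle in a haystack, and search engine technology addresses exactly this problem. This paper first briefly introduces the principles of full-text retrieval, then focuses on the concrete application of the Compass search engine in a full-text retrieval system.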

17.
Recently, permutation-based indexes have attracted interest in the area of similarity search. The basic idea of permutation-based indexes is that data objects are represented as appropriately generated permutations of a set of pivots (or reference objects). Similarity queries are executed by searching for data objects whose permutation representation is similar to that of the query, following the assumption that similar objects are represented by similar permutations of the pivots. In the context of permutation-based indexing, most authors propose to select pivots randomly from the data set, given that traditional pivot selection techniques do not reveal better performance. However, to the best of our knowledge, no rigorous comparison has been performed yet. In this paper we compare five pivot selection techniques on three permutation-based similarity access methods. Among those, we propose a novel technique specifically designed for permutations. Two significant observations emerge from our tests. First, random selection is always outperformed by at least one of the tested techniques. Second, there is no technique that is universally the best for all permutation-based access methods; rather, different techniques are optimal for different methods. This indicates that the pivot selection technique should be considered an integral and relevant part of any permutation-based access method.
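The basic idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Euclidean points, hand-picked pivots, the Spearman footrule as the permutation distance, and a brute-force scan of the permutations. The function names are hypothetical and do not come from the paper, which compares pivot selection techniques on existing access methods.

```python
import math

def permutation(obj, pivots):
    """Pivot indices ordered from nearest to farthest from obj."""
    return sorted(range(len(pivots)), key=lambda i: math.dist(obj, pivots[i]))

def footrule(p, q):
    """Spearman footrule: total shift in rank of each pivot between p and q."""
    pos_q = {pivot: rank for rank, pivot in enumerate(q)}
    return sum(abs(rank - pos_q[pivot]) for rank, pivot in enumerate(p))

def knn(query, data, pivots, k, candidates):
    """Approximate k-NN: shortlist by permutation similarity, then re-rank.

    The permutations of the data would normally be computed once at
    indexing time; they are recomputed here to keep the sketch short.
    """
    index = [permutation(x, pivots) for x in data]
    qp = permutation(query, pivots)
    shortlist = sorted(range(len(data)),
                       key=lambda i: footrule(qp, index[i]))[:candidates]
    return sorted(shortlist, key=lambda i: math.dist(query, data[i]))[:k]

data = [(0, 0), (1, 1), (7, 6), (9, 9)]
pivots = [(0, 0), (10, 10), (0, 10)]  # chosen by hand for illustration
nearest = knn((0.2, 0.3), data, pivots, k=2, candidates=2)
```

Only the shortlisted objects are compared with the query using the true distance, which is why the choice of pivots studied in the paper affects result quality.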

18.
An inverted index is a core data structure of Information Retrieval systems, especially in search engines. Since search environments have become more dynamic, many on-line index maintenance strategies have been proposed. Previous strategies were designed for HDDs. Consequently, in order to avoid expensive random access costs, Merge-based strategies have been preferred to In-place index update strategies on HDDs. However, flashSSDs have become solid alternatives to HDDs. FlashSSDs are currently adopted in a wide range of areas due to their superior features such as short access latency, energy efficiency, and high bandwidth. In this article, we first reexamined the potential of In-place index update strategies on flashSSDs. Thanks to the insignificant access latency of flashSSDs, we discovered that In-place index update strategies outperform Merge-based strategies, since In-place index update strategies generate much less I/O than Merge-based strategies despite inducing frequent random accesses. Based on this discovery, we suggest a new inverted index maintenance strategy based on an In-place index update strategy for flashSSDs, called Multipath Flash In-place Strategy (MFIS). To enhance the index maintenance performance, MFIS stores the posting list of each term non-contiguously and exploits the internal parallelism of flashSSDs. Thus, MFIS not only induces the minimum amount of I/O but also utilizes the maximum bandwidth of flashSSDs. Furthermore, MFIS is designed to show high query processing performance by utilizing the internal parallelism of flashSSDs even though the posting list of each term is stored non-contiguously. In our experiments, the index maintenance performance of MFIS was considerably better than that of previous maintenance strategies. The index maintenance performance was up to 14.93, 4.04, 5.12, and 2.33 times higher than that of Merge-based strategies such as Immediate Merge, Geometric Partitioning, Hybrid, and SSD-aware Hybrid, respectively. The query processing performance of MFIS was up to 1.62 times higher than that of non-contiguous In-place. In addition, the query processing performance of MFIS was almost as good as that of the Merge-based strategies. In conclusion, MFIS is the best on-line inverted index maintenance strategy on flashSSDs in terms of both index maintenance and query processing performance.
