首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 242 毫秒
1.
为提高XML文档的查询效率,提出一种基于倒排表与B+树的联合索引技术。DTD结构索引和内容索引采用倒排表作为索引单位,XML文档索引使用B+树作为索引基本组织。在DTD结构索引的结点编码中设置标识信息,便于确定需要查询的文档。通过建立DTD结构索引、XML文档索引和内容索引,实现混合型XML文档的查询。理论分析与实验结果表明,该技术具有较小的空间开销和较高的查询效率。  相似文献   

2.
针对实际密文数据库的应用,在全文检索倒排索引技术的基础上,设计了一种通过密文倒排索引文件对其进行快速检索的方法。密文索引文件中主要包含有索引项、相对应的记录主键等信息。检索时,通过用检索词匹配索引文件中的索引项,找到对应的记录主键集合,再根据记录主键集合查询密文数据库,获取相应的密文数据,进行解密即可获取明文数据信息。整个检索过程中不对数据库进行解密,从而实现了在不解密的情况下对密文数据库的快速检索。  相似文献   

3.
基于Hash表的数据库索引结构设计与实现   总被引:1,自引:0,他引:1  
索引结构的优劣对RDBMS的查询速度起着至关重要的作用,目前比较成熟的组织索引的数据结构有Hash表和B-Tree结构。基于Hash表给出了一种RDBMS索引以及存储结构的详细设计方案并加以实现。  相似文献   

4.
路径表达式查询是XML数据查询处理的核心研究问题之一,研究者开展了大量的研究工作.但这些研究更多关注XML数据上路径表达式的匹配,忽略了谓词"包含".研究XML查询处理中谓词"包含"的查询处理方法.采用了两种方法,第一种是采用跳跃表的方法,在XML分枝模式匹配时动态地对结点数据进行读取和关键字匹配.第二种是为XML文档中的词语建立倒排索引,来实现关键字的匹配.并从分枝模式路径长度、查询关键的数量和"包含"谓词判断结点的类型,对两种方法进行了分析和比较.  相似文献   

5.
压缩树索引技术是XML数据压缩的热点问题之一,本文提出一种压缩树索引改进方法.针对压缩树在查询过程中不能很好的解决向上匹配与向下匹配的问题,改进方法引入正排索引和倒排索引.当查询到组一级时,利用正排索引可以快速的查找出以该组为父节点的子组.而选出符合值谓词的元素后,在进行向上匹配时利用倒排索引可找出该元素的父节点.新的索引方法在保留原压缩树索引优点的基础上,解决了压缩树索引在查询过程中匹配问题.  相似文献   

6.
结构索引和倒排表在处理XML文档查询时,有不足之处。该文提出了一种结合结构索引、倒排表的策略、连接路径表达式的查询算法,有效地降低了实际执行的代价,提高了查询速度。  相似文献   

7.
在全文信息检索系统中,存储文本及其上关键词的索引结构需要大量的空间。位图索引不能支持基于信息量的查询,倒排文件需要的空间比较大。提出了频率向量这种索引结构的压缩存储方法,设计并实现了基于这种压缩存储方法的存储结构,理论分析表明该压缩方法与存储结构可以获得较高的压缩比;此外,还讨论了压缩频率向量上的查询处理技术,实验结果表明这种压缩的索引结构能够保证查询结果的完备性,并能有效地提高频率向量的存储和查询效率。  相似文献   

8.
倒排索引创建效率和查询效率是全文检索技术的两个重要方面.针对传统倒排索引创建方法效率低下的问题,提出了基于缓存满再写临时文件和双缓冲区相结合的索引创建机制,充分利用内存和CPU资源以加快倒排索引的创建速度;提出了查询缓存机制,以提高倒排索引的查询效率.分析及实验结果表明,提出的索引创建机制能有效地提高索引创建速度,查询缓存机制能有效地加快查询速度,提高了全文检索系统的时间和空间效率.  相似文献   

9.
杨茸  牛保宁 《计算机学报》2021,44(8):1732-1750
空间文本数据流上连续k近邻查询(Continuous k-nearest neighbor Queries over Spatial-Textual data streams,CkQST)能在空间文本对象组成的数据流上检索并实时更新k个包含指定关键字的空间邻近对象,是空间文本数据流上连续查询(Continuous Queries over Spatial-Textual data streams,CQST)的一种,以预订(subscribe)的方式广泛应用于广告定位、微博分析、地图导航等领域.求解CkQST采用CQST的求解框架——构建空间文本混合索引组织查询,利用索引的空间过滤和文本过滤能力,为不断到来的对象匹配查询.该框架的求解效率取决于索引的过滤能力,提高索引过滤能力的主要途径是将查询的空间搜索范围映射到索引结构的最小区域,减少需要验证的查询数量.这一途径适用于查询空间搜索范围很少变化的情况.对于CkQST,覆盖k个最邻近对象的空间范围随着符合文本匹配条件的对象的数量的变化而变化,与之对应的索引项需要同步更新,代价高.针对这一问题,本文选择能够高效支持空间范围变化的Quad-tree和关键字查找的倒排索引,构成空间文本混合索引,组织CkQST.在空间过滤方面,提出内存代价模型VUMBCM(Verification and Update of Memory-Based Cost Model),通过平衡索引更新代价和验证代价,优化查询空间搜索范围到Quad-tree节点的映射.在文本过滤方面,采用基于块的有序倒排索引,组织Quad-tree节点内的查询,以快速定位需要验证的查询,避免对倒排列表中大量不可能匹配查询的访问;批量处理包含共同文本项的对象,提高文本验证时的对象吞吐量.由此构建的混合索引,称为OIQ-tree.实验表明,OIQ-tree中的代价模型及基于块的有序倒排索引能够支持CkQST的高效求解.与目前先进的索引技术相比,当查询规模达到2000万时,因数据流中对象的变化导致的索引平均更新时间降低了 46%,数据流中对象的平均处理时间降低了 22%.  相似文献   

10.
当前图数据库中的子图同构查询算法主要是依赖倒排索引,然而处理那些具有庞大数据的数据库和复杂的查询愈发成为挑战。研究目的是设计一个算法,使用新的索引作为查询处理的核心,记录查询图的每一个细小改变,并使用一种特殊的数据结构来维护。先是引出一个索引算法,然后逐渐分析整个索引、查询过程,并利用该算法实现一个系统,最后在不同数据集和查询上进行实验。实验证明了该算法具有良好的时间、空间效率和扩展性。新的索引算法能够支持更大的查询图和更加灵活的查询。通过实现的系统和其他系统的对比实验,验证了算法的有效性。  相似文献   

11.
The inverted index is widely used in the existing information retrieval field. In order to support containment queries for structured documents such as XML, it needs to be extended. Previous work suggested an extension in storing the inverted index for XML documents and processing containment queries, and compared two implementation options: using an RDBMS and using an Information Retrieval (IR) engine. However, the previous work has two drawbacks in extending the inverted index. One is that the RDBMS implementation is generally much worse in the performance than the IR engine implementation. The other is that when a containment query is processed in an RDBMS, the number of join operations increases in proportion to the number of containment relationships in the query and a join operation always occurs between large relations. In order to solve these problems, we propose in this paper a novel approach to extend the inverted index for containment query processing, and show its effectiveness through experimental results. In particular, our performance study shows that (1) our RDBMS approach almost always outperforms the previous RDBMS and IR approaches, (2) our RDBMS approach is not far behind our IR approach with respect to performance, and (3) our approach is scalable to the number of containment relationships in queries. Therefore, our results suggest that, without having to make any modifications on the RDBMS engine, a native implementation using an RDBMS can support containment queries as efficiently as an IR implementation.  相似文献   

12.
基于关系数据库有效地实现RPE查询   总被引:5,自引:1,他引:5  
各种XML查询语言的共同特点就是利用正则路径表达式(RPE)来导航XML文档的查询。本文结合我们提出的一种新的XML数据的关系存储模式,对有效地实现RPE查询的相关研究工作进行了总结,并提出了两个有效地实现包含连接的索引改进归并连接算法。算法采用索引定位技术、短路技术和预侦技术来减少连接代价。因此,不仅能够在当前上下文计算环境下有效地实现包含连接的计算,而且能够大量地避免包含连接中不必要的扫描和搜索。  相似文献   

13.
提出了一种基于倒排表的索引,能很好地支持文档结构和内容的动态更新。该索引结构建有基于词条的水平索引和基于元素标志GID的垂直索引,这种双重索引结构能高效地支持文档的局部更新。另外给出了基于上下文共现分析技术的 语义检索和利用关系数据库实现该索引的方法。  相似文献   

14.
随着大数据时代的到来,传统的计算机因为单机资源有限、运行速度慢、分布式处理支持差,已满足不了现行的医疗体系中的大数据处理需求,基于时空数据的移动医疗呼叫系统方法可以很好地解决这些问题。在移动云计算环境下研究[k]最近邻查询算法是当前一个热点问题,支持可扩展和分布式的空间数据索引对于kNN查询的效率影响很大,目前已有的查询算法不适合并行化或者会导致内容冗余。将MapReduce分布式处理技术与空间kNN查询方法相结合,设计可以快速检索到满足用户查询需求的医生位置信息的移动医疗呼叫算法。提出并构建了一个新的分布式空间数据索引方法:倒排Voronoi图索引,它将倒排索引和Voronoi图索引进行结合;提出了一种基于MapReduce的利用Voronoi图来处理kNN查询的高效算法,其在分布式环境下可以有效提高查询效率;用真实的和仿真的数据集来进行大量实验评估,实验结果表明所提出的方法具有良好的高效性和可扩展性。  相似文献   

15.
Set queries are an important topic and have attracted a lot of attention. Earlier research mainly concentrated on set containment queries. In this paper we focus on the T-Overlap query which is the foundation of the set similarity query. To address this issue, unlike traditional algorithms that are based on an inverted index, we design a new paradigm based on the prefix tree (trie) called the expanded trie index (ETI) which expands the trie node structure by adding some new properties. Based on ETI, we convert the TOverlap problem to finding query nodes with specific query depth equaling to T and propose a new algorithm called TSimilarity to solve T-Overlap efficiently. Then we carry out a three-step framework to extend T-Overlap to other similarity predicates. Extensive experiments are carried out to compare T-Similarity with other inverted index based algorithms from cardinality of query, overlap threshold, dataset size, the number of distinct elements and so on. Results show that T-Similarity outperforms the state-of-the-art algorithms in many aspects.  相似文献   

16.
Effective support for temporal applications by database systems represents an important technical objective that is difficult to achieve since it requires an integrated solution for several problems, including (i) expressive temporal representations and data models, (ii) powerful languages for temporal queries and snapshot queries, (iii) indexing, clustering and query optimization techniques for managing temporal information efficiently, and (iv) architectures that bring together the different pieces of enabling technology into a robust system. In this paper, we present the ArchIS system that achieves these objectives by supporting a temporally grouped data model on top of RDBMS. ArchIS’ architecture uses (a) XML to support temporally grouped (virtual) representations of the database history, (b) XQuery to express powerful temporal queries on such views, (c) temporal clustering and indexing techniques for managing the actual historical data in a relational database, and (d) SQL/XML for executing the queries on the XML views as equivalent queries on the relational database. The performance studies presented in the paper show that ArchIS is quite effective at storing and retrieving under complex query conditions the transaction-time history of relational databases, and can also assure excellent storage efficiency by providing compression as an option. This approach achieves full-functionality transaction-time databases without requiring temporal extensions in XML or database standards, and provides critical support to emerging application areas such as RFID.  相似文献   

17.
The query containment problem is a fundamental computer science problem which was originally defined for relational queries. With the growing popularity of the sparql query language, it became relevant and important in this new context: reliable and efficient sparql query containment solvers may have various applications within static analysis of queries, especially in the area of query optimizations and refactoring. In this paper, we present a new approach for solving the query containment problem in sparql. The approach is based on reducing the query containment problem to the satisfiability problem in first order logic. It covers a wide range of the sparql language constructs, including union of conjunctive queries, blank nodes, projections, subqueries, clauses from, filter, optional, graph, etc. It also covers containment under rdf schema entailment regime, and it can deal with the subsumption relation. We describe an implementation of the approach, an open source solver SpeCS and its thorough experimental evaluation on two relevant benchmarks, Query Containment Benchmark and SQCFramework. As a side result, SpeCS identified incorrect test cases within both benchmarks, which were manually checked, confirmed and fixed, resulting in better and more reliable benchmarks. The evaluation also shows that SpeCS is highly efficient and that compared to the state-of-the-art solvers, it gives more precise results in a shorter amount of time. In addition, SpeCS has the highest coverage of the supported language constructs.  相似文献   

18.
One of the most important reasoning tasks on queries is checking containment, i.e., verifying whether one query yields necessarily a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, query reformulation, knowledge-base verification, information integration, integrity checking, and cooperative answering. Containment is undecidable in general for Datalog, the fundamental language for expressing recursive queries. On the other hand, it is known that containment between monadic Datalog queries and between Datalog queries and unions of conjunctive queries are decidable. It is also known that containment between unions of conjunctive two-way regular path queries, which are queries used in the context of semistructured data models containing a limited form of recursion in the form of transitive closure, is decidable. In this paper, we combine the automata-theoretic techniques at the base of these two decidability results to show that containment of Datalog in union of conjunctive two-way regular path queries is decidable in 2EXPTIME. By sharpening a known lower bound result for containment of Datalog in union of conjunctive queries we show also a matching lower bound.  相似文献   

19.
基于最小生成树的图数据库索引算法   总被引:1,自引:0,他引:1  
李楠  高宏  李建中 《软件学报》2009,20(Z1):144-153
对复杂数据进行图模式建模近几年越来越流行,因此,在查询执行的优化过程中图索引技术变得至关重要.研究了图模式的索引问题,并且提出了一种近似的索引方法,称为MSTA方法.MSTA方法利用最小生成树结构作为索引特征,依据最小生成树边序列的包含关系和基于最大公共子图的图距离度量,将最小生成树组织到一个称为MST树的索引结构中.MST树索引结构可以高效地支持多种查询,例如子图查询.MSTA方法具备高效的索引性能.在索引大小和索引建立时间方面,传统方法是MSTA方法的数十倍,甚至上百倍.MSTA方法虽然不能返回完整结果,但是可以返回经图距离度量排序最好的部分结果.  相似文献   

20.
基于嵌入式SQL的Datalog演绎规则解释器的设计   总被引:1,自引:0,他引:1  
文章提出了一个建立在传统关系数据库基础上的能支持ANSISQL与嵌入式SQL的演绎规则解释器。利用这个解释器,用户能够定义一个蕴含关系并可以像在演绎数据库中使用Datalog规则一样来提出查询。其方法是把演绎规则和查询翻译成嵌入式SQL程序,该程序在执行查询时能被调用。这个解释器可以被认为是扩充RDBMS演绎查询功能的一个前端工具。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号