Similar Literature
1.
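XML document storage is a problem that every NXD (Native XML Database) system must solve. On the Internet, XML is used mainly to describe data structure and semantics during information exchange, and an NXD system also needs to support the XQuery standard and provide an efficient interface for accessing XML documents. This paper presents a fairly complete architecture for an NXD storage system. Targeting the characteristics of XML path queries, it designs the data structures for storing XML nodes and the indexes of the storage system, including their structure and the algorithms that build and maintain them. The index is implemented with a hash algorithm, BH (balanced hash). Tests on an experimental system indicate that these storage structures and algorithms can sustain both the access efficiency and the path-query efficiency of an NXD system.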

2.
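北京方正新天地软件科技有限公司, building on Kodak's scanning technology, recently developed an all-in-one electronic archive management system, the "文档之星" electronic archive room SEV1.0, and at the same time released the "文档之星" enterprise solution. The main functions of the "文档之星" electronic archive room include scanning, index creation, storage (disc burning), query, and management modules, which suit enterprise archive management. The system was developed with Microsoft VC 6.0, and images are stored in the standard TIFF format. The scanning module is built on the TWAIN interface, so it can connect to any scanner that supports TWAIN; it supports continuous batch scanning and offers automatic blank-page filtering, automatic black-border removal, and automatic deskewing. Users can conveniently create indexes from the keyboard. For queries, the system supports exact or fuzzy ...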

3.
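Many NoSQL databases have been developed to store and query massive data efficiently, and HBase is one of them. Native HBase, however, supports only a primary-key (row key) index when querying; non-key data can be queried only through a full table scan, which greatly slows down multi-condition queries. To address this, the paper proposes a coprocessor-based scheme for building in-memory indexes in HBase: coprocessors build secondary indexes quickly and update them automatically as the HBase table changes. The indexes are also persisted and, when used, are processed in memory, which greatly speeds up index lookups while preserving the availability and fault tolerance of the index. Experimental results show that the scheme greatly improves conditional retrieval speed over the native database, and that it is also faster than secondary-index schemes based on Solr and HiBase.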
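The idea of keeping a secondary index in step with table writes can be sketched in plain Python. This is only a toy model under stated assumptions: `IndexedTable`, `put`, `delete`, `find_by_value`, and `scan_by_value` are hypothetical names, the write hook merely plays the role that a coprocessor callback would play, and nothing here uses the HBase API or reproduces the paper's persistence scheme.

```python
from collections import defaultdict


class IndexedTable:
    """Toy key-value table with one in-memory secondary index (not the HBase API)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                     # row key -> {column: value}
        self.index = defaultdict(set)      # column value -> set of row keys

    def put(self, row_key, columns):
        # Stand-in for a post-write hook: drop the stale index entry
        # for this row, store the row, then index the new value.
        old = self.rows.get(row_key)
        if old is not None and self.indexed_column in old:
            self._unindex(old[self.indexed_column], row_key)
        self.rows[row_key] = dict(columns)
        if self.indexed_column in columns:
            self.index[columns[self.indexed_column]].add(row_key)

    def delete(self, row_key):
        old = self.rows.pop(row_key, None)
        if old is not None and self.indexed_column in old:
            self._unindex(old[self.indexed_column], row_key)

    def _unindex(self, value, row_key):
        keys = self.index.get(value)
        if keys is None:
            return
        keys.discard(row_key)
        if not keys:
            del self.index[value]          # keep the index free of empty entries

    def find_by_value(self, value):
        # Index lookup: touches only the matching row keys.
        return sorted(self.index.get(value, ()))

    def scan_by_value(self, value):
        # Full scan: what a query on a non-key column costs without the index.
        return sorted(k for k, cols in self.rows.items()
                      if cols.get(self.indexed_column) == value)
```

Both lookup methods return the same row keys; the difference is that `find_by_value` avoids visiting every row, which is the effect the abstract attributes to the secondary index.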

4.
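The growth of Internet technology has produced massive amounts of unstructured data that traditional relational databases struggle to store and process quickly and effectively. NoSQL databases can store and process unstructured data effectively, but their weakened support for relational operations makes it hard to meet the needs of many application scenarios. Newer relational databases that can handle unstructured data offer efficient storage suited to a range of scenarios. To compare quantitatively the storage and processing capabilities of relational databases and document-oriented NoSQL databases, this paper compares how the PostgreSQL hstore data type and MongoDB embedded documents store unstructured data, and analyzes their performance characteristics and suitable scenarios by comparing bulk loading, disk usage, primary-key queries, non-primary-key queries, and geospatial coordinate queries.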

5.
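In full-text information retrieval systems, the index structures that store the text and its keywords require a large amount of space. Bitmap indexes cannot support queries based on information content, and inverted files need considerable space. This paper proposes a compressed storage method for the frequency-vector index structure and designs and implements a storage structure based on it. Theoretical analysis shows that the compression method and storage structure can achieve a high compression ratio. The paper also discusses query processing over compressed frequency vectors. Experimental results show that the compressed index structure preserves the completeness of query results and effectively improves the storage and query efficiency of frequency vectors.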

6.
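Partitioning is widely used in relational databases today, but there is no systematic method for using partitioning strategies to manage the storage and indexing of massive feature-layer data. This paper manages massive spatial layer data systematically with partitioning techniques that vary the management mode, the partition granularity, the index type, and combinations of these. It further studies how partition granularity and index type affect query efficiency, and verifies experimentally that partitioning in a relational database optimizes the storage and management of massive feature-layer data. The results show, among other findings, that when the partition key is not used as a query condition, coarser partition granularity yields higher query efficiency, and that when the partition key is used as a query condition, local partitioned indexes are more efficient. A well-chosen partitioning scheme optimizes the storage and management of massive feature-layer data, which matters for research on storing and managing large vector datasets and supports decisions on applying partitioning to practical storage and retrieval efficiency problems.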

7.
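An Instantly Updatable Inverted Index Method Supporting Efficient Retrieval   Total citations: 8, self-citations: 1, citations by others: 8
The rapid growth of the World Wide Web has given rise to a new kind of efficient document indexing technology. This paper implements an inverted index that supports efficient retrieval and instant updates. It is part of the WebME (WebMiningEnvironment) prototype system, where it serves to answer specific queries efficiently. It supports instant incremental indexing: a newly added document can be indexed immediately without re-indexing the existing content, and updating the index does not interrupt queries in progress.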
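Incremental indexing of this kind can be illustrated with a small sketch. It assumes whitespace tokenization and AND semantics for multi-term queries; the class `InvertedIndex` and its methods are hypothetical names, not the WebME implementation, and the sketch does not model concurrent queries during updates.

```python
from collections import defaultdict


class InvertedIndex:
    """Minimal in-memory inverted index with incremental document insertion."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> ascending list of doc ids
        self.next_doc_id = 0

    def add_document(self, text):
        # Only the posting lists of the new document's terms are touched;
        # existing documents are never re-indexed.
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)  # ids grow, so lists stay sorted
        return doc_id

    def search(self, query):
        # Return ids of documents containing every query term.
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= set(self.postings.get(term, ()))
        return sorted(result)
```

Because document ids are assigned in increasing order, appending to a posting list keeps it sorted, so a new document becomes searchable as soon as `add_document` returns.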

8.
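Storage Techniques for XML Documents in (Object-)Relational Databases   Total citations: 7, self-citations: 0, citations by others: 7
XML is gradually becoming the standard for representing and exchanging data on the Web. As large volumes of Web data come to be represented as XML documents, these documents need to be stored and queried. Most commercial database products now support storing, querying, and indexing XML documents. This paper discusses the storage techniques involved in storing XML documents in an ORDB and compares the approaches taken by three major commercial database products (IBM DB2, Oracle9i, and Microsoft SQL Server 2000).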

9.
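Key-Value (KV) is one of the more widely used storage models in NoSQL systems. Current mainstream NoSQL systems suffer from limited retrieval functions, small memory capacity, and vulnerability to crashes. To address these problems, this paper proposes a design method for KV-type NoSQL systems. A persistent hybrid primary index structure supports range queries and fast restart. A data distribution algorithm and hybrid memory/external storage, with intelligent scheduling of data between memory and solid-state drives, provide efficient access to large-scale data at low storage cost. Experimental results show that, compared with Redis and for values of typical size, the system improves single-node read/write performance by about 8 times and reduces configuration cost by about 3/4, giving it a clear advantage for low-cost, efficient storage of massive data.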

10.
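A Storage and Retrieval System for Massive Structured Data   Total citations: 4, self-citations: 0, citations by others: 4
Big Data is a new type of data that has emerged in cloud computing in recent years, and traditional relational database systems are no longer adequate in storage scale or retrieval efficiency. Current distributed NoSQL databases provide a distributed storage environment but cannot support multi-column queries. This paper designs and implements a distributed massive structured data storage and retrieval system (MDSS). The system uses a column-store structure and combines a centralized distributed B+Tree index with local indexes to improve retrieval efficiency. On this basis, it discusses a task decomposition mechanism for complex query conditions, supporting multi-attribute retrieval, fuzzy retrieval, and statistical analysis over big data. Experimental results show that the proposed distributed structured data management technique and query task decomposition mechanism significantly improve query efficiency on large datasets in a distributed setting, and that the system suits the storage of massive structured data such as log data and flow-record data.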

11.
Wide-column NoSQL databases are an important class of NoSQL (Not only SQL) databases which scale horizontally and feature high access performance on sparse tables. With current trends towards big Data Warehouses (DWs), it is attractive to run existing business intelligence/data warehousing applications on higher volumes of data in wide-column NoSQL databases for low latency by mapping multidimensional models to wide-column NoSQL models or using additional SQL add-ons. For example, applications like retail management can run over integrated data sets stored in big DWs or in the cloud to capture current item-selling trends. Many of these systems also employ Snapshot Isolation (SI) as a concurrency control mechanism to achieve high throughput for read-heavy workloads. SI works well in a DW environment, as analytical queries can work on consistent snapshots and are not impacted by concurrent update jobs performed by online incremental Extract-Transform-Load (ETL) flows that refresh fact/dimension tables. However, the snapshot made available in the DW is often stale, since at the moment when an analytical query is issued, the source updates (e.g. in a remote retail store) may not have been extracted and processed by the ETL process in time due to high input data volume or slow processing speed. This staleness may cause incorrect results for time-critical decision support queries. To address this problem, snapshots which are supposed to be accessed by analytical queries need to be first maintained by corresponding ETL flows to reflect source updates based on given freshness needs. Snapshot maintenance in this work means maintaining the distributed data partitions that are required by a query. Since most NoSQL databases are not ACID compliant and do not provide full-fledged distributed transaction support, a snapshot may be derived inconsistently when its data partitions are updated by different ETL maintenance jobs. This paper describes an extended version of the HBelt system [1], which tightly integrates the wide-column NoSQL database HBase with a clustered & pipelined ETL engine. Our objective is to efficiently refresh HBase tables with remote source updates while a consistent snapshot is guaranteed across distributed partitions for each scan request in analytical queries. A consistency model is defined and implemented to address so-called distributed snapshot maintenance. To achieve this, ETL jobs and analytical queries are scheduled in a distributed processing environment. In addition, a partitioned, incremental ETL pipeline is introduced to increase the performance of ETL (update) jobs. We validate the efficiency gain in terms of data pipelining and data partitioning using the TPC-DS benchmark, which simulates a modern decision support system for a retail product supplier. Experimental results show that high query throughput can be achieved in HBelt when distributed, refreshed snapshots are demanded.

12.
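As the scale of ecological research keeps expanding, existing data collection and management systems can no longer keep up with the shift of ecosystem observation data toward interdisciplinary, large-scale use. To meet the requirements of modern ecological research, we designed a general interdisciplinary observation data model, built a database that supports distributed large-scale storage using a NoSQL approach, and designed and implemented a general ecological observation data management platform. The work addresses scientific problems such as the integrated storage of heterogeneous interdisciplinary data and the high extensibility of the data model.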

13.
In the last decade, a new class of data management systems, collectively called NoSQL systems, has emerged and is now being intensively developed. The main features of these systems are that they abandon the relational data model and SQL, do not fully support ACID transactions, and use a distributed architecture (even though there are non-distributed NoSQL systems as well). As a result, such systems outperform conventional SQL-oriented DBMSs in some applications; in addition, they are highly scalable under increasing workloads and huge amounts of data, which is important, in particular, for Web applications. Unfortunately, the absence of transactional semantics imposes certain constraints on the class of applications where NoSQL systems can be used effectively, and the choice of a particular system depends significantly on the application. In this paper, a review of the main classes of NoSQL data management systems is given, and examples of systems and applications where they can be used are discussed.

14.
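Building on traditional network architectures and drawing on P2P ideas, this paper establishes a distributed network architecture for virtual geographic environments with large-scale user participation. It focuses on user node organization, resource registration, resource discovery, and failure recovery under this architecture. Taking real-time roaming over large-scale terrain and collaborative video in a networked environment as examples, it designs and implements a prototype system and reports preliminary experiments.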

15.
NoSQL systems are increasingly adopted for Web applications requiring scalability that relational database systems cannot meet. Although NoSQL systems were not designed to support joins, the need to support joins has emerged as they are applied to a wide variety of applications. Furthermore, joins performed in NoSQL systems are generally similarity joins, which find similar pairs of records, rather than exact-match joins. Since Web applications often use the MapReduce framework, we develop a solution to perform similarity joins in NoSQL systems using the MapReduce framework.

16.
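Traditional relational databases can no longer meet the storage and access demands of massive data. To address this, the paper proposes a distributed storage and scaling solution for non-relational (NoSQL) databases. It analyzes and improves NoSQL, discusses distributed storage of key-value pairs based on a consistent hashing algorithm and a method for scaling database server nodes based on a double hash ring, and proposes introducing NoSQL into the database architecture as a mirror. Results from practical use show that the method can avoid wasted resources and server overload.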
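Consistent hashing of key-value pairs can be sketched as follows. This is a generic single-ring version with virtual nodes, offered only as an illustration: the abstract does not describe its double hash ring in enough detail to reproduce, and `HashRing`, `replicas`, and the use of MD5 as the ring hash are assumptions made here.

```python
import bisect
import hashlib


def _ring_hash(key):
    # Map a string to a position on the ring (MD5 is an arbitrary choice here).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Single consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per server
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> server name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            point = _ring_hash(f"{node}#{i}")
            if point in self._owner:      # skip the (very unlikely) collision
                continue
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = _ring_hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def get_node(self, key):
        # A key belongs to the first virtual node clockwise from its position.
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _ring_hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

The property that matters for scaling is that adding a server moves keys only onto the new server, and removing it restores the previous placement, so most keys stay where they were.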

17.
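With the arrival of the cloud computing era and the continued growth of large Web applications, data volumes keep increasing and centralized data retrieval no longer meets demand. How to handle data retrieval efficiently in a distributed environment has become a pressing problem. Traditional relational data storage also cannot fully adapt to cloud environments, and NoSQL (Not only SQL) emerged as a form of cloud storage, with Cassandra among the more widely used systems. With a distributed multi-node archi...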

18.
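To address the slow retrieval of massive data in general-purpose databases, this paper proposes a data retrieval optimization system. The system splits the data into phrases and words, uses a hash algorithm and radix sort to organize them into a dictionary, builds an inverted list for each phrase and word, and uses these inverted lists to index the data in the general-purpose database. With this inverted-list index, data retrieval time can be reduced to the millisecond level.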

19.
A query in an inverted file organization can be expressed as a Boolean expression that states the data selection criterion. In response to a query, the system accesses inverted lists, merges them, and then accesses the records in secondary storage that satisfy the search logic. Because the stored inverted lists are selected by taking into consideration the file organization as a whole, it is not always advantageous to use all the stored inverted lists. The remaining conditions are then checked on the records that satisfy the conditions examined using inverted lists. In this paper we present an optimization algorithm for selecting the address lists that are worth using.
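The split between conditions answered from inverted lists and conditions checked on the fetched records can be sketched for a conjunctive (AND) query. The selection rule used here, a simple length threshold `max_list_len`, is an assumption for illustration and is not the paper's optimization algorithm; `build_inverted_lists` and `conjunctive_query` are hypothetical names.

```python
def build_inverted_lists(records, fields):
    """Build {(field, value): sorted list of record ids} for the given fields."""
    lists = {}
    for rid in sorted(records):
        for field in fields:
            if field in records[rid]:
                lists.setdefault((field, records[rid][field]), []).append(rid)
    return lists


def conjunctive_query(records, inverted_lists, conditions, max_list_len):
    """Answer an AND query of (field, value) conditions.

    Conditions with a stored inverted list of at most max_list_len entries
    are resolved by intersecting lists; all other conditions are checked
    directly on the candidate records.
    """
    indexed = [c for c in conditions
               if c in inverted_lists and len(inverted_lists[c]) <= max_list_len]
    residual = [c for c in conditions if c not in indexed]

    if indexed:
        indexed.sort(key=lambda c: len(inverted_lists[c]))  # shortest list first
        candidates = set(inverted_lists[indexed[0]])
        for cond in indexed[1:]:
            candidates &= set(inverted_lists[cond])
    else:
        candidates = set(records)   # no list worth using: fall back to a scan

    return sorted(rid for rid in candidates
                  if all(records[rid].get(f) == v for f, v in residual))
```

Whatever threshold is chosen, the answer is the same; only the amount of list merging versus record checking changes, which is the trade-off the abstract's optimization targets.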

20.
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that are frequently present in user queries, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles the addition of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for Web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence of the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval.
