Similar literature
Found 20 similar articles; search took 928 ms
1.
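As the primary storage medium for massive data, disks offer large capacity and low cost, but disk I/O bandwidth lags far behind the rate of data growth and is increasingly the performance bottleneck of big-data management systems. Optimizing storage structures and improving read/write efficiency is therefore a major challenge for management systems in the big-data era. This paper proposes KCGS-Store, a hybrid columnar storage structure based on key-column grouping and sorting. The relational table is partitioned into storage pools by grouping on the key columns, so that all records within a pool share the same value or value range on the key columns; the pools are then merged column by column. After merging, the key columns are ordered pool by pool, so conditional queries can effectively filter out irrelevant column values, reducing the volume of data read and improving query performance. A pool-number index is used to reassemble records at a small cost in time and space. Experimental data show that, compared with the ORCFile and Parquet storage structures, KCGS-Store achieves varying degrees of improvement in storage space, data loading, and SQL queries.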
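The pool idea in this abstract can be illustrated with a small in-memory sketch. It is not the KCGS-Store implementation: the function names (`build_pools`, `merge_columns`, `select`), the single key column, and the sample rows are all assumptions made for illustration. It only shows how grouping on a key column and recording each pool's row range lets a conditional query read one slice of a column instead of the whole column.

```python
from collections import defaultdict

def build_pools(rows, key):
    """Group rows by their value in the key column; each group is one pool."""
    pools = defaultdict(list)
    for row in rows:
        pools[row[key]].append(row)
    # Order pools by key value so the merged key column is sorted pool by pool.
    return sorted(pools.items(), key=lambda kv: kv[0])

def merge_columns(pools, columns):
    """Lay out every column pool by pool and record each pool's row range."""
    store = {c: [] for c in columns}
    index = []  # hypothetical pool index: (key value, start offset, end offset)
    offset = 0
    for key_value, rows in pools:
        for c in columns:
            store[c].extend(r[c] for r in rows)
        index.append((key_value, offset, offset + len(rows)))
        offset += len(rows)
    return store, index

def select(store, index, column, key_value):
    """Read only the slice of `column` that belongs to matching pools."""
    out = []
    for value, start, end in index:
        if value == key_value:
            out.extend(store[column][start:end])
    return out

rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 3},
]
pools = build_pools(rows, "region")
store, index = merge_columns(pools, ["region", "amount"])
east_amounts = select(store, index, "amount", "east")
```

Here the query on `region = "east"` touches only the first two entries of the `amount` column; a real system would apply the same pruning to on-disk column chunks.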

2.
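Join is one of the key operations affecting query efficiency over column-stored data. Previous work on join optimization in column-store systems has focused mostly on optimizing data organization and building auxiliary physical structures, and has rarely addressed the logical layer, in particular early join-strategy optimization. Based on the characteristics of column-stored data and of analytical query workloads, this paper proposes a new join optimization method for column stores. The method adopts an early-optimization strategy, applies a "fact table push-down" rule, and, for queries over multiple fact tables, introduces bushy trees to decide the join order, obtaining the "optimal" join execution order at low time and space complexity. The proposed join-strategy optimization is validated theoretically with a cost estimation model, and experiments on the large-scale data warehouse benchmark SSB confirm the effectiveness of the early-optimization mechanism and the push-down rule.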

3.
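To achieve high-quality querying and scheduling of massive data, reduce query cost, and improve query efficiency, this paper studies a physical data model optimization method based on a genetic algorithm. A dynamic-incremental physical data model is constructed: a base-state store holds the data objects that change frequently at the current moment, a dynamic incremental store captures how the stored data change, and a priority-based history store is built from the base-state table and the incremental table, improving query capability and reducing disk requirements. An adaptive genetic algorithm, through population initialization, fitness function design, and selection of crossover and mutation probabilities, is then applied to optimize queries on the physical data model and obtain the best query result. Experimental results show that the method converges well over iterations; that using crossover and mutation operators that differ widely improves the query capability of the physical data model; that the more query tasks there are, the lower the method's query cost ratio and the more pronounced its advantage; and that an increase in the number of concurrent users has little effect on the method's concurrency latency.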

4.
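Query performance prediction (QPP), the prediction of query execution time, is an important research problem in database systems. When transactions execute concurrently in a database system, existing QPP methods cannot build an accurate QPP model without altering the database's query performance. This paper therefore proposes a new method for predicting query execution time based on physical operations. The method builds a unit prediction model for each physical operation of a query, combines the unit models into a complete QPP model according to the query plan, and includes statistics that characterize the concurrency state of the database system among the model's input features. The method needs only the basic facilities provided by the DBMS to collect the database statistics required for model building; it requires no changes to the DBMS and does not interfere with the workload already running on the database system. Experimental results show that, for both OLTP and OLAP applications, the method's prediction accuracy is higher than that of the other methods compared, across different query plans and degrees of concurrency.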

5.
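To improve query efficiency, this paper studies sharing in data-stream query processing from two angles: sharing of query operator units and sharing of query storage structures. A sharing-based two-level index queue is designed to store intermediate results of data streams. This structure allows intermediate query results to be reused and also provides some flexibility for migration when data are shared. For multi-query sharing, identical predicates on the same data stream are extracted and shared across queries, so that a result is computed once and used in many places. Finally, the related models and algorithms are analyzed.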

6.
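With the development of XML technology, how to store and query XML documents using existing database technology has become a hot research topic in XML data management. This paper introduces a new document encoding method and, based on this encoding, proposes a new XML document storage method. The method decomposes the tree structure of an XML document into nodes according to node type and stores them in the corresponding relational tables, so that documents of arbitrary structure can be stored in a fixed relational schema. To facilitate querying, the simple path patterns that occur in the document are also stored in a table. The new storage method effectively supports document queries and, using the nodes' encoding information, can correctly reconstruct the original XML document. Finally, the proposed storage method and reconstruction algorithm are validated experimentally.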

7.
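The focus of query optimization over column-stored data is the column join strategy. Existing column-store systems simplify column joins by changing the storage layout, so column joins lack query optimization; the strategy is uniform and cannot satisfy complex queries. After analyzing existing join selection strategies, this paper proposes a new join-strategy optimization method. First, rule-based optimization is used to define optimization rules for column-store queries, filtering out candidate plans that cannot yield the optimal plan. Next, a cost-based optimization algorithm is designed that improves the query execution order using the principles of dynamic Huffman trees and left-deep join trees, further reducing the number of candidate plans. Then, based on the characteristics of column-stored data, the execution strategy of each join node in a candidate plan is classified as either serial join or parallel join, and a cost estimation model is proposed on this basis so that cost can be estimated and a strategy chosen between the two. Finally, experiments on the SSB data set demonstrate the method's effectiveness for queries over column-stored data.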

8.
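Relational databases have mature indexing, storage, and query technology, so storing XML data in relational databases is of great value. However, the mismatch between the complex hierarchical structure of XML data and the flat structure of relational databases causes many complications in storage. Against this background, the article proposes a general relational storage model and query algorithm based on model mapping and the Nested Sets Model. Experiments, together with an analysis of the correctness and performance of the experimental data under each query scenario, show that the middleware can store XML data effectively in a relational schema and meets query requirements effectively.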
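For readers unfamiliar with the Nested Sets Model named above, the following is a minimal sketch of the general technique, not the article's storage model. The tuple-based tree representation, the function names, and the sample document are assumptions. Each node receives a (left, right) pair from a depth-first traversal, and node B is a descendant of node A exactly when A's interval strictly contains B's, which turns a hierarchical question into a flat range comparison that a relational table can answer.

```python
def label_nested_sets(tree):
    """Assign (left, right) numbers by depth-first traversal.

    `tree` is a (name, [children]) tuple; the result is a list of
    (name, left, right) rows ordered by `left`, as they might be
    stored in a relational table.
    """
    rows = []
    counter = 0

    def visit(node):
        nonlocal counter
        name, children = node
        counter += 1
        left = counter          # number assigned on entering the node
        for child in children:
            visit(child)
        counter += 1            # number assigned on leaving the node
        rows.append((name, left, counter))

    visit(tree)
    return sorted(rows, key=lambda r: r[1])

def descendants(rows, name):
    """Return names whose interval lies strictly inside the interval of `name`."""
    left, right = next((l, r) for n, l, r in rows if n == name)
    return [n for n, l, r in rows if left < l and r < right]

# Hypothetical document with unique element names, for illustration only.
doc = ("book", [("title", []), ("chapter", [("section", []), ("para", [])])])
rows = label_nested_sets(doc)
chapter_descendants = descendants(rows, "chapter")
```

In SQL the same descendant test would be a single range predicate on the two label columns, which is why this encoding suits relational storage.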

9.
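Unstructured mesh preprocessing is one of the key techniques for parallel CFD computation on unstructured meshes. A fast search algorithm based on a buffer data structure is proposed to build the global adjacency graph of mesh cells; the algorithm has low complexity and significantly reduces the storage required for unstructured mesh preprocessing. To improve the memory-access hit rate of the core computation, a mesh cell reordering algorithm is proposed; it improves the efficiency of the core computation and applies generally to all kinds of unstructured mesh problems. Experimental results show that the preprocessing technique still yields satisfactory results when applied to complex computational domains with large mesh counts.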

10.
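Heuristic multi-table join optimization for MapReduce under hybrid storage   Total citations: 1, self-citations: 0, citations by others: 1
This paper studies multi-table join queries under MapReduce and finds that limitations of the MapReduce framework itself lead to low execution efficiency. To address this, it proposes a MapReduce-based heuristic multi-join optimization method (MHMO), which heuristically recommends different execution algorithms for different join patterns. In particular, a hybrid join is first decomposed into groups of simple join patterns, and a cost model is then defined to determine the optimal execution order of the groups. Combined with the late materialization technique of column stores, this greatly improves the execution performance of multi-table joins under MapReduce. Finally, experiments on the data warehouse benchmark data set TPC-H verify the effectiveness of MHMO.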

11.
We address the issue of mining frequent conjunctive queries in a relational database, a problem known to be intractable even for conjunctive queries over a single table. In this article, we show that mining frequent projection-selection-join queries becomes tractable if joins are performed along keys and foreign keys, in a database satisfying functional and inclusion dependencies, under certain restrictions. We note that these restrictions cover most practical cases, including databases operating over star schemas, snowflake schemas and constellation schemas. In our approach, we define an equivalence relation over queries using a pre-ordering with respect to which the support is shown to be anti-monotonic. We propose a level-wise algorithm for computing all frequent queries by exploiting the fact that equivalent queries have the same support. We report on experiments showing that, in our context, mining frequent projection-selection-join queries is indeed tractable, even for large data sets.
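The level-wise search with anti-monotonic support that this abstract relies on can be sketched on a simpler domain. The example below mines frequent itemsets rather than projection-selection-join queries, so it is a stand-in for the general pattern and not the authors' algorithm; the function name `frequent_sets` and the sample transactions are assumptions. The point is the pruning step: a candidate of size k is counted only if all of its subsets of size k-1 were already found frequent.

```python
from itertools import combinations

def frequent_sets(transactions, min_support):
    """Level-wise search over itemsets with anti-monotonic pruning."""
    def support(candidate):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for t in transactions if candidate <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    result = {}
    while level:
        for s in level:
            result[s] = support(s)
        frequent = set(level)
        size = len(level[0]) + 1
        # Join step: combine frequent sets of the current level.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must already be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent
                        for sub in combinations(c, size - 1))
                 and support(c) >= min_support]
    return result

transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
found = frequent_sets(transactions, 2)
```

With a minimum support of 2, the set {b, c} is infrequent, so the candidate {a, b, c} is discarded without ever being counted.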

12.
In this paper, we propose a deterministic column-based matrix decomposition method. Conventional column-based matrix decomposition (CX) computes the columns by randomly sampling columns of the data matrix. Instead, the newly proposed method (termed CX_D) selects columns in a deterministic manner, which approximates singular value decomposition well. The experimental results on three real-world data sets demonstrate the power and the advantages of the proposed method.

13.
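Mapping XML documents and their functional dependencies to relations   Total citations: 16, self-citations: 2, citations by others: 16
Many papers have proposed methods for mapping XML to relations according to a DTD, but none of them considers XML semantics, even though semantic information is very important for storage schema design, query optimization, update anomaly checking, and so on. If functional dependencies of the XML are specified on the DTD, they need to be taken into account when mapping to a relational database. Based on the Hybrid Inlining method and taking XML functional dependencies into account, this paper proposes a mapping method that preserves both the content and structure information of XML documents and the functional dependency information. The method reduces storage redundancy, and the relations produced by the mapping are proved to satisfy third normal form.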

14.
In a fuzzy relational database where a relation is a fuzzy set of tuples and ill-known data are represented by possibility distributions, nested fuzzy queries can be expressed in the Fuzzy SQL language. Although it provides a very convenient way for users to express complex queries, a nested fuzzy query may be very inefficient to process with the naive evaluation method based on its semantics. In conventional databases, nested queries are unnested to improve the efficiency of their evaluation. In this paper, we extend the unnesting techniques to process several types of nested fuzzy queries. An extended merge-join is used to evaluate the unnested fuzzy queries. As shown by both theoretical analysis and experimental results, the unnesting techniques with the extended merge-join significantly improve the performance of evaluating nested fuzzy queries.

15.
Sharing of structured data in decentralized environments is a challenging problem, especially in the absence of a global schema. Social network structures map network links to semantic relations between participants in order to assist in efficient resource discovery and information exchange. In this work, we propose a scheme that automates the process of creating schema synopses from semantic clusters of peers which own autonomous relational databases. The resulting mediated schemas can be used as global interfaces for relevant queries. Active nodes are able to initiate the group schema creation process, which produces a mediated schema representative of nodes with similar semantics. Group schemas are then propagated in the overlay and used as a single interface for relevant queries. This increases both the quality and the quantity of the retrieved answers and allows for fast discovery of interest groups by joining peers. As our experimental evaluations show, this method increases both the quality and the quantity of the retrieved answers and allows for faster discovery of semantic groups by joining peers.

16.
This article deals with query processing techniques for the SQLf language which is an extended version of SQL supporting imprecise queries interpreted in the framework of fuzzy sets. SQLf, as well as SQL, allows for the use of nested queries, in which a (fuzzy) condition involved in a select block, calls on another select block (the nested one). Two types of processing strategies for nested queries are discussed. The first one tends to take advantage of existing database management systems (DBMS) to process fuzzy queries thanks to an additional layer which is in charge of translating the initial query into a Boolean one. In this perspective, the performances obtained depend strongly on the efficiency of the underlying DBMS. The other strategy is slightly different and it is situated in the context of the design of systems involving specific algorithms for processing fuzzy queries. In this article, the focus is put on algorithms related to the generic nesting construct “exists.” © 1996 John Wiley & Sons, Inc.

17.
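Research on unnesting of nested queries   Total citations: 2, self-citations: 0, citations by others: 2
Meng Xiaofeng, Wang Shan. Chinese Journal of Computers (《计算机学报》), 1995, 18(4): 241-251
Nested queries are an important feature of the SQL query language. Traditional database systems process nested queries with the TIS method, which is very inefficient. At present, the effective way to improve the efficiency of nested query processing is unnesting.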
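A small example may make the idea of unnesting concrete. The sketch below uses Python's standard `sqlite3` module with hypothetical `dept` and `emp` tables; it is not taken from the paper. It rewrites an `IN` subquery as a join, which returns the same rows here because `dept.id` is a primary key, so the join cannot duplicate employees.

```python
import sqlite3

# In-memory database with two hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept(id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE emp(name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO emp VALUES ('Li', 1), ('Wang', 2), ('Zhao', 1);
""")

# Nested form: the inner block is conceptually evaluated for the outer block.
nested = """
SELECT name FROM emp
WHERE dept_id IN (SELECT id FROM dept WHERE city = 'Beijing')
ORDER BY name
"""

# Unnested form: the same condition expressed as a join on the key dept.id.
unnested = """
SELECT e.name FROM emp AS e
JOIN dept AS d ON e.dept_id = d.id
WHERE d.city = 'Beijing'
ORDER BY e.name
"""

nested_rows = conn.execute(nested).fetchall()
unnested_rows = conn.execute(unnested).fetchall()
```

If the inner query could return duplicate values, the join form would need `DISTINCT` or a semi-join to stay equivalent; modern optimizers often perform this kind of rewrite automatically.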

18.
Analysis of historical data in data warehouses contributes significantly toward future decision-making. A number of design factors, including slowly changing dimensions (SCDs), affect the quality of such analysis. In SCDs, attribute values may change over time and must be tracked. They should maintain consistency and correctness of data, and show good query performance. We identify that SCDs can have three types of validity periods: disjoint, overlapping, and same validity periods. We then show that the third type cannot be handled through the temporal star schema for temporal data warehouses (TDWs). We further show that a hybrid/Type6 scheme and temporal star schema may be used to handle this shortcoming. We demonstrate that the use of a surrogate key in the hybrid scheme efficiently identifies data, avoids most time comparisons, and improves query performance. Finally, we compare the TDWs and a surrogate key-based temporal data warehouse (SKTDW) using query formulation, query performance, and data warehouse size as parameters. The results of our experiments for 23 queries of five different types show that SKTDW outperforms TDW for all types of queries, with average and maximum performance improvements of 165% and 1071%, respectively. The results of our experiments are statistically significant.

19.
In many applications, XML documents need to be modelled as graphs. The query processing of graph-structured XML documents brings new challenges. In this paper, we design a method based on a labelling scheme for processing structural queries on graph-structured XML documents. We assign each node a set of labels, which together form the reachability labelling scheme. By extending an interval-based reachability labelling scheme for DAGs by Rakesh et al., we design labelling schemes that support checking reachability relationships in general graphs. Based on the labelling schemes, we design graph structural join algorithms to efficiently answer structural queries that contain only ancestor-descendant relationships. For the processing of subgraph queries, we design a subgraph join algorithm. With an efficient data structure, the subgraph join algorithm can process subgraph queries with various structures efficiently. Experimental results show that our algorithms have good performance and scalability. Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; the National Natural Science Foundation of China under Grant No. 60773068 and No. 60773063.

20.
This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily-stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

