
1.

Comparing columnar, row and array DBMSs to process recursive queries on graphs

《Information Systems》2017

Analyzing graphs is a fundamental problem in big data analytics, for which DBMS technology does not seem competitive. On the other hand, SQL recursive queries are a fundamental mechanism to analyze graphs in a DBMS, whose processing and optimization are significantly harder than traditional SPJ queries. Columnar DBMSs are a new faster class of database system, with significantly different storage and query processing mechanisms compared to row DBMSs, still the dominating technology. With that motivation in mind, we study the optimization of recursive queries on a columnar DBMS focusing on two fundamental and complementary graph problems: transitive closure and adjacency matrix multiplication. From a query processing perspective we consider the three fundamental relational operators: selection, projection and join (SPJ), where projection subsumes SQL group-by aggregation. We present comprehensive experiments comparing recursive query processing on columnar, row and array DBMSs to analyze large graphs with different shape and density. We study the relative impact of query optimizations and we compare raw speed of DBMSs to evaluate recursive queries on graphs. Results confirm classical query optimizations that keep working well in a columnar DBMS, but their relative impact is different. Most importantly, a columnar DBMS with tuned query optimization is uniformly faster than row and array systems to analyze large graphs, regardless of their shape, density and connectivity. On the other hand, there is no clear winner between the row and array DBMSs.
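As a rough illustration of the transitive closure problem named in this abstract, the sketch below evaluates a SQL recursive query through Python's built-in sqlite3 module. It is not the paper's method or benchmark setup: the `edge` table, its `src` and `dst` columns, the `reach` CTE, and the `transitive_closure` helper are all hypothetical names chosen for the example, and SQLite is a row store used here only because it ships with Python.

```python
import sqlite3

def transitive_closure(edges):
    """Return all (src, dst) pairs connected by a path of one or more edges."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per directed edge.
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(src, dst) AS (
            SELECT src, dst FROM edge            -- base step: paths of length 1
            UNION                                -- UNION drops duplicates, so cycles terminate
            SELECT r.src, e.dst                  -- recursive step: extend each path by one edge
            FROM reach AS r JOIN edge AS e ON r.dst = e.src
        )
        SELECT src, dst FROM reach ORDER BY src, dst
        """
    ).fetchall()
    conn.close()
    return rows

# Small cyclic graph: 1 -> 2 -> 3 -> 1, plus 3 -> 4.
closure = transitive_closure([(1, 2), (2, 3), (3, 1), (3, 4)])
print(closure)
```

The recursive step is a join followed by a duplicate-eliminating projection, which is why the abstract frames recursive query processing in terms of the selection, projection and join operators.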

2.

Query optimization in XML structured-document databases

Dunren Che Karl Aberer M. Tamer Özsu 《The VLDB Journal The International Journal on Very Large Data Bases》2006,15(3):263-289

While the information published in the form of XML-compliant documents keeps fast mounting up, efficient and effective query processing and optimization for XML have now become more important than ever. This article reports our recent advances in XML structured-document query optimization. In this article, we elaborate on a novel approach and the techniques developed for XML query optimization. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. This article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique, in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach straightforwardly renders high optimization efficiency and simplicity in implementation. Our approach is a logical-level one, which is independent of any particular storage model. Therefore, the optimizers developed based on our approach can be easily adapted to a broad range of XML data/information servers to achieve fast query optimization. Experimental study confirms the validity and effectiveness of the proposed approach.

3.

Seamlessly integrating similarity queries in SQL

M. C. N. Barioni H. L. Razente A. J. M. Traina C. Traina Jr 《Software》2009,39(4):355-384

Modern database applications are increasingly employing database management systems (DBMS) to store multimedia and other complex data. To adequately support the queries required to retrieve these kinds of data, the DBMS need to answer similarity queries. However, the standard structured query language (SQL) does not provide effective support for such queries. This paper proposes an extension to SQL that seamlessly integrates syntactical constructions to express similarity predicates to the existing SQL syntax and describes the implementation of a similarity retrieval engine that allows posing similarity queries using the language extension in a relational DBMS. The engine allows the evaluation of every aspect of the proposed extension, including the data definition language and data manipulation language statements, and employs metric access methods to accelerate the queries. Copyright © 2008 John Wiley & Sons, Ltd.

4.

A multimedia object query language and its query processing (total citations: 4; self-citations: 0; citations by others: 4)

田增平党华锐周傲英施伯乐《软件学报》1999,10(7):694-701

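This paper studies the query requirements of multimedia databases and proposes a structured multimedia object query language, MOQL (multi-media object query language). MOQL supports multimedia queries based on type, structural features, synchronization relationships, temporal relationships, and content information. Using a DB2 database as the storage mechanism, the paper defines a set of algebraic operators and transformation rules with which a user-defined MOQL query can be transformed into an algebraic expression, optimized algebraically, and then converted into DB2 SQL and C++ query procedures that run on the DB2 database.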

5.

Kaleidoscope: a cooperative menu-guided query interface (SQL version)

Cha S.K. 《Knowledge and Data Engineering, IEEE Transactions on》1991,3(1):42-47

Kaleidoscope's approach is presented in the context of seeking improvement in the usability of interactive structured query language (SQL) interfaces. The system's cooperation is summarized as proposing valid query constituents step-by-step and providing lexical and semantic feedback immediately to users. To implement this intraquery guidance, the context-free grammar (CFG) is extended to capture the constraints useful for intraquery guidance, and the knowledge useful for pruning nonsensical queries and providing semantic feedback is articulated. For the SQL interface, this knowledge includes a strong domain concept, functional dependency, and integrity constraint rules, which can be acquired once in the database design step. The same types of knowledge are useful both for postquery cooperation and intraquery guidance. As SQL is supported by virtually all database management system (DBMS) vendors, the approach presents a practical solution for casual database access.

6.

Some approaches for processing SQLf nested queries

Patrick Bosc 《International Journal of Intelligent Systems》1996,11(9):613-632

This article deals with query processing techniques for the SQLf language which is an extended version of SQL supporting imprecise queries interpreted in the framework of fuzzy sets. SQLf, as well as SQL, allows for the use of nested queries, in which a (fuzzy) condition involved in a select block, calls on another select block (the nested one). Two types of processing strategies for nested queries are discussed. The first one tends to take advantage of existing database management systems (DBMS) to process fuzzy queries thanks to an additional layer which is in charge of translating the initial query into a Boolean one. In this perspective, the performances obtained depend strongly on the efficiency of the underlying DBMS. The other strategy is slightly different and it is situated in the context of the design of systems involving specific algorithms for processing fuzzy queries. In this article, the focus is put on algorithms related to the generic nesting construct “exists.” © 1996 John Wiley & Sons, Inc.

7.

Research on SQL energy consumption modeling and optimization

国冰磊于炯廖彬杨德先《计算机科学》2015,42(10):202-207, 231

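The steadily rising energy consumption of IT systems means that energy efficiency must be considered when designing the next generation of DBMSs. Because executing SQL statements consumes roughly 70% to 90% of database resources, modeling and optimizing the energy consumption of SQL is important for improving the energy efficiency of databases. Based on a study of SQL query processing mechanisms, an SQL energy consumption model is constructed, and experiments are conducted on a series of query optimization principles to show how effective each is at improving performance and reducing energy consumption. The experiments and the analysis of the energy data show that CPU utilization is the most critical factor affecting system power consumption, that SQL energy optimization methods can disregard memory optimization and should balance performance optimization with power optimization, and that the proposed SQL energy consumption model and energy-saving optimization methods have strong practical value.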

8.

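An architecture for a spatiotemporal database management system based on the object-relational model (total citations: 4; self-citations: 0; citations by others: 4)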

金培权柳建平赵振西岳丽华《小型微型计算机系统》2004,25(1):108-111

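The key and the difficulty of spatiotemporal databases lie in their implementation techniques. This paper proposes an optimizing spatiotemporal database management system architecture based on the object-relational model. The architecture extends the kernel of the database management system with spatiotemporal data type extensions and spatiotemporal operation extensions, giving it built-in spatiotemporal data management capability. In addition, a spatiotemporal query optimization layer performs the logical optimization of spatiotemporal queries, which addresses the query optimization problem of the underlying database management system.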

9.

Decision support queries on a tape-resident data warehouse

《Information Systems》2005,30(2):133-149

Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.

10.

Research on translating complex natural language queries to SQL based on a tree model

赵猛陈珂寿黎但伍赛陈刚《软件学报》2022,33(12):4727-4745

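Natural language to SQL (NL2SQL) refers to techniques that automatically convert a query expressed in natural language into a structured query language (SQL) expression that a database system can understand and execute. NL2SQL can give ordinary users a natural interactive interface for querying databases, enabling database-backed natural question answering. NL2SQL for complex queries is currently a research hotspot in the database community, and mainstream methods model the problem with sequence-to-sequence (Seq2seq) encoding and decoding. However, most existing work targets English scenarios; in practical Chinese-language applications, the colloquial expressions particular to Chinese make complex queries difficult to translate. Moreover, existing work has difficulty correctly generating query clauses that contain complex computational expressions. To address these problems, a tree model is proposed in place of the sequence representation: a complex query is decomposed top-down into a multi-way tree whose nodes represent the components of the SQL statement, and the SQL statement is predicted and generated using depth-first search. On the two official test sets of the DuSQL Chinese NL2SQL competition, the method ranked first and second respectively, which verifies its effectiveness.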

11.

ArchIS: an XML-based approach to transaction-time temporal database systems

Fusheng Wang Carlo Zaniolo Xin Zhou 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(6):1445-1463

Effective support for temporal applications by database systems represents an important technical objective that is difficult to achieve since it requires an integrated solution for several problems, including (i) expressive temporal representations and data models, (ii) powerful languages for temporal queries and snapshot queries, (iii) indexing, clustering and query optimization techniques for managing temporal information efficiently, and (iv) architectures that bring together the different pieces of enabling technology into a robust system. In this paper, we present the ArchIS system that achieves these objectives by supporting a temporally grouped data model on top of RDBMS. ArchIS’ architecture uses (a) XML to support temporally grouped (virtual) representations of the database history, (b) XQuery to express powerful temporal queries on such views, (c) temporal clustering and indexing techniques for managing the actual historical data in a relational database, and (d) SQL/XML for executing the queries on the XML views as equivalent queries on the relational database. The performance studies presented in the paper show that ArchIS is quite effective at storing and retrieving under complex query conditions the transaction-time history of relational databases, and can also assure excellent storage efficiency by providing compression as an option. This approach achieves full-functionality transaction-time databases without requiring temporal extensions in XML or database standards, and provides critical support to emerging application areas such as RFID.

12.

Query results with trust annotations over inconsistent databases

吴爱华谈子敬汪卫《软件学报》2012,23(5):1167-1182

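Inconsistent data cannot correctly reflect the real world, and query results over it contain errors or contradictions; moreover, much existing research on query processing over inconsistent data suffers from information loss. To address this, AQA (annotation based query answer) uses trust annotations to distinguish consistent from inconsistent data at the attribute level, which avoids information loss. However, AQA assumes that the components of a record on the left-hand-side attributes of a dependency are trustworthy, and it handles only one kind of constraint, functional dependencies, which limits its applicability. This paper extends AQA to a combined set of constraints (functional dependencies, inclusion dependencies, and domain constraints) with arbitrary uncertain attributes, re-examines AQA's data model and the query algebra over it, and discusses how to compute the constraints that arbitrary constraints imply on query results. Experimental results show that, for the extended AQA, non-join queries perform essentially the same as ordinary SQL, and join queries, once optimized, perform close to ordinary SQL queries, while AQA's freedom from information loss is a significant advantage over some comparable work.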

13.

Semantics-based distributed query optimization

柳诚飞孙钟秀《计算机学报》1991,14(10):748-756

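This paper studies the logical foundations of semantic query transformation, discusses the feasibility and necessity of semantic transformation in distributed database systems, summarizes some of the semantic information available in application domains, and proposes a distributed query transformation mechanism based on heuristic rules.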

14.

Optimizing complex queries based on similarities of subqueries

Qiang Zhu Yingying Tao Calisto Zuzarte 《Knowledge and Information Systems》2005,8(3):350-373

As database technology is applied to more and more application domains, user queries are becoming increasingly complex (e.g. involving a large number of joins and a complex query structure). Query optimizers in existing database management systems (DBMS) were not developed for efficiently processing such queries and often suffer from problems such as intolerably long optimization time and poor optimization results. To tackle this challenge, we present a new similarity-based approach to optimizing complex queries in this paper. The key idea is to identify similar subqueries that often appear in a complex query and share the optimization result among similar subqueries in the query. Different levels of similarity for subqueries are introduced. Efficient algorithms to identify similar queries in a given query and optimize the query based on similarity are presented. Related issues, such as choosing good starting nodes in a query graph, evaluating identified similar subqueries and analyzing algorithm complexities, are discussed. Our experimental results demonstrate that the proposed similarity-based approach is quite promising in optimizing complex queries with similar subqueries in a DBMS.

15.

Flexible and efficient IR using array databases

Roberto Cornacchia Sándor Héman Marcin Zukowski Arjen P. de Vries Peter Boncz 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(1):151-168

The Matrix Framework is a recent proposal by Information Retrieval (IR) researchers to flexibly represent information retrieval models and concepts in a single multi-dimensional array framework. We provide computational support for exactly this framework with the array database system SRAM (Sparse Relational Array Mapping), that works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules. To demonstrate their effect on text retrieval, we apply them in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, that provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.

16.

Optimizing relational queries in connection hypergraphs: nested queries, views, and binding propagations

Jia Liang Han 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(1):1-11

We optimize relational queries using connection hypergraphs (CHGs). All operations including value-passing between SQL blocks can be set-oriented. By introducing partial evaluations, reordering operations can be achieved for nested queries. For a query using views, we merge CHGs for the views and the query into one CHG and then apply query optimization. Furthermore, we may simulate magic sets methods elegantly in a CHG. Sideways information-passing strategies (SIPS) in a CHG amount to partial evaluations of SIPS paths. We introduce the maximum SIPS strategy, which performs SIPS for all bindings and all SIPS paths for a query. The new method has several advantages. First, the maximum SIPS strategy can be more efficient than the previous SIPS based on simple heuristics. Second, it is conceptually simple and easy to implement. Third, the processing strategies may be incorporated with the search space for query execution plans, which is a proven optimization strategy introduced by System R. Fourth, it provides a general framework of query optimization and may potentially be used to optimize next-generation database systems. Received September 1, 1993 / Accepted January 8, 1996

17.

Detecting common subexpressions for multiple query optimization over loosely-coupled heterogeneous data sources

Mahesh B. Chaudhari Suzanne W. Dietrich 《Distributed and Parallel Databases》2016,34(2):119-143

The research presented in this paper supports the identification of common subexpressions as candidates for potential materialized views that form the basis of multiple query optimization in a loosely-coupled distributed system where query expressions access heterogeneous data sources, including relations and data-centric XML. This paper introduces a unifying mixed multigraph formalism to represent SQL, XQuery, and LINQ queries in a common query graph model and a heuristics-based algorithm to detect common subexpressions. The identified common subexpressions represent an opportunity for defining a materialized view to avoid repeating computation. The common subexpressions may access only relations, only XML, or a combination of relations and XML. The mixed multigraph model and the heuristic rules presented in this paper have distinguished advantages over the existing approaches that consider only relational or XML data sources individually. The mixed multigraph model can present SQL, XQuery, and LINQ queries in a single graph model and the heuristic rules are designed to consider the identical and subsumed conditions at the same time. A prototype implementation of the algorithm illustrates the applicability of the approach using various examples from the research literature as well as scenarios over a Criminal Justice enterprise that include common subexpressions across relational and XML data sources.

18.

Design of a Datalog deductive rule interpreter based on embedded SQL (total citations: 1; self-citations: 0; citations by others: 1)

胡虚怀《计算机工程与应用》2006,42(3):168-171,174

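This paper presents a deductive rule interpreter, built on a traditional relational database, that supports ANSI SQL and embedded SQL. With this interpreter, users can define an implied (derived) relation and pose queries just as they would with Datalog rules in a deductive database. The approach is to translate deductive rules and queries into embedded SQL programs that can be invoked when a query is executed. The interpreter can be regarded as a front-end tool that extends an RDBMS with deductive query capabilities.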
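To make the idea of translating a deductive rule into SQL concrete, here is a minimal sketch that turns one non-recursive Datalog rule into a plain SELECT statement. It is not the interpreter described in this abstract, which generates embedded SQL programs; the `rule_to_sql` function, the tuple encoding of rules, the `schema` catalog, and the `parent` table with its column names are all assumptions made for illustration.

```python
def rule_to_sql(head, body, schema):
    """Translate one non-recursive Datalog rule into a SQL SELECT string.

    head:   (name, [variables]) for the derived relation
    body:   list of (relation, [variables]) atoms
    schema: hypothetical catalog mapping relation -> list of column names
    """
    first_seen = {}   # variable -> "alias.column" of its first occurrence
    conditions = []   # join conditions from variables that repeat
    from_items = []
    for i, (rel, args) in enumerate(body):
        alias = f"t{i}"
        from_items.append(f"{rel} AS {alias}")
        for col, var in zip(schema[rel], args):
            ref = f"{alias}.{col}"
            if var in first_seen:
                # A variable shared between atoms becomes an equality join.
                conditions.append(f"{first_seen[var]} = {ref}")
            else:
                first_seen[var] = ref
    select = ", ".join(first_seen[v] for v in head[1])
    sql = f"SELECT DISTINCT {select} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql

# grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
sql = rule_to_sql(
    ("grandparent", ["X", "Z"]),
    [("parent", ["X", "Y"]), ("parent", ["Y", "Z"])],
    {"parent": ["parent_id", "child_id"]},
)
print(sql)
```

The sketch covers only positive atoms whose arguments are variables; constants, negation, and recursive rules would need additional handling.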

19.

PARADISE: Big data analytics using the DBMS tightly integrated with the distributed file system

Jun-Sung Kim Kyu-Young Whang Hyuk-Yoon Kwon Il-Yeol Song 《World Wide Web》2016,19(3):299-322

There has been a lot of research on MapReduce for big data analytics. This new class of systems sacrifices DBMS functionality such as query languages, schemas, or indexes in order to maximize scalability and parallelism. However, as high functionality of the DBMS is considered important for big data analytics as well, there have been a lot of efforts to support DBMS functionality in MapReduce. HadoopDB is the only work that directly utilizes the DBMS for big data analytics in the MapReduce framework, taking advantage of both the DBMS and MapReduce. However, HadoopDB does not support sharability for the entire data since it stores the data into multiple nodes in a shared-nothing manner, i.e., it partitions a job into multiple tasks where each task is assigned to a fragment of data. Due to this limitation, HadoopDB cannot effectively process queries that require internode communication. That is, HadoopDB needs to re-load the entire data to process some queries (e.g., 2-way joins) or cannot support some complex queries (e.g., 3-way joins). In this paper, we propose a new notion of the DFS-integrated DBMS where a DBMS is tightly integrated with the distributed file system (DFS). By using the DFS-integrated DBMS, we can obtain sharability of the entire data. That is, a DBMS process in the system can access any data since multiple DBMSs are run on an integrated storage system in the DFS. To process big data analytics in parallel, our approach uses the MapReduce framework on top of a DFS-integrated DBMS. We call this framework PARADISE. In PARADISE, we employ a job splitting method that logically splits a job based on the predicate in the integrated storage system. This contrasts with physical splitting in HadoopDB. We also propose the notion of locality mapping for further optimization of logical splitting. We show that PARADISE effectively overcomes the drawbacks of HadoopDB by identifying the following strengths. (1) It has a significantly faster (by up to 6.41 times) amortized query processing performance since it obviates the need to re-load data required in HadoopDB. (2) It supports query types more complex than the ones supported by HadoopDB.

20.

Design of a general-purpose fuzzy query tool based on SQL Server 2000

陈逸菲张颖超叶小岭《数字社区&智能家居》2007,2(5):611-613

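This paper designs and implements a general-purpose fuzzy query tool based on SQL Server 2000 that converts weighted fuzzy queries into standard SQL statements. Users can run fuzzy queries against any database table created in SQL Server. The system provides the following functions: defining fuzzy predicates and their membership functions; defining fuzzy operators; and constructing weighted fuzzy, crisp, or mixed query statements, with weights and thresholds supplied by the user. Records that satisfy the query conditions are output in descending order of matching degree. The design approach can also be extended to other database systems, such as Oracle and Access.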
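The ranking behaviour described in this abstract (membership functions, user-supplied weights, a threshold, and output in descending order of matching degree) can be sketched in a few lines of Python. This is only an illustration of the semantics, not the tool's implementation, which emits standard SQL for SQL Server. The trapezoidal membership shape, the weighted-average aggregation, and every name below (`trapezoid`, `fuzzy_select`, the `age` and `salary` fields) are assumptions chosen for the example.

```python
def trapezoid(a, b, c, d):
    """Build a membership function that rises on (a, b), is 1 on [b, c], and falls on (c, d)."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu

def fuzzy_select(rows, predicates, threshold):
    """Score rows against weighted fuzzy predicates and rank the matches.

    predicates: list of (column, membership_function, weight).
    The matching degree is a weighted average of memberships (an assumed
    aggregation; the abstract does not specify one).
    """
    total_weight = sum(w for _, _, w in predicates)
    scored = []
    for row in rows:
        degree = sum(w * mu(row[col]) for col, mu, w in predicates) / total_weight
        if degree >= threshold:
            scored.append((degree, row))
    # Highest matching degree first, as the abstract describes.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

young = trapezoid(0, 0, 25, 35)                 # fully "young" up to 25, fading out by 35
well_paid = trapezoid(3000, 5000, 20000, 30000)

people = [
    {"name": "Ann", "age": 22, "salary": 4000},
    {"name": "Bob", "age": 30, "salary": 5000},
    {"name": "Cid", "age": 40, "salary": 2000},
]
result = fuzzy_select(people, [("age", young, 1), ("salary", well_paid, 3)], threshold=0.5)
for degree, row in result:
    print(row["name"], degree)
```

Because salary carries three times the weight of age, Bob (half young, fully well paid) ranks above Ann (fully young, half well paid), and Cid falls below the threshold.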