
1.

A Hierarchical Grid Index (HGI), spatial queries in wireless data broadcasting

Kwangjin Park Patrick Valduriez 《Distributed and Parallel Databases》2013,31(3):413-446

The main requirements for spatial query processing via mobile terminals include rapid and accurate searching and low energy consumption. Most location-based services (LBSs) are provided using an on-demand method, which is suitable for lightly loaded systems where contention for wireless channels and server processing is not severe. However, as the number of users of LBSs increases, performance deteriorates rapidly since the servers’ capability to process queries is limited. Furthermore, the response time of a query may significantly increase with the concentration of users’ queries in a server at the same time. That is because the server has to check the locations of users and potential objects for the final result and then individually send answers to clients via a point-to-point channel. At this time, an inefficient structure of spatial index and searching algorithm may incur an extremely large access latency. To address this problem, we propose the Hierarchical Grid Index (HGI), which provides a light-weight sequential location-based index structure for efficient LBSs. We minimize the index size through the use of hierarchical location-based identifications. We also support efficient query processing in broadcasting environments through sequential data transfer and search based on the object locations. We also propose Top-Down Search and Reduction-Counter Search algorithms for efficient searching and query processing. HGI has a simple structure through elimination of replication pointers and is therefore suitable for broadcasting environments with one-dimensional characteristics, thus enabling rapid and accurate spatial search by reducing redundant data. Our performance evaluation shows that our proposed index and algorithms are accurate and fast and support efficient spatial query processing.

2.

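Query processing for ontology-based relational data integration

王进鹏 张亚非 苗壮 《计算机科学》2010,37(12):134-137

To achieve semantic integration of heterogeneous relational databases and to address the shortcomings of traditional integration techniques, this paper analyzes the Semantic Web and related technologies, studies query processing in ontology-based relational data integration systems, and proposes an ontology-based framework for integrating relational databases. A method for describing relational data with ontologies is designed, in which an ontology serves as the global schema of the integration and describes the semantics of the relational schemas. A query rewriting algorithm is also designed; it rewrites SPARQL queries posed against the global schema into queries against specific relational databases, thereby integrating heterogeneous relational databases. Experiments show that the algorithm scales well.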

3.

Efficient processing of top-k dominating queries in distributed environments

Daichi Amagata Yuya Sasaki Takahiro Hara Shojiro Nishio 《World Wide Web》2016,19(4):545-577

Due to the recent massive data generation, preference queries are becoming increasingly important for users because such queries retrieve only a small number of preferable data objects from a huge multi-dimensional dataset. A top-k dominating query, which retrieves the k data objects dominating the highest number of data objects in a given dataset, is particularly important in supporting multi-criteria decision making because this query can find interesting data objects in an intuitive way, exploiting the advantages of top-k and skyline queries. Although efficient algorithms for top-k dominating queries have been studied over centralized databases, there are no studies which deal with top-k dominating queries in distributed environments. Data management is becoming increasingly distributed, so it is necessary to support processing of top-k dominating queries in distributed environments. In this paper, we address, for the first time, the challenging problem of processing top-k dominating queries in distributed networks and propose a method for efficient top-k dominating data retrieval, which avoids redundant communication cost and latency. Furthermore, we also propose an approximate version of our proposed method, which further reduces communication cost. Extensive experiments on both synthetic and real data have demonstrated the efficiency and effectiveness of our proposed methods.
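
The definition of a top-k dominating query given in this abstract can be illustrated with a small centralized sketch. The code below is not the distributed method the paper proposes; it is a brute-force illustration that assumes smaller values are preferable in every dimension, and the names `dominates` and `top_k_dominating` are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def top_k_dominating(points: list, k: int) -> list:
    """Score each point by how many other points it dominates and
    return the k highest-scoring (point, score) pairs.

    Brute force, O(n^2) comparisons; ties keep input order because
    Python's sort is stable.
    """
    scored = [(p, sum(dominates(p, q) for q in points)) for p in points]
    scored.sort(key=lambda item: -item[1])
    return scored[:k]


if __name__ == "__main__":
    data = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
    print(top_k_dominating(data, 2))
```

Unlike a skyline query, which returns every non-dominated point, this scoring always returns exactly k results and ranks them, which is the property the abstract highlights.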

4.

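An interval query optimization technique based on a shared execution strategy

周新 张孝 薛忠斌 王珊 《软件学报》2016,27(12):3067-3084

Interval queries are an important query type, widely used in social networks, information retrieval, and databases. Many optimization techniques have emerged to support efficient interval queries. Although existing methods respond quickly to a single interval query, when the query load exceeds the server's processing capacity, 70% of queries fail to receive a response within the expected time. To address this problem, the paper proposes SESIQ (shared execution strategy for interval queries), a method that optimizes interval queries with a shared execution strategy. SESIQ processes interval queries in batches and analyzes the operations that can be shared within a group of interval queries, reducing repeated data access and thereby lowering disk I/O and network transmission costs and improving retrieval performance. The feasibility of SESIQ is analyzed theoretically and verified experimentally; extensive experiments on two real datasets show that SESIQ is effective and can improve interval query retrieval performance by a factor of several tens.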
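
The idea of sharing work across a batch of interval queries can be sketched as follows. This is a simplified illustration and not the SESIQ algorithm itself: it assumes the data is an in-memory sorted list of keys, merges overlapping query intervals into scan ranges, and reads each record once per batch. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def merge_intervals(queries):
    """Merge overlapping closed intervals [lo, hi] into disjoint scan ranges."""
    merged = []
    for lo, hi in sorted(queries):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(m) for m in merged]


def shared_interval_scan(sorted_keys, queries):
    """Answer a batch of interval queries with one pass per merged range.

    Returns (results, records_read), where results maps each query
    interval to the keys it contains and records_read counts how many
    records were fetched in total.
    """
    results = {q: [] for q in queries}
    records_read = 0
    for lo, hi in merge_intervals(queries):
        start = bisect_left(sorted_keys, lo)
        end = bisect_right(sorted_keys, hi)
        for key in sorted_keys[start:end]:
            records_read += 1  # each record is fetched once per batch
            for q in results:
                if q[0] <= key <= q[1]:
                    results[q].append(key)
    return results, records_read


if __name__ == "__main__":
    keys = [1, 3, 5, 7, 9, 11]
    batch = [(1, 5), (3, 7), (9, 11)]
    answers, reads = shared_interval_scan(keys, batch)
    print(answers, reads)
```

In this example the three queries would read 3 + 3 + 2 = 8 records if run independently, while the shared scan reads 6, because the overlapping ranges (1, 5) and (3, 7) are scanned together.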

5.

Semantics preserving SPARQL-to-SQL translation (total citations: 2; self-citations: 0; citations by others: 2)

Artem Shiyong Farshad 《Data & Knowledge Engineering》2009,68(10):973-1000

Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which further can be optimized and evaluated by the relational query engine and their results can be returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema dependent translations in terms of efficient query evaluation and/or ensured query result correctness.

6.

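Implementing query optimization for the Hive data warehouse based on statistical methods

王有为 王伟平 孟丹 《计算机研究与发展》2015,52(6):1452-1462

Map/Reduce is a parallel programming model widely used for offline analysis of massive data. The Hive data warehouse implements its query processing engine on top of Map/Reduce, but the Map/Reduce framework distributes workload unevenly when processing skewed data. The core idea of the computation balanced model (CBM) is to use data distribution characteristics to guide query plan optimization. The research makes two contributions. First, it builds execution cost estimation models for the very widely used GroupBy and Join queries and determines which query plan optimization branch to select in different scenarios. Second, it designs a statistics collection method based on the Hive ETL mechanism, which solves the problem of gathering distribution characteristics for massive data. Experimental data show that CBM optimization reduces GroupBy query time by 8% to 45% and Join query time by 12% to 46%; the cluster CPU load-balancing metric improves by 60% to 80%, and the I/O load-balancing metric by 60% to 90%. The results confirm that a query plan generator optimized with the CBM model significantly balances cluster load during Hive query execution and improves query processing efficiency.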

7.

Dynamic querying of streaming data with the dQUOB system

Plale B. Schwan K. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(4):422-432

Data streaming has established itself as a viable communication abstraction in data-intensive parallel and distributed computations, occurring in applications such as scientific visualization, performance monitoring, and large-scale data transfer. A known problem in large-scale event communication is tailoring the data received at the consumer. It is the general problem of extracting data of interest from a data source, a problem that the database community has successfully addressed with SQL queries, a time-tested, user-friendly way for noncomputer scientists to access data. By leveraging the efficiency of query processing provided by relational queries, the dQUOB system provides a conceptual relational data model and SQL query access over streaming data. Queries can be used to extract data, combine streams, and create new streams. The language augments queries with an action to enable more complex data transformations such as Fourier transforms. The dQUOB system has been applied to two large-scale distributed applications: a safety-critical autonomous robotics simulation and scientific software visualization for global atmospheric transport modeling. In this paper, we present the dQUOB system and the results of performance evaluation undertaken to assess its applicability in data-intensive wide-area computations, where the benefit of portable data transformation must be evaluated against the cost of continuous query evaluation.

8.

On the finite controllability of conjunctive query answering in databases under open-world assumption

Riccardo Rosati 《Journal of Computer and System Sciences》2011,77(3):572-594

In this paper we study queries over relational databases with integrity constraints (ICs). The main problem we analyze is OWA query answering, i.e., query answering over a database with ICs under open-world assumption. The kinds of ICs that we consider are inclusion dependencies and functional dependencies, in particular key dependencies; the query languages we consider are conjunctive queries and unions of conjunctive queries. We present results about the decidability of OWA query answering under ICs. In particular, we study OWA query answering both over finite databases and over unrestricted databases, and identify the cases in which such a problem is finitely controllable, i.e., when OWA query answering over finite databases coincides with OWA query answering over unrestricted databases. Moreover, we are able to easily turn the above results into new results about implication of ICs and query containment under ICs, due to the deep relationship between OWA query answering and these two classical problems in database theory. In particular, we close two long-standing open problems in query containment, since we prove finite controllability of containment of conjunctive queries both under arbitrary inclusion dependencies and under key and foreign key dependencies. The results of our investigation are very relevant in many research areas which have recently dealt with databases under an incomplete information assumption: e.g., data integration, data exchange, view-based information access, ontology-based information systems, and peer data management systems.

9.

gStore: a graph-based SPARQL query engine

Lei Zou M. Tamer Özsu Lei Chen Xuchuan Shen Ruizhe Huang Dongyan Zhao 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(4):565-590

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.

10.

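An adaptive structural index supporting efficient XML path queries (total citations: 1; self-citations: 0; citations by others: 1)

张博 耿志华 周傲英 《软件学报》2009,20(7):1812-1824

This paper proposes a new adaptive structural index, AS-Index (adaptive structural index), which overcomes the shortcomings of existing static and adaptive indexes and offers efficient query and adjustment performance. AS-Index is built on top of F&B-Index, and its index structure comprises F&B-Index, Query-Table, and Part-Table. Query-Table records frequent queries, avoiding redundant operations during query processing. On the basis of Query-Table, a bottom-up query processing procedure is proposed that makes full use of existing frequent queries to answer infrequent queries efficiently. Part-Table is used to optimize queries containing ancestor-descendant edges, further improving query performance. Existing adaptive structural indexes adjust at the granularity of XML element nodes, and the adjustment often requires traversing the entire document. AS-Index instead adjusts incrementally on F&B-Index nodes; the process is local and efficient, and it supports adjustment for complex branching queries. Experimental results show that AS-Index outperforms existing XML structural indexes in both query and adjustment performance. Moreover, compared with existing adaptive structural indexes, AS-Index scales better to large documents.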

11.

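An effective method for parallel processing of skyline queries

黄震华 向阳 薛永生 赵杠 《自动化学报》2010,36(7):968-975

Skyline queries have been a research focus and hot topic in the database field in recent years, mainly because they have wide applications in many domains. Most existing work concentrates on single-processor environments; however, because skyline queries are CPU-sensitive, existing methods are severely limited in practice. This paper therefore proposes PAPSQ (parallel algorithm for processing skyline queries), a parallel algorithm that effectively reduces the time cost of processing skyline queries. The algorithm combines the intrinsic properties of multidimensional data objects with the implementation advantages of general-purpose multiprocessor systems. It uses the partial-order lattice of the skyline query search as its underlying structure and applies homeomorphic evaluation values of multidimensional data objects together with a lattice weighting technique to improve the efficiency of parallel skyline query processing. Experimental evaluation shows that PAPSQ is effective and practical.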

12.

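A differential privacy protection model for interactive queries in big data environments

袁健 王迪 申泽宇 《计算机应用研究》2019,36(6)

With the arrival of the big data era, data mining techniques are widely applied. Linear queries are the most basic and most frequent operation in these techniques, so protecting their privacy is extremely important in privacy-preserving data analysis and data publishing. The interaction in interactive linear queries increases the amount of data to be processed, so traditional privacy protection models are inefficient. To solve the problem of differential privacy protection for interactive queries in big data environments, the model targets the characteristics of differential privacy protection for interactive linear queries over large-scale datasets: it reduces redundant information through data correlation analysis, decomposes the query workload matrix with the alternating direction method of multipliers, uses an adaptive noise-adding technique to generate a reasonable amount of the noise that differential privacy requires, and employs a parallel processing method to carry out the model's computation. Experiments compare the proposed model with earlier models. The results show that the proposed model improves the accuracy of privacy protection while also greatly improving algorithm performance, so the model is practical and feasible.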

13.

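An extensible algorithmic framework for XPath query minimization

林峰 冯建华 塔娜 李国良 洪亲 《计算机科学》2008,35(3):58-60

XPath is the basic query language of XML, and XPath query minimization is important for improving the query performance of XML databases. However, because XPath query minimization is a coNP-complete problem, most existing algorithms are limited to simple XPath fragments. Approaching the problem from a new angle and taking both completeness and efficiency into account, this paper proposes a new query minimization framework. Unlike existing algorithms, which are "node-oriented" and delete redundant nodes one by one, this paper proposes a "tree-pattern-oriented" approach: it computes the endomorphisms of the tree pattern, finds the endomorphism with the smallest target node set, and from it derives the minimal equivalent query tree. The method is efficient and is complete in […] cases; in particular, it can be extended to more complex XPath fragments. Building on this framework, the paper presents an algorithm that can handle complex query patterns.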

14.

Secure query processing against encrypted XML data using Query-Aware Decryption

Jae-Gil Lee Kyu-Young Whang 《Information Sciences》2006,176(13):1928-1947

Dissemination of XML data on the internet could breach the privacy of data providers unless access to the disseminated XML data is carefully controlled. Recently, methods using encryption have been proposed for such access control. However, in these methods, the performance of processing queries has not been addressed. A query processor cannot identify the contents of encrypted XML data unless the data are decrypted. This limitation incurs the overhead of decrypting parts of the XML data that would not contribute to the query result. In this paper, we propose the notion of Query-Aware Decryption for efficient processing of queries against encrypted XML data. Query-Aware Decryption allows us to decrypt only those parts that would contribute to the query result. For this purpose, we disseminate an encrypted XML index along with the encrypted XML data. This index, when decrypted, informs us where the query results are located in the encrypted XML data, thus preventing unnecessary decryption for other parts of the data. Since the size of this index is much smaller than that of the encrypted XML data, the cost of decrypting this index is negligible compared with that for unnecessary decryption of the data itself. The experimental results show that our method improves the performance of query processing by up to six times compared with those of existing methods. Finally, we formally prove that dissemination of the encrypted XML index does not compromise security.

15.

Decision support queries on a tape-resident data warehouse

《Information Systems》2005,30(2):133-149

Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.

16.

An efficient workload-based data layout scheme for multidimensional data

Kazi A. Sriram 《Data & Knowledge Engineering》2001,39(3):271-291

Physical data layout is a crucial factor in the performance of queries and updates in large data warehouses. Data layout enhances and complements other performance features such as materialized views and dynamic caching of aggregated results. Prior work has identified that the multidimensional nature of large data warehouses imposes natural restrictions on the query workload. A method based on a “uniform” query class approach has been proposed for data clustering and shown to be optimal. However, we believe that realistic query workloads will exhibit data access skew. For instance, if time is a dimension in the data model, then more queries are likely to focus on the most recent time interval. The query class approach does not adequately model the possibility of multidimensional data access skew. We propose the affinity graph model for capturing workload characteristics in the presence of access skew and describe an efficient algorithm for physical data layout. Our proposed algorithm considers declustering and load balancing issues which are inherent to the multidisk data layout problem. We demonstrate the validity of this approach experimentally.

17.

A cost model for spatio-temporal queries using the TPR-tree

《Journal of Systems and Software》2004,73(1):101-112

A query optimizer requires cost models to calculate the costs of various access plans for a query. An effective method to estimate the number of disk (or page) accesses for spatio-temporal queries has not yet been proposed. The TPR-tree is an efficient index that supports spatio-temporal queries for moving objects. Existing cost models for spatial indexes such as the R-tree do not accurately estimate the number of disk accesses for spatio-temporal queries using the TPR-tree, because they do not consider the future locations of moving objects, which change continuously as time passes. In this paper, we propose an efficient cost model for spatio-temporal queries to solve this problem. We present analytical formulas which accurately calculate the number of disk accesses for spatio-temporal queries. Extensive experimental results show that our proposed method accurately estimates the number of disk accesses over various queries to spatio-temporal data combining real-life spatial data and synthetic temporal data. To evaluate the effectiveness of our method, we compared our spatio-temporal cost model (STCM) with an existing spatial cost model (SCM). The existing SCM has an average error ratio of 52% to 93%, whereas our STCM has an average error ratio of 11% to 32%.

18.

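Ontology-based flexible querying of semi-structured data

王真星 顾宁 施伯乐 《计算机研究与发展》2003,40(11):1571-1578

A semi-structured database has no fixed schema, so users find it hard to form a clear picture of its structure and therefore cannot query the content they need effectively. This paper proposes an ontology-based flexible query: a query that a user issues after learning the semantic information in the database ontology can return results without conforming to a strict database schema. Because searching a semi-structured database directly is very inefficient, a conceptual ontology base describing the structural schema is generated on top of it. The query module first evaluates against the ontology base whether the query can yield results, and then executes the query on the database. However, since the ontology base may take the form of a graph, querying it is still expensive and is in essence an NP problem; the paper therefore further studies a method for converting the graph into a tree and gives the corresponding algorithm.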

19.

Semantic integration in Xyleme: a uniform tree-based approach

Claude Chantal Marie-Christine Jean-Pierre Dan 《Data & Knowledge Engineering》2003,44(3):267-298

Xyleme is a huge warehouse integrating XML data of the Web. Xyleme considers a simple data model with data trees and tree types for describing the data sources, and a simple query language based on tree queries with boolean conditions. The main components of the data model are a mediated schema modeled by an abstract tree type, as a view of a set of tree types associated with actual data trees, called concrete tree types, and a mapping expressing the connection between the mediated schema and the concrete tree types. The first contribution of this paper is formal: we provide a declarative model-theoretic semantics for Xyleme tree queries, a way of checking tree query containment, and a characterization of tree queries as a composition of branch queries. The other contributions are algorithmic and handle the potentially huge size of the mapping relation which is a crucial issue for semantic integration and query evaluation in Xyleme. First, we propose a method for pre-evaluating queries at compile time by storing some specific meta-information about the mapping into map translation tables. These map translation tables summarize the set of all the branch queries that can be generated from the mediated schema and the set of all the mappings. Then, we propose different operators and strategies for relaxing queries which, having an empty map translation table, will have no answer if they are evaluated against the data. Finally, we present a method for semi-automatically generating the mapping relation.

20.

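Research on a query session detection method based on a translation model

张振中 孙乐 韩先培 《中文信息学报》2015,29(4):95-102

The goal of query session detection is to identify the related queries that a user submits consecutively to satisfy a particular need. Query session detection is very useful for query log analysis and user behavior analysis. Most traditional methods for query session detection compare query terms and cannot handle the vocabulary-mismatch problem, in which some topically related queries share no terms. To address the vocabulary-mismatch problem, this paper proposes a query session detection method based on a translation model. The method characterizes the relationship between words as a translation probability between them, so that the semantic relationship between two queries can be captured even when they share no terms. The paper also proposes two methods for estimating word translation probabilities from query logs: the first is based on the time interval between queries, and the second on the URLs clicked for the queries. Experimental results demonstrate the effectiveness of the method.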
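
The role of word translation probabilities in bridging vocabulary mismatch can be sketched as follows. This is an illustrative simplification, not the scoring function of the paper: the probability table is hand-written rather than estimated from query logs, each word of the later query is matched to its best-translating word in the earlier query, and the function names and the 0.5 threshold are hypothetical.

```python
def translation_score(prev_query, next_query, trans_prob):
    """Average, over words of next_query, of the best translation
    probability from any word of prev_query.

    trans_prob maps (source_word, target_word) to a probability.
    Pairs missing from the table default to 1.0 for identical words
    and 0.0 otherwise (an assumption made for this sketch).
    """
    prev_words = prev_query.split()
    next_words = next_query.split()
    if not prev_words or not next_words:
        return 0.0
    total = 0.0
    for w in next_words:
        total += max(
            trans_prob.get((v, w), 1.0 if v == w else 0.0)
            for v in prev_words
        )
    return total / len(next_words)


def same_session(prev_query, next_query, trans_prob, threshold=0.5):
    """Treat two consecutive queries as one session if the score
    reaches the (hypothetical) threshold."""
    return translation_score(prev_query, next_query, trans_prob) >= threshold


if __name__ == "__main__":
    # Hypothetical probabilities; the paper estimates these from query logs.
    table = {("car", "automobile"): 0.6, ("cheap", "discount"): 0.5}
    print(translation_score("cheap car", "discount automobile", table))
    print(same_session("cheap car", "discount automobile", table))
```

The two example queries share no terms, so a term-overlap comparison would score them as unrelated, while the translation table links them.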