
1.

Learning-based SPARQL query performance modeling and prediction

Wei Emma Zhang Quan Z. Sheng Yongrui Qin Kerry Taylor Lina Yao 《World Wide Web》2018,21(4):1015-1035

One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as execution time and memory usage, can help data consumers identify unexpectedly long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling the features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. We then use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries, whose execution time ranges from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.
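As a rough illustration of turning a SPARQL query into a feature vector without looking at the underlying store, the sketch below counts a handful of keywords and the distinct variables in the query text. The keyword list and the function name `sparql_feature_vector` are assumptions made for this example; this is not the feature model used in the paper.

```python
import re
from typing import List

# Hypothetical feature set chosen for illustration (not the paper's features).
KEYWORDS = ("SELECT", "DISTINCT", "OPTIONAL", "UNION", "FILTER", "ORDER BY", "LIMIT")


def sparql_feature_vector(query: str) -> List[int]:
    """Return one count per keyword, followed by the number of distinct variables."""
    text = query.upper()
    counts = []
    for kw in KEYWORDS:
        # Allow arbitrary whitespace inside multi-word keywords such as ORDER BY.
        pattern = r"\b" + kw.replace(" ", r"\s+") + r"\b"
        counts.append(len(re.findall(pattern, text)))
    # SPARQL variables start with '?' or '$'; count each name once.
    variables = set(re.findall(r"[?$](\w+)", query))
    return counts + [len(variables)]


example = """
SELECT DISTINCT ?s ?name WHERE {
  ?s a ?type .
  OPTIONAL { ?s foaf:name ?name }
  FILTER(?type != foaf:Agent)
} LIMIT 10
"""
vector = sparql_feature_vector(example)
```

Such a vector could then be fed to any regression model as a training input; a purely textual count like this ignores query structure, which is one reason a real feature model would need to be richer.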

2.

Query by video clip Total citations: 15, self-citations: 0, citations by others: 15

Anil K. Jain Aditya Vailaya Xiong Wei 《Multimedia Systems》1999,7(5):369-384

Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as query and a different basketball video as the database show the effectiveness of feature representation and matching schemes.

3.

Decomposing a window into maximal quadtree blocks

Walid G. Aref Hanan Samet 《Acta Informatica》1993,30(5):425-439

Window operations serve as the basis of a number of queries that can be posed in a spatial database. Examples of window-based queries include the exist query (i.e., determining whether or not a spatial feature exists inside a window), the report query (i.e., report the identity of all the features that exist inside a window), and the select query (i.e., determine the locations covered by a given feature inside a window). Often spatial databases make use of a quadtree decomposition, which yields a set of maximal blocks, to enable the features to be accessed quickly without having to search the entire database. One way to perform a window query is to decompose the window into its maximal quadtree blocks. An algorithm is described for decomposing a two-dimensional window into its maximal quadtree blocks in O(n log log T) time for a window of size n×n in a feature space (e.g., an image) of size T×T (e.g., pixel elements). The support of the National Science Foundation under Grant IRI-9017393 is gratefully acknowledged.
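To make "maximal quadtree blocks" concrete, here is a naive top-down sketch: it recursively splits the T×T space and emits a block as soon as it lies entirely inside the window. It assumes integer pixel coordinates and T a power of two, and the function name `decompose_window` is hypothetical. This is not the O(n log log T) algorithm described in the abstract, only a simple way to obtain the same set of blocks.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (x, y, side length)


def decompose_window(T: int, wx: int, wy: int, ww: int, wh: int) -> List[Block]:
    """Return the maximal quadtree blocks covering the window
    [wx, wx + ww) x [wy, wy + wh) inside a T x T space (T a power of two)."""
    blocks: List[Block] = []

    def visit(x: int, y: int, size: int) -> None:
        # Block and window do not overlap: nothing to report.
        if x >= wx + ww or x + size <= wx or y >= wy + wh or y + size <= wy:
            return
        # Block lies entirely inside the window. Its parent did not,
        # otherwise recursion would have stopped there, so it is maximal.
        if x >= wx and x + size <= wx + ww and y >= wy and y + size <= wy + wh:
            blocks.append((x, y, size))
            return
        # Partial overlap: split into four quadrants. A 1 x 1 block never
        # reaches this point, because it is always disjoint or contained.
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                visit(x + dx, y + dy, half)

    visit(0, 0, T)
    return blocks


# A 3 x 3 window at (1, 1) in an 8 x 8 space.
result = decompose_window(8, 1, 1, 3, 3)
```

For this example the window splits into one 2×2 block and five 1×1 blocks, which together cover its nine pixels.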

4.

A theory of translation from relational queries to hierarchical queries

Weiyi Meng Yu C. Won Kim 《Knowledge and Data Engineering, IEEE Transactions on》1995,7(2):228-245

In a heterogeneous database system, a query for one type of database system (i.e., a source query) may have to be translated to an equivalent query (or queries) for execution in a different type of database system (i.e., a target query). Usually, for a given source query, there is more than one possible target query translation. Some of them can be executed more efficiently than others by the receiving database system. Developing a translation procedure for each type of database system is time-consuming and expensive. We abstract a generic hierarchical database system (GHDBS) which has properties common to database systems whose schema contains hierarchical structures (e.g., System 2000, IMS, and some object-oriented database systems). We develop principles of query translation with GHDBS as the receiving database system. Translation into any specific system can be accomplished by a translation into the general system with refinements to reflect the characteristics of the specific system. We develop rules that guarantee correctness of the target queries, where correctness means that the target query is equivalent to the source query. We also provide rules that can guarantee a minimum number of target queries in cases when one source query needs to be translated to multiple target queries. Since the minimum number of target queries implies the minimum number of times the underlying system is invoked, efficiency is taken into consideration.

5.

Interaction-aware scheduling of report-generation workloads

Mumtaz Ahmad Ashraf Aboulnaga Shivnath Babu Kamesh Munagala 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(4):589-615

The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made online. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently outperform (up to 4x) query schedulers in current database systems.

6.

Semantics preserving SPARQL-to-SQL translation Total citations: 2, self-citations: 0, citations by others: 2

Artem Shiyong Farshad 《Data & Knowledge Engineering》2009,68(10):973-1000

Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which further can be optimized and evaluated by the relational query engine and their results can be returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema dependent translations in terms of efficient query evaluation and/or ensured query result correctness.
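The simplest case of such a translation, a basic graph pattern over a single triple table, can be sketched as a chain of self-joins. The table name `triples`, its columns `s`, `p`, `o`, and the function `bgp_to_sql` are assumptions for illustration. The sketch handles neither optional or alternative patterns nor value constraints, makes no claim of semantics preservation, and is not the translation proposed in the paper.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with '?'


def bgp_to_sql(patterns: Sequence[Triple], table: str = "triples") -> str:
    """Translate a basic graph pattern into one SQL query with self-joins.
    Assumes the pattern contains at least one variable."""
    first_ref: Dict[str, str] = {}  # variable -> first column that binds it
    where: List[str] = []
    for i, triple in enumerate(patterns):
        alias = f"t{i}"
        for col, term in zip(("s", "p", "o"), triple):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in first_ref:
                    # Repeated variable: join with its first occurrence.
                    where.append(f"{ref} = {first_ref[term]}")
                else:
                    first_ref[term] = ref
            else:
                # Constant term: compare against a quoted literal.
                where.append(f"{ref} = '" + term.replace("'", "''") + "'")
    cols = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_ref.items())
    tables = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {cols} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


sql = bgp_to_sql([("?x", "rdf:type", "foaf:Person"), ("?x", "foaf:name", "?n")])
```

Each triple pattern becomes one alias of the triple table, constants become equality filters, and a variable shared between patterns becomes a join condition.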

7.

Storing and querying XML data using denormalized relational databases

Andrey Balmin Yannis Papakonstantinou 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(1):30-49

XML database systems emerge as a result of the acceptance of the XML data model. Recent works have followed the promising approach of building XML database management systems on underlying RDBMSs. Achieving query processing performance reduces to two questions: (i) How should the XML data be decomposed into data that are stored in the RDBMS? (ii) How should the XML query be translated into an efficient plan that sends one or more SQL queries to the underlying RDBMS and combines the data into the XML result? We provide a formal framework for XML Schema-driven decompositions, which encompasses the decompositions proposed in prior work and extends them with decompositions that employ denormalized tables and binary-coded XML fragments. We provide corresponding query processing algorithms that translate the XML query conditions into conditions on the relational tables and assemble the decomposed data into the XML query result. Our key performance focus is the response time for delivering the first results of a query. The most effective of the described decompositions have been implemented in XCacheDB, an XML DBMS built on top of a commercial RDBMS, which serves as our experimental basis. We present experiments and analysis that point to a class of decompositions, called inlined decompositions, that improve query performance for full results and first results, without significant increase in the size of the database. Received: 21 December 2001; Accepted: 1 July 2003; Published online: 23 June 2004. Edited by: A. Halevy. Andrey Balmin: Andrey Balmin has been supported by NSF IRI-9734548. Yannis Papakonstantinou: The authors built the XCacheDB system while on leave at Enosys Software, Inc., during 2000.

8.

Automatic content based image retrieval using semantic analysis

Eugene Santos Jr. Qi Gu 《Journal of Intelligent Information Systems》2014,43(2):247-269

We present a new text-to-image re-ranking approach for improving the relevancy rate in searches. In particular, we focus on the fundamental semantic gap that exists between the low-level visual features of the image and high-level textual queries by dynamically maintaining a connected hierarchy in the form of a concept database. For each textual query, we take the results from popular search engines as an initial retrieval, followed by a semantic analysis to map the textual query to higher level concepts. In order to do this, we design a two-layer scoring system which can identify the relationship between the query and the concepts automatically. We then calculate the image feature vectors and compare them with the classifier for each related concept. An image is relevant only when it is related to the query both semantically and content-wise. The second feature of this work is that we loosen the requirement for query accuracy from the user, which makes it possible to perform well on users’ queries containing less relevant information. Thirdly, the concept database can be dynamically maintained to satisfy the variations in user queries, which eliminates the need for human labor in building a sophisticated initial concept database. We designed our experiment using complex queries (based on five scenarios) to demonstrate how our retrieval results are a significant improvement over those obtained from current state-of-the-art image search engines.

9.

Keyword proximity search in XML trees Total citations: 3, self-citations: 0, citations by others: 3

Hristidis V. Koudas N. Papakonstantinou Y. Divesh Srivastava 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(4):525-539

Recent works have shown the benefits of keyword proximity search in querying XML documents in addition to text documents. For example, given query keywords over Shakespeare's plays in XML, the user might be interested in knowing how the keywords co-occur. In this paper, we focus on XML trees and define XML keyword proximity queries to return the (possibly heterogeneous) set of minimum connecting trees (MCTs) of the matches to the individual keywords in the query. We consider efficiently executing keyword proximity queries on labeled trees (XML) in various settings: 1) when the XML database has been preprocessed and 2) when no indices are available on the XML database. We perform a detailed experimental evaluation to study the benefits of our approach and show that our algorithms considerably outperform prior algorithms and other applicable approaches.

10.

Multi-dimensional top-k dominating queries Total citations: 1, self-citations: 0, citations by others: 1

Man Lung Yiu Nikos Mamoulis 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):695-718

The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data.
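The query definition itself fits in a few lines. The brute-force sketch below assumes that smaller values are better in every dimension and compares all pairs of points, so it costs O(n²) comparisons; it is only a reference for the semantics, not one of the indexed or non-indexed algorithms studied in the paper. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(points: Sequence[Point], k: int) -> List[Tuple[Point, int]]:
    """Return the k points that dominate the most other points, with their scores."""
    scored = [(p, sum(dominates(p, q) for q in points)) for p in points]
    # Stable sort: ties keep their input order.
    scored.sort(key=lambda item: -item[1])
    return scored[:k]


points = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
best = top_k_dominating(points, 2)
```

Here (1, 1) dominates the four other points, while (2, 2), (3, 1) and (1, 3) each dominate only (3, 3). No ranking function is supplied, and rescaling a dimension monotonically would not change the scores.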

11.

Implementation and evaluation of parallel query processing algorithms and data partitioning heuristics in object-oriented databases

Yaw-Huei Chen Stanley Y. W. Su 《Distributed and Parallel Databases》1996,4(2):107-142

Object-oriented database management systems (OODBMSs) provide rich facilities for the modeling and processing of structural as well as behavioral properties of complex application objects. However, due to their inherent generality and continuously evolving functionalities, efficient implementations are important for these OODBMSs to support the present and future applications, particularly when the databases are very large. In this paper, we present several parallel, multi-wavefront algorithms based on two processing approaches, i.e., identification and elimination approaches, to verify association patterns specified in queries. Both approaches allow more processors to operate concurrently on a query than the traditional tree-structured query processing approach, thus introducing a higher degree of parallelism in query processing. A heuristic method is presented for partitioning an object-oriented database (OODB). The main consideration for partitioning the database is load balancing. This method also tries to reduce the communication time by reducing the length of the path that wavefronts need to be propagated. Multiple wavefront algorithms based on the two approaches for tree-structured queries have been implemented on an nCUBE 2 parallel computer. The implementation of the query processor allows multiple queries to be executed simultaneously. This implementation provides an environment for evaluating the algorithms and the heuristic method for partitioning the database. The evaluation results are presented in this paper. Recommended by: Patrick Valduriez.

12.

Research on a query session detection method based on a translation model

Zhang Zhenzhong Sun Le Han Xianpei 《Journal of Chinese Information Processing (中文信息学报)》2015,29(4):95-102

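The goal of query session detection is to identify the related queries that a user submits consecutively in order to satisfy a specific need. Query session detection is very useful for query log analysis and user behavior analysis. Traditional query session detection methods are mostly based on comparing query terms and cannot solve the vocabulary-mismatch problem: some topically related queries share no terms. To address the vocabulary-mismatch problem, this paper proposes a query session detection method based on a translation model. The method characterizes the relationship between words as word-to-word translation probabilities, so that the semantic relationship between queries can be captured even when they have no terms in common. We also propose two methods for estimating word translation probabilities from query logs: the first is based on the time interval between queries, and the second is based on the URLs clicked for the queries. Experimental results demonstrate the effectiveness of the method.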
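A minimal sketch of how word translation probabilities can link two queries that share no terms is given below. It uses a standard translation-model scoring form with a uniform weight over the words of the earlier query; the probability table, the threshold and the function names are all hypothetical, and the abstract does not state that the paper uses this exact formula.

```python
from typing import Dict, Sequence, Tuple

# Maps (source word, target word) to P(target | source). Hypothetical values.
TransTable = Dict[Tuple[str, str], float]


def translation_score(prev_query: Sequence[str], next_query: Sequence[str],
                      trans_prob: TransTable) -> float:
    """Probability of generating next_query from prev_query: for each word w
    in next_query, average P(w | t) over the words t of prev_query, then
    multiply the per-word values."""
    if not prev_query or not next_query:
        return 0.0
    score = 1.0
    for w in next_query:
        score *= sum(trans_prob.get((t, w), 0.0) for t in prev_query) / len(prev_query)
    return score


def same_session(prev_query: Sequence[str], next_query: Sequence[str],
                 trans_prob: TransTable, threshold: float = 0.05) -> bool:
    """Treat two consecutive queries as one session if the score is high enough.
    The threshold is an arbitrary illustrative value."""
    return translation_score(prev_query, next_query, trans_prob) >= threshold


table = {("laptop", "notebook"): 0.4, ("laptop", "laptop"): 0.5, ("cheap", "cheap"): 0.9}
related = same_session(["cheap", "laptop"], ["notebook"], table)
unrelated = same_session(["cheap", "laptop"], ["weather"], table)
```

The queries "cheap laptop" and "notebook" have no word in common, yet the table entry for (laptop, notebook) gives them a non-zero score, which is exactly the vocabulary-mismatch case a term-overlap method would miss.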

13.

IRSJ: incremental refining spatial joins for interactive queries in GIS

Wan D. Bae Shayma Alkobaisi Scott T. Leutenegger 《GeoInformatica》2010,14(4):507-543

An increasing number of emerging web database applications deal with large georeferenced data sets. However, exploring these large data sets through spatial queries can be very time and resource intensive. The need for interactive spatial queries has arisen in many applications such as Geographic Information Systems (GIS) for efficient decision-support. In this paper, we propose a new interactive spatial query processing technique for GIS. We present a family of the Incremental Refining Spatial Join (IRSJ) algorithms that can be used to report incrementally refined running estimates for aggregate queries while simultaneously displaying the actual query result tuples of the data sets sampled so far. Our goal is to minimize the time until an acceptably accurate estimate of the query result is available (to users) measured by a confidence interval. Our approach enables more interactive data exploration and analysis. While similar work has been done in relational databases, to the best of our knowledge, this is the first work using this approach in GIS. We investigate and evaluate different sampling methodologies through extensive experimental performance comparisons. Experiments on both real and synthetic data show an order of magnitude response time improvement relative to the final answer obtained when using a full R-tree join. We also show the impact of different index structures on the performance of our algorithms using three known sampling methods.

14.

Approximation-Based Similarity Search for 3-D Surface Segments Total citations: 1, self-citations: 0, citations by others: 1

Hans-Peter Kriegel Thomas Seidl 《GeoInformatica》1998,2(2):113-147

The issue of finding similar 3-D surface segments arises in many recent applications of spatial database systems, such as molecular biology, medical imaging, CAD, and geographic information systems. Surface segments being similar in shape to a given query segment are to be retrieved from the database. The two main questions are how to define shape similarity and how to efficiently execute similarity search queries. We propose a new similarity model based on shape approximation by multi-parametric surface functions that are adaptable to specific application domains. We then define shape similarity of two 3-D surface segments in terms of their mutual approximation errors. Applying the multi-step query processing paradigm, we propose algorithms to efficiently support complex similarity search queries in large spatial databases. A new query type, called the ellipsoid query, is utilized in the filter step. Ellipsoid queries, being specified by quadratic forms, represent a general concept for similarity search. Our major contribution is the introduction of efficient algorithms to perform ellipsoid queries on multidimensional index structures. Experimental results on a large 3-D protein database containing 94,000 surface segments demonstrate the successful application and the high performance of our method.
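An ellipsoid query specified by a quadratic form can be illustrated with a plain linear scan: a point p qualifies if (p - q)ᵀ A (p - q) ≤ ε² for the query point q. The sketch assumes A is a symmetric positive definite matrix given as nested lists, and the function names are hypothetical. The paper's contribution is evaluating such queries on multidimensional index structures, which this scan does not attempt.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def quadratic_form_distance_sq(p: Sequence[float], q: Sequence[float], A: Matrix) -> float:
    """Squared distance (p - q)^T A (p - q) under the quadratic form A."""
    d = [pi - qi for pi, qi in zip(p, q)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))


def ellipsoid_query(points: Sequence[Sequence[float]], q: Sequence[float],
                    A: Matrix, eps: float) -> List[Sequence[float]]:
    """Return all points inside the ellipsoid centred at q (linear scan)."""
    return [p for p in points if quadratic_form_distance_sq(p, q, A) <= eps * eps]


# Differences in the second dimension are weighted four times as heavily.
A = [[1.0, 0.0], [0.0, 4.0]]
points = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.5), (2.0, 0.0), (1.0, 1.0)]
hits = ellipsoid_query(points, (0.0, 0.0), A, eps=2.0)
```

With A equal to the identity matrix the query reduces to an ordinary Euclidean range query; other choices of A stretch or rotate the query region, which is what makes the quadratic form adaptable to a given similarity model.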

15.

A knowledge-based approach for retrieving images by content Total citations: 10, self-citations: 0, citations by others: 10

Chih-Cheng Hsu Chu W.W. Taira R.K. 《Knowledge and Data Engineering, IEEE Transactions on》1996,8(4):522-532

A knowledge-based approach is introduced for retrieving images by content. It supports the answering of conceptual image queries involving similar-to predicates, spatial semantic operators, and references to conceptual terms. Objects of interest in the images are represented by contours segmented from images. Image content such as shapes and spatial relationships is derived from object contours according to domain-specific image knowledge. A three-layered model is proposed for integrating image representations, extracted image features, and image semantics. With such a model, images can be retrieved based on the features and content specified in the queries. The knowledge-based query processing is based on a query relaxation technique. The image features are classified by an automatic clustering algorithm and represented by Type Abstraction Hierarchies (TAHs) for knowledge-based query processing. Since the features selected for TAH generation are based on context and user profile, and the TAHs can be generated automatically by a clustering algorithm from the feature database, our proposed image retrieval approach is scalable and context sensitive. The performance of the proposed knowledge-based query processing is also discussed.

16.

Ultrawrap: SPARQL execution on relational data

《Journal of Web Semantics》2013

The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases, i.e. the execution of SPARQL queries on RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment. Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping and repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translation of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database.

17.

Optimizing complex queries based on similarities of subqueries

Qiang Zhu Yingying Tao Calisto Zuzarte 《Knowledge and Information Systems》2005,8(3):350-373

As database technology is applied to more and more application domains, user queries are becoming increasingly complex (e.g. involving a large number of joins and a complex query structure). Query optimizers in existing database management systems (DBMS) were not developed for efficiently processing such queries and often suffer from problems such as intolerably long optimization time and poor optimization results. To tackle this challenge, we present a new similarity-based approach to optimizing complex queries in this paper. The key idea is to identify similar subqueries that often appear in a complex query and share the optimization result among similar subqueries in the query. Different levels of similarity for subqueries are introduced. Efficient algorithms to identify similar queries in a given query and optimize the query based on similarity are presented. Related issues, such as choosing good starting nodes in a query graph, evaluating identified similar subqueries and analyzing algorithm complexities, are discussed. Our experimental results demonstrate that the proposed similarity-based approach is quite promising in optimizing complex queries with similar subqueries in a DBMS.

18.

ArchIS: an XML-based approach to transaction-time temporal database systems

Fusheng Wang Carlo Zaniolo Xin Zhou 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(6):1445-1463

Effective support for temporal applications by database systems represents an important technical objective that is difficult to achieve since it requires an integrated solution for several problems, including (i) expressive temporal representations and data models, (ii) powerful languages for temporal queries and snapshot queries, (iii) indexing, clustering and query optimization techniques for managing temporal information efficiently, and (iv) architectures that bring together the different pieces of enabling technology into a robust system. In this paper, we present the ArchIS system that achieves these objectives by supporting a temporally grouped data model on top of RDBMS. ArchIS’ architecture uses (a) XML to support temporally grouped (virtual) representations of the database history, (b) XQuery to express powerful temporal queries on such views, (c) temporal clustering and indexing techniques for managing the actual historical data in a relational database, and (d) SQL/XML for executing the queries on the XML views as equivalent queries on the relational database. The performance studies presented in the paper show that ArchIS is quite effective at storing and retrieving under complex query conditions the transaction-time history of relational databases, and can also assure excellent storage efficiency by providing compression as an option. This approach achieves full-functionality transaction-time databases without requiring temporal extensions in XML or database standards, and provides critical support to emerging application areas such as RFID.

19.

Research on aggregate query matching for semantic caching

Cai Jianyu Wu Quanyuan Jia Yan Zou Peng 《Journal of Computer Research and Development (计算机研究与发展)》2006,43(12):2124-2130

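To improve the query efficiency of massive database systems, and focusing on aggregate query techniques in such systems, this work extends semantic caching, which is usually applied to queries on small databases, to aggregate queries on massive databases. It first studies a formal description of semantic caches for aggregate queries. On this basis, it discusses the conditions under which the cache can be used to process a query and classifies the types of query matching. It then proposes and implements a containing-match decision algorithm and an intersecting-match decision algorithm, and finally presents the corresponding experimental results. Application in a large real-world project shows that these decision algorithms are effective.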

20.

Evaluating refined queries in top-k retrieval systems Total citations: 2, self-citations: 0, citations by others: 2

Kaushik Chakrabarti Ortega-Binderberger M. Mehrotra S. Porkaew K. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(2):256-270

In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries, i.e., which objects are relevant to the user and which are not depends on the perception of the user. Due to the subjective nature of top-k queries, the answers returned by the system to a user query often do not satisfy the user's need right away, either because the weights and the distance functions associated with the features do not accurately capture the user's perception or because the specified target values do not fully capture her information need or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, there is no work that we are aware of on supporting refinement of top-k queries efficiently in a database system. Done naively, each "refined" query can be treated as a "starting" query and evaluated from scratch. We explore alternative approaches that significantly reduce the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. Our experiments over a real-life multimedia data set show that the proposed techniques save more than 80 percent of the execution cost of refined queries over the naive approach and are more than an order of magnitude faster than a simple sequential scan.