首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.  相似文献   

2.
Graphs are widely used for modeling complicated data such as social networks, bibliographical networks and knowledge bases. The growing sizes of graph databases motivate the crucial need for developing powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language enables the expression of different types of graph queries that are of large interest in the databases that are modeled as large graph such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes/edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe an efficient hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph are stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database (using SQL) while the execution of other parts of the query plan is processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.  相似文献   

3.
知识图谱数据管理研究综述   总被引:2,自引:0,他引:2  
王鑫  邹磊  王朝坤  彭鹏  冯志勇 《软件学报》2019,30(7):2139-2174
知识图谱是人工智能的重要基石.各领域大规模知识图谱的构建和发布对知识图谱数据管理提出了新的挑战.以数据模型的结构和操作要素为主线,对目前的知识图谱数据管理理论、方法、技术与系统进行研究综述.首先,介绍知识图谱数据模型,包括RDF图模型和属性图模型,介绍5种知识图谱查询语言,包括SPARQL、Cypher、Gremlin、PGQL和G-CORE;然后,介绍知识图谱存储管理方案,包括基于关系的知识图谱存储管理和原生知识图谱存储管理;其次,探讨知识图谱上的图模式匹配、导航式和分析型3种查询操作.同时,介绍主流的知识图谱数据库管理系统,包括RDF三元组库和原生图数据库,描述目前面向知识图谱的分布式系统与框架,给出知识图谱评测基准.最后,展望知识图谱数据管理的未来研究方向.  相似文献   

4.
Recently research has deeply investigated the problem of querying semi-structured data and data which can be represented by means of graphs (e.g. object-oriented data, XML data, etc.). Typically queries on graph-like data, called path queries, are expressed by means of regular expressions denoting paths in the graph. The result of a path query is the set of nodes reachable by means of a path expressed by a specified regular expression. In this paper we investigate the problem of extracting a subgraph satisfying a given property from a given graph representing some information. We propose a new form of queries, called graph queries, whose answers are (marked) graphs having a particular structure, extracted from the source graph. We show that a simple form of graph grammars can be profitably used to define graph queries. The result of a graph query, using a grammar G over a database D, is a marked subgraph of D ‘matching’ a graph derived from G. We consider different types of graph grammars which can be used to query graph-like data and consider their expressiveness and complexity.  相似文献   

5.
Nowadays, huge volumes of data are organized or exported in tree-structured form. Querying capabilities are provided through tree-pattern queries. The need for querying tree-structured data sources when their structure is not fully known, and the need to integrate multiple data sources with different tree structures have driven, recently, the suggestion of query languages that relax the complete specification of a tree pattern. In this paper, we consider a query language that allows the partial specification of a tree pattern. Queries in this language range from structureless keyword-based queries to completely specified tree patterns. To support the evaluation of partially specified queries, we use semantically rich constructs, called dimension graphs, which abstract structural information of the tree-structured data. We address the problem of query containment in the presence of dimension graphs and we provide necessary and sufficient conditions for query containment. As checking query containment can be expensive, we suggest two heuristic approaches for query containment in the presence of dimension graphs. Our approaches are based on extracting structural information from the dimension graph that can be added to the queries while preserving equivalence with respect to the dimension graph. We considered both cases: extracting and storing different types of structural information in advance, and extracting information on-the-fly (at query time). Both approaches are implemented, validated, and compared through experimental evaluation.  相似文献   

6.
The blooming of different cloud data management infrastructures, specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm. In this paper, we present the design of a cloud multidatastore query language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store’s native query interface. The query engine has a fully distributed architecture, which provides important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. Our experimental validation, with three data stores (graph, document and relational) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.  相似文献   

7.
8.
Currently relational databases are widely used, while object-oriented databases are emerging as a new generation of database technology. This paper presents a methodology to provide effective sharing of information in object-oriented databases and relational databases. The object-oriented data model is selected as a common data model to build an integrated view of the diverse databases. An object-oriented query language is used as a standard query language. A method is developed to transform a relational data definition to an equivalent object-oriented data definition and to integrate local data definitions. Two distributed query processing methods are derived. One is for general queries and the other for a special class of restricted queries. Using the methods developed, it is possible to access distributed object-oriented databases and relational databases such that the locations and the structural differences of the databases are transparent to users.  相似文献   

9.
Efficient Distributed Skyline Queries for Mobile Applications   总被引:3,自引:0,他引:3       下载免费PDF全文
In this paper, we consider skyline queries in a mobile and distributed environment, where data objects are distributed in some sites (database servers) which are interconnected through a high-speed wired network, and queries are issued by mobile units (laptop, cell phone, etc.) which access the data objects of database servers by wireless channels. The inherent properties of mobile computing environment such as mobility, limited wireless bandwidth, frequent disconnection, make skyline queries more complicated. We show how to efficiently perform distributed skyline queries in a mobile environment and propose a skyline query processing approach, called efficient distributed skyline based on mobile computing (EDS-MC). In EDS-MC, a distributed skyline query is decomposed into five processing phases and each phase is elaborately designed in order to reduce the network communication, network delay and query response time. We conduct extensive experiments in a simulated mobile database system, and the experimental results demonstrate the superiority of EDS-MC over other skyline query processing techniques on mobile computing.  相似文献   

10.
Provenance has become increasingly important in scientific workflows to understand, verify, and reproduce the result of scientific data analysis. Most existing systems store provenance data in provenance stores with proprietary provenance data models and conduct query processing over the physical provenance storages using query languages, such as SQL, SPARQL, and XQuery, which are closely coupled to the underlying storage strategies. Querying provenance at such low level leads to poor usability of the system: a user needs to know the underlying schema to formulate queries; if the schema changes, queries need to be reformulated; and queries formulated for one system will not run in another system. In this paper, we present OPQL, a provenance query language that enables the querying of provenance directly at the graph level. An OPQL query takes a provenance graph as input and produces another provenance graph as output. Therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategies. Our main contributions are: (i) we design OPQL, including six types of graph patterns, a provenance graph algebra, and OPQL syntax and semantics, that supports querying provenance at the graph level; (ii) we implement OPQL using a Web service via our OPMProv system; therefore, users can invoke the Web service to execute OPQL queries in a provenance browser, called OPMProVis. The result of OPQL queries is displayed as a provenance graph in OPMProVis. An experimental study is conducted to evaluate the feasibility and performance of OPMProv on OPQL provenance querying.  相似文献   

11.
基于标签约束的可达性查询s→\\-Lt用于回答给定图中顶点s到顶点t是否存在路径标签属于L的有向路径.针对现有方法索引构建时间长、索引规模大、查询效率低的问题,首先基于k个点构建双向路径标签索引,并提出相应的优化措施减小索引规模,以此来加速可达查询的处理速度.由于其索引没有完全覆盖可达查询,虽然索引规模小,但仍然无法避免查询过程中的图遍历操作.为此,进一步提出覆盖所有可达信息的双向路径标签索引,基于该索引,查询处理时可以完全避免图上的遍历操作.最后,基于多个真实数据集进行测试,实验结果从索引大小、索引构建时间和查询响应时间方面验证了所提方法相对现有方法具有索引规模小、索引时间短且查询响应快的优势.  相似文献   

12.
The problem of optimal query processing in distributed database systems was shown to be NP-hard. However, for a special type of queries called star queries, we have developed a polynomial optimal algorithm. Semijoin tactics are applied for query processing. An execution graph is introduced to represent the semijoin programs associated with the distributed processing of the queries. We then identify optimality properties of semijoin programs for star queries, and use these properties to derive the optimal semijoin program. We have shown that the optimal semijoin program can be found from serial semijoin strategies, defined as serial semijoin programs which include each semijoin associated with the query exactly once. By making certain assumptions on the file sizes and the semijoin selectivities, we can obtain the optimal semijoin program from these strategies in polynomial time. Our assumption on selectivites is consistent in the sense that we consider the selectivity of a semijoin based on the current database state, i.e., we take into consideration the reduction effects of all prior semijoins.  相似文献   

13.
14.
Modern applications requiring spatial network processing pose several interesting query optimization challenges. Spatial networks are usually represented as graphs, and therefore, queries involving a spatial network can be executed by using the corresponding graph representation. This means that the cost for executing a query is determined by graph properties such as the graph order and size (i.e., number of nodes and edges) and other graph parameters. In this paper, we present novel methods to estimate the number of nodes and edges in regions of interest in spatial networks, towards predicting the space and time requirements for range queries. The methods are evaluated by using real-life and synthetic data sets. Experimental results show that the number of nodes and edges can be estimated efficiently and accurately, with relatively small space requirements, thus providing useful information to the query optimizer.  相似文献   

15.
杨程  陆佳民  冯钧 《计算机应用》2020,40(11):3184-3191
随着知识图谱的日益发展和在各个垂直领域的广泛应用,对于资源描述框架(RDF)数据的高效处理需求日益成为现代大数据管理领域中的新课题。RDF是W3C提出的用于描述知识图谱实体以及实体间关系的数据模型。为了有效地应对大规模RDF数据的存储和查询,很多学者考虑在分布式环境中管理RDF数据。RDF数据的分布式存储所面临的关键问题是数据的划分,而划分的结果很大程度上决定了SPARQL的查询性能。从数据划分的角度,主要围绕两类:基于图结构的RDF数据划分方法和基于语义的RDF数据划分方法展开深入阐述。前者包括多粒度层次划分、模板划分和聚类划分,适用于通用领域查询的语义范畴较为宽泛的场景;后者包括哈希划分、垂直划分和模式划分,更加适用于垂直领域查询的语义范畴相对固定的环境。此外,针对几种典型的划分方法进行对比与分析,为未来RDF数据划分方法的研究提供参考。最后,对未来RDF数据划分方法的发展方向进行了归纳总结。  相似文献   

16.
Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying of data whose structure is complex or is not fully known to the user, or integrating XML data sources with different structures. These applications have motivated recently the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first one exploits a structural summary of data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other ones. Our approaches are the first ones to efficiently evaluate this class of queries in the indexed streaming model.  相似文献   

17.
XML data broadcast is an efficient way to disseminate XML data to a large number of mobile clients in mobile wireless networks. Recently, several indexing methods have been proposed to improve the performance of XML query processing in terms of access time and tuning time over XML streams. However, existing indexing methods cannot process twig pattern XML queries. In this paper, we propose a novel structure for streaming XML data called PS+Pre/Post by integrating the path summary technique and the pre/post labeling scheme. Our proposed XML stream structure exploits the benefits of the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the broadcast stream. Experimental results show that our proposed XML stream structure improves the performance of access time and tuning time in processing different types of XML queries.  相似文献   

18.
Querying multimedia presentations based on content   总被引:2,自引:0,他引:2  
Considers the problem of querying multimedia presentations based on content information. Multimedia presentations are modeled as presentation graphs, which are directed acyclic graphs that visually specify the presentations. We present a graph data model for the specification of multimedia presentations and discuss query languages as effective tools to query and manipulate multimedia presentation graphs with respect to content information. To query the information flow throughout a multimedia presentation, as well as in each individual multimedia stream, we use revised versions of temporal operators Next, Connected and Until, together with path formulas. These constructs allow us to specify and query paths along a presentation graph. We present an icon-based graphical query language, GVISUAL, that provides iconic representations for these constructs and a user-friendly graphical interface for query specification. We also present an OQL-like language, GOQL (Graph OQL), with similar constructs, that allows textual and more traditional specifications of graph queries. Finally, we introduce GCalculus (Graph Calculus), a calculus-based language that establishes the formal grounds for the use of temporal operators in path formulas and for querying presentation graphs with respect to content information. We also discuss GCalculus/S (GCalculus with Sets) which avoids highly complex query expressions by eliminating the universal path quantifier, the negation operator and the universal quantifier. GCalculus/S represents the formal basis for GVISUAL, i.e. GVISUAL uses the constructs of GCalculus/S directly  相似文献   

19.
Ramanath  Maya  Haritsa  Jayant R. 《World Wide Web》2000,3(2):111-124
Current proposals for web querying systems have assumed a centralized processing architecture wherein data is shipped from the remote sites to the user's site. We present here the design and implementation of DIASPORA, a highly distributed query processing system for the web. It is based on the premise that several web applications are more naturally processed in a distributed manner, opening up possibilities of significant reductions in network traffic and user response times. DIASPORA is built over an expressive graph-based data model that utilizes simple heuristics and lends itself to automatic generation. The model captures both the content of web documents and the hyperlink structural framework of a web site. Distributed queries on the model are expressed through a declarative language that permits users to explicitly specify navigation. DIASPORA implements a query-shipping model wherein queries are autonomously forwarded from one web-site to another, without requiring much coordination from the query originating site. Its design addresses a variety of interesting issues that arise in the distributed web context including determining query completion, handling query rewriting, supporting query termination and preventing multiple computations of a query at a site due to the same query arriving through different paths in the hyperlink framework. The DIASPORA system is currently operational and is undergoing testing on our campus network. In this paper we describe the design of the system and report initial performance results that indicate significant performance improvements over comparable centralized approaches. This revised version was published online in August 2006 with corrections to the Cover Date.  相似文献   

20.
With the development of World Wide Web (www), storage and utilization of web data has become a big challenge for data management research community. Web data are essentially heterogeneous data, and may change schema frequently, traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (or WT for simplicity), was introduced for this task. There are several characteristics of the WT model. First, WT is usually highly sparsely populated so that most data can be fit into a line or record. Second, queries are composed on only a small subset of the attributes. Thus, existing query processing and optimization techniques for relational database with normalized tables will not work efficiently anymore. Furthermore, WT is usually of extremely large volume. It is thought that only large-scale distributed storage can accommodate themassive data set. In this paper, requirements and challenges to web data management are discussed. Existing techniques for WT, including logical presentation, physical storage, and query processing, are introduced and analyzed in detail.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号