Linked Open Data initiatives have encouraged the publication of large RDF datasets into the Linking Open Data (LOD) cloud, including DBpedia, YAGO, and GeoNames. Despite the size of LOD datasets and the development of (semi-)automatic methods to create and link LOD data, these datasets may still be incomplete, which negatively affects the accuracy of Linked Data processing techniques. We enhance query answer completeness by capturing knowledge collected from the crowd, and propose a novel hybrid query processing engine that brings together machine and human computation to execute SPARQL queries. Our system, HARE, implements these hybrid query processing techniques. HARE encompasses several features: (1) a completeness model for RDF that exploits the characteristics of RDF in order to estimate the completeness of an RDF dataset; (2) a crowd knowledge base that captures crowd answers about missing values in the RDF dataset; (3) a query engine that combines on-the-fly crowd knowledge and estimates provided by the RDF completeness model, to decide which sub-queries of a SPARQL query should be executed against the dataset and which via crowd computing to enhance query answer completeness; and (4) a microtask manager that exploits the semantics encoded in the dataset RDF properties, to crowdsource SPARQL sub-queries as microtasks and update the crowd knowledge base with the results from the crowd. The effectiveness and efficiency of HARE are empirically studied on a collection of 50 SPARQL queries against the DBpedia dataset. Experimental results clearly show that our solution accurately enhances answer completeness.

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.
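
The idea of answering a SPARQL query by matching a query graph against a data graph can be illustrated with a minimal sketch. This is not gStore's algorithm: it is a naive backtracking matcher with no index or pruning, and the names `match`, `unify`, `GRAPH`, and `QUERY` as well as the sample triples are assumptions made for the illustration.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match(patterns: List[Triple], graph: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding under which all patterns occur in graph."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:  # hypothetical full scan; a real system uses an index
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from match(rest, graph, extended)


# Hypothetical data graph and query graph.
GRAPH: Set[Triple] = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
}
QUERY: List[Triple] = [
    ("?x", "knows", "?y"),
    ("?y", "knows", "?z"),
    ("?z", "worksAt", "?w"),
]
RESULTS = list(match(QUERY, GRAPH))
```

The full scan inside `match` is the cost that an index and pruning rules, as described in the abstract, would aim to avoid.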

Revyu is a live, publicly accessible reviewing and rating Web site, designed to be usable by humans whilst transparently generating machine-readable RDF metadata for the Semantic Web, based on user input. The site uses Semantic Web specifications such as RDF and SPARQL, and the latest Linked Data best practices to create a major node in a potentially Web-wide ecosystem of reviews and related data. Throughout the implementation of Revyu, design decisions have been made that aim to minimize the burden on users, by maximizing the reuse of external data sources, and allowing less structured human input (in the form of Web 2.0-style tagging) from which stronger semantics can later be derived. Links to external sources such as DBpedia are exploited to create human-oriented mashups at the HTML level, whilst links are also made in RDF to ensure Revyu plays a first-class role in the blossoming Web of Data. In this paper we document design decisions made during the implementation of Revyu, discuss the techniques used for linking Revyu data with external sources, and outline how data from the site is being used to infer the trustworthiness of reviewers as sources of information and recommendations.

In the era of Big Data, users prefer to get knowledge rather than pages from the Web. Linked Data, a rather new form of knowledge representation and publishing described by RDF, can provide a more precise and comprehensible semantic structure to satisfy this requirement. Moreover, as the standard query language for RDF data, SPARQL has become the foundation protocol of Linked Data querying. The core idea of RDF Schema (RDFS) is to extend the RDF vocabulary and allow semantics to be attached to user-defined classes and properties. However, RDFS cannot fully utilize the potential of RDF since it cannot express the implicit semantics between linked entities in Linked Data sources. To fill this gap, in this paper, we design a new semantic annotating and reasoning approach that can extend more implicit semantics from different properties. We first establish a well-defined, semantically enhanced annotation strategy for Linked Data sources. In particular, we present some new semantic properties for predicates in RDF triples and design a Semantic Matrix for Predicates (SMP). We then propose a novel general Semantically Extended Scheme for Linked Data Sources (SESLDS) to realize the semantic extension over the target Linked Data source through semantically enhanced reasoning. Lastly, based on the experimental analyses, we verify that our proposal has advantages over the initial Linked Data source and can return more valid results.

Semantics preserving SPARQL-to-SQL translation
Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which can then be optimized and evaluated by the relational query engine, with their results returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics-preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema-dependent translations in terms of efficient query evaluation and/or ensured query result correctness.
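
To make the translation problem concrete, here is a minimal sketch that turns a basic graph pattern into one SQL query of self-joins over a single `triples(s, p, o)` table. It is not the paper's algorithm and covers neither optional or alternative patterns nor value constraints. The table layout, the function name `bgp_to_sql`, and the sample data are assumptions, and variable names are assumed to be plain identifiers.

```python
import sqlite3
from typing import Dict, List, Tuple

TriplePattern = Tuple[str, str, str]
COLUMNS = ("s", "p", "o")


def bgp_to_sql(patterns: List[TriplePattern]) -> Tuple[str, List[str]]:
    """Translate a basic graph pattern into SQL over a triples(s, p, o) table.

    Each triple pattern becomes one alias of the table; constants become
    equality filters and repeated variables become join conditions.
    """
    first_seen: Dict[str, str] = {}  # variable -> first column that binds it
    conditions: List[str] = []
    params: List[str] = []
    for i, pattern in enumerate(patterns):
        for col, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                conditions.append(f"{ref} = ?")  # constant, passed as parameter
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    where = " AND ".join(conditions) if conditions else "1 = 1"
    return f"SELECT {select or '1'} FROM {tables} WHERE {where}", params


# Hypothetical in-memory triple table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("alice", "worksAt", "acme"),
        ("bob", "worksAt", "initech"),
        ("acme", "locatedIn", "oslo"),
        ("initech", "locatedIn", "austin"),
    ],
)

SQL, PARAMS = bgp_to_sql(
    [("?person", "worksAt", "?org"), ("?org", "locatedIn", "oslo")]
)
ROWS = sorted(conn.execute(SQL, PARAMS).fetchall())
```

The sketch returns duplicate rows when the joins produce them, so it sidesteps the set-versus-bag question that the abstract treats explicitly.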

The rapid growth of the Linked Open Data cloud, as well as the increasing ability to lift relational enterprise datasets to a semantic, ontology-based level means that vast amounts of information are now available in a representation that closely matches the conceptualizations of the potential users of this information. This makes it interesting to create ontology-based, user-oriented tools for searching and exploring this data. Although initial efforts were intended for tech users with knowledge of SPARQL/RDF, there are ongoing proposals designed for lay users. One of the most promising approaches is to use visual query interfaces, but more user studies are needed to assess their effectiveness. In this paper, we compare the effect on usability of two important paradigms for ontology-based query interfaces: form-based and graph-based interfaces. In order to reduce the number of variables affecting the comparison, we performed a user study with two state-of-the-art query tools developed by ourselves, sharing a large part of the code base: the graph-based tool OptiqueVQS*, and the form-based tool PepeSearch. We evaluated these tools in a formal comparison study with 15 participants searching a Linked Open Data version of the Norwegian Company Registry. Participants had to respond to 6 non-trivial search tasks using alternately OptiqueVQS* and PepeSearch. Even without previous training, retrieval performance and user confidence were very high, thus suggesting that both interface designs are effective for searching RDF datasets. Expert searchers had a clear preference for the graph-based interface, and mainstream searchers obtained better performance and confidence with the form-based interface. While a number of participants spontaneously praised the capability of the graph interface for composing complex queries, our results show that graph interfaces are difficult to grasp. In contrast, form interfaces are more learnable and relieve problems with disorientation for mainstream users. We have also observed positive results from introducing faceted search and dynamic term suggestion in semantic search interfaces.

The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases, i.e., the execution of SPARQL queries on an RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment. Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping, and the evaluation is repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translation of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database.

RDF is the data interchange layer for the Semantic Web. In order to manage the increasing amount of RDF data, an RDF repository should provide not only the necessary scalability and efficiency, but also sufficient inference capabilities. Though existing RDF repositories have made progress towards these goals, there is still ample space for improving the overall performance. In this paper, we propose a native RDF repository, System Π, to pursue a better tradeoff among system scalability, query efficiency, and inference capabilities. System Π takes a hypergraph representation for RDF as the data model for its persistent storage, which effectively avoids the costs of data model transformation when accessing RDF data. Based on this native storage scheme, a set of efficient semantic query processing techniques is designed. First, several indices are built to accelerate RDF data access, including a value index, a labeling scheme for transitive closure computation, and three triple indices. Second, we propose a hybrid inference strategy under the pD* semantics to support inference for OWL-Lite with a relatively low computational complexity. Finally, we extend the SPARQL algebra to explicitly express inference semantics in the logical query plan by defining some new algebra operators. In addition, MD5 hash values of URIs and a schema-level cache are introduced as practical implementation techniques. The results of performance evaluation on the LUBM benchmark and a real data set show that System Π has a better combined metric value than other comparable systems.
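
The abstract mentions three triple indices and MD5 hash values of URIs without giving details. The sketch below shows one common way such pieces can fit together; it is an assumption, not System Π's design. The class `TripleStore`, the choice of SPO/POS/OSP orderings, the in-memory dictionaries, and the sample URIs are all hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Dict, Set

Index = DefaultDict[str, DefaultDict[str, Set[str]]]


def uri_key(uri: str) -> str:
    """Fixed-length key for a URI: the hex MD5 digest of its UTF-8 bytes."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


def _new_index() -> Index:
    return defaultdict(lambda: defaultdict(set))


class TripleStore:
    """In-memory store with three triple indices over MD5-keyed terms."""

    def __init__(self) -> None:
        self.uris: Dict[str, str] = {}  # key -> original URI
        self.spo: Index = _new_index()  # subject -> predicate -> objects
        self.pos: Index = _new_index()  # predicate -> object -> subjects
        self.osp: Index = _new_index()  # object -> subject -> predicates

    def _intern(self, uri: str) -> str:
        key = uri_key(uri)
        self.uris[key] = uri
        return key

    def add(self, s: str, p: str, o: str) -> None:
        ks, kp, ko = self._intern(s), self._intern(p), self._intern(o)
        self.spo[ks][kp].add(ko)
        self.pos[kp][ko].add(ks)
        self.osp[ko][ks].add(kp)

    def objects(self, s: str, p: str) -> Set[str]:
        """Answer the pattern (s, p, ?o) from the SPO index."""
        keys = self.spo.get(uri_key(s), {}).get(uri_key(p), set())
        return {self.uris[k] for k in keys}

    def subjects(self, p: str, o: str) -> Set[str]:
        """Answer the pattern (?s, p, o) from the POS index."""
        keys = self.pos.get(uri_key(p), {}).get(uri_key(o), set())
        return {self.uris[k] for k in keys}


# Hypothetical sample data.
store = TripleStore()
store.add("ex:a", "rdf:type", "ex:Student")
store.add("ex:b", "rdf:type", "ex:Student")
store.add("ex:a", "ex:advisor", "ex:c")
```

Keeping one index per access pattern lets each lookup start from the bound terms, at the price of storing every triple three times.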

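With the rapid development of the Semantic Web, more and more scientific data are being processed and published as Linked Data in order to enable scientific data sharing, and are then used for linked querying and link discovery. To manage large-scale Linked Data, this paper builds an RDF database cluster to store massive data, designs a federated query system based on SPARQL endpoints so that users can query transparently across machines, and analyzes the storage strategy and the query processing techniques of the federated query system. Operation in practice shows that the platform is easy to integrate and use, and provides scalable storage and effective querying of large-scale RDF data.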

One of the challenges of managing an RDF database is predicting the performance of SPARQL queries before they are executed. Performance characteristics, such as the execution time and memory usage, can help data consumers identify unexpected long-running queries before they start and estimate the system workload for query scheduling. Extensive work addresses this performance prediction problem for traditional SQL queries, but it is not directly applicable to SPARQL queries. In this paper, we adopt machine learning techniques to predict the performance of SPARQL queries. Our work focuses on modeling features of a SPARQL query as a vector representation. Our feature modeling method does not depend on knowledge of the underlying systems or the structure of the underlying data, but only on the nature of SPARQL queries. Then we use these features to train prediction models. We propose a two-step prediction process and consider performance in both cold and warm stages. Evaluations are performed on real-world SPARQL queries, whose execution time ranges from milliseconds to hours. The results demonstrate that the proposed approach can effectively predict SPARQL query performance and outperforms state-of-the-art approaches.
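
As a rough illustration of turning a SPARQL query into a feature vector using only the query text, the sketch below counts a few keywords and the distinct variables. The abstract does not list the paper's features, so the feature set, the names `KEYWORDS` and `query_features`, and the sample query are assumptions. The plain-text matching would also miscount keywords that appear inside IRIs or string literals.

```python
import re
from typing import List

# Hypothetical feature set: one count per keyword, then the number of
# distinct variables in the query.
KEYWORDS = ("SELECT", "DISTINCT", "OPTIONAL", "UNION", "FILTER", "ORDER BY", "LIMIT")


def query_features(query: str) -> List[int]:
    """Map a SPARQL query string to a fixed-length vector of counts."""
    text = query.upper()
    counts = [
        len(re.findall(r"\b" + kw.replace(" ", r"\s+") + r"\b", text))
        for kw in KEYWORDS
    ]
    variables = len(set(re.findall(r"\?\w+", query)))
    return counts + [variables]


SAMPLE_QUERY = """
SELECT DISTINCT ?name WHERE {
  ?p a <http://example.org/Person> .
  OPTIONAL { ?p <http://example.org/name> ?name }
  FILTER(?p != <http://example.org/nobody>)
} LIMIT 10
"""
FEATURES = query_features(SAMPLE_QUERY)
```

A vector like this could be passed to any regression model; which model and which features work well is an empirical question that the sketch does not address.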

The Semantic Web is based on accessing and reusing RDF data from many different sources, to which one may assign different levels of authority and credibility. Existing Semantic Web query languages, like SPARQL, have targeted the retrieval, combination and re-use of facts, but have so far ignored all aspects of meta knowledge, such as origins, authorship, recency or certainty of data. In this paper, we present an original, generic, formalized and implemented approach for managing many dimensions of meta knowledge, like source, authorship, certainty and others. The approach re-uses existing RDF modeling possibilities in order to represent meta knowledge. Then, it extends SPARQL query processing in such a way that given a SPARQL query for data, one may request meta knowledge without modifying the query proper. Thus, our approach achieves highly flexible and automatically coordinated querying for data and meta knowledge, while completely separating the two areas of concern.

Knowledge extraction from Chinese wiki encyclopedias

Natural Language Interfaces (NLIs) are a viable, human-readable alternative to complex, formal query languages like SPARQL, which are typically used for accessing semantically structured data (e.g. RDF and OWL repositories). However, in order to cope with natural language ambiguities, NLIs typically support a more restricted language. A major challenge when designing such restricted languages is habitability: how easily, naturally and effectively users can use the language to express themselves within the constraints imposed by the system. In this paper, we investigate two methods for improving the habitability of a Natural Language Interface: feedback and clarification dialogues. We model feedback by showing the user how the system interprets the query, thus suggesting repair through query reformulation. Next, we investigate how clarification dialogues can be used to control the query interpretations generated by the system. To reduce the cognitive overhead, clarification dialogues are coupled with a learning mechanism. Both methods are shown to have a positive effect on the overall performance and habitability.

RDF is a knowledge representation language dedicated to the annotation of resources within the framework of the semantic web. Among the query languages for RDF, SPARQL allows querying RDF through graph patterns, i.e., RDF graphs involving variables. Other languages, inspired by the work in databases, use regular expressions for searching paths in RDF graphs. Each approach can express queries that are out of reach of the other one. Hence, we aim at combining these two approaches. For that purpose, we define a language, called PRDF (for “Path RDF”), which extends RDF such that the arcs of a graph can be labeled by regular expression patterns. We provide PRDF with a semantics extending that of RDF, and propose a correct and complete algorithm which, by computing a particular graph homomorphism, decides the consequence between an RDF graph and a PRDF graph. We then define the PSPARQL query language, extending SPARQL with PRDF graph patterns and complying with RDF model-theoretic semantics. PRDF thus offers both graph patterns and path expressions. We show that this extension does not increase the computational complexity of SPARQL and, based on the proposed algorithm, we have implemented a correct and complete PSPARQL query engine.
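
A path expression on an arc can be illustrated with the simplest case, `label+` (one or more consecutive edges carrying the same label). The sketch below evaluates that single operator by breadth-first search; it is not the PRDF homomorphism algorithm and does not handle general regular expressions. The function name `reachable_plus` and the sample graph are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def reachable_plus(graph: Iterable[Triple], start: str, label: str) -> Set[str]:
    """Nodes reachable from `start` through one or more edges labeled `label`."""
    successors: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if p == label:
            successors.setdefault(s, set()).add(o)

    seen: Set[str] = set()
    queue = deque(successors.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already expanded; this also keeps cycles finite
        seen.add(node)
        queue.extend(successors.get(node, ()))
    return seen


# Hypothetical sample graph.
PLACES = [
    ("louvre", "locatedIn", "paris"),
    ("paris", "locatedIn", "france"),
    ("france", "locatedIn", "europe"),
    ("europe", "partOf", "earth"),
]
ANCESTORS = reachable_plus(PLACES, "louvre", "locatedIn")
```

A plain graph pattern with a fixed number of triple patterns cannot express this query, because the length of the `locatedIn` chain is not known in advance.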

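In the development of the Web of Data built on RDF, efficient data retrieval has become one of the key problems. Formal query languages such as SPARQL are hindered by their complex syntax and their dependence on the queried ontology, so new methods or tools are urgently needed to support retrieval based on natural language (e.g., keyword search). Formal query languages are an effective way to retrieve such structured data, but users are accustomed to natural-language-based retrieval. How to automatically convert keyword-based search into formal-query-based search is therefore an important step toward realizing the Web of Data. Natural language query methods for Linked Data automatically translate natural language queries into SPARQL queries, improving the effectiveness and efficiency of the system. Building on an abstract transformation measurement model, this paper constructs an ontology-based query semantic graph, performs semantic disambiguation, and builds the SPARQL query. Experimental results show that the method achieves higher recall and precision and lower time consumption.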

A number of accessible RDF stores are populating the linked open data world. Navigating the reticular relationships among data is becoming more relevant every day. Several knowledge bases present relevant links to common vocabularies, while many others are yet to be discovered, increasing the reasoning capabilities of our knowledge base applications. In this paper, the Linked Open Graph, LOG, is presented. It is a web tool for collaborative browsing and navigation on multiple SPARQL entry points. The paper presents an overview of the major problems to be addressed, a comparison with state-of-the-art tools, and some details about the LOG graph computation to cope with the high complexity of large Linked Open Data graphs. The LOG.disit.org tool is also presented by means of a set of examples involving multiple RDF stores, highlighting the newly provided features and advantages using dbPedia, Getty, Europeana, Geonames, etc. The LOG tool is free to use, and it has been adopted, developed and/or improved in multiple projects, such as ECLAP for social media cultural heritage, Sii-Mobility for smart city, ICARO for cloud ontology analysis, and OSIM for competence/knowledge mining and analysis.

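Real-world studies based on observational data from electronic medical records have become a hot topic in clinical research. However, the relational data model cannot directly support the representation of temporal relations between medical events or the knowledge-fusion query requirements of research applications. To address these problems, this paper proposes a new RDF-based representation model for medical observational data, which can clearly represent multiple event types such as clinical examination, diagnosis, and treatment, as well as the temporal relations between events. For electronic medical record data from hospitals, an event graph is built in four steps: data preprocessing, data schema conversion, temporal relation construction, and knowledge fusion. Specifically, using electronic medical record data from three Grade-A tertiary hospitals in Shanghai, we constructed a medical dataset covering 3 specialties, 173,395 medical events, and 501,335 temporal relations between events, and fused 5,313 concepts from Chinese medical knowledge bases. Based on the clinical literature and physicians' research needs, and following study types in public health epidemiology such as etiological studies and treatment studies, the paper provides 40 example questions for this dataset and experimentally compares some of them against a traditional relational database in terms of query construction and execution, demonstrating the advantages of the event graph. The dataset follows open linked data standards, is published on OpenKG, and provides an online SPARQL endpoint at https://peg.ecustnlplab.com/dataset.html.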

