
1.

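KGDB: a knowledge graph database management system with a unified model and language. Cited by: 2 (self-citations: 0, citations by others: 2)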

Baozhu Liu Xin Wang Pengkai Liu Sizhuo Li Xiaowang Zhang Yajun Yang 《Journal of Software》2021,32(3):781-804

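Knowledge graphs are an important cornerstone of artificial intelligence. They currently have two main data models, RDF graphs and property graphs, and several query languages are built on these models: SPARQL is the query language for RDF graphs, and Cypher is the main query language for property graphs. Over the past decade, different communities have developed separate data management methods for RDF graphs and property graphs, and the lack of a unified data model and query language has limited the wider application of knowledge graphs. KGDB (Knowledge Graph Database) is a knowledge graph database management system with a unified model and language: (1) building on the relational model, it proposes a unified storage scheme that supports efficient storage of both RDF graphs and property graphs and meets the requirements of knowledge graph storage and query workloads; (2) it uses a characteristic-set-based clustering method to solve the storage problem for untyped triples; (3) it achieves interoperability between the two knowledge graph query languages SPARQL and Cypher, so that both can operate on the same knowledge graph. Extensive experiments on real and synthetic datasets show that, compared with existing knowledge graph database management systems, KGDB provides more efficient storage management and higher query efficiency. On average, KGDB saves 30% of storage space compared with gStore and Neo4j. Experiments on basic graph pattern queries show that its query speed on real datasets is generally higher than that of gStore and Neo4j, with a maximum improvement of two orders of magnitude.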

2.

KGDB: Knowledge Graph Database System with Unified Model and Query Language

Baozhu Liu Xin Wang Pengkai Liu Sizhuo Li Xiaowang Zhang Yajun Yang 《International Journal of Software and Informatics》2021,11(1):91-116

Knowledge graphs are an important cornerstone of artificial intelligence and currently have two main data models: RDF graphs and property graphs. There are several query languages on these two data models, including SPARQL on RDF graphs and Cypher on property graphs. Over the last decade, various communities have developed different data management methods for RDF graphs and property graphs. Inconsistent data models and query languages hinder the wider application of knowledge graphs. In this paper, we propose a knowledge graph database (KGDB) system with a unified data model and query language. (1) We work out a unified storage scheme based on the relational model that supports the efficient storage of RDF graphs and property graphs and meets the storage and query workload requirements of knowledge graph data. (2) Characteristic set-based clustering is used in KGDB for the storage of typeless entities. (3) It realizes the interoperability of SPARQL and Cypher by enabling them to operate on the same knowledge graph. Extensive experiments on real-world and synthetic datasets show that KGDB is more efficient than existing knowledge graph database management systems in both storage management and query processing. KGDB saves 30% of the storage space on average compared with gStore and Neo4j. In addition, experiments on basic graph pattern matching queries show that KGDB is up to two orders of magnitude faster than gStore and Neo4j on the real-world datasets.
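As an illustration of the characteristic-set idea mentioned in point (2), the sketch below groups untyped subjects by the exact set of predicates they emit. It is only a minimal sketch under stated assumptions: the abstract does not describe KGDB's actual clustering procedure, so exact-match grouping is swapped in here as the simplest stand-in, and the function names and toy triples are hypothetical.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it appears with."""
    preds = defaultdict(set)
    for s, p, _o in triples:
        preds[s].add(p)
    return {s: frozenset(ps) for s, ps in preds.items()}


def group_by_characteristic_set(triples):
    """Group subjects whose predicate sets are identical (a stand-in for clustering)."""
    groups = defaultdict(list)
    for s, cs in characteristic_sets(triples).items():
        groups[cs].append(s)
    return {cs: sorted(subjects) for cs, subjects in groups.items()}


# Hypothetical toy data: none of the subjects carries an explicit type.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "knows", "alice"),
    ("paper1", "title", "KGDB"),
]
GROUPS = group_by_characteristic_set(TRIPLES)
```

Each resulting group could then be stored in its own relational table with one column per predicate, which is one plausible reading of a relational storage scheme for entities without a declared type.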

3.

Extending SPARQL with regular expression patterns (for querying RDF)

Faisal Alkhateeb Jean-François Baget Jérôme Euzenat 《Journal of Web Semantics》2009,7(2):57-73

RDF is a knowledge representation language dedicated to the annotation of resources within the framework of the semantic web. Among the query languages for RDF, SPARQL allows querying RDF through graph patterns, i.e., RDF graphs involving variables. Other languages, inspired by the work in databases, use regular expressions for searching paths in RDF graphs. Each approach can express queries that are out of reach of the other one. Hence, we aim at combining these two approaches. For that purpose, we define a language, called PRDF (for “Path RDF”) which extends RDF such that the arcs of a graph can be labeled by regular expression patterns. We provide PRDF with a semantics extending that of RDF, and propose a correct and complete algorithm which, by computing a particular graph homomorphism, decides the consequence between an RDF graph and a PRDF graph. We then define the PSPARQL query language, extending SPARQL with PRDF graph patterns and complying with RDF model theoretic semantics. PRDF thus offers both graph patterns and path expressions. We show that this extension does not increase the computational complexity of SPARQL and, based on the proposed algorithm, we have implemented a correct and complete PSPARQL query engine.

4.

gStore: a graph-based SPARQL query engine

Lei Zou M. Tamer Özsu Lei Chen Xuchuan Shen Ruizhe Huang Dongyan Zhao 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(4):565-590

We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.
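The reduction from query answering to subgraph matching can be sketched with a naive backtracking matcher. This is an illustrative sketch only: it reproduces none of the index, pruning rules, or search algorithms the abstract attributes to gStore, and the names `match_bgp` and `_unify` and the toy data are assumptions made for the example.

```python
def _unify(term, value, binding):
    """Bind a variable (a term starting with '?') or compare a constant."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_bgp(triples, patterns, binding=None):
    """Yield every variable binding under which all patterns occur in triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)  # copy so a failed attempt leaves no trace
        if all(_unify(t, v, extended) for t, v in zip(first, triple)):
            yield from match_bgp(triples, rest, extended)


# Hypothetical data graph and a two-edge query graph (a path of length 2).
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "livesIn", "paris"),
]
QUERY = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
RESULTS = list(match_bgp(TRIPLES, QUERY))
```

The matcher scans every triple for every pattern, so its cost grows quickly with query size; that blow-up is the kind of cost an index with pruning is meant to avoid.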

5.

Efficient distributed query processing on large-scale RDF graph data

Xin Wang Qiang Xu Lele Chai Yajun Yang Yunpeng Chai 《Journal of Software》2019,30(3):498-514

6.

Semantics preserving SPARQL-to-SQL translation. Cited by: 2 (self-citations: 0, citations by others: 2)

Artem Shiyong Farshad 《Data & Knowledge Engineering》2009,68(10):973-1000

Most existing RDF stores, which serve as metadata repositories on the Semantic Web, use an RDBMS as a backend to manage RDF data. This motivates us to study the problem of translating SPARQL queries into equivalent SQL queries, which further can be optimized and evaluated by the relational query engine and their results can be returned as SPARQL query solutions. The main contributions of our research are: (i) We formalize a relational algebra based semantics of SPARQL, which bridges the gap between SPARQL and SQL query languages, and prove that our semantics is equivalent to the mapping-based semantics of SPARQL; (ii) Based on this semantics, we propose the first provably semantics preserving SPARQL-to-SQL translation for SPARQL triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints; (iii) Our translation algorithm is generic and can be directly applied to existing RDBMS-based RDF stores; and (iv) We outline a number of simplifications for the SPARQL-to-SQL translation to generate simpler and more efficient SQL queries and extend our defined semantics and translation to support the bag semantics of a SPARQL query solution. The experimental study showed that our proposed generic translation can serve as a good alternative to existing schema dependent translations in terms of efficient query evaluation and/or ensured query result correctness.
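To make the idea of SPARQL-to-SQL translation concrete, the sketch below turns a basic graph pattern into one SQL self-join over a single triple table and runs it with Python's built-in `sqlite3` module. It is a hedged illustration, not the translation algorithm of the paper: it covers only triple patterns and basic graph patterns (no optional or alternative patterns, no value constraints), and the `triples(s, p, o)` schema, the function name `bgp_to_sql`, and the sample data are assumptions.

```python
import sqlite3


def bgp_to_sql(patterns):
    """Translate triple patterns into a SQL self-join over triples(s, p, o)."""
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if not term.startswith("?"):
                where.append(f"{ref} = ?")  # constant: bound as a parameter
                params.append(term)
            elif term in seen:
                where.append(f"{ref} = {seen[term]}")  # repeated variable: join
            else:
                seen[term] = ref  # first occurrence: project it
                select.append(f"{ref} AS {term[1:]}")
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select or ['1'])} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("alice", "knows", "bob"), ("bob", "knows", "carol"),
     ("alice", "livesIn", "paris")],
)
SQL, PARAMS = bgp_to_sql([("?x", "knows", "?y"), ("?y", "knows", "?z")])
ROWS = conn.execute(SQL, PARAMS).fetchall()
```

Each triple pattern becomes one alias of the triple table, and every shared variable becomes an equality join condition, so the relational optimizer is free to choose the join order.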

7.

Redesign of the gStore system

Li ZENG Lei ZOU 《Frontiers of Computer Science》2018,12(4):623-641

8.

Content-based image retrieval based on a fuzzy approach

Krishnapuram R. Medasani S. Sung-Hwan Jung Young-Sik Choi Balasubramaniam R. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(10):1185-1199

A typical content-based image retrieval (CBIR) system would need to handle the vagueness in the user queries as well as the inherent uncertainty in image representation, similarity measure, and relevance feedback. We discuss how fuzzy set theory can be effectively used for this purpose and describe an image retrieval system called FIRST (fuzzy image retrieval system) which incorporates many of these ideas. FIRST can handle exemplar-based, graphical-sketch-based, as well as linguistic queries involving region labels, attributes, and spatial relations. FIRST uses fuzzy attributed relational graphs (FARGs) to represent images, where each node in the graph represents an image region and each edge represents a relation between two regions. The given query is converted to a FARG, and a low-complexity fuzzy graph matching algorithm is used to compare the query graph with the FARGs in the database. The use of an indexing scheme based on a leader clustering algorithm avoids an exhaustive search of the FARG database. We quantify the retrieval performance of the system in terms of several standard measures.

9.

Implementation of a SPARQL query engine based on CouchDB

Yi Peng Chuanlei Ni Wenyang Bai 《Computer Technology and Development》2014,(5):6-10

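Traditional SPARQL query engines take the triple pattern as the basic unit of query optimization, so queries with many triple patterns incur too many join operations and a high cost. Based on the storage and query characteristics of document databases, this paper proposes a method that stores RDF data by subject classification: RDF triples are divided into classes by subject and stored in documents of the document database. When processing a SPARQL query, the triple patterns are likewise grouped by subject to form a query graph whose units are subject-related blocks, and a selectivity estimation method based on property correlation is proposed to optimize the query execution plan. A new SPARQL query engine is implemented on the document database CouchDB, and experiments show that the method improves the efficiency of SPARQL basic graph pattern queries.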

10.

Storing and querying fuzzy RDF(S) in HBase databases

Tianyi Fan Li Yan Zongmin Ma 《International Journal of Intelligent Systems》2020,35(4):751-780

11.

Feature extraction method based on collaborative representation and fuzzy progressive maximal marginal embedding

Baoli Su 《Journal of Computer Applications》2013,33(6):1677-1681

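When constructing the neighborhood graph, graph embedding methods simply assign each sample to a single class, which is not appropriate. To address this, a fuzzy progressive membership representation is proposed. Drawing on ideas from fuzzy mathematics, it uses fuzzy progressive membership degrees to assign samples to different classes. To address the low efficiency of classifiers in graph embedding methods, the collaborative representation classification method is introduced, which greatly improves the computational efficiency of the algorithm. Based on these two points, a feature extraction algorithm based on collaborative representation and fuzzy progressive maximal marginal embedding is proposed. Experiments on the ORL and AR face databases and on the USPS handwritten digit database show that the algorithm outperforms principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), and marginal Fisher analysis (MFA).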

12.

An efficient RDF graph pattern matching algorithm. Cited by: 5 (self-citations: 0, citations by others: 5)

Jinling Wang Beihong Jin Jing Li 《Journal of Computer Research and Development》2005,42(10):1763-1770

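As more and more information is represented in RDF format, efficiently disseminating and filtering RDF information has become an important problem. In information dissemination systems for the Semantic Web, incoming RDF information must be matched against a large number of user subscriptions, and these subscriptions can be expressed as RDF graph patterns. Based on the characteristics of RDF graphs, and with some added constraints on them, a new RDF graph pattern matching algorithm is designed. Experimental results show that its matching efficiency is far higher than that of traditional graph pattern matching algorithms.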

13.

RDF keyword-expansion query method based on bipartite graphs

Zhiyun Zheng Zhentao Wang Xingjin Zhang Zhenfei Wang 《Computer Science》2016,43(11):272-279

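Representing RDF data as a graph preserves the association and semantic information among the data, and a growing number of keyword query methods process RDF queries on graph structures. This paper combines bipartite graphs with RDF data graphs, defines an RDF bipartite graph model, and proposes KERBG, a bipartite-graph-based keyword-expansion query method for RDF. The method encapsulates text information in the vertex labels of the bipartite graph to support queries on relationships; it applies keyword synonym expansion to semantically expand the query keywords, which addresses the variety of words used to describe the same object and thereby improves precision; and it uses the antisymmetric adjacency matrix of the RDF bipartite graph and its power matrices to construct result subgraphs containing the key vertices, which realizes keyword query processing and reduces query response time. Experimental results show that KERBG outperforms current mainstream methods in precision and query response time.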

14.

Mapping fuzzy RDF(S) into fuzzy object-oriented databases

Tianyi Fan Li Yan Zongmin Ma 《International Journal of Intelligent Systems》2019,34(10):2607-2632

15.

Rewriting rules to permeate complex similarity and fuzzy queries within a relational database system

Penzo W. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(2):255-270

In recent years, the availability of complex data repositories (e.g., multimedia, genomic, semistructured databases) has paved the way to new potentials as to data querying. In this scenario, similarity and fuzzy techniques have proven to be successful principles for effective data retrieval. However, most proposals are domain specific and lack a general and integrated approach to deal with generalized complex queries, i.e., queries where multiple conditions are expressed, possibly on complex as well as on traditional data. To overcome such limitations, much work has been devoted to the development of middleware systems to support query processing on multiple repositories. Along similar lines, we present a formal framework to permeate complex similarity and fuzzy queries within a relational database system. As an example, we focus on multimedia data, which is represented in an integrated view with common database data. We have designed an application layer that relies on an algebraic query language, extended with MM-tailored operators, and that maps complex similarity and fuzzy queries to standard SQL statements that can be processed by a relational database system, exploiting standard facilities of modern extensible RDBMS. To show the applicability of our proposal, we implemented a prototype that provides the user with rich query capabilities, ranging from traditional database queries to complex queries gathering a mixture of Boolean, similarity, and fuzzy predicates on the data.

16.

A subgraph query method supporting dynamic graph data

Nan Wang Bin Wang Xiaohua Li Xiaochun Yang 《Journal of Frontiers of Computer Science and Technology》2014,(2):139-149

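In recent years, subgraph query has attracted wide attention from researchers in China and abroad as an important topic in graph database management. In real applications most graph data is updated frequently, yet existing methods incur a high maintenance cost under frequent updates. Subgraph query is itself an NP-complete problem, and it becomes even harder on dynamic graph data. To address these problems, a subgraph query method supporting dynamic graph data is proposed. The method first constructs a topological level sequence for each graph as its index, adding labels to the sequence so that the index can be maintained after data updates; it then filters out a candidate set according to the matching relationships between sequences; finally, it verifies the graphs in the candidate set with a graph isomorphism algorithm to obtain the result set. The index is simple to construct and small, and it does not need to be rebuilt after the graph database is updated. The method supports subgraph queries on dynamic graph data and also performs well on static graph data.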

17.

Ultrawrap: SPARQL execution on relational data

《Journal of Web Semantics》2013

The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases, i.e. the execution of SPARQL queries on RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment. Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping and repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translation of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database.
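The approach of exposing relational data as RDF through SQL views can be illustrated with a small `sqlite3` sketch. It is an assumption-laden toy, not Ultrawrap itself: the `person` table, the `person/` and `person#` naming of subjects and predicates, and the view name `triples` are all hypothetical, and SQLite stands in for the commercial databases evaluated in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO person VALUES (1, 'Alice', 'Paris'), (2, 'Bob', 'Oslo');
-- One UNION ALL branch per column exposes each cell as an (s, p, o) row.
CREATE VIEW triples AS
    SELECT 'person/' || id AS s, 'person#name' AS p, name AS o FROM person
    UNION ALL
    SELECT 'person/' || id AS s, 'person#city' AS p, city AS o FROM person;
""")

# Syntactic translation of a two-pattern query with bound predicates:
#   ?s person#name ?n .  ?s person#city "Paris" .
ROWS = conn.execute(
    "SELECT t1.o FROM triples t1, triples t2 "
    "WHERE t1.s = t2.s AND t1.p = 'person#name' "
    "AND t2.p = 'person#city' AND t2.o = ?",
    ("Paris",),
).fetchall()
TRIPLE_COUNT = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

Because the query runs against the view, any rewriting that removes the irrelevant `UNION ALL` branches or collapses the self-join back onto `person` is left to the database optimizer, which is the division of labor the abstract describes.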

18.

Semi-intuitionistic fuzzy graphs and their applications

Xianfeng Yu 《Computer Engineering and Applications》2016,52(18):88-91

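Objects are taken as the vertex set, and the correlation and non-correlation between objects, described by intuitionistic fuzzy numbers, are represented as intuitionistic fuzzy edges; on this basis a semi-intuitionistic fuzzy graph model is established. Concepts such as spanning subgraph, degree, path, correlation cut graph, order relation, and maximum spanning tree are defined for semi-intuitionistic fuzzy graphs. A clustering analysis algorithm based on semi-intuitionistic fuzzy graphs is given and its complexity is analyzed. A clustering analysis on a classic example shows that the algorithm has lower complexity than general intuitionistic fuzzy clustering algorithms, and that it is efficient, practical, and highly automated.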

19.

Research on a keyword-based query model for RDF data graphs

Zhiyun Zheng Bo Liu Lun Li Zhenfei Wang 《Computer Science》2015,42(7):234-239, 249

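With the massive emergence of Semantic Web data, the efficiency of querying RDF graph data is receiving more attention, and querying RDF data graphs directly by keyword matching has become a research hotspot. To address problems that are common in keyword queries, such as redundant and off-target results, a keyword-based query model for RDF data graphs is proposed. The model first applies the proposed iteration-based graph query algorithm (ISGR) to perform subgraph matching on the query keywords, obtaining a unique and maximal set of result subgraphs; then, using the structural information between the keyword graph and the result subgraphs together with a statistical language model, it gives a ranking method for result subgraphs (SimLM). Comparative experiments show that the proposed query model and ranking method outperform traditional models in consistency and relevance.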

20.

Federated knowledge graph query based on data summaries sampled by graph structure features

Feng Gao Qiu Li Jinguang Gu 《Computer Engineering》2023,49(1):73-81

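Federated SPARQL queries are executed under the guidance of a constructed query plan. The data summary index file captures the structural and semantic information of the RDF datasets and is essential for estimating sub-query cardinality during query plan generation. Existing data summary generation methods need to traverse the complete data of each data source remotely, which is costly, and in most environments federated queries cannot complete statistics collection over large datasets. To capture count information that is as accurate as possible while reducing the generation time and memory overhead of the data summary index file, and taking the distribution skew of subjects and predicates into account, a method is proposed that uses a sample graph to generate an approximate data summary of the original graph. A sampling method weighted by the out-degree feature of the RDF graph obtains a representative sample graph of the original graph, and an improved mapping function maps the information in the sample graph onto the original graph, thereby generating the approximate data summary. Experimental results show that the method saves at least 70% of the data summary index file generation time compared with the baseline methods, and that an approximate summary generated by sampling only 0.5% of the original graph remains highly consistent with the baselines in query accuracy.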