Similar literature
 Found 20 similar articles; search took 62 ms
1.
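Distributed RDF data partitioning methods can address the partitioning and storage of large-scale RDF data. To support distributed storage of RDF data and improve partitioning efficiency, this paper proposes a partitioning method based on a greedy strategy. A heuristic greedy strategy first takes the load balance of the subgraphs into account and successively selects the vertex with the highest degree, or with a relatively high degree, and places it in the same subgraph; adjacent vertices are then optimized. A partitioning strategy next assigns the subgraphs to the corresponding nodes, where they are stored in a Neo4j database, and the corresponding indexes are built and saved in a Redis database. Experiments compare several data partitioning algorithms, as well as graph-database and relational-database storage schemes for RDF data, and verify the effectiveness of the proposed RDF graph storage scheme and partitioning algorithm.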
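As a rough illustration of the greedy idea in this abstract, the sketch below visits vertices in descending order of degree and places each one in the partition that already holds most of its neighbours, subject to a per-partition capacity limit. The names `greedy_partition` and `edge_cut`, the capacity rule, and the tie-breaking are assumptions made for illustration. The abstract does not specify the paper's actual algorithm, its neighbour-optimization step, or its Neo4j and Redis storage layer, and none of those are reproduced here.

```python
from collections import defaultdict
import math


def greedy_partition(edges, k):
    """Assign each vertex to one of k partitions (hypothetical heuristic)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Assumed load-balance rule: no partition may exceed ceil(n / k) vertices.
    capacity = math.ceil(len(adj) / k)
    sizes = [0] * k
    part = {}

    # Visit the highest-degree vertices first; ties are broken by name
    # so that the result is deterministic.
    for v in sorted(adj, key=lambda x: (-len(adj[x]), str(x))):
        # Count the already-placed neighbours of v in each partition.
        gain = [0] * k
        for n in adj[v]:
            if n in part:
                gain[part[n]] += 1
        # Among partitions with room, prefer the one holding the most
        # neighbours, then the least loaded, then the lowest index.
        candidates = [p for p in range(k) if sizes[p] < capacity]
        best = max(candidates, key=lambda p: (gain[p], -sizes[p], -p))
        part[v] = best
        sizes[best] += 1
    return part


def edge_cut(edges, part):
    """Number of edges whose endpoints fall in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])
```

Calling `greedy_partition(edges, k)` on a list of `(subject, object)` pairs returns a vertex-to-partition mapping, and `edge_cut` gives one simple measure for comparing it against other strategies. A single greedy pass like this can leave many edges cut, which is presumably why the abstract mentions a follow-up optimization of adjacent vertices.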

2.
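KGDB: A Knowledge Graph Database Management System with a Unified Model and Language   Total citations: 2 (self-citations: 0, citations by others: 2)
Knowledge graphs are an important cornerstone of artificial intelligence. They currently have two main data models, RDF graphs and property graphs, with several query languages built on top of them: SPARQL for RDF graphs and, chiefly, Cypher for property graphs. Over the past decade, different communities have developed separate data management methods for RDF graphs and property graphs, and the lack of a unified data model and query language has limited wider application of knowledge graphs. KGDB (Knowledge Graph Database) is a knowledge graph database management system with a unified model and language: (1) building on the relational model, it proposes a unified storage scheme that supports efficient storage of both RDF graphs and property graphs and meets the storage and query workload requirements of knowledge graph data; (2) it uses a characteristic-set-based clustering method to solve the storage of untyped triples; (3) it achieves interoperability between the two knowledge graph query languages SPARQL and Cypher, so that both can operate on the same knowledge graph. Extensive experiments on real and synthetic datasets show that, compared with existing knowledge graph database management systems, KGDB offers both more efficient storage management and higher query efficiency. On average KGDB saves 30% of storage space relative to gStore and Neo4j; experiments on basic graph pattern queries show that its query speed on real datasets is generally higher than that of gStore and Neo4j, by up to two orders of magnitude.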

3.
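Based on an analysis of the domain ontology for civil aviation emergency management and its storage characteristics, this paper proposes a Neo4j-based method for storing domain-ontology RDF graph data. It studies the relationship between the directed labeled graph structure of domain-ontology RDF and the storage model of the Neo4j graph database and, using instance queries over the civil aviation emergency management ontology, gives the mapping between RDF graphs and Neo4j and its implementation. Experiments verify that the Neo4j graph database satisfies queries over domain-ontology RDF graph data while further improving query efficiency, providing methodological support for semantic retrieval and reasoning over RDF graph data on big data platforms.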

4.
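Computer Engineering (《计算机工程》), 2018, (3): 138-143
Efficient management of massive knowledge is a prerequisite for effective network monitoring and early warning. To this end, a graph-database-based storage method for large-scale Resource Description Framework (RDF) data is proposed. Based on the graph-model characteristics of RDF data, the dataset is partitioned using a heuristic greedy strategy comprising a subgraph generation phase and a subgraph partitioning phase, while dynamic replication and deletion of hot data is used to balance the load of dynamic data streams. Comparative experiments on three different datasets show that the storage performance of this method is better than that of relational-database-based methods.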

5.
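Distributed storage is a fairly effective way to store large-scale data, and data partitioning is a prerequisite for distributed storage. In response to ever-growing RDF data, this paper proposes an RDF Graph Partitioning algorithm based on Double Objective Optimization (RGPDOO). RGPDOO merges two graph partitioning metrics, edge cut and partition balance, into a single objective function and uses that function to perform both static and dynamic partitioning of RDF graphs. Static partitioning makes an initial partition of the graph that divides its vertices into three classes: kernel vertices, crossing vertices, and free vertices. Crossing and free vertices are then assigned by computing the gain in the objective function. For dynamic partitioning, solutions are given for the insertion and deletion of RDF tuples. In addition, to keep meeting the partitioning objective, the algorithm performs a dynamic adjustment at a fixed interval T based on the balance and tightness of the subgraphs. Experiments on synthetic and real datasets compare the algorithm with several general-purpose static and dynamic graph partitioning algorithms. The results show that the proposed algorithm can effectively perform static and dynamic partitioning of RDF graphs.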

6.
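Yang Cheng, Lu Jiamin, Feng Jun. Journal of Computer Applications (《计算机应用》), 2020, 40(11): 3184-3191
As knowledge graphs develop and are widely applied across vertical domains, efficient processing of Resource Description Framework (RDF) data has become a new topic in modern big data management. RDF is a data model proposed by the W3C for describing knowledge graph entities and the relationships between them. To cope with storing and querying large-scale RDF data, many researchers have considered managing RDF data in a distributed environment. The key problem in distributed storage of RDF data is data partitioning, and the partitioning result largely determines SPARQL query performance. From the perspective of data partitioning, the survey discusses two classes of methods in depth: graph-structure-based RDF partitioning and semantics-based RDF partitioning. The former includes multi-granularity hierarchical partitioning, template partitioning, and clustering partitioning, and suits general-domain queries whose semantic scope is broad; the latter includes hash partitioning, vertical partitioning, and pattern partitioning, and is better suited to vertical-domain queries whose semantic scope is relatively fixed. Several typical partitioning methods are also compared and analyzed to provide a reference for future research on RDF data partitioning. Finally, future directions for RDF data partitioning methods are summarized.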
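To make two of the partitioning families named in this abstract more concrete, the sketch below shows hash partitioning and vertical partitioning in the form they commonly take in the RDF literature: triples are routed to a node by hashing the subject, or grouped into one two-column table per predicate. The function names, the choice of MD5 as a stable hash, and the sample triples are assumptions for illustration; the survey's own definitions may differ in detail.

```python
import hashlib
from collections import defaultdict


def hash_partition(triples, num_nodes):
    """Place each triple on a node chosen by hashing its subject."""
    nodes = defaultdict(list)
    for s, p, o in triples:
        # A stable digest is used because Python's built-in hash() of a
        # string changes between interpreter runs.
        digest = hashlib.md5(s.encode("utf-8")).hexdigest()
        nodes[int(digest, 16) % num_nodes].append((s, p, o))
    return dict(nodes)


def vertical_partition(triples):
    """Group triples into one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


# Hypothetical sample data, not taken from the survey.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
by_node = hash_partition(triples, num_nodes=4)
by_predicate = vertical_partition(triples)
```

Subject hashing keeps all triples about one subject on the same node, which favours star-shaped SPARQL patterns. Predicate tables favour queries that name a fixed predicate, but a multi-predicate pattern then needs joins across tables.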

8.
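With the development of the Semantic Web, storing and querying RDF data has become a problem in urgent need of a solution. To this end, the paper presents the architecture of a DHT-based P2P network for storing RDF data, describes the RDF model graph and query graph, and proposes algorithms for query processing and optimization in a distributed context.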

9.
10.
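Zheng Zhiyun, Liu Bo, Li Lun, Wang Zhenfei. Computer Science (《计算机科学》), 2015, 42(7): 234-239, 249
With the massive emergence of Semantic Web data, query efficiency over RDF graphs has drawn growing attention, and querying RDF data graphs directly through keyword matching has become a research hotspot. To address the redundant and off-target results common in keyword queries, a keyword-based query model for RDF data graphs is proposed. The model first applies the proposed iteration-based graph query algorithm (ISGR) to match subgraphs against the query keywords, yielding a unique and maximal set of result subgraphs; then, using structural information between the keyword graph and the result subgraphs together with a statistical language model, it gives a ranking method for result subgraphs (SimLM). Comparative experiments show that the proposed query model and ranking method outperform traditional models in consistency and relevance.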

11.
This paper is a historical overview of graph-based methodologies in Pattern Recognition over the last 40 years; the history is interpreted with the aim of recognizing the rationale behind the papers published in these years, so as to classify them roughly. Despite the extent of scientific production in this field, it is possible to identify three historical periods, each with its own connotation common to most of the corresponding papers, referred to here as the pure, the impure, and the extreme periods.

12.
Graphs are a powerful and popular representation formalism in pattern recognition. Particularly in the field of document analysis they have found widespread application. From the formal point of view, however, graphs are quite limited in the sense that the majority of mathematical operations needed to build common algorithms, such as classifiers or clustering schemes, are not defined. Consequently, we observe a severe lack of algorithmic procedures that can directly be applied to graphs. There exists recent work, however, aimed at overcoming these limitations. The present paper first provides a review of the use of graph representations in document analysis. Then we discuss a number of novel approaches suitable for making tools from statistical pattern recognition available to graphs. These novel approaches include graph kernels and graph embedding. With several experiments, using different data sets from the field of document analysis, we show that the new methods have great potential to outperform traditional procedures applied to graph representations.

13.
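Graphs are widely used in many fields of science and engineering, such as pattern recognition and computer vision, and have strong expressive power. When graphs are used to represent the structure of objects, measuring the similarity of objects becomes computing the similarity of two graphs; this is graph matching. In recent decades, research on graph matching techniques and algorithms has become an important topic, and with the arrival of the big data era, graphs, as a representation of relationships among data, will attract more and more attention. This paper surveys the state of graph matching techniques, introduces their theoretical foundations in detail, and reviews several mainstream approaches to the graph matching problem. Finally, the performance of several algorithms is compared and analyzed in the context of a specific application of graph matching.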

14.
In this paper, we formulate a novel question on maximum flow queries. Specifically, the problem is to find which k edges would have the largest impact on a maximum flow query on a network. This problem has important applications in areas such as social networks and network planning. We show the inapproximability of the problem and present our heuristic algorithms. Experimental evaluations are carried out on real datasets, and the results show that our algorithms are scalable and return high-quality solutions.

15.
16.
In [A. García, C. Hernando, F. Hurtado, M. Noy, J. Tejel, Packing trees into planar graphs, J. Graph Theory (2002) 172-181] García et al. conjectured that for every two non-star trees there exists a planar graph containing them as edge-disjoint subgraphs. In this paper we prove the conjecture in the case in which one of the trees is a spider tree.

17.
We propose the use of annotations as a way to flexibly enrich a domain of interest with information concerning different contexts of use for its elements. We provide a formal model of annotation in the framework of typed graphs, in which the presence of annotations is reified through nodes and edges of specific types, relating nodes from different domains. This allows the flexible activation and de-activation of annotations, as well as the addition of several annotations from different domains on the same element. We show that annotations give rise to a category, where pushouts are the basic construct for the composition of annotation-related processes. We prove some properties of annotated graphs and discuss examples drawn from several fields.

18.
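Spatiotemporal graph modeling is fundamental to analyzing the spatial relationships and temporal trends of the elements of a graph-structured system. Traditional spatiotemporal graph modeling methods mine spatial relationships mainly from an explicit structure in which the relationships between nodes are fixed, which severely limits model flexibility. Moreover, traditional methods that ignore spatiotemporal dependencies between nodes cannot capture long-term spatiotemporal trends. To overcome these shortcomings, a new graph neural network model for spatiotemporal graph modeling is proposed: the Graph Wavelet Convolutional Neural Network for Spatiotemporal Graph Modeling (GWNN-STGM). GWNN-STGM contains a graph wavelet convolutional layer into which an adaptive adjacency matrix is introduced for node embedding learning, so that the model can automatically discover hidden structural information from the dataset without prior knowledge of the structure. GWNN-STGM also contains stacked dilated causal convolution layers, whose receptive field grows exponentially with the number of layers, allowing the model to handle long sequences. GWNN-STGM integrates the graph wavelet convolutional layer and the dilated causal convolution layers effectively. Experiments on public traffic network datasets show that GWNN-STGM outperforms the other baseline models, which suggests that the graph wavelet convolutional neural network model has great potential for exploring spatiotemporal structure in input data.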
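The claim that the receptive field grows exponentially with depth can be checked with a short calculation. For stacked causal convolutions with kernel size K and per-layer dilations d_1, ..., d_L, the receptive field is 1 + (K - 1)(d_1 + ... + d_L). The sketch below evaluates that expression; the doubling dilation schedule and kernel size 2 are assumptions chosen for illustration, since the abstract does not give the model's actual hyperparameters.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule: the dilation doubles at every layer (1, 2, 4, 8).
doubling = receptive_field(2, [1, 2, 4, 8])

# Same depth without dilation, for comparison.
undilated = receptive_field(2, [1, 1, 1, 1])
```

With doubling dilations the four-layer stack covers 16 time steps, whereas four undilated layers cover only 5, so each added layer doubles the covered history instead of extending it by one step.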

19.
In this paper, we investigate the use of heat kernels as a means of embedding the individual nodes of a graph in a vector space. The reason for turning to the heat kernel is that it encapsulates information concerning the distribution of path lengths and hence node affinities on the graph. The heat kernel of the graph is found by exponentiating the Laplacian eigensystem over time. In this paper, we explore how graphs can be characterized in a geometric manner using embeddings into a vector space obtained from the heat kernel. We explore two different embedding strategies. The first of these is a direct method in which the matrix of embedding co-ordinates is obtained by performing a Young–Householder decomposition on the heat kernel. The second method is indirect and involves performing a low-distortion embedding by applying multidimensional scaling to the geodesic distances between nodes. We show how the required geodesic distances can be computed using parametrix expansion of the heat kernel. Once the nodes of the graph are embedded using one of the two alternative methods, we can characterize them in a geometric manner using the distribution of the node co-ordinates. We investigate several alternative methods of characterization, including spatial moments for the embedded points, the Laplacian spectrum for the Euclidean distance matrix and scalar curvatures computed from the difference in geodesic and Euclidean distances. We experiment with the resulting algorithms on the COIL database.
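The phrase "exponentiating the Laplacian eigensystem over time" and the direct embedding can be written out as follows. This is a sketch in standard notation, not a transcription of the paper: the symbols $L$, $\Phi$, $\Lambda$ and $Y$ are introduced here, and the abstract does not say whether $L$ is the combinatorial or the normalized Laplacian.

```latex
% Spectral decomposition of the graph Laplacian (symmetric, so \Phi is orthogonal)
L = \Phi \Lambda \Phi^{\top}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)

% The heat kernel solves the heat equation on the graph
\frac{\partial h_t}{\partial t} = -L\, h_t, \qquad
h_t = e^{-tL} = \Phi\, e^{-t\Lambda}\, \Phi^{\top}

% Entry for nodes u and v, with \phi_i the i-th eigenvector
h_t(u, v) = \sum_{i=1}^{n} e^{-\lambda_i t}\, \phi_i(u)\, \phi_i(v)

% Young-Householder factorization: column u of Y is the coordinate vector of node u
h_t = Y^{\top} Y, \qquad
Y = e^{-\frac{1}{2} t \Lambda}\, \Phi^{\top}
```

The factorization holds because $Y^{\top} Y = \Phi\, e^{-\frac{1}{2} t \Lambda} e^{-\frac{1}{2} t \Lambda}\, \Phi^{\top} = \Phi\, e^{-t\Lambda}\, \Phi^{\top}$, so the inner product of two node coordinate vectors reproduces the corresponding heat-kernel entry.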

20.
Motivated by a problem of targeted advertising in social networks, we introduce a new model of online learning on labeled graphs where the graph is initially unknown and the algorithm is free to choose which vertex to predict next. For this learning model, we define an appropriate measure of regularity of a graph labeling called the merging degree. In general, the merging degree of a graph is small when its vertices can be partitioned into a few well-separated clusters within which labels are roughly constant. For the special case of binary labeled graphs, the merging degree is a more refined measure than the cutsize. After observing that natural nonadaptive exploration/prediction strategies, like depth-first with majority vote, do not behave satisfactorily on graphs with small merging degree, we introduce an efficiently implementable adaptive strategy whose cumulative loss is controlled by the merging degree. A matching lower bound shows that in the case of binary labels our analysis cannot be improved.
