首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Modern applications requiring spatial network processing pose several interesting query optimization challenges. Spatial networks are usually represented as graphs, and therefore, queries involving a spatial network can be executed by using the corresponding graph representation. This means that the cost for executing a query is determined by graph properties such as the graph order and size (i.e., number of nodes and edges) and other graph parameters. In this paper, we present novel methods to estimate the number of nodes and edges in regions of interest in spatial networks, towards predicting the space and time requirements for range queries. The methods are evaluated by using real-life and synthetic data sets. Experimental results show that the number of nodes and edges can be estimated efficiently and accurately, with relatively small space requirements, thus providing useful information to the query optimizer.  相似文献   

2.
Graphs are widely used for modeling complicated data such as social networks, bibliographical networks and knowledge bases. The growing sizes of graph databases motivate the crucial need for developing powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language enables the expression of different types of graph queries that are of large interest in the databases that are modeled as large graph such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes/edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe an efficient hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph are stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database (using SQL) while the execution of other parts of the query plan is processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.  相似文献   

3.
PEGASUS: mining peta-scale graphs   总被引:2,自引:1,他引:1  
In this paper, we describe PeGaSus, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, finding the connected components, and computing the importance score of nodes. As the size of graphs reaches several Giga-, Tera- or Peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PeGaSus is the first such library, implemented on the top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially a repeated matrix-vector multiplication. In this paper, we describe a very important primitive for PeGaSus, called GIM-V (generalized iterated matrix-vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines, (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with ≈ 6.7 billion edges.  相似文献   

4.
We study the properties of the principal eigenvector for the adjacency matrix (and related matrices) for a general directed graph. In particular—motivated by the use of the eigenvector for estimating the “importance” of the nodes in the graph—we focus on the distribution of positive weight in this eigenvector, and give a coherent picture which builds upon and unites earlier results. We also propose a simple method—“T-Rank”—for generating importance scores. T-Rank generates authority scores via a one-level, non-normalized matrix, and is thus distinct from known methods such as PageRank (normalized), HITS (two-level), and SALSA (two-level and normalized). We show, using our understanding of the principal eigenvector, that T-Rank has a much less severe “sink problem” than does PageRank. Also, we offer numerical results which quantify the “tightly-knit community” or TKC effect. We find that T-Rank has a stronger TKC effect than PageRank, and we offer a novel interpolation method which allows for continuous tuning of the strength of this TKC effect. Finally, we propose two new “sink remedies”, i.e., methods for ensuring that the principal eigenvector is positive everywhere. One of our sink remedies (source pumping) is unique among sink remedies, in that it gives a positive eigenvector without rendering the graph strongly connected. We offer a preliminary evaluation of the effects and possible applications of these new sink remedies.  相似文献   

5.
As the amount of text data grows explosively, an efficient index structure for large text databases becomes ever important. The n-gram inverted index (simply, the n-gram index) has been widely used in information retrieval or in approximate string matching due to its two major advantages: language-neutral and error-tolerant. Nevertheless, the n-gram index also has drawbacks: the size tends to be very large, and the performance of queries tends to be bad. In this paper, we propose the two-level n-gram inverted index (simply, the n-gram/2L index) that significantly reduces the size and improves the query performance by using the relational normalization theory. We first identify that, in the (full-text) n-gram index, there exists redundancy in the position information caused by a non-trivial multivalued dependency. The proposed index eliminates such redundancy by constructing the index in two levels: the front-end index and the back-end index. We formally prove that this two-level construction is identical to the relational normalization process. We call this process structural optimization of the n-gram index. The n-gram/2L index has excellent properties: (1) it significantly reduces the size and improves the performance compared with the n-gram index with these improvements becoming more marked as the database size gets larger; (2) the query processing time increases only very slightly as the query length gets longer. Experimental results using real databases of 1 GB show that the size of the n-gram/2L index is reduced by up to 1.9–2.4 times and, at the same time, the query performance is improved by up to 13.1 times compared with those of the n-gram index. We also compare the n-gram/2L index with Makinen’s compact suffix array (CSA) (Proc. 11th Annual Symposium on Combinatorial Pattern Matching pp. 305–319, 2000) stored in disk. Experimental results show that the n-gram/2L index outperforms the CSA when the query length is short (i.e., less than 15–20), and the CSA is similar to or better than the n-gram/2L index when the query length is long (i.e., more than 15–20).  相似文献   

6.
The ever increasing size of graphs makes them difficult to query and store. In this paper, we present Shrink, a compression method that reduces the size of the graph while preserving the distances between the nodes. The compression is based on the iterative merging of the nodes. During each merging, a system of linear equations is solved to define new edge weights in a way that the new weights have the least effect on the distances. Merging nodes continues until the desired size for the compressed graph is reached. The compressed graph, also known as the coarse graph, can be queried without decompression. As the complexity of distance-based queries such as shortest path queries is highly dependent on the size of the graph, Shrink improves the performance in terms of time and storage. Shrink not only provides the length of the shortest path but also identifies the nodes on the path. The approach has been applied to both weighted and unweighted graphs including road network, friendship network, collaboration network, web graph and social network. In the experiment, a road network with more than 2.5 million nodes is reduced to fifth while the average relative error is less than 1%.  相似文献   

7.
k步可达查询用于在给定的有向无环图(DAG)中回答两点之间是否存在长度不超过k的路径。针对现有方法的索引规模大、查询处理效率低的问题,提出一种基于部分点的双向最短路径索引来提升索引的可达信息覆盖率,并提出一组优化规则来减小索引规模;然后提出基于简化图的正反互逆拓扑索引来加速回答不可达查询;最后提出远距离优先的双向遍历策略来提高查询处理的效率。基于21个真实数据集(如引用网络、社交网络等)的实验结果表明,相比已有的高效方法PLL及BFSI-B,所提出的算法具有更小的索引规模和更快的查询响应速度。  相似文献   

8.
Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability tradeoff indexing time and space versus query time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.  相似文献   

9.
10.
Given a graph with edges colored Red and Blue, we study the problem of sampling and approximately counting the number of matchings with exactly k Red edges. We solve the problem of estimating the number of perfect matchings with exactly k Red edges for dense graphs. We study a Markov chain on the space of all matchings of a graph that favors matchings with k Red edges. We show that it is rapidly mixing using non-traditional canonical paths that can backtrack. We show that this chain can be used to sample matchings in the 2-dimensional toroidal lattice of any fixed size with k Red edges, where the horizontal edges are Red and the vertical edges are Blue. An extended abstract appeared in J.R. Correa, A. Hevia and M.A. Kiwi (eds.) Proceedings of the 7th Latin American Theoretical Informatics Symposium, LNCS 3887, pp. 190–201, Springer, 2006. N. Bhatnagar’s and D. Randall’s research was supported in part by NSF grants CCR-0515105 and DMS-0505505. V.V. Vazirani’s research was supported in part by NSF grants 0311541, 0220343 and CCR-0515186. N. Bhatnagar’s and E. Vigoda’s research was supported in part by NSF grant CCR-0455666.  相似文献   

11.
Spatiotemporal objects – that is, objects that evolve over time – appear in many applications. Due to the nature of such applications, storing the evolution of objects through time in order to answer historical queries (queries that refer to past states of the evolution) requires a very large specialized database, what is termed in this article a spatiotemporal archive. Efficient processing of historical queries on spatiotemporal archives requires equally sophisticated indexing schemes. Typical spatiotemporal indexing techniques represent the objects using minimum bounding regions (MBR) extended with a temporal dimension, which are then indexed using traditional multidimensional index structures. However, rough MBR approximations introduce excessive overlap between index nodes, which deteriorates query performance. This article introduces a robust indexing scheme for answering spatiotemporal queries more efficiently. A number of algorithms and heuristics are elaborated that can be used to preprocess a spatiotemporal archive in order to produce finer object approximations, which, in combination with a multiversion index structure, will greatly improve query performance in comparison to the straightforward approaches. The proposed techniques introduce a query efficiency vs. space tradeoff that can help tune a structure according to available resources. Empirical observations for estimating the necessary amount of additional storage space required for improving query performance by a given factor are also provided. Moreover, heuristics for applying the proposed ideas in an online setting are discussed. Finally, a thorough experimental evaluation is conducted to show the merits of the proposed techniques. Edited by B. Seeger A short version of this article appeared as “Efficient indexing of spatiotemporal objects” in the Proceedings of Extending Database Technology 2002 [19]. This work was partially supported by NSF grants IIS-9907477, EIA-9983445, NSF IIS 9984729, NSF ITR 0220148, NSF IIS-0133825, NRDRP, and the U.S. Department of Defense.  相似文献   

12.
Databases with uncertainty and lineage   总被引:2,自引:0,他引:2  
This paper introduces uldbs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, however many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the uldb representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of uldb minimality—data-minimal and lineage-minimal—and study minimization of uldb representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. We also show how uldbs enable a new approach to query processing in probabilistic databases. Finally, we describe the current state of the Trio system, our implementation of uldbs under development at Stanford. This work was supported by the National Science Foundation under grants IIS-0324431, IIS-1098447, and IIS-9985114, by DARPA Contract #03-000225, and by a grant from the Boeing Corporation.  相似文献   

13.
须德  张彤 《软件学报》1993,4(2):58-64
本文对半连接运算进行扩展,提出一个新的循环查询求解方法——标志位映射法,该方法能将循环查询中的所有关系完全化简,代价为5n—4次相邻结点间的数据传输,其中n为查询图中的结点数。  相似文献   

14.
We introduce a new abstract model of database query processing, finite cursor machines, that incorporates certain data streaming aspects. The model describes quite faithfully what happens in so-called “one-pass” and “two-pass query processing”. Technically, the model is described in the framework of abstract state machines. Our main results are upper and lower bounds for processing relational algebra queries in this model, specifically, queries of the semijoin fragment of the relational algebra.  相似文献   

15.
A new visual language, which is called ER-QBE and uses a conceptual model semantics, is proposed for the development of XML views. ER-QBE is based on graph queries that are trees of parametric SQL queries. The ER-QBE language is characterized by formally proved completeness, its expressiveness is higher than that of well-known alternatives, and it supports both relational and object-relational databases. __________ Translated from Kibernetika i Sistemnyi Analiz, No. 2, pp. 171–183, March–April 2008.  相似文献   

16.
Reverse nearest neighbors in large graphs   总被引:3,自引:0,他引:3  
A reverse nearest neighbor (RNN) query returns the data objects that have a query point as their nearest neighbor (NN). Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we provide a fundamental lemma, which can be used to prune the search space while traversing the graph in search for RNN. Based on it, we develop two RNN methods; an eager algorithm that attempts to prune network nodes as soon as they are visited and a lazy technique that prunes the search space when a data point is discovered. We study retrieval of an arbitrary number k of reverse nearest neighbors, investigate the benefits of materialization, cover several query types, and deal with cases where the queries and the data objects reside on nodes or edges of the graph. The proposed techniques are evaluated in various practical scenarios involving spatial maps, computer networks, and the DBLP coauthorship graph.  相似文献   

17.
标签图常用于智能交通网、生物信息网等新兴领域的建模。子图查询作为图数据分析的关键问题,引起了研究者的广泛关注。对现有子图查询算法的研究发现,随着图数据规模增大且频繁更新,传统子图查询算法普遍存在查询效率低,存储开销大,忽略顶点标签信息等问题。为此,提出了一种支持大规模动态标签图子图查询的层次序列索引(Dynamic Hierarchical Sequence,DHS),该索引提取数据图中带有顶点编号的层次拓扑序列关系以实现子图查询;针对图的动态变化,提出了更新点拓扑扩展式索引维护策略,仅从局部变化顶点及边开始进行增量式更新,大大降低了重建索引造成的巨大开销;提出了基于DHS索引的子图查询方法,仅需将查询图与数据图的层次序列进行匹配即可获得候选集,并在其上利用关系匹配策略获得最终查询结果。实验证明提出的方法在保证高效查询的同时降低了索引的创建及维护时间,提高了子图查询效率。  相似文献   

18.
The concept of a database skeleton which reflects both the user's conception of the real world and the system's understanding of the interrelationships among database entities is described. It consists of a conceptual schema (conceptual graphs) and a relational schema (information graph). With the aid of the database skeleton, fuzzy queries can be translated and disambiguated by analyzing the queries using the conceptual graphs of a database skeleton. The query language XQL is introduced, and the XQL translator is described in some detail.  相似文献   

19.
基于超像素的多主体图像交互分割   总被引:2,自引:0,他引:2       下载免费PDF全文
目的 为解决多主体图像的交互分割问题,在保证分割效果的前提上,提高分割的效率,达到实时交互修改分割结果的目的, 提出基于超像素的图像多主体交互分割算法.方法 基于图像的超像素构造一个多层流网络,利用用户交互绘制的简单笔画给出多主体分割的指导信息.流网络的边权值保证利用图割算法将图像分割成多个部分后,每个部分代表图像的一个主体.允许用户交互给出标记,实时修改分割结果,直到得到满意的多主体分割.结果 通过实验显示,本文方法能得到的满意多主体分割结果,而且时间效率较高.对分辨率为449×275的图像,算法能在1 s内给出结果,满足实时修改的要求.结论 基于超像素建立的图规模较小,能大大减少图割算法的运行时间,达到用户实时交互添加新笔画信息,交互地修正分割结果的目的.利用超像素的边界信息,用户只需输入比较简单的笔画信息,分割算法就能得到正确的多主体分割结果.  相似文献   

20.
With the exponential growth of moving objects data to the Gigabyte range, it has become critical to develop effective techniques for indexing, updating, and querying these massive data sets. To meet the high update rate as well as low query response time requirements of moving object applications, this paper takes a novel approach in moving object indexing. In our approach, we do not require a sophisticated index structure that needs to be adjusted for each incoming update. Rather, we construct conceptually simple short-lived index images that we only keep for a very short period of time (sub-seconds) in main memory. As a consequence, the resulting technique MOVIES supports at the same time high query rates and high update rates, trading this property for query result staleness. Moreover, MOVIES is the first main memory method supporting time-parameterized predictive queries. To support this feature, we present two algorithms: non-predictive MOVIES and predictive MOVIES. We obtain the surprising result that a predictive indexing approach—considered state-of-the-art in an external-memory scenario—does not scale well in a main memory environment. In fact, our results show that MOVIES outperforms state-of-the-art moving object indexes such as a main-memory adapted B x -tree by orders of magnitude w.r.t. update rates and query rates. In our experimental evaluation, we index the complete road network of Germany consisting of 40,000,000 road segments and 38,000,000 nodes. We scale our workload up to 100,000,000 moving objects, 58,000,000 updates per second and 10,000 queries per second, a scenario at a scale unmatched by any previous work.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号