Similar Documents
A total of 20 similar documents were found.
1.
In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2 whose edge labels are a subset of S. We propose the path-label transitive closure method to answer LCR queries: we transform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components with bipartite graphs. We also propose a Dijkstra-like algorithm to compute the path-label transitive closure by re-defining the “distance” of a path. Compared with existing solutions, we prove that our method is optimal in terms of search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure + online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query-processing cost is an NP-hard problem, and we therefore propose a sampling-based solution to find a sub-optimal partition. Moreover, we address the index maintenance issues that arise when answering LCR queries over dynamic graphs. Extensive experiments confirm the superiority of our method.
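For orientation, an online traversal restricted to edges whose labels lie in S is the simplest way to answer an LCR query without any index. A minimal Python sketch under that assumption (the graph encoding and names are illustrative, not the paper's):

    from collections import deque

    def lcr_reachable(graph, u1, u2, allowed_labels):
        """Online BFS baseline for a label-constraint reachability (LCR) query.

        graph: dict mapping a vertex to a list of (neighbor, edge_label) pairs.
        Returns True iff there is a directed path from u1 to u2 whose edge
        labels are all contained in allowed_labels.
        """
        if u1 == u2:
            return True
        allowed = set(allowed_labels)
        visited = {u1}
        queue = deque([u1])
        while queue:
            v = queue.popleft()
            for w, label in graph.get(v, []):
                if label in allowed and w not in visited:   # prune disallowed labels
                    if w == u2:
                        return True
                    visited.add(w)
                    queue.append(w)
        return False

    # Example: a -x-> b -y-> c
    g = {"a": [("b", "x")], "b": [("c", "y")]}
    print(lcr_reachable(g, "a", "c", {"x", "y"}))  # True
    print(lcr_reachable(g, "a", "c", {"x"}))       # False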

2.
Given the heterogeneity of complex graph data on the web, such as RDF linked data, it is likely that a user wishing to query such data will lack full knowledge of the structure of the data and of its irregularities. Hence, providing flexible querying capabilities that assist users in formulating their information-seeking requirements is highly desirable. In this paper we undertake a detailed theoretical investigation of query approximation, query relaxation, and their combination, for this purpose. The query language we adopt comprises conjunctions of regular path queries, thus encompassing recent extensions to SPARQL to allow for querying paths in graphs using regular expressions (SPARQL 1.1). To this language we add standard notions of query approximation based on edit distance, as well as query relaxation based on RDFS inference rules. We show how both of these notions can be integrated into a single theoretical framework and we provide incremental evaluation algorithms that run in polynomial time in the size of the query and the data, returning answers in ranked order of their ‘distance’ from the original query. We also combine for the first time these two disparate notions into a single ‘flex’ operation that simultaneously applies both approximation and relaxation to a query conjunct, providing even greater flexibility for users, but still retaining polynomial time evaluation complexity and the ability to return query answers in ranked order.
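The framework above covers full conjunctive regular path queries plus RDFS-based relaxation; the sketch below illustrates only the simplest case, a query that is a fixed sequence of edge labels, ranked by edit distance via a Dijkstra-style search over (vertex, query-position) states. The unit edit costs and all names are assumptions for the example, not the paper's operators:

    import heapq

    def approx_path_query(graph, start, query_labels):
        """Rank the nodes reachable from `start` by the minimum edit distance
        between `query_labels` and the label sequence of some path to them.

        graph: dict vertex -> list of (neighbor, edge_label) pairs.
        Returns {node: edit cost}; cost 0 means an exact match exists."""
        n = len(query_labels)
        dist = {(start, 0): 0}
        heap = [(0, start, 0)]          # (cost, vertex, consumed query symbols)
        answers = {}

        def relax(cost, v, pos):
            if cost < dist.get((v, pos), float("inf")):
                dist[(v, pos)] = cost
                heapq.heappush(heap, (cost, v, pos))

        while heap:
            cost, v, pos = heapq.heappop(heap)
            if cost > dist[(v, pos)]:
                continue                            # stale heap entry
            if pos == n and v not in answers:
                answers[v] = cost                   # whole query consumed at v
            if pos < n:
                relax(cost + 1, v, pos + 1)         # deletion: skip a query label
            for w, label in graph.get(v, []):
                if pos < n:                         # match (cost 0) / substitution (1)
                    relax(cost + (label != query_labels[pos]), w, pos + 1)
                relax(cost + 1, w, pos)             # insertion: extra graph edge
        return answers

    g = {"a": [("b", "knows")], "b": [("c", "worksWith")]}
    # 'c' gets cost 1: one substitution turns the query into an existing path.
    print(approx_path_query(g, "a", ["knows", "colleagueOf"]))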

3.
We study the problem of evaluating XPath queries over XML data that is stored in an RDBMS via schema-based shredding. The interaction between recursion (descendants-axis) in XPath queries and recursion in DTDs makes it challenging to answer XPath queries using an RDBMS. We present a new approach to translating XPath queries into SQL queries based on a notion of extended XPath expressions and a simple least fixpoint (LFP) operator. Extended XPath expressions are a mild extension of XPath, and the LFP operator takes a single input relation and is already supported by most commercial RDBMSs. We show that extended XPath expressions are capable of capturing both DTD recursion and XPath queries in a uniform framework. Furthermore, they can be translated into an equivalent sequence of SQL queries with the LFP operator. We present algorithms for rewriting XPath queries over a (possibly recursive) DTD into extended XPath expressions and for translating extended XPath expressions to SQL queries, as well as optimization techniques. The novelty of our approach consists in its capability to answer a large class of XPath queries by means of only low-end RDBMS features already available in most RDBMSs, as well as its flexibility to accommodate existing relational query optimization techniques. In addition, these translation algorithms provide a solution to query answering for certain (possibly recursive) XML views of XML data. Our experimental results verify the effectiveness of our techniques. An extended abstract was presented at the 31st International Conference on Very Large Data Bases (VLDB), 2005.
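The least-fixpoint step that such a translation targets is available in most RDBMSs as a recursive common table expression. A toy illustration in Python with SQLite, computing the descendants-axis of a node over a shredded edge relation (the schema and the query are invented for the example, not the output of the paper's translation):

    import sqlite3

    # Toy shredded storage: each row is a parent/child pair plus the element tag.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE edge(parent INTEGER, child INTEGER, tag TEXT);
        INSERT INTO edge VALUES (1, 2, 'part'), (2, 3, 'part'),
                                (3, 4, 'name'), (1, 5, 'name');
    """)

    # A least-fixpoint computation of the descendant axis of node 1,
    # expressed as a recursive common table expression.
    rows = con.execute("""
        WITH RECURSIVE descendants(node) AS (
            SELECT child FROM edge WHERE parent = 1
            UNION
            SELECT e.child FROM edge e JOIN descendants d ON e.parent = d.node
        )
        SELECT node FROM descendants ORDER BY node;
    """).fetchall()

    print([r[0] for r in rows])   # [2, 3, 4, 5]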

4.
As database technology is applied to more and more application domains, user queries are becoming increasingly complex (e.g. involving a large number of joins and a complex query structure). Query optimizers in existing database management systems (DBMS) were not developed for efficiently processing such queries and often suffer from problems such as intolerably long optimization time and poor optimization results. To tackle this challenge, we present a new similarity-based approach to optimizing complex queries in this paper. The key idea is to identify similar subqueries that often appear in a complex query and share the optimization result among similar subqueries in the query. Different levels of similarity for subqueries are introduced. Efficient algorithms to identify similar queries in a given query and optimize the query based on similarity are presented. Related issues, such as choosing good starting nodes in a query graph, evaluating identified similar subqueries and analyzing algorithm complexities, are discussed. Our experimental results demonstrate that the proposed similarity-based approach is quite promising in optimizing complex queries with similar subqueries in a DBMS.
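The sharing idea can be illustrated independently of any particular optimizer with a plan cache keyed by a canonical form of a subquery; the canonicalization below is a crude placeholder for the paper's similarity levels, and every name is invented for the sketch:

    def canonical_signature(tables, join_predicates):
        """A crude stand-in for a subquery similarity key: the sorted relations
        and join predicates, ignoring their textual order."""
        return (tuple(sorted(tables)), tuple(sorted(join_predicates)))

    class SharingOptimizer:
        def __init__(self, optimize_fn):
            self.optimize_fn = optimize_fn   # expensive per-subquery optimizer
            self.plan_cache = {}

        def plan(self, tables, join_predicates):
            key = canonical_signature(tables, join_predicates)
            if key not in self.plan_cache:            # optimize one representative...
                self.plan_cache[key] = self.optimize_fn(tables, join_predicates)
            return self.plan_cache[key]               # ...and share its result

    # Usage: two structurally identical subqueries trigger one optimizer call.
    calls = []
    opt = SharingOptimizer(lambda t, p: calls.append(1) or ("plan", t, p))
    opt.plan(["A", "B"], ["A.x = B.x"])
    opt.plan(["B", "A"], ["A.x = B.x"])
    print(len(calls))   # 1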

5.
We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units (e.g., paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset. We construct a disk-based index on all the signatures via bulk loading. During query processing, a query graph is also mapped into its signature, and this signature is searched using the index by performing multiset operations. To improve the precision of exact subgraph matching, we develop a new scheme using the concept of line graphs. Through extensive evaluation on real and synthetic graph datasets, we demonstrate that our approach provides a scalable and efficient disk-based solution for a large database of small and medium-sized graphs.
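The filtering side of this idea is easy to illustrate with Python multisets (collections.Counter). The particular signature used below, a multiset of label triples, is only an assumption for the example, not the signature defined in the paper:

    from collections import Counter

    def graph_signature(edges):
        """Multiset signature of a graph, here simply the multiset of
        (source_label, edge_label, target_label) triples."""
        return Counter(tuple(e) for e in edges)

    def may_contain(db_graph_edges, query_edges):
        """Multiset-containment filter for subgraph matching: if the query's
        signature is not a sub-multiset of the data graph's, the data graph
        can be pruned without running an exact matcher."""
        return not (graph_signature(query_edges) - graph_signature(db_graph_edges))

    g  = [("A", "x", "B"), ("B", "y", "C"), ("A", "x", "C")]
    q1 = [("A", "x", "B")]
    q2 = [("C", "z", "A")]
    print(may_contain(g, q1), may_contain(g, q2))   # True False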

6.
Optimization and evaluation of shortest path queries
We investigate how to efficiently evaluate a collection of shortest path queries on massive graphs that are too big to fit in main memory. To evaluate a shortest path query efficiently, we introduce two pruning algorithms, which differ in the extent to which shortest path costs are materialized and in how the search space is pruned. By grouping shortest path queries properly, batch processing further improves the performance of shortest path query evaluation. We also study extensively the fragment sizes, cache sizes and query types that we show affect the performance of a disk-based shortest path algorithm. The performance and scalability of the proposed techniques are evaluated on large road networks of the Eastern United States. To demonstrate that the proposed disk-based algorithms are viable, we show that their search times are significantly better than that of a main-memory Dijkstra's algorithm.
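For reference, the main-memory Dijkstra baseline that the disk-based algorithms are compared against fits in a few lines (an illustrative implementation, not the paper's code):

    import heapq

    def dijkstra(graph, source, target):
        """Plain in-memory Dijkstra: graph maps a node to a list of
        (neighbor, non_negative_edge_cost) pairs; returns the shortest-path
        cost from source to target, or None if target is unreachable."""
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist.get(v, float("inf")):
                continue                     # stale heap entry
            if v == target:
                return d                     # settled: d is optimal
            for w, cost in graph.get(v, []):
                nd = d + cost
                if nd < dist.get(w, float("inf")):
                    dist[w] = nd
                    heapq.heappush(heap, (nd, w))
        return None

    road = {"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}
    print(dijkstra(road, "a", "c"))   # 3.0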

7.
A k-step reachability query answers whether, in a given directed acyclic graph (DAG), there exists a path of length at most k between two vertices. To address the large index size and low query-processing efficiency of existing methods, this paper proposes a bidirectional shortest-path index built over a subset of vertices to raise the coverage of reachability information in the index, together with a set of optimization rules to reduce the index size; it then proposes a pair of mutually inverse forward and backward topological indexes over a simplified graph to speed up answering unreachable queries; finally it proposes a long-distance-first bidirectional traversal strategy to improve query-processing efficiency. Experimental results on 21 real data sets (such as citation networks and social networks) show that, compared with the existing efficient methods PLL and BFSI-B, the proposed algorithm has a smaller index size and faster query response times.
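For contrast with the index-based approach, the purely online fallback that such indexes accelerate can be sketched as a bidirectional breadth-first search that splits the hop budget k between the forward and backward directions. The budget split and all names are assumptions for the sketch:

    from collections import deque

    def bounded_bfs(adj, root, depth_limit):
        """BFS from `root` over adjacency lists `adj`, recording hop counts up
        to depth_limit; returns {node: distance}."""
        dist = {root: 0}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            if dist[v] == depth_limit:
                continue
            for w in adj.get(v, []):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        return dist

    def k_step_reachable(out_adj, in_adj, s, t, k):
        """True iff the DAG contains a path s -> t with at most k edges.
        The forward search gets ceil(k/2) hops, the backward search floor(k/2);
        the two distance maps are then joined on their common vertices."""
        fwd = bounded_bfs(out_adj, s, (k + 1) // 2)
        bwd = bounded_bfs(in_adj, t, k // 2)
        return any(fwd[v] + bwd[v] <= k for v in fwd.keys() & bwd.keys())

    # Tiny DAG: a -> b -> c -> d
    out_adj = {"a": ["b"], "b": ["c"], "c": ["d"]}
    in_adj  = {"b": ["a"], "c": ["b"], "d": ["c"]}
    print(k_step_reachable(out_adj, in_adj, "a", "d", 3))  # True
    print(k_step_reachable(out_adj, in_adj, "a", "d", 2))  # False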

8.
Traditional algorithms for optimizing the execution order of joins are no longer valid when selections and projections involve methods and become very expensive operations. Selections and projections can even be more costly than joins, so that they should be pulled above joins rather than pushed down in a query tree. In this paper, we take a fundamental look at how to approach query optimization from a top-down design perspective, rather than trying to force one model to fit into another. We present a graph model which is designed to characterize execution plans. Each edge and each vertex of the graph is assigned a weight to model execution plans. We also design algorithms that use these weights to optimize the execution order of operations. A cost model of these algorithms is developed, and experiments are conducted on the basis of this cost model. The results show that our algorithms are superior to similar work proposed in the literature. Received 20 April 1999 / Accepted 9 August 2000 / Published online 20 April 2001
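To make the pull-up argument concrete, here is a back-of-the-envelope cost comparison with made-up numbers and a deliberately crude cost model (not the weighted-graph model of the paper):

    def plan_costs(n_r, n_s, join_sel, pred_cost, pred_sel, per_tuple_join_cost=1.0):
        """Toy comparison of pushing an expensive selection on R below an
        R-join-S versus pulling it above the join (all parameters illustrative).

        n_r, n_s   -- input cardinalities of R and S
        join_sel   -- fraction of the R x S cross product that joins
        pred_cost  -- cost of one evaluation of the expensive predicate
        pred_sel   -- fraction of R tuples satisfying the predicate
        """
        join_work = lambda r, s: r * s * join_sel * per_tuple_join_cost
        push_down = n_r * pred_cost + join_work(n_r * pred_sel, n_s)
        pull_up   = join_work(n_r, n_s) + n_r * n_s * join_sel * pred_cost
        return push_down, pull_up

    # A predicate costing 50 units per call on 10k R tuples, joined with 1k S tuples:
    down, up = plan_costs(n_r=10_000, n_s=1_000, join_sel=0.0001,
                          pred_cost=50.0, pred_sel=0.5)
    print(down, up)   # 500500.0 vs 51000.0: pulling the selection above the join wins here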

9.
It is likely that customers issue requests based on out-of-date information in e-commerce application systems. Hence, the transaction failure rates would increase greatly. In this paper, we present a preference update model to address this problem. A preference update is an extended SQL update statement where a user can request the desired number of target data items by specifying multiple preferences. Moreover, the preference update allows easy extraction of criteria from a set of concurrent requests and, hence, optimal decisions for the data assignments can be made. We propose a group evaluation strategy for preference update processing in a multidatabase environment. The experimental results show that the group evaluation can effectively increase the customer satisfaction level with acceptable cost. Peng Li is the Chief Software Architect of didiom LLC. Before that, he was a visiting assistant professor in the Department of Computer Science at Western Kentucky University. He received his Ph.D. degree in Computer Science from the University of Texas at Dallas. He also holds a B.Sc. and M.S. in Computer Science from the Renmin University of China. His research interests include database systems, database security, transaction processing, distributed and Internet computing, and E-commerce. Manghui Tu received a Bachelor of Science degree from Wuhan University, P.R. China, in 1996, and a Master’s degree in Computer Science from the University of Texas at Dallas in 2001. He is currently working toward the PhD degree in the Department of Computer Science at the University of Texas at Dallas. Mr. Tu’s research interests include distributed systems, grid computing, information security, mobile computing, and scientific computing. His PhD research focuses on data management in secure and high-performance data grids. He is a student member of the IEEE. I-Ling Yen received her BS degree from Tsing-Hua University, Taiwan, and her MS and PhD degrees in Computer Science from the University of Houston. She is currently an Associate Professor of Computer Science at the University of Texas at Dallas. Dr. Yen’s research interests include fault-tolerant computing, security systems and algorithms, distributed systems, Internet technologies, E-commerce, and self-stabilizing systems. She has published over 100 technical papers in these research areas and received many research awards from NSF, DOD, NASA, and several industry companies. She has served as Program Committee member for many conferences and Program Chair/Co-Chair for the IEEE Symposium on Application-Specific Software and System Engineering & Technology, IEEE High Assurance Systems Engineering Symposium, IEEE International Computer Software and Applications Conference, and IEEE International Symposium on Autonomous Decentralized Systems. She is a member of the IEEE. Zhonghang Xia received the B.S. degree in applied mathematics from Dalian University of Technology in 1990, the M.S. degree in Operations Research from Qufu Normal University in 1993, and the Ph.D. degree in computer science from the University of Texas at Dallas in 2004. He is now an assistant professor in the Department of Computer Science, Western Kentucky University, Bowling Green, KY. His research interests are in the area of multimedia computing and networking, distributed systems, and data mining.

10.
This paper considers the theory of database queries on the complex value data model with external functions. Motivated by concerns regarding query evaluation, we first identify recursive sets of formulas, called embedded allowed, which is a class with desirable properties of “reasonable” queries. We then show that all embedded allowed calculus (or fix-point) queries are domain independent and continuous. An algorithm for translating embedded allowed queries into equivalent algebraic expressions as a basis for evaluating safe queries in all calculus-based query classes has been developed. Finally we discuss the topic of “domain independent query programs”, compare the expressive power of the various complex value query languages and their embedded allowed versions, and discuss the relationship between safety, embedded allowed, and domain independence in the various calculus-based queries.

11.
国冰磊  于炯  廖彬  杨德先 《计算机科学》2015,42(10):202-207, 231
The steadily rising energy consumption of IT systems means that energy efficiency must be taken into account when designing the next generation of DBMSs. Since the execution of SQL statements consumes roughly 70%~90% of database resources, modeling and optimizing the energy consumption of SQL is of great significance for improving the energy efficiency of databases. Based on a study of the SQL query-processing mechanism, this paper builds an energy-consumption model for SQL and experiments with a series of query-optimization rules to show how effective different rules are at improving performance and reducing energy consumption. The experiments and the analysis of the energy data show that CPU utilization is the most critical factor affecting system power consumption, that SQL energy optimization can ignore memory optimization and should balance performance optimization against power optimization, and that the proposed SQL energy model and energy-saving optimization methods have considerable practical value.
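The central observation, that CPU utilization dominates system power, can be made concrete with a toy linear power model; the model form and the constants below are illustrative assumptions, not the model fitted in the paper:

    def estimated_power(cpu_util, p_idle=80.0, p_cpu_range=65.0):
        """Toy linear power model (watts): idle power plus a CPU-utilization
        term; cpu_util is in [0, 1].  The constants are made-up examples."""
        return p_idle + p_cpu_range * cpu_util

    def estimated_query_energy(cpu_util, elapsed_seconds):
        """Energy (joules) = average power (watts) x elapsed time (seconds)."""
        return estimated_power(cpu_util) * elapsed_seconds

    # Two hypothetical plans for the same SQL statement: a fast CPU-heavy plan
    # can still use less energy than a slower, mostly idle one.
    fast_but_busy = estimated_query_energy(cpu_util=0.90, elapsed_seconds=2.0)
    slow_but_idle = estimated_query_energy(cpu_util=0.30, elapsed_seconds=5.0)
    print(fast_but_busy, slow_but_idle)   # 277.0 vs 497.5 joules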

12.
Location-based services have attracted significant research attention in the field of mobile computing. Specifically, different mechanisms have been proposed to process location-dependent queries. In this context, it is usually assumed that the location data are expressed at a fine geographic precision. However, a different granularity may be more appropriate in certain situations. Thus, a location resolution higher than required may even be inconvenient or not understandable by the user (for example, if the user expects a city name as an answer and instead the system provides the latitude/longitude coordinates). Moreover, if the locations presented to the user need to be refreshed automatically as the objects move, it is obvious that maintaining up-to-date GPS-like geographic coordinates would be more expensive in terms of processing and communication. Unfortunately, the existing approaches assume queries whose locations are always given with maximum precision (i.e., GPS locations). In this paper, a distributed query processing approach that adapts itself to the level of location resolution required is presented. Thus, it supports continuous location-dependent queries based on the required terminology for the locations, depending on the granularity used (e.g., GPS, cities, states, provinces, or any other predefined geographic area). For this purpose, location granules can be defined to specify the semantics appropriate for the queries and/or the way the results should be presented. A prototype showing the functionality and benefits of the approach has been implemented and used in an extensive experimental evaluation. The proposal not only increases the flexibility and expressive power of the queries considerably but also performs efficiently.
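The notion of a location granule can be illustrated with a minimal sketch that maps a fine-grained GPS position to a coarser named region. The bounding-box representation and all names are assumptions for the example; a real system would use polygons and a spatial index:

    from dataclasses import dataclass

    @dataclass
    class Granule:
        """A named location granule approximated by a latitude/longitude box."""
        name: str
        min_lat: float
        max_lat: float
        min_lon: float
        max_lon: float

        def contains(self, lat, lon):
            return (self.min_lat <= lat <= self.max_lat and
                    self.min_lon <= lon <= self.max_lon)

    def to_granule(lat, lon, granules):
        """Translate a fine-grained GPS position into the coarser vocabulary the
        query was formulated in (e.g. a city name); None if no granule matches."""
        for g in granules:
            if g.contains(lat, lon):
                return g.name
        return None

    cities = [Granule("Zaragoza", 41.56, 41.74, -1.05, -0.75),
              Granule("Madrid",   40.31, 40.56, -3.84, -3.52)]
    print(to_granule(41.65, -0.88, cities))   # Zaragoza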

13.
We optimize relational queries using connection hypergraphs (CHGs). All operations including value-passing between SQL blocks can be set-oriented. By introducing partial evaluations, reordering operations can be achieved for nested queries. For a query using views, we merge CHGs for the views and the query into one CHG and then apply query optimization. Furthermore, we may simulate magic sets methods elegantly in a CHG. Sideways information-passing strategies (SIPS) in a CHG amount to partial evaluations of SIPS paths. We introduce the maximum SIPS strategy, which performs SIPS for all bindings and all SIPS paths for a query. The new method has several advantages. First, the maximum SIPS strategy can be more efficient than the previous SIPS based on simple heuristics. Second, it is conceptually simple and easy to implement. Third, the processing strategies may be incorporated with the search space for query execution plans, which is a proven optimization strategy introduced by System R. Fourth, it provides a general framework of query optimization and may potentially be used to optimize next-generation database systems. Received September 1, 1993 / Accepted January 8, 1996

14.
高翔  王晓  王敏 《计算机科学》2014,41(4):163-167,194
Network forensic analysis and fault diagnosis are playing an increasingly important role in network management and security, which requires network management systems to provide network provenance. Network provenance can be used to trace how information propagates through a network and to determine the origin of data. This paper presents the design and implementation of a network provenance system (NPS) framework that supports obtaining network provenance in large-scale distributed environments and adopts the recently proposed declarative networking technique to maintain and query distributed network provenance efficiently. The framework propagates provenance information in a reference-based manner and represents it as a directed acyclic graph, achieving efficient network provenance in distributed networks. Simulation experiments on a network built with ns-3 show that the framework can effectively support provenance computation for a large-scale distributed network and significantly reduces bandwidth overhead compared with traditional approaches.
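Reference-based provenance propagation can be sketched as follows: each derived tuple carries only the identifiers of its direct inputs, and the provenance DAG is walked only when a provenance query is issued. This is illustrative Python, not the declarative-networking implementation described above, and all names are invented:

    class ProvRecord:
        """A derived tuple that stores only references (ids) to the tuples it
        was directly derived from, so the full derivation forms a DAG that is
        materialised only on demand."""
        def __init__(self, rec_id, value, derived_from=()):
            self.rec_id = rec_id
            self.value = value
            self.derived_from = tuple(derived_from)

    def trace(store, rec_id, seen=None):
        """Walk the provenance DAG of rec_id and return every record id it
        transitively depends on; only referenced records are fetched."""
        seen = set() if seen is None else seen
        for parent in store[rec_id].derived_from:
            if parent not in seen:
                seen.add(parent)
                trace(store, parent, seen)
        return seen

    # A node derives route r3 from advertisements r1 and r2, then r4 from r3:
    store = {r.rec_id: r for r in [
        ProvRecord("r1", "link(a,b)"),
        ProvRecord("r2", "link(b,c)"),
        ProvRecord("r3", "route(a,c)", derived_from=("r1", "r2")),
        ProvRecord("r4", "bestroute(a,c)", derived_from=("r3",)),
    ]}
    print(sorted(trace(store, "r4")))   # ['r1', 'r2', 'r3']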

15.
王剑波 《计算机工程》2011,37(17):49-51
To address the difficulty of expressing unrestricted one-to-many data transformations with relational expressions, this paper expresses one-to-many data transformations through an extension of relational algebra and implements unrestricted one-to-many transformations with recursive queries and table functions, producing one or more output tuples for each input tuple. A recursive query creates an initial result, recursively accumulates the result set, and returns the final result set; a table function declares a set of variables and uses a procedure body and a cursor to loop over the table, emitting output tuples iteratively. Experiments analyze the different methods for restricted and unrestricted transformations under different influencing parameters, and the results show that the extended approach can improve system performance.
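A minimal illustration of an unrestricted one-to-many transformation with a recursive query, using SQLite from Python; the schema and the quantity-expansion example are assumptions for the sketch, not the paper's experiments:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders(item TEXT, qty INTEGER);
        INSERT INTO orders VALUES ('bolt', 3), ('nut', 1);
    """)

    # One-to-many transformation: each input row with quantity n yields n output
    # rows, one per unit, via a recursive query.  A table-function variant would
    # reach the same result with a procedural loop over a cursor.
    rows = con.execute("""
        WITH RECURSIVE units(item, unit_no, qty) AS (
            SELECT item, 1, qty FROM orders
            UNION ALL
            SELECT item, unit_no + 1, qty FROM units WHERE unit_no < qty
        )
        SELECT item, unit_no FROM units ORDER BY item, unit_no;
    """).fetchall()

    print(rows)   # [('bolt', 1), ('bolt', 2), ('bolt', 3), ('nut', 1)]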

16.
The skyline operator has been extensively explored in the literature, and most of the existing approaches assume that all dimensions are available for all data items. However, many practical applications, such as sensor networks, decision making, and location-based services, may involve incomplete data items, i.e., some dimensional values are missing, due to device failure or privacy preservation. This paper is, to our knowledge, the first study of k-skyband (kSB) query processing on incomplete data, where multi-dimensional data items are missing some values of their dimensions. We formalize the problem, and then present two efficient algorithms for processing it. Our methods introduce some novel concepts including expired skyline, shadow skyline, and thickness warehouse, in order to boost the search performance. As a second step, we extend our techniques to tackle constrained skyline (CS) and group-by skyline (GBS) queries over incomplete data. Extensive experiments with both real and synthetic data sets demonstrate the effectiveness and efficiency of our proposed algorithms under various experimental settings.
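A naive k-skyband computation over incomplete data can serve as a correctness baseline for such algorithms. The dominance test below, which compares only the dimensions observed in both items, is one common convention and an assumption here, not necessarily the paper's definition:

    def dominates(p, q):
        """p dominates q on incomplete data (smaller is better): compare only
        the dimensions that are observed (non-None) in both items; p must be
        <= everywhere and strictly < somewhere."""
        common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
        if not common:
            return False
        return all(a <= b for a, b in common) and any(a < b for a, b in common)

    def k_skyband(points, k):
        """Naive k-skyband: keep every item dominated by fewer than k others
        (k = 1 gives the ordinary skyline).  No pruning, quadratic cost."""
        result = []
        for i, p in enumerate(points):
            dominated_by = sum(1 for j, q in enumerate(points)
                               if j != i and dominates(q, p))
            if dominated_by < k:
                result.append(p)
        return result

    data = [(1, None, 3), (2, 5, None), (None, 4, 1), (3, 6, 4)]
    print(k_skyband(data, k=1))   # skyline of the incomplete data set
    print(k_skyband(data, k=2))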

17.
Improving the performance of a database system directly affects the execution efficiency of the entire application system. This paper introduces different optimization theories and methods, discusses how to select suitable design approaches for problems that arise in real systems, analyzes the factors that affect database performance at multiple layers of the system, and gives concrete optimization strategies and methods.

18.
易佳  薛晨  王树鹏 《计算机科学》2017,44(5):172-177
Distributed stream querying is a real-time query computation method over data streams that has received wide attention and developed rapidly in recent years. This paper surveys the research results of distributed stream-processing frameworks on real-time relational queries, and analyzes and compares products involved in distributed data loading, distributed stream-computing frameworks, and distributed stream querying. It proposes a distributed stream-query model built on Spark Streaming and Apache Kafka that loads multiple file sources concurrently and designs an in-memory file system for fast data loading, more than doubling the loading speed compared with Apache Flume-based loading. On top of Spark Streaming, it implements a distributed stream-query interface based on Spark SQL and proposes a method of parsing SQL statements with custom code to realize distributed queries. Test results show that, for complex query statements, the custom SQL-parsing approach has a clear advantage in query efficiency.
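The prototype above is built on the DStream-based Spark Streaming API together with a hand-written SQL parsing layer; purely as a rough illustration of the stream-as-a-relation idea, the sketch below uses the newer Structured Streaming API instead. The server, topic and column names are made-up examples, and the spark-sql-kafka connector package must be on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-sql-sketch").getOrCreate()

    # Read a Kafka topic as a streaming DataFrame (topic/server names are examples).
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "events")
              .load()
              .selectExpr("CAST(value AS STRING) AS line"))

    # Expose the stream to SQL and run a continuously updated relational query.
    events.createOrReplaceTempView("events")
    counts = spark.sql("""
        SELECT line AS word, COUNT(*) AS cnt
        FROM events
        GROUP BY line
    """)

    query = (counts.writeStream.outputMode("complete")
             .format("console").start())
    query.awaitTermination()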

19.
Dijkstra defined a distributed system to be self-stabilizing if, regardless of the initial state, the system is guaranteed to reach a legitimate (correct) state in a finite time. Even though the concept of self-stabilization received little attention when it was introduced, it has become one of the most popular fault-tolerance approaches. On the other hand, graph algorithms form the basis of many network protocols. They are used in routing, clustering, multicasting and many other tasks. The objective of this paper is to survey the self-stabilizing algorithms for dominating and independent set problems, colorings, and matchings. These graph-theoretic problems are well studied in the context of self-stabilization and a large number of algorithms have been proposed for them.
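As a concrete taste of this family, a classic self-stabilizing maximal-independent-set algorithm can be simulated under a central daemon in a few lines. This is a textbook-style illustration, not any specific algorithm from the survey, and the random scheduling policy is an assumption:

    import random

    def stabilize_mis(adj, state, seed=0):
        """Simulate a self-stabilizing maximal-independent-set algorithm under a
        central daemon: starting from an ARBITRARY state, repeatedly let one
        enabled node apply its rule until no rule is enabled.

        Rules for node v:
          enter: v is OUT and no neighbour is IN  -> v becomes IN
          leave: v is IN  and some neighbour is IN -> v becomes OUT
        """
        rng = random.Random(seed)

        def enabled(v):
            has_in_neighbour = any(state[u] for u in adj[v])
            return ((not state[v] and not has_in_neighbour) or
                    (state[v] and has_in_neighbour))

        while True:
            movable = [v for v in adj if enabled(v)]
            if not movable:                      # legitimate state reached
                return state
            v = rng.choice(movable)              # central daemon picks one node
            state[v] = not state[v]              # apply its single enabled rule

    # A 4-cycle started from an illegitimate state (two adjacent nodes IN):
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(stabilize_mis(adj, {0: True, 1: True, 2: False, 3: False}))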

20.
This work addresses graph-based semi-supervised classification and betweenness computation in large, sparse networks (several million nodes). The objective of semi-supervised classification is to assign a label to unlabeled nodes using the whole topology of the graph and the labeling at our disposal. Two approaches are developed to avoid explicit computation of pairwise proximity between the nodes of the graph, which would be impractical for graphs containing millions of nodes. The first approach directly computes, for each class, the sum of the similarities between the nodes to classify and the labeled nodes of the class, as suggested initially in [1] and [2]. Along this approach, two algorithms exploiting different state-of-the-art kernels on a graph are developed. The same strategy can also be used in order to compute a betweenness measure. The second approach works on a trellis structure built from biased random walks on the graph, extending an idea introduced in [3]. These random walks make it possible to define a biased bounded betweenness for the nodes of interest, defined separately for each class. All the proposed algorithms have a linear computing time in the number of edges while providing good results, and hence are applicable to large sparse networks. They are empirically validated on medium-size standard data sets and are shown to be competitive with state-of-the-art techniques. Finally, we processed a novel data set, which is made available for benchmarking, for multi-class classification in a large network: the U.S. patents citation network containing 3M nodes (of six different classes) and 38M edges. The three proposed algorithms achieve competitive results (around 85% classification rate) on this large network: they classify the unlabeled nodes within a few minutes on a standard workstation.
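The class-sum idea of the first approach can be illustrated with a small dense-matrix sketch: for every unlabeled node, sum a graph-kernel similarity to the labeled nodes of each class and pick the largest sum. The truncated random-walk kernel and all parameters below are assumptions for the example, not the state-of-the-art kernels evaluated in the paper:

    import numpy as np

    def class_sum_classify(A, labels, n_classes, alpha=0.5, steps=4):
        """Graph-based semi-supervised classification by class-wise similarity sums.

        A      -- dense adjacency matrix (n x n), symmetric, non-negative
        labels -- length-n array: class index for labeled nodes, -1 otherwise
        The similarity is a truncated random-walk kernel sum_t alpha^t * P^t with
        P the row-normalised adjacency matrix (one simple choice of kernel)."""
        n = A.shape[0]
        P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic
        Y = np.zeros((n, n_classes))
        for i, c in enumerate(labels):
            if c >= 0:
                Y[i, c] = 1.0
        scores, walk = Y.copy(), Y.copy()
        for t in range(1, steps + 1):
            walk = P @ walk                      # one more random-walk step
            scores += (alpha ** t) * walk        # accumulate class-similarity sums
        pred = labels.copy()
        pred[labels < 0] = np.argmax(scores[labels < 0], axis=1)
        return pred

    # Two triangles joined by one edge, one labeled node in each triangle;
    # the unlabeled nodes follow their own triangle's labeled node.
    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]], dtype=float)
    labels = np.array([0, -1, -1, -1, -1, 1])
    print(class_sum_classify(A, labels, n_classes=2))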
