Keyword queries have long been popular with search engines and the information retrieval community, and have recently gained momentum in the expert systems community. The conventional semantics for processing a user query is to find a set of top-k web pages such that each page contains all user keywords. Recently, this semantics has been extended to find a set of cohesively interconnected pages, each of which contains one of the query keywords scattered across these pages. A keyword query with this extended semantics (i.e., more than a list of keywords, hyperlinked with each other) is referred to as a graph query. For a graph query, the query keywords may not all be present on a single web page, so a set of web pages together with the corresponding hyperlinks needs to be presented as the search result. Existing search systems suffer from serious performance problems because they fail to integrate information from multiple connected resources; therefore, an efficient algorithm for keyword queries over graph-structured data is proposed. It integrates information from multiple connected nodes of the graph and generates result trees in which all the query keywords occur. We also investigate a ranking measure called the graph ranking score (GRS) to evaluate relevant graph results, so that the score yields a scalar value that accounts for both the keywords and the topology.
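The abstract does not spell out its result-tree algorithm or the GRS formula. As a rough illustration only, the sketch below uses a simpler, commonly used "distinct-root" semantics (an assumption, not the paper's method): every node is a candidate root, and its score is the summed hop distance to the nearest page containing each keyword. The names `hop_distances` and `top_k_roots` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]

def hop_distances(adj: Graph, src: str) -> Dict[str, int]:
    """Breadth-first hop distances from src in an undirected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def top_k_roots(adj: Graph, labels: Dict[str, Set[str]],
                keywords: List[str], k: int) -> List[Tuple[int, str]]:
    """Hypothetical distinct-root ranking: score every node by the summed hop
    distance to the nearest page containing each keyword (lower is better)."""
    answers = []
    for root in adj:
        dist = hop_distances(adj, root)
        total = 0
        for kw in keywords:
            hits = [d for node, d in dist.items() if kw in labels.get(node, set())]
            if not hits:
                break  # this root cannot reach some keyword
            total += min(hits)
        else:
            answers.append((total, root))
    return sorted(answers)[:k]
```

Each returned root, together with its shortest paths to the keyword pages, forms one result tree; a real system would prune the per-root BFS rather than run it from every node.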

Li Mingpeng, Gao Hong, Zou Zhaonian. Journal of Software (软件学报), 2016, 27(9): 2265-2277
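This paper studies graph-compression-based processing of maximum Steiner connected k-core queries. It proposes SC, a graph compression algorithm that supports maximum Steiner connected k-core queries, and proves that queries answered on graphs compressed by SC are correct. Because a maximum Steiner connected k-core query only needs to find a connected region that satisfies the requirements, the paper further proposes TC, a compression algorithm that compresses the compressed graph into a tree. It proves the correctness of queries on the compressed tree and presents a linear-time query processing algorithm that requires no decompression. Experiments on real and synthetic data show that the compression algorithms remove 88% of the original graph on average and perform even better on dense graphs, removing 90%. Compared with processing queries directly on the original graph, query processing on the compressed graph is faster by 1 to 2 orders of magnitude on average.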
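For reference, the query that SC and TC are designed to preserve can be answered directly on the uncompressed graph: peel vertices of degree below k to obtain the k-core, then return the connected component of the k-core that contains every query vertex. The sketch below shows only that uncompressed baseline (the compression algorithms are not reproduced); the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Set

def k_core(adj: Dict[int, Set[int]], k: int) -> Dict[int, Set[int]]:
    """Repeatedly remove vertices of degree < k; return the remaining k-core."""
    deg = {v: len(ns) for v, ns in adj.items()}
    removed: Set[int] = set()
    queue = deque(v for v, d in deg.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                if deg[u] < k:
                    queue.append(u)
    return {v: adj[v] - removed for v in adj if v not in removed}

def max_steiner_connected_k_core(adj: Dict[int, Set[int]], k: int,
                                 query: Set[int]) -> Optional[Set[int]]:
    """Largest connected subgraph with minimum degree >= k containing all of
    `query`, or None if no such subgraph exists."""
    core = k_core(adj, k)
    if not query or not query.issubset(core):
        return None
    start = next(iter(query))
    seen = {start}
    queue = deque([start])
    while queue:  # BFS inside the k-core from one query vertex
        v = queue.popleft()
        for u in core[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen if query <= seen else None
```

The component is maximal because any connected subgraph of minimum degree k that contains the query vertices must lie inside the k-core and inside a single one of its components.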

The increasing popularity of graph data in various domains has led to renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query graph. We propose a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph. We first construct an index on the similarity of the attributed graph by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity. Then, we construct a connectivity-based index to give a concise representation of inter-partition connections. We use the index to find a set of best matching paths. From these best matching paths, we compute the best matching answer graph using a greedy algorithm. Experimental results on real datasets demonstrate the efficiency of both index construction and query processing. We also show that our approach attains high-quality query answers.

Keyword search is the most popular technique for searching information in XML (eXtensible Markup Language) documents. It enables users to access XML data easily without learning a structured query language or studying complex data schemas. Existing traditional keyword query methods are mainly based on LCA (lowest common ancestor) semantics, in which the returned results match all keywords at the granularity of elements. In many practical applications, information is often uncertain and vague, so identifying useful information in fuzzy data is becoming an important research topic. In this paper, we focus on keyword querying over fuzzy XML data at the granularity of objects. By introducing the concept of an “object tree”, we propose object-level query semantics for keyword queries. We find the minimal whole-matching result object trees, which contain all keywords, and the partial-matching result object trees, which contain some of the keywords, and return the root nodes of these result object trees as query results. To identify the top-K answers with the highest scores effectively and accurately, we propose a scoring mechanism that considers tf*idf document relevance, user preference, and the possibilities of results. We propose a stack-based algorithm named object-stack to obtain the top-K answers with the highest scores. Experimental results show that the object-stack algorithm significantly outperforms traditional XML keyword query algorithms and obtains high-quality query results with high search efficiency on fuzzy XML documents.

Jiang Kai, Guan Jihong. Computer Engineering (计算机工程), 2011, 37(3): 42-43, 46
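Based on the random walk with restart model and the personalized PageRank algorithm, a new keyword search algorithm over graphs is proposed. The algorithm combines the vector space model with the random walk model so that the returned results match the query keywords, and it fully exploits the structural information implicit in the graph to provide better search results. Experimental results demonstrate the effectiveness of the algorithm.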
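The random-walk half of this approach can be illustrated with a power-iteration sketch of random walk with restart (personalized PageRank). The abstract does not say how the vector space model scores are combined with the walk, so here the restart distribution is simply uniform over hypothetical "seed" nodes that match the query; the restart probability 0.15 and the function name are assumptions.

```python
from typing import Dict, List

def random_walk_with_restart(adj: Dict[str, List[str]], seeds: List[str],
                             restart: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Power iteration for personalized PageRank with restart to `seeds`."""
    nodes = list(adj)
    seed_set = set(seeds)
    # Restart distribution: uniform over the seed (keyword-matching) nodes
    r = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in nodes}
    p = dict(r)
    for _ in range(iters):
        nxt = {v: restart * r[v] for v in nodes}
        for u in nodes:
            out = adj[u]
            if not out:
                # Dangling node: send its walking mass back to the restart distribution
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * r[v]
                continue
            share = (1 - restart) * p[u] / len(out)
            for v in out:
                nxt[v] += share
        p = nxt
    return p
```

Each iteration preserves total probability mass of 1, so the scores can be compared directly across nodes; nodes closer to the seeds, or reachable from them along many paths, score higher.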

Keyword query processing over graph-structured data benefits a variety of real-world applications. In keyword search over graphs, the basic unit of search and retrieval is a structure (an interconnection of nodes) that connects all the query keywords. In contrast to the single web pages returned by search engines, this answering paradigm brings new challenges for ranking. In this paper, we propose a simple but effective ranking measure based on fuzzy set theory, called FRank. Fuzzy sets account for the contribution of each individual query keyword separately when scoring node relevance. A novel aggregation operator is defined to combine the content-relevance-based fuzzy sets and compute query-dependent edge weights. The final rank of an answer is computed by non-monotonic addition of edge weights according to their relevance to the keyword query. FRank evaluates each answer based on the distribution of query keywords and the structural connectivity between those keywords. An extensive empirical analysis shows that our proposed ranking measure outperforms the ranking measures adopted by current approaches in the literature.

Finding information located somewhere on the World Wide Web is an error-prone and frustrating task. The WebQuery system offers a powerful new method for searching the Web based on connectivity and content. We do this by examining the links among the nodes returned by a keyword-based query. We then rank the nodes, giving the highest rank to the most highly connected ones. By doing so, we find “hot spots” on the Web that contain information germane to the user's query. WebQuery not only ranks and filters the results of a Web query; it also extends the result set beyond what the search engine retrieves by finding “interesting” sites that are highly connected to the sites returned by the original query. Even with WebQuery filtering and ranking the query results, the result sets can be enormous, so we need to visualize the returned information. We explore several techniques for visualizing this information, including cone trees, 2D graphs, 3D graphs, lists, and bullseyes, and discuss the criteria for using each of them.
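The connectivity-based ranking described above can be sketched as follows, under stated assumptions: hyperlinks are treated as undirected, the hit set is extended with every page linked to or from a hit, and each page is scored by its number of connections inside that extended set. WebQuery's actual scoring may differ; `connectivity_rank` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

def connectivity_rank(links: Dict[str, Set[str]], hits: Set[str]) -> List[Tuple[int, str]]:
    """Rank keyword hits plus their linked neighbours by in-set connectivity."""
    # Treat hyperlinks as undirected when counting connections
    undirected: Dict[str, Set[str]] = {}
    for src, dsts in links.items():
        for dst in dsts:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)
    # Extend the keyword hits with every page linked to or from a hit
    neighbourhood = set(hits)
    for page in hits:
        neighbourhood |= undirected.get(page, set())
    # Score = number of links to other pages in the extended set; highest first
    scored = [(len(undirected.get(p, set()) & neighbourhood), p) for p in neighbourhood]
    return sorted(scored, key=lambda s: (-s[0], s[1]))
```

Pages that the search engine never returned (here, neighbours of hits) can still rank highly if they are well connected, which mirrors the "extends the result set" behaviour described in the abstract.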

Retrieving 2D shapes using caterpillar decomposition
Graphs provide effective data structures for modeling complex relations and schemaless data such as images, XML documents, circuits, compounds, and proteins. Given a query graph, finding sufficiently similar database graphs without performing a sequential search is an important problem arising in different domains. In this paper, we propose a new method for indexing tree structures based on a graph-theoretic concept called caterpillar decomposition. Our algorithm starts by representing each tree, along with its subtrees, in the geometric space using its caterpillar decomposition. After representing the query in the same fashion, similar database trees are retrieved efficiently by means of nearest neighbor searches. We have successfully evaluated the proposed algorithm on two shape databases and include a set of perturbation experiments that establish the algorithm's robustness to noise. We have also shown that the approach compares favorably with previous approaches to shape retrieval on these datasets.

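As the concept of smart-city security spreads in China and its construction is gradually put into practice, and as big data is applied ever more deeply to smart-city security, keyword retrieval must respond faster. To address this problem, a multi-keyword parallel retrieval algorithm for streaming knowledge graphs based on an urban-security knowledge graph (MKPRASKG) is proposed. Given the query keywords entered by a user, the algorithm builds a set of query subgraphs over knowledge-graph entities in real time through the construction, pruning, and fusion of associated class graphs. Then, using a scoring function and guided by high-scoring query subgraphs, it searches the knowledge-graph instance data in parallel and finally returns the top-k query results. Experimental results show that the algorithm has clear advantages in real-time search, response time, search quality, and scalability.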

An XLCA-based keyword search method for XML
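Keyword search is an effective way for most ordinary users to find information, because they need neither learn a complex query language nor understand the structure of the underlying data. This paper studies keyword search over XML documents. It first points out that the result-set definitions in previous SLCA-based work are incomplete, and then proposes an XLCA-based result-set definition that covers all possible results. Based on this definition, it presents a compact index structure and a corresponding search algorithm, and implements both approaches. Experiments show that the proposed method substantially improves performance and scalability.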
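To make the SLCA semantics that this paper criticizes concrete, the sketch below computes smallest lowest common ancestors over Dewey labels (node IDs written as paths from the root, an encoding assumed here for illustration). A node qualifies if its subtree contains every keyword and no proper descendant's subtree does. The XLCA definition itself is not given in the abstract and is not shown; `slca` is a hypothetical name, and this brute-force version is meant for clarity, not efficiency.

```python
from typing import Dict, List, Set, Tuple

Dewey = Tuple[int, ...]  # e.g. (1, 2, 1) = first child of the second child of the root

def slca(keyword_nodes: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Smallest LCAs: nodes whose subtree contains every keyword while no
    proper descendant's subtree does."""
    covers: List[Set[Dewey]] = []
    for nodes in keyword_nodes.values():
        # Every ancestor-or-self of a matching node "contains" this keyword
        prefixes: Set[Dewey] = set()
        for label in nodes:
            for i in range(1, len(label) + 1):
                prefixes.add(label[:i])
        covers.append(prefixes)
    full = set.intersection(*covers) if covers else set()
    # Drop any node that has a proper descendant which also contains all keywords
    return sorted(n for n in full
                  if not any(m != n and m[:len(n)] == n for m in full))
```

In the usage below, the root also contains both keywords but is discarded because its child `(1, 1)` already does; this pruning of answers is the behaviour the paper's XLCA definition is meant to relax.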

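In relational databases, keyword queries free users from learning a query language and the database schema, and they effectively broaden the scope of queries. By using a tuple graph to describe the relationships among tuples in a relational database, the keyword query problem can be transformed into finding a minimum Steiner tree in the tuple graph. This paper proposes a similarity-based method for computing edge weights in the tuple graph, so that edge weights reflect how similar the tuples are to the keywords. Then, since the minimum Steiner tree problem is NP-complete, it proposes an algorithm that runs Dijkstra's algorithm under a greedy strategy to obtain a good approximate solution for the minimum Steiner tree. Finally, the algorithm is analyzed and validated experimentally.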
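One common way to "run Dijkstra under a greedy strategy" is a shortest-path heuristic: start a tree at one keyword tuple, and in each round attach the closest remaining keyword tuple via its shortest path to the current tree. The sketch below assumes that reading; the paper's exact procedure and its similarity-based edge weights are not reproduced, so weights are simply given. Function names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected: g[u][v] == g[v][u] == weight

def dijkstra_from_set(g: Graph, sources: Set[str]) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Multi-source Dijkstra: distance to the nearest source and predecessor links."""
    dist = {s: 0.0 for s in sources}
    prev: Dict[str, str] = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def greedy_steiner_tree(g: Graph, terminals: List[str]) -> Set[Tuple[str, str]]:
    """Grow a tree from the first terminal, each round attaching the closest
    remaining terminal through its shortest path to the current tree."""
    tree = {terminals[0]}
    edges: Set[Tuple[str, str]] = set()
    remaining = set(terminals) - tree
    while remaining:
        dist, prev = dijkstra_from_set(g, tree)
        reachable = [t for t in remaining if t in dist]
        if not reachable:
            raise ValueError("terminals lie in different components")
        v = min(reachable, key=lambda t: (dist[t], t))
        while v not in tree:  # walk the shortest path back to the tree
            u = prev[v]
            edges.add((min(u, v), max(u, v)))
            tree.add(v)
            v = u
        remaining -= tree  # terminals picked up along the path are done too
    return edges
```

The result is an approximation: it is not guaranteed to be the minimum Steiner tree, but each round costs only one Dijkstra run.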

Reverse nearest neighbors in large graphs
A reverse nearest neighbor (RNN) query returns the data objects that have a query point as their nearest neighbor (NN). Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we provide a fundamental lemma that can be used to prune the search space while traversing the graph in search of RNNs. Based on it, we develop two RNN methods: an eager algorithm that attempts to prune network nodes as soon as they are visited, and a lazy technique that prunes the search space when a data point is discovered. We study retrieval of an arbitrary number k of reverse nearest neighbors, investigate the benefits of materialization, cover several query types, and deal with cases where the queries and the data objects reside on nodes or edges of the graph. The proposed techniques are evaluated in various practical scenarios involving spatial maps, computer networks, and the DBLP coauthorship graph.
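The pruning lemma and the eager/lazy algorithms are not detailed in the abstract, but the query they answer is easy to state as an unpruned baseline: for every data node p, run Dijkstra from p and check whether the query node is among p's nearest objects. The sketch below assumes data and queries sit on nodes (not edges), handles k = 1 only, and counts ties as nearest neighbours; function names are hypothetical.

```python
import heapq
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]

def nearest_objects(g: Graph, src: str, candidates: Set[str]) -> Set[str]:
    """All candidates at minimum network distance from src (ties kept)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    best = float("inf")
    found: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if d > best:
            break  # everything left is farther than the nearest object
        if u in candidates:
            best = d
            found.add(u)
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found

def reverse_nearest_neighbors(g: Graph, data: Set[str], query: str) -> List[str]:
    """Data nodes whose nearest object (among the other data nodes plus the query) is the query."""
    result = []
    for p in data:
        others = (data | {query}) - {p}
        if query in nearest_objects(g, p, others):
            result.append(p)
    return sorted(result)
```

This baseline runs one Dijkstra search per data object, which is exactly the cost that the paper's pruning is designed to avoid on large graphs.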

Algorithms used in data mining and bioinformatics have to deal with huge amounts of data efficiently. In many applications, the data are supposed to have explicit or implicit structures. To develop efficient algorithms for such data, we have to propose possible structure models and test whether the models are feasible. Hence, it is important to build a compact model for structured data and to enumerate all instances efficiently. There are few graph classes besides trees that can be used as a model. In this paper, we inves...

With more and more knowledge provided by the WWW, querying and mining knowledge bases have attracted much research attention. Among all queries over knowledge bases, which are usually modelled as graphs, the keyword query is the most widely used. Although keyword queries over graphs have been studied in depth for years, knowledge bases are special, error-tolerant graphs, and the results of traditionally defined keyword queries over them fail to satisfy users. Thus, in this paper, we define a new keyword query specific to knowledge bases, called the confident r-clique, based on the r-clique definition for keyword queries on general graphs, which has been shown to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique and design an index, together with its search algorithm, that scales well to large knowledge bases. In the verification phase, we develop an efficient sampling method to verify the final answers among the candidates remaining after the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy users' requirements better than the traditional r-clique definition, and that our algorithms are efficient.
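For readers unfamiliar with the underlying r-clique semantics on general graphs, the brute-force sketch below enumerates one node per keyword and keeps the combinations whose pairwise shortest-path (hop) distances are all at most r, weighted by the sum of those distances. It ignores the confidence values that the paper adds for knowledge bases (the #P-hard part), and its exhaustive enumeration is only for small illustrative graphs; the names are hypothetical.

```python
from collections import deque
from itertools import combinations, product
from typing import Dict, List, Set, Tuple

def hop_distances(adj: Dict[str, List[str]], src: str) -> Dict[str, int]:
    """BFS hop distances from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def r_cliques(adj: Dict[str, List[str]], labels: Dict[str, Set[str]],
              keywords: List[str], r: int) -> List[Tuple[int, Tuple[str, ...]]]:
    """All (weight, nodes) with one node per keyword and pairwise distance <= r."""
    holders = [[v for v in adj if kw in labels.get(v, set())] for kw in keywords]
    dist = {v: hop_distances(adj, v) for v in adj}  # all-pairs via repeated BFS
    found = []
    for combo in product(*holders):
        pairs = list(combinations(combo, 2))
        if all(b in dist[a] and dist[a][b] <= r for a, b in pairs):
            found.append((sum(dist[a][b] for a, b in pairs), combo))
    return sorted(found)  # lightest (most compact) answers first
```

Because the number of combinations grows exponentially with the number of keywords, an index and upper bounds, as in the filtering phase described above, are needed at knowledge-base scale.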

In this paper, a graph problem on connected, weighted, undirected graphs, called the searchlight guarding problem, is considered. Assume that a fugitive moves along the edges of the graph at a random speed. The task is to place a set of searchlights at vertices to search the edges of the graph and spot the fugitive. Suppose that placing a searchlight at a vertex incurs a building cost. The searchlight guarding problem is to allocate a set S of searchlights at vertices such that the total cost of the vertices in S is minimized. If more than one set of searchlights attains the minimum building cost, the set with the minimum search time, that is, the fewest time slots needed to spot the fugitive, is chosen. As is well established, the problem is NP-hard on weighted bipartite graphs but solvable in linear time on weighted trees. In this paper, a linear-time optimal algorithm for the searchlight guarding problem on weighted interval graphs is presented. It consists of two phases. In the first phase, a set of searchlights with minimum guarding cost is identified and the search directions of all edges are assigned. To achieve this, a new problem, called the edge-direction assignment problem, is first defined and solved on weighted complete-split graphs with a greedy strategy. Based on this result, the problem of finding the set of searchlights with minimum guarding cost and assigning the search directions of all edges is solved with dynamic programming. In the second phase, the search time slots of each edge are determined from the results of the first phase and from certain properties of interval graphs.

Estimating the partition function is a key but difficult computation in graphical models. One approach is to estimate tractable upper and lower bounds. The piecewise upper bound of Sutton et al. is computed by breaking the graphical model into pieces and approximating the partition function as a product of local normalizing factors for these pieces. The tree-reweighted belief propagation algorithm (TRW-BP) by Wainwright et al. gives tighter upper bounds. It optimizes an upper bound expressed in terms of convex combinations of spanning trees of the graph. Recently, Globerson et al. gave a different, convergent iterative dual optimization algorithm, TRW-GP, for the TRW objective. However, in many practical applications, particularly those that train CRFs with many nodes, TRW-BP and TRW-GP are too slow to be practical. Without changing the algorithm, we prove that TRW-BP converges in a single iteration for associative potentials, and we give a closed form for the solution it finds. The closed-form solution obviates the need for complex optimization. We use this result to develop new closed-form upper bounds for MRFs with arbitrary pairwise potentials. Being closed-form, they are much faster to compute than TRW-based bounds. We also prove similar convergence results for loopy belief propagation (LBP) and use them to obtain closed-form solutions for the LBP pseudomarginals and an approximation to the partition function for associative potentials. We then use recent results proved by Wainwright et al. for binary MRFs to obtain closed-form lower bounds on the partition function, and we develop novel lower bounds for arbitrary associative networks. We report on experiments with synthetic and real-world graphs. Our new upper bounds are considerably tighter than the piecewise bounds in practice. Moreover, we can compute our bounds on several graphs where TRW-BP does not converge.
Our novel lower bound, despite being closed-form and much faster to compute, outperforms more complicated popular lower-bounding algorithms such as mean field by wide margins on densely connected graphs, although it does worse on sparsely connected graphs such as chains.
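To see why bounds are needed at all, the quantity being bounded can be computed exactly only for tiny models: the log partition function of a pairwise binary MRF sums over all 2^n joint states. The sketch below does that enumeration, assuming potentials are given as log-potentials (unary table per variable, pairwise table per edge); it is a ground-truth check for small examples, not one of the paper's bounds.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

def log_partition(n: int, unary: List[Tuple[float, float]],
                  pairwise: Dict[Tuple[int, int], List[List[float]]]) -> float:
    """Exact log Z of a pairwise binary MRF by enumerating all 2**n states.

    unary[i][s]         : log-potential of variable i taking state s
    pairwise[(i, j)][s][t] : log-potential of edge (i, j) in states (s, t)
    """
    total = 0.0
    for x in product((0, 1), repeat=n):
        score = sum(unary[i][x[i]] for i in range(n))
        score += sum(theta[x[i]][x[j]] for (i, j), theta in pairwise.items())
        total += math.exp(score)
    return math.log(total)
```

With an associative edge (higher log-potential when both endpoints agree), for example `[[1.0, 0.0], [0.0, 1.0]]` on two variables, Z = 2e + 2; the exponential loop is what makes closed-form bounds attractive for graphs with many nodes.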

Zheng Zhiyun, Liu Bo, Li Lun, Wang Zhenfei. Computer Science (计算机科学), 2015, 42(7): 234-239, 249
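With the massive emergence of Semantic Web data, the efficiency of querying RDF graph data has drawn increasing attention, and querying RDF data graphs directly through keyword matching has become a research hotspot. To address the result redundancy and result deviation that are common in keyword queries, a keyword-based query model for RDF data graphs is proposed. The model first applies the proposed iteration-based graph query algorithm (ISGR) to match subgraphs for the query keywords, obtaining a unique and maximal set of result subgraphs. Then, based on the structural information between the keyword graph and the result subgraphs, it uses a statistical language model to rank the result subgraphs (SimLM). Comparative experiments show that the proposed query model and ranking method outperform traditional models in consistency and relevance.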

We propose a novel solution schema, called the Hierarchical Labeling Schema (HLS), to answer reachability queries in directed graphs. Unlike many existing approaches that focus on static directed acyclic graphs (DAGs), our schema targets directed cyclic graphs (DCGs) to which vertices or arcs can be added incrementally. Unlike many traditional approaches, HLS does not require the graph to be acyclic when constructing its index, so it can be applied to both DAGs and DCGs. When vertices or arcs are added to a graph, HLS updates the index incrementally instead of recomputing it from scratch each time, which makes it more efficient in practice than many other approaches. The basic idea of HLS is to create a tree for each vertex in the graph and link the trees together so that, given any two vertices, we can immediately determine whether there is a path between them by consulting the appropriate trees. We conducted extensive experiments on both real-world and synthetic datasets. We compared the performance of HLS, in terms of index construction time, query processing time, and space consumption, with two state-of-the-art methods: the path-tree method and the 3-hop method. We also conducted simulations to model graphs that are updated incrementally, and compared the other algorithms against HLS on static graphs. Our results show that HLS is highly competitive in practice and is particularly useful when graphs are updated frequently.

Querying graph data is a fundamental problem that has attracted increasing interest, especially for massive graph databases, which are emerging as a promising alternative to relational databases for big data modeling. In this paper, we study the problem of subgraph isomorphism search, which consists of enumerating the embeddings of a query graph in a data graph. The best-known solutions to this NP-complete problem are backtracking-based and incur a high computational cost on massive graph databases. We address this problem and its challenges via graph compression with modular decomposition. In our approach, subgraph isomorphism search is performed on compressed graphs without decompressing them, which substantially reduces the search space and consequently yields significant savings in processing time as well as in storage space for the graphs. We evaluated our algorithms on nine real-world datasets. The experimental results show that our approach is efficient and scalable.
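The backtracking baseline that the abstract refers to can be sketched in a few lines: map query vertices one at a time to unused data vertices, and backtrack whenever an already-mapped query neighbour is not adjacent in the data graph. This sketch enumerates non-induced embeddings and uses only a degree filter for pruning; it does not include the paper's modular-decomposition compression, and the function name is hypothetical.

```python
from typing import Dict, List, Set

def subgraph_embeddings(q: Dict[int, Set[int]], g: Dict[int, Set[int]]) -> List[Dict[int, int]]:
    """Enumerate injective maps f: V(q) -> V(g) such that every query edge
    (u, v) maps to a data edge (f(u), f(v)) (non-induced subgraph isomorphism)."""
    order = sorted(q, key=lambda u: -len(q[u]))  # match high-degree query vertices first
    results: List[Dict[int, int]] = []
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> None:
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        for v in g:
            if v in used or len(g[v]) < len(q[u]):
                continue  # already used, or degree too small to host u
            # Every already-mapped neighbour of u must be adjacent to v
            if all(mapping[w] in g[v] for w in q[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(i + 1)
                del mapping[u]  # backtrack
                used.discard(v)

    extend(0)
    return results
```

For a triangle query, a complete graph on four vertices yields 4 × 3 × 2 = 24 embeddings, which illustrates how quickly the enumeration grows and why shrinking the search space matters.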

Frequent subgraph mining in outerplanar graphs
In recent years there has been increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we consider the class of outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for outerplanar graphs, and show that it works in incremental polynomial time for the practically relevant subclass of well-behaved outerplanar graphs, i.e., those that have only polynomially many simple cycles. We evaluate the algorithm empirically on chemo- and bioinformatics applications.
