Similar Documents
 10 similar documents found (search time: 174 ms)
1.
Searching XML data with a structured XML query can improve the precision of results compared with a keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer a structured query exactly, so query relaxation is necessary. Previous work on XML query relaxation suffers from the unnecessary computation of a large number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach that relaxes a query differently against different data sources, based on the schemas those sources conform to. In this paper, we present a set of techniques that support this approach, including schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for the adaptive relaxation of a query and for top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and efficiency of our approach.
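The adaptive idea can be sketched in Python. In this minimal sketch, which illustrates the idea under stated assumptions rather than reproducing the authors' rules or weights, a path query is a tuple of `(axis, tag)` steps, a source's schema is a list of root-to-leaf label paths, and two classic relaxations are used: generalizing a child axis `/` to a descendant axis `//`, and deleting an inner step. The weights `GENERALIZE_W` and `DELETE_W` and all function names are hypothetical. A relaxed query counts as qualified for a source only if that source's schema admits a match, and a branch stops being relaxed as soon as it qualifies (a simple pruning heuristic).

```python
# Hypothetical operator weights (assumptions, not taken from the paper).
GENERALIZE_W = 0.9  # child axis "/" relaxed to descendant axis "//"
DELETE_W = 0.6      # inner step removed; the next step is re-attached with "//"


def matches(steps, path):
    """True if the path query `steps` matches some node on the label path `path`."""
    def match_from(i, j):
        if i == len(steps):
            return True
        axis, tag = steps[i]
        # "/" must match exactly at position j; "//" may match at any later position.
        positions = range(j, j + 1) if axis == "/" else range(j, len(path))
        return any(p < len(path) and path[p] == tag and match_from(i + 1, p + 1)
                   for p in positions)
    return match_from(0, 0)


def qualified(query, schema_paths):
    """A (relaxed) query is qualified for a source if its schema admits a match."""
    return any(matches(query, p) for p in schema_paths)


def relax_once(query):
    """Yield (relaxed_query, weight) pairs produced by one relaxation step."""
    for i, (axis, tag) in enumerate(query):
        if axis == "/":
            yield query[:i] + (("//", tag),) + query[i + 1:], GENERALIZE_W
    for i in range(1, len(query) - 1):
        yield query[:i] + (("//", query[i + 1][1]),) + query[i + 2:], DELETE_W


def adaptive_relax(query, schema_paths, k=3, max_ops=3):
    """Rank up to k qualified relaxations of `query` for one data source."""
    query = tuple(query)
    best = {query: 1.0}
    frontier = dict(best)
    for _ in range(max_ops):
        next_frontier = {}
        for q, score in frontier.items():
            if qualified(q, schema_paths):
                continue  # already answerable on this source: stop relaxing it
            for relaxed, weight in relax_once(q):
                s = score * weight
                if s > best.get(relaxed, 0.0):
                    best[relaxed] = next_frontier[relaxed] = s
        frontier = next_frontier
    ranked = sorted(((s, q) for q, s in best.items() if qualified(q, schema_paths)),
                    key=lambda t: (-t[0], t[1]))
    return ranked[:k]
```

For a source whose schema already matches `/bib/book/title`, the original query comes back unchanged; for a source that nests `book` under `books`, the top relaxation is `/bib//book/title`. The sketch still enumerates unqualified relaxations before filtering them, which is the overhead that schema-aware rules are meant to avoid.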

2.
Keyword search enables inexperienced users to search XML databases easily, without specific knowledge of complex structured query languages or XML data schemas. Existing work has addressed the problem of selecting data nodes that match keywords and connecting them in a meaningful way, e.g., under SLCA and ELCA semantics. However, it is time-consuming and unnecessary to return all the connected subtrees to users, because users are generally interested in only part of the relevant results. In this paper, we propose a new keyword search approach that uses statistics of the underlying XML data to decide the promising result types and then quickly retrieves the corresponding results with the help of the selected types. To guarantee the quality of the selected promising result types, we measure the correlation between result types and a keyword query by analyzing the distribution of relevant keywords and their structures within the XML data to be searched. In addition, relevant result types can be computed efficiently without evaluating the keyword query and without any schema information. To directly return the top-k keyword search results that conform to the suggested promising result types, we design two new algorithms that adapt to the structural sensitivity of the keyword nodes over the keyword search results. Lastly, we implement all the proposed approaches and present experimental results that show the effectiveness of our approach.
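A rough sketch of statistics-based result-type selection follows. It assumes a node's type is its root-to-node label path and uses a depth-damped counting heuristic in the spirit of earlier statistics-based result-type inference such as XReal; the paper's own correlation measure is not reproduced. For each type, it counts how many instances of that type contain each keyword somewhere in their subtree, without evaluating the query or consulting a schema. The tree encoding `(tag, text, children)`, the damping factor `r`, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def collect(node, path, query, counts, types):
    """Return the query keywords in node's subtree; count them per type on the way up."""
    tag, text, children = node
    path = path + (tag,)  # the node's type is its root-to-node label path
    types.add(path)
    found = set(text.lower().split()) & query
    for child in children:
        found |= collect(child, path, query, counts, types)
    for kw in found:
        counts[path][kw] += 1
    return found


def rank_result_types(root, keywords, r=0.8):
    """Score each type by log(1 + product of per-keyword counts), damped by depth."""
    query = {kw.lower() for kw in keywords}
    counts = defaultdict(lambda: defaultdict(int))
    types = set()
    collect(root, (), query, counts, types)
    scores = {t: math.log(1 + math.prod(counts[t][kw] for kw in query)) * r ** len(t)
              for t in types}
    return sorted(scores.items(), key=lambda item: -item[1])


bib = ("bib", "", [
    ("book", "", [("title", "XML basics", []), ("author", "Smith", [])]),
    ("book", "", [("title", "XML search", []), ("author", "Smith", [])]),
    ("book", "", [("title", "Databases", []), ("author", "Jones", [])]),
])
ranked = rank_result_types(bib, ["xml", "smith"])
```

On this small bibliography, `book` outranks both the root (only one instance) and the leaf types `title` and `author` (neither contains both keywords).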

3.
Existing work on XML keyword search focuses on how to find relevant and meaningful data fragments for a query, assuming that each keyword is intended as part of it. However, in XML keyword search, user queries often contain irrelevant or mismatched terms, typos, etc., which can easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, in which the search engine judiciously decides, while processing a user query Q, whether Q needs to be refined, and finds a list of promising refined query candidates that are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) a query ranking model that evaluates the quality of a refined query RQ, capturing the morphological/semantic similarity between Q and RQ and the dependency of the keywords of RQ over the XML data; and (2) an integration of the exploration of RQ candidates and the generation of their matching results into a single problem, which is solved optimally within a single scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.
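The refinement loop can be illustrated with a brute-force Python sketch. It is neither the paper's ranking model nor its one-scan algorithm: it assumes an inverted list mapping each term to the IDs of elements containing it, rewrites each keyword as itself, as a vocabulary term at edit distance 1 (weight 0.8), or as nothing (weight 0.5), and keeps only candidates whose terms co-occur in at least one element, so every returned refinement has a non-empty result. All weights and names are hypothetical.

```python
import math
from itertools import product


def edit_distance(a, b):
    """Classic Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def options(term, vocab):
    """Candidate rewrites of one keyword: (replacement or None to drop it, weight)."""
    opts = [(term, 1.0)] if term in vocab else []
    opts += [(v, 0.8) for v in sorted(vocab) if v != term and edit_distance(term, v) == 1]
    opts.append((None, 0.5))
    return opts


def refine(query, inverted, k=3):
    """Return up to k (refined_terms, similarity, matching_element_ids) triples."""
    if not query:
        return []
    vocab = set(inverted)

    def matching(terms):
        return set.intersection(*(inverted[t] for t in terms))

    if all(t in vocab for t in query) and matching(query):
        return [(tuple(query), 1.0, matching(query))]  # Q needs no refinement
    candidates = {}
    for combo in product(*(options(t, vocab) for t in query)):
        terms = tuple(dict.fromkeys(t for t, _ in combo if t is not None))
        if not terms:
            continue
        hits = matching(terms)
        if not hits:
            continue  # keep only refinements with non-empty results
        sim = math.prod(w for _, w in combo)
        if sim > candidates.get(terms, (0.0, None))[0]:
            candidates[terms] = (sim, hits)
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1][0], -len(kv[1][1]), kv[0]))
    return [(terms, sim, hits) for terms, (sim, hits) in ranked[:k]]


inverted = {"xml": {1, 2}, "keyword": {1}, "search": {1, 3}, "database": {3}}
refined = refine(["xml", "keywrd", "search"], inverted)
```

Here the typo `keywrd` is corrected to `keyword`, and the query that already has results is returned as-is, mirroring the decision of whether Q needs refinement at all.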

4.
In the big data era, clients with constrained computational resources cannot afford increasingly complex computation. The high reliability, strong processing capacity, and large storage space of cloud computing allow resource-constrained clients to outsource heavy computation tasks to cloud servers. In this paper, a new algorithm for the secure outsourcing of high-degree polynomials is proposed. We introduce a camouflage technique by which the real polynomial is disguised from the untrusted cloud server. In addition, the input and output are not revealed during the computation, and the client can easily verify the returned result. The application of the secure outsourcing algorithm to keyword search systems is also studied: a verification technique for keyword search is derived from the outsourcing algorithm, with which the client can easily verify whether the server has faithfully performed the search over the whole ciphertext space. If the server does not perform the search and returns “null” to indicate that no files contain the query keyword, the client can easily verify whether any related files exist in the ciphertext database.

5.
As a large number of corpora are represented, stored, and published in XML format, finding useful information in XML databases has become an increasingly important issue. Keyword search enables web users to access XML data easily without learning a structured query language or studying complex data schemas. Most existing indexing strategies for XML keyword search are based on Dewey encoding. In this paper, we propose a new encoding method for XML documents called Level Order and Father (LAF). Based on LAF encoding, we devise a new index structure, the two-layer LAF inverted index, which greatly reduces space complexity compared with a Dewey-encoding-based inverted index. Furthermore, using the two-layer LAF inverted index, we propose a new keyword query algorithm, the Algorithm based on Binary Search (ABS), which can quickly find all Smallest Lowest Common Ancestors (SLCAs). We experimentally evaluate the two-layer LAF inverted index and the ABS algorithm on four real XML data sets selected from Wikipedia. The experimental results demonstrate the advantages of our index method and query algorithm: the two-layer LAF index consumes less than half the space of the Dewey inverted index, and ABS is about one to two orders of magnitude faster than the classic Stack algorithm. Concurrency and Computation: Practice and Experience, 2012. © 2012 Wiley Periodicals, Inc.
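The abstract does not define LAF precisely. One plausible reading, used in this hypothetical sketch, labels each node with `(level, order)` in breadth-first order and stores, per level, an array mapping each node's order to its father's order. Ancestors are then recovered by walking these arrays, and because children are enqueued in their parents' order, each father array is non-decreasing, a property a binary-search algorithm could exploit. The SLCA routine is a brute-force baseline (minimal LCAs over all combinations of keyword nodes), not the ABS algorithm; all names are assumptions.

```python
from collections import deque
from functools import reduce
from itertools import product


def laf_encode(tree):
    """Label nodes (level, order) in BFS order; fathers[level][order] = father's order."""
    labels, fathers = {}, []
    queue = deque([(tree, 0, -1)])
    while queue:
        (name, children), level, father = queue.popleft()
        if level == len(fathers):
            fathers.append([])
        order = len(fathers[level])
        fathers[level].append(father)
        labels[name] = (level, order)
        for child in children:
            queue.append((child, level + 1, order))
    return labels, fathers


def lift(label, level, fathers):
    """Walk up the father arrays until reaching `level`."""
    lv, o = label
    while lv > level:
        o, lv = fathers[lv][o], lv - 1
    return (lv, o)


def lca(a, b, fathers):
    level = min(a[0], b[0])
    a, b = lift(a, level, fathers), lift(b, level, fathers)
    while a != b:  # climb both nodes one level at a time
        a, b = lift(a, a[0] - 1, fathers), lift(b, b[0] - 1, fathers)
    return a


def is_ancestor(u, v, fathers):
    return u[0] < v[0] and lift(v, u[0], fathers) == u


def slca(keyword_lists, fathers):
    """Brute force: LCAs of every combination, minus those that are ancestors of others."""
    cands = {reduce(lambda x, y: lca(x, y, fathers), combo)
             for combo in product(*keyword_lists)}
    return {c for c in cands if not any(is_ancestor(c, d, fathers) for d in cands)}


tree = ("root", [("a", [("c", []), ("d", [])]),
                 ("b", [("e", []), ("f", [("g", [])])])])
labels, fathers = laf_encode(tree)
x_nodes = [labels["c"], labels["e"]]  # nodes containing keyword x
y_nodes = [labels["d"], labels["g"]]  # nodes containing keyword y
result = slca([x_nodes, y_nodes], fathers)
```

The minimal LCAs over all combinations are exactly the SLCAs, but this baseline's cost grows with the product of the keyword-list sizes, which is what index-driven algorithms avoid.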

6.
Keyword search is an effective paradigm for information discovery and has recently been introduced for querying XML documents. Scoring of search results is an important issue in XML keyword search. The traditional bag-of-words model distinguishes neither the roles of keywords nor the relationships between them, and is therefore unsuitable for XML keyword queries. In this paper, we present a new scoring method based on a novel query model, keyword query with structure (QWS), designed specifically for XML keyword queries. The QWS model takes a new view of a keyword query: a composition of several query units, each representing a query condition. We believe that this method captures the semantic relevance of the search results. The paper first introduces an algorithm that reformulates a keyword query into a QWS. It then presents a scoring method that measures the relevance of search results according to how many query conditions are matched and how well. The scoring method is also extended to clusters of search results. Experimental results verify the effectiveness of our methods.
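A hypothetical sketch of the QWS idea: keywords that match known element tag names open a new query unit, the keywords that follow become that unit's value conditions, and a result is scored by summing, over units, the fraction of each unit's values it matches. The tag-name heuristic, the scoring rule, and the names `to_qws` and `score` are assumptions for illustration; the paper's reformulation algorithm and scoring formula may differ.

```python
def to_qws(keywords, tag_names):
    """Split a flat keyword list into (tag or None, [value keywords]) query units."""
    units, current = [], None
    for kw in keywords:
        if kw in tag_names:
            current = (kw, [])
            units.append(current)
        elif current is None:
            units.append((None, [kw]))  # value keyword with no structural context
        else:
            current[1].append(kw)
    return units


def score(result, units):
    """Sum per-unit match quality: how many conditions match, and how well."""
    total = 0.0
    for tag, values in units:
        if tag is None:
            words = " ".join(result.values()).split()
        elif tag in result:
            words = result[tag].split()
        else:
            continue  # this condition is not matched at all
        total += 1.0 if not values else sum(v in words for v in values) / len(values)
    return total


units = to_qws(["title", "xml", "search", "author", "smith"], {"title", "author", "year"})
good = score({"title": "xml search engines", "author": "smith"}, units)
weak = score({"title": "xml basics", "author": "jones"}, units)
```

Under a bag-of-words view, a result containing `smith` only in its `title` would count the same as one where Smith is the author; the unit `("author", ["smith"])` tells them apart.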

7.
Keyword search is the most popular technique for searching information in XML (eXtensible Markup Language) documents. It enables users to access XML data easily without learning a structured query language or studying complex data schemas. Existing traditional keyword query methods are mainly based on LCA (lowest common ancestor) semantics, in which the returned results match all keywords at the granularity of elements. In many practical applications, information is often uncertain and vague, so identifying useful information in fuzzy data is becoming an important research topic. In this paper, we focus on keyword querying of fuzzy XML data at the granularity of objects. By introducing the concept of an “object tree”, we propose query semantics for object-level keyword queries. We find the minimal whole-matching result object trees, which contain all keywords, and the partial-matching result object trees, which contain some of the keywords, and return the root nodes of these result object trees as query results. To identify the top-K answers with the highest scores effectively and accurately, we propose a scoring mechanism that considers tf*idf document relevance, user preferences, and the possibilities of results. We propose a stack-based algorithm named object-stack to obtain the top-K answers with the highest scores. Experimental results show that the object-stack algorithm significantly outperforms traditional XML keyword query algorithms and obtains high-quality query results with high search efficiency on fuzzy XML documents.
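The scoring mechanism names three factors; this sketch simply multiplies them together, with an added keyword-coverage factor (an assumption) so that partial matches rank below whole matches. Candidate objects are assumed to be precomputed, each with a tag, a token list, and a possibility degree, and `heapq.nlargest` selects the top K. The smoothed idf, the product combination, and all field and function names are hypothetical, and the object-stack algorithm itself is not reproduced.

```python
import heapq
import math
from collections import Counter


def top_k_objects(objects, keywords, preference, k=2):
    """Rank candidate objects by tf*idf x coverage x user preference x possibility."""
    n = len(objects)
    df = Counter(t for o in objects for t in set(o["tokens"]) if t in keywords)
    # Smoothed idf so that a keyword present in every object still contributes.
    idf = {kw: math.log(1 + n / df[kw]) if df[kw] else 0.0 for kw in keywords}
    scored = []
    for o in objects:
        tf = Counter(o["tokens"])
        matched = [kw for kw in keywords if tf[kw] > 0]
        if not matched:
            continue  # neither a whole nor a partial match
        relevance = sum(tf[kw] * idf[kw] for kw in matched)
        coverage = len(matched) / len(keywords)  # 1.0 for whole matches
        s = relevance * coverage * preference.get(o["tag"], 1.0) * o["possibility"]
        scored.append((s, o["id"]))
    return heapq.nlargest(k, scored)


objects = [
    {"id": "o1", "tag": "book", "tokens": ["xml", "fuzzy", "query"], "possibility": 0.9},
    {"id": "o2", "tag": "book", "tokens": ["xml", "database"], "possibility": 0.8},
    {"id": "o3", "tag": "article", "tokens": ["fuzzy", "query", "xml"], "possibility": 0.5},
    {"id": "o4", "tag": "article", "tokens": ["graph"], "possibility": 1.0},
]
top = top_k_objects(objects, ["xml", "fuzzy"], {"book": 1.0, "article": 0.8})
```

Object `o3` matches both keywords but is penalized by its lower possibility and the lower preference for `article`; it still outranks the partial match `o2`.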

8.
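A Servlet-Based Search Engine   Total citations: 1 (self-citations: 1, other citations: 0)
Zhang Wen, Software (《软件》), 2011, 32(2): 75-77
Using Servlet technology and the hash map data structure, web page keywords are organized by building an index table. The index table is consulted with the keywords supplied by the client to obtain the search results. Because the search process accesses an in-memory cache, search efficiency is high: the technique can be widely adopted as an in-site search engine on small and medium-sized servers, and it can provide wide-area web search services on medium and large servers.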
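The core idea, a hash map from keyword to pages held in memory so that a query only touches that cache, can be sketched in a few lines. The original system is a Java Servlet; this sketch is in Python, with a `dict` of sets standing in for a Java `HashMap`. The class name, tokenization by `\w+`, and AND semantics for multi-word queries are assumptions.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """In-memory index table: keyword -> set of page URLs (hypothetical design)."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_page(self, url, text):
        for word in re.findall(r"\w+", text.lower()):
            self._index[word].add(url)

    def search(self, query):
        """Return pages containing every query word (AND semantics, an assumption)."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._index.get(words[0], set()))
        for w in words[1:]:
            result &= self._index.get(w, set())  # .get avoids inserting empty keys
        return result


idx = KeywordIndex()
idx.add_page("/a.html", "Servlet search engine")
idx.add_page("/b.html", "Search with a hash map")
```

Each lookup is a hash-table access, which is why building the table once and reusing it for every request keeps per-query cost low.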

9.
Keyword query processing over graph-structured data is beneficial in various real-world applications. In keyword search over graphs, the basic unit of search and retrieval is a structure (an interconnection of nodes) that connects all the query keywords. This answering paradigm, in contrast to the single web pages returned by search engines, brings new challenges for ranking. In this paper, we propose a simple but effective ranking measure based on fuzzy set theory, called FRank. Fuzzy sets capture the contribution of each individual query keyword separately when computing node relevance. A novel aggregation operator is defined to combine the content-relevance-based fuzzy sets and compute query-dependent edge weights. The final rank of an answer is computed by non-monotonic addition of edge weights according to their relevance to the keyword query. FRank evaluates each answer based on the distribution of query keywords and the structural connectivity between those keywords. An extensive empirical analysis shows that our proposed ranking measure outperforms the ranking measures adopted by current approaches in the literature.
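A hypothetical sketch of fuzzy-set ranking over a graph answer. It assumes that a node's membership in keyword k's fuzzy set is its count of k normalized by the largest count of k over all nodes, aggregates memberships across an edge with the algebraic sum `a + b - ab` (a standard fuzzy union), and scores an answer by the mean of its edge weights, so adding a weakly relevant edge can lower the score (one possible reading of "non-monotonic"). The paper's actual aggregation operator is not reproduced, and all names are assumptions.

```python
def memberships(node_text, keywords):
    """node -> {keyword: membership degree in [0, 1]}."""
    counts = {v: {k: text.lower().split().count(k) for k in keywords}
              for v, text in node_text.items()}
    peak = {k: max(c[k] for c in counts.values()) or 1 for k in keywords}
    return {v: {k: c[k] / peak[k] for k in keywords} for v, c in counts.items()}


def algebraic_sum(a, b):
    """Fuzzy union by the algebraic (probabilistic) sum."""
    return a + b - a * b


def edge_weight(mu, u, v, keywords):
    """Query-dependent edge weight: mean per-keyword union of endpoint memberships."""
    return sum(algebraic_sum(mu[u][k], mu[v][k]) for k in keywords) / len(keywords)


def rank_answer(edges, mu, keywords):
    """Mean edge weight, so irrelevant connecting edges pull the score down."""
    weights = [edge_weight(mu, u, v, keywords) for u, v in edges]
    return sum(weights) / len(weights)


keywords = ["xml", "smith"]
mu = memberships({"p1": "xml keyword search", "a1": "smith",
                  "p2": "graph", "a2": "xml smith"}, keywords)
direct = rank_answer([("p1", "a1")], mu, keywords)
detour = rank_answer([("p1", "p2"), ("p2", "a1")], mu, keywords)
```

Both answers connect `xml` and `smith`, but the one routed through the irrelevant node `p2` scores lower.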

10.
As probabilistic data management becomes one of the main research focuses and keyword search becomes an increasingly popular means of querying, it is natural to ask how to support keyword queries on probabilistic XML data. For keyword queries on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular than other keyword query result semantics (such as SLCA). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible-world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. We then develop an efficient stack-based algorithm that finds all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in terms of result probability, time and space efficiency, and scalability.
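The possible-world semantics that the paper's algorithm avoids materializing can be made concrete with an exponential baseline. This sketch assumes an independent-node model (each node exists with a given probability once its parent exists, and the root always exists), enumerates every world, computes the deterministic ELCAs in each, and sums world probabilities per node. Within a world, a node is an ELCA if its own keywords plus those of child subtrees that do not contain all keywords cover the query; a subtree that lacks some keyword cannot contain a full-coverage node either, so no deeper check is needed. Function names are hypothetical, and this is not the stack-based algorithm.

```python
def possible_worlds(nodes, order):
    """Yield (present_nodes, probability) for every possible world.

    nodes[v] = (parent, p, keywords): v exists with probability p once its parent
    exists; the root must have p = 1.0. `order` lists parents before children.
    """
    def rec(i, present, prob):
        if i == len(order):
            yield present, prob
            return
        v = order[i]
        parent, p, _ = nodes[v]
        if parent is not None and parent not in present:
            yield from rec(i + 1, present, prob)  # parent absent => v absent
            return
        yield from rec(i + 1, present | {v}, prob * p)
        if p < 1.0:
            yield from rec(i + 1, present, prob * (1 - p))
    yield from rec(0, frozenset(), 1.0)


def elcas(nodes, present, query):
    """ELCA nodes of one deterministic world."""
    children = {v: [] for v in present}
    for v in present:
        parent = nodes[v][0]
        if parent is not None:
            children[parent].append(v)
    result = set()

    def visit(v):
        own = nodes[v][2] & query
        total, exclusive = set(own), set(own)
        for c in children[v]:
            sub = visit(c)
            total |= sub
            if sub != query:  # subtrees containing all keywords are excluded
                exclusive |= sub
        if exclusive == query:
            result.add(v)
        return total

    visit(next(v for v in present if nodes[v][0] is None))
    return result


def elca_probabilities(nodes, order, query):
    """Sum world probabilities per ELCA node (exponential; illustration only)."""
    probs = {}
    for present, prob in possible_worlds(nodes, order):
        for v in elcas(nodes, present, query):
            probs[v] = probs.get(v, 0.0) + prob
    return probs


nodes = {
    "r": (None, 1.0, set()),
    "a": ("r", 0.6, set()),
    "x1": ("a", 1.0, {"xml"}),
    "s1": ("a", 0.5, {"search"}),
    "s2": ("r", 0.8, {"search"}),
}
probs = elca_probabilities(nodes, ["r", "a", "x1", "s1", "s2"], {"xml", "search"})
```

In this example, `a` is an ELCA only when both of its keyword children exist (probability 0.6 × 0.5 = 0.3); otherwise the root becomes an ELCA when `s2` supplies `search` (0.6 × 0.5 × 0.8 = 0.24).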

