Similar Literature
1.
This paper is concerned with the problem of finding a hypothesis, in the hypothesis class described below, that is consistent with given positive and negative examples. The hypothesis class consists of all sets of at most two tree patterns and represents the class of unions of at most two tree pattern languages. In particular, we consider the problem from the point of view of the consistency problem for this class. The consistency problem asks whether, within some fixed hypothesis space, there exists a hypothesis consistent with the given positive and negative examples. Efficient solvability of this problem is closely related to the possibility of efficient machine learning or machine discovery. Unfortunately, the consistency problem is known to be NP-complete for many hypothesis spaces, and in this paper it is shown to be NP-complete for this class as well. To overcome this computational hardness, we use additional information obtained by making queries. First, we give an algorithm that, using restricted subset queries, solves the consistency problem for the class in time polynomial in the total size of the given positive and negative examples. Next, we show that, under a certain condition on the set of function symbols, each subset query made by the algorithm can be replaced by several membership queries. As a result, the consistency problem for the class can be solved in polynomial time using membership queries. This revised version was published online in June 2006 with corrections to the Cover Date.

2.
Finding a dominating set of minimum cardinality is an NP-hard graph problem, even when the graph is bipartite. In this paper we are interested in solving the problem on graphs having a large independent set. Given a graph G with an independent set of size z, we show that the problem can be solved in time $O(2^{n-z})$, where n is the number of vertices of G. As a consequence, since every bipartite graph has an independent set of size at least n/2, our algorithm is able to solve the dominating set problem on bipartite graphs in time $O(2^{n/2})$. Another implication is an algorithm for general graphs whose running time is $O(1.7088^n)$.
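
A minimal Python sketch of how an $O(2^{n-z})$ bound can arise, assuming the independent set I is given. Every subset S of V - I is guessed as the part of the solution outside I; vertices of I with no neighbour in S must then be in the solution themselves (their only possible dominators are neighbours in V - I, which lie outside S), and the remaining undominated vertices of V - I are covered using a set-cover table over subsets of V - I that is computed once. This is an illustrative reconstruction of the idea under those assumptions, not necessarily the paper's exact algorithm; all names are hypothetical and polynomial factors are ignored.

```python
from typing import Iterable, Tuple


def min_dominating_set_size(n: int, edges: Iterable[Tuple[int, int]],
                            indep: Iterable[int]) -> int:
    """Minimum dominating set size of a graph on vertices 0..n-1,
    given an independent set `indep` (assumed to really be independent)."""
    adj = [0] * n                                   # adj[v] = bitmask of neighbours
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    in_i = set(indep)
    indep_list = sorted(in_i)
    rest = [v for v in range(n) if v not in in_i]   # V - I, size k = n - z
    k = len(rest)

    def to_rest(mask: int) -> int:
        """Project a bitmask over V onto bit positions 0..k-1 of V - I."""
        return sum(1 << i for i, v in enumerate(rest) if mask >> v & 1)

    nbr_rest = {v: to_rest(adj[v]) for v in indep_list}  # N(v) within V - I, v in I
    rest_nbr = [to_rest(adj[v]) for v in rest]           # N(v) within V - I, v in V - I
    INF = float("inf")

    # cover[X] = fewest vertices of I whose neighbourhoods cover X (X subset of V - I).
    cover = [0.0] + [INF] * ((1 << k) - 1)
    for x in range(1, 1 << k):
        low = x & -x                                # this element must be covered by someone
        best = INF
        for v in indep_list:
            if nbr_rest[v] & low:
                best = min(best, cover[x & ~nbr_rest[v]])
        cover[x] = best + 1

    full, answer = (1 << k) - 1, INF
    for s in range(1 << k):                         # S = D intersected with V - I
        s_global = sum(1 << v for i, v in enumerate(rest) if s >> i & 1)
        dominated = s
        for i in range(k):
            if s >> i & 1:
                dominated |= rest_nbr[i]
        # Vertices of I with no neighbour in S are forced into D.
        forced = [v for v in indep_list if not adj[v] & s_global]
        for v in forced:
            dominated |= nbr_rest[v]
        size = bin(s).count("1") + len(forced) + cover[full & ~dominated]
        answer = min(answer, size)
    return int(answer)


# Path 0-1-2-3-4 with independent set {0, 2, 4}.
print(min_dominating_set_size(5, [(0, 1), (1, 2), (2, 3), (3, 4)], [0, 2, 4]))  # 2
```

Only the 2^(n-z) subsets of V - I are ever enumerated, which is where the exponent n - z comes from.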

4.
Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most current approaches identify the meaningful results of a given keyword query based on the semantics of the lowest common ancestor (LCA) and its variants. However, because LCA candidates are usually numerous and of low relevance to users' information needs, how to effectively and efficiently identify the most relevant results from a large number of LCA candidates remains a challenging, unresolved issue. In this article, we introduce a novel semantics of relevant results based on the mutual information between the query keywords. We then introduce a novel approach for identifying the relevant answers to a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed that rely on provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach, and the results show that it performs well compared with other existing approaches across different datasets and evaluation metrics.
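
The dominance relationship behind skyline semantics can be illustrated with a small Python sketch. It assumes, hypothetically, that each result candidate already carries a vector of criterion scores where higher is better; the names `dominates` and `skyline` are ours, and the naive quadratic scan stands in for the paper's more efficient algorithms.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: at least as good on every criterion and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Ids of candidates that no other candidate dominates (naive O(n^2) scan)."""
    return [cid for cid, vec in candidates.items()
            if not any(dominates(other, vec)
                       for oid, other in candidates.items() if oid != cid)]


# Hypothetical result candidates scored on two criteria.
scores = {"a": (0.9, 0.2), "b": (0.5, 0.8), "c": (0.4, 0.1)}
print(skyline(scores))  # ['a', 'b']  (c is dominated by a)
```

Neither a nor b dominates the other, so both stay in the skyline; a top-k step would then rank only these non-dominated candidates.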

5.
In this paper, we focus on the efficient construction of restricted subtree (RSubtree) results for XML keyword queries on a multicore system. We first show that the performance bottlenecks of existing methods lie in 1) computing the set of relevant keyword nodes (RKNs) for each subtree root node, 2) constructing the corresponding RSubtree, and 3) parallel execution. We then propose a two-step generic top-down subtree construction algorithm, which computes SLCA/ELCA nodes in the first step and, in the second step, obtains RKNs and generates RSubtree results in parallel. Here, generic means that 1) our method can be used to compute different kinds of subtree results and 2) it is independent of the query semantics; top-down means that our method constructs each RSubtree by visiting the nodes of the subtree built from an RKN set level by level, from left to right, so as to avoid visiting useless nodes as much as possible. The experimental results show that our method is much more efficient than existing ones according to various metrics.
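
The first step relies on SLCA computation. As a point of reference, here is a naive Python sketch of SLCA semantics over Dewey labels: a node contains every keyword if it is an ancestor-or-self of at least one match per keyword, and the SLCAs are those qualifying nodes with no qualifying proper descendant. The tuple-based Dewey encoding and the function names are illustrative assumptions; this is neither the paper's algorithm nor an efficient one.

```python
from typing import Iterable, List, Optional, Set, Tuple

Dewey = Tuple[int, ...]   # e.g. (0,) is the root, (0, 2, 1) one of its grandchildren


def ancestors_or_self(label: Dewey) -> Set[Dewey]:
    return {label[:i] for i in range(1, len(label) + 1)}


def slca(match_lists: Iterable[Iterable[Dewey]]) -> List[Dewey]:
    common: Optional[Set[Dewey]] = None
    for matches in match_lists:            # one list of match nodes per keyword
        covering: Set[Dewey] = set()
        for m in matches:
            covering |= ancestors_or_self(m)
        common = covering if common is None else common & covering
    common = common or set()
    # Keep only nodes with no proper descendant that also contains all keywords.
    return sorted(n for n in common
                  if not any(len(o) > len(n) and o[:len(n)] == n for o in common))


xml = [(0, 0, 1), (0, 2, 0)]      # hypothetical matches of keyword "xml"
query = [(0, 0, 2), (0, 1)]       # hypothetical matches of keyword "query"
print(slca([xml, query]))  # [(0, 0)]
```

Both (0,) and (0, 0) contain both keywords, but only (0, 0) is kept because the root has a qualifying descendant.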

7.
In this research, we analyze the relationship among (1) the performance metrics of a sponsored search campaign, (2) the gender orientation of queries, and (3) the occurrence of branded terms in queries. The aim of this research is to investigate the effectiveness of increased personalization of search engine advertising in improving the consumer's online experience. Using a dataset from a major US retailer that covers thirty-three consecutive months and consists of 7 million daily records of a real-time keyword advertising campaign, we segregate keyphrases into three gender categories (male, female, and neutral), each split into two groups by term usage (branded and unbranded). Using ANOVA, we analyze the effect of gender and brand keyphrases on the critical sponsored search performance metrics of impressions, clicks, cost-per-click, sales revenue, orders, items purchased, and return on advertising. The findings show that the combination of brand focus with the gender orientation of keyphrases is a significant factor in predicting sponsored search performance and behavior. There are statistically significant variations in consumer behavior, as measured by sponsored search metrics, among the gender categories. Specifically, females are more attracted to branded terms than males, perhaps due to the trust and customer loyalty generated by brand image. Our results establish that a positive brand reputation has a dramatic influence on consumers' loyalty to the brand and hence strongly affects their interests, activities, and purchasing behavior in the e-commerce environment.
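
For readers unfamiliar with the test, the one-way ANOVA F statistic compares between-group variance to within-group variance. A minimal Python sketch using only the standard library; the group values below are made-up numbers, not data from the study, and turning F into a p-value (via the F distribution with the two degrees of freedom returned) is left to a statistics library.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical metric values for three keyphrase categories.
f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df_b, df_w)  # 27.0 2 6
```

A large F means the category means differ by much more than the spread inside each category would explain.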

8.
Recent advances in graphics workstations allow the development of improved visualization tools for algorithm and program development. Algorithm visualization permits better analysis, development, and presentation of algorithm characteristics. In this paper, we present a simple algorithm visualization technique using tree graphs. The technique is applied to the visualization of three sorting algorithms (bubble sort, quicksort, and merge sort) and one matrix algorithm, Gaussian elimination. Key states of the data are displayed on the nodes, while the graph itself represents the underlying structure of the algorithm. All graphics are displayed under the X Window environment using simple graphics and window programming techniques.
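
A text-only Python sketch of the idea of showing key data states on the nodes of a tree whose shape follows the algorithm: each merge sort call becomes a node labelled with its input and its sorted output, indented by recursion depth. The function name and output format are our own, and the paper's X Window rendering is not reproduced.

```python
from typing import List, Optional, Tuple


def merge_sort_traced(data: List[int], depth: int = 0,
                      trace: Optional[List[str]] = None) -> Tuple[List[int], List[str]]:
    """Merge sort that records one indented line per call (a node of the call tree)."""
    if trace is None:
        trace = []
    slot = len(trace)
    trace.append("")                      # reserve the parent's line above its children
    if len(data) <= 1:
        result = list(data)
    else:
        mid = len(data) // 2
        left, _ = merge_sort_traced(data[:mid], depth + 1, trace)
        right, _ = merge_sort_traced(data[mid:], depth + 1, trace)
        merged: List[int] = []
        i = j = 0
        while i < len(left) and j < len(right):   # standard two-way merge
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        result = merged + left[i:] + right[j:]
    trace[slot] = "  " * depth + f"{data} -> {result}"
    return result, trace


_, lines = merge_sort_traced([3, 1, 2])
print("\n".join(lines))
```

For the input [3, 1, 2] this prints the root state first, then the two halves indented beneath it, with [1, 2] further split into [1] and [2].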

9.
Keyword search over XML data has attracted a lot of research effort in the last decade, and one of its fundamental research problems is how to efficiently answer a given keyword query w.r.t. a certain query semantics. We found that the key factor behind the inefficiency of existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword $k$ consists of ordered nodes that directly or indirectly contain $k$. We then show that finding keyword query results based on the smallest lowest common ancestor and exclusive lowest common ancestor semantics can be reduced to the ordered set intersection problem, which has been heavily optimized because of its applications in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions, with or without additional indexes. We further propose several algorithms based on hash search that simplify finding the common nodes of all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
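
Ordered set intersection, the primitive the reduction targets, can be sketched in Python as follows, assuming each IDList is a strictly increasing list of integer node IDs. Probing each element of the shortest list into the other lists by binary search (`bisect_left`) is one standard strategy; it is not claimed to be any of the paper's specific algorithms.

```python
from bisect import bisect_left
from typing import List, Sequence


def contains(sorted_ids: Sequence[int], x: int) -> bool:
    """Binary-search membership test in a sorted sequence."""
    i = bisect_left(sorted_ids, x)
    return i < len(sorted_ids) and sorted_ids[i] == x


def intersect_sorted(lists: List[Sequence[int]]) -> List[int]:
    """Intersect strictly increasing ID lists; the result is also sorted."""
    if not lists:
        return []
    shortest, *others = sorted(lists, key=len)   # drive the loop with the shortest list
    return [x for x in shortest if all(contains(o, x) for o in others)]


print(intersect_sorted([[1, 3, 5, 7, 9], [3, 4, 5, 9], [0, 3, 9, 10]]))  # [3, 9]
```

Driving the loop with the shortest list keeps the number of binary searches proportional to its length, which matters when one keyword is much rarer than the others.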

10.
We consider the following set intersection reporting problem. We have a collection of initially empty sets and would like to process an intermixed sequence of $n$ updates (insertions into and deletions from individual sets) and $q$ queries (reporting the intersection of two sets). We cast this problem in the arithmetic model of computation of Fredman [F1] and Yao [Ya2] and show that any algorithm that fits in this model must take time $\Omega(q + n\sqrt{q})$ to process a sequence of $n$ updates and $q$ queries, ignoring factors that are polynomial in $\log n$. We also show that this bound is tight in this model of computation, again to within a polynomial in $\log n$ factor, improving upon a result of Yellin [Ye]. Furthermore, we consider the case $q = O(n)$ with an additional space restriction: we only allow the use of $m$ memory locations, where $m \le n^{3/2}$. We show a tight bound of $\Theta(n^2/m^{1/3})$ for a sequence of $n$ operations, again ignoring polynomial in $\log n$ factors.

11.
Searchable symmetric encryption (SSE) has been introduced for securely outsourcing encrypted databases to cloud storage while preserving searchability. Most SSE schemes assume that the server is honest but curious, whereas in the real world the server may be untrusted. To handle a malicious server that does not perform queries honestly, verifiable SSE (VSSE) schemes have been constructed to ensure the verifiability of search results. However, existing VSSE constructions either support only single-keyword search or incur heavy computational cost during verification. To address this challenge, we present an efficient VSSE scheme, built on the OXT protocol (Cash et al., CRYPTO 2013), for conjunctive keyword queries with sublinear search overhead. The proposed scheme is based on a privacy-preserving hash-based accumulator that leverages a well-established cryptographic primitive, Symmetric Hidden Vector Encryption (SHVE). Our VSSE scheme enables verification of both the correctness and the completeness of results without pairing operations, thus greatly reducing the computational cost of verification. Moreover, the proposed scheme can still provide a proof when the search result is empty. Finally, a security analysis and an experimental evaluation demonstrate the security and practicality of the proposed scheme.

12.
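Yao Quanzhu, Yu Xunbin. Journal of Computer Applications, 2012, 32(4): 1090-1093
To address the problem that current XML keyword query results contain many meaningless nodes, a semantics-aware query algorithm is proposed. Because XML documents are semi-structured and self-describing, the semantic relatedness between nodes is fully exploited to define the Smallest Lowest Entity Subtree (SLEST), in which keywords are connected only by physical (structural) links. To capture IDREF reference relationships between keywords, an algorithm based on the Smallest Interrelated Entity Subtree (SIEST) is proposed, and SLEST and SIEST are used as query results in place of the Smallest Lowest Common Ancestor (SLCA). Experimental results show that the proposed algorithms effectively improve the precision of XML keyword query results.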

13.
Keyword search is the most popular technique for querying large tree-structured datasets, often of unknown structure, on the web. Recent keyword search approaches return lowest common ancestors (LCAs) of the keyword matches, ranked by their relevance to the keyword query. A major challenge for a ranking approach is the efficiency of its algorithms as the number of keywords and the size and complexity of the data increase. To face this challenge, most known approaches restrict their ranking to a subset of the LCAs (e.g., SLCAs, ELCAs), missing relevant results. In this work, we design novel top-k-size stack-based algorithms on tree-structured data. Our algorithms implement a ranking semantics for keyword queries based on the concept of LCA size. Similar to metric selection in information retrieval, LCA size reflects the proximity of keyword matches in the data tree. This semantics does not rank a predefined subset of LCAs and, through a layered presentation of results, demonstrates improved effectiveness compared with previous relevant approaches. To address performance challenges, our algorithms exploit a lattice of the partitions of the keyword set, which enables linear-time performance. This result is obtained without the support of auxiliary precomputed data structures. An extensive experimental study on various large datasets confirms the theoretical analysis. The results show that, in contrast to other approaches, our algorithms scale smoothly as the size of the dataset and the number of keywords increase.

14.
Tree pattern matching is a fundamental problem that has a wide range of applications in Web data management, XML processing, and selective data dissemination. In this paper we develop efficient algorithms for the tree homeomorphism problem, i.e., the problem of matching a tree pattern with exclusively transitive (descendant) edges. We first prove that deciding whether there is a tree homeomorphism is LOGSPACE-complete, improving on the current LOGCFL upper bound. Furthermore, we develop a practical algorithm for the tree homeomorphism decision problem that is both space- and time-efficient. The algorithm is in LOGDCFL and space consumption is strongly bounded, while the running time is linear in the size of the data tree. This algorithm immediately generalizes to the problem of matching the tree pattern against all subtrees of the data tree, preserving the mentioned efficiency properties.
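
To make the problem concrete, a small Python sketch that computes, bottom-up, at which data-tree nodes a pattern with only descendant edges matches. It assumes XPath-style semantics: labels must agree, each pattern child must map to a proper descendant of its parent's image, and distinct pattern nodes may map to the same data node. The `Node` class is hypothetical, and this straightforward dynamic program over all (pattern node, data node) pairs is not the LOGSPACE or linear-time algorithm of the paper.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass(eq=False)          # identity-based equality/hash, so nodes can live in sets
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def _walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from _walk(child)


def homeomorphic_matches(pattern: Node, data: Node) -> List[Node]:
    """Data nodes onto which the pattern root can be mapped (post-order)."""
    pattern_nodes = list(_walk(pattern))
    hits: List[Node] = []

    def visit(v: Node) -> Tuple[Set[Node], Set[Node]]:
        below: Set[Node] = set()              # pattern nodes matching strictly below v
        for c in v.children:
            at_c, below_c = visit(c)
            below |= at_c | below_c
        at_v = {p for p in pattern_nodes
                if p.label == v.label and all(q in below for q in p.children)}
        if pattern in at_v:
            hits.append(v)
        return at_v, below

    visit(data)
    return hits


data = Node("a", [Node("b", [Node("c")]), Node("d")])
print([n.label for n in homeomorphic_matches(Node("a", [Node("c")]), data)])  # ['a']
print(homeomorphic_matches(Node("b", [Node("d")]), data))                     # []
```

The root-to-root decision problem corresponds to checking whether `data` itself is among the returned nodes; returning every hit mirrors the generalization to all subtrees mentioned above.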

15.
To understand the intents behind users' queries, many researchers study similar-query finding. Recently, the click graph has shown its utility in describing the relationship between queries and URLs. Previous approaches mainly either generate related terms or find relevant queries based on co-clicked URLs. However, these approaches may suffer from the complexity of natural language processing and from click-through data sparseness. In this paper, we tackle this problem through three query probability-distribution representation models: the Click Model, the Term Model, and the Semantic Model. The Click Model extracts credible transition probabilities from queries to URLs and describes a query without considering web content. The Term Model represents a query via a term distribution over its main entities and purposes, which better captures the information needs behind short and ambiguous keyword queries. The Semantic Model learns a latent intent distribution of queries to distinguish the user intents behind a query. On top of the three models, we apply pairwise similarity metrics and graph-based personalized PageRank to find similar queries. Compared with traditional representation models, our models are shown to be effective and efficient, especially for long-tail queries.
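
Graph-based personalized PageRank can be sketched in Python with power iteration: at each step the walk restarts at the seed query with probability alpha, otherwise it follows an out-link uniformly at random. The toy query graph, the restart probability 0.15, and the choice to return dangling mass to the seed are illustrative assumptions, not the paper's settings.

```python
from typing import Dict, List


def personalized_pagerank(graph: Dict[str, List[str]], seed: str,
                          alpha: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank with restarts to `seed` (restart probability alpha)."""
    nodes = set(graph) | {w for nbrs in graph.values() for w in nbrs} | {seed}
    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] += alpha                     # restart mass (total rank stays 1)
        for u, r in rank.items():
            out = graph.get(u, [])
            if out:                            # spread the remaining mass over out-links
                share = (1 - alpha) * r / len(out)
                for w in out:
                    nxt[w] += share
            else:                              # dangling node: send its mass to the seed
                nxt[seed] += (1 - alpha) * r
        rank = nxt
    return rank


# Hypothetical query graph: q1 and q3 are each linked only to q2.
g = {"q1": ["q2"], "q2": ["q1", "q3"], "q3": ["q2"]}
scores = personalized_pagerank(g, "q1")
print(max((q for q in scores if q != "q1"), key=scores.get))  # q2
```

Ranking the non-seed queries by their scores gives a list of queries considered most similar to the seed.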

16.
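A k-hop reachability query asks whether, in a given directed acyclic graph (DAG), there exists a path of length at most k between two vertices. To address the large index size and low query-processing efficiency of existing methods, a bidirectional shortest-path index built on a subset of vertices is proposed to increase the index's coverage of reachability information, together with a set of optimization rules that reduce the index size. Next, mutually inverse forward and backward topological indexes built on a simplified graph are proposed to speed up answering unreachable queries. Finally, a far-distance-first bidirectional traversal strategy is proposed to improve query-processing efficiency. Experimental results on 21 real datasets (e.g., citation networks and social networks) show that, compared with the existing efficient methods PLL and BFSI-B, the proposed algorithm has a smaller index size and faster query response.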
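
As a baseline for what the proposed indexes accelerate, here is an index-free Python sketch of a bidirectional k-hop check: a forward BFS from the source limited to ceil(k/2) hops and a backward BFS from the target limited to floor(k/2) hops. The answer is yes exactly when some vertex is reached by both searches with total distance at most k, because any path of length at most k has a vertex within those two radii. This is a textbook bidirectional search, not the paper's far-distance-first strategy, and the edge-list input format is an assumption.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple


def _bounded_bfs(adj: Dict[Hashable, List[Hashable]], src: Hashable,
                 limit: int) -> Dict[Hashable, int]:
    """Hop distances from src, exploring at most `limit` hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def k_hop_reachable(edges: Iterable[Tuple[Hashable, Hashable]],
                    s: Hashable, t: Hashable, k: int) -> bool:
    """True iff there is a path from s to t with at most k edges."""
    fwd: Dict[Hashable, List[Hashable]] = {}
    bwd: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)
    d_fwd = _bounded_bfs(fwd, s, (k + 1) // 2)   # ceil(k/2) hops forward
    d_bwd = _bounded_bfs(bwd, t, k // 2)         # floor(k/2) hops backward
    return any(d + d_bwd[v] <= k for v, d in d_fwd.items() if v in d_bwd)


edges = [(1, 2), (2, 3), (3, 4)]
print(k_hop_reachable(edges, 1, 4, 3), k_hop_reachable(edges, 1, 4, 2))  # True False
```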

17.
World Wide Web - Due to the popularity of Spatial Databases, many search engine providers have started to expand their text searching capability to include geographical information. Because of this...

18.
Providing the top-k typical relevant keyword queries would benefit users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting semantic relationships both between keywords and between keyword queries, this paper proposes a new keyword query suggestion approach that provides typical and semantically related queries for a given query. First, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries is measured using a semantic matrix in which the coupling relationships between the keywords in the queries are preserved. Based on the query semantic similarities, we next propose an approximation algorithm that finds the most typical queries in the query history using probability density estimation. Lastly, a threshold-based top-k query selection method is proposed to evaluate the top-k typical relevant queries efficiently. We demonstrate that our keyword coupling relationship and query semantic similarity measures accurately capture the coupling relationships between keywords and the semantic similarities between keyword queries. The efficiency of the query typicality analysis and the top-k query selection algorithm is also demonstrated.

19.
Semantic query optimization, or knowledge-based query optimization, has received increasing interest in recent years. The authors provide an effective and systematic approach to optimizing queries by appropriately choosing semantically equivalent transformations. Basically, there are two types of transformations: those that eliminate unnecessary joins, and those that add beneficial or eliminate nonbeneficial redundant selection operations (restrictions). A necessary and sufficient condition for eliminating a single unnecessary join is provided. We prove that eliminating as many unnecessary joins as possible is NP-complete for various types of acyclic queries, with the exception of closure chain queries, whose query graphs are chains and whose equi-join attributes are all distinct. An algorithm is provided to minimize the number of joins in tree queries. This algorithm has the important property that, when applied to a closure chain query, it yields an optimal solution with time complexity $O(n \cdot m)$, where n is the number of relations referenced in the chain query and m is the time complexity of a restriction closure computation.

20.
A top-k spatial keyword query returns the k objects with the highest (or lowest) scores with regard to both spatial proximity and text relevance. Approaches for answering top-k spatial keyword queries fall into two categories: the separate index approach and the hybrid index approach. The separate index approach maintains the spatial index and the text index independently and can accommodate new data types. However, it has difficulty supporting top-k pruning and efficient merging at the same time, since these require two different orders for clustering the objects: the first based on scores, for top-k pruning, and the second based on object IDs, for efficient merging. In this paper, we propose a new separate index method called the Rank-Aware Separate Index Method (RASIM) for top-k spatial keyword queries. RASIM supports both top-k pruning and efficient merging at the same time by clustering each separate index in two different orders through a partitioning technique. Specifically, RASIM partitions the set of objects in each index into rank-aware (RA) groups that contain objects with similar scores, applies the first order to these groups according to their scores, and applies the second order to the objects within each group according to their object IDs. Based on the RA groups, we propose two query processing algorithms: (i) the External Threshold Algorithm (External TA), which supports top-k pruning in units of RA groups, and (ii) the Generalized External TA, which improves on External TA by exploiting special properties of the RA groups. RASIM is the first work to support top-k pruning based on the separate index approach, and it naturally keeps the advantages of that approach. In addition, in terms of both storage and query processing time, RASIM is more efficient than the IR-tree method, which is based on the hybrid index approach and is the prevailing method for supporting top-k pruning to date.
Experimental results show that, compared with the IR-tree method, RASIM reduces the index size by up to 1.85 times and improves query performance by up to 3.22 times.
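
RASIM's External TA builds on the classic threshold algorithm (TA) of Fagin et al. A minimal Python sketch of plain TA with a sum aggregate over score-sorted lists (for example, one spatial and one text list); it assumes every object appears in every list, so all lists have the same length, uses dictionary lookups as random access, and does not model RA groups or External TA.

```python
import heapq
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]   # (object id, score), sorted by score descending


def threshold_algorithm(lists: List[Ranked], k: int) -> Ranked:
    """Top-k objects by summed score; assumes every object appears in every list."""
    lookup = [dict(lst) for lst in lists]          # random access: id -> score
    seen: Dict[str, float] = {}
    for depth in range(len(lists[0])):
        for lst in lists:                          # sorted access, one row at a time
            obj = lst[depth][0]
            if obj not in seen:
                seen[obj] = sum(table[obj] for table in lookup)
        # No object not yet seen can score more than the sum of the current row.
        threshold = sum(lst[depth][1] for lst in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                             # early termination
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical spatial-proximity and text-relevance scores.
spatial = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]
text = [("c", 0.95), ("a", 0.7), ("d", 0.4), ("b", 0.2)]
print([obj for obj, _ in threshold_algorithm([spatial, text], k=2)])  # ['a', 'c']
```

Here the algorithm stops after the third row, since the second-best aggregate (c, about 1.25) already exceeds the threshold 0.3 + 0.4, so the last row is never read.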
