Similar literature
Found 20 similar documents; search took 62 ms.
1.
Various techniques have been developed for different query types in content-based image retrieval systems such as sampling queries, constrained sampling queries, multiple constrained sampling queries, k-NN queries, constrained k-NN queries, and multiple localized k-NN queries. In this paper, we propose a generalized query model suitable for expressing queries of different types, and investigate efficient processing techniques for this new framework. We exploit sequential access and data sharing by developing new storage and query processing techniques to leverage inter-query concurrency. Our experimental results, based on the Corel dataset, indicate that the proposed optimization can significantly reduce average response time in a multiuser environment, and achieve better retrieval precision and recall compared to two recent techniques.
Ning Yu

2.
We consider a system where users wish to find similar users. To model similarity, we assume the existence of a set of queries, and two users are deemed similar if their answers to these queries are (mostly) identical. Technically, each user has a vector of preferences (answers to queries), and two users are similar if their preference vectors differ in only a few coordinates. The preferences are unknown to the system initially, and the goal of the algorithm is to classify the users into classes of roughly the same preferences by asking each user to answer the least possible number of queries. We prove nearly matching lower and upper bounds on the maximal number of queries required to solve the problem. Specifically, we present an “anytime” algorithm that asks each user at most one query in each round, while maintaining a partition of the users. The quality of the partition improves over time: for n users and time T, groups of $\tilde{O}(n/T)$ users with the same preferences will be separated (with high probability) if they differ in sufficiently many queries. We present a lower bound that matches the upper bound, up to a constant factor, for nearly all possible distances between user groups.
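The similarity notion in this abstract (preference vectors that differ in only a few coordinates) can be illustrated with a Hamming-distance check. This is only a sketch of the definition, not the paper's anytime algorithm; the function names, the threshold `max_diff`, and the sample vectors are assumptions made for illustration.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the coordinates in which two preference vectors differ."""
    if len(a) != len(b):
        raise ValueError("preference vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def are_similar(a: Sequence[int], b: Sequence[int], max_diff: int) -> bool:
    """Two users are similar if their answers differ in at most max_diff queries."""
    return hamming_distance(a, b) <= max_diff


# Hypothetical answers of three users to five yes/no queries.
alice = [1, 0, 1, 1, 0]
bob = [1, 0, 0, 1, 0]
carol = [0, 1, 0, 0, 1]

print(are_similar(alice, bob, max_diff=1))
print(are_similar(alice, carol, max_diff=1))
```

In the setting of the abstract the full vectors are not known up front, so a check like this would only be available on the coordinates queried so far.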

3.
A central topic in query learning is to determine which classes of Boolean formulas are efficiently learnable with membership and equivalence queries. We consider the class k consisting of conjunctions of k unate DNF formulas. This class generalizes the class of k-clause CNF formulas and the class of unate DNF formulas, both of which are known to be learnable in polynomial time with membership and equivalence queries. We prove that 2 can be properly learned with a polynomial number of polynomial-size membership and equivalence queries, but can be properly learned in polynomial time with such queries if and only if P=NP. Thus the barrier to properly learning 2 with membership and equivalence queries is computational rather than informational. Few results of this type are known. In our proofs, we use recent results of Hellerstein et al. (1997, J. Assoc. Comput. Mach. 43(5), 840–862), characterizing the classes that are polynomial-query learnable, together with work of Bshouty on the monotone dimension of Boolean functions. We extend some of our results to k and pose open questions on learning DNF formulas of small monotone dimension. We also prove structural results for k. We construct, for any fixed k ≥ 2, a class of functions f that cannot be represented by any formula in k, but which cannot be “easily” shown to have this property. More precisely, for any function f on n variables in the class, the value of f on any polynomial-size set of points in its domain is not a witness that f cannot be represented by a formula in k. Our construction is based on BCH codes.

4.
Abstract

A database interface language and system, called Metaform, which automatically generates multi-relational form screen interfaces for use by non-computer professionals has been developed. A form screen is a subset of the relational database, with a particular relation or combination of relations being represented. Through form screens, users can simultaneously query and update several relations in the database without having to know about its underlying structure. An overview of the Metaform system is presented and several examples of the use of the Metaform query language and update operators are described.

A series of ‘usability’ studies were conducted on a prototype of the Metaform system to examine the claims that the form concept aids computer-naive users in building complex database queries. These studies adopted the form screen concept to present six office paper work analogies to users to help them to understand the database retrieval concepts. The analogies of a file cabinet, a file folder, a stack of forms, a single form, a table of information on a form and a field of information were used in a two-staged training module.

At the end of each training sequence, users answered questions with the prototype and with paper and pencil which tapped their understanding of the database retrievals they were learning to perform. The results from these questionnaires were mixed. Users performed successful relational queries for simple retrievals and for those using existential quantifiers. They had difficulty with queries involving multiple steps and intermediate stages. Although users understood and used the analogies, they ran into difficulties with the ambiguities in the English statements of the queries, thus suggesting a need for another level of metaphors and/or problem representation tools not associated with the machine but with the user's comprehension of database retrieval problems.

5.
In spite of significant improvements in video data retrieval, a system has not yet been developed that can adequately respond to a user’s query. Typically, the user has to refine the query many times and view query results until eventually the expected videos are retrieved from the database. The complexity of video data and questionable query structuring by the user aggravate the retrieval process. Most previous research in this area has focused on retrieval based on low-level features. Managing imprecise queries using semantic (high-level) content is no easier than queries based on low-level features due to the absence of a proper continuous distance function. We provide a method to help users search for clips and videos of interest in video databases. The video clips are classified as interesting and uninteresting based on user browsing. The attribute values of clips are classified by commonality, presence, and frequency within each of the two groups to be used in computing the relevance of each clip to the user’s query. In this paper, we provide an intelligent query structuring system, called I-Quest, to rank clips based on user browsing feedback in cases where generating a template from the interesting and uninteresting sets is impossible or yields poor results.
Ramazan Savaş Aygün (Corresponding author)

6.
Given a multidimensional point q, a reverse k nearest neighbor (RkNN) query retrieves all the data points that have q as one of their k nearest neighbors. Existing methods for processing such queries have at least one of the following deficiencies: they (i) do not support arbitrary values of k, (ii) cannot deal efficiently with database updates, (iii) are applicable only to 2D data but not to higher dimensionality, and (iv) retrieve only approximate results. Motivated by these shortcomings, we develop algorithms for exact RkNN processing with arbitrary values of k on dynamic, multidimensional datasets. Our methods utilize a conventional data-partitioning index on the dataset and do not require any pre-computation. As a second step, we extend the proposed techniques to continuous RkNN search, which returns the RkNN results for every point on a line segment. We evaluate the effectiveness of our algorithms with extensive experiments using both real and synthetic datasets.
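The RkNN definition at the start of this abstract can be made concrete with a brute-force check. The sketch below is a quadratic-time baseline, not the index-based algorithm the paper proposes; the function name, the Euclidean metric, and the sample points are assumptions, and ties are resolved in favor of q.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rknn_brute_force(points: Sequence[Point], q: Point, k: int) -> List[Point]:
    """Return every data point that has q among its k nearest neighbors.

    A point p qualifies when fewer than k other data points are strictly
    closer to p than q is (Euclidean distance).
    """
    result = []
    for i, p in enumerate(points):
        d_pq = math.dist(p, q)
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(p, other) < d_pq
        )
        if closer < k:
            result.append(p)
    return result


# Hypothetical one-dimensional data set and query point.
data = [(0.0,), (1.5,), (5.0,), (6.0,)]
query = (2.0,)

print(rknn_brute_force(data, query, k=1))
print(rknn_brute_force(data, query, k=2))
```

Each call compares every pair of points, which is the cost that the data-partitioning index mentioned in the abstract is meant to avoid.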

7.
We describe a linear-time algorithm for computing the likelihood that a completion joining two contour fragments passes through any given position and orientation in the image plane. Our algorithm is a resolution-pyramid-based method for solving a partial differential equation (PDE) characterizing a distribution of short, smooth completion shapes. The PDE consists of a set of independent advection equations in (x, y) coupled in the θ dimension by the diffusion equation. A previously described algorithm used a first-order, explicit finite difference scheme implemented on a rectangular grid. This algorithm required $O(n^3 m)$ time for a grid of size n×n with m discrete orientations. Unfortunately, systematic error in solving the advection equations produced unwanted anisotropic smoothing in the (x, y) dimension. This resulted in visible artifacts in the completion fields. The amount of error and its dependence on θ have been previously characterized. We observe that by careful addition of extra spatial smoothing, the error can be made totally isotropic. The combined effect of this error and of intrinsic smoothness due to diffusion in the θ dimension is that the solution becomes smoother with increasing time, i.e., the high spatial frequencies drop out. By increasing Δx and Δt on a regular schedule, and using a second-order, implicit scheme for the diffusion term, it is possible to solve the modified PDE in $O(n^2 m)$ time, i.e., time linear in the problem size. Using current hardware and for problems of typical size, this means that a solution which previously took 1 h to compute can now be computed in about 2 min.

8.
Combinatorial property testing deals with the following relaxation of decision problems: Given a fixed property and an input x, one wants to decide whether x satisfies the property or is “far” from satisfying it. The main focus of property testing is in identifying large families of properties that can be tested with a certain number of queries to the input. In this paper we study the relation between the space complexity of a language and its query complexity. Our main result is that for any space complexity s(n) ≤ log n there is a language with space complexity O(s(n)) and query complexity $2^{\Omega(s(n))}$. Our result has implications with respect to testing languages accepted by certain restricted machines. Alon et al. [FOCS 1999] have shown that any regular language is testable with a constant number of queries. It is well known that any language in space o(log log n) is regular, thus implying that such languages can be so tested. It was previously known that there are languages in space O(log n) that are not testable with a constant number of queries and Newman [FOCS 2000] raised the question of closing the exponential gap between these two results. A special case of our main result resolves this problem as it implies that there is a language in space O(log log n) that is not testable with a constant number of queries. It was also previously known that the class of testable properties cannot be extended to all context-free languages. We further show that one cannot even extend the family of testable languages to the class of languages accepted by single counter machines.

9.
We investigate the diameter problem in the streaming and sliding-window models. We show that, for a stream of $n$ points or a sliding window of size $n$, any exact algorithm for diameter requires $\Omega(n)$ bits of space. We present a simple $\epsilon$-approximation algorithm for computing the diameter in the streaming model. Our main result is an $\epsilon$-approximation algorithm that maintains the diameter in two dimensions in the sliding-window model using $O((1/\epsilon^{3/2}) \log^{3} n (\log R + \log\log n + \log(1/\epsilon)))$ bits of space, where $R$ is the maximum, over all windows, of the ratio of the diameter to the minimum non-zero distance between any two points in the window.

10.
Content-based image retrieval is an active area of research. Many approaches have been proposed to retrieve images based on matching of some features derived from the image content. Color is an important feature of image content. The problem with many traditional matching-based retrieval methods is that the search time for retrieving similar images for a given query image increases linearly with the size of the image database. We present an efficient color indexing scheme for similarity-based retrieval which has a search time that increases logarithmically with the database size. In our approach, the color features are extracted automatically using a color clustering algorithm. Then the cluster centroids are used as representatives of the images in 3-dimensional color space and are indexed using a spatial indexing method that uses an R-tree. The worst case search time complexity of this approach is $O(n_q \log(N \cdot n_{avg}))$, where $N$ is the number of images in the database, and $n_q$ and $n_{avg}$ are the number of colors in the query image and the average number of colors per image in the database respectively. We present the experimental results for the proposed approach on two databases consisting of 337 Trademark images and 200 Flag images.

11.
In this paper we propose two new multilayer grid models for VLSI layout, both of which take into account the number of contact cuts used. For the first model in which nodes exist only on one layer, we prove a tight area × (number of contact cuts) = $\Theta(n^2)$ tradeoff for embedding n-node planar graphs of bounded degree in two layers. For the second model in which nodes exist simultaneously on all layers, we give a number of upper bounds on the area needed to embed graphs using no contact cuts. We show that any n-node graph of thickness 2 can be embedded on two layers in $O(n^2)$ area. This bound is tight even if more layers and any number of contact cuts are allowed. We also show that planar graphs of bounded degree can be embedded on two layers in $O(n^{3/2}(\log n)^2)$ area. Some of our embedding algorithms have the additional property that they can respect prespecified grid placements of the nodes of the graph to be embedded. We give an algorithm for embedding n-node graphs of thickness k in k layers using $O(n^3)$ area, using no contact cuts, and respecting prespecified node placements. This area is asymptotically optimal for placement-respecting algorithms, even if more layers are allowed, as long as a fixed fraction of the edges do not use contact cuts. Our results use a new result on embedding graphs in a single-layer grid, namely an embedding of n-node planar graphs such that each edge makes at most four turns, and all nodes are embedded on the same line. The first author's research was partially supported by NSF Grant No. MCS 820-5167.

12.
Making queries to a database system through a computer application can become a repetitive and time-consuming task for those users who generally make similar queries to get the information they need to work with. We believe that interface agents could help these users by personalizing the query-making and information retrieval tasks. Interface agents are characterized by their ability to learn users' interests in a given domain and to help them by making suggestions or by executing tasks on their behalf. Having this purpose in mind we have developed an agent, named QueryGuesser, to assist users of computer applications in which retrieving information from a database is a key task. This agent observes a user's behavior while he is working with the database and builds the user's profile. Then, QueryGuesser uses this profile to suggest the execution of queries according to the user's habits and interests, and to provide the user information relevant to him by making time-demanding queries in advance or by monitoring the events and operations occurring in the database system. In this way, the interaction between database users and databases becomes personalized while it is enhanced.

13.
In this paper, we present a novel resource brokering service for grid systems which considers authorization policies of the grid nodes in the process of selecting the resources to be assigned to a request. We argue such an integration is needed to avoid scheduling requests onto resources the policies of which do not authorize their execution. Our service, implemented in Globus as a part of Monitoring and Discovery Service (MDS), is based on the concept of fine-grained access control (FGAC) which enables participating grid nodes to specify fine-grained policies concerning the conditions under which grid clients can access their resources. Since the process of evaluating authorization policies, in addition to checking the resource requirements, can be a potential bottleneck for a large scale grid, we also analyze the problem of the efficient evaluation of FGAC policies. In this context, we present GroupByRule, a novel method for policy organization and compare its performance with other strategies.
E. Bertino

14.
We consider the problem where π is an unknown permutation on $\{0,1,\ldots,2^n-1\}$, $y_0 \in \{0,1,\ldots,2^n-1\}$, and the goal is to determine the minimum $r>0$ such that $\pi^r(y_0)=1$. Information about π is available only via queries that yield $\pi^x(y)$ from any $x \in \{0,1,\ldots,2^m-1\}$ and $y \in \{0,1,\ldots,2^n-1\}$ (where m is polynomial in n). The main resource under consideration is the number of these queries. We show that the number of queries necessary to solve the problem in the classical probabilistic bounded-error model is exponential in n. This contrasts sharply with the quantum bounded-error model, where a constant number of queries suffices.

15.
This paper addresses the scheduling problem in decentralized grid systems. This problem focuses on computing a large set of arbitrary tasks so as to optimize the system performance while minimizing the average system costs. The mainstream solution in the recent literature is to maximize the total system throughput by modeling such systems as either a network flow or a tree. However, most of these approaches neglect the movements of tasks and load-dependent system costs which, in fact, are crucial to the system performance in real situations. In this paper, a Service-Oriented Overlay Network (SOON) is presented, in which the service nodes encapsulate both computation and communication resources and the links are used to track the movements of tasks instead of describing communication. An analytical Cost-Charge (C2) model, in which both running cost and service charge are dependent on load, is proposed to describe the problem by incorporating degree-dependent task allocation into a closed queuing network model. Infinitesimal Perturbation Analysis (IPA) is applied to solve C2 theoretically. Following the theoretical analysis, a scalable decentralized scheduler named Liana is proposed (the movements of tasks in the proposed system resemble the growth and spread of evergreen liana, hence the name). The major components of Liana are an autonomous scheduling algorithm and a Degree-Driven Protocol (DDP). Furthermore, trace-based simulations on a test bed distributed widely across the world are implemented to compare the system performance of Liana with recent approaches. The proposed approach shows promising results: close-to-optimal service utilization is achieved when taking system cost into account.
Chun-Qing Li

16.
Yun  Tae-Seob  Whang  Kyu-Young  Kwon  Hyuk-Yoon  Kim  Jun-Sung  Song  Il-Yeol 《World Wide Web》2019,22(6):2437-2467

We propose two-dimensional indexing, a novel in-memory indexing architecture that operates over distributed memory of a massively-parallel search engine. The goal of two-dimensional indexing is to provide a one-integrated-memory view as in a single node system using one large integrated memory. In two-dimensional indexing, we partition the entire index into n × m fragments and distribute them over the memories of multiple nodes in such a way that each fragment is entirely stored in main memory of one node. The proposed architecture is not only scalable as it uses a scaled-out shared-nothing architecture but also is capable of achieving low query response time as it processes queries in main memory. We also propose the concept of the one-memory point, which is the amount of the memory space required to completely store the entire index in main memory providing a one-integrated-memory view. We first prove the effectiveness of two-dimensional indexing with single-keyword queries, and then extend the notion so as to be able to handle multiple-keyword queries. To handle multiple-keyword queries, we adopt pre-join that materializes a multiple-keyword query a priori as well as a new notion of semi-memory join that obviates extensive communication overhead to perform join across multiple nodes. In experiments using the real-life search query set over a database consisting of 100 million crawled Web documents, we show that two-dimensional indexing can effectively provide a one-integrated-memory view without much additional memory compared with the single node system using one large integrated memory. We also show that, with a six-node prototype, in an ideal case, it significantly improves the query processing performance over a disk-based search engine with an equivalent amount of in-memory buffer but without two-dimensional indexing, by up to 535.54 times. This improvement is expected to get larger as the system is scaled out with a larger number of machines.


17.
We consider the XPath evaluation problem: Evaluate an XPath query Q on a streaming XML document D. We consider two versions of the problem: 1) Filtering Problem: Determine if there is a match for Q in D. 2) Node Selection Problem: Determine the set Q(D) of document nodes selected by Q. We consider Conjunctive XPath (CXPath) queries that involve only the child and descendant axes. Let d denote the depth of D, and n denote the number of location steps in Q. Bar-Yossef et al. (2007, 2005) [6] and [7] presented lower bounds on the memory space required by any algorithm to solve these two problems. Their lower bounds apply to each query in a large subset of XPath, and are obtained (mostly) using nonrecursive (Q,D). In this paper, we present larger lower bounds for a different class of queries (namely, CXPath queries with independent predicates), on recursive (Q,D). One of our results is an Ω(n · maxcands(Q,D)) lower bound for the node selection problem, for a worst-case Q; maxcands(Q,D) is the maximum number of nodes of D that can be candidates for output, at any one instant. So, there is no algorithm for the node selection problem that uses O(f(d,|Q|) + maxcands(Q,D)) space, for any function f. This shows that some previously published algorithms are incorrect.

18.
This paper investigates the optimization problem when executing a join in a distributed database environment. The minimization of the communication cost for sending data through links has been adopted as an optimization criterion. We explore in this paper the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. A restriction of the execution of a join in a pre-defined combinatorial order leads to a possible solution in polynomial time. An algorithm for a chain query computation has been proposed in [21]. The time complexity of the algorithm is $O(m^2 n^2 + m^3 n)$, where n is the number of sites in the network, and m is the number of relations (fragments) involved in the join. In this paper, we firstly present a proof of the intuitively well understood fact that the eigenorder of a chain join will be the best pre-defined combinatorial order to implement the algorithm in [21]. Secondly, we show a sufficient and necessary condition for a chain query with the eigenordering to be a simple query. For the process of the class of simple queries, we show a significant reduction of the time complexity from $O(m^2 n^2 + m^3 n)$ to $O(mn + m^2)$. It is encouraging that, in practice, the most frequent queries belong to the category of simple queries. Editor: Peter Apers

19.
In mobile ad hoc peer-to-peer (M-P2P) networks, economic models become a necessity for enticing non-cooperative mobile peers to provide service. M-P2P users may issue queries with varying constraints on query response time, data quality of results and trustworthiness of the data source. Hence, we propose ConQuer, which is an economic incentive model for the efficient processing of constraint queries in M-P2P networks. ConQuer also provides incentives for peer collaboration in order to improve data availability. The main contributions of ConQuer are three-fold. First, it uses a broker-based economic M-P2P model for processing constraint queries via a Vickrey auction mechanism. Second, it proposes the CR*-tree, a dynamic multidimensional R-tree-based index for constraints of data quality, trust and price of data to determine target peers efficiently. The CR*-tree is hosted by brokers, who can sell it to other peers, thereby encouraging the creation of multiple copies of the index for facilitating routing. Third, it provides incentives for peers to form collaborative peer groups for maximizing data availability and revenues by mutually allocating and deallocating data items using royalty-based revenue-sharing. Such reallocations facilitate better data quality, thereby further increasing peer revenues. Our performance study shows that ConQuer is indeed effective in answering constraint queries with improved response time, success rate and data quality, and querying hop-counts.
Masaru Kitsuregawa
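The Vickrey auction mentioned in this abstract is a sealed-bid second-price auction: the best bid wins, but the price is set by the runner-up bid. The sketch below shows the textbook highest-bid-wins form only. How ConQuer's brokers collect bids, and whether bids are prices or offers, is not stated in the abstract, so the function name, peer names, and amounts are assumptions.

```python
from typing import Dict, Tuple


def vickrey_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price) for a sealed-bid second-price auction.

    The highest bidder wins and pays the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical sealed bids collected by a broker.
sealed_bids = {"peer_a": 10.0, "peer_b": 7.5, "peer_c": 4.0}
print(vickrey_auction(sealed_bids))
```

Because the winner's payment does not depend on its own bid, bidding one's true valuation is a dominant strategy in this auction format.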

20.
Damaschke  Peter 《Machine Learning》2000,41(2):197-215
We study the complexity of learning arbitrary Boolean functions of n variables by membership queries, if at most r variables are relevant. Problems of this type have important applications in fault searching, e.g. logical circuit testing and generalized group testing. Previous literature concentrates on special classes of such Boolean functions and considers only adaptive strategies. First we give a straightforward adaptive algorithm using $O(r 2^r \log n)$ queries, but actually, most queries are asked nonadaptively. This leads to the problem of purely nonadaptive learning. We give a graph-theoretic characterization of nonadaptive learning families, called r-wise bipartite connected families. By the probabilistic method we show the existence of such families of size $O(r 2^r \log n + r^2 2^r)$. This implies that nonadaptive attribute-efficient learning is not essentially more expensive than adaptive learning. We also sketch an explicit pseudopolynomial construction, though with a slightly worse bound. It uses the common derandomization technique of small-biased k-independent sample spaces. For the special case r = 2, we get roughly 2.275 log n adaptive queries, which is fairly close to the obvious lower bound of 2 log n. For the class of monotone functions, we prove that the optimal query number $O(2^r + r \log n)$ can be already achieved in O(r) stages. On the other hand, $\Omega(2^r \log n)$ is a lower bound on nonadaptive queries.
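The log n factor in adaptive bounds of this kind typically comes from a binary search over variables. The sketch below shows that standard building block, not the paper's algorithm: given two assignments on which the function disagrees, it finds one relevant variable with about log2 n further membership queries. The function name and the example target function are assumptions.

```python
from typing import Callable, List, Sequence

Oracle = Callable[[List[int]], int]


def find_relevant_variable(f: Oracle, a: Sequence[int], b: Sequence[int]) -> int:
    """Find a relevant variable of f, given assignments with f(a) != f(b).

    Invariant: f(a) != f(b), and a and b differ exactly on `diff`.
    Each round halves `diff` using one membership query.
    """
    a, b = list(a), list(b)
    fa = f(a)
    if fa == f(b):
        raise ValueError("f must take different values on a and b")
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    while len(diff) > 1:
        half = diff[: len(diff) // 2]
        mid = list(a)
        for i in half:
            mid[i] = b[i]  # move a toward b on half of the differing positions
        if f(mid) != fa:
            b, diff = mid, half  # the value change happens within `half`
        else:
            a, diff = mid, diff[len(diff) // 2:]  # it happens in the rest
    return diff[0]


# Hypothetical target: only variables 2 and 5 (out of 8) are relevant.
def target(x: List[int]) -> int:
    return x[2] & x[5]


print(find_relevant_variable(target, [0] * 8, [1] * 8))
```

When the loop ends, the two assignments differ in a single position while the function values differ, so that position is relevant by definition.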

