Similar documents
20 similar documents found (search time: 960 ms)
1.
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middleware, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance. Received: 23 December 2003. Accepted: 31 March 2004. Published online: 12 August 2004. Edited by: S. Abiteboul. Extended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765.
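
The idea of ranking join results progressively can be illustrated with a small sketch. This is not the paper's operators or its ripple-join variants; it is a simplified threshold-based rank join under stated assumptions: two inputs already sorted by score in descending order, an equi-join on a key, and a scoring function that is the sum of the two scores. The function name `rank_join` and the tuple layout are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def rank_join(left, right, k):
    """Return the k best join results as (combined_score, key) pairs.

    left, right: lists of (key, score), each sorted by score descending.
    Two tuples join when their keys are equal; the combined score is the sum.
    """
    inputs = (left, right)
    pos = [0, 0]                                   # tuples read so far per input
    seen = (defaultdict(list), defaultdict(list))  # key -> scores seen per input
    heap = []                                      # max-heap via negated scores
    out = []
    turn = 0
    while len(out) < k:
        # Alternate between inputs, skipping an input that is exhausted.
        if pos[turn] >= len(inputs[turn]):
            turn = 1 - turn
            if pos[turn] >= len(inputs[turn]):
                break                              # both inputs exhausted
        key, score = inputs[turn][pos[turn]]
        pos[turn] += 1
        seen[turn][key].append(score)
        # Join the new tuple with everything already seen on the other side.
        for other in seen[1 - turn][key]:
            heapq.heappush(heap, (-(score + other), key))
        turn = 1 - turn

        # Upper bound on the score of any join result not generated yet:
        # such a result needs an unseen tuple from some input i, whose score
        # is at most the last score read from i, paired with at best the
        # top score of the other input.
        bounds = []
        for i in (0, 1):
            j = 1 - i
            if pos[i] < len(inputs[i]):
                if pos[i] == 0 or pos[j] == 0:
                    bounds.append(math.inf)
                else:
                    bounds.append(inputs[i][pos[i] - 1][1] + inputs[j][0][1])
        threshold = max(bounds, default=-math.inf)

        # Results at or above the threshold can be reported immediately.
        while heap and len(out) < k and -heap[0][0] >= threshold:
            neg, result_key = heapq.heappop(heap)
            out.append((-neg, result_key))
    return out


left = [("a", 9), ("b", 8), ("c", 3)]
right = [("b", 10), ("c", 7), ("a", 2)]
print(rank_join(left, right, 2))
```

In this example the best result (key `b`) is reported after reading only two tuples from each input, which is the nonblocking behavior the abstract describes.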

2.
We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics and must incur as little communication cost as possible, which makes them more difficult than existing distributed k-NN queries. In particular, we aim to reduce the number of candidate peers and the communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers to be processed to answer the k-NN queries. Our approach is especially suitable for continuous k-NN queries when peers are updated, including changes to the ranges of peers, peers dynamically leaving or joining, and data updates within a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle (MBR)-based query approaches, especially for continuous queries.

3.
In spite of significant improvements in video data retrieval, a system has not yet been developed that can adequately respond to a user’s query. Typically, the user has to refine the query many times and view query results until eventually the expected videos are retrieved from the database. The complexity of video data and questionable query structuring by the user aggravate the retrieval process. Most previous research in this area has focused on retrieval based on low-level features. Managing imprecise queries using semantic (high-level) content is no easier than queries based on low-level features due to the absence of a proper continuous distance function. We provide a method to help users search for clips and videos of interest in video databases. The video clips are classified as interesting and uninteresting based on user browsing. The attribute values of clips are classified by commonality, presence, and frequency within each of the two groups to be used in computing the relevance of each clip to the user’s query. In this paper, we provide an intelligent query structuring system, called I-Quest, to rank clips based on user browsing feedback in cases where template generation from the interesting and uninteresting sets is impossible or yields poor results.
Ramazan Savaş Aygün (corresponding author)

4.
The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the k data points closest to q. The applications of kNN queries are broad, not only in spatio-temporal databases but also in many other areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. Past studies of kNN query processing did not consider the case in which the server receives multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server is kept busy repeatedly reaccessing the database to obtain data that have already been acquired. This wastes I/O and degrades the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time.

5.
The single-server queuing system with finite buffer was considered. The customers may arrive one-by-one or in batches. Arrivals of single customers and their batches obey the Markov input processes. The customers from a batch taken for servicing come one at a time at the exponentially distributed time intervals. The numbers of customers in batches are distributed geometrically. The time of customer servicing has a phase-type distribution. The numbers of batches and single customers that may be simultaneously accepted by the system are controllable parameters. The joint distribution of the number of batches and the number of customers in system, loss probabilities, distribution of the time of batch sojourn, and problems of optimization were analyzed.

6.
In the List H-Homomorphism Problem, for a graph H that is a parameter of the problem, an instance consists of an undirected graph G with a list constraint \({L(v) \subseteq V(H)}\) for each variable \({v \in V(G)}\), and the objective is to determine whether there is a list H-homomorphism \({f:V(G) \to V(H)}\), that is, \({f(v) \in L(v)}\) for every \({v \in V(G)}\) and \({(f(u),f(v)) \in E(H)}\) whenever \({(u,v) \in E(G)}\). We consider the problem of testing list H-homomorphisms in the following weighted setting: An instance consists of an undirected graph G, list constraints L, weights imposed on the vertices of G, and a map \({f:V(G) \to V(H)}\) given as an oracle access. The objective is to determine whether f is a list H-homomorphism or far from any list H-homomorphism. The farness is measured by the total weight of vertices \({v \in V(G)}\) for which f(v) must be changed so as to make f a list H-homomorphism. In this paper, we classify graphs H with respect to the number of queries to f required to test the list H-homomorphisms. Specifically, we show that (i) list H-homomorphisms are testable with a constant number of queries if and only if H is a reflexive complete graph or an irreflexive complete bipartite graph and (ii) list H-homomorphisms are testable with a sublinear number of queries if and only if H is a bi-arc graph.
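
The definition of a list H-homomorphism above can be checked directly when f is fully known. The sketch below is only that exhaustive check, not the query-efficient tester studied in the paper; the function name `is_list_homomorphism` and the edge-list and dictionary representations are assumptions made for illustration.

```python
def is_list_homomorphism(g_edges, h_edges, lists, f):
    """Check the two conditions from the definition.

    g_edges, h_edges: iterables of vertex pairs (undirected edges).
    lists: dict mapping each vertex v of G to its allowed set L(v).
    f: dict mapping each vertex of G to a vertex of H.
    """
    # Treat H as undirected: store each edge in both orientations.
    h = set()
    for a, b in h_edges:
        h.add((a, b))
        h.add((b, a))
    # Condition 1: f(v) must lie in L(v) for every vertex v of G.
    if any(f[v] not in allowed for v, allowed in lists.items()):
        return False
    # Condition 2: every edge (u, v) of G must map to an edge of H.
    return all((f[u], f[v]) in h for u, v in g_edges)


# H is a single irreflexive edge {0, 1}; G is the path a - b - c.
h_edges = [(0, 1)]
g_edges = [("a", "b"), ("b", "c")]
lists = {"a": {0, 1}, "b": {0, 1}, "c": {0, 1}}
print(is_list_homomorphism(g_edges, h_edges, lists, {"a": 0, "b": 1, "c": 0}))
```

With this H, a valid map is exactly a proper 2-coloring of G that respects the lists.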

7.
Given a graph with a source and a sink node, the NP-hard maximum k-splittable s,t-flow (MkSF) problem is to find a flow of maximum value from s to t with a flow decomposition using at most k paths. The multicommodity variant of this problem is a natural generalization of disjoint paths and unsplittable flow problems. Constructing a k-splittable flow requires two interdepending decisions. One has to decide on k paths (routing) and on the flow values for the paths (packing). We give efficient algorithms for computing exact and approximate solutions by decoupling the two decisions into a first packing step and a second routing step. Usually the routing is considered before the packing. Our main contributions are as follows: (i) We show that for constant k a polynomial number of packing alternatives containing at least one packing used by an optimal MkSF solution can be constructed in polynomial time. If k is part of the input, we obtain a slightly weaker result. In this case we can guarantee that, for any fixed ε>0, the computed set of alternatives contains a packing used by a (1−ε)-approximate solution. The latter result is based on the observation that (1−ε)-approximate flows only require constantly many different flow values. We believe that this observation is of interest in its own right. (ii) Based on (i), we prove that, for constant k, the MkSF problem can be solved in polynomial time on graphs of bounded treewidth. If k is part of the input, this problem is still NP-hard and we present a polynomial time approximation scheme for it.

8.
The text searching paradigm still prevails even when users are looking for image data, for example on the Internet. Searching for images mostly means searching on the basis of annotations that have been made manually. When annotations are left empty, which is usually the case, searches on image file names are performed. This may lead to surprising retrieval results. The graphical search paradigm, searching image data by querying graphically, either with an image or with a sketch, currently seems not to be the preferred method, partly because of the complexity of designing the query. In this paper we present our PictureFinder system, which currently supports “full image retrieval” in analogy to full text retrieval. PictureFinder allows graphical queries for the image the user has in his mind by sketching colored and/or textured regions or by whole images (query by example). By adjusting the search tolerances for each region and image feature (i.e. hue, saturation, lightness, texture pattern and coverage) the user can tune his query either to find images matching his sketch or images which differ from the specified colors and/or textures to a certain degree. To compare colors we propose a color distance measure that takes into account the fact that different colors spread differently in the color space, and that the position of a region in an image may be important. Furthermore, we show our query by example approach. Based on the example image chosen by the user, a graphical query is generated automatically and presented to the user. One major advantage of this approach is the possibility to change and adjust a query by example in the same way as a query which was sketched by the user. By deleting unimportant regions and by adjusting the tolerances of the remaining regions the user may focus on image details which are important to him.

9.
In most auction systems the values of bids are known to the auctioneer. This allows him to manipulate the outcome of the auction. Hence, one might be interested in hiding these values. Some cryptographically secure protocols for electronic auctions have been presented in the last decade. Our work extends these protocols in several ways. On the basis of garbled circuits, i.e., encrypted circuits, we present protocols for sealed-bid auctions that fulfill the following requirements: 1) protocols are information-theoretically t-private for honest but curious parties; 2) the number of bits that can be learned by malicious adversaries is bounded by the output length of the auction; 3) the computational requirements for participating parties are very low: only random bit choices and bitwise computation of the XOR-function are necessary. Note that one can distinguish between the protocol that generates a garbled circuit for an auction and the protocol to evaluate the auction. In this paper we address both problems. We will present a t-private protocol for the construction of a garbled circuit that reaches the lower bound of 2t + 1 parties, and a more randomness-efficient protocol for (t + 1)^2 parties. Finally, we address the problem of bid changes in an auction.

10.
Top-k queries on uncertain data sets have been a very active topic in recent years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for large-scale data. In this paper, we make the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose three efficient algorithms. The serial distributed approach adopts a new method, which requires only a small number of calculations, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed for improving the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which can reduce the number of calculations. The local filtering first algorithm for PT-k query processing is designed for reducing the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments.

11.
With the popularization of wireless networks and mobile intelligent terminals, mobile crowd sensing is becoming a promising sensing paradigm. Tasks are assigned to users with mobile devices, which then collect and submit ambient information to the server. The composition of participants greatly determines the quality and cost of the collected information. This paper aims to select the fewest participants needed to achieve the quality required by a sensing task. The requirement, named “t-sweep k-coverage”, means that a target location should be sensed by at least k participants in every time interval of length t. The participant selection problem for “t-sweep k-coverage” crowd sensing tasks is NP-hard. Through delicate matrix stacking, linear programming can be adopted to solve the problem when the problem is small. We further propose a participant selection method based on a greedy strategy. The two methods are evaluated through simulated experiments using users’ call detail records. The results show that for small problems, both methods can find a participant set meeting the requirement. The number of participants picked by the greedy method is roughly twice that of the linear programming method. However, when problems become larger, the linear programming method performs unstably, while the greedy method can still output a reasonable solution.
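
A greedy strategy for this kind of coverage requirement might look like the sketch below. The abstract does not give the details of the authors' greedy method, so this is a generic greedy multi-cover heuristic under stated assumptions: each "cell" is a (location, time interval) pair that needs k distinct participants, and each participant is known to cover a fixed set of cells. The names `greedy_select`, `coverage`, and `cells` are hypothetical.

```python
def greedy_select(coverage, cells, k):
    """Pick participants until every cell is covered by at least k of them.

    coverage: dict mapping participant id -> iterable of cells it can sense.
    cells: iterable of (location, interval) cells that must be covered.
    Returns the chosen participants in pick order, or None if infeasible.
    """
    need = {cell: k for cell in cells}          # remaining demand per cell
    candidates = {p: set(c) for p, c in coverage.items()}
    chosen = []
    while any(n > 0 for n in need.values()):
        def gain(p):
            # Number of still-deficient cells this participant would help.
            return sum(1 for cell in candidates[p] if need.get(cell, 0) > 0)

        # Sorting makes tie-breaking deterministic (smallest id wins).
        best = max(sorted(candidates), key=gain, default=None)
        if best is None or gain(best) == 0:
            return None                          # requirement cannot be met
        for cell in candidates.pop(best):
            if need.get(cell, 0) > 0:
                need[cell] -= 1
        chosen.append(best)
    return chosen


cells = [("loc", 0), ("loc", 1)]
coverage = {
    "u1": [("loc", 0), ("loc", 1)],
    "u2": [("loc", 0), ("loc", 1)],
    "u3": [("loc", 0)],
}
print(greedy_select(coverage, cells, 2))
```

Such a heuristic gives no optimality guarantee in general, which is consistent with the abstract's report that the greedy method selects more participants than the linear programming method on small problems.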

12.
In this paper, we present a Question Answering system based on redundancy and a Passage Retrieval method that is specifically oriented to Question Answering. We suppose that in a large enough document collection the answer to a given question may appear in several different forms. Therefore, it is possible to find one or more sentences that contain the answer and that also include tokens from the original question. The Passage Retrieval engine is almost language-independent since it is based on n-gram structures. Question classification and answer extraction modules are based on shallow patterns.
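
To make the n-gram idea concrete, the following sketch scores passages by the n-grams they share with the question. It is not the system's actual similarity measure, which the abstract does not specify; whitespace tokenization, the maximum n-gram length, and weighting each shared n-gram by its length are all assumptions, and the function names are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def passage_score(question, passage, max_n=3):
    """Score a passage by question n-grams it contains.

    Longer shared n-grams count more (weight n), so passages that repeat
    whole phrases of the question rank above those sharing single words.
    """
    q = question.lower().split()
    p = passage.lower().split()
    score = 0
    for n in range(1, max_n + 1):
        score += n * len(ngrams(q, n) & ngrams(p, n))
    return score


question = "who wrote the origin of species"
passages = [
    "the species of birds is large",
    "darwin wrote the origin of species in 1859",
]
best = max(passages, key=lambda p: passage_score(question, p))
print(best)
```

Because the scoring only compares token sequences, it needs no language-specific resources, which matches the abstract's claim that the engine is almost language-independent.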

13.
We propose a non-iterative solution to the PnP problem (the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences) whose computational complexity grows linearly with n. This is in contrast to state-of-the-art methods that are O(n^5) or even O(n^8), without being more accurate. Our method is applicable for all n≥4 and properly handles both planar and non-planar configurations. Our central idea is to express the n 3D points as a weighted sum of four virtual control points. The problem then reduces to estimating the coordinates of these control points in the camera referential, which can be done in O(n) time by expressing these coordinates as a weighted sum of the eigenvectors of a 12×12 matrix and solving a small constant number of quadratic equations to pick the right weights. Furthermore, if maximal precision is required, the output of the closed-form solution can be used to initialize a Gauss-Newton scheme, which improves accuracy with a negligible amount of additional time. The advantages of our method are demonstrated by thorough testing on both synthetic and real data.
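
The central idea of writing each 3D point as a weighted sum of four control points can be illustrated in a few lines. The sketch below assumes a particularly simple choice of control points (the origin and the three unit axis points), which is an assumption made for readability and not necessarily the choice used by the authors; the function names are hypothetical. It also shows why the representation is useful: because the weights sum to 1, moving the control points moves the reconstructed point in the same way.

```python
import math

# Assumed control points: origin plus the three unit axis points.
CONTROL_POINTS = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]


def barycentric_weights(p):
    """Weights of p with respect to CONTROL_POINTS; they sum to 1."""
    x, y, z = p
    return (1.0 - x - y - z, x, y, z)


def reconstruct(weights, controls):
    """Weighted sum of the control points, one coordinate at a time."""
    return tuple(
        sum(w * c[d] for w, c in zip(weights, controls)) for d in range(3)
    )


p = (0.5, 2.0, -1.25)
w = barycentric_weights(p)

# Translate the control points, as a change of reference frame would.
t = (1.0, 2.0, 3.0)
moved = [tuple(c[d] + t[d] for d in range(3)) for c in CONTROL_POINTS]

print(reconstruct(w, CONTROL_POINTS))  # recovers p
print(reconstruct(w, moved))           # recovers p translated by t
```

Under this representation the unknowns of the pose problem are the 12 coordinates of the four control points in the camera frame, independent of n, which is consistent with the 12×12 matrix mentioned in the abstract.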

14.
A multiserver queuing system with an unlimited buffer is considered. The customer arrival is described by the Markov arrival process. The service time has a phase-type distribution. The service may occur with errors. The service strategy is as follows. If the number of busy servers is higher than a certain threshold value at the moment of the customer arrival, then the customer is copied, and the copies are serviced by all the free servers. If the number of busy servers is not higher than this threshold at the moment of the customer arrival, then the customer is serviced by one server. Stationary distributions of the number of customers and the residence time within the system are obtained. The threshold’s influence on the main parameters of the system’s productivity is numerically investigated.

15.
A particular class of incomplete factorizations is proposed as preconditioners for the linear system Ax = b where A is a symmetric, large and sparse matrix. The ILDL^T(p) factorization (p = 1,2,3, …) determines the density of the lower triangular matrix L selecting the p largest off-diagonal entries of each column during the Gaussian elimination process. This selection may be computationally expensive, but the effectiveness of the preconditioner allows us to choose very low-density factors to reduce both work time and storage requirements. This incomplete factorization can be performed reliably on H-matrices. When A is a positive definite matrix, but not an H-matrix, one can perform an incomplete factorization if positive off-diagonal entries are removed or reduced and diagonally compensated. Numerical results for a variety of problems and comparisons with other incomplete factorizations are presented. Received: August 2002 / Accepted: December 2002. This work was supported by the Spanish grant BFM 2001-2641.

16.
A new approach to domain-specific reasoning is presented that is based on a type-theoretic logical framework (LF) but does not require the user to be an expert in type theory. The concepts of the domain and its related reasoning systems are formalized in LF, but the user works with the system through a syntax and interface appropriate to his/her work. A middle layer provides translation between the user syntax and LF, and allows additional support for reasoning (e.g., model checking). Thus, the complexity of the logical framework is hidden but the benefits of using type theory and its related tools are retained, such as precision and machine-checkable proofs. This approach is investigated through a number of case studies: here, the authors consider the verification of properties of concurrency. The authors have formalized a specification language (CCS) and logic (μ-calculus) in LF, together with useful lemmas, and a user-oriented syntax has been designed. The authors demonstrate the approach with simple examples. However, applying lemmas to objects introduced by the user may result in framework-level objects which cannot be translated back to the user level. The authors discuss this problem, define a notion of adequacy, and prove that in this case study, translation can always be reversed.

17.
18.
Providing top-k typical relevant keyword queries would benefit users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships between keywords and between keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are preserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated.

19.
Existing spatiotemporal indexes suffer from either large update cost or poor query performance, except for the B^x-tree (the state-of-the-art), which consists of multiple B+-trees indexing the 1D values transformed from the (multi-dimensional) moving objects based on a space filling curve (Hilbert, in particular). This curve, however, does not consider object velocities, and as a result, query processing with a B^x-tree retrieves a large number of false hits, which seriously compromises its efficiency. It is natural to wonder “can we obtain better performance by capturing also the velocity information, using a Hilbert curve of a higher dimensionality?”. This paper provides a positive answer by developing the B^dual-tree, a novel spatiotemporal access method leveraging pure relational methodology. We show, with theoretical evidence, that the B^dual-tree indeed outperforms the B^x-tree in most circumstances. Furthermore, our technique can effectively answer progressive spatiotemporal queries, which are poorly supported by B^x-trees.

20.
We consider a distributed system where each node keeps a local count for items (similar to elections where nodes are ballot boxes and items are candidates). A top-k query in such a system asks which are the k items whose global count, across all nodes in the system, is the largest. In this paper, we present a Monte Carlo algorithm that outputs, with high probability, a set of k candidates which approximates the top-k items. The algorithm is motivated by sensor networks in that it focuses on reducing the individual communication complexity. In contrast to previous algorithms, the communication complexity depends only on the global scores and not on the partition of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity when compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only polylogarithmic number of communication bits per node. An extended abstract of this paper appeared in Proc. 13th Int. Colloquium on Structural Information and Communication Complexity, SIROCCO 2006, Lecture Notes in Computer Science 4056, pp. 319–333.
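
The query itself is easy to state in code. The sketch below is the naive exact baseline, in which every node ships all of its local counts to a coordinator; it is not the paper's Monte Carlo algorithm, whose point is to avoid exactly this communication cost. The function name `global_top_k` and the dictionary-per-node representation are assumptions.

```python
from collections import Counter


def global_top_k(local_counts, k):
    """Exact top-k items by global count.

    local_counts: one dict per node, mapping item -> local count.
    Ties are broken by item name so the result is deterministic.
    """
    total = Counter()
    for node in local_counts:
        total.update(node)          # adds counts item by item
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Three ballot boxes (nodes) and three candidates (items).
nodes = [
    {"a": 3, "b": 1},
    {"b": 4, "c": 2},
    {"a": 1, "c": 1},
]
print(global_top_k(nodes, 2))
```

Note that item `a` leads at the first node while `b` wins globally, which illustrates why the answer depends on the global scores rather than on any single node's view.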
