首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 984 毫秒
1.
The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1?\(\frac{1}{e}\) for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.  相似文献   

2.
Recently, Reverse k Nearest Neighbors (RkNN) queries, returning every answer for which the query is one of its k nearest neighbors, have been extensively studied on the database research community. But the RkNN query cannot retrieve spatio-textual objects which are described by their spatial location and a set of keywords. Therefore, researchers proposed a RSTkNN query to find these objects, taking both spatial and textual similarity into consideration. However, the RSTkNN query cannot control the size of answer set and to be sorted according to the degree of influence on the query. In this paper, we propose a new problem Ranked Reverse Boolean Spatial Keyword Nearest Neighbors query called Ranked-RBSKNN query, which considers both spatial similarity and textual relevance, and returns t answers with most degree of influence. We propose a separate index and a hybrid index to process such queries efficiently. Experimental results on different real-world and synthetic datasets show that our approaches achieve better performance.  相似文献   

3.
The advancement of World Wide Web has revolutionized the way the manufacturers can do business. The manufacturers can collect customer preferences for products and product features from their sales and other product-related Web sites to enter and sustain in the global market. For example, the manufactures can make intelligent use of these customer preference data to decide on which products should be selected for targeted marketing. However, the selected products must attract as many customers as possible to increase the possibility of selling more than their respective competitors. This paper addresses this kind of product selection problem. That is, given a database of existing products P from the competitors, a set of company’s own products Q, a dataset C of customer preferences and a positive integer k, we want to find k-most promising products (k-MPP) from Q with maximum expected number of total customers for targeted marketing. We model k-MPP query and propose an algorithmic framework for processing such query and its variants. Our framework utilizes grid-based data partitioning scheme and parallel computing techniques to realize k-MPP query. The effectiveness and efficiency of the framework are demonstrated by conducting extensive experiments with real and synthetic datasets.  相似文献   

4.
With the popularization of wireless networks and mobile intelligent terminals, mobile crowd sensing is becoming a promising sensing paradigm. Tasks are assigned to users with mobile devices, which then collect and submit ambient information to the server. The composition of participants greatly determines the quality and cost of the collected information. This paper aims to select fewest participants to achieve the quality required by a sensing task. The requirement namely “t-sweep k-coverage” means for a target location, every t time interval should at least k participants sense. The participant selection problem for “t-sweep k-coverage” crowd sensing tasks is NP-hard. Through delicate matrix stacking, linear programming can be adopted to solve the problem when it is in small size. We further propose a participant selection method based on greedy strategy. The two methods are evaluated through simulated experiments using users’ call detail records. The results show that for small problems, both the two methods can find a participant set meeting the requirement. The number of participants picked by the greedy based method is roughly twice of the linear programming based method. However, when problems become larger, the linear programming based method performs unstably, while the greedy based method can still output a reasonable solution.  相似文献   

5.
Finding k nearest neighbor objects in spatial databases is a fundamental problem in many geospatial systems and the direction is one of the key features of a spatial object. Moreover, the recent tremendous growth of sensor technologies in mobile devices produces an enormous amount of spatio-directional (i.e., spatially and directionally encoded) objects such as photos. Therefore, an efficient and proper utilization of the direction feature is a new challenge. Inspired by this issue and the traditional k nearest neighbor search problem, we devise a new type of query, called the direction-constrained k nearest neighbor (DCkNN) query. The DCkNN query finds k nearest neighbors from the location of the query such that the direction of each neighbor is in a certain range from the direction of the query. We develop a new index structure called MULTI, to efficiently answer the DCkNN query with two novel index access algorithms based on the cost analysis. Furthermore, our problem and solution can be generalized to deal with spatio-circulant dimensional (such as a direction and circulant periods of time such as an hour, a day, and a week) objects. Experimental results show that our proposed index structure and access algorithms outperform two adapted algorithms from existing kNN algorithms.  相似文献   

6.
Representative skyline computation is a fundamental issue in database area, which has attracted much attention in recent years. A notable definition of representative skyline is the distance-based representative skyline (DBRS). Given an integer k, a DBRS includes k representative skyline points that aims at minimizing the maximal distance between a non-representative skyline point and its nearest representative. In the 2D space, the state-of-the-art algorithm to compute the DBRS is based on dynamic programming (DP) which takes O(k m 2) time complexity, where m is the number of skyline points. Clearly, such a DP-based algorithm cannot be used for handling large scale datasets due to the quadratic time cost. To overcome this problem, in this paper, we propose a new approximate algorithm called ARS, and a new exact algorithm named PSRS, based on a carefully-designed parametric search technique. We show that the ARS algorithm can guarantee a solution that is at most ?? larger than the optimal solution. The proposed ARS and PSRS algorithms run in O(klog2mlog(T/??)) and O(k 2 log3m) time respectively, where T is no more than the maximal distance between any two skyline points. We also propose an improved exact algorithm, called PSRS+, based on an effective lower and upper bounding technique. We conduct extensive experimental studies over both synthetic and real-world datasets, and the results demonstrate the efficiency and effectiveness of the proposed algorithms.  相似文献   

7.
Local search algorithms based on the Configuration Checking (CC) strategy have been shown to be efficient in solving satisfiable random k-SAT instances. The purpose of the CC strategy is to avoid the cycling problem, which corresponds to revisiting already flipped variables too soon. It is done by considering the neighborhood of the formula variables. In this paper, we propose to improve the CC strategy on the basis of an empirical study of a powerful algorithm using this strategy. The first improvement introduces a new and simple criterion, which refines the selection of the variables to flip for the 3-SAT instances. The second improvement is achieved by using the powerful local search algorithm Novelty with the adaptive noise setting. This algorithm enhances the efficiency of the intensification and diversification phases when solving k-SAT instances with k ≥ 4. We name the resulting local search algorithm Ncca+ and show its effectiveness when treating satisfiable random k-SAT instances issued from the SAT Challenge 2012. Ncca+ won the bronze medal of the random SAT track of the SAT Competition 2013.  相似文献   

8.
The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the closest k data points to q. The applications of kNN queries are board, not only in spatio-temporal databases but also in many areas. For example, they can be used in multimedia databases, data mining, scientific databases and video retrieval. The past studies of kNN query processing did not consider the case that the server may receive multiple kNN queries at one time. Their algorithms process queries independently. Thus, the server will be busy with continuously reaccessing the database to obtain the data that have already been acquired. This results in wasting I/O costs and degrading the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an “information sharing” strategy whereby the server reuses the query results of previously executed queries for efficiently processing subsequent queries. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, and achieves lower I/O costs and less running time.  相似文献   

9.
Nowadays, location-based services (LBS) are facilitating people in daily life through answering LBS queries. However, privacy issues including location privacy and query privacy arise at the same time. Existing works for protecting query privacy either work on trusted servers or fail to provide sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We recognize the sufficient and necessary condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ??-DPkA with minimized ??. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions.  相似文献   

10.
We examine the orthographic-n-point problem (OnP), which extends the perspective-n-point problem to telecentric cameras. Given a set of 3D points and their corresponding 2D points under orthographic projection, the OnP problem is the determination of the pose of the 3D point cloud with respect to the telecentric camera. We show that the OnP problem is equivalent to the unbalanced orthogonal Procrustes problem for non-coplanar 3D points and to the sub-Stiefel Procrustes problem for coplanar 3D points. To solve the OnP problem, we apply existing algorithms for the respective Procrustes problems and also propose novel algorithms. Furthermore, we evaluate the algorithms to determine their robustness and speed and conclude which algorithms are preferable in real applications. Finally, we evaluate which algorithm is most suitable as a minimal solver in a RANSAC scheme.  相似文献   

11.
We study the strategies in feature selection with sparse support vector machine (SVM). Recently, the socalled L p -SVM (0 < p < 1) has attracted much attention because it can encourage better sparsity than the widely used L 1-SVM. However, L p -SVM is a non-convex and non-Lipschitz optimization problem. Solving this problem numerically is challenging. In this paper, we reformulate the L p -SVM into an optimization model with linear objective function and smooth constraints (LOSC-SVM) so that it can be solved by numerical methods for smooth constrained optimization. Our numerical experiments on artificial datasets show that LOSC-SVM (0 < p < 1) can improve the classification performance in both feature selection and classification by choosing a suitable parameter p. We also apply it to some real-life datasets and experimental results show that it is superior to L 1-SVM.  相似文献   

12.
Travel planning and location recommendation are increasingly important in recent years. In this light, we propose and study a novel aggregate location recommendation query (ALRQ) of discovering aggregate locations for multiple travelers and planning the corresponding travel routes in dynamic transportation networks. Assuming the scenario that multiple travelers target the same destination, given a set of travelers’ locations Q, a set of potential aggregate location O, and a departure time t, the ALRQ finds an aggregate location oO that has the minimum global travel time \({\sum }_{q \in Q} T(q,o,t)\), where T(q,o,t) is the travel time between o and q with departure time t. The ALRQ problem is challenging due to three reasons: (1) how to model the dynamic transportation networks practically, and (2) how to compute ALRQ efficiently. We take two types of dynamic transportation networks into account, and we define a pair of upper and lower bounds to prune the search space effectively. Moreover, a heuristic scheduling strategy is adopted to schedule multiple query sources. Finally, we conducted extensive experiments on real and synthetic spatial data to verify the performance of the developed algorithms.  相似文献   

13.
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. For a graph of 16 million vertices, we observe relative throughputs reaching a million times, relative to the non-incremental algorithms.  相似文献   

14.
Every rectilinear Steiner tree problem admits an optimal tree T * which is composed of tree stars. Moreover, the currently fastest algorithms for the rectilinear Steiner tree problem proceed by composing an optimum tree T * from tree star components in the cheapest way. The efficiency of such algorithms depends heavily on the number of tree stars (candidate components). Fößmeier and Kaufmann (Algorithmica 26, 68–99, 2000) showed that any problem instance with k terminals has a number of tree stars in between 1.32 k and 1.38 k (modulo polynomial factors) in the worst case. We determine the exact bound O *(ρ k ) where ρ≈1.357 and mention some consequences of this result.  相似文献   

15.
The top-k query on uncertain data set has been a very hot topic these years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not suitable for the large-scale data. In this paper, it is the first attempt to process probabilistic threshold top-k queries (an important uncertain top-k query, PT-k for short) in a distributed environment. We propose 3 efficient algorithms. The serial distributed approach adopts a new method, which only requires a few amount of calculations, to serially process PT-k queries in distributed environments. The global sorting first algorithm for PT-k query processing (GSP) is designed for improving the computation speed. In GSP, a distributed sorting operation is performed, and then we compute the candidates for PT-k queries in parallel. The query results can be computed by using a novel incremental method which can reduce the number of calculations. The local filtering first algorithm for PT-k query processing is designed for reducing the network overhead. Specifically, several filtering strategies are proposed to filter out redundant data locally, and then the incremental method in GSP is used to process the PT-k queries. Finally, the effectiveness of our proposed algorithms is verified through a series of experiments.  相似文献   

16.
We introduce the novel concept of knowledge states. The knowledge state approach can be used to construct competitive randomized online algorithms and study the trade-off between competitiveness and memory. Many well-known algorithms can be viewed as knowledge state algorithms. A knowledge state consists of a distribution of states for the algorithm, together with a work function which approximates the conditional obligations of the adversary. When a knowledge state algorithm receives a request, it then calculates one or more “subsequent” knowledge states, together with a probability of transition to each. The algorithm uses randomization to select one of those subsequents to be the new knowledge state. We apply this method to randomized k-paging. The optimal minimum competitiveness of any randomized online algorithm for the k-paging problem is the kth harmonic number, \(H_{k}=\sum^{k}_{i=1}\frac{1}{i}\). Existing algorithms which achieve that optimal competitiveness must keep bookmarks, i.e., memory of the names of pages not in the cache. An H k -competitive randomized algorithm for that problem which uses O(k) bookmarks is presented, settling an open question by Borodin and El-Yaniv. In the special cases where k=2 and k=3, solutions are given using only one and two bookmarks, respectively.  相似文献   

17.
We study the k-level uncapacitated facility location problem (k-level UFL) in which clients need to be connected with paths crossing open facilities of k types (levels). In this paper we first propose an approximation algorithm that for any constant k, in polynomial time, delivers solutions of cost at most α k times OPT, where α k is an increasing function of k, with \(\lim _{k\to \infty } \alpha _{k} = 3\). Our algorithm rounds a fractional solution to an extended LP formulation of the problem. The rounding builds upon the technique of iteratively rounding fractional solutions on trees (Garg, Konjevod, and Ravi SODA’98) originally used for the group Steiner tree problem. We improve the approximation ratio for k-level UFL for all k ≥ 3, in particular we obtain the ratio equal 2.02, 2.14, and 2.24 for k = 3,4, and 5.  相似文献   

18.
An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem. The problem is to find the k best policies of a discrete Markov decision process. The k best policies, k?>?1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving k best policies problem by using this reduction requires unreasonable amounts of time even when k?=?3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th policy, for some i?k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms have good scalability. We also show that the approximate algorithms runs much faster and finds interesting, high-quality policies.  相似文献   

19.
Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings.  相似文献   

20.
The problem of anonymization in large networks and the utility of released data are considered in this paper. Although there are some anonymization methods for networks, most of them cannot be applied in large networks because of their complexity. In this paper, we devise a simple and efficient algorithm for k-degree anonymity in large networks. Our algorithm constructs a k-degree anonymous network by the minimum number of edge modifications. We compare our algorithm with other well-known k-degree anonymous algorithms and demonstrate that information loss in real networks is lowered. Moreover, we consider the edge relevance in order to improve the data utility on anonymized networks. By considering the neighbourhood centrality score of each edge, we preserve the most important edges of the network, reducing the information loss and increasing the data utility. An evaluation of clustering processes is performed on our algorithm, proving that edge neighbourhood centrality increases data utility. Lastly, we apply our algorithm to different large real datasets and demonstrate their efficiency and practical utility.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号