Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
Consider a random k-conjunctive normal form Fk(n, rn) with n variables and rn clauses. We prove that if the probability that the formula Fk(n, rn) is satisfiable tends to 0 as n → ∞, then r ≥ 2.83, 8.09, 18.91, 40.81, and 84.87, for k = 3, 4, 5, 6, and 7, respectively.
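For illustration, a minimal sketch of the Fk(n, rn) sampling model, assuming each clause draws k distinct variables with independent uniform signs (the paper's exact model may allow repetitions):

```python
import random

def random_k_cnf(n, r, k=3, seed=None):
    """Sample a random k-CNF F_k(n, rn): rn clauses over variables 1..n,
    each clause built from k distinct variables with random signs."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(int(r * n)):
        variables = rng.sample(range(1, n + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses
```

By the contrapositive of the result above, for r below the stated bound (e.g., r < 2.83 at k = 3) the satisfiability probability of such formulas does not tend to 0.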

2.
Nowadays, many real-world problems are encoded into SAT instances and efficiently solved by modern SAT solvers. These solvers, usually known as Conflict-Driven Clause Learning (CDCL) SAT solvers, include a variety of sophisticated techniques, such as clause learning, lazy data structures, conflict-based adaptive branching heuristics, and random restarts, among others. However, the reasons for their efficiency in solving real-world, or industrial, SAT instances are still unknown. The common wisdom in the SAT community is that these techniques exploit some hidden structure of real-world problems.

In this thesis, we characterize some important features of the underlying structure of industrial SAT instances, namely the community structure and the self-similar structure. We observe that most industrial SAT formulas, viewed as graphs, have these two properties. This means that (i) in a graph with a clear community structure, i.e., one with high modularity, we can find a partition of its nodes into communities such that most edges connect nodes of the same community; and (ii) in a graph with a self-similar pattern, i.e., a fractal one, the shape is preserved under re-scalings, i.e., grouping sets of nodes into a single node. We also analyze how these structures are affected by CDCL techniques during the search.

Building on these structural studies, we propose three applications. First, we address the problem of generating pseudo-industrial random SAT instances using the notion of modularity. Our model generates instances similar to (classical) random SAT formulas when the modularity is low, but when this value is high, it is also adequate for modeling realistic pseudo-industrial problems. Second, we propose a method based on the community structure of the instance to detect relevant learnt clauses. Our technique augments the original instance with this set of relevant clauses, which results in an overall improvement in the efficiency of several state-of-the-art CDCL SAT solvers. Finally, we analyze the classification of industrial SAT instances into families using the previously analyzed structural features, and we compare this approach to other classifiers commonly used in portfolio SAT approaches.

In summary, this dissertation extends our understanding of the structure of SAT instances, with the aim of better explaining the success of CDCL techniques and possibly improving them, and proposes a number of applications based on this analysis of the underlying structure of SAT formulas.
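For reference, a graph has high modularity when some partition into communities \(c\) yields a large value of the standard Newman measure stated below (the thesis may use a weighted variant for SAT-instance graphs):

\[
Q = \frac{1}{2m}\sum_{i,j}\Bigl[A_{ij}-\frac{k_i k_j}{2m}\Bigr]\delta(c_i,c_j),
\]

where \(A\) is the adjacency matrix, \(k_i\) the degree of node \(i\), \(m\) the number of edges, and \(\delta(c_i,c_j)=1\) exactly when nodes \(i\) and \(j\) belong to the same community.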

3.
The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user's query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of the objective function, which provides a theoretical approximation guarantee of factor \(1-\frac{1}{e}\) for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy outperforms the existing approaches in both effectiveness and efficiency.
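As a sketch of why submodularity pays off here: the best-first greedy scheme below attains the \(1-\frac{1}{e}\) guarantee for any monotone submodular objective. The marginal-gain function `gain` is a hypothetical placeholder standing in for the paper's facility-location trade-off between relevance and diversity.

```python
def greedy_top_k(candidates, k, gain):
    """Best-first greedy selection for a monotone submodular objective.
    gain(S, d) is the marginal gain of adding document d to the set S;
    the greedy result is then within a factor 1 - 1/e of optimal."""
    selected, remaining = [], set(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda d: gain(selected, d))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The paper's second stage would then take `selected` as the initial set and improve it by local search.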

4.
The problem of kNN (k Nearest Neighbor) queries has received considerable attention in the database and information retrieval communities. Given a dataset D and a kNN query q, the k nearest neighbor algorithm finds the k data points closest to q. The applications of kNN queries are broad, not only in spatio-temporal databases but also in many other areas; for example, they can be used in multimedia databases, data mining, scientific databases, and video retrieval. Past studies of kNN query processing did not consider the case in which the server receives multiple kNN queries at one time; their algorithms process queries independently. The server is thus kept busy repeatedly re-accessing the database to obtain data that have already been acquired, which wastes I/O and degrades the performance of the whole system. In this paper, we focus on this problem and propose an algorithm named COrrelated kNN query Evaluation (COKE). The main idea of COKE is an "information sharing" strategy whereby the server reuses the results of previously executed queries to efficiently process subsequent ones. We conduct a comprehensive set of experiments to analyze the performance of COKE and compare it with the Best-First Search (BFS) algorithm. Empirical studies indicate that COKE outperforms BFS, achieving lower I/O costs and shorter running times.
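A toy sketch of the result-sharing intuition, not the actual COKE algorithm: any k data points already known from a nearby earlier query upper-bound the true k-th distance, so most points can be rejected cheaply. The cache layout is an assumption for illustration.

```python
import heapq, math

def knn_shared(points, q, k, cache):
    """Toy result sharing across kNN queries (not the paper's algorithm):
    the answer of the nearest previously executed query seeds an upper
    bound on q's k-th distance, pruning most points before the heap."""
    bound = math.inf
    if cache:
        _, prev_answer = min(cache, key=lambda e: math.dist(e[0], q))
        dists = sorted(math.dist(p, q) for p in prev_answer)
        if len(dists) >= k:
            bound = dists[k - 1]   # any k data points bound the true k-th distance
    heap = []                      # max-heap of (negated distance, point)
    for p in points:
        d = math.dist(p, q)
        if d > bound:
            continue               # provably outside the top-k
        heapq.heappush(heap, (-d, p))
        if len(heap) > k:
            heapq.heappop(heap)
            bound = -heap[0][0]    # tighten to the current k-th distance
    answer = [p for _, p in sorted(heap, reverse=True)]
    cache.append((q, answer))
    return answer
```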

5.
Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and typically returns an enormous number of cliques with large overlaps. Motivated by this, in this paper we study the diversified top-k clique search problem, which is to find top-k cliques that cover the largest number of nodes in the graph. Diversified top-k clique search can be widely used in many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution is to keep all maximal cliques in memory and then find k of them that cover the most nodes using the approximate greedy max k-cover algorithm (sketched below). However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm that maintains k candidates during maximal clique enumeration. Our algorithm has a limited memory footprint and achieves a guaranteed approximation ratio. We also introduce a novel lightweight \(\mathsf {PNP}\)-\(\mathsf {Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. In addition, for massive input graphs that cannot fit in main memory, we develop an I/O-efficient algorithm. We conduct extensive performance studies on real and synthetic graphs, one of which contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
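For concreteness, the naive baseline the abstract mentions looks like the greedy max k-cover below; the paper's contribution is to obtain a comparable guarantee without materializing all maximal cliques first.

```python
def greedy_max_k_cover(cliques, k):
    """Greedy max k-cover over an in-memory list of maximal cliques:
    repeatedly pick the clique covering the most not-yet-covered nodes.
    This achieves the classic (1 - 1/e) approximation for max k-cover."""
    pool = [set(c) for c in cliques]
    covered, chosen = set(), []
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: len(c - covered))
        if not best - covered:
            break  # no remaining clique adds new nodes
        chosen.append(best)
        covered |= best
    return chosen, covered
```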

6.
We study two natural extensions of Constraint Satisfaction Problems (CSPs). Balance-Max-CSP requires that in any feasible assignment each element in the domain is used an equal number of times. An instance of Hard-Max-CSP consists of soft constraints and hard constraints, and the goal is to maximize the weight of satisfied soft constraints while satisfying all the hard constraints. These two extensions contain many fundamental problems not captured by CSPs, and challenge traditional theories about CSPs in a more general framework. Max-2-SAT and Max-Horn-SAT are the only two nontrivial classes of Boolean CSPs that admit a robust satisfiability algorithm, i.e., an algorithm that finds an assignment satisfying at least a (1 − g(ε)) fraction of constraints given a (1 − ε)-satisfiable instance, where g(ε) → 0 as ε → 0, and g(0) = 0. We prove the inapproximability of these problems with balance or hard constraints, showing that each variant changes the nature of the problems significantly (in different ways). For instance, deciding whether an instance of 2-SAT admits a balanced assignment is NP-hard, and for Max-2-SAT with hard constraints, it is hard to find a constant-factor approximation even on (1 − ε)-satisfiable instances (in particular, the version with hard constraints does not admit a robust satisfiability algorithm). We also study hardness results for a certain CSP over a larger domain capturing ordering constraints: we show that hard constraints rule out constant-factor approximation algorithms. All our hardness results are almost optimal: they completely rule out algorithms with certain properties, or can be matched by simple extensions to existing algorithms.

7.
Stochastic local search (SLS) is a popular paradigm in incomplete solving for the Boolean satisfiability problem (SAT). Most SLS solvers for SAT switch between two modes, i.e., the greedy (intensification) mode and the diversification mode. However, the performance of these two-mode SLS algorithms lags far behind on the random 3-satisfiability (3-SAT) problem, a significant special case of SAT. In this paper, we propose a new hybrid scoring function called MC, based on a linear combination of a greedy property make and a diversification property ConfTimes, and then utilize MC to develop a new two-mode SLS solver called CCMC. To evaluate the performance of CCMC, we conduct extensive experiments comparing CCMC with five state-of-the-art two-mode SLS solvers (Sparrow2011, Sattime2011, EagleUP, gNovelty+PCL, and CCASat) on a broad range of random 3-SAT instances, including all large 3-SAT instances from the 2009 and 2011 SAT Competitions as well as 200 generated huge satisfiable random 3-SAT instances. The experiments show that CCMC clearly outperforms its competitors, indicating its effectiveness. We also analyze the effectiveness of the underlying ideas in CCMC and further improve its performance on random 5-SAT instances.
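A hedged sketch of an MC-style score, assuming MC(x) = α·make(x) + β·ConfTimes(x); the weights and the bookkeeping behind ConfTimes are assumptions, not the paper's exact definitions.

```python
def mc_score(var, assignment, clauses, conf_times, alpha=1.0, beta=1.0):
    """MC-style hybrid score: alpha * make(var) + beta * ConfTimes(var).
    assignment maps variable -> bool; clauses are lists of signed ints."""
    def satisfied(clause):
        return any((lit > 0) == assignment[abs(lit)] for lit in clause)
    # make(var): clauses that flipping var would newly satisfy, i.e. the
    # currently falsified clauses containing var (its literal becomes true).
    make = sum(1 for c in clauses
               if not satisfied(c) and any(abs(lit) == var for lit in c))
    return alpha * make + beta * conf_times[var]
```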

8.
9.
A co-location pattern is a set of spatial features whose instances frequently appear in a spatial neighborhood. This paper efficiently mines the top-k probabilistic prevalent co-locations over spatially uncertain data sets and makes the following contributions: 1) the concept of top-k probabilistic prevalent co-locations based on a possible-world model is defined; 2) a framework for discovering the top-k probabilistic prevalent co-locations is set up; 3) a matrix method is proposed to improve the computation of the prevalence probability of a top-k candidate, and two pruning rules for the matrix block are given to accelerate the search for exact solutions; 4) a polynomial matrix is developed to further speed up the top-k candidate refinement process; 5) an approximate algorithm with a compensation factor is introduced so that relatively large quantities of data can be processed quickly. The efficiency of our proposed algorithms, as well as the accuracy of the approximate algorithm, is evaluated with an extensive set of experiments using both synthetic and real uncertain data sets.

10.
Uncertain graphs have been widely used to represent graph data with inherent uncertainty in their structure. Reliability search is a fundamental problem in uncertain graph analytics. This paper investigates a new problem with broad real-world applications, the top-k reliability search problem on uncertain graphs: finding the k vertices v with the highest reliability of connection from a source vertex s to v. Note that the existing algorithm for the threshold-based reliability search problem is inefficient for the top-k reliability search problem. We propose a new algorithm to solve the top-k reliability search problem efficiently. The algorithm adopts two important techniques, namely BFS sharing and offline sampling. BFS sharing exploits overlaps among different sampled possible worlds of the input uncertain graph and performs a single BFS on all possible worlds simultaneously. Offline sampling samples possible worlds offline and stores them in a compact structure. The algorithm also takes advantage of bit vectors and bitwise operations to improve efficiency. In addition, we generalize the top-k reliability search problem from the single-source case to the multi-source case and show that the multi-source case can be equivalently converted to the single-source case. Moreover, we define two types of reverse top-k reliability search problems with different semantics on uncertain graphs and propose appropriate solutions for both of them. Extensive experiments carried out on both real and synthetic datasets verify that the optimized algorithm outperforms the baselines by 1–2 orders of magnitude in execution time while achieving comparable accuracy. Meanwhile, the optimized algorithm exhibits linear scalability with respect to the size of the input uncertain graph.
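A compact sketch of the two key techniques, offline sampling plus BFS sharing, using one big integer per edge as its N-world bit vector (the paper's compact storage and further optimizations are omitted):

```python
import random

def topk_reliability(n, edges, s, k, N=1024, seed=0):
    """Sample N possible worlds at once, encoding each edge's presence as an
    N-bit mask, then run one BFS-style fixpoint propagating world-masks from
    the source s. reliability(v) is estimated as popcount(reach[v]) / N."""
    rng = random.Random(seed)
    mask, adj = {}, {u: [] for u in range(n)}
    for u, v, p in edges:              # directed edge u -> v with probability p
        m = 0
        for i in range(N):
            if rng.random() < p:
                m |= 1 << i            # bit i: edge exists in world i
        mask[(u, v)] = m
        adj[u].append(v)
    reach = [0] * n
    reach[s] = (1 << N) - 1            # s reaches itself in every world
    frontier = [s]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                new = reach[u] & mask[(u, v)] & ~reach[v]
                if new:                # v newly reached in some worlds
                    reach[v] |= new
                    nxt.append(v)
        frontier = nxt
    est = sorted(((bin(reach[v]).count("1") / N, v) for v in range(n) if v != s),
                 reverse=True)
    return est[:k]                     # top-k (estimated reliability, vertex)
```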

11.
Nowadays, location-based services (LBS) facilitate people's daily lives by answering LBS queries. However, privacy issues, including location privacy and query privacy, arise at the same time. Existing works for protecting query privacy either rely on trusted servers or fail to provide a sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We identify the necessary and sufficient condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve ε-DPkA with minimized ε. Extensive simulations based on real-life datasets and synthetic data distributions are conducted to validate the proposed mechanisms.

12.
Target tracking is one of the important applications of wireless sensor networks (WSNs). Most existing approaches assume that the nodes are dense enough and ignore coverage holes, which are very common in WSNs because of the random deployment of sensor nodes, blocking by obstacles, etc. Moreover, predicting the target's location at the next time instant is unreliable because of the many random factors involved. In this paper, we propose a novel target tracking approach without any prediction, called k-nearest neighbors tracking (k-NNT), to tackle the problems of energy efficiency, continuity, and coverage holes. In k-NNT, when more than k nodes can sense the target, only the k nearest neighbors stay active and track it; when only k′ nodes (k′ < k) can sense the target, those k′ nodes do the tracking. A sophisticated rotation mechanism is designed to improve the continuity of the tracking process. In the worst case, no node can sense the target, i.e., the target enters a coverage hole; k-NNT then recovers via the Round-Up (RU) mode. The nodes on the perimeter of the coverage hole stay active for a time threshold t and sense the surrounding environment so as to detect the target in time. Once a node finds the target, the RU mode ends and the irrelevant nodes return to inactive mode. A series of simulations shows that k-NNT outperforms several existing approaches in terms of tracking accuracy, continuity, and energy efficiency.
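The activation rule itself is simple; a sketch under assumed 2D coordinates and a common sensing range:

```python
import math

def active_set(sensors, target, k, sensing_range):
    """k-NNT activation rule: among sensors able to sense the target, the
    min(k, available) nearest ones stay active. The point representation
    and uniform sensing_range are assumptions for illustration."""
    in_range = [s for s in sensors if math.dist(s, target) <= sensing_range]
    in_range.sort(key=lambda s: math.dist(s, target))
    return in_range[:k]
```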

13.
Sequential pattern mining (SPM) is an important data mining problem with broad applications. SPM is hard due to the huge number of intermediate subsequences to be considered. State-of-the-art approaches for SPM (e.g., PrefixSpan, Pei et al. 2001) are largely based on the pattern-growth approach, where for each frequent prefix subsequence only its related suffix subsequences need to be considered, and the database is recursively projected into smaller ones. Many authors have promoted the use of constraints to focus on the most promising patterns according to the interests of the end user. The top-k SPM problem is also used to cope with the difficulty of thresholding and to control the number of solutions. The state-of-the-art methods developed for SPM and top-k SPM, though efficient, are locked into a rather rigid search strategy and suffer from a lack of declarativity and flexibility. Indeed, adding new constraints usually amounts to changing the data structures used in the core of the algorithm, and combining these new constraints often requires new developments. Recent works (e.g., Kemmar et al. 2014; Négrevergne and Guns 2015) have investigated the use of Constraint Programming (CP) for SPM. However, despite their nice declarative aspects, all these models have scaling problems due to the huge size of their constraint networks. To address this issue, we propose the Prefix-Projection global constraint, which encapsulates both the subsequence relation and the frequency constraint. Its filtering algorithm relies on the principle of projected databases, which keeps in the variables' domains only those values that lead to a frequent pattern in the database. The Prefix-Projection filtering algorithm enforces domain consistency on the variable succeeding the current frequent prefix in polynomial time. This global constraint also allows for a straightforward implementation of additional constraints such as size, item membership, and regular expressions, as well as any combination of them. Experimental results show that our approach clearly outperforms existing CP approaches and competes well with state-of-the-art methods on large datasets for mining frequent sequential patterns, sequential patterns under various constraints, and top-k sequential patterns. Unlike existing CP methods, our approach also achieves better scalability.
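The projected-database principle the filtering algorithm relies on can be sketched in a few lines, simplified here to sequences of single items (the CP-variable filtering itself is omitted):

```python
def project(sequences, item):
    """Prefix projection: keep, for every sequence containing item, only
    the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in sequences if item in seq]

def frequent_items(sequences, min_support):
    """Items appearing in at least min_support sequences."""
    support = {}
    for seq in sequences:
        for item in set(seq):
            support[item] = support.get(item, 0) + 1
    return {i for i, c in support.items() if c >= min_support}

def prefixspan(sequences, min_support, prefix=()):
    """Minimal PrefixSpan-style pattern growth: extend the prefix with each
    frequent item and recurse on the projected database."""
    patterns = []
    for item in sorted(frequent_items(sequences, min_support)):
        new_prefix = prefix + (item,)
        patterns.append(new_prefix)
        patterns.extend(prefixspan(project(sequences, item),
                                   min_support, new_prefix))
    return patterns
```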

14.
The study of broken triangles has become increasingly ambitious, aiming both to solve constraint satisfaction problems (CSPs) in polynomial time and to reduce search-space size through value merging or variable elimination. Considerable progress has been made in extending this important concept, for instance with the dual broken-triangle and weakly broken-triangle properties, in order to maximize the number of captured tractable CSP instances and/or the number of merged values. Specifically, m-wBTP allows us to merge more values than BTP, while DBTP, ??-BTP, k-BTP, WBTP, and m-wBTP capture more tractable instances than BTP. However, except for BTP, none of these extensions allows variable elimination while preserving satisfiability. Moreover, k-BTP and m-wBTP define larger tractable classes around BTP, but both generally require a high level of consistency. Here, we introduce a new, weaker form of BTP, called m-fBTP for flexible broken-triangle property, which represents a compromise among most of these previous BTP-based tractable properties. m-fBTP allows us, on the one hand, to eliminate more variables than BTP while preserving satisfiability and, on the other, to define a new, larger tractable class for which arc consistency is a decision procedure. Likewise, m-fBTP permits merging more values than BTP but fewer than m-wBTP. Binary CSP instances satisfying m-fBTP are solved in polynomial time by state-of-the-art algorithms such as MAC and RFL. An open question is whether it is possible to decide, in polynomial time, the existence of some variable ordering for which a given instance satisfies m-fBTP.
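For reference, the baseline broken-triangle property over a variable ordering \(<\), with \(R_{xy}\) the set of allowed value pairs between variables \(x\) and \(y\), requires:

\[
\forall\, x<y<z,\; \forall\, a\in D(x),\ b\in D(y),\ c,d\in D(z):\;
\bigl((a,b)\in R_{xy}\wedge(a,c)\in R_{xz}\wedge(b,d)\in R_{yz}\bigr)
\;\Rightarrow\;
\bigl((a,d)\in R_{xz}\vee(b,c)\in R_{yz}\bigr).
\]

m-fBTP relaxes this condition; its exact statement is given in the paper.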

15.
The k nearest neighbors (k-NN) classification technique is widely known for its simplicity, effectiveness, and robustness. As a lazy learner, k-NN is a versatile algorithm used in many fields. In this classifier, the parameter k is generally chosen by the user, and the optimal value is found by experiment; the chosen constant k is then used throughout the whole classification phase. Using the same k for every test sample can decrease overall prediction performance, since the optimal k should vary from one test sample to another for more accurate predictions. In this study, a method for dynamically selecting a k value for each instance is proposed. This improved classification method employs a simple clustering procedure. In the experiments, more accurate results are obtained, and the reasons for this success are analyzed and presented.
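One plausible instantiation of per-sample dynamic k selection, not necessarily the paper's exact procedure: cluster the training data (as NumPy arrays), tune k per cluster, and classify each test sample with its cluster's k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def dynamic_k_predict(X_train, y_train, X_test,
                      k_values=(1, 3, 5, 7, 9), n_clusters=5, seed=0):
    """Cluster the training data, pick for each cluster the k that scores
    best on its members, then classify each test sample with the k of its
    nearest cluster. (Scoring on training members includes each point as
    its own neighbor; a sketch-level shortcut.)"""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X_train)
    clfs = {k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
            for k in k_values}
    best_k = {}
    for c in range(n_clusters):
        members = km.labels_ == c
        best_k[c] = max(k_values,
                        key=lambda k: clfs[k].score(X_train[members],
                                                    y_train[members]))
    clusters = km.predict(X_test)
    return np.array([clfs[best_k[c]].predict(x.reshape(1, -1))[0]
                     for x, c in zip(X_test, clusters)])
```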

16.
A deterministic parallel LL parsing algorithm is presented. The algorithm is based on a transformation from a parsing problem to parallel reduction. First, a nondeterministic version of a parallel LL parser is introduced. Then, it is transformed into the deterministic version—the LLP parser. The deterministic LLP(q,k) parser uses two kinds of information to select the next operation — a lookahead string of length up to k symbols and a lookback string of length up to q symbols. Deterministic parsing is available for LLP grammars, a subclass of LL grammars. Since the presented deterministic and nondeterministic parallel parsers are both based on parallel reduction, they are suitable for most parallel architectures.

17.
Representative skyline computation is a fundamental issue in the database area that has attracted much attention in recent years. A notable definition of the representative skyline is the distance-based representative skyline (DBRS). Given an integer k, a DBRS consists of k representative skyline points chosen to minimize the maximal distance between a non-representative skyline point and its nearest representative. In the 2D space, the state-of-the-art algorithm for computing the DBRS is based on dynamic programming (DP) and takes \(O(km^2)\) time, where m is the number of skyline points. Clearly, such a DP-based algorithm cannot handle large-scale datasets due to its quadratic time cost. To overcome this problem, we propose a new approximate algorithm called ARS and a new exact algorithm named PSRS, based on a carefully designed parametric-search technique. We show that the ARS algorithm can guarantee a solution that is at most ε larger than the optimal solution. The proposed ARS and PSRS algorithms run in \(O(k\log^2 m\log(T/\varepsilon))\) and \(O(k^2\log^3 m)\) time respectively, where T is no more than the maximal distance between any two skyline points. We also propose an improved exact algorithm, called PSRS+, based on an effective lower- and upper-bounding technique. We conduct extensive experimental studies over both synthetic and real-world datasets, and the results demonstrate the efficiency and effectiveness of the proposed algorithms.
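The parametric-search flavor behind ARS, specialized to points on a line for brevity (the paper operates on 2D skylines with sharper bounds): binary-search the covering radius and greedily test feasibility with at most k representatives.

```python
def min_radius_k_reps(values, k, eps=1e-6):
    """Binary-search the smallest covering radius d such that k
    representatives keep every point within distance d (1D sketch)."""
    values = sorted(values)

    def feasible(d):
        reps, i, n = 0, 0, len(values)
        while i < n:
            reps += 1
            center = values[i] + d     # covers [values[i], values[i] + 2d]
            while i < n and values[i] <= center + d:
                i += 1
            if reps > k:
                return False
        return True

    lo, hi = 0.0, values[-1] - values[0]
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi
```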

18.
Choosing the best location for starting a business or expanding an existing enterprise is an important issue. A number of location selection problems have been discussed in the literature, often applying the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets (a set of customers, a set of existing facilities, and a set of potential locations), the k-MSL selection problem outputs k locations from the potential location set such that the average distance between a customer and his nearest facility is minimized. We formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users; two exact algorithms are then proposed to find the optimal result, with several pruning rules applied to increase computational efficiency. We evaluate the algorithms' performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently.
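A sketch of the greedy baseline described in the abstract, assuming customers, facilities, and candidate locations are 2D tuples and at least one facility already exists:

```python
import math

def greedy_k_msl(customers, facilities, candidates, k):
    """Repeatedly open the candidate location that most reduces the average
    customer-to-nearest-facility distance (greedy approximation)."""
    best_dist = {c: min(math.dist(c, f) for f in facilities) for c in customers}
    chosen, pool = [], set(candidates)
    for _ in range(min(k, len(pool))):
        def avg_after(site):
            return sum(min(best_dist[c], math.dist(c, site))
                       for c in customers) / len(customers)
        site = min(pool, key=avg_after)
        chosen.append(site)
        pool.remove(site)
        for c in customers:   # update each customer's nearest-facility distance
            best_dist[c] = min(best_dist[c], math.dist(c, site))
    return chosen
```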

19.
Regular Random k-SAT: Properties of Balanced Formulas
We consider a model for generating random k-SAT formulas in which each literal occurs approximately the same number of times in the formula's clauses (regular random k-SAT). Our experimental results show that such regular random k-SAT instances are much harder than the usual uniform random k-SAT problems. This is in agreement with other results showing that more balanced instances of random combinatorial problems are often much more difficult to solve than uniformly random instances, even at phase-transition boundaries. Almost no formal results are known for such problem distributions; the balancing constraints add a dependency between variables that complicates a standard analysis. Regular random 3-SAT exhibits a phase transition as a function of the ratio α of clauses to variables, taking place at approximately α = 3.5. We show that for α > 3.78, regular random 3-SAT formulas are unsatisfiable with high probability (w.h.p.), where an event holds w.h.p. if its probability tends to 1 as n → ∞. We also show that the analysis of a greedy algorithm proposed by Kaporis et al. for the uniform 3-SAT model can be adapted for regular random 3-SAT; in particular, for formulas with ratio α < 2.46, a greedy algorithm finds a satisfying assignment with positive probability.
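A hedged sketch of a balanced generator in the spirit of this model: every literal is placed in a pool the same number of times (up to rounding), and clauses are dealt from the shuffled pool, skipping literals that would repeat a variable. The paper's exact construction may differ.

```python
import random

def regular_random_ksat(n, alpha, k=3, seed=None):
    """Deal k-clauses from a shuffled pool in which every literal appears
    equally often (up to rounding). Skipped (colliding) literals are simply
    dropped, so occurrence counts stay only approximately equal, matching
    the 'approximately the same number of times' model."""
    rng = random.Random(seed)
    m = int(alpha * n)
    per_literal, extra = divmod(k * m, 2 * n)
    literals = [l for v in range(1, n + 1) for l in (v, -v)]
    pool = literals * per_literal + rng.sample(literals, extra)
    rng.shuffle(pool)
    clauses, clause, used = [], [], set()
    for lit in pool:
        if abs(lit) in used:
            continue               # would repeat a variable in this clause
        clause.append(lit)
        used.add(abs(lit))
        if len(clause) == k:
            clauses.append(clause)
            clause, used = [], set()
    return clauses
```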

20.
We prove that any balanced incomplete block design B(v, k, 1) generates a near-resolvable balanced incomplete block design NRB(v, k − 1, k − 2). We establish a one-to-one correspondence between near-resolvable block designs NRB(v, k − 1, k − 2) and the subclass of nonbinary (optimal, equidistant) constant-weight codes meeting the generalized Johnson bound.
