Similar Documents
1.
The output of frequent pattern mining is a huge number of frequent patterns, which are highly redundant and therefore hard to interpret. We focus on mining frequent subgraphs, for which well-considered approaches to reducing this redundancy are limited because of the complex nature of graphs. Two standard solutions are closed and maximal frequent subgraphs, but closed frequent subgraphs are still redundant and maximal frequent subgraphs are too specific. A more promising solution is δ-tolerance closed frequent subgraphs, which decrease monotonically in δ and coincide with maximal and closed frequent subgraphs for δ = 0 and 1, respectively. However, the current algorithm for mining δ-tolerance closed frequent subgraphs is a naive two-step approach in which all frequent subgraphs are first enumerated and then sifted according to δ-tolerance closedness. We propose an efficient algorithm based on the idea of "reverse search", by which the completeness of enumeration is guaranteed and into which new pruning conditions are incorporated. We empirically demonstrate that our approach significantly reduces real computation time compared with two existing algorithms for mining δ-tolerance closed frequent subgraphs, and that the reduction is more pronounced in practical settings.
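As a rough illustration of the naive two-step baseline mentioned above, the sketch below filters an already-enumerated collection of frequent patterns by δ-tolerance closedness. It uses itemsets as a stand-in for subgraphs and follows the convention implied by the abstract (keep a pattern only if no frequent super-pattern has support at least δ times its own, so δ = 0 yields the maximal and δ = 1 the closed patterns); the paper's reverse-search algorithm avoids this full enumeration entirely, and all names here are illustrative.

```python
from typing import Dict, FrozenSet, List

def delta_tolerance_closed(
    frequent: Dict[FrozenSet[str], int],  # frequent pattern -> support
    delta: float,
) -> List[FrozenSet[str]]:
    """Naive two-step filter: keep a frequent pattern P unless some frequent
    super-pattern Q (Q strictly contains P) has support >= delta * support(P)."""
    kept = []
    for p, sup_p in frequent.items():
        dominated = any(
            p < q and sup_q >= delta * sup_p
            for q, sup_q in frequent.items()
        )
        if not dominated:
            kept.append(p)
    return kept

# Toy usage: with delta = 0 only maximal patterns survive,
# with delta = 1 only closed ones.
patterns = {
    frozenset("a"): 10,
    frozenset("ab"): 7,
    frozenset("abc"): 7,
}
print(delta_tolerance_closed(patterns, 0.0))  # maximal: only {a, b, c}
print(delta_tolerance_closed(patterns, 1.0))  # closed: {a} and {a, b, c}
```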

2.
M. Hofri, H. Shachnai. Algorithmica, 2001, 31(3): 378-402
We consider the problem of maintaining a binary search tree (BST) that minimizes the average access cost needed to satisfy randomly generated requests. We analyze scenarios in which the accesses are generated according to a vector of fixed probabilities which is unknown. Our approach is statistical. We devise policies for modifying the tree structure dynamically, using rotations of accessed records. The aim is to produce good approximations of the optimal structure of the tree, while keeping the number of rotations as small as possible. The heuristics that we propose achieve a close approximation to the optimal BST, with lower organization costs than any previously studied. We introduce the MOVE_ONCE rule. The average access cost to the tree under this rule is shown to equal the value achieved by the common Move-to-the-Root (MTR) rule. The advantage of MOVE_ONCE over MTR and similar rules is that it relocates each of the items in the tree at most once. We show that the total expected cost of modifying the tree by the MOVE_ONCE rule is bounded from above by 2(n+1)H_n − 4n rotations (in a tree with n records), where H_n is the nth harmonic number. Extensive experiments show that this value is an overestimate, and in fact the number of rotations is linear for all the access probability vectors we tested. An approximate analysis is shown to match the experimental results, producing the expected number n(π²/3 − 2) − 2 ln n + 0.1354. Next we combine the MOVE_ONCE rule with reference counters, one per record, that provide estimates of the reference probabilities. We call the resulting reorganization rule MOUCS. We show that, for any δ, α > 0 and sufficiently large n, it achieves a cost that approaches the optimum up to an absolute difference of δ with probability higher than 1 − α, within a number of accesses that is proportional to n(lg n)²/(αδ²). Received March 10, 2001; revised March 26, 2001.
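A minimal sketch of the MOVE_ONCE discipline as described above: an accessed record is rotated up to the root (as in Move-to-the-Root), but each key is relocated at most once over the lifetime of the tree. The node layout, the recursive rotate-to-root routine, and the `relocated` bookkeeping set are illustrative choices of mine, not the paper's implementation; the abstract's bound of 2(n+1)H_n − 4n rotations then caps the total restructuring such a policy can ever perform.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def rotate_to_root(node, key):
    """Move-to-the-Root: bring `key` to the root of this subtree by single
    rotations along the search path (no-op if the key is absent)."""
    if node is None or node.key == key:
        return node
    if key < node.key:
        node.left = rotate_to_root(node.left, key)
        return rotate_right(node) if node.left and node.left.key == key else node
    node.right = rotate_to_root(node.right, key)
    return rotate_left(node) if node.right and node.right.key == key else node


class MoveOnceBST:
    def __init__(self):
        self.root = None
        self.relocated = set()  # keys that have already been moved once

    def insert(self, key):
        def _ins(n):
            if n is None:
                return Node(key)
            if key < n.key:
                n.left = _ins(n.left)
            elif key > n.key:
                n.right = _ins(n.right)
            return n
        self.root = _ins(self.root)

    def _find(self, key):
        n = self.root
        while n and n.key != key:
            n = n.left if key < n.key else n.right
        return n

    def access(self, key):
        """Search for `key`; restructure the tree only on its first access."""
        if key in self.relocated or self._find(key) is None:
            return self._find(key)
        self.relocated.add(key)
        self.root = rotate_to_root(self.root, key)
        return self.root
```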

3.
Algorithms used in data mining and bioinformatics have to deal with huge amounts of data efficiently. In many applications, the data are supposed to have explicit or implicit structures. To develop efficient algorithms for such data, we have to propose possible structure models and test whether the models are feasible. Hence, it is important to build a compact model for structured data and to enumerate all instances efficiently. There are few graph classes besides trees that can be used as a model. In this paper, we inves...

4.
A tree language is congruential if it is the union of finitely many classes of a finitely generated congruence on the term algebra. It is well known that congruential tree languages are the same as recognizable tree languages. An equational representation is an ordered pair (E, P), where E is either a ground term equation system or a ground term rewriting system, and P is a finite set of ground terms. We say that (E, P) represents the congruential tree language L that is the union of those ↔*_E-classes containing an element of P, i.e., for which L = ⋃{ [p]_↔*_E | p ∈ P }. We define two sorts of minimality for equational representations. We introduce the cardinality vector (|E|, |P|) of an equational representation (E, P). Let ≤_l and ≤_a denote the lexicographic and antilexicographic orders on the set of ordered pairs of nonnegative integers, respectively. Let L be a congruential tree language. An equational representation (E, P) of L with a ≤_l-minimal (≤_a-minimal) cardinality vector is called ≤_l-minimal (≤_a-minimal). We compute, for an L given by a deterministic bottom-up tree automaton, both a ≤_l-minimal and a ≤_a-minimal equational representation of L. Received: 27 July 1994 / 5 October 1995
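To make the two minimality notions concrete, here is a tiny sketch comparing cardinality vectors (|E|, |P|). It assumes the usual reading in which the lexicographic order compares |E| first and the antilexicographic order compares |P| first; this reading and the code itself are my own illustration, not the paper's definitions.

```python
from typing import List, Tuple

Vec = Tuple[int, int]  # cardinality vector (|E|, |P|)

def lex_min(vectors: List[Vec]) -> Vec:
    """Lexicographic order: minimize |E| first, then |P|."""
    return min(vectors, key=lambda v: (v[0], v[1]))

def antilex_min(vectors: List[Vec]) -> Vec:
    """Antilexicographic order: minimize |P| first, then |E|."""
    return min(vectors, key=lambda v: (v[1], v[0]))

# Toy usage: representations of the same language trading off the number
# of equations |E| against the number of representative terms |P|.
candidates = [(3, 5), (4, 2), (3, 6)]
print(lex_min(candidates))      # (3, 5): smallest |E|, ties broken by |P|
print(antilex_min(candidates))  # (4, 2): smallest |P|, ties broken by |E|
```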

5.
Complexity of Hard-Core Set Proofs
We study a fundamental result of Impagliazzo (FOCS'95) known as the hard-core set lemma. Consider any function f : {0,1}^n → {0,1} which is "mildly hard", in the sense that any circuit of size s must disagree with f on at least a δ fraction of inputs. The hard-core set lemma then says that f must have a hard-core set H of density δ on which it is "extremely hard", in the sense that any circuit of size s′ = O(s / ((1/ε²) · log(1/(εδ)))) must disagree with f on at least a (1 − ε)/2 fraction of the inputs from H.

6.
We investigate in this work a generalization of the known CNF representation that we call General Normal Form (GNF), and extend Robinson's resolution rule to design a new sound and complete resolution proof system for the GNF. We show that by using a cardinality operator in the GNF we obtain a new representation, CGNF, that allows a natural and efficient Boolean encoding for n-ary finite discrete extensional CSPs. We prove that the space complexity of a CSP encoded in the CGNF is identical to its space complexity in the classical CSP representation. We prove that the generalized resolution rule applies to the CGNF encoding and introduce a new inference rule whose application until saturation achieves arc consistency in linear time for n-ary CSPs expressed in the CGNF encoding. Two enumerative methods for solving CSP instances encoded in this Boolean framework are studied: the first (equivalent to MAC in CSPs) maintains full arc consistency at each node of the search tree, while the second (equivalent to FC in CSPs) performs partial arc consistency at each node. Both methods are evaluated and compared on instances of the Ramsey problem and on randomly generated 3- and 4-ary CSPs, and promising results are obtained. We also adapt the notion of clause learning defined for SAT to the CGNF encoding. Finally, we show how the proposed inference rule can be used to achieve path consistency in the case of binary CSPs encoded in CGNF, and how it can be used to enforce strong path consistency when combined with the generalized resolution rule.
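As a rough illustration of why a cardinality operator helps here, the sketch below direct-encodes a small finite-domain CSP into Boolean form: one atom b[x, v] per variable-value pair, an exactly-one cardinality constraint per CSP variable, and one clause per forbidden tuple. This is the standard direct encoding, not the paper's CGNF or its inference rules, and the names and data layout are my own.

```python
from typing import Dict, List, Tuple

def direct_encode(
    domains: Dict[str, List[int]],
    forbidden: Dict[Tuple[str, ...], List[Tuple[int, ...]]],
):
    """Return (cardinality constraints, clauses) over Boolean atoms (x, v),
    where the atom (x, v) means 'CSP variable x takes value v'."""
    # One exactly-one cardinality constraint per CSP variable:
    # sum over v of b[x, v] = 1 -- this is where a cardinality operator pays off.
    cardinality = [(x, [(x, v) for v in dom]) for x, dom in domains.items()]

    # One clause per forbidden tuple: its literals cannot all hold together.
    clauses = []
    for scope, tuples in forbidden.items():
        for values in tuples:
            clauses.append([("not", (x, v)) for x, v in zip(scope, values)])
    return cardinality, clauses

# Toy usage: x, y in {1, 2}, the constraint x != y given extensionally by
# forbidding the tuples (1, 1) and (2, 2).
card, cls = direct_encode(
    {"x": [1, 2], "y": [1, 2]},
    {("x", "y"): [(1, 1), (2, 2)]},
)
print(card)  # exactly-one over {(x,1), (x,2)} and over {(y,1), (y,2)}
print(cls)   # clauses forbidding (x=1, y=1) and (x=2, y=2)
```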

7.
In this paper we present a new inductive inference algorithm for a class of logic programs called linear monadic logic programs. It has several unique features not found in Shapiro's Model Inference System. It has been proved that a set of trees is rational if and only if it is computed by a linear monadic logic program, and that a rational set of trees is recognized by a tree automaton. Based on these facts, we can reduce the problem of inductive inference of linear monadic logic programs to the problem of inductive inference of tree automata. Furthermore, several efficient inference algorithms for finite automata have been developed. We extend them to an inference algorithm for tree automata and use it to obtain an efficient inductive inference algorithm for linear monadic logic programs. The correctness and time complexity of our algorithm are shown, together with several comparisons with the Model Inference System.
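Since the reduction above goes through tree automata, here is a minimal sketch of a deterministic bottom-up tree automaton evaluating a term. The term representation, the transition map keyed by (symbol, child states), and the example automaton are my own illustrative choices, not constructs from the paper.

```python
from typing import Dict, Set, Tuple

# A term is (symbol, (child terms...)); leaves have an empty child tuple.
Term = Tuple[str, tuple]

def run(delta: Dict[Tuple[str, Tuple[str, ...]], str], term: Term) -> str:
    """Bottom-up evaluation: compute the state reached at the root."""
    sym, children = term
    child_states = tuple(run(delta, c) for c in children)
    return delta[(sym, child_states)]

def accepts(delta, final: Set[str], term: Term) -> bool:
    return run(delta, term) in final

# Toy automaton over symbols {a/0, f/2} recognizing terms with an even
# number of 'a' leaves.
delta = {
    ("a", ()): "odd",
    ("f", ("odd", "odd")): "even",
    ("f", ("even", "even")): "even",
    ("f", ("odd", "even")): "odd",
    ("f", ("even", "odd")): "odd",
}
t = ("f", (("a", ()), ("a", ())))  # f(a, a): two 'a' leaves
print(accepts(delta, {"even"}, t))  # True
```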

8.
We study two-level additive Schwarz preconditioners for the h-p version of the Galerkin boundary element method when used to solve hypersingular integral equations of the first kind, which arise from Neumann problems for the Laplacian in two dimensions. Overlapping and non-overlapping methods are considered. We prove that the non-overlapping preconditioner yields a system of equations having a condition number bounded by …, where H_i is the length of the i-th subdomain, h_i is the maximum length of the elements in this subdomain, and p is the maximum polynomial degree used. For the overlapping method, we prove that the condition number is bounded by …, where δ is the size of the overlap and H = max_i H_i. We also discuss the use of the non-overlapping method when the mesh is geometrically graded. The condition number in that case is bounded by c log² M, where M is the number of degrees of freedom. Received October 27, 2000; revised March 26, 2001

9.
In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of the same customer form a customer sequence. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. A 1-sequence is a special type of sequence because it consists of a single itemset instead of an ordered list, while a k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be carefully designed and optimized. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one constituted by these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into one of smaller size. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and elaborate the method to enumerate all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous works in both scalability and execution time.
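The sketch below illustrates the transformation step described above in a simplified form: each frequent item is mapped to an integer symbol, infrequent items are dropped, and every customer sequence is rewritten over the symbols. This is only a stand-in for the paper's encoding of frequent 1-sequences (which are itemsets, not single items), and the refinement of not encoding all of them is omitted; the function and variable names are mine.

```python
from collections import defaultdict
from typing import Dict, List, Set

# A customer sequence is an ordered list of transactions (itemsets).
CustomerSeq = List[Set[str]]

def transform(db: List[CustomerSeq], min_support: int):
    """Map each frequent item to an integer symbol and rewrite every
    customer sequence over those symbols, dropping infrequent items."""
    # Support of an item = number of customer sequences containing it.
    support: Dict[str, int] = defaultdict(int)
    for seq in db:
        for item in set().union(*seq):
            support[item] += 1

    frequent = sorted(i for i, s in support.items() if s >= min_support)
    symbol = {item: idx for idx, item in enumerate(frequent)}

    transformed = []
    for seq in db:
        new_seq = [{symbol[i] for i in txn if i in symbol} for txn in seq]
        transformed.append([txn for txn in new_seq if txn])  # drop empty itemsets
    return symbol, transformed

# Toy usage: three customer sequences, minimum support of 2 customers.
db = [
    [{"a", "b"}, {"c"}],
    [{"a"}, {"b", "c"}],
    [{"d"}, {"a"}],
]
symbol, tdb = transform(db, min_support=2)
print(symbol)  # {'a': 0, 'b': 1, 'c': 2}; 'd' is infrequent and dropped
print(tdb)     # [[{0, 1}, {2}], [{0}, {1, 2}], [{0}]]
```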

10.
The diameter of a set P of n points in ℝ^d is the maximum Euclidean distance between any two points of P. If P is the vertex set of a 3-dimensional convex polytope, and if the combinatorial structure of this polytope is given, we prove that, in the worst case, deciding whether the diameter of P is smaller than 1 requires Ω(n log n) time in the algebraic computation tree model. This shows that the O(n log n) time algorithm of Ramos for computing the diameter of a point set in ℝ³ is optimal for computing the diameter of a 3-polytope. We also give a linear-time reduction from Hopcroft's problem of finding an incidence between points and lines in ℝ² to the diameter problem for a point set in ℝ⁷.
