10 similar documents found (search time: 122 ms)
1.
The output of frequent pattern mining is a huge number of frequent patterns, which are highly redundant, posing a serious obstacle to understandability. We focus on mining frequent subgraphs, for which well-considered approaches to reducing the redundancy
are limited because of the complex nature of graphs. Two known, standard solutions are closed and maximal frequent subgraphs,
but closed frequent subgraphs are still redundant and maximal frequent subgraphs are too specific. A more promising solution
is δ-tolerance closed frequent subgraphs, which decrease monotonically in δ, coinciding with closed frequent subgraphs at δ=0 and with maximal frequent subgraphs at δ=1. However, the current algorithm for mining δ-tolerance closed frequent subgraphs is a naive, two-step approach in which frequent subgraphs are all enumerated and then
sifted according to δ-tolerance closedness. We propose an efficient algorithm based on the idea of “reverse-search” by which the completeness of
enumeration is guaranteed and for which new pruning conditions are incorporated. We empirically demonstrate that our approach
significantly reduces real computation time relative to two competing algorithms for mining δ-tolerance closed frequent subgraphs, with the advantage more pronounced in practical settings.
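As an illustration of the δ-tolerance closedness condition described above, here is a minimal sketch on itemsets rather than subgraphs (to sidestep subgraph-isomorphism machinery); all function names and the toy data are ours, not the paper's:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration of frequent itemsets (fine for tiny examples)."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            sup = sum(1 for t in transactions if set(cand) <= t)
            if sup >= min_sup:
                freq[frozenset(cand)] = sup
    return freq

def delta_tolerance_closed(freq, delta):
    """Keep a pattern p only if no proper super-pattern q has
    support(q) >= (1 - delta) * support(p)."""
    return {p: sp for p, sp in freq.items()
            if not any(p < q and sq >= (1 - delta) * sp
                       for q, sq in freq.items())}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
freq = frequent_itemsets(transactions, min_sup=2)
closed0 = delta_tolerance_closed(freq, 0.0)  # ordinary closed patterns
maximal = delta_tolerance_closed(freq, 1.0)  # maximal patterns
```

At δ=0 the filter keeps exactly the closed patterns and at δ=1 exactly the maximal ones, with intermediate δ interpolating between the two.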
2.
We consider the problem of maintaining a binary search tree (BST) that minimizes the average access cost needed to satisfy randomly generated requests. We analyze scenarios in which the accesses are generated according to a vector of fixed probabilities which is unknown. Our approach is statistical.
We devise policies for modifying the tree structure dynamically, using rotations of accessed records. The aim is to produce
good approximations of the optimal structure of the tree, while keeping the number of rotations as small as possible. The
heuristics that we propose achieve a close approximation to the optimal BST, with lower organization costs than any previously
studied.
We introduce the MOVE_ONCE rule. The average access cost to the tree under this rule is shown to equal the value achieved by the common rule Move to the Root (MTR). The advantage of MOVE_ONCE over MTR and similar rules is that it relocates each of the items in the tree at most once. We show that the total expected cost of modifying the tree by the MOVE_ONCE rule is bounded from above by 2(n+1)H_n − 4n rotations (in a tree with n records), where H_n is the n-th harmonic number. Extensive experiments show that this value is an overestimate, and in fact the number of rotations is linear for all the access probability vectors we tested. An approximate analysis is shown to match the experimental results, producing the expected number n(π²/3 − 2) − 2 ln n + 0.1354.
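The two closed-form expressions above are easy to compare numerically; a small sketch (function names are ours), showing that the proven worst-case bound indeed overestimates the approximately linear expected behavior:

```python
import math

def harmonic(n):
    """H_n, the n-th harmonic number."""
    return sum(1.0 / k for k in range(1, n + 1))

def rotation_upper_bound(n):
    """Proven worst-case bound on expected MOVE_ONCE rotations: 2(n+1)H_n - 4n."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

def rotation_approx(n):
    """Approximate expected rotations: n(pi^2/3 - 2) - 2 ln n + 0.1354."""
    return n * (math.pi ** 2 / 3 - 2) - 2 * math.log(n) + 0.1354
```

For n = 1000, the bound is roughly 11,000 rotations while the approximate analysis predicts about 1,276, consistent with the experimental observation of linear growth.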
Next we combine the MOVE_ONCE rule with reference counters, one per record, that provide estimates of the reference probabilities.
We call the resulting reorganization rule MOUCS. We show that, for any δ, α > 0 and sufficiently large n, it achieves a cost that approaches the optimum up to an absolute difference of δ with probability higher than 1 − α, within a number of accesses proportional to n(lg n)²/(αδ²).
Received March 10, 2001; revised March 26, 2001.
3.
Shin-ichi Nakano, Ryuhei Uehara, and Takeaki Uno, Journal of Computer Science and Technology, 2009, 24(3): 517-533
Algorithms used in data mining and bioinformatics have to deal with huge amounts of data efficiently. In many applications, the data are supposed to have explicit or implicit structures. To develop efficient algorithms for such data, we have to propose possible structure models and test if the models are feasible. Hence, it is important to make a compact model for structured data, and enumerate all instances efficiently. There are few graph classes besides trees that can be used for a model. In this paper, we inves...
4.
A tree language is congruential if it is the union of finitely many classes of a finitely generated congruence on the term
algebra. It is well known that congruential tree languages are the same as recognizable tree languages. An equational representation
is an ordered pair (E, P), where E is either a ground term equation system or a ground term rewriting system, and P is a finite set of ground terms. We say that (E, P) represents the congruential tree language L which is the union of those ↔*E-classes containing an element of P, i.e., for which L = ⋃{ [p]↔*E ∣ p ∈ P }. We define two sorts of minimality for equational representations. We introduce the cardinality vector (|E|, |P|) of an equational representation (E, P). Let ≤l and ≤a denote the lexicographic and antilexicographic orders on the set of ordered pairs of nonnegative integers, respectively. Let L be a congruential tree language. An equational representation (E, P) of L with ≤l-minimal (≤a-minimal) cardinality vector is called ≤l-minimal (≤a-minimal). We compute, for an L given by a deterministic bottom-up tree automaton, both a ≤l-minimal and a ≤a-minimal equational representation of L.
Received: 27 July 1994 / 5 October 1995
5.
Complexity of Hard-Core Set Proofs
We study a fundamental result of Impagliazzo (FOCS'95) known as the hard-core set lemma. Consider any function f: {0,1}ⁿ → {0,1} which is "mildly hard", in the sense that any circuit of size s must disagree with f on at least a δ fraction of inputs. Then, the hard-core set lemma says that f must have a hard-core set H of density δ on which it is "extremely hard", in the sense that any circuit of size s′ = O(s / ((1/ε²) log(1/(εδ)))) must disagree with f on at least a (1−ε)/2 fraction of inputs from H.
6.
We investigate in this work a generalization of the known CNF representation that we call General Normal Form (GNF) and extend the resolution rule of Robinson to design a new sound and complete resolution proof system for the GNF. We show that by using a cardinality operator in the GNF we obtain the new representation CGNF that allows a natural and efficient Boolean encoding for n-ary finite discrete extensional CSPs. We prove that the space
complexity of a CSP encoded in the CGNF is identical to its space complexity when it is expressed in the classical CSP representation. We prove that the generalized
resolution rule applies for the CGNF encoding and introduce a new inference rule whose application until saturation achieves arc-consistency in a linear time
complexity for n-ary CSPs expressed in the CGNF encoding. Two enumerative methods for solving CSP instances encoded in this Boolean framework are studied: the first (equivalent to MAC in CSPs) maintains full arc-consistency at each node of the search tree, while the second (equivalent to FC in CSPs) performs partial arc-consistency at each node. Both methods are evaluated and compared on some instances of the Ramsey problem and on randomly generated 3- and 4-ary CSPs, and promising results are obtained. We also adapt the notion of clause learning defined in SAT to the CGNF encoding. Finally, we show how the proposed inference rule can be used to achieve path consistency in the case of binary CSPs encoded in CGNF, and how it can be used to enforce strong path consistency when combined with the generalized resolution rule.
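The CGNF encoding itself is not reproduced in this abstract. For contrast, the classical direct encoding of a binary CSP into CNF, the kind of Boolean encoding such proposals are typically measured against, can be sketched as follows (all names and the nogood-based input format are ours):

```python
from itertools import combinations

def direct_encode(domains, nogoods):
    """Direct encoding of a binary CSP into CNF.
    domains: {var: [values]}; nogoods: {(v1, v2): [(a, b), ...]} disallowed pairs.
    Returns (num_vars, clauses) with DIMACS-style signed integer literals."""
    lit = {}
    for v, dom in domains.items():
        for a in dom:
            lit[(v, a)] = len(lit) + 1
    clauses = []
    for v, dom in domains.items():
        clauses.append([lit[(v, a)] for a in dom])        # at least one value per var
        for a, b in combinations(dom, 2):                  # at most one value per var
            clauses.append([-lit[(v, a)], -lit[(v, b)]])
    for (v1, v2), pairs in nogoods.items():
        for a, b in pairs:                                 # forbid each disallowed pair
            clauses.append([-lit[(v1, a)], -lit[(v2, b)]])
    return len(lit), clauses
```

The at-most-one block is exactly the part that a cardinality operator, as in CGNF, lets one express in a single compact constraint instead of O(d²) pairwise clauses per variable.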
7.
Yasubumi Sakakibara, New Generation Computing, 1990, 7(4): 365-380
In this paper we present a new inductive inference algorithm for a class of logic programs, called linear monadic logic programs. It has several unique features not found in Shapiro's Model Inference System. It has been proved that a set of trees is rational if and only if it is computed by a linear monadic logic program, and that a rational set of trees is recognized by a tree automaton. Based on these facts, we can reduce the problem of inductive inference of linear monadic logic programs to the problem of inductive inference of tree automata. Furthermore, several efficient inference algorithms for finite automata have been developed. We extend them to an inference algorithm for tree automata and use it to obtain an efficient inductive inference algorithm for linear monadic logic programs. The correctness, time complexity and several comparisons of our algorithm with the Model Inference System are shown.
8.
We study two-level additive Schwarz preconditioners for the h-p version of the Galerkin boundary element method when used to solve hypersingular integral equations of the first kind, which
arise from the Neumann problems for the Laplacian in two dimensions. Overlapping and non-overlapping methods are considered.
We prove that the non-overlapping preconditioner yields a system of equations having a condition number bounded by a quantity (formula not rendered in this abstract) in which H_i is the length of the i-th subdomain, h_i is the maximum length of the elements in this subdomain, and p is the maximum polynomial degree used. For the overlapping method, we prove that the condition number is bounded by a quantity (formula not rendered in this abstract) in which δ is the size of the overlap and H = max_i H_i. We also discuss the use of the non-overlapping method when the mesh is geometrically graded. The condition number in that case is bounded by c log² M, where M is the number of degrees of freedom.
Received October 27, 2000; revised March 26, 2001
9.
Chung-Wen Cho, Yi-Hung Wu, and Arbee L. P. Chen, Journal of Intelligent Information Systems, 2009, 32(1): 23-51
In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of
the same customers form a set of customer sequences. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. The 1-sequence is a special type of sequence because it consists of only a single itemset instead of an ordered list, while the k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture to find the two types of frequent
sequences separately in order that the discovery of frequent k-sequences can be well designed and optimized. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one constituted
by the symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered
frequent 1-sequences to transform the database into one with a smaller size. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise the compact representation for a customer sequence and elaborate the method to enumerate all distinct
subsequences from a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number
of experiments are performed. The results show that our approach outperforms the previous works in both scalability and execution
time.
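The two-phase idea, finding frequent 1-sequences first and then re-encoding the database over them, can be sketched as follows (for simplicity this sketch counts frequent items rather than frequent itemsets; all names are ours, not the paper's):

```python
from collections import Counter

def frequent_items(db, min_sup):
    """Phase 1: frequent 1-sequences, counting each item once per customer sequence."""
    counts = Counter()
    for seq in db:
        for item in {i for itemset in seq for i in itemset}:
            counts[item] += 1
    return {i for i, c in counts.items() if c >= min_sup}

def encode(db, frequent):
    """Encode frequent items as integer symbols and shrink the database:
    infrequent items are dropped, and emptied itemsets disappear."""
    sym = {item: k for k, item in enumerate(sorted(frequent))}
    out = []
    for seq in db:
        new_seq = [sorted(sym[i] for i in itemset if i in sym) for itemset in seq]
        out.append([s for s in new_seq if s])
    return sym, out

db = [[{"a", "b"}, {"c"}], [{"a"}, {"b", "c"}], [{"d"}]]
sym, encoded = encode(db, frequent_items(db, min_sup=2))
```

Phase 2, mining frequent k-sequences over the compact symbol database, then runs on strictly smaller input, which is where the paper's savings come from.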
10.
The diameter of a set P of n points in ℝ^d is the maximum Euclidean distance between any two points in P. If P is the vertex set of a 3-dimensional convex polytope, and if the combinatorial structure of this polytope is given, we prove that, in the worst case, deciding whether the diameter of P is smaller than 1 requires Ω(n log n) time in the algebraic computation tree model. This shows that the O(n log n) time algorithm of Ramos for computing the diameter of a point set in ℝ³ is optimal for computing the diameter of a 3-polytope. We also give a linear time reduction from Hopcroft's problem of finding an incidence between points and lines in ℝ² to the diameter problem for a point set in ℝ⁷.
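For comparison with the Ω(n log n) lower bound, the trivial O(n²) baseline for the diameter and its decision version can be sketched as follows (names are ours):

```python
import math

def diameter(points):
    """O(n^2) baseline: maximum pairwise Euclidean distance."""
    return max((math.dist(p, q)
                for i, p in enumerate(points)
                for q in points[i + 1:]),
               default=0.0)

def diameter_less_than(points, t):
    """Decision version used in the lower-bound argument: is diam(P) < t?"""
    return diameter(points) < t
```

The lower bound says that even with the polytope's combinatorial structure in hand, no algorithm can beat n log n for this decision problem, so Ramos's O(n log n) algorithm is optimal in that model.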