Similar Documents
20 similar documents found (search time: 31 ms)
1.
Recently, constraint-based mining of itemsets for queries such as "find all frequent itemsets whose total price is at least $50" has attracted much attention. Two classes of constraints, monotone and antimonotone, have proved very useful in this area. Algorithms exist that efficiently exploit either one of these two classes, but no previous algorithm could efficiently handle both types of constraints simultaneously. In this paper, we present DualMiner, the first algorithm that efficiently prunes its search space using both monotone and antimonotone constraints. We complement a theoretical analysis and proof of correctness of DualMiner with an experimental study that shows its efficacy compared to previous work.
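To make the two constraint classes concrete, here is a toy Python sketch of the dual pruning idea (our illustration, not the paper's actual DualMiner procedure; the prices and thresholds are invented). Each search node covers the sublattice of itemsets between a bottom set and a top set: an antimonotone check on the bottom can discard the whole node, and a monotone check on the top can do the same.

```python
# Toy sketch of dual pruning with both constraint classes (illustrative
# only; prices, thresholds, and the search order are invented).
price = {"a": 10, "b": 25, "c": 40, "d": 5}

def anti(S):   # antimonotone: if it fails for S, it fails for every superset
    return sum(price[i] for i in S) <= 70

def mono(S):   # monotone: if it holds for S, it holds for every superset
    return sum(price[i] for i in S) >= 50

def dual_search(bottom, free, out):
    # This node covers all itemsets S with bottom <= S <= bottom | free.
    top = bottom | free
    if not anti(bottom) or not mono(top):
        return                       # one check prunes the whole sublattice
    if anti(top) and mono(bottom):
        out.append((bottom, free))   # every set in the sublattice qualifies
        return
    i = next(iter(free))
    dual_search(bottom | {i}, free - {i}, out)   # branch: include item i
    dual_search(bottom, free - {i}, out)         # branch: exclude item i

found = []
dual_search(frozenset(), frozenset(price), found)
for bottom, free in found:
    print(sorted(bottom), "plus any subset of", sorted(free))
```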

2.
Cubegrades: Generalizing Association Rules (total citations: 7, self-citations: 0, citations by others: 7)
Cubegrades are a generalization of association rules which represent how a set of measures (aggregates) is affected by modifying a cube through specialization (rolldown), generalization (rollup), and mutation (a change in one of the cube's dimensions). Cubegrades are significantly more expressive than association rules in capturing trends and patterns in data because they can use other standard aggregate measures in addition to COUNT. Cubegrades are atoms which can support sophisticated what-if analysis tasks dealing with the behavior of arbitrary aggregates over different database segments. As such, cubegrades can be useful in marketing, sales analysis, and other typical data mining applications in business. In this paper we introduce the concept of cubegrades. We define them and give examples of their usage. We then describe in detail an important task for computing cubegrades: the generation of significant cubes, which is analogous to generating frequent sets. A novel Grid-Based Pruning (GBP) method is employed for this purpose. We experimentally demonstrate the practicality of the method. We conclude with a number of open questions and possible extensions of the work.

3.
Pushing Convertible Constraints in Frequent Itemset Mining (total citations: 1, self-citations: 0, citations by others: 1)
Recent work has highlighted the importance of the constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. Constraint-pushing techniques have been developed for mining frequent patterns and associations with antimonotonic, monotonic, and succinct constraints. In this paper, we study constraints which cannot be handled with existing theory and techniques in frequent pattern mining. For example, $\mathrm{avg}(S)\,\theta\,v$, $\mathrm{median}(S)\,\theta\,v$, and $\mathrm{sum}(S)\,\theta\,v$ (where $S$ can contain items of arbitrary values, $\theta \in \{<, \leq, \geq, >\}$, and $v$ is a real number) are customarily regarded as tough constraints in that they cannot be pushed inside an algorithm such as Apriori. We develop a notion of convertible constraints and systematically analyze, classify, and characterize this class. We also develop techniques which enable them to be readily pushed deep inside the recently developed FP-growth algorithm for frequent itemset mining. Results from our detailed experiments show the effectiveness of the techniques developed.
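As a toy illustration of the convertible idea (our sketch, not the paper's FP-growth integration; the item values and threshold are invented, and frequency counting is omitted): avg(S) >= v is neither monotone nor antimonotone, but if items are enumerated in value-descending order, a prefix whose average falls below v can never be repaired by further extensions, so the branch can be pruned exactly as if the constraint were antimonotone.

```python
# avg(S) >= v pushed as a convertible constraint over a value-descending
# item order (illustrative values and threshold).
values = {"a": 90, "b": 60, "c": 30, "d": 10}
v = 50

items = sorted(values, key=values.get, reverse=True)   # value-descending

def extend(prefix, start, out):
    for k in range(start, len(items)):
        S = prefix + [items[k]]
        if sum(values[i] for i in S) / len(S) < v:
            # Items after position k have values <= values[items[k]], so
            # later siblings and all extensions of S stay below v: prune.
            break
        out.append(S)
        extend(S, k + 1, out)

result = []
extend([], 0, result)
print(result)   # [['a'], ['a','b'], ['a','b','c'], ['a','b','d'], ...]
```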

4.
This work presents a novel optimization method that integrates ordinal optimization (OO) and simulated annealing (SA). A general regression neural network (GRNN) is trained on the available data to generate a rough model that approximates the response surface in the feasible domain. A set of good-enough candidates is generated by conducting an SA search on this rough model. Only candidates accepted by the SA search are actually tested by evaluating their true objective functions. The GRNN model is then updated using these new data. The procedure is repeated until a specified number of tests has been performed. The method (SAOO+GRNN) is tested on the well-known paper trim-loss problem. The SAOO+GRNN approach can substantially reduce the number of function calls and the computing time, far below those of a simple ordinal optimization method with a selection rule such as horse racing, and of straightforward simulated annealing.
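A minimal sketch of the SAOO+GRNN loop described above (our reading of the abstract, not the authors' code); the true objective, the bounds, and all tuning constants are illustrative stand-ins for the real trim-loss setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_objective(x):
    # Hypothetical expensive function standing in for the real simulation.
    return np.sum((x - 0.3) ** 2) + 0.1 * np.sin(10 * x).sum()

def grnn_predict(X, y, x, sigma=0.1):
    # GRNN = kernel-weighted average of previously evaluated samples.
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * sigma ** 2))
    return w @ y / (w.sum() + 1e-12)

X = rng.uniform(0, 1, size=(10, 2))              # initial truly evaluated design
y = np.array([true_objective(x) for x in X])

for outer in range(5):                           # repeat until the test budget is spent
    x_cur = X[np.argmin(y)].copy()               # start SA from the best known point
    f_cur, accepted, T = grnn_predict(X, y, x_cur), [], 1.0
    for step in range(200):                      # SA search on the cheap rough model
        x_new = np.clip(x_cur + rng.normal(0, 0.05, size=2), 0, 1)
        f_new = grnn_predict(X, y, x_new)
        if f_new < f_cur or rng.random() < np.exp(-(f_new - f_cur) / T):
            x_cur, f_cur = x_new, f_new
            accepted.append(x_new)
        T *= 0.98
    # Only SA-accepted candidates are tested on the true objective, and
    # the GRNN's data set is updated with the new results.
    tested = accepted[:: max(1, len(accepted) // 5)][:5]
    if tested:
        X = np.vstack([X] + tested)
        y = np.append(y, [true_objective(x) for x in tested])

print("best found:", X[np.argmin(y)], y.min())
```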

5.
6.
Recently, researchers have mainly been interested in the search for data items that are globally similar to the query, not in searching inside data items. This paper presents an algorithm, called the generalized virtual node (GVN) algorithm, to search for data items whose parts (subdatatypes) are similar to the incoming query. We call this subdatatype-based multimedia retrieval. Each multimedia datatype, such as image and audio, is represented in this paper as a k-dimensional signal in the spatio-temporal domain. A k-dimensional signal is transformed into characteristic features, and these features are stored in a hierarchical multidimensional structure called the k-tree. Each node of the k-tree contains the partial content corresponding to spatial and/or temporal positions in the data. The k-tree structure allows us to build a unified retrieval model for all types of multimedia data. It also eliminates unnecessary comparisons in cross-media querying. Experimental results for subaudio and subimage retrieval show that the new GVN algorithm takes much less retrieval time than earlier algorithms such as brute-force and the partial-matching algorithm, while its accuracy remains acceptable.
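A hypothetical sketch of the k-tree idea (the node layout, distance test, and names below are our assumptions, not the paper's design): each node stores the characteristic features of one spatial/temporal region, so a query can be matched against sub-images or sub-audio clips at any granularity. The pruning that makes GVN faster than brute force is omitted here.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class KTreeNode:
    features: np.ndarray                            # features of this region
    children: list = field(default_factory=list)    # finer subdivisions

def subquery_search(node, query, threshold, hits):
    # Report any region whose features are close to the query, then keep
    # descending into its spatio-temporal parts.
    if np.linalg.norm(node.features - query) <= threshold:
        hits.append(node)
    for child in node.children:
        subquery_search(child, query, threshold, hits)
    return hits
```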

7.
This paper provides a survey of possibilistic logic as a simple and efficient tool for handling nonmonotonic reasoning, with some emphasis on algorithmic issues. In our previous work, two well-known nonmonotonic systems were encoded in the possibility theory framework: the preferential inference based on System P, and the rational closure inference proposed by Lehmann and Magidor, which relies on System P augmented with a rational monotony postulate. System P is known to provide reasonable but very cautious conclusions; in particular, preferential inference is blocked by the presence of irrelevant properties. When using Lehmann's rational closure, the inference machinery, which is then more productive, may still remain too cautious or, on the contrary, provide counter-intuitive conclusions. The paper proposes an approach to overcome the cautiousness of System P and the problems encountered by the rational closure inference. This approach takes advantage of (contextual) independence assumptions of the form: the fact that a given proposition is true (or is false) does not affect the validity of the rule "normally, if p then q". The modelling of such independence assumptions is discussed in the possibilistic framework. Moreover, we show that when a counter-intuitive conclusion can be inferred from a set of defaults, it is always possible to repair the set of defaults by adding suitable information so as to produce the desired conclusions and block unsuitable ones.

8.
Association rules are an effective method for discovering knowledge in data mining, and the Apriori algorithm is the classic algorithm for association rule mining. Based on an analysis of the Apriori algorithm, this paper describes its implementation in C#, covering both the discovery of frequent itemsets and the generation of association rules. The implementation is validated on traditional market-basket data, and the strong association rules satisfying the minimum support and confidence thresholds are obtained.
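A minimal Python sketch of the two steps the article implements in C# (frequent-set discovery, then strong-rule generation); the transactions and thresholds below are invented.

```python
from itertools import combinations

transactions = [{"milk", "bread"}, {"milk", "diapers", "beer"},
                {"bread", "diapers", "beer"}, {"milk", "bread", "diapers"}]
min_sup, min_conf = 0.5, 0.7

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise (Apriori) generation of frequent itemsets.
frequent = {}
level = [frozenset([i]) for t in transactions for i in t]
while level:
    level = [s for s in set(level) if support(s) >= min_sup]
    frequent.update({s: support(s) for s in level})
    level = list({a | b for a in level for b in level
                  if len(a | b) == len(a) + 1})

# Strong rules: confidence(A -> B) = sup(A u B) / sup(A).
for s in (s for s in frequent if len(s) > 1):
    for r in range(1, len(s)):
        for a in map(frozenset, combinations(s, r)):
            conf = frequent[s] / frequent[a]
            if conf >= min_conf:
                print(set(a), "->", set(s - a), f"conf={conf:.2f}")
```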

9.
The design of the database is crucial to the process of designing almost any Information System (IS) and involves two clearly identifiable key concepts: schema and data model, the latter allowing us to define the former. Nevertheless, the term model is commonly applied indistinctly to both, the confusion arising from the fact that in Software Engineering (SE), unlike in formal or empirical sciences, the notion of model has a double meaning of which we are not always aware. If we take our idea of model directly from empirical sciences, then the schema of a database would actually be a model, whereas the data model would be a set of tools allowing us to define such a schema. The present paper discusses the meaning of model in the area of Software Engineering from a philosophical point of view, an important topic since the confusion described directly affects other debates where model is a key concept. We also suggest that the need for a philosophical discussion of the concept of data model is a further argument in favour of institutionalizing a new area of knowledge, which could be called the Philosophy of Engineering.

10.
Since Aristotle it has been recognised that a valid syllogism cannot have two particular premises. However, that is not how a lay person sees it, at least as long as the premises read "many", "most", etc., instead of a plain "some". The lay people are right if one considers that these syllogisms do not have strict but approximate (Zadeh) validity. Typically only particular premises are available in everyday life, and one is dependent on such syllogisms. Some rules on the usage of particular premises are given.

11.
I discuss the attitude of Jewish law sources from the 2nd–5th centuries to the imprecision of measurement. I review a problem that the Talmud refers to, somewhat obscurely, as "impossible reduction". This problem arises when a legal rule specifies an object by referring to a maximized (or minimized) measurement function, e.g., when a rule applies to the largest part of a divided whole, or to the first incidence that occurs, etc. A problem that is often mentioned is whether there might be hypothetical situations involving more than one maximal (or minimal) value of the relevant measurement and, given such situations, what the pertinent legal rule is. Presumptions of simultaneous occurrences or equally measured values are also a source of embarrassment to modern legal systems, in situations exemplified in the paper, where the law determines a preference based on measured values. I contend that the Talmudic sources discussing the problem of impossible reduction were guided by primitive insights compatible with a fuzzy logic presentation of the inevitable uncertainty involved in measurement. I maintain that fuzzy models of data are compatible with a positivistic epistemology, which refuses to assume any precision in the extra-conscious world that may not be captured by observation and measurement. I therefore propose this view as the preferred interpretation of the Talmudic notion of impossible reduction. Attributing a fuzzy world view to the Talmudic authorities is meant not only to increase our understanding of the Talmud but, in so doing, also to demonstrate that fuzzy notions are entrenched in our practical reasoning. If Talmudic sages did indeed conceive the results of measurements in terms of fuzzy numbers, then equality between the results of measurements had to be more complicated than crisp equations. The problem of impossible reduction could lie in fuzzy sets with an empty core or whose membership functions were only partly congruent. "Reduction is impossible" may thus be reconstructed as "there is no core to the intersection of the two measures". I describe Dirichlet maps for fuzzy measurements of distance as a rough partition of the universe, where for any region $A$ there may be a non-empty set $\overline{A} - \underline{A}$ (upper approximation minus lower approximation) where the problem of impossible reduction applies. This model may easily be combined with a probabilistic extension. The possibility of adopting practical decision standards based on $\alpha$-cuts (and therefore applying interval analysis to fuzzy equations) is discussed in this context. I propose to characterize the uncertainty that was presumably capped by the old sages as U-uncertainty, defined, for a non-empty fuzzy set $A$ on the set of real numbers whose $\alpha$-cuts are intervals of real numbers, as $U(A) = \frac{1}{h(A)} \int_0^{h(A)} \log\left[1 + \mu(A_\alpha)\right] \mathrm{d}\alpha$, where $h(A)$ is the largest membership value obtained by any element of $A$ and $\mu(A_\alpha)$ is the measure of the $\alpha$-cut $A_\alpha$ of $A$, defined by the Lebesgue integral of its characteristic function.
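As a numeric illustration of the reconstructed formula (our example, not from the source), take the triangular fuzzy number with membership 1 at x = 1, falling linearly to 0 at x = 0 and x = 2; its α-cut is the interval [α, 2 − α]:

```python
import numpy as np

h = 1.0                              # h(A): largest membership value in A
alphas = np.linspace(0.0, h, 10001)
mu = 2.0 * (1.0 - alphas)            # measure (length) of the alpha-cut [alpha, 2 - alpha]
# (1/h) * integral over [0, h] of log[1 + mu] d(alpha) ~ mean of the integrand
U = np.log2(1.0 + mu).mean()         # base-2 log assumed; the abstract leaves the base open
print(f"U(A) = {U:.3f} bits")        # ~0.935 bits
```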

12.
Given a complete residuated lattice $(L, \wedge, \vee, *, \rightarrow, 0, 1)$, we show that any $*$-preorder can be represented both by an implication-based graded inclusion as defined in [1] and by a similarity-based graded inclusion as defined in [2]. Also, in accordance with a duality between fuzzy orders and quasi-metrics, we obtain two corresponding representation theorems for quasi-metrics.
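For orientation, the implication-based notion is typically of the form shown below (our reconstruction of the standard definition; the precise versions are those of [1] and [2]):

$$\operatorname{Incl}(A, B) \;=\; \bigwedge_{x} \bigl(A(x) \rightarrow B(x)\bigr),$$

i.e., the infimum over all elements of the residuated implication between membership degrees, while the similarity-based variant grades the inclusion of $A$ in $B$ by how similar $A$ is to $A \cap B$ under a fuzzy equivalence.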

13.
Through key examples and constructs, exact and approximate, the complexity, computability, and solution of linear programming systems are reexamined in the light of Khachian's new notion of (approximate) solution. Algorithms, basic theorems, and alternate representations are reviewed. It is shown that the Klee-Minty example has never been exponential for (exact) adjacent extreme point algorithms and that the Balinski-Gomory (exact) algorithm continues to be polynomial in cases where (approximate) ellipsoidal centered-cutoff algorithms (Levin, Shor, Khachian, Gacs-Lovasz) are exponential. By model approximation, both the Klee-Minty and the new J. Clausen examples are shown to be trivial (explicitly solvable) interval programming problems. A new notion of computable (approximate) solution is proposed, together with an a priori regularization for linear programming systems. New polyhedral constraint contraction algorithms are proposed for approximate solution, and the relevance of interval programming for good starts or exact solution is brought forth. It is concluded from all this that the imposed problem ignorance of past complexity research is deleterious to research progress on computability or efficiency of computation.

This research was partly supported by Project NR047-071, ONR Contract N00014-80-C-0242, and Project NR047-021, ONR Contract N00014-75-C-0569, with the Center for Cybernetic Studies, The University of Texas at Austin.

14.
This paper reports on the first steps towards the formal verification of correctness proofs of real-life protocols in process algebra. We show that such proofs can be verified, and partly constructed, by a general-purpose proof checker. The process algebra we use is μCRL, ACP augmented with data, which is expressive enough for the specification of real-life protocols. The proof checker we use is Coq, which is based on the Calculus of Constructions, an extension of simply typed lambda calculus. The focus is on the translation of the proof theory of μCRL and of μCRL specifications to Coq. As a case study, we verified the Alternating Bit Protocol.

15.
This paper presents a detailed study of Eurotra Machine Translation engines, namely the mainstream Eurotra software known as the E-Framework, and two unofficial spin-offs, the C,A,T and Relaxed Compositionality translator notations, with regard to how these systems handle hard cases, and in particular their ability to handle combinations of such problems. In the C,A,T translator notation, some cases of complex transfer are "wild", meaning roughly that they interact badly when presented with other complex cases in the same sentence. The effect of this is that each combination of a wild case and another complex case needs ad hoc treatment. The E-Framework is the same as the C,A,T notation in this respect, and in general the E-Framework is equivalent to the C,A,T notation for the task of transfer. The Relaxed Compositionality translator notation is able to handle each wild case (bar one exception) with a single rule, even where it appears in the same sentence as other complex cases.

16.
I-SATCHMO: An Improvement of SATCHMO (total citations: 2, self-citations: 0, citations by others: 2)
We introduce a method for reducing the redundant search space of SATCHMO's model generation approach by means of intelligent backtracking. During the reasoning, we mark an asserted consequent atom as useful whenever it has been used as an antecedent atom for forward chaining. We show that splitting the consequence of a non-Horn clause is unnecessary if one of its consequent atoms is found not to be useful at the time it is retracted from the database on backtracking; the remaining splitting over the clause's consequence can therefore be immediately abandoned. In this way, much of the redundant search space can be eliminated. Our method is simple in principle, easy to implement in Prolog, independent of other refinements, and effective for model generation theorem proving.

17.
Summary. The following three results concerning tree automata are presented in this paper. (1) Rounds has posed the following open problem: for every recognizable set R, can we construct a deterministic finite-state transformation recognizing R? We show that this is not possible, in fact, even for a local set. However, the following is true: for every recognizable set R there is an inverse projection R′, effectively obtained, such that R′ is recognized by a deterministic finite-state transformation. (2) Martin and Vere, in their study of tree automata, leave open the question of whether Generalized Syntax Directed Transductions (GSDTs) are closed under Arden's transformation or Greibach's transformation, and conjecture that they are not. We prove that this conjecture is true. It is also shown that GSDTs are not closed under transformation to LR(k) grammars. (3) Peters and Ritchie have shown that if, in a grammar whose generative rules are context-free, there are recognition rules which are context-sensitive, the language recognized is still context-free. A tree-automata-oriented proof is given by Rounds. We show that a similar result holds also for right-linear grammars, i.e., if the generative rules are right-linear, then using context-sensitive rules for recognition, one can still recognize only regular languages. Some other related results concerning context-sensitive extensions of subclasses of context-free languages are also presented.

This work was partially supported by NSF Grant GJ27, U.S. Army Research Office, Durham (DA-31-124 ARO(D)-98), and NSF Grant GS-2509. At present on leave at The Institute for Advanced Study, Princeton, N.J.

18.
Spatio-Temporal Data Mining for Typhoon Image Collection (total citations: 4, self-citations: 0, citations by others: 4)
Our research aims at discovering useful knowledge from a large collection of satellite images of typhoons using data mining approaches. We first introduce the creation of the typhoon image collection, which consists of around 34,000 typhoon images for the northern and southern hemispheres, providing a medium-sized, richly variational, and quality-controlled data collection suitable for spatio-temporal data mining research. Next we apply several data mining approaches to this image collection. We start with spatial data mining, where principal component analysis is used for extracting basic components and reducing dimensionality; it reveals that the major principal components describe latitudinal structures and spiral bands. Moreover, clustering procedures give a bird's-eye-view visualization of typhoon cloud patterns. We then turn to temporal data mining, including state transition rules, but we demonstrate that it involves intrinsic difficulty associated with the nonlinear dynamics of the atmosphere, namely chaos. Finally we briefly introduce our system IMET (Image Mining Environment for Typhoon analysis and prediction), which is designed for intelligent and efficient searching and browsing of the typhoon image collection.
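A minimal sketch of the spatial-mining step described above, PCA over vectorized cloud-pattern images followed by clustering (random data stands in for the ~34,000 typhoon images; the component and cluster counts are invented):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

images = np.random.rand(500, 64 * 64)     # placeholder: 500 images, 64x64 pixels

pca = PCA(n_components=20)                # basic components, reduced dimensionality
coords = pca.fit_transform(images)
print("variance explained:", pca.explained_variance_ratio_.sum())

labels = KMeans(n_clusters=8, n_init=10).fit_predict(coords)
print("cluster sizes:", np.bincount(labels))   # bird's-eye view of cloud patterns
```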

19.
Association rule mining is an important model in data mining. Traditional association rule mining algorithms must scan the database many times and generate large numbers of candidate itemsets, and they treat every item in the database equally, which makes the algorithms complex and inconsistent with real applications. This paper therefore proposes a matrix-based weighted association rule mining algorithm that scans the database only once, generates no candidate itemsets, and can quickly mine items that are infrequent but highly important.
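A toy sketch of the single-scan, matrix-based idea (our illustration; the weighting scheme, data, and threshold are invented, not the paper's): the database is read once into a boolean transaction-item matrix, and weighted supports are computed by matrix operations instead of candidate generation.

```python
import numpy as np

# One pass over the database builds the matrix: rows = transactions.
items = ["a", "b", "c"]
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1],
              [0, 1, 1]], dtype=bool)
w = np.array([0.2, 0.3, 0.9])   # item weights: rare-but-important items weigh more
min_wsup = 0.25

# Weighted support of single items: w[j] * count(j) / |D|.
print(dict(zip(items, np.round(w * M.mean(axis=0), 3))))

# Pairwise co-occurrence frequencies via one matrix product, no candidates.
co = (M.T.astype(int) @ M.astype(int)) / len(M)
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        wsup = ((w[i] + w[j]) / 2) * co[i, j]   # assumed averaging scheme
        if wsup >= min_wsup:
            print(f"{{{items[i]},{items[j]}}} weighted support = {wsup:.2f}")
```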

20.
Concept learning in robotics is an extremely challenging problem: sensory data is often high-dimensional and noisy due to specularities and other irregularities. In this paper, we investigate two general strategies to speed up learning, based on spatial decomposition of the sensory representation and simultaneous learning of multiple classes using a shared structure. We study two concept learning scenarios: a hallway navigation problem, where the robot has to induce features such as "opening" or "wall", and a recycling task, where the robot has to learn to recognize objects such as a trash can. We use a common underlying function approximator in both studies in the form of a feedforward neural network with several hundred input units and multiple output units. Despite the high degree of freedom afforded by such an approximator, we show that the two strategies provide sufficient bias to achieve rapid learning. We provide detailed experimental studies on an actual mobile robot, called PAVLOV, to illustrate the effectiveness of this approach.
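A small sketch of the shared-structure strategy (our illustration with placeholder data, not the PAVLOV setup): one feedforward network whose hidden layer is shared across several concept classes, trained as a multi-label classifier so that features learned for one concept are reused by the others.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(200, 300)                 # several hundred sensory inputs
Y = np.random.randint(0, 2, size=(200, 4))   # 4 concepts, e.g. opening/wall/door/can

# One shared hidden layer feeds all four output units.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
net.fit(X, Y)                                # multi-label fit shares the hidden layer
print(net.predict(X[:3]))
```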
