首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
传统的数据挖掘方法会生成大量的模式和规则,且难以理解,而实际上用户感兴趣的只是其中的一小部分.针对该问题,在挖掘序列模式的PrefixSpan算法基础上提出一种带数据项约束的序列模式挖掘方法,通过数据项约束,减少了搜索空间.实验结果表明,该方法可以有效地挖掘出满足数据项约束的序列模式.  相似文献   

2.
3.
Mining sequential patterns is used to discover all the frequent sequences in a sequence database. However, the mining may return a huge number of patterns, while the users are only interested in a particular subset of these. In this paper, we consider the problem of mining sequential patterns with itemset constraints. In order to solve this problem, we propose a new algorithm named MSPIC-DBV, which is a pattern-growth algorithm that uses prefixes and dynamic bit vectors. This algorithm prunes the search space at the beginning and during the mining process. Moreover, it reduces the number of candidates that need to be checked. The experimental results show that the proposed algorithm outperforms the previous methods.  相似文献   

4.
In classical association rules mining, a minimum support threshold is assumed to be available for mining frequent itemsets. However, setting such a threshold is typically hard. We handle a more practical problem; roughly speaking, it is to mine N k-itemsets with the highest supports for k up to a certain k/sub max/ value. We call the results the N-most interesting itemsets. Generally, it is more straightforward for users to determine N and k/sub max/. We propose two new algorithms, LOOPBACK and BOMO. Experiments show that our methods outperform the previously proposed Itemset-Loop algorithm, and the performance of BOMO can be an order of magnitude better than the original FP-tree algorithm, even with the assumption of an optimally chosen support threshold. We also propose the mining of "N-most interesting k-itemsets with item constraints." This allows user to specify different degrees of interestingness for different itemsets. Experiments show that our proposed Double FP-trees algorithm, which is based on BOMO, is highly efficient in solving this problem.  相似文献   

5.
Mining sequential patterns with regular expression constraints   总被引:5,自引:0,他引:5  
Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional sequential pattern mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. As a consequence, the pattern mining process is typically characterized by lack of focus and users often end up paying inordinate computational costs just to be inundated with an overwhelming number of useless results. We propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT-Sequential Pattern mining with Regular expression consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the trade-offs that arise when constraints that do not subscribe to nice properties (like anti monotonicity) are integrated into the mining process  相似文献   

6.
王华东  杨杰  李亚娟 《计算机应用》2014,34(9):2612-2616
研究这样一个问题:给定多序列、支持度阈值和间隔约束,从多序列中挖掘所有出现次数不小于支持度阈值的频繁序列模式,这里要求模式中任意两个相邻元素在序列中的出现都要满足用户自定义的间隔约束,并且模式在序列中的出现要满足one-off条件。在解决该问题上,已有算法M-OneOffMine在计算模式的支持度时,只考虑模式的每个字符在序列中的首次出现,导致计算的模式支持度远小于其真实支持度,以致许多频繁的模式没有被挖掘出来。为此,设计了一个有效的带有间隔约束的多序列模式挖掘算法--MMSP算法:首先,通过采用二维表保存模式的候选位置;然后,根据候选位置采用最左最优的思想选择匹配位置。通过生物DNA序列进行实验,多序列中元素序列数目不变而序列长度变化时,MMSP挖掘出的频繁模式总数是同类算法M-OneOffMine的3.23倍;在元素序列个数变化时,MMSP挖掘出的频繁模式个数平均是M-OneOffMine的4.11倍;这两种情况下MMSP都有更好的时间性能。在模式长度变化时,MMSP挖掘出的频繁模式个数分别平均是M-OneOffMine的2.21倍和MPP的5.24倍。同时还验证了M-OneOffMine挖掘到的模式是MMSP挖掘到的频繁的子集。实验结果表明,MMSP算法不仅可以挖掘到更多的频繁模式,而且时间花费更少,更适合于实际的应用。  相似文献   

7.
《Knowledge》2007,20(1):86-97
Frequent pattern mining is one of main concerns in data mining tasks. In frequent pattern mining, closed frequent pattern mining and weighted frequent pattern mining are two main approaches to reduce the search space. Although many related studies have been suggested, no mining algorithm considers both paradigms. Even if closed frequent pattern mining represents exactly the same knowledge and weighted frequent pattern mining provides a way to discover more important patterns, the incorporation of closed frequent pattern mining and weight frequent pattern mining may loss information. Based on our analysis of joining orders, we propose closed weighted frequent pattern mining, and present how to discover succinct but lossless closed frequent pattern with weight constraints. To our knowledge, ours is the first work specifically to consider both constraints. An extensive performance study shows that our algorithm outperforms previous algorithms. In addition, it is efficient and scalable.  相似文献   

8.
Mining minimal distinguishing subsequence patterns with gap constraints   总被引:1,自引:4,他引:1  
Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in one class of sequences and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets and can be useful in applications such as protein comparison, document comparison and building sequential classification models. Mining MDS patterns is a challenging task and is significantly different from mining contrasts between relational/transactional data. One particularly important type of constraint that can be integrated into the mining process is the gap constraint. We present an efficient algorithm called ConSGapMiner (Contrast Sequences with Gap Miner), to mine all MDSs satisfying a minimum and maximum gap constraint, plus a maximum length constraint. It employs highly efficient bitset and boolean operations, for powerful gap-based pruning within a prefix growth framework. A performance evaluation with both sparse and dense datasets, demonstrates the scalability of ConSGapMiner and shows its ability to mine patterns from high dimensional datasets at low supports.  相似文献   

9.
Mining sequential patterns from data streams: a centroid approach   总被引:1,自引:0,他引:1  
In recent years, emerging applications introduced new constraints for data mining methods. These constraints are typical of a new kind of data: the data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and have to be considered in a linear time, no blocking operator can be performed and the data can be examined only once. At this time, only a few methods has been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatory phenomenon related to sequential pattern mining. In this paper, we propose an algorithm based on sequences alignment for mining approximate sequential patterns in Web usage data streams. To meet the constraint of one scan, a greedy clustering algorithm associated to an alignment method is proposed. We will show that our proposal is able to extract relevant sequences with very low thresholds.  相似文献   

10.
Given a directed graph, the problem of blackhole mining is to identify groups of nodes, called blackhole patterns, in a way such that the average in-weight of this group is significantly larger than the average out-weight of the same group. The problem of finding volcano patterns is a dual problem of mining blackhole patterns. Therefore, we focus on discovering the blackhole patterns. Indeed, in this article, we develop a generalized blackhole mining framework. Specifically, we first design two pruning schemes for reducing the computational cost by reducing both the number of candidate patterns and the average computation cost for each candidate pattern. The first pruning scheme is to exploit the concept of combination dominance to reduce the exponential growth search space. Based on this pruning approach, we develop the gBlackhole algorithm. Instead, the second pruning scheme is an approximate approach, named approxBlackhole, which can strike a balance between the efficiency and the completeness of blackhole mining. Finally, experimental results on real-world data show that the performance of approxBlackhole can be several orders of magnitude faster than gBlackhole, and both of them have huge computational advantages over the brute-force approach. Also, we show that the blackhole mining algorithm can be used to capture some suspicious financial fraud patterns.  相似文献   

11.
Mining sequential patterns by pattern-growth: the PrefixSpan approach   总被引:12,自引:0,他引:12  
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [R. Agrawal et al. (1994)] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [J. Han et al. (2000)], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the a priori-based algorithm GSP, FreeSpan, and SPADE [M. Zaki, (2001)] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.  相似文献   

12.
We present a new approach to the stability analysis of finite receding horizon control applied to constrained linear systems. By relating the final predicted state to the current state through a bound on the terminal cost, it is shown that knowledge of upper and lower bounds for the finite horizon costs is sufficient to determine the stability of a receding horizon controller. This analysis is valid for receding horizon schemes with arbitrary positive-definite terminal weights and does not rely on the use of stabilizing constraints. The result is a computable test for stability, and two simple examples are used to illustrate its application  相似文献   

13.
To obtain a user-desired and accurate clustering result in practical applications, one way is to utilize additional pairwise constraints that indicate the relationship between two samples, that is, whether these samples belong to the same cluster or not. In this paper, we put forward a discriminative learning approach which can incorporate pairwise constraints into the recently proposed two-class maximum margin clustering framework. In particular, a set of pairwise loss functions is proposed, which features robust detection and penalization for violating the pairwise constraints. Consequently, the proposed method is able to directly find the partitioning hyperplane, which can separate the data into two groups and satisfy the given pairwise constraints as much as possible. In this way, it makes fewer assumptions on the distance metric or similarity matrix for the data, which may be complicated in practice, than existing popular constrained clustering algorithms. Finally, an iterative updating algorithm is proposed for the resulting optimization problem. The experiments on a number of real-world data sets demonstrate that the proposed pairwise constrained two-class clustering algorithm outperforms several representative pairwise constrained clustering counterparts in the literature.  相似文献   

14.
项约束频繁项集挖掘的新方法   总被引:1,自引:0,他引:1       下载免费PDF全文
项约束频繁项集挖掘是项约束关联规则挖掘的关键步骤。对项约束频繁项集挖掘的内涵进行讨论,认为一个项集X本身满足项约束条件B是不够的,数据库中支持X的全部事务均满足B才能称“项集X满足条件B”。据此,将Direct算法改进为Direct*,在Direct*中负项被作为一个独立的项来看待。项约束是简洁性约束,但目前已有的算法没有充分利用其简洁性,提出利用项约束简洁性的MSEB算法。实验表明:对稠密数据库,MSEB的效率较高,并且Direct*和MSEB两个算法均是正确的。  相似文献   

15.
In many real world applications classification models are required to be in line with domain knowledge and to respect monotone relations between predictor variables and the target class, in order to be acceptable for implementation. This paper presents a novel heuristic approach, called RULEM, to induce monotone ordinal rule based classification models. The proposed approach can be applied in combination with any rule- or tree-based classification technique, since monotonicity is guaranteed in a post-processing step. RULEM checks whether a rule set or decision tree violates the imposed monotonicity constraints and existing violations are resolved by inducing a set of additional rules which enforce monotone classification. The approach is able to handle non-monotonic noise, and can be applied to both partially and totally monotone problems with an ordinal target variable. Two novel justifiability measures are introduced which are based on RULEM and allow to calculate the extent to which a classification model is in line with domain knowledge expressed in the form of monotonicity constraints. An extensive benchmarking experiment and subsequent statistical analysis of the results on 14 public data sets indicates that RULEM preserves the predictive power of a rule induction technique while guaranteeing monotone classification. On the other hand, the post-processed rule sets are found to be significantly larger which is due to the induction of additional rules. E.g., when combined with Ripper a median performance difference was observed in terms of PCC equal to zero and an average difference equal to −0.66%, with on average 5 rules added to the rule sets. The average and minimum justifiability of the original rule sets equal respectively 92.66% and 34.44% in terms of the RULEMF justifiability index, and 91.28% and 40.1% in terms of RULEMS, indicating the effective need for monotonizing the rule sets.  相似文献   

16.
We consider the minimum-compliance formulation of the truss topology problem with additional linear constraints on the displacements: the so-called displacement constraints. We propose a new bilevel programming approach to this problem. Our primal goal (upper-level) is to satisfy the displacement constraint as well as possible — we minimize the gap between the actual and prescribed displacement. Our second goal (lower level) is to minimize the compliance — we still want to find the stiffest structure satisfying the displacement constraints. On the lower level we solve a standard truss topology problem and hence we can solve it in the formulation suitable for the fast interior point alogrithms. The overall bilevel problem is solved by means of the so-called implicit programming approach. This approach leads to a nonsmooth optimization problem which is finally solved by a nonsmooth solver.  相似文献   

17.
Van  Trang  Le  Bac 《Applied Intelligence》2021,51(10):7208-7220
Applied Intelligence - Mining sequential rules from a sequence database usually returns a set of rules with great cardinality. However, in real world applications, the end-users are often...  相似文献   

18.
This paper considers the problem of determining the minimum Euclidean distance of a point from a polynomial surface in . It is well known that this problem is in general non-convex. The main purpose of the paper is to investigate to what extent linear matrix inequality (LMI) techniques can be exploited for solving this problem. The first result of the paper shows that a lower bound to the global minimum can be achieved via the solution of a one-parameter family of linear matrix inequalities (LMIs). It is also pointed out that for some classes of problems the solution of a single LMI problem provides the lower bound. The second result concerns the tightness of the bound. It is shown that optimality of the lower bound amounts to solving a system of linear equations. An application example is finally presented to show the features of the approach.  相似文献   

19.
作为对有色装箱问题的推广,提出了一种受位置约束的有色装箱问题(longest item at the bottom coloring bin packing problem,LIBCBPP),即在有色物品的装箱过程中,要求重(长)的物品置于轻(短)的物品下方.该问题在任务调度和日常生活中的运输等问题中有着广泛的应用背景.给出了一个求解该问题的近似KC-LIBFF算法,分析其最坏情况渐进性能比为2,并给出了相应的实验结果.  相似文献   

20.
Mining spatial colocation patterns: a different framework   总被引:2,自引:0,他引:2  
Recently, there has been considerable interest in mining spatial colocation patterns from large spatial datasets. Spatial colocation patterns represent the subsets of spatial events whose instances are often located in close geographic proximity. Most studies of spatial colocation mining require the specification of two parameter constraints to find interesting colocation patterns. One is a minimum prevalent threshold of colocations, and the other is a distance threshold to define spatial neighborhood. However, it is difficult for users to decide appropriate threshold values without prior knowledge of their task-specific spatial data. In this paper, we propose a different framework for spatial colocation pattern mining. To remove the first constraint, we propose the problem of finding N-most prevalent colocated event sets, where N is the desired number of colocated event sets with the highest interest measure values per each pattern size. We developed two alternative algorithms for mining the N-most patterns. They reduce candidate events effectively and use a filter-and-refine strategy for efficiently finding colocation instances from a spatial dataset. We prove the algorithms are correct and complete in finding the N-most prevalent colocation patterns. For the second constraint, a distance threshold for spatial neighborhood determination, we present various methods to estimate appropriate distance bounds from user input data. The result can help an user to set a distance for a conceptualization of spatial neighborhood. Our experimental results with real and synthetic datasets show that our algorithmic design is computationally effective in finding the N-most prevalent colocation patterns. The discovered patterns were different depending on the distance threshold, which shows that it is important to select appropriate neighbor distances.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号