Similar literature
 20 similar documents found (search time: 156 ms)
1.
Mining association rules is an essential research area in data mining. Association rules reflect the inner relationships in data, and discovering them helps decision-makers make correct and appropriate decisions. They provide an effective means of finding potential links between data items. In this paper, starting from a study of data mining technology, we make an in-depth study of association rule mining. On that basis, we analyze the classic method for mining association rules, the Apriori algorithm, point out its weaknesses, and put forward a new improved algorithm, the AprioriMend algorithm.
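As a point of reference for the classic Apriori algorithm that the abstract analyzes, here is a minimal sketch of level-wise frequent itemset mining. It is not the AprioriMend algorithm, whose details the abstract does not give; the function name `apriori` and the absolute `min_support` count are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The repeated database pass in the count step is the usual cost that improved variants of Apriori try to reduce.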

2.
This paper provides a survey of elaboration tolerance in logical AI. John McCarthy views elaboration tolerance as the key property of any formalism that can represent information in the common sense informatic situation. The goal of studying elaboration tolerance is to find a formalism for describing problems logically that is as elaboration tolerant as natural language and the associated background knowledge. We first introduce the missionaries and cannibals problem and its elaborations, provided by John McCarthy, as test examples for studying elaboration tolerance. We then introduce the study of elaboration tolerance from three aspects. First, the study of the elaboration tolerance of existing systems, such as Causal Calculator and ABSFOL, is introduced. Second, the study of special elaborations, such as the elaboration of actions, is presented. Last but not least, a formal definition of elaboration tolerance and evaluation tools is provided.

3.
Given two non-negative integers h and k, an L(h,k)-labeling of a graph G=(V,E) is a function from the set V to a set of colors, such that adjacent nodes take colors at distance at least h, and nodes at distance 2 take colors at distance at least k. The aim of the L(h,k)-labeling problem is to minimize the greatest used color. Since the decisional version of this problem is NP-complete, it is important to investigate particular classes of graphs for which the problem can be efficiently solved. It is well known that the most common interconnection topologies, such as Butterfly-like, Beneš, CCC, and Trivalent Cayley networks, are all characterized by a similar structure: they have nodes organized as a matrix and connections are divided into layers. So we naturally introduce a new class of graphs, called (1×n)-multistage graphs, containing the most common interconnection topologies, on which we study the L(h,k)-labeling. A general algorithm for L(h,k)-labeling these graphs is presented, and from this method an efficient L(2,1)-labeling for Butterfly and CCC networks is derived. Finally, we describe a possible generalization of our approach.
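The definition of an L(h,k)-labeling above can be made concrete with a small checker. This is only a sketch of the definition, not the labeling algorithm of the paper; the adjacency-dictionary representation and the names `is_lhk_labeling` and `span` are assumptions, and colors are assumed to be non-negative integers.

```python
def is_lhk_labeling(adj, labels, h, k):
    """Check the L(h,k) conditions on an undirected graph.

    adj maps each node to the set of its neighbours; labels maps each
    node to an integer color (both representations are assumptions).
    """
    for u in adj:
        for v in adj[u]:
            # Adjacent nodes: colors must differ by at least h.
            if abs(labels[u] - labels[v]) < h:
                return False
            for w in adj[v]:
                # w is at distance exactly 2 from u when it is reachable
                # through v, is not u itself, and is not adjacent to u.
                if w != u and w not in adj[u]:
                    if abs(labels[u] - labels[w]) < k:
                        return False
    return True


def span(labels):
    """Greatest used color, the quantity the problem minimizes."""
    return max(labels.values())
```

For example, on the path a-b-c-d the labels 1, 3, 0, 2 satisfy the L(2,1) conditions with greatest color 3, while 0, 2, 0, 2 fail because a and c are at distance 2 and share a color.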

4.
This paper investigates the semantics of conditional term rewriting systems with negation (denoted by EI-CTRS), called constructor-based EI-model semantics. The introduction of “≠” makes EI-CTRS more difficult to study. This is in part because EI-CTRS fail to guarantee that least Herbrand models exist from the point of view of classical logic. The key idea of the EI-model is to explain that “t≠s” means that the two concepts represented by t and s actually belong to distinguished basic concepts represented by two constructor-ground terms. We define the concept of EI-model, and show that least Herbrand EI-models exist for EI-satisfiable EI-CTRS. From the algebraic and logical points of view, we show that there are very strong reasons for regarding the least Herbrand EI-models as the intended semantics of EI-CTRS. Using fixpoint theory, we develop a method to construct least Herbrand EI-models in a bottom-up manner. Moreover, we discuss the soundness and completeness of EI-rewriting for EI-model semantics.

5.
Mining frequent patterns from people’s trajectories has become a hot topic in big data research. Previously, these data mostly came from GPS. Compared with GPS data, which is more densely sampled, base station data is extremely sparse in both time and space, so trajectory discovery from base station data is much more challenging. In this paper, we propose a new method to effectively solve this problem. In our method, we assume that the locations of objects are sampled over a long time period. First, a sequential pattern mining algorithm is employed to find the frequent passing areas of a person’s daily route. Second, frequent paths are pieced together from the recorded points that pass through the frequent passing areas. Finally, to ensure credibility and efficiency, we rely on the location information provided by the scattered points that make up the frequent paths to mine frequent road paths.

6.
When a new investment opportunity of purchasing a new device occurs, the investors must decide whether and when to buy this device in an online fashion. That is, the online player must make an investment decision while neither future demand for orders nor future investment opportunities are known. This problem, which generalizes the basic leasing problem, was introduced by Azar et al., and two special cases were then studied by Damaschke. In the so-called equal prices model a 2-competitive algorithm is devised and a 1.618 lower bound is given. Here we make use of an averaging technique and obtain a better, tight lower bound of 2; in other words, this lower bound cannot be improved. Furthermore, another special case, which considers only two-stage device replacement, is studied in this paper. Accounting for the interest rate is an essential feature of any reasonable financial model. Therefore, we explore the two-stage model with and without the interest rate. In addition, we introduce the risk-reward model to analyze this problem and improve the competitive ratio.
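For orientation, the basic leasing problem that this work generalizes can be sketched with the classic break-even rule: rent until the next rental day would bring the rent paid up to the purchase price, then buy. This is not the algorithm of the paper; the unit rent per day and the function names `online_cost` and `offline_cost` are assumptions of the sketch.

```python
def online_cost(days_needed, buy_price):
    """Cost of the break-even strategy when the device is needed for
    days_needed days, which the online player does not know in advance.

    Rent costs 1 per day (an assumption); the player rents for
    buy_price - 1 days and buys on day buy_price.
    """
    if days_needed < buy_price:
        return days_needed                # never reached the buying day
    return (buy_price - 1) + buy_price    # rent paid so far, then purchase


def offline_cost(days_needed, buy_price):
    """Optimal cost when days_needed is known in advance."""
    return min(days_needed, buy_price)
```

With these definitions the online cost is at most 2 * buy_price - 1 whenever the purchase happens, so the ratio to the offline optimum stays below 2, which matches the 2-competitive bound usually quoted for the basic leasing problem.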

7.
Peer-to-peer (P2P) botnets outperform the traditional Internet relay chat (IRC) botnets in evading detection and they have become a prevailing type of threat to the Internet nowadays. Current methods for detecting P2P botnets, such as similarity analysis of network behavior and machine-learning based classification, cannot handle the challenges brought about by different network scenarios and botnet variants. We noticed that one important but neglected characteristic of P2P bots is that they periodically send requests to update their peer lists or receive commands from botmasters in the command-and-control (C&C) phase. In this paper, we propose a novel detection model named detection by mining regional periodicity (DMRP), including capturing the event time series, mining the hidden periodicity of host behaviors, and evaluating the mined periodic patterns to identify P2P bot traffic. As our detection model is built based on the basic properties of P2P protocols, it is difficult for P2P bots to avoid being detected as long as P2P protocols are employed in their C&C. For hidden periodicity mining, we introduce the so-called regional periodic pattern mining in a time series and present our algorithms to solve the mining problem. The experimental evaluation on public datasets demonstrates that the algorithms are promising for efficient P2P bot detection in the C&C phase.

8.
A normal Hall subgroup N of a group G is a normal subgroup whose order is coprime to its index. The Schur-Zassenhaus theorem states that every normal Hall subgroup has a complement subgroup, that is, a set of coset representatives H which also forms a subgroup of G. In this paper, we present a framework to test isomorphism of groups with at least one normal Hall subgroup, when groups are given as multiplication tables. To establish the framework, we first observe that a proof of the Schur-Zassenhaus theorem is constructive, and formulate a necessary and sufficient condition for testing isomorphism in terms of the associated actions of the semidirect products, and isomorphisms of the normal parts and complement parts. We then focus on the case when the normal subgroup is abelian. Utilizing basic facts of the representation theory of finite groups and a technique by Le Gall (STACS 2009), we first get an efficient isomorphism testing algorithm when the complement has a bounded number of generators. For the case when the complement subgroup is elementary abelian, which does not necessarily have a bounded number of generators, we obtain a polynomial time isomorphism testing algorithm by reducing to the generalized code isomorphism problem, which asks whether two linear subspaces are the same up to permutation of coordinates. A solution to the latter can be obtained by a mild extension of the singly exponential (in the number of coordinates) time algorithm for the code isomorphism problem developed recently by Babai et al. (SODA 2011). En route to obtaining the above reduction, we study the following computational problem in the representation theory of finite groups: given two representations ρ and τ of a group H over Z_p^d, p a prime, determine if there exists an automorphism φ : H → H such that the induced representation ρ_φ = ρ ∘ φ and τ are equivalent, in time poly(|H|, p^d).
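The definition of a normal Hall subgroup can be checked directly on a multiplication table, the input format the abstract assumes. The sketch below only tests the definition (subgroup, normal, order coprime to index); it is not the isomorphism test of the paper. Elements are assumed to be numbered 0..n-1 with `table[a][b]` the product of a and b, and the function name is hypothetical.

```python
from math import gcd


def is_normal_hall_subgroup(table, subset):
    """Return True if subset is a normal Hall subgroup of the group
    whose multiplication table is table (table[a][b] = a * b)."""
    n = len(table)
    sub = set(subset)
    if not sub:
        return False

    # A nonempty subset of a finite group that is closed under the
    # product is a subgroup.
    if any(table[a][b] not in sub for a in sub for b in sub):
        return False

    # Locate the identity and the inverse of every element.
    identity = next(e for e in range(n)
                    if all(table[e][x] == x for x in range(n)))
    inverse = {a: next(b for b in range(n) if table[a][b] == identity)
               for a in range(n)}

    # Normality: g * x * g^-1 stays in the subgroup for all g and x.
    if any(table[table[g][x]][inverse[g]] not in sub
           for g in range(n) for x in sub):
        return False

    # Hall condition: the order is coprime to the index.
    order = len(sub)
    index = n // order
    return gcd(order, index) == 1
```

In the cyclic group of order 6, for instance, the subgroup {0, 3} has order 2 and index 3 and passes the check, while {0, 2} in the cyclic group of order 4 fails because its order and index are both 2.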

9.
With a slim and legless body, particular ball articulation, and rhythmic locomotion, a natural snake adapts itself to many terrains under the control of a neuron system. Based on an analysis of the locomotion mechanism, the main functional features of the motor system in snakes are specified in detail. Furthermore, a bidirectional cyclic inhibitory (BCI) CPG model is applied for the first time to imitate the pattern generation for the locomotion control of the snake-like robot, and its characteristics are discussed, particularly for the generation of three kinds of rhythmic locomotion. Moreover, we introduce the neuron network organized by the BCI-CPGs connected in line with unilateral excitation to automatically switch the locomotion pattern of a snake-like robot under different commands from the higher level control neuron, and present a necessary condition for the CPG neuron network to sustain a rhythmic output. The validity of the generation of different kinds of rhythmic locomotion modes by the CPG network is verified by dynamic simulations and experiments. This research provides a new method to model the generation mechanism of the rhythmic pattern of the snake.

10.
QoS multicast routing: algorithms and protocols   Cited by: 2 (self-citations: 0, citations by others: 2)

11.
Data mining for client behavior analysis has become increasingly important in business. However, further analysis of transactions and sequential behaviors would be of even greater value, especially in the financial service industry, such as banking and insurance, in government, and so on. In a real-world business application of taxation debt collection, in order to understand the internal relationship between taxpayers’ sequential behaviors (payment, lodgment and actions) and compliance with their debt, we need to find the contrast sequential behavior patterns between compliant and non-compliant taxpayers. Contrast Patterns (CP) are defined as the itemsets showing the difference/discrimination between two classes/datasets (Dong and Li, 1999). However, the existing CP mining methods, which can only mine itemset patterns, are not suitable for mining sequential patterns, such as time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. Therefore, to address this issue, we develop a CSP mining approach, eCSP, by using an effective CSP-tree structure, which improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose some heuristics and interestingness filtering criteria, and integrate them into the CSP-tree seamlessly to reduce the search space and to find business-interesting patterns as well. The performance of the proposed approach is evaluated on three real-world datasets. In addition, we use a case study to show how to implement the approach to analyse taxpayer behaviour. The results show a very promising performance and convincing business value.
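The idea of a contrast sequential pattern can be illustrated by comparing how often an ordered pattern occurs in two classes of sequences. The sketch below is not the eCSP approach or its CSP-tree; the support-difference score, the function names, and the event labels in the usage note are all assumptions chosen for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same
    order, not necessarily contiguously."""
    it = iter(sequence)
    # "item in it" advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def contrast_score(pattern, class_a, class_b):
    """Support difference between two classes (an assumed measure;
    other contrast measures, such as a support ratio, are possible)."""
    return support(pattern, class_a) - support(pattern, class_b)
```

With hypothetical event logs such as `["notice", "payment"]` for compliant taxpayers and `["notice", "call"]` for non-compliant ones, a pattern with a score close to 1 or -1 separates the two groups strongly, while a score near 0 carries little contrast.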

12.
An active research topic in data mining is the discovery of sequential patterns, which finds all frequent subsequences in a sequence database. The generalized sequential pattern (GSP) algorithm was proposed to solve the mining of sequential patterns with time constraints, such as time gaps and sliding time windows. Recent studies indicate that the pattern-growth methodology could speed up sequence mining. However, the capabilities to mine sequential patterns with time constraints were previously available only within the Apriori framework. Therefore, we propose the DELISP (delimited sequential pattern) approach to provide these capabilities within the pattern-growth methodology. DELISP reduces the size of projected databases by bounded and windowed projection techniques. Bounded projection keeps only time-gap valid subsequences, and windowed projection saves nonredundant subsequences satisfying the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfactory patterns and speeds up the pattern growing process. Comprehensive experiments show that DELISP has good scalability and outperforms the well-known GSP algorithm in the discovery of sequential patterns with time constraints.

13.
Mining sequential patterns by pattern-growth: the PrefixSpan approach   Cited by: 12 (self-citations: 0, citations by others: 12)
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [R. Agrawal et al. (1994)] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [J. Han et al. (2000)], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [M. Zaki (2001)] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
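The projection idea described in this abstract can be sketched in a few lines for the simplified case where every element of a sequence is a single item rather than an itemset. This is a minimal illustration under that assumption, not the published implementation; the function name `prefixspan`, the absolute `min_support` count, and the `(sequence index, start position)` pointers used for pseudoprojection are choices made for the sketch.

```python
def prefixspan(sequences, min_support):
    """Return a list of (pattern, support_count) pairs for all frequent
    sequential patterns, where each sequence is a list of single items."""
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pointers, so
        # suffixes are never copied: this is the pseudoprojection idea.
        counts = {}
        for sid, start in projected:
            # Count each item once per sequence.
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1

        for item, count in sorted(counts.items()):
            if count < min_support:
                continue  # only locally frequent items extend the prefix
            new_prefix = prefix + [item]
            results.append((new_prefix, count))

            # Project on the first occurrence of item in each suffix.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results
```

Each recursive call works on a smaller projected database, so no candidate sequences are generated and then tested against the whole database, which is the contrast with GSP that the abstract draws.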

14.
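Sequential pattern mining algorithm based on position information   Cited by: 2 (self-citations: 1, citations by others: 1)
When generating frequent sequential patterns, the PrefixSpan algorithm produces a large number of projected databases, many of which are identical. This paper proposes PVS, a sequential pattern mining algorithm based on position information. By recording the position information of every projected database already generated, the method avoids generating the same projected database repeatedly and thereby improves the running efficiency of the algorithm. Experiments show that the algorithm is more efficient than PrefixSpan when processing highly similar sequence data.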

15.
Constraint-based sequential pattern mining: the pattern-growth methods   Cited by: 4 (self-citations: 0, citations by others: 4)
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well. This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

16.
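Web log access sequential pattern mining applies the sequential pattern techniques of data mining to the log files on Web servers in order to improve Web information services, but mining massive data incurs a large system resource overhead. Combining the ideas of SPAM and PrefixSpan, this paper proposes a new algorithm, SPAM-FPT. By building a First_Positon_Table, the algorithm avoids the "AND operations" and "join operations" of SPAM as well as the construction of the many "projected databases" of PrefixSpan, and can quickly mine all "frequent subsequences" in the database.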

17.
High utility sequential pattern (HUSP) mining has emerged as an important topic in data mining. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Streaming data are fast changing, continuously generated, and unbounded in quantity. Such data can easily exhaust computer resources (e.g., memory) unless proper resource-aware mining is performed. In this study, we explore the fundamental problem of how limited memory can be best utilized to produce high quality HUSPs over a data stream. We design an approximation algorithm, called MAHUSP, that employs memory adaptive mechanisms to use a bounded portion of memory, in order to efficiently discover HUSPs over data streams. An efficient tree structure, called MAS-Tree, is proposed to store potential HUSPs over a data stream. MAHUSP guarantees that all HUSPs are discovered in certain circumstances. Our experimental study shows that our algorithm can not only discover HUSPs over data streams efficiently, but also adapt to memory allocation with limited sacrifices in the quality of discovered HUSPs. Furthermore, in order to show the effectiveness and efficiency of MAHUSP in real-life applications, we apply our proposed algorithm to a web clickstream dataset obtained from a Canadian news portal to showcase users’ reading behavior, and to a real biosequence database to identify disease-related gene regulation sequential patterns. The results show that MAHUSP effectively discovers useful and meaningful patterns in both cases.

18.
Sequential pattern mining (SPM) is an important data mining problem with broad applications. SPM is a hard problem due to the huge number of intermediate subsequences to be considered. State of the art approaches for SPM (e.g., PrefixSpan; Pei et al. 2001) are largely based on the pattern-growth approach, where for each frequent prefix subsequence, only its related suffix subsequences need to be considered, and the database is recursively projected into smaller ones. Many authors have promoted the use of constraints to focus on the most promising patterns according to the interests of the end user. The top-k SPM problem is also used to cope with the difficulty of thresholding and to control the number of solutions. State of the art methods developed for SPM and top-k SPM, though efficient, are locked into a rather rigid search strategy, and suffer from a lack of declarativity and flexibility. Indeed, adding new constraints usually amounts to changing the data structures used in the core of the algorithm, and combining these new constraints often requires new developments. Recent works (e.g. Kemmar et al. 2014; Négrevergne and Guns 2015) have investigated the use of Constraint Programming (CP) for SPM. However, despite their nice declarative aspects, all these modelings have scaling problems, due to the huge size of their constraint networks. To address this issue, we propose the Prefix-Projection global constraint, which encapsulates both the subsequence relation and the frequency constraint. Its filtering algorithm relies on the principle of projected databases, which makes it possible to keep in each variable's domain only the values leading to a frequent pattern in the database. The Prefix-Projection filtering algorithm enforces domain consistency on the variable succeeding the current frequent prefix in polynomial time.
This global constraint also allows for a straightforward implementation of additional constraints such as size, item membership, regular expressions and any combination of them. Experimental results show that our approach clearly outperforms existing CP approaches and competes well with the state-of-the-art methods on large datasets for mining frequent sequential patterns, sequential patterns under various constraints, and top-k sequential patterns. Unlike existing CP methods, our approach achieves better scalability.

19.
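A feasible method for mining closed multidimensional sequential patterns   Cited by: 1 (self-citations: 1, citations by others: 0)
To mine closed multidimensional sequential patterns, the basic properties of multidimensional sequential patterns are studied and a new method for mining closed multidimensional sequential patterns is proposed. The method integrates a closed sequential pattern mining method with a closed itemset pattern mining method. By proving the correctness of the method, it is shown that the set of closed multidimensional sequential patterns is no larger than the set of multidimensional sequential patterns and can cover the complete result set of multidimensional sequential patterns. Finally, two clear advantages of the method are analyzed, demonstrating its feasibility for mining closed multidimensional sequential patterns.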

20.
In this paper we present a novel methodology for sequence classification, based on sequential pattern mining and optimization algorithms. The proposed methodology automatically generates a sequence classification model through a two-stage process. In the first stage, a sequential pattern mining algorithm is applied to a set of sequences and the sequential patterns are extracted. Then, the score of every pattern with respect to each sequence is calculated using a scoring function, and the score of each class under consideration is estimated by summing the specific pattern scores. Each score is updated by multiplying it by a weight, and the output of the first stage is the classification confusion matrix of the sequences. In the second stage, an optimization technique aims to find a set of weights that minimizes an objective function defined using the classification confusion matrix. The set of the extracted sequential patterns and the optimal weights of the classes comprise the sequence classification model. Extensive evaluation of the methodology was carried out in the protein classification domain, by varying the number of training and test sequences, the number of patterns and the number of classes. The methodology is compared with other similar sequence classification approaches. The proposed methodology exhibits several advantages, such as automated weight assignment to classes using optimization techniques and knowledge discovery in the domain of application.
Dimitrios I. Fotiadis
