Similar Articles
20 similar articles found; search time: 484 ms
1.
Mining contiguous frequent access paths with common sequential pattern mining algorithms is inefficient and yields only general frequent access paths. Based on a study of the properties of access paths, this paper presents an algorithm that mines contiguous frequent access paths directly from ordinary Web logs. A novel data structure is designed to compress the storage space and hold the information needed for mining. Using a partitioned search, a suffix tree is built for each frequent node, and contiguous frequent access paths are mined by traversing that suffix tree. This approach requires no candidate generation, and a single pass extracts all contiguous frequent access paths that have the root node as a suffix.
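As a point of reference for the abstract above, a brute-force sketch of contiguous-path support counting might look as follows. This is not the paper's suffix-tree algorithm (which avoids enumerating every subpath); it only illustrates what "contiguous frequent access path" means on toy Web-log sessions:

```python
from collections import Counter

def contiguous_frequent_paths(sessions, min_support):
    """Count every contiguous subpath of every session and keep
    the ones whose support reaches min_support."""
    counts = Counter()
    for session in sessions:
        seen = set()  # count each subpath at most once per session
        for i in range(len(session)):
            for j in range(i + 1, len(session) + 1):
                seen.add(tuple(session[i:j]))
        counts.update(seen)
    return {path: c for path, c in counts.items() if c >= min_support}

sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C"],
]
result = contiguous_frequent_paths(sessions, min_support=2)
```

Here `("A", "B")` and `("B", "C")` are contiguous frequent paths, while `("A", "C")` is not a path at all, which is exactly the distinction that general sequential pattern miners blur.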

2.
Sequential pattern mining is a prominent method for extracting knowledge from large databases. Most sequential pattern mining algorithms handle static databases; in practice, however, the database grows continually, and once it is updated the previous mining result becomes invalid, forcing the entire mining process to be rerun on the updated sequence database. Incremental mining of sequential patterns avoids rescanning the entire database. Previous approaches are mostly Apriori-based frameworks. We propose an algorithm called STISPM for incremental mining of sequential patterns using a sequence-tree storage structure. STISPM uses a depth-first search with backward tracking and a dynamic lookahead pruning strategy that removes infrequent patterns. The path from the root node to any leaf node represents a sequential pattern in the database. The structural characteristics of the sequence tree make it well suited to incremental sequential pattern mining. The sequence tree also stores all sequential patterns together with their counts, so whenever the support threshold is changed, our algorithm, using the frequent sequence tree as the storage structure, can retrieve all sequential patterns without mining the database again.
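A minimal sketch of the key property claimed above: a count-annotated tree can be re-queried at a new support threshold without rescanning the raw data. For simplicity this uses a plain prefix trie over contiguous prefixes, which is far simpler than STISPM's sequence tree, but it shows the same threshold-change behaviour:

```python
class Node:
    def __init__(self):
        self.children = {}
        self.count = 0  # how many sequences pass through this node

def insert(root, seq):
    node = root
    for item in seq:
        node = node.children.setdefault(item, Node())
        node.count += 1

def patterns(root, min_sup, prefix=()):
    """Yield every stored pattern whose count meets min_sup.
    Counts never grow along a path, so pruning a child prunes its subtree."""
    for item, child in root.children.items():
        if child.count >= min_sup:
            yield prefix + (item,), child.count
            yield from patterns(child, min_sup, prefix + (item,))

root = Node()
for seq in [["a", "b", "c"], ["a", "b"], ["a", "d"]]:
    insert(root, seq)

freq2 = dict(patterns(root, 2))  # threshold 2: no rescan of the data
freq1 = dict(patterns(root, 1))  # lowering the threshold reuses the same tree
```

Changing `min_sup` only re-traverses the tree, which is the reason a stored, counted pattern tree supports incremental threshold changes cheaply.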

3.
Frequent pattern mining is the most important phase of the association rule mining process because of its time and space complexity. Several methods have attempted to improve the performance of association rule mining by enhancing frequent pattern mining efficiency. Due to the large size of datasets and the huge amounts of data that must be mined, many parallel and distributed mining approaches have been introduced to divide datasets or distribute mining processes among multiple processors or computers, thus improving the efficiency of the mining process. In this paper, we propose a Hadoop-based parallel implementation of the PrePost+ algorithm for frequent itemset mining. In our parallel approach, the construction of the N-lists of itemsets is distributed among the mappers, while the final pruning and extraction of frequent itemsets is carried out by the reducers in a MapReduce parallel programming model. The experimental results show that our Hadoop-based PrePost+ (HBPrePost+) algorithm outperforms one of the best existing parallel methods for frequent itemset mining (PARMA) in terms of execution time.
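The mapper/reducer split described above can be illustrated with a toy, single-process sketch: plain Python stands in for Hadoop, and simple itemset counting stands in for PrePost+'s N-list construction, so only the shape of the map-reduce division of labour is shown:

```python
from collections import defaultdict
from itertools import combinations

def mapper(transactions):
    """Map phase: emit (itemset, 1) for every 1- and 2-itemset in a split."""
    for t in transactions:
        for k in (1, 2):
            for itemset in combinations(sorted(t), k):
                yield itemset, 1

def reducer(pairs, min_support):
    """Reduce phase: sum counts per itemset and prune infrequent ones."""
    counts = defaultdict(int)
    for itemset, c in pairs:
        counts[itemset] += c
    return {i: c for i, c in counts.items() if c >= min_support}

# two data splits, as if handled by two separate mappers
split_a = [{"bread", "milk"}, {"bread", "beer"}]
split_b = [{"bread", "milk", "beer"}]
emitted = list(mapper(split_a)) + list(mapper(split_b))
frequent = reducer(emitted, min_support=2)
```

On a real cluster the shuffle phase would group the emitted pairs by key before the reducers run; here the single `reducer` call plays both roles.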

4.
To address the data and pattern redundancy in frequent itemset mining, algorithms for mining maximal frequent itemsets over data streams are studied. DSM-MFI, a typical algorithm for mining maximal frequent patterns over data streams, consumes a large amount of storage space and executes inefficiently. This paper proposes MMFI-DS, an algorithm for mining the maximal frequent itemsets in a landmark window over a data stream. The algorithm first uses a SEFI-tree to store the essential information about the maximal frequent itemsets contained in the growing data stream, while deleting large numbers of infrequent items from the SEFI-tree; it then applies a bidirectional top-down and bottom-up search strategy to mine the series of maximal frequent itemsets in the landmark window. Theoretical analysis and experiments show that the algorithm is more efficient than DSM-MFI and saves storage space.
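The pattern-redundancy motivation above can be illustrated with a naive filter: among all frequent itemsets, only the maximal ones (those not contained in a larger frequent itemset) need to be reported, since every subset of a maximal frequent itemset is itself frequent. This is only a post-processing sketch, not the SEFI-tree algorithm:

```python
def maximal_itemsets(frequent):
    """Keep only itemsets that are not a proper subset of another
    frequent itemset in the collection."""
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets
            if not any(s < other for other in sets)]

frequent = [{"a"}, {"b"}, {"a", "b"}, {"c"}, {"a", "c"}]
maximal = maximal_itemsets(frequent)
```

Five frequent itemsets collapse to two maximal ones, `{a, b}` and `{a, c}`, which is the redundancy reduction that makes maximal-pattern mining attractive on streams.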

5.
To improve the performance of frequent itemset mining algorithms based on the vertical database representation, this paper introduces an index-array technique. The concept of the index array and its computation method are presented, and a new, efficient frequent itemset mining algorithm, Index-FIMiner, is proposed. The algorithm greatly reduces unnecessary tidset intersections and the corresponding frequency tests. It is also shown that a representative item can be joined directly with all combinations of the itemsets in its inclusion index, and that the supports of the resulting itemsets all equal the support of the representative item; this lowers the processing cost of these frequent itemsets and improves the algorithm's performance. Experimental results show that Index-FIMiner achieves high mining efficiency.
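The vertical representation that Index-FIMiner builds on can be sketched as follows: each item maps to its tidset (the set of ids of transactions containing it), and the support of an itemset is the size of the tidset intersection. This is the baseline operation whose cost the index array is designed to avoid:

```python
def vertical_db(transactions):
    """Vertical representation: item -> set of transaction ids (tidset)."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
tidsets = vertical_db(transactions)

# support of {a, b} is the size of the tidset intersection
support_ab = len(tidsets["a"] & tidsets["b"])
```

Every candidate itemset normally costs one such intersection plus a frequency test, which is why eliminating redundant intersections pays off.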

6.
Zhang  X.J. Gong  Y. 《Communications, IET》2009,3(10):1683-1692
The authors consider a dual-hop multi-relay cooperative relay system in this study. Both decode-and-forward (DF) and amplify-and-forward (AF) protocols are considered. Under different relay selection strategies, the authors derive closed-form outage probability expressions. With the second-order channel statistics, the authors propose to jointly optimise power allocation (PA) and relay positions in order to minimise the system outage probability. Simulation results show that the proposed adaptive allocation algorithms significantly outperform fixed allocation algorithms. With the proposed joint optimisation algorithm, AF relaying outperforms DF relaying when multiple relays are selected to help. When only the best relay is selected to help, DF relaying is shown to have better performance.

7.
Existing methods for mining frequent itemsets over data streams introduce too many sub-frequent itemsets and suffer from poor time/space performance and low output accuracy. Using the Chebyshev inequality, this paper constructs a probabilistic error bound for periodic sampling of itemset frequencies and gives a method for dynamically detecting changes in itemset support. A periodic-sampling-based algorithm for mining frequent itemsets over data streams, FI-PS, is proposed. The algorithm tracks changes in itemset support to determine the support's stability, and uses this as the basis for adjusting the window size and the sampling period, thereby guaranteeing with high probability that the support estimation error is bounded above. Theoretical analysis and experiments show that the algorithm is effective and achieves good execution performance while keeping the mining results relatively accurate.
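A Chebyshev-style bound of the kind referred to above can be sketched numerically. For a support estimate averaged over n sampled transactions, the worst-case variance p(1-p) ≤ 1/4 gives P(|p̂ - p| ≥ ε) ≤ 1/(4nε²); the exact bound and its use inside FI-PS may differ, so this only shows the flavour of the guarantee:

```python
import math

def chebyshev_error_bound(n, eps):
    """Chebyshev bound on P(|estimated support - true support| >= eps)
    after sampling n transactions, using the worst-case variance 1/4."""
    return min(1.0, 1.0 / (4 * n * eps ** 2))

def sample_size(eps, delta):
    """Smallest n for which the bound above drops to delta."""
    return math.ceil(1.0 / (4 * eps ** 2 * delta))

n = sample_size(eps=0.05, delta=0.1)
bound = chebyshev_error_bound(n, 0.05)
```

With `eps=0.05` and `delta=0.1` the bound asks for 1000 samples, independent of the stream length, which is what makes sampling-based stream mining attractive.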

8.
To mine valuable sequence data from large transaction databases, an efficient bitmap-based sequential pattern mining algorithm (SMBR) is proposed. SMBR represents the database as bitmaps, using a simplified bitmap structure. The algorithm first generates candidate sequences by sequence extension and item extension, and then produces frequent sequences through fast positional operations on the bitmap of the original sequence and the bitmap of the extension item. Experiments show that, applied to large transaction databases, the method not only improves mining efficiency but also greatly reduces the memory required for the temporary data generated during mining, allowing sequential patterns to be mined efficiently.
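The core bitmap operation, support counting via bitwise AND, can be sketched with Python integers as bitmaps. This covers itemset support only; SMBR's positional operations for sequence and item extension are not reproduced here:

```python
def item_bitmaps(transactions):
    """One integer bitmap per item; bit t is set iff transaction t
    contains the item."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps

transactions = [{"a", "b"}, {"a"}, {"a", "b"}]
bm = item_bitmaps(transactions)

# support of {a, b}: AND the bitmaps, then count the set bits
support_ab = bin(bm["a"] & bm["b"]).count("1")
```

A single machine-word AND tests up to 64 transactions at once, which is where the efficiency of bitmap representations comes from.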

9.
The antenna subset selection technique balances performance and hardware cost in multiple-input multiple-output (MIMO) systems, and the problems of antenna selection in MIMO relay systems have not been fully solved. This paper considers antenna selection in amplify-and-forward (AF) and decode-and-forward (DF) MIMO relay systems to maximise capacity. Since the optimal antenna selection algorithm has high complexity, two fast algorithms are proposed. The selection criterion of the algorithm for AF relays is to maximise a lower bound of the capacity rather than the exact capacity, which reduces algorithmic complexity. The algorithm for DF relays is an extension of an existing antenna subset selection algorithm for one-hop MIMO systems. The authors derive the algorithms in detail and analyse their complexity in terms of the number of complex multiplications. Simulation results show that the proposed algorithms achieve performance comparable to the optimal algorithm in both cases under various conditions, with reduced complexity.

10.
Mining the penetration testing semantic knowledge hidden in vast amounts of raw penetration testing data is of vital importance for automated penetration testing. Association rule mining, a data mining technique, has been studied and explored for a long time; however, few studies have focused on knowledge discovery in the penetration testing area. Experimental results reveal that the long-tail distribution of penetration testing data nullifies the effectiveness of association rule mining algorithms based on frequent patterns. To address this problem, a Bayesian inference based penetration semantic knowledge mining algorithm is proposed. First, a directed bipartite graph model, a kind of Bayesian network, is constructed to formalize penetration testing data. Then, the maximum likelihood estimation method is adopted to optimize the model parameters, and a large Bayesian network is decomposed into smaller networks based on the conditional independence of variables for improved solution efficiency. Finally, irrelevant variable elimination is adopted to extract penetration semantic knowledge from the conditional probability distribution of the model. The experimental results show that the proposed method can discover penetration semantic knowledge from raw penetration testing data effectively and efficiently.

11.
We investigate the use of direct-Fourier (DF) image reconstruction in computed tomography and synthetic aperture radar (SAR). One of our aims is to determine why the convolution-backprojection (CBP) method is favored over DF methods in tomography, while DF methods are virtually always used in SAR. We show that the CBP algorithm is equivalent to DF reconstruction using a Jacobian-weighted two-dimensional periodic sinc-kernel interpolator. This interpolation is not optimal in any sense, which suggests that DF algorithms using optimal interpolators may surpass CBP in image quality. We consider use of two types of DF interpolation: a windowed sinc kernel, and the least-squares optimal Yen interpolator. Simulations show that reconstructions using the Yen interpolator do not possess the expected visual quality, because of regularization needed to preserve numerical stability. Next, we show that with a concentric-squares sampling scheme, DF interpolation can be performed accurately and efficiently, producing imagery that is superior to that obtainable by other algorithms. In the case of SAR, we show that the DF method performs very well with interpolators of low complexity. We also study DF reconstruction in SAR for trapezoidal grids. We conclude that the success of the DF method in SAR imaging is due to the nearly Cartesian shape of the sampling grid. © 1998 John Wiley & Sons, Inc. Int J Imaging Syst Technol, 9, 1–13, 1998

12.
Multi-relational frequent pattern discovery can find complex frequent patterns involving multiple relations directly from complex structured data, avoiding the limitations of traditional methods. Unlike the mainstream approaches based on inductive logic programming, this paper proposes a semantics-oriented, condensed multi-relational frequent pattern discovery method based on the containment relation of conjunctive queries. The method is novel in its theoretical and technical foundations and resolves two kinds of semantic redundancy. Experiments show that the method has advantages in understandability, functionality, efficiency, and scalability.

13.
In this study, we propose a simple and novel data structure using hyper-links, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has a very limited and precisely predictable main memory cost and runs very quickly in memory-based settings. Moreover, it can be scaled up to very large databases using database partitioning. When the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has an excellent performance for various kinds of data, outperforms currently available algorithms in different settings, and is highly scalable to mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have a major impact on the future development of efficient and scalable data mining methods.

14.
Cluster analysis is one of the most popular data mining techniques; it is defined as the process of grouping similar data. K-Means is a clustering algorithm for numerical data; it is easy to implement and efficient at handling large amounts of data. The major problem with K-Means is the selection of the initial centroids: it selects them randomly, which can lead to a local optimum. Recently, nature-inspired optimization algorithms have been combined with clustering algorithms to obtain globally optimal solutions. The Crow Search Algorithm (CSA) is a new population-based metaheuristic optimization algorithm based on the intelligent behaviour of crows. In this paper, CSA is combined with the K-Means clustering algorithm to obtain the global optimum. Experiments are conducted on benchmark datasets, and the results are compared with those of various clustering algorithms and optimization-based clustering algorithms. The results are also evaluated with internal, external, and statistical experiments to demonstrate the efficiency of the proposed algorithm.
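A hedged sketch of the combination described above: a simplified 1-D Crow Search over candidate centroid sets, scored by sum of squared errors (SSE), seeds a plain K-Means refinement. The parameter names `fl` (flight length) and `ap` (awareness probability) follow common CSA descriptions; the paper's exact formulation may differ:

```python
import random

def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)

def kmeans(data, centroids, iters=10):
    """Plain 1-D Lloyd refinement from the given initial centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def csa_seeded_kmeans(data, k=2, n_crows=8, iters=20, fl=2.0, ap=0.1, seed=0):
    """Crow Search over candidate centroid sets, then K-Means from the best."""
    rng = random.Random(seed)
    crows = [rng.sample(data, k) for _ in range(n_crows)]
    memory = [c[:] for c in crows]        # each crow's best-known position
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)    # follow a random crow's memory
            if rng.random() > ap:         # not "aware": move toward it
                crows[i] = [c + rng.random() * fl * (m - c)
                            for c, m in zip(crows[i], memory[j])]
            else:                         # aware: jump to a random position
                crows[i] = rng.sample(data, k)
            if sse(crows[i], data) < sse(memory[i], data):
                memory[i] = crows[i][:]
    best = min(memory, key=lambda c: sse(c, data))
    return kmeans(data, best)

data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
centroids = csa_seeded_kmeans(data)
```

Because the search keeps the lowest-SSE centroid set in memory and K-Means only reduces SSE further, a clearly bimodal dataset like this one ends up with one centroid near each cluster mean.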

15.
The Artificial Bee Colony (ABC) algorithm is used in many domains of computation, including optimization, clustering, and classification tasks. The dancing of honey bees is one of the most fascinating and intriguing behaviours in animal life: bees perform the "waggle dance" to indicate food sources in their environment. This work presents a novel honey bee dancing language (HBDL) based algorithm for mining induction rules from datasets. The proposed HBDL algorithm is implemented and tested against ABC, Particle Swarm Optimization, and nine traditional algorithms frequently used by researchers. The experimental results show that HBDL is a suitable and effective technique for data mining and classification tasks.

16.
A fast algorithm for mining indirect associations   Cited by: 1 (self-citations: 1, others: 0)
This paper presents an algorithm that directly mines indirect associations between item pairs, without generating all frequent itemsets, based on the anti-monotonicity of candidate indirect associations and a support matrix of frequent item pairs. Experiments on a real Web log dataset show that the algorithm outperforms existing algorithms.
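The notion of an indirect association, a pair of items that rarely co-occur yet each co-occur frequently with a common mediator, can be sketched with naive support counting. The paper's algorithm avoids this exhaustive search via anti-monotonicity and the support matrix; the thresholds `ts` (itempair support) and `tf` (mediator support) here are illustrative names, not the paper's notation:

```python
def support(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions)

def indirect_pairs(items, transactions, ts, tf):
    """Pairs (a, b) that rarely occur together (support < ts) but each
    occur frequently with a common mediator item m (support >= tf)."""
    found = []
    for a in items:
        for b in items:
            if a >= b or support({a, b}, transactions) >= ts:
                continue
            for m in items:
                if m in (a, b):
                    continue
                if (support({a, m}, transactions) >= tf
                        and support({b, m}, transactions) >= tf):
                    found.append((a, b, m))
    return found

transactions = [{"tea", "milk"}, {"coffee", "milk"},
                {"tea", "milk"}, {"coffee", "milk"}]
pairs = indirect_pairs(["tea", "coffee", "milk"], transactions, ts=1, tf=2)
```

Tea and coffee never appear together, yet each appears twice with milk, so milk mediates an indirect association between them.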

17.
Currently, the uniform linear array (ULA) is the most commonly used antenna system for wireless systems such as commercial cellular systems. In this study, a ULA adaptive antenna is introduced that uses a novel numerical stochastic optimisation algorithm inspired by colonising weeds, designated invasive weed optimisation (IWO). Weeds are shown to be very robust and adaptive to changes in their environment, so capturing their properties leads to a powerful optimisation algorithm. This optimisation algorithm is used for adaptive beamforming, and the results are compared with those obtained from two other optimisation algorithms, the least mean square and genetic algorithms. The reported results show that the IWO is very robust and effective at locating the optimal solution, with higher precision and a lower cost function than the other two algorithms. Other advantages of the IWO algorithm are its simplicity and fast convergence, which make it a practical algorithm for adaptive beamforming.

18.
Land cover change detection has been a topic of active research in the remote sensing community. The enormous amount of data available from satellites has attracted the attention of data mining researchers seeking new directions for a solution. The Terra Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation index (EVI/NDVI) data products are used for land cover change detection. These data products pose various challenges, such as seasonality of the data, spatio-temporal correlation, missing values, poor-quality measurements, and high-resolution, high-dimensional data. Land cover change detection has often been performed by comparing two or more satellite snapshot images acquired on different dates, but such image comparison techniques have a number of limitations. Data mining techniques address many of these challenges, such as missing values and poor-quality measurements, by pre-processing the data. Furthermore, data mining approaches can handle large datasets and exploit some of the inherent characteristics of spatio-temporal data, so they can be applied to increasingly immense datasets. This paper describes in detail various data mining algorithms for land cover change detection, with each algorithm's advantages and limitations, and presents an empirical study of some existing land cover change detection algorithms and their results.

19.
A survey of temporal data mining   Cited by: 2 (self-citations: 0, others: 2)
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

20.
Ueno  Maomi  Yamazaki  Takahiro 《Behaviormetrika》2008,35(2):137-158

This paper proposes a collaborative filtering method for massive datasets that is based on Bayesian networks. We first compare the prediction accuracy of four score-based Bayesian network learning algorithms (AIC, MDL, UPSM, and BDeu) and two conditional-independence-based (CI-based) learning algorithms (MWST and Polytree-MWST) using actual massive datasets. The results show that (1) for large networks, the score-based algorithms have lower prediction accuracy than the CI-based algorithms, and (2) when the score-based algorithms use a greedy search to learn a large network, algorithms that add many arcs tend to have lower prediction accuracy than those that add fewer arcs. Next, we propose a learning algorithm based on MWST for collaborative filtering of massive datasets. The proposed algorithm employs a traditional data mining technique, the Apriori algorithm, to quickly calculate from massive datasets the mutual information needed by MWST. We compare the original MWST algorithm and the proposed algorithm on actual data, and the comparison shows the effectiveness of the proposed algorithm.
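The quantity MWST needs, pairwise mutual information, can be sketched for binary "liked / not liked" indicators. The paper's contribution is computing it quickly from massive data via Apriori-style counting, which this naive version does not attempt:

```python
import math

def mutual_information(xs, ys):
    """Mutual information (in bits) between two binary variables,
    estimated from paired samples."""
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            px = sum(1 for x in xs if x == a) / n
            py = sum(1 for y in ys if y == b) / n
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px * py))
    return mi

# "liked item X" indicators for four users
item_a = [1, 1, 0, 0]
item_b = [1, 1, 0, 0]   # identical to item_a -> MI = 1 bit
item_c = [1, 0, 1, 0]   # independent of item_a -> MI = 0 bits

mi_ab = mutual_information(item_a, item_b)
mi_ac = mutual_information(item_a, item_c)
```

MWST then uses these pairwise values as edge weights and keeps a maximum-weight spanning tree, so the cost of computing every pairwise MI dominates on massive datasets.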

