
1.

Scalable parallel data mining for association rules (Total citations: 3; self-citations: 0; citations by others: 3)

Eui-Hong Han Karypis G. Kumar V. 《Knowledge and Data Engineering, IEEE Transactions on》2000,12(3):337-352

The authors propose two new parallel formulations of the Apriori algorithm (R. Agrawal and R. Srikant, 1994), which is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations, CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.
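To make the CD (Count Distribution) baseline mentioned in this abstract concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes the usual description of CD: every processor holds the full candidate set, counts candidates over its own partition of the transactions, and the local counts are then summed globally. The names `local_counts` and `count_distribution` are hypothetical, and the "processors" are simulated as plain list partitions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

def local_counts(partition: List[Itemset], candidates: List[Itemset]) -> Counter:
    """Count how many local transactions contain each candidate itemset."""
    counts: Counter = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts

def count_distribution(partitions: List[List[Itemset]],
                       candidates: List[Itemset],
                       min_support: int) -> Dict[Itemset, int]:
    """Simulated CD pass: replicated candidates, partitioned transactions."""
    total: Counter = Counter()
    for part in partitions:  # each iteration stands in for one processor
        # Summing the counters stands in for the global reduction step.
        total.update(local_counts(part, candidates))
    return {c: n for c, n in total.items() if n >= min_support}

# Hypothetical toy data: two "processors", three candidate 2-itemsets.
partitions = [
    [frozenset("abc"), frozenset("ab")],
    [frozenset("ab"), frozenset("bc")],
]
candidates = [frozenset("ab"), frozenset("ac"), frozenset("bc")]
frequent = count_distribution(partitions, candidates, min_support=2)
```

Because every simulated processor scans the whole candidate list, the sketch also shows why CD does not parallelize the candidate-side work, which is the shortcoming the abstract says IDD and HD address.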

2.

Self-adaptive nonoverlapping sequential pattern mining

Wang Yuehua Wu Youxi Li Yan Yao Fang Fournier-Viger Philippe Wu Xindong 《Applied Intelligence》2022,52(6):6646-6661

Repetitive sequential pattern mining (SPM) with gap constraints is a data analysis task that consists of identifying patterns (subsequences) appearing many times in a...

3.

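A closed sequential pattern mining algorithm (Total citations: 3; self-citations: 1; citations by others: 2)

Sha Jin (沙金), Deng Chengyu (邓成玉), Zhang Cuixiao (张翠肖), Liu Weifeng (刘伟峰) 《Computer Engineering and Design (计算机工程与设计)》2006,27(3):514-518

This paper proposes PosD, a new algorithm for mining closed sequential patterns. PosD uses position data to record the order information of items and, building on position-data lists that preserve the order relations among items, introduces two pruning methods: backward super-pattern and same position data. To keep the lattice storage correct and compact, several special cases are also handled. Experimental results show that PosD is more efficient than CloSpan on medium-to-large databases with low support thresholds.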

4.

Anonymity preserving sequential pattern mining

Anna Monreale Dino Pedreschi Ruggero G. Pensa Fabio Pinelli 《Artificial Intelligence and Law》2014,22(2):141-173

The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if it is possible to reconstruct the identity of individuals from sequential data. Therefore, it is important to develop privacy-preserving techniques that support publishing of really anonymous data, without altering the analysis results significantly. In this paper we propose to apply the Privacy-by-design paradigm for designing a technological framework to counter the threats of undesirable, unlawful effects of privacy violation on sequence data, without obstructing the knowledge discovery opportunities of data mining technologies. First, we introduce a k-anonymity framework for sequence data, by defining the sequence linking attack model and its associated countermeasure, a k-anonymity notion for sequence datasets, which provides a formal protection against the attack. Second, we instantiate this framework and provide a specific method for constructing the k-anonymous version of a sequence dataset, which preserves the results of sequential pattern mining, together with several basic statistics and other analytical properties of the original data, including the clustering structure. A comprehensive experimental study on realistic datasets of process-logs, web-logs and GPS tracks is carried out, which empirically shows how, in our proposed method, the protection of privacy meets analytical utility.

5.

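A survey of sequential pattern mining (Total citations: 4; self-citations: 0; citations by others: 4)

Chen Zhuo (陈卓), Yang Bingru (杨炳儒), Song Wei (宋威), Song Zefeng (宋泽锋) 《Application Research of Computers (计算机应用研究)》2008,25(7):1960-1963

This paper surveys the state of research on sequential pattern mining. It first introduces the background and related concepts of sequential pattern mining. It then summarizes the general approaches and analyzes the most representative algorithms. Finally, it outlines future research directions. The survey is intended to help researchers improve existing algorithms and propose new sequential pattern mining algorithms with better performance.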

6.

Weighted frequent sequential pattern mining

Islam Md Ashraful Rafi Mahfuzur Rahman Azad Al-amin Ovi Jesan Ahammed 《Applied Intelligence》2022,52(1):254-281

Trillions of bytes of data are generated every day in different forms, and extracting useful information from that massive amount of data is the study of data mining. Sequential pattern mining is a major branch of data mining that deals with mining frequent sequential patterns from sequence databases. Due to items having different importance in real-life scenarios, they cannot be treated uniformly. With today’s datasets, the use of weights in sequential pattern mining is much more feasible. In most cases, as in real-life datasets, pushing weights will give a better understanding of the dataset, as it will also measure the importance of an item inside a pattern rather than treating all the items equally. Many techniques have been introduced to mine weighted sequential patterns, but typically these algorithms generate a massive number of candidate patterns and take a long time to execute. This work aims to introduce a new pruning technique and a complete framework that takes much less time and generates a small number of candidate sequences without compromising with completeness. Performance evaluation on real-life datasets shows that our proposed approach can mine weighted patterns substantially faster than other existing approaches.


7.

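A closed sequential pattern mining algorithm based on adjacent frequent pattern segments

Wang Miao (王淼), Shang Xuequn (尚学群), Xue He (薛贺) 《Computer Engineering and Applications (计算机工程与应用)》2008,44(11):148-151

Mining frequent patterns directly from biological sequences produces many redundant patterns, whereas closed patterns better express the function and structure of the sequences. Based on the characteristics of biological sequences, this paper proposes JCPS, a pattern mining algorithm based on adjacent closed frequent pattern segments. It first generates closed adjacent frequent pattern segments, then combines these segments while performing closure checking to produce new closed frequent patterns. Experiments on a real protein sequence family database show that the algorithm handles biological sequence data effectively.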

8.

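A sequential pattern mining algorithm based on position information (Total citations: 1; self-citations: 1; citations by others: 1)

Zhang Lijun (张利军), Li Zhanhuai (李战怀), Wang Miao (王淼) 《Application Research of Computers (计算机应用研究)》2009,26(2):529-531

PrefixSpan generates a large number of projected databases when producing frequent sequential patterns, and many of them are identical. This paper proposes PVS, a sequential pattern mining algorithm based on position information. By recording the position information of every projected database already generated, PVS avoids regenerating identical projected databases and thereby improves running efficiency. Experiments show that PVS is more efficient than PrefixSpan on highly similar sequence data.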
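The idea of identifying a projected database by position information can be illustrated with a small sketch. This is one possible reading of the abstract, not the PVS algorithm itself: a projection is represented as a tuple of (sequence id, next position) pairs, and that tuple is used as a cache key so that two prefixes leading to the same projection share one computation. The names `project` and `suffix_counts`, and the restriction to sequences of single-character items, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A projection is a tuple of (sequence id, position where the suffix starts).
Projection = Tuple[Tuple[int, int], ...]

def project(db: List[str], prefix: str) -> Projection:
    """Earliest-match projection of each sequence with respect to a prefix."""
    out = []
    for sid, seq in enumerate(db):
        pos = 0
        for item in prefix:
            pos = seq.find(item, pos)
            if pos < 0:      # prefix is not a subsequence of this sequence
                break
            pos += 1         # continue searching after the matched item
        else:
            out.append((sid, pos))
    return tuple(out)

def suffix_counts(db: List[str], proj: Projection,
                  cache: Dict[Projection, Dict[str, int]]) -> Dict[str, int]:
    """Count, per item, the projected suffixes containing it; reuse cached results."""
    if proj not in cache:
        counts: Dict[str, int] = {}
        for sid, pos in proj:
            for item in set(db[sid][pos:]):
                counts[item] = counts.get(item, 0) + 1
        cache[proj] = counts
    return cache[proj]

# Hypothetical toy database: prefixes "ac" and "bc" yield the same projection.
db = ["abcd", "bacd"]
cache: Dict[Projection, Dict[str, int]] = {}
counts_ac = suffix_counts(db, project(db, "ac"), cache)
counts_bc = suffix_counts(db, project(db, "bc"), cache)
```

In this toy database both prefixes end at position 3 in each sequence, so the second call is answered from the cache instead of rescanning the suffixes.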

9.

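Privacy protection based on sequential pattern mining

Jing Bo (景波), Liu Ying (刘莹), Huang Bing (黄兵) 《Computer Engineering and Applications (计算机工程与应用)》2007,43(22):173-175

This study addresses privacy protection issues related to sequential patterns and proposes SDRF, an effective sequential pattern hiding algorithm, so that core information stays protected when sequential patterns are shared.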

10.

Clustering of tourist routes for individual tourists using sequential pattern mining

Lee Gun Ho Han Hee Seon 《The Journal of supercomputing》2020,76(7):5364-5381

Grouping individual tourists who have the same or similar tourist routes over the same time period makes it more convenient for the tourists at a low cost by providing transportation means such as regular or occasional tour buses, driver, and tourism guides. In this paper, we propose a mathematical formulation for the tour routes clustering problem and two phases for a sequential pattern algorithm for clustering similar or identical routes according to the tourist routes of individual tourists, with illustrative examples. The first phase is to construct a site by site frequency matrix and prune infrequent tour route patterns from the matrix. The second phase is to perform clustering of the tour routes to determine the tour route using a sequential pattern mining algorithm. We compare and evaluate the performance of our algorithms, i.e., in terms of execution time and memory used. The proposed algorithm is efficient in both runtime and memory usage for the increasing number of transactions.


11.

Scalable pattern mining with Bayesian networks as background knowledge (Total citations: 1; self-citations: 1; citations by others: 1)

Szymon Jaroszewicz Tobias Scheffer Dan A. Simovici 《Data mining and knowledge discovery》2009,18(1):56-100

We study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model. Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. We focus on the central step of this process: given a graphical model and a database, we address the problem of finding the most interesting attribute sets. We formalize the concept of interestingness of attribute sets as the divergence between their behavior as observed in the data, and the behavior that can be explained given the current model. We derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. We then consider the case of a very large network that renders exact inference unfeasible, and a very large database or data stream. We devise an algorithm that efficiently finds the most interesting attribute sets with prescribed approximation bound and confidence probability, even for very large networks and infinite streams. We study the scalability of the methods in controlled experiments; a case-study sheds light on the practical usefulness of the approach.

12.

Closed sequential pattern mining for sitemap generation

Ceci Michelangelo Lanotte Pasqua Fabiana 《World Wide Web》2021,24(1):175-203

A sitemap represents an explicit specification of the design concept and knowledge organization of a website and is therefore considered as the website’s basic ontology. It...

13.

NetNMSP: Nonoverlapping maximal sequential pattern mining

Li Yan Zhang Shuai Guo Lei Liu Jing Wu Youxi Wu Xindong 《Applied Intelligence》2022,52(9):9861-9884

Nonoverlapping sequential pattern mining, as a kind of repetitive sequential pattern mining with gap constraints, can find more valuable patterns. Traditional algorithms...

14.

Distributed and scalable sequential pattern mining through stream processing

Chun-Chieh Chen Hong-Han Shuai Ming-Syan Chen 《Knowledge and Information Systems》2017,53(2):365-390

Scalability is a primary issue in existing sequential pattern mining algorithms for dealing with a large amount of data. Previous work, namely sequential pattern mining on the cloud (SPAMC), has already addressed the scalability problem. It supports the MapReduce cloud computing architecture for mining frequent sequential patterns on large datasets. However, this existing algorithm does not address the iterative mining problem, in which reloading data incurs additional costs. Furthermore, it did not study the load balancing problem. To remedy these problems, we devised a powerful sequential pattern mining algorithm, the sequential pattern mining in the cloud-uniform distributed lexical sequence tree algorithm (SPAMC-UDLT), exploiting MapReduce and streaming processes. SPAMC-UDLT dramatically improves overall performance without launching multiple MapReduce rounds and provides perfect load balancing across machines in the cloud. The results show that SPAMC-UDLT can significantly reduce execution time, achieves extremely high scalability, and provides much better load balancing than existing algorithms in the cloud.

15.

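An improved method for mining sequential patterns from Web logs

Wang Lin (汪琳), Zhuang Weihua (庄卫华) 《Computer Engineering and Applications (计算机工程与应用)》2010,46(7):136-138

Web sequential pattern mining is an important research topic in Web data mining. This paper proposes an improved algorithm based on the WAP algorithm that does not need to repeatedly build conditional trees during Web sequential pattern mining, which improves running efficiency. Experiments show a clear running-time advantage over the WAP algorithm.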

16.

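A feasible method for mining closed multidimensional sequential patterns

Ji Zhaohui (纪兆辉), Li Cunhua (李存华) 《Computer Engineering and Design (计算机工程与设计)》2009,30(22)

To mine closed multidimensional sequential patterns, this paper studies the basic properties of multidimensional sequential patterns and proposes a new method for mining closed multidimensional sequential patterns. The method integrates closed sequential pattern mining with closed itemset pattern mining. The proof of its correctness shows that the set of closed multidimensional sequential patterns is no larger than the set of multidimensional sequential patterns and covers the full result set of multidimensional sequential patterns. Finally, two clear advantages of the method are analyzed, demonstrating its feasibility for closed multidimensional sequential pattern mining.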

17.

Constraint-based sequential pattern mining: the pattern-growth methods (Total citations: 4; self-citations: 0; citations by others: 4)

Jian Pei Jiawei Han Wei Wang 《Journal of Intelligent Information Systems》2007,28(2):133-160

Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well. This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

18.

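A prefetching model based on depth-first sequential pattern mining

Wei Lin (卫琳), Shi Lei (石磊) 《Computer Engineering and Applications (计算机工程与应用)》2007,43(20):169-172

Sequential pattern mining can discover user access regularities hidden in Web logs and can be used in Web prefetching models to predict the Web objects that will be accessed next. Most current sequential pattern mining methods are Apriori-based breadth-first algorithms. This paper proposes a bitmap-based depth-first mining algorithm that uses a depth-first strategy over a trie data structure and uses bitmaps to store and compute the support of each sequence, so frequent sequences can be mined relatively quickly. The algorithm is applied to a Web prefetching model, and experiments with integrated prefetching and caching show good performance.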
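As an illustration of bitmap-based support counting with depth-first search, here is a minimal sketch. It is not the algorithm from this paper, and it omits the trie structure the abstract mentions. It assumes each sequence is a string of single-character items and stores, for every item, one Python integer per sequence whose bit i is set when the item occurs at position i. The names `item_bitmaps`, `extend`, `support`, and `mine` are hypothetical.

```python
from typing import Dict, List

def item_bitmaps(db: List[str]) -> Dict[str, List[int]]:
    """One integer bitmap per (item, sequence); bit i marks an occurrence at position i."""
    maps = {it: [0] * len(db) for it in sorted({x for seq in db for x in seq})}
    for sid, seq in enumerate(db):
        for pos, it in enumerate(seq):
            maps[it][sid] |= 1 << pos
    return maps

def extend(prefix_bits: List[int], item_bits: List[int]) -> List[int]:
    """Bitmaps of positions where the item occurs after the prefix's earliest end."""
    out = []
    for p, b in zip(prefix_bits, item_bits):
        if p == 0:                    # prefix does not occur in this sequence
            out.append(0)
            continue
        first = p & -p                # lowest set bit: earliest end position
        after = ~((first << 1) - 1)   # mask of all strictly later positions
        out.append(after & b)
    return out

def support(bits: List[int]) -> int:
    """Number of sequences in which the pattern occurs at least once."""
    return sum(1 for b in bits if b)

def mine(db: List[str], min_support: int) -> Dict[str, int]:
    """Depth-first enumeration of frequent sequential patterns."""
    maps = item_bitmaps(db)
    found: Dict[str, int] = {}

    def dfs(pattern: str, bits: List[int]) -> None:
        found[pattern] = support(bits)
        for item, ibits in maps.items():
            new_bits = extend(bits, ibits)
            if support(new_bits) >= min_support:
                dfs(pattern + item, new_bits)

    for item, ibits in maps.items():
        if support(ibits) >= min_support:
            dfs(item, ibits)
    return found

# Hypothetical toy database of three access sequences.
patterns = mine(["abc", "acb", "ab"], min_support=2)
```

Extending a pattern costs a few bitwise operations per sequence rather than a rescan of the database, which is the kind of saving bitmap representations are meant to provide.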

19.

Malicious sequential pattern mining for automatic malware detection

《Expert systems with applications》2016

Due to the damage it causes to Internet security, malware (e.g., virus, worm, trojan) and its detection have caught the attention of both the anti-malware industry and researchers for decades. To protect legitimate users from the attacks, the most significant line of defense against malware is anti-malware software products, which mainly use signature-based methods for detection. However, this approach fails to recognize new, unseen malicious executables. To solve this problem, in this paper, based on the instruction sequences extracted from the file sample set, we propose an effective sequence mining algorithm to discover malicious sequential patterns, and then an All-Nearest-Neighbor (ANN) classifier is constructed for malware detection based on the discovered patterns. The developed data mining framework, composed of the proposed sequential pattern mining method and the ANN classifier, can well characterize the malicious patterns from the collected file sample set to effectively detect new, unseen malware samples. A comprehensive experimental study on a real data collection is performed to evaluate our detection framework. Promising experimental results show that our framework outperforms other alternate data mining based detection methods in identifying new malicious executables.

20.

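Research on privacy protection methods in sequential pattern mining

Zhu Yuquan (朱玉全), Hu Tianhan (胡天寒), Chen Geng (陈耿), Chang Peng (常鹏) 《Application Research of Computers (计算机应用研究)》2009,26(7):2489-2491

Several privacy protection methods have been proposed for association rule mining, but privacy protection in sequential pattern mining has received little study. This paper therefore proposes CLSDA (current least sequences delete algorithm), an effective algorithm for hiding sensitive sequences. CLSDA weights the candidate sequences, updates the weights as sequences are deleted, and uses a greedy algorithm to obtain a locally optimal solution, changing the original database as little as possible. Experimental results show that CLSDA achieves better hiding than existing sequential pattern hiding methods.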