1.

Incremental sequence-based frequent query pattern mining from XML queries

Guoliang Li Jianhua Feng Jianyong Wang Lizhu Zhou 《Data mining and knowledge discovery》2009,18(3):472-516

Existing algorithms for mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases, as there may be frequent updates that not only invalidate some existing frequent query patterns but also generate some new ones. Therefore, a drawback of existing methods is that they are rather inefficient when transaction databases evolve. To address the above-mentioned problems, this paper proposes an efficient algorithm, ESPRIT, to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i⁺, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.
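
To make the idea of a one-to-one tree-to-sequence mapping concrete, here is a minimal sketch. It is not ESPRIT's actual encoding, which the abstract does not specify. It assumes an ordered labeled tree written as `(label, [children])` and uses a preorder list of `(depth, label)` pairs, which can be decoded back into the same tree. The names `encode` and `decode` are illustrative.

```python
from typing import List, Tuple

Tree = Tuple[str, list]          # (label, [child trees]); hypothetical representation
Sequence = List[Tuple[int, str]]  # preorder list of (depth, label)


def encode(tree: Tree, depth: int = 0) -> Sequence:
    """Flatten an ordered labeled tree into a preorder (depth, label) sequence."""
    label, children = tree
    seq = [(depth, label)]
    for child in children:
        seq.extend(encode(child, depth + 1))
    return seq


def decode(seq: Sequence) -> Tree:
    """Rebuild the tree from its preorder (depth, label) sequence."""
    root: Tree = (seq[0][1], [])
    stack = [root]  # stack[d] is the most recent node seen at depth d
    for depth, label in seq[1:]:
        node: Tree = (label, [])
        del stack[depth:]            # drop nodes at this depth or deeper
        stack[-1][1].append(node)    # attach to the node at depth - 1
        stack.append(node)
    return root


# A query pattern such as book[title]/chapter/section, written as a tree.
query = ("book", [("title", []), ("chapter", [("section", [])])])
sequence = encode(query)
```

Because `decode(encode(t)) == t` for every such tree, frequent sequences found by a sequence miner can be mapped back to tree patterns without a separate tree-containment test in this toy setting.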

2.

An XML-based algorithm for mining complete frequent query patterns

陈超祥叶时平华成金林樵《计算机应用》2008,28(6):1450-1453

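XML queries are modeled as trees, and a query-containment test based on tree isomorphism is proposed. Rightmost-branch extension is used to systematically enumerate the rooted subtrees of the query pattern trees. During enumeration, a Diffset structure records the query transaction identifiers of the transaction sets that contain each rooted subtree, and the mining algorithm DiffFRSTMiner is presented. Experimental results confirm that the algorithm is sound and efficient and that it reduces memory overhead to some extent.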
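
The diffset idea can be illustrated on plain itemsets, under the assumption that DiffFRSTMiner applies the same bookkeeping to rooted subtrees (the abstract does not give details). Instead of storing the IDs of all transactions that contain an extended pattern, a diffset stores only the IDs that contain the parent pattern but not the extension. The support then follows by subtraction. All function names here are illustrative.

```python
from typing import Dict, Set


def tidsets(transactions: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to the set of transaction IDs that contain it."""
    index: Dict[str, Set[int]] = {}
    for tid, items in transactions.items():
        for item in items:
            index.setdefault(item, set()).add(tid)
    return index


def diffset(parent_tids: Set[int], extension_tids: Set[int]) -> Set[int]:
    """IDs that contain the parent pattern but not the extended pattern."""
    return parent_tids - extension_tids


def support_from_diffset(parent_support: int, diff: Set[int]) -> int:
    """support(extended) = support(parent) - |diffset|."""
    return parent_support - len(diff)


db = {1: {"a", "b"}, 2: {"a"}, 3: {"a", "b", "c"}, 4: {"b"}}
tids = tidsets(db)
diff_ab = diffset(tids["a"], tids["b"])            # transactions with a but not b
support_ab = support_from_diffset(len(tids["a"]), diff_ab)
```

On dense data the diffset is usually much smaller than the full ID set, which is where the memory saving comes from.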

3.

An efficient algorithm for mining frequent inter-transaction patterns (Total citations: 1; self-citations: 0; citations by others: 1)

Anthony J.T. Lee Chun-Sheng Wang 《Information Sciences》2007,177(17):3453-3476

In this paper, we propose an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. In the first phase, we devise two data structures: a dat-list, which stores the item information used to find frequent inter-transaction patterns; and an ITP-tree, which stores the discovered frequent inter-transaction patterns. In the second phase, we apply an algorithm, called ITP-Miner (Inter-Transaction Patterns Miner), to mine all frequent inter-transaction patterns. By using the ITP-tree, the algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. The experimental results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.

4.

A multi-relational frequent pattern mining algorithm

邓左祥刘连芳梁一平周小平《计算机应用研究》2009,26(9):3285-3288

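Traditional data mining algorithms must physically join tables when they process multiple tables, which is inefficient. To address this problem, a multi-relational frequent pattern mining algorithm is proposed. The algorithm uses the idea of tuple ID propagation, so that frequent patterns can be mined directly across multiple tables without physical joins. Experiments show that the algorithm is efficient.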

5.

Dynamic interval-based labeling scheme for efficient XML query and update processing

Jung-Hee Yun Chin-Wan Chung 《Journal of Systems and Software》2008,81(1):56-70

XML data can be represented by a tree or graph structure, and XML query processing requires information about the structural relationships among nodes. The basic structural relationships are parent-child and ancestor-descendant, and finding all occurrences of these basic structural relationships in XML data is clearly a core operation in XML query processing. Several node labeling schemes have been suggested to support the determination of ancestor-descendant or parent-child structural relationships simply by comparing the labels of nodes. However, the previous node labeling schemes have some disadvantages, such as a large number of nodes that need to be relabeled when XML data are inserted, huge space requirements for node labels, and inefficient processing of structural joins. In this paper, we propose the nested tree structure, which eliminates the disadvantages and takes advantage of the previous node labeling schemes. The nested tree structure makes it possible to use the dynamic interval-based labeling scheme, which supports XML data updates with almost no node relabeling as well as efficient structural join processing. Experimental results show that our approach is efficient in handling updates with the interval-based labeling scheme and also significantly improves the performance of structural join processing compared with recent methods.
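
As background for the labeling schemes mentioned above, the following is a minimal sketch of a plain, static interval labeling, not the paper's dynamic nested-tree scheme. Each node gets a `(start, end, level)` label from one depth-first traversal, and structural relationships are then decided by comparing labels only. The `Node` class and function names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Label = Tuple[int, int, int]  # (start, end, level)


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


def assign_labels(root: Node) -> Dict[int, Label]:
    """Label every node with (start, end, level) in one depth-first pass."""
    labels: Dict[int, Label] = {}
    counter = 0

    def visit(node: Node, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in node.children:
            visit(child, level + 1)
        counter += 1
        labels[id(node)] = (start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """a is an ancestor of d if a's interval strictly contains d's interval."""
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a: Label, d: Label) -> bool:
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(a, d) and a[2] + 1 == d[2]
```

With consecutive integers as in this sketch, an insertion forces later nodes to be relabeled, which is the weakness the abstract says a dynamic scheme is meant to avoid.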

6.

A structural join algorithm for XML queries (Total citations: 1; self-citations: 0; citations by others: 1)

黄渊杨薇薇《计算机辅助工程》2007,16(1):73-75

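Most current XML structural join methods incur a high cost for sorting or indexing the input on the fly when the input element sets are unindexed or unordered. To address this problem, the classic Stack-Tree-Desc algorithm and optimized algorithms based on B+ tree indexes are analyzed, and an XML query optimization strategy that is not tied to an external index structure is proposed, together with an implementation of the algorithm. Experimental results show that the algorithm achieves higher query efficiency than Stack-Tree-Desc.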

7.

An XML encoding scheme for efficient querying (Total citations: 1; self-citations: 0; citations by others: 1)

文华南刘先锋李文锋李玲勇《计算机应用》2010,30(3):831-834

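In XML data querying, structural join operations take up a large share of the time. To address this problem, an encoding scheme for efficient querying, the LSEQ encoding, is proposed. It decomposes node path information to avoid recording repeated path information, which shortens the codes, and it can represent ancestor-descendant, parent-child, and sibling relationships between nodes. By recording the paths of non-leaf nodes, the LSEQ encoding avoids structural joins during node queries and improves query efficiency. Experiments show that the LSEQ encoding improves space utilization and performs well in query speed.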

8.

Efficient single-pass frequent pattern mining using a prefix-tree

Syed Khairuzzaman Tanbeer Byeong-Soo Jeong Young-Koo Lee 《Information Sciences》2009,179(5):559-583

The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which is not consistent with efficient data stream processing. In this paper, we present a novel tree structure, called CP-tree (compact pattern tree), that captures database information with one scan (insertion phase) and provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime. An efficient tree restructuring method, called the branch sorting method, that restructures a prefix-tree branch-by-branch, is also proposed in this paper. Moreover, the CP-tree provides full functionality for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive mining, and incremental mining with a single database scan.

9.

An incremental privacy-preserving frequent pattern mining algorithm

张亚玲王婷王尚平《计算机应用》2018,38(1):176-181

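Most privacy-preserving frequent pattern mining algorithms require multiple database scans and many comparisons during counting. To address these shortcomings, an incremental bitmap-based randomized response with partial hiding (IBRRPH) algorithm is proposed. First, bitmaps are introduced to represent the transactions in the database, and bitwise AND operations speed up support computation. Second, by analyzing the incremental access relationships, an incremental update model is introduced so that frequent pattern mining makes maximal use of earlier mining results when the data are updated incrementally. For increments ranging from 1000 to 40000, comparative experiments were carried out against the algorithm proposed by 顾铖 et al. (顾铖, 朱保平, 张金康. An improved privacy-preserving association rule mining algorithm. 南京航空航天大学学报, 2015, 47(1): 119-124). The experimental results show that the IBRRPH algorithm improves efficiency by more than 21% compared with the algorithm of 顾铖 et al.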
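
The bitmap step described above can be sketched as follows. This covers only support counting with bitwise AND. The randomized-response, partial-hiding, and incremental-update parts of IBRRPH are not modeled, and the function names are illustrative. Each item gets one Python integer used as a bitmap, where bit `i` is set when transaction `i` contains the item.

```python
from typing import Dict, Iterable, List, Set


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """One integer bitmap per item; bit i is set if transaction i has the item."""
    bitmaps: Dict[str, int] = {}
    for position, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << position)
    return bitmaps


def support(itemset: Iterable[str], bitmaps: Dict[str, int], n: int) -> int:
    """Count transactions containing every item by AND-ing the item bitmaps."""
    mask = (1 << n) - 1  # start with all n transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = build_bitmaps(db)
```

After the bitmaps are built in a single pass, the support of any candidate itemset needs no further database scan, only one AND per item and a bit count.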

10.

DiffNodesets: An efficient structure for fast mining frequent itemsets

《Applied Soft Computing》2016

Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have been shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation, for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, to mine frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and directly enumerates frequent itemsets without candidate generation in some cases. To evaluate the performance of dFIN, we have conducted extensive experiments to compare it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.

11.

DSM-FI: an efficient algorithm for mining frequent itemsets in data streams (Total citations: 4; self-citations: 4; citations by others: 0)

Hua-Fu Li Man-Kwan Shan Suh-Yin Lee 《Knowledge and Information Systems》2008,17(1):79-97

Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest) for maintaining the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms of one-pass mining of frequent itemsets.
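
The abstract does not say how a transaction is projected into sub-transactions. The sketch below shows one plausible reading, labeled here as an assumption: the items of a transaction are sorted, and each suffix of the sorted list becomes one sub-transaction. The function name `project` is illustrative, and the SFI-forest that would store the result is not modeled.

```python
from typing import Iterable, List, Tuple


def project(transaction: Iterable[str]) -> List[Tuple[str, ...]]:
    """Split a transaction into the suffixes of its sorted item list.

    Assumption: suffix projection over lexicographically sorted items;
    the source abstract only says "a set of sub-transactions".
    """
    items = sorted(set(transaction))
    return [tuple(items[i:]) for i in range(len(items))]


sub_transactions = project(["c", "a", "b"])
```

Under this reading, a transaction with k distinct items yields k sub-transactions, each starting with a different item, so every item heads exactly one entry that a summary structure could index.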

12.

An algorithm for mining maximal frequent XML query patterns based on frequent leaf patterns

陈超祥丁健龙华成金林樵《计算机应用与软件》2009,26(6):85-87,197

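When frequent XML query patterns are mined from dense or long datasets, itemset mining produces too many items, which makes the results hard to use. To overcome such problems, MFRSTMiner, an algorithm for mining maximal frequent query patterns based on frequent leaf patterns, is proposed. The algorithm builds a frequent pattern extension forest and mines the maximal frequent subtrees from the leaf nodes of that forest. Experimental results show that the algorithm can effectively mine the maximal frequent query patterns of dynamic transaction sets.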

13.

Sliding window based weighted maximal frequent pattern mining over data streams

《Expert systems with applications》2014,41(2):694-708

As data have accumulated more quickly in recent years, the corresponding databases have also grown larger, and general frequent pattern mining methods have therefore run into limitations in handling such massive data. To overcome this problem, data mining researchers have studied methods that can conduct more efficient and immediate mining tasks by scanning databases only once. Thereafter, the sliding window model, which can perform mining operations focusing on recently accumulated parts of data streams, was proposed, and a variety of related mining approaches have been suggested. However, it is hard to mine all of the frequent patterns in the data stream environment, since the number of generated patterns increases remarkably as data streams are continuously extended. Thus, methods for efficiently compressing generated patterns are needed to solve that problem. In addition, since not only support conditions but also weight constraints expressing items’ importance are important factors in pattern mining, we need to consider them in the mining process. Motivated by these issues, we propose a novel algorithm, weighted maximal frequent pattern mining over data streams based on sliding window model (WMFP-SW), to obtain weighted maximal frequent patterns reflecting recent information over data streams. Performance experiments report that WMFP-SW outperforms previous algorithms in terms of runtime, memory usage, and scalability.

14.

A constraint-based algorithm for frequent pattern mining

孟彩霞《智能系统学报》2009,4(2):142-147

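Few algorithms allow constraints to be changed dynamically during frequent pattern mining. A constraint-based frequent pattern mining algorithm, MCFP, is proposed. MCFP first builds a frequent pattern tree according to the properties of the constraints, scanning the database only once. It then builds a conditional tree for each item, mines the maximal frequent patterns prefixed by that item, and stores them in a maximal pattern tree. Finally, it derives all frequent patterns with exact support from the maximal patterns. MCFP allows users to change the constraints dynamically while frequent patterns are being mined. Experiments show that the algorithm is effective in comparison with the iCFP algorithm.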

15.

A query suggestion method for e-commerce based on log mining

王菁王若飞《计算机工程与科学》2018,40(2):231-237

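Query suggestion can effectively reduce user input and resolve query ambiguity, improving the convenience and accuracy of information retrieval. With the growth of e-commerce, query suggestion is increasingly used in product search on e-commerce websites. However, traditional query suggestion methods based on Web search are not fully applicable to the e-commerce domain. Focusing on this specific domain, different query suggestion techniques are compared, and a query suggestion method that jointly considers users' search and shopping behavior is proposed. MapReduce is used to mine user logs and build a lexicon of search terms, and online and offline computation are combined to provide users with real-time query suggestions. Experimental results show that the proposed log-mining-based query suggestion method for e-commerce effectively improves the accuracy of query suggestions and has good processing performance.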

16.

A survey of frequent pattern mining over data streams

韩萌丁剑《计算机应用》2019,39(3):719-727

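Advanced applications such as fraud detection and trend learning have driven the development of frequent pattern mining over data streams. Unlike mining static data, data stream mining faces problems such as time and space constraints and the combinatorial explosion of itemsets. Existing algorithms for frequent pattern mining over data streams are surveyed, and classic and recent algorithms are analyzed. Classified by the completeness of the pattern set, frequent patterns in data streams are divided into complete patterns and compressed patterns. Compressed patterns mainly include closed patterns, maximal patterns, top-k patterns, and combinations of the three. The difference is that closed patterns are a lossless compression, whereas the other patterns are lossy. To obtain interesting frequent patterns, patterns based on user constraints can be mined. To handle recent transactions in a data stream, algorithms are divided into window-model-based and decay-model-based methods. Pattern mining over data streams also commonly covers sequential patterns and high-utility patterns, for which classic and recent algorithms are introduced. Finally, future work on pattern mining over data streams is outlined.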
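
The lossless versus lossy distinction between closed and maximal patterns can be checked on a small static example. This is a brute-force sketch for illustration only, not a stream algorithm, and all function names are illustrative. A frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional

Pattern = FrozenSet[str]


def frequent_itemsets(db: List[FrozenSet[str]], min_support: int) -> Dict[Pattern, int]:
    """Enumerate every itemset and keep those meeting min_support (brute force)."""
    items = sorted(set().union(*db))
    result: Dict[Pattern, int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            count = sum(1 for t in db if pattern <= t)
            if count >= min_support:
                result[pattern] = count
    return result


def closed(freq: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns with no proper superset of equal support."""
    return {p: c for p, c in freq.items()
            if not any(p < q and c == freq[q] for q in freq)}


def maximal(freq: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns with no frequent proper superset."""
    return {p: c for p, c in freq.items() if not any(p < q for q in freq)}


def support_from_closed(pattern: Pattern, closed_sets: Dict[Pattern, int]) -> Optional[int]:
    """Recover a frequent pattern's support as the max over closed supersets."""
    supports = [c for q, c in closed_sets.items() if pattern <= q]
    return max(supports) if supports else None


db = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("ac"), frozenset("ac")]
freq = frequent_itemsets(db, min_support=2)
```

In this example the closed set keeps `{a}` with support 5, so every frequent support can be recovered from it, while the maximal set keeps only `{a, b}` and `{a, c}` and can no longer tell that `{a}` occurs five times.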

17.

Novel techniques and an efficient algorithm for closed pattern mining

《Expert systems with applications》2014,41(11):5105-5114

In this paper we show that frequent closed itemset mining and biclustering, the two most prominent application fields in pattern discovery, can be reduced to the same problem when dealing with binary (0–1) data. FCPMiner, a new powerful pattern mining method, is then introduced to mine such data efficiently. The uniqueness of the proposed method is its extendibility to non-binary data. The mining method is coupled with a novel visualization technique and a pattern aggregation method to detect the most meaningful, non-overlapping patterns. The proposed methods are rigorously tested on both synthetic and real data sets.

18.

Demand-driven frequent itemset mining using pattern structures

Haixun Wang Chang-Shing Perng Sheng Ma Philip S. Yu 《Knowledge and Information Systems》2005,8(1):82-102

Frequent itemset mining aims at discovering patterns whose supports are beyond a given threshold. In many applications, including network event management systems, which motivated this work, patterns are composed of items each described by a subset of attributes of a relational table. As it involves an exponential mining space, the efficient implementation of user preferences and mining constraints becomes the first priority for a mining algorithm. User preferences and mining constraints are often expressed using patterns' attribute structures. Unlike traditional methods that mine all frequent patterns indiscriminately, we regard frequent itemset mining as a two-step process: the mining of the pattern structures and the mining of patterns within each pattern structure. In this paper, we present a novel architecture that uses pattern structures to organize the mining space. In comparison with the previous techniques, the advantage of our approach is two-fold: (i) by exploiting the interrelationships among pattern structures, execution times for mining can be reduced significantly; and (ii) more importantly, it enables us to incorporate high-level simple user preferences and mining constraints into the mining process efficiently. These advantages are demonstrated by our experiments using both synthetic and real-life datasets.

19.

A keyword-based query algorithm for XML documents

李素清陶世群《计算机工程与应用》2012,48(5):138-142

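There are two common ways to query XML documents: one uses a query language and the other uses keywords. Keyword queries are simpler and more convenient than query languages. An index lookup algorithm for querying XML documents by keyword is presented. The algorithm needs to scan the code list associated with each keyword only once to find the required codes, which improves query efficiency. Experiments show that the algorithm is feasible and effective.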

20.

An analysis of efficient XML querying with XQuery

蔡可训《数字社区&智能家居》2009,5(12):9640-9643

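As more and more data are stored with XML as the standard format, traditional databases and query syntax no longer apply because of the difference in format. This paper analyzes XQuery, a new XML query language, and introduces and compares its applications in related fields. Finally, the development prospects of XQuery are discussed.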