期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Incremental sequence-based frequent query pattern mining from XML queries

Guoliang Li Jianhua Feng Jianyong Wang Lizhu Zhou 《Data mining and knowledge discovery》2009,18(3):472-516

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i ⁺, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly. 相似文献

2.

Bottom-up discovery of frequent rooted unordered subtrees

Yijun Bei Gang Chen Lidan Shou Xiaoyan Li Jinxiang Dong 《Information Sciences》2009,179(1-2):70-88

In the past decade, XML has emerged as the standard language for information exchanging over the Internet. Due to its tree-structure paradigm, XML is superior for its capability of storing, querying, and manipulating complex data. Therefore, discovering frequent tree patterns over tree-structured data has become an interesting topic for XML data management. In this paper, we propose a tree mining algorithm, named BUXMiner, for finding a special class of frequent trees, called rooted unordered trees, from a tree-structured database. BUXMiner employs an efficient bottom-up approach to enumerate all candidate trees over a compact global tree guide and computes the frequent trees based on the tree guide. In addition to BUXMiner, we also propose a mining approach called BUMXMiner to discover the maximal frequent rooted unordered trees. We compare BUXMiner with previous tree-structure mining algorithms, namely XQPMinerTID and FastXMiner, which were also proposed to discover rooted unordered trees. The experimental results show that our algorithm outperforms XQPMinerTID and FastXMiner in terms of efficiency. The performance results from real-world applications also indicate the usefulness of our proposed tree mining algorithms in a variety of web applications, such as analysis of web page access patterns and mining frequent XML query patterns for caching. 相似文献

3.

Mining frequent patterns in image databases with 9D-SPA representation

Anthony J.T. Lee^{Author Vitae} Ying-Ho Liu Author Vitae Author Vitae Hsiu-Hui Lin Author Vitae Author Vitae 《Journal of Systems and Software》2009,82(4):603-618

In this paper, we propose a novel algorithm, called 9DSPA-Miner, to mine frequent patterns from an image database, where every image is represented by the 9D-SPA representation. Our proposed method consists of three phases. First, we scan the database once and create an index structure. Next, the index structure is scanned to find all frequent patterns of length two. Finally, we use the frequent k-patterns (k ? 2) to generate candidate (k + 1)-patterns and check if the support of each candidate generated is not less than the user-specified minimum support threshold by using the index structure. Then, the steps in the third phase are repeated until no more frequent patterns can be found. Since the 9DSPA-Miner algorithm uses the characteristics of the 9D-SPA representation to prune most of impossible candidates, the experiment results demonstrate that it is more efficient and scalable than the modified Apriori method. 相似文献

4.

Mining spatial association rules in image databases 总被引：2，自引：0，他引：2

Anthony J.T. Lee Ruey-Wen Hong Wen-Kwang Tsao 《Information Sciences》2007,177(7):1593-1608

In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ? 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of impossible candidates, it is more efficient than the Apriori algorithm. The experiment results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm. 相似文献

5.

Privacy preserving data mining of sequential patterns for network traffic data

Seung-Woo Kim Jung-Im Won 《Information Sciences》2008,178(3):694-713

As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining the meta tables in each site so as to determine quickly whether candidate patterns have ever occurred in the site or not. Extensive experiments with real-world network traffic data revealed the correctness and the efficiency of the proposed method. 相似文献

6.

Efficient Mining of Frequent Closed XML Query Pattern

下载免费PDF全文

Jian-Hua Feng Qian Qian Jian-Yong Wang and Li-Zhu Zhou 《计算机科学技术学报》2007,22(5):725-735

Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA＊, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA＊ deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA＊ over the previous known alternative. SOLARIA＊ is also linearly scalable in terms of XML queries＇ size. 相似文献

7.

Finding hot query patterns over an XQuery stream

Liang?Huai?Yang Email author Mong?Li?Lee Wynne?Hsu 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(4):318-332

Caching query results is one efficient approach to improving the performance of XML management systems. This entails the discovery of frequent XML queries issued by users. In this paper, we model user queries as a stream of XML query pattern trees and mine the frequent query patterns over the query stream. To facilitate the one-pass mining process, we devise a novel data structure called DTS to summarize the pattern trees seen so far. By grouping the incoming pattern trees into batches, we can dynamically mark the active portion of the current batch in DTS and limit the enumeration of candidate trees to only the currently active pattern trees. We also design another summary data structure called ECTree that provides for the incremental computation of the frequent tree patterns over the query stream. Based on the above two constructs, we present two mining algorithms called XQSMinerI and XQSMinerII. XQSMinerI is fast, but it tends to overestimate, while XQSMinerII adopts a filter-and-refine approach to minimize the amount of overestimation. Experimental results show that the proposed methods are both efficient and scalable and require only small memory footprints.Received: 17 October 2003, Accepted: 16 April 2004, Published online: 14 September 2004Edited by: J. Gehrke and J. Hellerstein. 相似文献

8.

XBridge-Mobile: efficient XML keyword search on mobile web data

Jianxin Li Chengfei Liu Jiajie Xu 《Computing》2014,96(7):631-650

Mobile computing over intelligent mobile is affecting human’s habits of obtaining information over Internet, especially keyword search. Most of previous keyword search works are mainly focused on traditional web data sources, in which the performance can be improved by adding more computing power and/or building more offline-computed index. However, it is very challenging to apply the traditional keyword search methods to mobile web-based keyword search because mobile computing has many different features, e.g., frequent disconnections, variety of bandwidths, limited power of mobile devices, limited data size to be downloaded, etc.. To address this challenge, in this paper we design an adaptive mobile-based XML keyword search approach, called XBridge-Mobile, that can derive the semantics of a keyword query and generate a set of effective structured patterns by analyzing the given keyword query and the schemas of XML data sources. Each structured pattern represents one of user’s possible search intentions. The patterns will be firstly sent to the mobile client from web server. And then, the mobile client can select some interested patterns to load the results. By doing this, we can reduce the communication cost a lot between web server and mobile client because only the derived patterns and a few results need to be transferred, not all the keyword search results, by which we can save lots of expenses when the downloaded data is priced. In addition, we can economically maintain the frequent structured pattern queries in the mobile device, which can further reduce the expense of downloading data. At last, we analyze and propose a ranking function to measure the quality of keyword search results, design a set of algorithms to optimize mobile keyword search based on the maintained structured patterns, and present the experimental study of XBridge-Mobile with real XML datasets. 相似文献

9.

Continuous reverse k nearest neighbor monitoring on moving objects in road networks

Li Guohui Li Yanhong Li Jianjun LihChyun Shu Yang Fumin 《Information Systems》2010

Continuous reverse k nearest neighbor (CRkNN) monitoring in road networks has recently received increasing attentions. However, there is still a lack of efficient CRkNN algorithms in road networks up to now. In road networks, moving query objects and data objects are restricted by the connectivity of the road network and both the object–query distance and object–object distance updates affect the result of CRkNN queries. In this paper, we present a novel algorithm for continuous and incremental evaluation of CRkNN queries in road networks. Our method is based on a novel data structure called dual layer multiway tree (DLM tree) we proposed to represent the whole monitoring region of a CRkNN query q. We propose several lemmas to reduce the monitoring region of q and the number of candidate objects as much as possible. Moreover, by associating a variable NN_count with each candidate object, we can simplify the monitoring of candidate objects. There are a large number of objects roaming in a road network and many of them are irrelevant to a specific CRkNN query of a query object q. To minimize the processing extension, for a road in the network, we give an IQL list and an IQCL list to specify the set of query objects and data objects whose location updates should be maintained for CRkNN processing of query objects. Our CRkNN method consists of two phase: the initial result generating phase and incremental maintenance phase. In each phase, algorithms with high performance are proposed to make our CRkNN method more efficient. Extensive simulation experiments are conducted and the result shows that our proposed approach is efficient and scalable in processing CRkNN queries in road networks. 相似文献

10.

CineCubes: Aiding data workers gain insights from OLAP queries

《Information Systems》2015

In this paper we demonstrate that it is possible to enrich query answering with a short data movie that gives insights to the original results of an OLAP query. Our method, implemented in an actual system, CineCubes, includes the following steps. The user submits a query over an underlying star schema. Taking this query as input, the system comes up with a set of queries complementing the information content of the original query, and executes them. For each of the query results, we execute a set of highlight extraction algorithms that identify interesting patterns and values in the data of the results. Then, the system visualizes the query results and accompanies this presentation with a text commenting on the result highlights. Moreover, via a text-to-speech conversion the system automatically produces audio for the constructed text. Each combination of visualization, text and audio practically constitutes a movie, which is wrapped as a PowerPoint presentation and returned to the user. 相似文献

11.

Towards practical private processing of database queries over public data

Shiyuan Wang Divyakant Agrawal Amr El Abbadi 《Distributed and Parallel Databases》2014,32(1):65-89

Privacy is a major concern when users query public online data services. The privacy of millions of people has been jeopardized in numerous user data leakage incidents in many popular online applications. To address the critical problem of personal data leakage through queries, we enable private querying on public data services so that the contents of user queries and any user data are hidden and therefore not revealed to the online service providers. We propose two protocols for private processing of database queries, namely BHE and HHE. The two protocols provide strong query privacy by using Paillier’s homomorphic encryption, and support common database queries such as range and join queries by relying on the bucketization of public data. In contrast to traditional Private Information Retrieval proposals, BHE and HHE only incur one round of client server communication for processing a single query. BHE is a basic private query processing protocol that provides complete query privacy but still incurs expensive computation and communication costs. Built upon BHE, HHE is a hybrid protocol that applies ciphertext computation and communication on a subset of the data, such that this subset not only covers the actual requested data but also resembles some frequent query patterns of common users, thus achieving practical query performance while ensuring adequate privacy levels. By using frequent query patterns and data specific privacy protection, HHE is not vulnerable to the traditional attacks on k-Anonymity that exploit data similarity and skewness. Moreover, HHE consistently protects user query privacy for a sequence of queries in a single query session. 相似文献

12.

Top-n query processing in spatial databases considering bi-chromatic reverse k-nearest neighbors

《Information Systems》2014

A reverse k-nearest neighbor (RkNN) query retrieves the data points which regard the query point as one of their respective k nearest neighbors. A bi-chromatic reverse k-nearest neighbor (BRkNN) query is a variant of the RkNN query, considering two types of data. Given two types of data G and C, a BRkNN query regarding a data point q in G retrieves the data points from C that regard q as one of their respective k-nearest neighbors among the data points in G. Many existing approaches answer either the RkNN query or the BRkNN query. Different from these approaches, in this paper, we make the first attempt to propose a top-n query based on the concept of BRkNN queries, which ranks the data points in G and retrieves the top-n points according to the cardinalities of the corresponding BRkNN answer sets. For efficiently answering this top-n query, we construct the Voronoi Diagram of G to index the data points in G and C. From the information associated with the Voronoi Diagram of G, the upper bound of the cardinality of the BRkNN answer sets for each data point in G can be quickly computed. Moreover, based on an existing approach to answering the RkNN query and the characteristics of the Voronoi Diagram of G, we propose a method to find the candidate region regarding a BRkNN query, which tightens the corresponding search space. Finally, based on the triangle inequality, we propose an efficient refinement algorithm for finding the exact BRkNN answers from the candidate regions. To evaluate our approach on answering the top-n query, it is compared with an approach which applies a state-of-the-art algorithm for answering the BRkNN query to each data point in G. The experiment results reveal that our approach has a much better performance. 相似文献

13.

Mining frequent patterns from univariate uncertain data 总被引：1，自引：0，他引：1

Ying-Ho Liu^{Author Vitae} 《Data & Knowledge Engineering》2012,71(1):47-68

In this paper, we propose a new algorithm called U2P-Miner for mining frequent U2 patterns from univariate uncertain data, where each attribute in a transaction is associated with a quantitative interval and a probability density function. The algorithm is implemented in two phases. First, we construct a U2P-tree that compresses the information in the target database. Then, we use the U2P-tree to discover frequent U2 patterns. Potential frequent U2 patterns are derived by combining base intervals and verified by traversing the U2P-tree. We also develop two techniques to speed up the mining process. Since the proposed method is based on a tree-traversing strategy, it is both efficient and scalable. Our experimental results demonstrate that the U2P-Miner algorithm outperforms three widely used algorithms, namely, the modified Apriori, modified H-mine, and modified depth-first backtracking algorithms. 相似文献

14.

Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach 总被引：22，自引：2，他引：22

Jiawei Han Jian Pei Yiwen Yin Runying Mao 《Data mining and knowledge discovery》2004,8(1):53-87

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns.In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, FP-tree which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods. 相似文献

15.

Approximate Continuous Top-<Emphasis Type="Italic">k</Emphasis> Query over Sliding Window

下载免费PDF全文

Rui Zhu Bin Wang Shi-Ying Luo Xiao-Chun Yang Guo-Ren Wang 《计算机科学技术学报》2017,32(1):93-109

Continuous top-k query over sliding window is a fundamental problem in database, which retrieves k objects with the highest scores when the window slides. Existing studies mainly adopt exact algorithms to tackle this type of queries, whose key idea is to maintain a subset of objects in the window, and try to retrieve answers from it. However, all the existing algorithms are sensitive to query parameters and data distribution. In addition, they suffer from expensive overhead for incremental maintenance, and thus cannot satisfy real-time requirement. In this paper, we define a novel query named (ε, δ)-approximate continuous top-k query, which returns approximate answers for top-k query. In order to efficiently support this query, we propose an efficient framework, named PABF (Probabilistic Approximate Based Framework), to support approximate top-k query over sliding window. We firstly maintain a self-adaptive pruning value, which could filter out newly arrived objects who have a probability less than 1 ? δ of being a query result. For those objects that are not filtered, we combine them together, if the score difference among them is less than a threshold. To efficiently maintain these combined results, the framework PABF also proposes a multi-phase merging algorithm. Theoretical analysis indicates that even in the worst case, we require only logarithmic complexity for maintaining each candidate. 相似文献

16.

Unbounded-error quantum query complexity

Ashley Montanaro 《Theoretical computer science》2011,412(35):4619-4628

This work studies the quantum query complexity of Boolean functions in an unbounded-error scenario where it is only required that the query algorithm succeeds with a probability strictly greater than 1/2. We show that, just as in the communication complexity model, the unbounded-error quantum query complexity is exactly half of its classical counterpart for any (partial or total) Boolean function. Moreover, connecting the query and communication complexity results, we show that the “black-box” approach to convert quantum query algorithms into communication protocols by Buhrman-Cleve—Wigderson [STOC’98] is optimal even in the unbounded-error setting.We also study a related setting, called the weakly unbounded-error setting, where the cost of a query algorithm is given by q+log(1/2(p−1/2)), where q is the number of queries made and p>1/2 is the success probability of the algorithm. In contrast to the case of communication complexity, we show a tight multiplicative Θ(logn) separation between quantum and classical query complexity in this setting for a partial Boolean function. The asymptotic equivalence between them is also shown for some well-studied total Boolean functions. 相似文献

17.

k-Nearest neighbor query processing algorithm for cloaking regions towards user privacy protection in location-based services

Jung-Ho Um Yong-Ki Kim Hyun-Jo Lee Miyoung Jang Jae-Woo Chang 《Journal of Systems Architecture》2012,58(9):354-371

Due to the advancement of wireless internet and mobile positioning technology, the application of location-based services (LBSs) has become popular for mobile users. Since users have to send their exact locations to obtain the service, it may lead to several privacy threats. To solve this problem, a cloaking method has been proposed to blur users’ exact locations into a cloaked spatial region with a required privacy threshold (k). With the cloaked region, an LBS server can carry out a k-nearest neighbor (k-NN) search algorithm. Some recent studies have proposed methods to search k-nearest POIs while protecting a user’s privacy. However, they have at least one major problem, such as inefficiency on query processing or low precision of retrieved result. To resolve these problems, in this paper, we propose a novel k-NN query processing algorithm for a cloaking region to satisfy both requirements of fast query processing time and high precision of the retrieved result. To achieve fast query processing time, we propose a new pruning technique based on a 2D-coodinate scheme. In addition, we make use of a Voronoi diagram for retrieving the nearest POIs efficiently. To satisfy the requirement of high precision of the retrieved result, we guarantee that our k-NN query processing algorithm always contains the exact set of k nearest neighbors. Our performance analysis shows that our algorithm achieves better performance in terms of query processing time and the number of candidate POIs compared with other algorithms. 相似文献

18.

Efficient successor retrieval operations for aggregate query processing on clustered road networks

Engin Demir 《Information Sciences》2010,180(14):2743-2762

Get-Successors (GS) which retrieves all successors of a junction is a kernel operation used to facilitate aggregate computations in road network queries. Efficient implementation of the GS operation is crucial since the disk access cost of this operation constitutes a considerable portion of the total query processing cost. Firstly, we propose a new successor retrieval operation Get-Unevaluated-Successors (GUS), which retrieves only the unevaluated successors of a given junction. The GUS operation is an efficient implementation of the GS operation, where the candidate successors to be retrieved are pruned according to the properties and state of the algorithm. Secondly, we propose a hypergraph-based model for clustering successively retrieved junctions by the GUS operations to the same pages. The proposed model utilizes query logs to correctly capture the disk access cost of GUS operations. The proposed GUS operation and associated clustering model are evaluated for two different instances of GUS operations which typically arise in Dijkstra’s single source shortest path algorithm and incremental network expansion framework. Our simulation results show that the proposed successor retrieval operation together with the proposed clustering hypergraph model is quite effective in reducing the number of disk accesses in query processing. 相似文献

19.

No-but-semantic-match: computing semantically matched xml keyword search results

Mehdi?Naseriparsa Email author Md.?Saiful?Islam Chengfei?Liu Irene?Moser 《World Wide Web》2018,21(5):1223-1257

Users are rarely familiar with the content of a data source they are querying, and therefore cannot avoid using keywords that do not exist in the data source. Traditional systems may respond with an empty result, causing dissatisfaction, while the data source in effect holds semantically related content. In this paper we study this no-but-semantic-match problem on XML keyword search and propose a solution which enables us to present the top-k semantically related results to the user. Our solution involves two steps: (a) extracting semantically related candidate queries from the original query and (b) processing candidate queries and retrieving the top-k semantically related results. Candidate queries are generated by replacement of non-mapped keywords with candidate keywords obtained from an ontological knowledge base. Candidate results are scored using their cohesiveness and their similarity to the original query. Since the number of queries to process can be large, with each result having to be analyzed, we propose pruning techniques to retrieve the top-k results efficiently. We develop two query processing algorithms based on our pruning techniques. Further, we exploit a property of the candidate queries to propose a technique for processing multiple queries in batch, which improves the performance substantially. Extensive experiments on two real datasets verify the effectiveness and efficiency of the proposed approaches. 相似文献

20.

基于频繁项集和相关性的局部反馈查询扩展

下载免费PDF全文

黄名选冯平马瑞兴《计算机工程》2011,37(23):66-68

针对信息检索中存在的词不匹配问题,提出一种基于频繁项集和相关性的局部反馈查询扩展算法。设计查询扩展模型和扩展词权重计算方法,从前列n篇初检文档中,挖掘同时含有查询词项、非查询词项的频繁项集,在该频繁项集中提取非查询词项作为候选扩展词,计算每个候选扩展词与整个查询的相关性,并根据该相关性得到最终的扩展词,以此实现查询扩展。实验结果表明,该算法能有效提高信息检索的性能。相似文献