A total of 20 similar documents were found (search time: 0 ms).
1.
Francesco Bonchi Fosca Giannotti Claudio Lucchese Salvatore Orlando Raffaele Perego Roberto Trasarti 《Information Systems》2009
In this article we present ConQueSt, a constraint-based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive, and iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language, which allows the discovery process to be effectively driven toward potentially interesting patterns. These constraints are also exploited to reduce the cost of pattern mining computation. ConQueSt is a comprehensive mining system that can access real-world relational databases from which to extract data. Through a friendly graphical user interface (GUI), the user can define complex mining queries with a few clicks. After a pre-processing step, mining queries are answered by an efficient and robust pattern mining engine that incorporates state-of-the-art data and search-space reduction techniques. The resulting patterns are then presented to the user in a pattern browsing window and can optionally be stored back in the underlying database as relations.
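To illustrate why constraints can reduce mining cost, here is a minimal Python sketch of level-wise (Apriori-style) frequent itemset mining with an anti-monotone constraint pushed into the search. It is not ConQueSt's engine; the function name `constrained_apriori` and the example constraint (total item price at most `max_cost`) are hypothetical. Because both minimum support and this price constraint are anti-monotone, any itemset that violates either one can be discarded together with all of its supersets.

```python
from itertools import combinations


def constrained_apriori(transactions, min_support, price, max_cost):
    """Frequent itemsets that also satisfy the (hypothetical) constraint sum(price) <= max_cost."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Absolute support: number of transactions containing the itemset.
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        # Both conditions are anti-monotone, so failing either prunes all supersets.
        return support(itemset) >= min_support and sum(price[i] for i in itemset) <= max_cost

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]))]
    result = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join surviving (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself have survived.
        candidates = [c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))]
        level = [c for c in candidates if satisfies(c)]
        result.extend(level)
        k += 1
    return result
```

For example, with items priced `a=1, b=1, c=5` and `max_cost=3`, item `c` is pruned at the first level even if it is frequent, so no candidate containing `c` is ever generated.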
2.
《Expert systems with applications》2014,41(14):6098-6105
Streaming time series segmentation is one of the major problems in streaming time series mining. It creates a high-level representation of streaming time series and thus provides important support for many time series mining tasks, such as indexing, clustering, classification, and discord discovery. However, the data elements of a streaming time series, which usually arrive online, change quickly and are unbounded in size, which places higher demands on the computing efficiency of segmentation. Segmenting streaming time series accurately under computing-efficiency constraints is therefore a challenging task. In this paper, we propose the exponential smoothing prediction-based segmentation algorithm (ESPSA). The algorithm is based on a sliding window model and uses the standard exponential smoothing method to compute the smoothed value of each arriving data element, which serves as the prediction of future data. To determine whether a data element is a segmenting key point, we study the statistical characteristics of the prediction error and then derive the relationship between the prediction error and the compression rate. Extensive experiments on both synthetic and real datasets demonstrate that the proposed algorithm segments streaming time series effectively and efficiently. More importantly, compared with candidate algorithms, the proposed algorithm reduces computing time by orders of magnitude.
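The core idea (use the exponentially smoothed value as a one-step-ahead prediction and flag a key point when the prediction error is large) can be sketched in Python as follows. This is a simplified illustration, not ESPSA itself: the fixed `threshold` is a hypothetical stand-in for the error bound the paper derives from the compression rate, the sliding window is omitted, and restarting the smoother at each key point is our own assumption.

```python
from typing import Iterable, Iterator


def esp_key_points(stream: Iterable[float], alpha: float = 0.5,
                   threshold: float = 1.0) -> Iterator[int]:
    """Yield indices of segment key points in a stream (simplified sketch).

    alpha: smoothing factor of simple exponential smoothing.
    threshold: HYPOTHETICAL fixed bound on the prediction error.
    """
    smoothed = None
    for t, x in enumerate(stream):
        if smoothed is None:
            smoothed = x
            yield t  # the first element always starts a segment
            continue
        error = abs(x - smoothed)  # smoothed value acts as the prediction for x
        if error > threshold:
            yield t
            smoothed = x  # assumption: restart smoothing at the new segment
        else:
            smoothed = alpha * x + (1 - alpha) * smoothed
```

Because it consumes one element at a time and keeps only the current smoothed value, the sketch runs in O(1) memory per stream, which is the property that makes smoothing-based prediction attractive for unbounded streams.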
3.
Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are of particular interest because many real-world tasks require them. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores label information that is valuable for classification. LDA, on the other hand, suffers from the limited number of low-dimensional spaces it can produce and from the singularity problem. Recently, the Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm does not fit the streaming data model and cannot handle large-scale, high-dimensional data sets. An effective, efficient, and scalable approach is therefore needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm, and an extension of it, that infers adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and real datasets demonstrate the superior performance of the proposed algorithm on streaming data.
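For reference, the batch form of the maximum margin criterion maximizes tr(W^T (S_b - S_w) W), where S_b and S_w are the between-class and within-class scatter matrices; the optimal W consists of the top eigenvectors of S_b - S_w. The following NumPy sketch shows this batch computation only. It is not the paper's incremental streaming algorithm, and the function name `mmc_projection` is hypothetical. Note that it never inverts S_w, which is how MMC sidesteps LDA's singularity problem.

```python
import numpy as np


def mmc_projection(X, y, k):
    """Batch MMC: return a (d, k) projection from the top-k eigenvectors of S_b - S_w."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        p_c = len(Xc) / n  # class prior
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += p_c * (diff @ diff.T)  # between-class scatter
        centered = Xc - mean_c
        S_w += p_c * (centered.T @ centered) / len(Xc)  # within-class scatter
    # S_b - S_w is symmetric, so eigh applies; no inverse of S_w is required.
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]]
```

Usage: `W = mmc_projection(X, y, k)` followed by `Z = X @ W` projects the data onto the k most discriminative directions under this criterion.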
4.
Abdullah Mueen Eamonn Keogh Qiang Zhu Sydney S. Cash M. Brandon Westover Nima Bigdely-Shamlo 《Data mining and knowledge discovery》2011,22(1-2):73-105
Time series motifs are sets of very similar subsequences of a long time series. They are of interest in their own right and are also used as inputs to several higher-level data mining algorithms, including classification, clustering, rule discovery, and summarization. In spite of extensive research in recent years, finding time series motifs exactly in massive databases remains an open problem. Previous efforts either found approximate motifs or considered relatively small datasets residing in main memory. In this work, we build on previous work on pivot-based indexing to introduce a disk-aware algorithm that finds time series motifs exactly in multi-gigabyte databases containing on the order of tens of millions of time series. We have evaluated our algorithm on datasets from diverse areas, including medicine, anthropology, computer networking, and image processing, and show that we can find interesting and meaningful motifs in datasets that are many orders of magnitude larger than anything considered before.
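The pivot-based pruning idea can be illustrated with a small in-memory Python sketch that finds the exact closest pair (the motif pair) among equal-length series. It uses one reference series and the triangle inequality, |d(r, a) - d(r, b)| <= d(a, b), as a lower bound. This is only an illustration of the pruning principle, not the paper's disk-aware algorithm; the names `exact_motif_pair` and `ref_index` are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_motif_pair(series, ref_index=0):
    """Exact closest pair of series, pruned with a single pivot (in-memory sketch)."""
    ref = series[ref_index]
    dist_ref = [euclidean(ref, s) for s in series]
    # Sort by distance to the pivot so lower bounds grow with the offset in this order.
    order = sorted(range(len(series)), key=lambda i: dist_ref[i])
    best, best_pair = math.inf, None
    for gap in range(1, len(order)):
        all_pruned = True
        for j in range(len(order) - gap):
            a, b = order[j], order[j + gap]
            lower = dist_ref[b] - dist_ref[a]  # triangle-inequality lower bound
            if lower < best:
                all_pruned = False
                d = euclidean(series[a], series[b])
                if d < best:
                    best, best_pair = d, (min(a, b), max(a, b))
        # Larger gaps have even larger lower bounds, so we can stop here.
        if all_pruned:
            break
    return best_pair, best
```

Pairs whose lower bound already exceeds the best distance found so far are never compared, which is where the savings come from when the pivot spreads the data out well.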
5.
Finding clusters in data is a challenging problem. Given a dataset, we usually do not know the number of natural clusters hidden in it. The problem is exacerbated when there is little or no additional information beyond the data itself. This paper proposes a general stochastic clustering method that is a simplification of the nature-inspired ant-based clustering approach. It begins with a basic solution and then performs a stochastic search to incrementally improve the solution until the underlying clusters emerge, resulting in automatic cluster discovery. This method differs from several recent methods in that it does not require users to input the number of clusters and makes no explicit assumption about the underlying distribution of a dataset. Our experimental results show that the proposed method outperforms several existing methods in clustering accuracy and efficiency on the majority of the datasets used in this study. Our theoretical analysis shows that the proposed method has linear time and space complexity, and our empirical study shows that it can accurately and efficiently discover clusters in large datasets on which many existing methods fail to run.
6.
Given a text T and a pattern P, the order-preserving pattern matching (OPPM) problem is to find all substrings of T that have the same relative order as P. OPPM has been studied for finding patterns characterized by relative order rather than by absolute values. In this paper, we present a method for deciding order-isomorphism between two strings even when they contain equal characters. We then show that the bad character rule of the Horspool algorithm for generic pattern matching can be applied to the OPPM problem, and we present a space-efficient algorithm for computing shift tables for text search. Finally, we combine our bad character rule with the KMP-based algorithm to improve the worst-case running time. Experimental results show that our algorithm is about 2 to 6 times faster than the KMP-based algorithm in reasonable cases.
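A simple way to decide order-isomorphism that also handles equal characters is to compare dense-rank encodings: two sequences are order-isomorphic exactly when replacing each value by its rank among the distinct values yields identical rank sequences. The Python sketch below uses this test in a naive O(nm log m) OPPM search. It is a baseline for illustration only, not the Horspool- or KMP-based algorithms described above, and the function names are hypothetical.

```python
def dense_ranks(seq):
    """Map each value to its rank among the distinct values (equal values share a rank)."""
    distinct = sorted(set(seq))
    rank = {v: r for r, v in enumerate(distinct)}
    return [rank[v] for v in seq]


def order_isomorphic(a, b):
    return len(a) == len(b) and dense_ranks(a) == dense_ranks(b)


def oppm_naive(text, pattern):
    """Start positions of all windows of `text` that are order-isomorphic to `pattern`."""
    m = len(pattern)
    target = dense_ranks(pattern)
    return [i for i in range(len(text) - m + 1)
            if dense_ranks(text[i:i + m]) == target]
```

For instance, `order_isomorphic([1, 1, 2], [5, 5, 9])` holds while `order_isomorphic([1, 1, 2], [5, 6, 9])` does not, because the second pair disagrees on which positions are equal.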
7.
8.
Hyperclique pattern discovery (Total citations: 6; self-citations: 0; citations by others: 6)
Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial
search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support.
Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly
correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns.
In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the
items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine
similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support
property which can help efficiently eliminate spurious patterns involving items with substantially different support levels.
Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures.
In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties
of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that
hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
Vipin Kumar
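The h-confidence measure and the cross-support property can be made concrete with a short Python sketch. It assumes the standard definition from the hyperclique literature, hconf(P) = supp(P) / max over i in P of supp({i}), and the resulting cross-support upper bound hconf(P) <= min_i supp({i}) / max_i supp({i}), which follows because supp(P) can never exceed the support of its rarest item. The function names are hypothetical and the hyperclique miner itself is not reproduced.

```python
def support(itemset, transactions):
    """Relative support of an itemset in a list of transaction sets."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def h_confidence(itemset, transactions):
    # Assumed standard definition: supp(P) / max_{i in P} supp({i}).
    max_item = max(support([i], transactions) for i in itemset)
    return support(itemset, transactions) / max_item


def cross_support_bound(itemset, transactions):
    # Upper bound on h-confidence that needs only single-item supports,
    # so patterns mixing very rare and very common items can be pruned early.
    supps = [support([i], transactions) for i in itemset]
    return min(supps) / max(supps)
```

A pattern is then a hyperclique at threshold h_c when `h_confidence(P, T) >= h_c`; if `cross_support_bound(P, T) < h_c`, P can be discarded without counting its joint support at all.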
9.
10.
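Dynamic Replacement Algorithm (DRA) for Constant Bit Rate (CBR) Streaming Media (Total citations: 1; self-citations: 0; citations by others: 1)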
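Streaming media proxy server caching is a technique for streaming media access that can effectively improve the quality of streaming media access. The caching algorithm is an important component of a caching proxy server and covers three aspects: the cache storage scheme, the replacement algorithm, and the admission policy. This paper presents a set of caching algorithms for constant bit rate (CBR) streaming media which, in addition to a bit-rate-tiered storage method, uses a dynamic replacement algorithm (DRA, Dynamic Replication Algorithm) that incorporates the basic characteristics of streaming media caching.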
11.
A new and direct procedure is presented for determining state-space representations of given, time-invariant systems whose dynamical behavior is expressed in a more general, differential operator form. The procedure employs some preliminary polynomial matrix operations, if necessary, in order to “reduce” the given system to an equivalent differential operator form which satisfies four specific conditions. An equivalent state-space representation is then determined in a most direct manner; i.e. the algorithm presented requires only a single matrix inversion. An explicit relationship between the partial state and input of the given system and the state of the equivalent state-space system is also obtained.
12.
Kramer D.M. Kaufman L. Guzman R.J. Hawryszko C. 《Computer Graphics and Applications, IEEE》1990,10(2):62-65
It is shown how to generate oblique slices from a set of parallel slices. An algorithm is presented that can produce planes or contours through the volume without any loss of the volume resolution of the original data set. The algorithm uses the Fourier-shift theorem and is efficient for calculating large numbers of slices. Although the algorithm is general, it is particularly well suited to three-dimensional magnetic resonance images, as demonstrated with examples.
13.
Dubin Ran Shalala Raffael Dvir Amit Pele Ofir Hadar Ofer 《Multimedia Tools and Applications》2019,78(9):11203-11222
Multimedia Tools and Applications - The increasing popularity of online video content and adaptive video streaming services, especially those based on HTTP Adaptive Streaming (HAS) highlights the...
14.
Yunpeng Chai Yinong Chen 《Journal of Systems and Software》2009,82(8):1344-1361
The optimization of Clustered Streaming Media Servers (CSMS), which aims to use as few hardware resources as possible and to be as cost-effective as possible while providing satisfactory performance and QoS, has a great impact on the practicability and efficiency of CSMS. Based on an analysis and formalization of the critical performance factors of CSMS and of the relationships among performance, QoS, and cost in CSMS, a stepwise optimization algorithm is developed to solve the optimization problem efficiently. The algorithm models the optimization problem as a directed acyclic graph and then solves the complex problem step by step. It applies a divide-and-conquer model that not only reduces the complexity of the optimization problem but also accelerates the optimization process. Progressive information is collected during the process and used in solving the problem. Furthermore, the optimization algorithm requires a CSMS simulation system to generate accurate information about the entire streaming service process. We therefore designed and implemented such a simulation system based on the theoretical performance model of CSMS and on parameters measured on a practical CSMS testbed. Finally, a case study of the optimization problem demonstrates the process of the algorithm, and an appropriate plan for designing a practical CSMS is illustrated.
15.
We have developed a new algorithm for invertebrate expressed sequence tag (EST) analysis, termed the fmEST algorithm, which
consists of a systematic homology search, functional motif scanning, and clustering alignment. This study was undertaken to
evaluate the validity of our fmEST algorithm in functional motif discovery for invertebrate EST sequence data. Out of 200
unidentified invertebrate ESTs, including 100 arthropod ESTs and 100 mollusk ESTs, 18 arthropod ESTs and 21 mollusk ESTs were
identified as fmESTs that contained functional motifs. The nucleotide lengths of arthropod fmEST and mollusk fmEST sequences
were distributed from 388 to 954 bp and from 222 to 742 bp, respectively. This result allowed us to annotate these invertebrate
fmESTs as various functional genes, while they showed no significant homology to the gene information recorded in the international
DNA databases using the conventional BLAST homology search program. In addition, one further arthropod EST and 23 mollusk ESTs
were assembled into contigs with identified fmESTs by clustering alignment. Based on these findings, we have concluded
that our fmEST algorithm, involving the functional motif discovery procedure, is a valuable approach, enabling us to break
new ground in undeveloped invertebrate EST analysis.
This work was presented in part at the 11th International Symposium on Artificial Life and Robotics, Oita, Japan, January
23–25, 2006
16.
A pattern adaptive thinning algorithm (Total citations: 3; self-citations: 0; citations by others: 3)
A simple sequential thinning algorithm for peeling off pixels along contours is described. An adaptive algorithm, obtained by incorporating shape adaptivity into this sequential process, is also given. The adaptive algorithm minimizes skeleton distortions at right-angle and acute-angle corners. It also eliminates the skeleton asymmetry that is characteristic of sequential algorithms and is caused by T-corners in some even-thickness patterns. Its performance (in terms of time requirements and shape preservation) is compared with that of a modern thinning algorithm.
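For readers unfamiliar with thinning, the following Python sketch implements the well-known Zhang-Suen algorithm, a parallel two-subiteration method that repeatedly peels contour pixels while preserving connectivity. It is a swapped-in reference technique, not the sequential or pattern-adaptive algorithm described above (whose rules the abstract does not give). The image is assumed to be a list of 0/1 rows with a zero border; `zhang_suen_thin` is a hypothetical name.

```python
def zhang_suen_thin(image):
    """Zhang-Suen thinning of a binary image (list of 0/1 rows with a zero border)."""
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise from north (N, NE, E, SE, S, SW, W, NW).
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions around the 8-neighbourhood
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # Deletions are applied after the scan, which makes each subiteration parallel.
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img
```

Because deletions within a subiteration are collected first and applied afterwards, the result does not depend on scan order. A sequential algorithm instead deletes pixels as it scans, which is the source of the skeleton asymmetry that the adaptive algorithm above is designed to remove.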
17.
Recent years have witnessed increasing interest in computing cosine similarity between high-dimensional documents, transactions, gene sequences, and so on. Most previous studies limited their scope to pairs of items and cannot be adapted to multi-item sets. Therefore, from a frequent pattern mining perspective, there is still a critical need to discover interesting patterns whose cosine similarity values are above given thresholds. The crux of this problem, however, is that cosine similarity does not have the anti-monotone property. To meet this challenge, we propose the notions of the conditional anti-monotone property and the Support-Ascending Set Enumeration Tree (SA-SET). We prove that cosine similarity has the conditional anti-monotone property and can therefore be used for interesting pattern mining if the itemset traversal order is defined by the SA-SET. We also identify the anti-monotone property of an upper bound of the cosine similarity, which can be used to further prune candidate itemsets. An Apriori-like algorithm called CosMiner is then proposed to mine cosine interesting patterns from large-scale multi-item databases. Experimental results show that CosMiner can efficiently identify interesting patterns using the conditional anti-monotone property of cosine similarity and the anti-monotone property of its upper bound, even at extremely low levels of support.
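The following Python sketch shows why plain level-wise pruning fails for cosine similarity. It assumes the generalized multi-item cosine cos(P) = supp(P) / (product of supp({i}) for i in P)^(1/|P|), which reduces to the usual cosine for two items; whether CosMiner uses exactly this form is our assumption. The function names are hypothetical.

```python
import math


def support(itemset, transactions):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def itemset_cosine(itemset, transactions):
    # Assumed generalization: supp(P) divided by the geometric mean of item supports.
    item_supps = [support([i], transactions) for i in itemset]
    denom = math.prod(item_supps) ** (1 / len(item_supps))
    return support(itemset, transactions) / denom
```

With transactions `{a, b, c}, {a}, {b}, {a}, {b}`, the pair `{a, b}` has cosine 0.2 / 0.6 ≈ 0.33, but its superset `{a, b, c}` scores about 0.48, because adding the rare item `c` lowers the geometric mean in the denominator more than it lowers the joint support. A superset can therefore pass a threshold its subset fails, so the measure is not anti-monotone and needs the ordering and upper-bound tricks described above.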
18.
Anonymity preserving pattern discovery (Total citations: 5; self-citations: 0; citations by others: 5)
Maurizio Atzori Francesco Bonchi Fosca Giannotti Dino Pedreschi 《The VLDB Journal The International Journal on Very Large Data Bases》2008,17(4):703-727
It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required
statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case
of the minimum support threshold in frequent pattern mining. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity
from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context
of pattern discovery, and provide a methodology to efficiently and effectively identify all such possible threats that arise
from the disclosure of the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that
allows the disclosure of the extracted knowledge while protecting the anonymity of the individuals in the source database.
Moreover, in order to handle the cases where the threats to anonymity cannot be avoided, we study how to eliminate such threats
by means of pattern (not data!) distortion performed in a controlled way.
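One way such threats arise is through inference: from the disclosed supports of itemsets I and I ∪ X for X ⊆ J, an attacker can compute by inclusion-exclusion how many individuals contain all of I and none of J. If that count is positive but below k, the disclosure identifies fewer than k people. The Python sketch below performs this check, assuming absolute support counts keyed by `frozenset`; it illustrates the idea only and is not the paper's full detection methodology. The function names are hypothetical.

```python
from itertools import combinations


def inferred_support(I, J, supports):
    """Count of transactions containing all items of I and none of J (inclusion-exclusion).

    supports: dict mapping frozenset -> absolute support, assumed to contain I ∪ X for every X ⊆ J.
    """
    I, J = frozenset(I), frozenset(J)
    total = 0
    for r in range(len(J) + 1):
        for extra in combinations(sorted(J), r):
            total += (-1) ** r * supports[I | frozenset(extra)]
    return total


def is_anonymity_threat(I, J, supports, k):
    # A threat: the inferred group is non-empty but smaller than k.
    return 0 < inferred_support(I, J, supports) < k
```

For example, disclosing supp({a}) = 10 and supp({a, b}) = 9 reveals that exactly one individual has `a` but not `b`, which is a threat for any k > 1, even though both disclosed patterns individually meet the support threshold.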
19.
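To address the shortcomings of congestion control mechanisms for streaming media transmission, an improved TFRC algorithm based on the trend of link delay jitter is proposed. The conventional TFRC congestion control algorithm and the trend of link delay jitter are analyzed. A strategy of predicting the link congestion state is adopted: a jitter factor is introduced to modify the TFRC throughput equation, and the sending rate is adjusted adaptively according to the trend of link delay jitter. Simulation results show that, while maintaining TCP-friendliness, the improved algorithm effectively improves the smoothness and stability of streaming media data transmission.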
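For context, standard TFRC (RFC 5348) computes the allowed sending rate as X = s / (R·sqrt(2bp/3) + t_RTO·(3·sqrt(3bp/8))·p·(1 + 32p²)), where s is the segment size, R the round-trip time, p the loss event rate, and b the number of packets acknowledged per ACK. The Python sketch below implements that equation. The abstract does not specify how the jitter factor enters the formula, so `jitter_factor` here is a purely hypothetical multiplicative hook, not the paper's correction.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None, jitter_factor=1.0):
    """Allowed sending rate (bytes/s) from the RFC 5348 TCP throughput equation.

    s: segment size in bytes; rtt: round-trip time in seconds;
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK.
    jitter_factor: HYPOTHETICAL stand-in for the paper's jitter-based correction.
    """
    if t_rto is None:
        t_rto = 4 * rtt  # simplification recommended by RFC 5348
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return jitter_factor * s / denom
```

In a jitter-aware variant, a rising delay-jitter trend (an early sign of congestion) would lower the factor below 1 so the sender backs off before losses occur; the exact mapping from jitter trend to factor is left open here.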