Similar Documents
20 similar documents found
1.
In this article we present ConQueSt, a constraint-based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive, and iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language that allows the discovery process to be driven effectively toward potentially interesting patterns. The same constraints are also exploited to reduce the cost of the pattern-mining computation. ConQueSt is a comprehensive mining system that can extract data directly from real-world relational databases. Through a friendly graphical user interface (GUI), the user can define complex mining queries with a few clicks. After a pre-processing step, mining queries are answered by an efficient and robust pattern-mining engine that incorporates state-of-the-art data and search-space reduction techniques. The resulting patterns are presented to the user in a pattern-browsing window and can optionally be stored back in the underlying database as relations.
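Pushing constraints into the mining loop is what lets a constraint-based system cut computation. The sketch below is not ConQueSt's engine; it is a minimal level-wise (Apriori-style) miner, assuming a hypothetical item-price table, that prunes with two anti-monotone constraints: minimum support and a maximum total price. Because every superset of a failing itemset also fails, failing itemsets are never extended. The function and parameter names are illustrative.

```python
from itertools import combinations

def constrained_apriori(transactions, min_support, price, max_total_price):
    """Level-wise itemset mining with two anti-monotone constraints pushed into the search.

    `price` is a hypothetical item -> non-negative cost table used only for illustration.
    """
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def satisfies(itemset):
        # Both tests are anti-monotone: if an itemset fails, every superset fails too.
        return (support(itemset) >= min_support
                and sum(price[i] for i in itemset) <= max_total_price)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]))]
    result = list(level)
    k = 2
    while level:
        prev = set(level)
        # Join surviving (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a candidate only if all of its (k-1)-subsets survived and it passes the constraints.
        level = [c for c in candidates
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and satisfies(c)]
        result.extend(level)
        k += 1
    return result
```

For example, with item prices a=1, b=2, c=5, a budget of 4 eliminates c at the first level, so no superset of c is ever counted.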

2.
Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are of particular interest because of their importance in real-world tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores label information that is valuable for classification. LDA, on the other hand, degrades because the number of low-dimensional projections it can produce is limited and because it suffers from the singularity problem. The Maximum Margin Criterion (MMC) was recently proposed to overcome these shortcomings of PCA and LDA. However, the original MMC algorithm does not fit the streaming data model and therefore cannot handle large-scale, high-dimensional datasets, so an effective, efficient, and scalable approach is needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm, together with an extension of it, that infers adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and on real datasets demonstrate the superior performance of the proposed algorithm on streaming data.
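MMC is usually stated as maximizing tr(W^T (S_b - S_w) W) over orthonormal W, which is solved by the leading eigenvectors of S_b - S_w; because no inverse of S_w is required, the singularity problem does not arise. The sketch below, assuming NumPy, keeps only per-class counts, per-class sums, and a running sum of outer products, so S_b and S_w can be recomputed exactly while samples arrive one at a time. It is a batch-exact illustration, not the paper's incremental eigen-update, and the class and method names are hypothetical.

```python
import numpy as np

class StreamingMMC:
    """Accumulates sufficient statistics for the Maximum Margin Criterion from a stream."""

    def __init__(self, dim):
        self.dim = dim
        self.counts = {}                    # label -> n_i
        self.sums = {}                      # label -> sum of samples in class i
        self.outer = np.zeros((dim, dim))   # sum over all samples of x x^T
        self.n = 0

    def partial_fit(self, x, label):
        x = np.asarray(x, dtype=float)
        self.counts[label] = self.counts.get(label, 0) + 1
        self.sums[label] = self.sums.get(label, np.zeros(self.dim)) + x
        self.outer += np.outer(x, x)
        self.n += 1

    def scatter_matrices(self):
        """Return (S_b, S_w), both normalised by the total sample count n."""
        m = sum(self.sums.values()) / self.n        # global mean
        s_b = np.zeros((self.dim, self.dim))
        class_part = np.zeros((self.dim, self.dim))
        for label, n_i in self.counts.items():
            m_i = self.sums[label] / n_i
            d = m_i - m
            s_b += n_i * np.outer(d, d)
            class_part += n_i * np.outer(m_i, m_i)
        # sum_x (x - m_i)(x - m_i)^T = sum_x x x^T - sum_i n_i m_i m_i^T
        s_w = self.outer - class_part
        return s_b / self.n, s_w / self.n

    def projection(self, k):
        """Top-k eigenvectors of S_b - S_w (columns of W)."""
        s_b, s_w = self.scatter_matrices()
        vals, vecs = np.linalg.eigh(s_b - s_w)      # eigh: ascending eigenvalues
        order = np.argsort(vals)[::-1][:k]
        return vecs[:, order]
```

The statistics cost O(d^2) memory regardless of stream length; the eigendecomposition is only run when a projection is requested.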

3.
Streaming time series segmentation is one of the major problems in streaming time series mining. It produces a high-level representation of a streaming time series and thus supports many time series mining tasks, such as indexing, clustering, classification, and discord discovery. However, the data elements of a streaming time series usually arrive online, change rapidly, and are unbounded in number, which places high demands on the computational efficiency of segmentation. Segmenting streaming time series accurately under this efficiency constraint is therefore challenging. In this paper, we propose the exponential smoothing prediction-based segmentation algorithm (ESPSA). The algorithm is built on a sliding window model and uses the standard exponential smoothing method to compute a smoothed value for each arriving data element, which serves as the prediction of future data. To decide whether a data element is a segmentation key point, we study the statistical characteristics of the prediction error and derive the relationship between the prediction error and the compression rate. Extensive experiments on both synthetic and real datasets demonstrate that the proposed algorithm segments streaming time series effectively and efficiently. More importantly, compared with the candidate algorithms, it reduces computing time by orders of magnitude.
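A minimal sketch of prediction-based key-point detection with simple exponential smoothing: the smoothed value s_{t-1} predicts x_t, and a large prediction error marks a new segment. Assumptions: a fixed error `threshold` stands in for the paper's threshold, which is derived from the relationship between prediction error and compression rate; the smoothing factor `alpha` and the restart-on-key-point rule are illustrative choices, and only O(1) state is kept instead of an explicit sliding window. The function name is hypothetical.

```python
def segment_stream(stream, alpha=0.3, threshold=1.0):
    """Return indices of key points where the smoothed prediction misses by more than `threshold`."""
    key_points = []
    smoothed = None
    for t, x in enumerate(stream):
        if smoothed is None:
            key_points.append(t)            # the first element always starts a segment
            smoothed = x
            continue
        error = abs(x - smoothed)           # prediction error of s_{t-1} for x_t
        if error > threshold:
            key_points.append(t)
            smoothed = x                    # restart smoothing at the new segment
        else:
            smoothed = alpha * x + (1 - alpha) * smoothed
    return key_points
```

A step from a flat level of 0 to a flat level of 5 yields key points at the first sample and at the step.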

4.
Time series motifs are sets of very similar subsequences of a long time series. They are of interest in their own right and are also used as inputs to several higher-level data mining algorithms, including classification, clustering, rule discovery, and summarization. Despite extensive research in recent years, finding time series motifs exactly in massive databases remains an open problem. Previous efforts either found approximate motifs or considered relatively small datasets residing in main memory. In this work, we build on previous work on pivot-based indexing to introduce a disk-aware algorithm that finds time series motifs exactly in multi-gigabyte databases containing on the order of tens of millions of time series. We evaluated our algorithm on datasets from diverse areas, including medicine, anthropology, computer networking, and image processing, and show that we can find interesting and meaningful motifs in datasets many orders of magnitude larger than anything considered before.
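Pivot-based indexing helps exact motif search through the triangle inequality: for any pivot p, |d(a,p) - d(b,p)| <= d(a,b), so a pair whose reference distances already differ by at least the best distance found so far can be skipped. The sketch below is an in-memory illustration of this pruning for the closest pair (the 1-motif) among equal-length series. It is not disk-aware, uses a single arbitrarily chosen pivot, and its names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_pair_with_pivot(series, pivot_index=0):
    """Exact closest pair using the pivot lower bound |d(a,p) - d(b,p)| <= d(a,b)."""
    pivot = series[pivot_index]
    ref = [euclidean(s, pivot) for s in series]
    order = sorted(range(len(series)), key=lambda i: ref[i])   # linear ordering by reference distance
    best = (math.inf, None, None)
    # Compare items `offset` apart in the ordering; larger offsets have larger lower bounds.
    for offset in range(1, len(order)):
        any_candidate = False
        for j in range(len(order) - offset):
            a, b = order[j], order[j + offset]
            if ref[b] - ref[a] >= best[0]:
                continue                    # lower bound already no better than best: prune
            any_candidate = True
            d = euclidean(series[a], series[b])
            if d < best[0]:
                best = (d, a, b)
        if not any_candidate:
            break                           # every pair at this offset pruned, so all larger offsets are too
    return best
```

Early termination is safe because, in the sorted order, the lower bound of pair (j, j+offset+1) is never smaller than that of (j, j+offset).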

5.
Finding clusters in data is a challenging problem. Given a dataset, we usually do not know how many natural clusters it contains, and the problem is exacerbated when little or no information is available beyond the data itself. This paper proposes a general stochastic clustering method that simplifies the nature-inspired, ant-based clustering approach. It begins with a basic solution and then performs a stochastic search that incrementally improves the solution until the underlying clusters emerge, so that clusters are discovered automatically. Unlike several recent methods, it does not require the user to specify the number of clusters and makes no explicit assumption about the underlying distribution of the dataset. Our experimental results show that the proposed method outperforms several existing methods in clustering accuracy and efficiency on the majority of the datasets used in this study. Our theoretical analysis shows that it has linear time and space complexity, and our empirical study shows that it can accurately and efficiently discover clusters in large datasets on which many existing methods fail to run.

6.
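In traditional peer-to-peer (P2P) networks for streaming media delivery, the overlay is built at the application layer, and its logical structure may not match the physical network topology, which causes high reception delay at nodes and poor network utilization. To address this problem, a topology-aware P2P organization algorithm called TaP2P (Topology-aware Peer-to-Peer) is proposed. It dynamically adjusts each node's position in the overlay according to the node's distance from the data source, so that data forwarding paths conform to the physical network topology. Simulation experiments show that the algorithm effectively reduces the average reception delay of nodes.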

7.
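SNMP-Based Remote Network Topology Discovery Method (total citations: 1; self: 0; others: 1)
A topology discovery method for large heterogeneous IP networks based on the Simple Network Management Protocol (SNMP) is proposed and implemented. The method consists of three steps: agent discovery, topology information probing, and topology information analysis. Key issues are discussed, including the construction of probe packets for agent discovery, the removal of redundant information, the information analysis algorithm, and the use of information from non-forwarding devices. Problems encountered during probing, such as routers that respond only intermittently, routers that do not respond for excessively long periods, and probe targets that are subnet numbers, are analyzed, and solutions are given. Engineering results show that the method obtains rich topology information efficiently and that, when combined with traceroute path probing, it greatly improves the completeness of topology discovery results.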

8.
Hyperclique pattern discovery (total citations: 6; self: 0; others: 6)
Existing algorithms for mining association patterns often rely on a support-based pruning strategy to prune the combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. It also tends to generate too many spurious patterns involving items that come from different support levels and are poorly correlated. In this paper, we present a framework for mining highly correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another, as measured by the cosine similarity (uncentered Pearson's correlation coefficient). We also show that the h-confidence measure satisfies a cross-support property, which helps efficiently eliminate spurious patterns involving items with substantially different support levels. Moreover, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, we propose an algorithm called hyperclique miner that exploits both the cross-support and the anti-monotone properties of the h-confidence measure to discover hyperclique patterns efficiently. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
Vipin Kumar
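The h-confidence of an itemset P is supp(P) divided by the largest single-item support in P, and the cross-support bound follows from supp(P) <= min_i supp({i}). The sketch below computes both quantities directly from transactions held as Python sets. It is a brute-force illustration of the measures, not the hyperclique miner algorithm, and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def h_confidence(itemset, transactions):
    """hconf(P) = supp(P) / max over i in P of supp({i})."""
    itemset = frozenset(itemset)
    max_single = max(support([i], transactions) for i in itemset)
    return support(itemset, transactions) / max_single if max_single else 0.0

def cross_support_bound(itemset, transactions):
    """Upper bound on hconf(P): min_i supp({i}) / max_j supp({j}), since supp(P) <= min_i supp({i})."""
    singles = [support([i], transactions) for i in frozenset(itemset)]
    return min(singles) / max(singles) if max(singles) else 0.0
```

If `cross_support_bound(P, ...)` is already below the threshold h_c, P cannot be a hyperclique pattern and can be discarded without counting supp(P).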

9.
Given a text T and a pattern P, the order-preserving pattern matching (OPPM) problem is to find all substrings of T that have the same relative order as P. OPPM has been studied for finding patterns characterized by relative order rather than by absolute values. In this paper, we present a method for deciding order-isomorphism between two strings even when they contain equal characters. We then show that the bad character rule of the Horspool algorithm for generic pattern matching can be applied to the OPPM problem, and we present a space-efficient algorithm for computing shift tables for text search. Finally, we combine our bad character rule with the KMP-based algorithm to improve the worst-case running time. Experimental results show that our algorithm is about 2 to 6 times faster than the KMP-based algorithm in reasonable cases.
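Order-isomorphism with repeated characters can be decided by comparing dense-rank signatures: replace each value by its rank among the distinct values of its own string. Two equal-length strings are order-isomorphic exactly when their signatures coincide, because equal values receive equal ranks and the ranks preserve every < relation. The sketch below uses this for a naive OPPM baseline; it is neither the Horspool-style nor the KMP-based algorithm from the abstract, and the function names are hypothetical.

```python
def rank_signature(seq):
    """Dense rank of each element among the distinct values of `seq`."""
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [ranks[v] for v in seq]

def order_isomorphic(x, y):
    return len(x) == len(y) and rank_signature(x) == rank_signature(y)

def oppm_naive(text, pattern):
    """Start positions of all length-m windows of `text` order-isomorphic to `pattern`."""
    m = len(pattern)
    target = rank_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if rank_signature(text[i:i + m]) == target]
```

Each window costs O(m log m), so the scan is O(n m log m); the shift rules described in the abstract exist precisely to skip most of these windows.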

10.
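The Aho-Corasick (AC) algorithm and its improved variants are based on finite-state automata. As the number of Chinese patterns grows, storing the automaton as a complete hash table or as a state-transition matrix makes the required storage expand rapidly, the state-transition function becomes expensive to compute, and the cache hit rate falls, so the time and space performance of the algorithm degrades sharply. This paper proposes storing the finite-state automaton as adjacency lists and converting the list of state "0" into a linear table to improve time and space efficiency. On this basis, a multi-pattern matching algorithm suited to Chinese is designed; it requires only 10% of the storage of the complete hash table approach and about 20% of that of the state-table matrix approach.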
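A minimal sketch of the storage idea: each state keeps its goto transitions as a small adjacency list of (character, next state) pairs, while state 0, which is visited constantly and has the most children, gets a direct-lookup table (a dict standing in for the linear table) so that it never needs a failure transition. Python strings index by character, so Chinese patterns work unchanged. The class and method names are hypothetical, and this does not attempt to reproduce the paper's memory figures.

```python
from collections import deque

class SparseAhoCorasick:
    def __init__(self, patterns):
        self.edges = [[]]        # edges[s]: adjacency list of (char, next_state)
        self.fail = [0]
        self.output = [[]]
        for p in patterns:
            self._insert(p)
        # State 0 gets a direct-lookup table instead of a list scan.
        self.root_table = dict(self.edges[0])
        self._build_failure_links()

    def _find(self, state, ch):
        for c, nxt in self.edges[state]:     # linear scan of a short adjacency list
            if c == ch:
                return nxt
        return None

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            nxt = self._find(state, ch)
            if nxt is None:
                nxt = len(self.edges)
                self.edges.append([])
                self.fail.append(0)
                self.output.append([])
                self.edges[state].append((ch, nxt))
            state = nxt
        self.output[state].append(pattern)

    def _goto(self, state, ch):
        if state == 0:
            return self.root_table.get(ch, 0)  # the root never fails
        return self._find(state, ch)

    def _build_failure_links(self):
        queue = deque(nxt for _, nxt in self.edges[0])   # depth-1 states fail to the root
        while queue:
            s = queue.popleft()
            for ch, nxt in self.edges[s]:
                queue.append(nxt)
                f = self.fail[s]
                while self._goto(f, ch) is None:
                    f = self.fail[f]
                self.fail[nxt] = self._goto(f, ch)
                self.output[nxt] += self.output[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every occurrence."""
        state, matches = 0, []
        for i, ch in enumerate(text):
            while self._goto(state, ch) is None:
                state = self.fail[state]
            state = self._goto(state, ch)
            for p in self.output[state]:
                matches.append((i - len(p) + 1, p))
        return matches
```

Usage: `SparseAhoCorasick(["中文", "文本"]).search("中文本")` reports both overlapping matches.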

11.
One of the major challenges in data mining is the extraction of comprehensible knowledge from recorded data. In this paper, a coevolutionary-based classification technique, namely COevolutionary Rule Extractor (CORE), is proposed to discover classification rules in data mining. Unlike existing approaches where candidate rules and rule sets are evolved at different stages in the classification process, the proposed CORE coevolves rules and rule sets concurrently in two cooperative populations to confine the search space and to produce good rule sets that are comprehensive. The proposed coevolutionary classification technique is extensively validated upon seven datasets obtained from the University of California, Irvine (UCI) machine learning repository, which are representative artificial and real-world data from various domains. Comparison results show that the proposed CORE produces comprehensive and good classification rules for most datasets, which are competitive as compared with existing classifiers in literature. Simulation results obtained from box plots also unveil that CORE is relatively robust and invariant to random partition of datasets.  相似文献   

12.
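A power-saving algorithm that combines a buffering strategy with a scheduling mechanism, Joint Buffering and Scheduling (JBS), is proposed. Its basic idea is that as soon as a stream has obtained the packets it needs for a period of future playback, its Wireless Network Interface (WNI) can be switched to sleep mode. To accumulate enough packets to enter sleep within a short time, while computing the sleep duration accurately enough to maintain normal playback quality, JBS introduces both a buffer-shaping strategy and a scheduling strategy. The basic operation of JBS is described, and the buffering and scheduling strategies it uses are presented in turn. Simulation comparisons with the traditional BKS and RBS algorithms fully verify the effectiveness of JBS.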

13.
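Dynamic Replacement Algorithm (DRA) for Constant Bit Rate (CBR) Streaming Media (total citations: 1; self: 0; others: 1)
Ma Jie, Fan Jianping. Journal of Computer Applications (计算机应用), 2005, 25(5): 1112-1115
Proxy-server caching for streaming media is a technique that can effectively improve the quality of streaming media access. The caching algorithm is an important component of a caching proxy server and covers three aspects: the cache storage scheme, the replacement algorithm, and the admission policy. This paper presents a set of caching algorithms for constant bit rate (CBR) streaming media that includes a bit-rate-tiered storage method together with a dynamic replacement algorithm (DRA, Dynamic Replication Algorithm) that incorporates the basic characteristics of streaming media caching.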

14.
It is shown how to generate oblique slices from a set of parallel slices. An algorithm is presented that can produce planes or contours through the volume without any loss of the volume resolution of the original dataset. The algorithm uses the Fourier-shift theorem and is efficient for calculating large numbers of slices. Although the algorithm is general, it is particularly well suited to three-dimensional magnetic resonance images, as demonstrated with examples.
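The Fourier-shift theorem states that delaying a sampled signal by d samples multiplies its DFT by exp(-2*pi*i*k*d/N), and the relation also holds for fractional d under a band-limited assumption. The sketch below, assuming NumPy and circular (periodic) boundaries, uses it to shear a volume: shifting slice z by slope*z samples along the last axis means that `sheared[:, :, x0]` samples the original volume on a plane tilted in the x-z direction, i.e. an oblique slice. This illustrates the idea rather than the paper's full algorithm; arbitrary orientations would need more than one shear, and the function names are hypothetical.

```python
import numpy as np

def fourier_shift_1d(signal, shift):
    """Return y with y[n] = signal[n - shift] (circular), for integer or fractional shift."""
    n = len(signal)
    freqs = np.fft.fftfreq(n)                      # k / N, in cycles per sample
    spectrum = np.fft.fft(signal)
    shifted = np.fft.ifft(spectrum * np.exp(-2j * np.pi * freqs * shift))
    return shifted.real                            # input is real, so keep the real part

def shear_volume(volume, slope):
    """Shift slice z of a (z, y, x) volume by slope * z samples along x."""
    out = np.empty_like(volume, dtype=float)
    for z in range(volume.shape[0]):
        out[z] = np.apply_along_axis(fourier_shift_1d, -1, volume[z], slope * z)
    return out
```

For integer shifts the result matches `np.roll`; for fractional shifts it interpolates with the band-limited (sinc) kernel implied by the DFT instead of a linear or nearest-neighbour rule.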

15.
A new and direct procedure is presented for determining state-space representations of given time-invariant systems whose dynamical behavior is expressed in a more general differential operator form. If necessary, the procedure first applies some polynomial matrix operations to “reduce” the given system to an equivalent differential operator form that satisfies four specific conditions. An equivalent state-space representation is then determined directly: the algorithm requires only a single matrix inversion. An explicit relationship between the partial state and input of the given system and the state of the equivalent state-space system is also obtained.
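The scalar, strictly proper case shows what a partial-state realization looks like. The block below is the standard controllable-companion realization, given as a textbook special case and not as the paper's multivariable procedure; the coefficient names a_i, b_i and the partial state z are notation introduced here.

```latex
% Scalar, strictly proper case:  a(D) y = b(D) u,  D = d/dt
a(D) = D^{n} + a_{n-1} D^{n-1} + \cdots + a_{1} D + a_{0}, \qquad
b(D) = b_{n-1} D^{n-1} + \cdots + b_{1} D + b_{0}

% Partial state z:  a(D) z = u,  y = b(D) z   (so a(D) y = b(D) a(D) z = b(D) u)
x = \begin{bmatrix} z & Dz & \cdots & D^{n-1} z \end{bmatrix}^{\mathsf T}, \qquad
\dot{x} = A x + B u, \qquad y = C x

A = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_{0} & -a_{1} & -a_{2} & \cdots & -a_{n-1}
\end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \quad
C = \begin{bmatrix} b_{0} & b_{1} & \cdots & b_{n-1} \end{bmatrix}
```

Here x = (z, Dz, ..., D^{n-1}z) is the explicit state-to-partial-state relationship for this special case, and the last row of A simply restates D^n z = u - a_0 z - ... - a_{n-1} D^{n-1} z.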

16.
Ran Dubin, Raffael Shalala, Amit Dvir, Ofir Pele, Ofer Hadar. Multimedia Tools and Applications, 2019, 78(9): 11203-11222
The increasing popularity of online video content and adaptive video streaming services, especially those based on HTTP Adaptive Streaming (HAS), highlights the...

17.
The optimization of Clustered Streaming Media Servers (CSMS), which aims to use as few and as cost-effective hardware resources as possible while providing satisfactory performance and QoS, has a great impact on the practicality and efficiency of CSMS. Based on an analysis and formalization of the critical performance factors of CSMS and of the relationships among performance, QoS, and cost, a stepwise optimization algorithm is developed to solve the optimization problem efficiently. The algorithm models the optimization problem as a directed acyclic graph and then solves it step by step. It applies a divide-and-conquer model that both reduces the complexity of the problem and accelerates the optimization process. Progressive information is collected during the process and used in solving the problem. Furthermore, the optimization algorithm requires a simulation system of CSMS to generate accurate information about the entire streaming service process. We therefore designed and implemented such a simulation system based on the theoretical performance model of CSMS and on parameters measured on a practical CSMS testbed. Finally, a case study demonstrates the process of the algorithm, and an appropriate plan for designing a practical CSMS system is illustrated.

18.
19.
The centralized or hierarchical administration of classical grid resource discovery approaches cannot efficiently manage highly dynamic, large-scale grid environments. Peer-to-peer (P2P) overlays offer a dynamic, scalable, and decentralized prospect for grids. Structured P2P methods do not fully support multi-attribute range queries, and unstructured P2P resource discovery methods suffer from the network-wide broadcast storm problem. In this paper, a decentralized learning automata-based resource discovery algorithm is proposed for large-scale P2P grids. The proposed method supports multi-attribute range queries and forwards resource queries along the shortest path to the grid peers most likely to hold the requested resource. Several simulation experiments show the efficiency of the proposed algorithm. Numerical results reveal that the proposed model outperforms the other methods in terms of average hop count, average hit ratio, and control message overhead.
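The abstract does not give the automaton's update rule, so the sketch below assumes the classic linear reward-inaction (L_R-I) scheme: a peer keeps one probability per neighbor, samples a neighbor to forward a query to, and, if the query is answered (reward), moves probability toward that neighbor; on failure the probabilities are left unchanged. The class name, the `learning_rate` value, and the reward signal are hypothetical.

```python
import random

class NeighborAutomaton:
    """L_R-I learning automaton over a peer's neighbors (illustrative only)."""

    def __init__(self, neighbors, learning_rate=0.1, rng=None):
        self.neighbors = list(neighbors)
        self.p = [1.0 / len(self.neighbors)] * len(self.neighbors)   # start uniform
        self.a = learning_rate
        self.rng = rng or random.Random()

    def choose(self):
        """Sample the index of the neighbor to forward the next query to."""
        return self.rng.choices(range(len(self.neighbors)), weights=self.p)[0]

    def reward(self, i):
        """Query via neighbor i succeeded: p_i += a(1 - p_i), others scaled by (1 - a)."""
        for j in range(len(self.p)):
            if j == i:
                self.p[j] += self.a * (1 - self.p[j])
            else:
                self.p[j] *= (1 - self.a)

    def penalty(self, i):
        """Inaction: a failed query leaves the probabilities unchanged."""
        pass
```

The reward update keeps the probabilities summing to 1, since the amount added to p_i equals the total amount removed from the other neighbors.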

20.
Anonymity preserving pattern discovery (total citations: 5; self: 0; others: 5)
It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. The reasoning is that, to ensure the required statistical significance, data mining models and patterns represent a large number of individuals and thus conceal individual identities; in frequent pattern mining, this is the role of the minimum support threshold. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context of pattern discovery and provide a methodology to efficiently and effectively identify all possible threats that arise from disclosing the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that allows the extracted knowledge to be disclosed while protecting the anonymity of the individuals in the source database. Moreover, to handle cases where threats to anonymity cannot be avoided, we study how to eliminate such threats through controlled distortion of the patterns (not the data!).
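One way a disclosed pattern set can leak small counts is inclusion-exclusion: for itemsets I ⊂ J whose intermediate supersets are all disclosed, the number of transactions that contain I but none of the items in J \ I equals the alternating sum of supp(X) over I ⊆ X ⊆ J. If that number is positive but below k, the disclosed supports single out fewer than k individuals. The sketch below checks every such pair; it illustrates this kind of inference rather than reproducing the paper's methodology, and the function name is hypothetical.

```python
from itertools import combinations

def inference_channels(supports, k):
    """Find (I, J, count) where 0 < #{t : I ⊆ t, t ∩ (J \\ I) = ∅} < k.

    `supports` maps frozenset itemsets to absolute support counts.
    """
    threats = []
    for i_set in supports:
        for j_set in supports:
            if not i_set < j_set:
                continue
            extra = list(j_set - i_set)
            count, complete = 0, True
            # Inclusion-exclusion over every X with I ⊆ X ⊆ J.
            for r in range(len(extra) + 1):
                for added in combinations(extra, r):
                    x = i_set | frozenset(added)
                    if x not in supports:
                        complete = False    # some support is undisclosed, so no inference
                        break
                    count += (-1) ** r * supports[x]
                if not complete:
                    break
            if complete and 0 < count < k:
                threats.append((i_set, j_set, count))
    return threats
```

For instance, with supp({a}) = 10 and supp({a, b}) = 8 both disclosed, exactly 2 transactions contain a without b, which is a threat whenever k > 2.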
