
1.

Parallel Mining of Neuronal Spike Streams on Graphics Processing Units

Yong Cao Debprakash Patnaik Sean Ponce Jeremy Archuleta Patrick Butler Wu-chun Feng Naren Ramakrishnan 《International journal of parallel programming》2012,40(6):605-632

Multi-electrode arrays (MEAs) provide dynamic and spatial perspectives into brain function by capturing the temporal behavior of spikes recorded from cultures and living tissue. Understanding the firing patterns of neurons implicit in these spike trains is crucial to gaining insight into cellular activity. We present a solution involving a massively parallel graphics processing unit (GPU) to mine spike train datasets. We focus on mining frequent episodes of firing patterns that capture coordinated events even in the presence of intervening background events. We present two algorithmic strategies—hybrid mining and two-pass elimination—to map the finite state machine-based counting algorithms onto GPUs. These strategies explore different computation-to-core mapping schemes and illustrate innovative parallel algorithm design patterns for temporal data mining. We also provide a multi-GPU mining framework, which exhibits additional performance enhancement. Together, these contributions move us towards a real-time solution to neuronal data mining.
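To make the counting step concrete, here is a minimal Python sketch of finite-state-machine counting of non-overlapped occurrences of one serial episode (for example A, then B, then C) in a time-ordered event stream. It is an illustration under simplifying assumptions, not the paper's GPU algorithm: it enforces no inter-event time constraints, and the function name `count_non_overlapped` is hypothetical. Events of other types are skipped, which is how background events can intervene without breaking an occurrence.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float]  # (event type, timestamp), sorted by timestamp

def count_non_overlapped(events: Sequence[Event], episode: List[str]) -> int:
    """Count non-overlapped occurrences of a serial episode with one state machine.

    Simplifying assumption: no inter-event time constraints are enforced.
    """
    state = 0  # how many episode events have been matched so far
    count = 0
    for event_type, _timestamp in events:
        if event_type == episode[state]:
            state += 1  # advance the automaton on the expected event type
            if state == len(episode):
                count += 1  # full occurrence found
                state = 0   # restart so the next occurrence cannot overlap this one
        # any other event type is background noise and is simply skipped
    return count

stream = [("A", 1), ("X", 2), ("B", 3), ("C", 4), ("A", 5), ("B", 6), ("Y", 7), ("C", 8)]
print(count_non_overlapped(stream, ["A", "B", "C"]))
```

Because each candidate episode has its own independent state machine, counting many episodes over the same stream is naturally parallel; the paper's hybrid mining and two-pass elimination strategies explore different computation-to-core mappings, whose details are not reproduced here.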

2.

Mining spatial colocation patterns: a different framework (total citations: 2; self-citations: 0; citations by others: 2)

Jin Soung Yoo Mark Bow 《Data mining and knowledge discovery》2012,24(1):159-194

Recently, there has been considerable interest in mining spatial colocation patterns from large spatial datasets. Spatial colocation patterns represent the subsets of spatial events whose instances are often located in close geographic proximity. Most studies of spatial colocation mining require the specification of two parameter constraints to find interesting colocation patterns. One is a minimum prevalence threshold for colocations, and the other is a distance threshold that defines the spatial neighborhood. However, it is difficult for users to decide on appropriate threshold values without prior knowledge of their task-specific spatial data. In this paper, we propose a different framework for spatial colocation pattern mining. To remove the first constraint, we propose the problem of finding N-most prevalent colocated event sets, where N is the desired number of colocated event sets with the highest interest measure values for each pattern size. We developed two alternative algorithms for mining the N-most patterns. They reduce candidate events effectively and use a filter-and-refine strategy for efficiently finding colocation instances from a spatial dataset. We prove the algorithms are correct and complete in finding the N-most prevalent colocation patterns. For the second constraint, a distance threshold for spatial neighborhood determination, we present various methods to estimate appropriate distance bounds from user input data. The result can help a user set a distance for a conceptualization of the spatial neighborhood. Our experimental results with real and synthetic datasets show that our algorithmic design is computationally effective in finding the N-most prevalent colocation patterns. The discovered patterns differed depending on the distance threshold, which shows that it is important to select appropriate neighbor distances.
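A common interest measure in colocation mining is the participation index: for a feature set C, the participation ratio of a feature f is the fraction of f's instances that take part in some neighborhood instance of C, and the index is the minimum ratio over the features in C. The sketch below assumes that measure (the abstract does not name it), restricts itself to size-2 colocations under a fixed distance threshold `d`, and keeps the N highest-scoring pairs. Function names are hypothetical; the paper's candidate reduction and filter-and-refine steps are not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Instance = Tuple[str, float, float]  # (feature, x, y)
Pair = Tuple[str, str]

def participation_indexes(instances: List[Instance], d: float) -> Dict[Pair, float]:
    """Participation index of every size-2 feature set under neighbor distance d."""
    totals: Dict[str, int] = {}
    for feature, _, _ in instances:
        totals[feature] = totals.get(feature, 0) + 1
    # participants[pair][feature] = ids of that feature's instances found in some neighbor pair
    participants: Dict[Pair, Dict[str, Set[int]]] = {}
    for i, j in combinations(range(len(instances)), 2):
        fi, xi, yi = instances[i]
        fj, xj, yj = instances[j]
        if fi == fj or math.dist((xi, yi), (xj, yj)) > d:
            continue  # same feature, or not neighbors
        pair = (min(fi, fj), max(fi, fj))
        sets = participants.setdefault(pair, {fi: set(), fj: set()})
        sets[fi].add(i)
        sets[fj].add(j)
    # participation ratio per feature, then the minimum over the pair
    return {pair: min(len(sets[f]) / totals[f] for f in pair)
            for pair, sets in participants.items()}

def n_most_prevalent(instances: List[Instance], d: float, n: int) -> List[Tuple[Pair, float]]:
    """The n pairs with the highest participation index (ties broken by name)."""
    scores = participation_indexes(instances, d)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

points = [("A", 0, 0), ("B", 0.5, 0), ("A", 10, 10), ("B", 10.5, 10), ("C", 0.2, 0.1)]
print(n_most_prevalent(points, d=1.0, n=2))
```

Changing `d` changes which pairs count as neighbors and therefore the ranking, which is the sensitivity to the distance threshold that the abstract reports.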

3.

Scalable algorithms for clustering large datasets with mixed type attributes

Zengyou He Xiaofei Xu Shengchun Deng 《International Journal of Intelligent Systems》2005,20(10):1077-1089

Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, datasets with mixed types of attributes are common in real-life data mining applications. In this article, we present two algorithms that extend the Squeezer algorithm to domains with mixed numeric and categorical attributes. The performance of the two algorithms has been studied on real and artificially generated datasets. Comparisons with other clustering algorithms illustrate the superiority of our approaches. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 1077–1089, 2005.
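For reference, the sketch below shows the core single-pass loop of the categorical Squeezer algorithm that the paper extends: each record joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. Similarity here is the sum, over attributes, of the fraction of cluster members sharing the record's value. This is a minimal sketch based on our reading of Squeezer; it handles categorical attributes only and does not attempt either of the paper's mixed-attribute extensions. The names `Cluster`, `squeezer`, and `threshold` are illustrative.

```python
from typing import Dict, List, Optional, Sequence

class Cluster:
    def __init__(self, n_attrs: int):
        self.size = 0
        self.members: List[int] = []
        # value_counts[a][v] = number of members whose attribute a equals v
        self.value_counts: List[Dict[str, int]] = [{} for _ in range(n_attrs)]

    def add(self, idx: int, row: Sequence[str]) -> None:
        self.size += 1
        self.members.append(idx)
        for a, v in enumerate(row):
            self.value_counts[a][v] = self.value_counts[a].get(v, 0) + 1

    def similarity(self, row: Sequence[str]) -> float:
        # Sum over attributes of the fraction of members that share the row's value.
        return sum(self.value_counts[a].get(v, 0) / self.size for a, v in enumerate(row))

def squeezer(rows: List[Sequence[str]], threshold: float) -> List[List[int]]:
    """One pass over the data; returns the member indices of each cluster."""
    clusters: List[Cluster] = []
    for idx, row in enumerate(rows):
        best: Optional[Cluster] = None
        best_sim = -1.0
        for c in clusters:
            s = c.similarity(row)
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < threshold:
            best = Cluster(len(row))  # not similar enough anywhere: open a new cluster
            clusters.append(best)
        best.add(idx, row)
    return [c.members for c in clusters]

rows = [("red", "small"), ("red", "small"), ("blue", "large"), ("red", "large")]
print(squeezer(rows, threshold=1.5))
```

Numeric attributes have no natural "shared value" count, so they need separate handling (for example, discretization); that is the gap the paper's two extensions address, and this sketch does not claim to reproduce either of them.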

4.

A Nested Two-Stage Clustering Method for Structured Temporal Sequence Data

Wang Liang Narayanan Vignesh Yu Yao-Chi Park Yikyung Li Jr-Shin 《Knowledge and Information Systems》2021,63(7):1627-1662

Mining patterns of temporal sequence data is an important problem across many disciplines. Under appropriate preprocessing procedures, a structured temporal sequence can be organized into a probability measure or a time series representation, which makes it possible to reveal distinctive temporal pattern characteristics. In this paper, we propose a nested two-stage clustering method that integrates optimal transport and dynamic time warping distances to learn distributional and dynamic shape-based dissimilarity at the respective stages. The proposed clustering algorithm preserves both the distribution and shape patterns present in the data, which are critical for datasets composed of structured temporal sequences. The effectiveness of the method is tested against existing agglomerative and K-shape-based clustering algorithms on Monte Carlo simulated synthetic datasets, and the performance is compared through various cluster validation metrics. Furthermore, we apply the developed method to real-world datasets from three domains: temporal dietary records, online retail sales, and smart meter energy profiles. The expressiveness of the cluster and subcluster centroid patterns indicates that our method holds significant promise for structured temporal sequence data mining.
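The two distances named above can be sketched in a few lines of Python. `dtw` is the textbook dynamic-programming dynamic time warping distance with an absolute-difference local cost, and `wasserstein_1d` is the optimal transport (Wasserstein-1) distance between two one-dimensional empirical distributions with the same number of equally weighted samples, which reduces to matching sorted values. Both are illustrative simplifications; the paper's exact cost functions, normalizations, and nesting of the two stages are not reproduced.

```python
import math
from typing import Sequence

def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance a only (b[j-1] matched again)
                                     cost[i][j - 1],      # advance b only (a[i-1] matched again)
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def wasserstein_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """W1 between two equal-size, equally weighted 1-D samples: match sorted values."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(p - q) for p, q in zip(sorted(x), sorted(y))) / len(x)

print(dtw([0, 1, 2], [0, 1, 1, 2]))    # 0.0: the repeated 1 is absorbed by warping
print(wasserstein_1d([0, 1], [1, 2]))  # 1.0: every unit of mass moves by 1
```

DTW compares the order and shape of values over time, while the 1-D Wasserstein distance ignores order entirely and compares only how values are distributed, which is why the two capture complementary views of a sequence.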


5.

Discovering multi-label temporal patterns in sequence databases

Yen-Liang Chen Shin-Yi Wu 《Information Sciences》2011,181(3):398-418

Sequential pattern mining is one of the most important data mining techniques. Previous research on sequential pattern mining discovered patterns from point-based event data, interval-based event data, and hybrid event data. In many real-life applications, however, an event may involve many statuses; it does not necessarily occur only at a single point in time or only over a single period of time. In this work, we propose a generalized representation of temporal events. We treat events as multi-label events with many statuses, and introduce an algorithm called MLTPM to discover multi-label temporal patterns from temporal databases. The experimental results show that the efficiency and scalability of the MLTPM algorithm are satisfactory. We also discuss interesting multi-label temporal patterns discovered when MLTPM was applied to historical Nasdaq data.

6.

Effective temporal data classification by integrating sequential pattern mining and probabilistic induction

Vincent S. Tseng Chao-Hui Lee 《Expert systems with applications》2009,36(5):9524-9532

Data classification is an important topic in the field of data mining due to its wide applications. A number of related methods have been proposed based on well-known learning models such as decision trees or neural networks. Although data classification has been widely discussed, relatively few studies have explored temporal data classification. Most existing research has focused on improving classification accuracy by using statistical models, neural networks, or distance-based methods. However, these methods cannot explain their classification results to users. In many research settings, such as microarray gene expression analysis, users prefer interpretable classification information over a classifier that offers only high accuracy. In this paper, we propose a novel pattern-based data mining method, namely classify-by-sequence (CBS), for classifying large temporal datasets. The main methodology behind CBS is integrating sequential pattern mining with probabilistic induction. CBS has the merit of simplicity in implementation, and its pattern-based architecture can supply clear classification information to users. Through experimental evaluation, CBS was shown to deliver classification results with high accuracy on two real time-series datasets. In addition, we designed a simulator to evaluate the performance of CBS on datasets with different characteristics. The experimental results show that CBS can discover the hidden patterns and classify data effectively by utilizing the mined sequential patterns.
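The abstract does not spell out how mined patterns and probabilistic induction are combined, so the following is only a simplified, hypothetical scoring scheme in the same spirit: each class keeps sequential patterns mined from its own training sequences, weighted by their support in that class, and a new sequence goes to the class whose patterns it matches with the largest share of total support. The matched patterns double as a human-readable explanation of the decision, which is the interpretability argument made above. The names `contains` and `classify` and the example labels are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]

def contains(seq: Sequence[str], pattern: Pattern) -> bool:
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    # Each `any` call consumes the iterator up to the next match, preserving order.
    return all(any(x == p for x in it) for p in pattern)

def classify(seq: Sequence[str],
             class_patterns: Dict[str, List[Tuple[Pattern, float]]]
             ) -> Tuple[Optional[str], List[Pattern]]:
    """Return the best class and the patterns that matched for it."""
    best_label: Optional[str] = None
    best_score, best_matches = -1.0, []
    for label, patterns in class_patterns.items():
        total = sum(support for _, support in patterns) or 1.0
        matched = [(p, s) for p, s in patterns if contains(seq, p)]
        score = sum(s for _, s in matched) / total  # share of this class's support matched
        if score > best_score:
            best_label, best_score, best_matches = label, score, [p for p, _ in matched]
    return best_label, best_matches

patterns = {
    "rising": [(("low", "mid", "high"), 0.8), (("mid", "high"), 0.6)],
    "falling": [(("high", "mid", "low"), 0.7), (("high", "low"), 0.5)],
}
label, why = classify(["low", "mid", "mid", "high"], patterns)
print(label, why)
```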

7.

An MDL-Based Algorithm for Mining Sequential Patterns from Logs

Du Shiqing Wang Peng Wang Wei 《Computer Engineering》2021,47(2):118-125

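Log data are records of process events generated by Internet systems, and mining high-quality sequential patterns from log data can help engineers carry out system operations and maintenance efficiently. To address the redundant results produced by traditional pattern mining algorithms, this paper proposes DTS, an algorithm for mining sequential patterns from temporal log sequences. DTS takes a heuristic approach to mining a set of patterns that fully represents the event relationships and temporal regularities in the original sequences. It applies the Minimum Description Length (MDL) principle to pattern mining and designs an encoding scheme that considers both event relationships and temporal relationships, thereby solving the pattern explosion problem. Experimental results on real log datasets show that, compared with sequential pattern mining algorithms such as SQS, CSC, and ISM, the proposed algorithm efficiently mines sequential patterns that are semantically rich and have low redundancy.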
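The MDL trade-off behind this kind of method can be illustrated with a small calculation: total cost = L(M) + L(D | M), where using a code with usage u out of U total code uses costs -log2(u/U) bits per use. The sketch below uses a deliberately crude model cost of our own (each used pattern is spelled out with fixed-length event codes) and is not DTS's encoding, which additionally accounts for event and temporal relationships. It shows why a pattern that covers a repetitive log compresses better than single events.

```python
import math
from typing import Dict, Tuple

Pattern = Tuple[str, ...]

def data_bits(usage: Dict[Pattern, int]) -> float:
    """L(D | M): each use of a code costs -log2(usage / total usage) bits."""
    total = sum(usage.values())
    return sum(u * -math.log2(u / total) for u in usage.values() if u > 0)

def model_bits(usage: Dict[Pattern, int], alphabet_size: int) -> float:
    """L(M), crude: spell out every used pattern with fixed-length event codes."""
    per_event = math.log2(alphabet_size)
    return sum(len(p) * per_event for p, u in usage.items() if u > 0)

def total_bits(usage: Dict[Pattern, int], alphabet_size: int) -> float:
    return model_bits(usage, alphabet_size) + data_bits(usage)

# The log "abcabcabc" covered by single events versus by the pattern (a, b, c).
singletons = {("a",): 3, ("b",): 3, ("c",): 3}
with_pattern = {("a", "b", "c"): 3}
print(round(total_bits(singletons, 3), 2), round(total_bits(with_pattern, 3), 2))
```

Here both covers are written by hand; searching for the pattern set and cover that minimize the total is the hard part, which DTS approaches heuristically.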

8.

An approach to discovering multi-temporal patterns and its application to financial databases

Xiaoxiao Kong Guoqing Chen 《Information Sciences》2010,180(6):873-195

Managerial decision-making processes often involve time-related data and require an understanding of complex temporal associations among events. Extending classical association rule mining approaches to take time into account, in order to obtain temporal information and knowledge, is deemed important for decision support, which is nowadays one of the key issues in business intelligence. This paper presents the notion of multi-temporal patterns with four different temporal predicates, namely before, during, equal, and overlap, and discusses a number of related properties, based on which a mining algorithm is designed. This enables us to effectively discover multi-temporal patterns in large-scale temporal databases by reducing database scans during the generation of candidate patterns. The proposed approach is then applied to stock markets, aiming to explore possible associative movements between the stock markets of mainland China and Hong Kong so as to provide helpful knowledge for investment decisions.
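The four predicates can be made concrete with a small interval classifier. The definitions below follow the usual Allen-style reading with strict inequalities; the paper may treat boundary cases such as touching endpoints differently, so treat them as assumptions. Relations outside the four predicates return `None`.

```python
from typing import NamedTuple, Optional

class Interval(NamedTuple):
    start: float
    end: float

def predicate(a: Interval, b: Interval) -> Optional[str]:
    """Which of the four predicates relates interval a to interval b, if any."""
    if a.end < b.start:
        return "before"   # a finishes strictly before b begins
    if a.start == b.start and a.end == b.end:
        return "equal"
    if b.start < a.start and a.end < b.end:
        return "during"   # a lies strictly inside b
    if a.start < b.start < a.end < b.end:
        return "overlap"  # a starts first and ends inside b
    return None           # e.g. meets, starts, finishes, or an inverse relation

print(predicate(Interval(1, 3), Interval(2, 5)))  # overlap
```

A multi-temporal pattern such as "event X overlap event Y" would then be supported wherever some interval of X and some interval of Y satisfy the predicate.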

9.

Finding Frequent Patterns in a Large Sparse Graph

Michihiro Kuramochi George Karypis 《Data mining and knowledge discovery》2005,11(3):243-271

Graph-based modeling has emerged as a powerful abstraction capable of capturing in a single and unified framework many of the relational, spatial, topological, and other characteristics that are present in a variety of datasets and application areas. Computationally efficient algorithms that find patterns corresponding to frequently occurring subgraphs play an important role in developing data mining-driven methodologies for analyzing the graphs resulting from such datasets. This paper presents two algorithms, based on the horizontal and vertical pattern discovery paradigms, that find the connected subgraphs that have a sufficient number of edge-disjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods for determining the number of edge-disjoint embeddings of a subgraph and employ novel algorithms for candidate generation and frequency counting, which allow them to operate on datasets with different characteristics and to quickly prune unpromising subgraphs. Experimental evaluation on real datasets from various domains shows that both algorithms achieve good performance, scale well to sparse input graphs with more than 120,000 vertices or 110,000 edges, and significantly outperform previously developed algorithms.
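Counting edge-disjoint embeddings amounts to finding a maximum independent set in an overlap graph whose vertices are embeddings and whose edges join embeddings that share a host-graph edge. Since that problem is NP-hard in general, a greedy approximation is a common fallback. The sketch below processes embeddings in increasing order of overlap degree and keeps each one that shares no edge with those already kept, which yields a lower bound on the true count. It illustrates the idea only and makes no claim about which of the paper's three counting methods it matches; embeddings are assumed to be given as lists of host-graph edges.

```python
from typing import FrozenSet, List, Tuple

def greedy_edge_disjoint(embeddings: List[List[Tuple[int, int]]]) -> int:
    """Greedy lower bound on the number of pairwise edge-disjoint embeddings."""
    # Undirected host edges as {u, v}, so (1, 2) and (2, 1) compare equal.
    edge_sets: List[FrozenSet[FrozenSet[int]]] = [
        frozenset(frozenset(e) for e in emb) for emb in embeddings
    ]
    # Overlap degree = number of other embeddings sharing at least one edge.
    degree = [sum(1 for j, other in enumerate(edge_sets) if j != i and es & other)
              for i, es in enumerate(edge_sets)]
    used: set = set()
    kept = 0
    for i in sorted(range(len(edge_sets)), key=lambda k: degree[k]):
        if not edge_sets[i] & used:  # shares no edge with embeddings already kept
            used |= edge_sets[i]
            kept += 1
    return kept

# Three embeddings of a two-edge path pattern in the host path 1-2-3-4-5.
embeddings = [[(1, 2), (2, 3)], [(3, 2), (3, 4)], [(3, 4), (4, 5)]]
print(greedy_edge_disjoint(embeddings))  # 2
```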

10.

Mining regional co-location patterns with kNNG (total citations: 2; self-citations: 0; citations by others: 2)

Feng Qian Kevin Chiew Qinming He Hao Huang 《Journal of Intelligent Information Systems》2014,42(3):485-505

Spatial co-location pattern mining discovers the subsets of features whose events are frequently located together in geographic space. Current research on this topic adopts a distance threshold, which has limitations in spatial datasets with various magnitudes of neighborhood distances, especially for mining regional co-location patterns. In this paper, we propose a hierarchical co-location mining framework that accounts for both the variety of neighborhood distances and spatial heterogeneity. By adopting a k-nearest neighbor graph (kNNG) instead of a distance threshold, we propose the “distance variation coefficient” as a new measure to drive the mining operations and determine an individual neighborhood relationship graph for each region. The proposed mining algorithm outputs a set of regions, each with its own set of regional co-location patterns. The experimental results on both synthetic and real-world datasets show that our framework is effective at discovering these regional co-location patterns.
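A k-nearest neighbor graph adapts to local density, which is why it can replace a single global distance threshold. The sketch below builds a kNN graph by brute force and computes the coefficient of variation (standard deviation divided by mean) of its edge lengths. Using that statistic as a stand-in for the paper's “distance variation coefficient” is our assumption; the abstract does not give the formula, and the hierarchical, region-by-region part of the framework is not shown.

```python
import math
import statistics
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def knn_graph(points: List[Point], k: int) -> Dict[int, List[int]]:
    """Brute-force kNN graph: node i -> indices of its k nearest other points."""
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in ranked[:k]]
    return graph

def edge_length_cv(points: List[Point], graph: Dict[int, List[int]]) -> float:
    """Coefficient of variation (std / mean) of the kNN edge lengths."""
    lengths = [math.dist(points[i], points[j]) for i, nbrs in graph.items() for j in nbrs]
    mean = statistics.fmean(lengths)
    return statistics.pstdev(lengths) / mean if mean > 0 else 0.0

# Two groups far apart: kNN edges stay short inside each group.
spread = [(0, 0), (1, 0), (100, 0), (101, 0)]
graph = knn_graph(spread, k=1)
print(graph, edge_length_cv(spread, graph))
```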

11.

Dynamic scene understanding using temporal association rules

Ayesha M. Talha Imran N. Junejo 《Image and vision computing》2014

The basic goal of scene understanding is to organize the video into sets of events and to find the associated temporal dependencies. Such systems aim to automatically interpret activities in the scene, as well as detect unusual events that could be of particular interest, such as traffic violations and unauthorized entry. The objective of this work, therefore, is to learn behaviors of multi-agent actions and interactions in a semi-supervised manner. Using tracked object trajectories, we organize similar motion trajectories into clusters using the spectral clustering technique. This set of clusters depicts the different paths/routes, i.e., the distinct events taking place at various locations in the scene. A temporal mining algorithm is used to mine interval-based frequent temporal patterns occurring in the scene. A temporal pattern indicates a set of events that are linked based on their relationship with other events in the set, and we use Allen's interval-based temporal logic to describe these relations. The resulting frequent patterns are used to generate temporal association rules, which convey the semantic information contained in the scene. Our overall aim is to generate rules that govern the dynamics of the scene and perform anomaly detection. We apply the proposed approach to two publicly available complex traffic datasets and demonstrate considerable improvements over the existing techniques.
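Once frequent temporal patterns and their supports are available, rules can be read off them with the usual confidence measure, confidence(X ⇒ Y) = support(X followed by Y) / support(X). The sketch below only splits each pattern into a prefix and the remaining events, so it keeps temporal order but ignores the Allen relations attached to the events; the event labels in the example are hypothetical trajectory-cluster names.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]              # events in temporal order
Rule = Tuple[Pattern, Pattern, float]  # (antecedent, consequent, confidence)

def prefix_rules(support: Dict[Pattern, int], min_conf: float) -> List[Rule]:
    """Rules 'prefix => rest of pattern' with confidence sup(pattern) / sup(prefix)."""
    rules: List[Rule] = []
    for pattern, sup in support.items():
        for cut in range(1, len(pattern)):
            head, tail = pattern[:cut], pattern[cut:]
            if head in support:  # the prefix must itself be a known frequent pattern
                conf = sup / support[head]
                if conf >= min_conf:
                    rules.append((head, tail, conf))
    return rules

# Hypothetical trajectory-cluster events and their supports.
supports = {
    ("enter",): 10,
    ("enter", "turn_left"): 8,
    ("enter", "turn_left", "exit"): 4,
}
for head, tail, conf in prefix_rules(supports, min_conf=0.5):
    print(head, "=>", tail, round(conf, 2))
```

A sequence of events that violates a high-confidence rule (for example, an "enter" that is not followed by the usual continuation) is one natural candidate for the anomaly detection mentioned above.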

12.

Effective periodic pattern mining in time series databases

Manziba Akanda Nishi Chowdhury Farhan Ahmed Md. Samiullah Byeong-Soo Jeong 《Expert systems with applications》2013,40(8):3015-3027

The goal of analyzing a time series database is to determine whether, and how frequently, a periodic pattern is repeated within the series. Periodic pattern mining is concerned with such temporal regularity. However, most existing algorithms have a major limitation in mining patterns of user interest: they can only mine patterns of a specific length in which all events occur sequentially, one after another, at exact positions within the pattern. Yet there are scenarios where a pattern can be flexible, that is, it may be interesting and can be mined by ignoring any number of unimportant events between important events, giving the pattern a variable length. Moreover, existing algorithms can detect only specific types of periodicity in time series databases and require user interaction to determine the periodicity. In this paper, we propose an algorithm for periodic pattern mining in time series databases that does not rely on the user for the period value or period type of the pattern and can detect all types of periodic patterns at the same time; these flexibilities are missing in existing algorithms. The proposed algorithm lets the user generate different kinds of patterns by skipping intermediate events in a time series database and determines the periodicity of the patterns within the database. It improves on suffix-tree-based pattern generation, because suffix-tree-based algorithms are weak in this particular area. Compared with existing algorithms, the proposed algorithm generates more kinds of interesting patterns and detects whether each generated pattern is periodic.
We have tested the performance of our algorithm on both synthetic and real-life data from different domains and found a large number of interesting event sequences that existing algorithms missed; the proposed algorithm was efficient in generating flexible patterns and detecting their periodicity on both types of data.
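As a point of comparison for what detecting periodicity without a user-given period involves, the sketch below uses a classic symbol-periodicity test rather than the paper's method: for a candidate period p, a symbol is periodic at offset o if it appears at position o in at least a fraction `min_conf` of the complete length-p segments, and every period from 1 to half the series length is tried. It does not skip intermediate events or build a suffix tree, which is precisely where the paper claims its improvement; the names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

def periodic_symbols(seq: Sequence[str], period: int,
                     min_conf: float) -> List[Tuple[str, int, float]]:
    """(symbol, offset, confidence) for symbols recurring at an offset in >= min_conf of full periods."""
    n_periods = len(seq) // period
    if n_periods == 0:
        return []
    hits: Dict[Tuple[str, int], int] = defaultdict(int)
    for i in range(n_periods * period):  # ignore a trailing partial period
        hits[(seq[i], i % period)] += 1
    return sorted((s, off, c / n_periods) for (s, off), c in hits.items()
                  if c / n_periods >= min_conf)

def candidate_periods(seq: Sequence[str], min_conf: float,
                      max_period: Optional[int] = None) -> List[int]:
    """Try every period up to max_period (default: half the length); no user-given period."""
    max_period = max_period or len(seq) // 2
    return [p for p in range(1, max_period + 1) if periodic_symbols(seq, p, min_conf)]

series = "abcabbabcabc"
print(candidate_periods(series, 0.75))
print(periodic_symbols(series, 3, 0.75))
```

Multiples of a true period (here 6 as well as 3) also pass such a test, so a practical miner would typically report the smallest qualifying period or group multiples together.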

13.

Mining neighbor-based patterns in data streams (total citations: 1; self-citations: 0; citations by others: 1)

Di Yang Elke A. Rundensteiner Matthew O. Ward 《Information Systems》2013

Discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many application domains. However, little research effort has been made toward detecting patterns within sliding window semantics as required by real-time monitoring tasks, ranging from real-time traffic monitoring to stock trend analysis. Applying static pattern detection algorithms from scratch to every window is impractical due to their high algorithmic complexity and the real-time responsiveness required by streaming applications. In this work, we develop methods for the incremental detection of neighbor-based patterns, in particular, density-based clusters and distance-based outliers over sliding stream windows. Incremental computation for pattern detection queries is challenging, because purging to-be-expired data from previously formed patterns may cause the birth, shrinkage, splitting, or termination of these complex patterns. To overcome this, we exploit the “predictability” property of sliding windows to elegantly discount the effect of expired objects with little maintenance cost. Our solution achieves guaranteed minimal CPU consumption, while keeping memory utilization linear in the number of objects in the window. To thoroughly analyze the performance of our proposed methods, we develop a cost model characterizing the performance of our proposed neighbor-based pattern mining strategies. We conduct an analytical study not only to identify the key performance factors for each strategy but also to show under which conditions each of them is most efficient. Our comprehensive experimental study, using both synthetic and real data from the domains of moving object monitoring and stock trades, demonstrates the superiority of our proposed strategies over alternative methods in both CPU processing resources and memory utilization.
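The core difficulty described above, discounting expired objects, can be seen in a straightforward incremental detector for distance-based outliers over a count-based window: a point is an outlier if fewer than `k` other points in the window lie within radius `r`. Each arrival updates neighbor counts in both directions, and each expiry decrements the counts of the expired point's neighbors. This costs one pass over the window per arrival; it is a baseline sketch, not the paper's predictability-based strategy, which is designed to avoid most of this maintenance.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Tuple

Point = Tuple[float, float]

class SlidingOutlierDetector:
    """Distance-based outliers over the last `window` points: a point is an
    outlier if fewer than k other points in the window lie within radius r."""

    def __init__(self, window: int, r: float, k: int):
        self.window, self.r, self.k = window, r, k
        self.points: Deque[Tuple[int, Point]] = deque()
        self.neighbors: Dict[int, int] = {}  # point id -> neighbor count inside the window
        self.next_id = 0

    def insert(self, p: Point) -> None:
        if len(self.points) == self.window:
            # Expire the oldest point and discount it from its neighbors' counts.
            old_id, old_p = self.points.popleft()
            del self.neighbors[old_id]
            for qid, q in self.points:
                if math.dist(old_p, q) <= self.r:
                    self.neighbors[qid] -= 1
        pid = self.next_id
        self.next_id += 1
        count = 0
        for qid, q in self.points:
            if math.dist(p, q) <= self.r:
                self.neighbors[qid] += 1  # the newcomer is a neighbor of q
                count += 1                # and q is a neighbor of the newcomer
        self.neighbors[pid] = count
        self.points.append((pid, p))

    def outliers(self) -> List[Point]:
        return [p for pid, p in self.points if self.neighbors[pid] < self.k]

det = SlidingOutlierDetector(window=4, r=1.0, k=1)
for point in [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (20, 20)]:
    det.insert(point)
print(det.outliers())
```

Note how (0.5, 0) becomes an outlier only because its sole neighbor (0, 0) expired: an existing pattern changes purely through expiration, which is the effect the abstract says must be handled cheaply.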

14.

Mining rooted ordered trees under subtree homeomorphism

Mostafa Haghir Chehreghani Maurice Bruynooghe 《Data mining and knowledge discovery》2016,30(5):1249-1272

Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics, and the World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant.

15.

Mining frequent arrangements of temporal intervals (total citations: 3; self-citations: 3; citations by others: 0)

Panagiotis Papapetrou George Kollios Stan Sclaroff Dimitrios Gunopulos 《Knowledge and Information Systems》2009,21(2):133-171

The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time, and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals, including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Three efficient methods to find frequent arrangements of temporal intervals are described; the first two are tree-based and use breadth-first and depth-first search to mine the set of frequent arrangements, whereas the third one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints that add user-controlled focus into the mining process. Moreover, based on the extracted patterns, a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.
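The smallest arrangements are pairs of intervals. The sketch below orders each sequence's intervals by start time, end time, and label, classifies every ordered pair into one of the seven Allen relations that can hold when the first interval sorts no later than the second, and counts in how many sequences each (label, relation, label) triple occurs. It is a brute-force illustration of the support definition under these assumptions; the paper's tree-based and prefix-based miners, pruning, and user constraints are not shown.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, float, float]  # (event label, start, end)
Arrangement = Tuple[str, str, str]   # (first label, Allen relation, second label)

def relation(a: Tuple[float, float], b: Tuple[float, float]) -> str:
    """Allen relation of a to b, assuming a sorts no later than b by (start, end)."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2:
        return "equal" if e1 == e2 else "starts"
    if e1 == e2:
        return "finished-by"
    return "contains" if e1 > e2 else "overlaps"

def frequent_pairs(db: List[List[Interval]], min_sup: int) -> Dict[Arrangement, int]:
    """Two-interval arrangements that occur in at least min_sup sequences."""
    support: Dict[Arrangement, int] = {}
    for seq in db:
        ordered = sorted(seq, key=lambda iv: (iv[1], iv[2], iv[0]))
        seen = set()  # count each arrangement at most once per sequence
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                la, sa, ea = ordered[i]
                lb, sb, eb = ordered[j]
                seen.add((la, relation((sa, ea), (sb, eb)), lb))
        for arrangement in seen:
            support[arrangement] = support.get(arrangement, 0) + 1
    return {a: c for a, c in support.items() if c >= min_sup}

db = [
    [("A", 0, 5), ("B", 3, 8)],
    [("A", 1, 4), ("B", 2, 6), ("C", 7, 9)],
    [("B", 0, 2), ("A", 5, 6)],
]
print(frequent_pairs(db, min_sup=2))
```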

16.

Statistical modeling of dissimilarity increments for d-dimensional data: Application in partitional clustering

Helena Aidos Ana Fred 《Pattern recognition》2012,45(9):3061-3071

This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID) assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose the application of this model in clustering, with a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using DID usually outperform well-known clustering algorithms.
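Dissimilarity increments can be computed directly from their usual definition: for each point x_i, take its nearest neighbor x_j, then the nearest neighbor x_k of x_j other than x_i, and record |d(x_i, x_j) - d(x_j, x_k)|. The sketch below does this with Euclidean distance by brute force, which is only practical for small datasets; fitting the paper's d-DID model to the resulting values is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

def _nearest(points: List[Sequence[float]], i: int,
             exclude: Tuple[int, ...] = ()) -> Tuple[Optional[int], float]:
    """Index of and distance to the nearest neighbor of points[i], skipping `exclude`."""
    best, best_d = None, math.inf
    for j, q in enumerate(points):
        if j == i or j in exclude:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best, best_d = j, d
    return best, best_d

def dissimilarity_increments(points: List[Sequence[float]]) -> List[float]:
    increments = []
    for i in range(len(points)):
        j, d_ij = _nearest(points, i)                # x_j: nearest neighbor of x_i
        if j is None:
            continue
        k, d_jk = _nearest(points, j, exclude=(i,))  # x_k: nearest neighbor of x_j, not x_i
        if k is not None:
            increments.append(abs(d_ij - d_jk))
    return increments

print(dissimilarity_increments([(0,), (1,), (3,), (6,)]))  # [1.0, 2.0, 1.0, 1.0]
```

Within a well-separated cluster these increments tend to stay small and smooth, while a jump in the increment suggests crossing a cluster boundary, which is the intuition behind using their distribution for clustering.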

17.

Constraint graph-based frequent pattern updating from temporal databases

Jason J. Jung 《Expert systems with applications》2012,39(3):3169-3173

There have been many kinds of association rule mining (ARM) algorithms, e.g., Apriori and FP-tree, for discovering meaningful frequent patterns from large datasets. In particular, such ARM algorithms are difficult to apply to temporal databases, which change continuously over time, because they are generally based on repeating time-consuming tasks such as scanning the database. To deal with this problem, in this paper, we propose a constraint graph-based method for maintaining frequent patterns (FP) discovered from temporal databases. The constraint graph, which is represented as a set of constraints between two items, can be established from the temporal persistency of the patterns; that is, a pattern can be used to build the constraint graph once it has appeared in a set of FPs. Two types of constraints can be generated: those given by users and those obtained by adaptation. Based on our scheme, we find that the amount of data processed during mining, and the information gathered while updating, are both efficiently reduced.

18.

Looking into the seeds of time: Discovering temporal patterns in large transaction sets

Yingjiu Li Sencun Zhu 《Information Sciences》2006,176(8):1003-1031

This paper studies the problem of mining frequent itemsets along with their temporal patterns from large transaction sets. A model is proposed in which users define a large set of temporal patterns that are interesting or meaningful to them. A temporal pattern defines the set of time points where the user expects a discovered itemset to be frequent. The model is general in that (i) no constraints are placed on the interesting patterns given by the users, and (ii) two measures—inclusiveness and exclusiveness—are used to capture how well the temporal patterns match the time points given by the discovered itemsets. Intuitively, these measures indicate to what extent a discovered itemset is frequent at time points included in a temporal pattern p, but not at time points not in p. Using these two measures, one is able to model many temporal data mining problems that have appeared in the literature, as well as those that have not been studied. By exploiting the relationship within and between itemset space and pattern space simultaneously, a series of pruning techniques are developed to speed up the mining process. Experiments show that these pruning techniques allow one to obtain performance benefits of up to 100 times over a direct extension of non-temporal data mining algorithms.
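The intuitive description of the two measures suggests the following set-based reading, which we state as an assumption rather than as the paper's formal definitions: inclusiveness is the fraction of the pattern's time points at which the itemset is frequent, and exclusiveness is the fraction of the remaining time points at which it is not. Time points are represented here as integers, for example month numbers.

```python
from typing import Set

def inclusiveness(frequent_at: Set[int], pattern: Set[int]) -> float:
    """Share of the pattern's time points at which the itemset is frequent."""
    # An empty pattern is treated as vacuously matched (a convention of this sketch).
    return len(frequent_at & pattern) / len(pattern) if pattern else 1.0

def exclusiveness(frequent_at: Set[int], pattern: Set[int], all_points: Set[int]) -> float:
    """Share of the time points outside the pattern at which the itemset is NOT frequent."""
    outside = all_points - pattern
    return len(outside - frequent_at) / len(outside) if outside else 1.0

months = set(range(1, 13))
summer = {6, 7, 8}        # user-defined temporal pattern p
frequent = {6, 7, 8, 12}  # months in which some itemset is frequent
print(inclusiveness(frequent, summer), exclusiveness(frequent, summer, months))
```

Under this reading, an itemset that is frequent in every summer month but also in December fully satisfies inclusiveness while losing one ninth of its exclusiveness.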

19.

Accelerating k-medoid-based algorithms through metric access methods

Maria Camila N. Barioni Humberto L. Razente Caetano Traina Jr. 《Journal of Systems and Software》2008,81(3):343-355

Scalable data mining algorithms have become crucial to efficiently support KDD processes on large databases. In this paper, we address the task of scaling up k-medoid-based algorithms through the utilization of metric access methods, allowing clustering algorithms to be executed by database management systems in a fraction of the time usually required by the traditional approaches. We also present an optimization strategy that can be applied as an additional step of the proposed algorithm in order to achieve better clustering solutions. Experimental results based on several datasets, including synthetic and real ones, show that the proposed algorithm can reduce the number of distance calculations by a factor of more than three thousand when compared to existing algorithms, while producing clusters of equivalent quality.
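Metric access methods save work by using the triangle inequality: for any pivot p, |d(x, p) - d(m, p)| is a lower bound on d(x, m). The sketch below applies that bound with a single pivot while assigning points to their nearest medoid, skipping the exact distance whenever the bound already rules a medoid out. This is a simplified, hypothetical illustration of the principle; the paper works with full metric access methods inside a database system, and the counter here does not include the one point-to-pivot distance computed per point.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def assign_with_pivot(points: List[Vector], medoids: List[Vector],
                      pivot: Vector) -> Tuple[List[int], int]:
    """Nearest-medoid assignment that skips medoids ruled out by the triangle inequality.

    Returns the medoid index for each point and how many point-to-medoid
    distances were actually computed.
    """
    d_medoid_pivot = [math.dist(m, pivot) for m in medoids]  # precomputed once
    labels: List[int] = []
    computed = 0
    for x in points:
        d_x_pivot = math.dist(x, pivot)
        best, best_d = -1, math.inf
        for idx, m in enumerate(medoids):
            lower_bound = abs(d_x_pivot - d_medoid_pivot[idx])  # never exceeds d(x, m)
            if lower_bound >= best_d:
                continue  # m cannot be strictly closer than the current best
            d = math.dist(x, m)
            computed += 1
            if d < best_d:
                best, best_d = idx, d
        labels.append(best)
    return labels, computed

medoids = [(0, 0), (10, 0), (20, 0)]
points = [(1, 0), (11, 0), (19, 0)]
print(assign_with_pivot(points, medoids, pivot=(0, 0)))  # ([0, 1, 2], 6)
```

The assignment is identical to brute force; only the number of exact distance computations drops (6 instead of 9 here), and the savings grow with the number of medoids and the quality of the pivots.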

20.

Algorithms for spatial collocation pattern mining in a limited memory environment: a summary of results

Pawel Boinski Maciej Zakrzewicz 《Journal of Intelligent Information Systems》2014,43(1):147-182

The rapid growth of spatial datasets requires methods for discovering spatial knowledge in these sets (semi-)automatically. Spatial collocation patterns represent subsets of spatial features whose instances are frequently located together in a spatial neighborhood. In recent years, efficient methods for collocation discovery have been developed; however, none of them assumes a limited amount of operational memory or limited access to memory with short access times. Such restrictions are especially important given the large data structures required for efficient identification of collocation instances. In this work we present and compare three algorithms for collocation pattern mining in a limited memory environment. The first algorithm is based on the well-known joinless method introduced by Shekhar and Yoo. The second and third algorithms are inspired by a tree structure (iCPI-tree) presented by Wang et al. In our experimental evaluation, we compared the efficiency of the algorithms on both synthetic and real-world datasets.