首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
As the growing in Internet, database types and sizes are getting various and larger. The topic of finding out the significant information from a database at the shortest time is important. In the music databases, a repeating pattern is an important feature of music objects, which commonly used in analyzing the repeated part of music data and looking for themes. Most of the repeating patterns are key melodies or easy to familiarize and remember for people. Therefore, we can use the themes or the repeating patterns to construct indices that can speedup query execution for music retrievals. Nevertheless, non-trivial repeating patterns exclude those patterns, which are all contained in other longer patterns, such that they can reduce the redundancy of the repeating patterns and save the index space needed. Most of existing algorithms are time consuming for finding non-trivial repeating patterns in a music object. In this research, we aim to apply the true suffix tree approach to discover non-trivial repeating patterns for a music object, which can efficiently address the cost problems in processing time and memory space. In general case, our proposed scheme can extract non-trivial repeating patterns in a linear time.
Lin-huang ChangEmail:
  相似文献   

2.
3.
In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence time is within a pre-defined time window. We first propose algorithm Naive in which multiple sequence databases are joined as one sequence database for utilizing traditional sequential pattern mining algorithms (e.g., PrefixSpan). Due to the nature of join operations, algorithm Naive is costly and is developed for comparison purposes. Thus, we propose two algorithms without any join operations for mining multi-domain sequential patterns. Explicitly, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first performs one sequential pattern mining from one sequence database. In light of sequential patterns mined, algorithm PropagatedMine propagates sequential patterns mined to other sequence databases. Furthermore, sequential patterns mined are represented as a lattice structure for further reducing the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. Performance of the proposed algorithms is comparatively analyzed and sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., the execution time).  相似文献   

4.
As music can be represented symbolically, most of the existing methods extend some string matching algorithms to retrieve musical patterns in a music database. However, not all retrieved patterns are perceptually significant because some of them are, in fact, inaudible. Music is perceived in groupings of musical notes called streams. The process of grouping musical notes into streams is called stream segregation. Stream-crossing musical patterns are perceptually insignificant and should be pruned from the retrieval results. This can be done if all musical notes in a music database are segregated into streams and musical patterns are retrieved from the streams. Findings in auditory psychology are utilized in this paper, in which stream segregation is modelled as a clustering process and an adapted single-link clustering algorithm is proposed. Supported by experiments on real music data, streams are identified by the proposed algorithm with considerable accuracy.
Man Hon WongEmail:
  相似文献   

5.
The discovery of dependencies between attributes in databases is an important problem in data mining, and can be applied to facilitate future decision-making. In the present paper some properties of the branching dependencies are examined. We define a minimal branching dependency and we propose an algorithm for finding all minimal branching dependencies between a given set of attributes and a given attribute in a relation of a database. Our examination of the branching dependencies is motivated by their application in a database storing realized sales of products. For example, finding out that arbitrary p products have totally attracted at most q new users can prove to be crucial in supporting the decision making.In addition, we also consider the fractional and the fractional branching dependencies. Some properties of these dependencies are examined. An algorithm for finding all fractional dependencies between a given set of attributes and a given attribute in a database relation is proposed. We examine the general case of an arbitrary relation, as well as a particular case where the problem of discovering the fractional dependencies is considerably simplified.  相似文献   

6.
The proliferation of large masses of data has created many new opportunities for those working in science, engineering and business. The field of data mining (DM) and knowledge discovery from databases (KDD) has emerged as a new discipline in engineering and computer science. In the modern sense of DM and KDD the focus tends to be on extracting information characterized as knowledge from data that can be very complex and in large quantities. Industrial engineering, with the diverse areas it comprises, presents unique opportunities for the application of DM and KDD, and for the development of new concepts and techniques in this field. Many industrial processes are now automated and computerized in order to ensure the quality of production and to minimize production costs. A computerized process records large masses of data during its functioning. This real-time data which is recorded to ensure the ability to trace production steps can also be used to optimize the process itself. A French truck manufacturer decided to exploit the data sets of measures recorded during the test of diesel engines manufactured on their production lines. The goal was to discover knowledge in the data of the test engine process in order to significantly reduce (by about 25%) the processing time. This paper presents the study of knowledge discovery utilizing the KDD method. All the steps of the method have been used and two additional steps have been needed. The study allowed us to develop two systems: the discovery application is implemented giving a real-time prediction model (with a real reduction of 28%) and the discovery support environment now allows those who are not experts in statistics to extract their own knowledge for other processes.  相似文献   

7.
In this paper, we propose a novel algorithm, called 9DSPA-Miner, to mine frequent patterns from an image database, where every image is represented by the 9D-SPA representation. Our proposed method consists of three phases. First, we scan the database once and create an index structure. Next, the index structure is scanned to find all frequent patterns of length two. Finally, we use the frequent k-patterns (k ? 2) to generate candidate (k + 1)-patterns and check if the support of each candidate generated is not less than the user-specified minimum support threshold by using the index structure. Then, the steps in the third phase are repeated until no more frequent patterns can be found. Since the 9DSPA-Miner algorithm uses the characteristics of the 9D-SPA representation to prune most of impossible candidates, the experiment results demonstrate that it is more efficient and scalable than the modified Apriori method.  相似文献   

8.
With the increased acceptance of electronic health records, we can observe the increasing interest in the application of data mining approaches within this field. This study introduces a novel approach for exploring and comparing temporal trends within different in-patient subgroups, which is based on associated rule mining using Apriori algorithm and linear model-based recursive partitioning. The Nationwide Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality was used to evaluate the proposed approach. This study presents a novel approach where visual analytics on big data is used for trend discovery in form of a regression tree with scatter plots in the leaves of the tree. The trend lines are used for directly comparing linear trends within a specified time frame. Our results demonstrate the existence of opposite trends in relation to age and sex based subgroups that would be impossible to discover using traditional trend-tracking techniques. Such an approach can be employed regarding decision support applications for policy makers when organizing campaigns or by hospital management for observing trends that cannot be directly discovered using traditional analytical techniques.  相似文献   

9.
10.
Research on traditional association rules has gained a great attention during the past decade. Generally, an association rule AB is used to predict that B likely occurs when A occurs. This is a kind of strong correlation, and indicates that the two events will probably happen simultaneously. However, in real world applications such as bioinformatics and medical research, there are many follow-up correlations between itemsets A and B, such as, B is likely to occur n times after A has occurred m times. That is, the correlative itemsets do not belong to the same transaction. We refer to this relation as a follow-up correlation pattern (FCP). The task of mining FCP patterns brings more challenges on efficient processing than normal pattern discovery because the number of potentially interesting patterns becomes extremely large as the length limit of transactions no longer exists. In this paper, we develop an efficient algorithm to identify FCP patterns in time-related databases. We also experimentally evaluate our approach, and provide extensive results on mining this new kind of patterns. This work is partially supported by Australian large ARC grants (DP0449535, DP0559536 and DP0667060), a China NSF major research Program (60496327), a China NSF grant (60463003), an Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01), an Overseas-Returning High-level Talent Research Program of China Hunan-Resource Ministry, and an Innovation Project of Guangxi Graduate Education (2006106020812M35).  相似文献   

11.
General patterns of execution that have been frequently scheduled by a workflow management system provide the administrator with previously unknown, and potentially useful information, e.g., about the existence of unexpected causalities between subprocesses of a given workflow. This paper investigates the problem of mining unconnected patterns on the basis of some execution traces, i.e., of detecting sets of activities exhibiting no explicit dependency relationships that are frequently executed together. The problem is faced in the paper by proposing and analyzing two algorithms. One algorithm takes into account information about the structure of the control-flow graph only, while the other is a smart refinement where the knowledge of the frequencies of edges and activities in the traces at hand is also accounted for, by means of a sophisticated graphical analysis. Both algorithms have been implemented and integrated into a system prototype, which may profitably support the enactment phase of the workflow. The correctness of the two algorithms is formally proven, and several experiments are reported to evidence the ability of the graphical analysis to significantly improve the performances, by dramatically pruning the search space of candidate patterns.  相似文献   

12.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant due to the poor correlations among the items inside of them. Hence,the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework to introduce a very useful measure, called frequency affinity, among the items in a HUP and the concept of interesting HUP with a strong frequency affinity for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.  相似文献   

13.
In this paper, we propose an efficient algorithm, called CMP-Miner, to mine closed patterns in a time-series database where each record in the database, also called a transaction, contains multiple time-series sequences. Our proposed algorithm consists of three phases. First, we transform each time-series sequence in a transaction into a symbolic sequence. Second, we scan the transformed database to find frequent patterns of length one. Third, for each frequent pattern found in the second phase, we recursively enumerate frequent patterns by a frequent pattern tree in a depth-first search manner. During the process of enumeration, we apply several efficient pruning strategies to remove frequent but non-closed patterns. Thus, the CMP-Miner algorithm can efficiently mine the closed patterns from a time-series database. The experimental results show that our proposed algorithm outperforms the modified Apriori and BIDE algorithms.  相似文献   

14.
Moving average transform is very useful in finding the trend of time-series data by reducing the effect of noise, and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, are problematic in that, since they must build multiple indexes in supporting transform of arbitrary order, they incur index overhead both in storage space and in update maintenance. To solve this problem, we propose a single-index approach to subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single-index approach, we can reduce both the storage space and the index maintenance overhead. In explaining the single-index approach, we first introduce the notion of poly-order moving average transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. We also propose two subsequence matching methods based on poly-order transform that efficiently support moving average transform of arbitrary order. Experimental results for real stock data show that, compared with the sequential scan, our methods improve average performance significantly, by a factor of 22.6-33.6. Also, compared with cases in which an index is built for every moving average order, our methods reduce storage space and maintenance effort significantly while incurring only marginal performance degradation. Our approach entails the additional advantage of being generalized to support many other transforms in addition to moving average transform. Therefore, we believe that our approach will be widely used in many transform-based subsequence matching methods.  相似文献   

15.
Due to the inherent existence of uncertainty in many real-world applications, in this paper, we investigate an important query in uncertain databases, namely probabilistic least influenced set (PLIS) query, which retrieves all the uncertain objects in an uncertain database such that they are the least affected by a given query object with high probabilities. Such a PLIS query is useful in applications such as business planning. We propose and tackle both monochromatic and bichromatic versions (i.e. M-PLIS and B-PLIS, respectively) of the PLIS query. In order to efficiently answer PLIS queries, we present three pruning methods, MINMAX, Regional, and Candidate pruning, which can effectively reduce the PLIS search space. The proposed pruning methods can be seamlessly integrated into efficient query procedures. Moreover, we also study important variants of PLIS query with uncertain query object (i.e. UQ-PLIS). Furthermore, we formulate and tackle the PLIS problem on uncertain moving objects (i.e. UMOD-PLIS). Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches under various settings.  相似文献   

16.
With the growth of digital music, the development of music recommendation is helpful for users to pick desirable music pieces from a huge repository of music. The existing music recommendation approaches are based on a user’s preference on music. However, sometimes, it might better meet users’ requirement to recommend music pieces according to emotions. In this paper, we propose a novel framework for emotion-based music recommendation. The core of the recommendation framework is the construction of the music emotion model by affinity discovery from film music, which plays an important role in conveying emotions in film. We investigate the music feature extraction and propose the Music Affinity Graph and Music Affinity Graph-Plus algorithms for the construction of music emotion model. Experimental result shows the proposed emotion-based music recommendation achieves 85% accuracy in average.  相似文献   

17.
Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, the discovery of foreign keys turns out to be an important and challenging task. The underlying problem is known to be the inclusion dependency (IND) inference problem. In this paper, data-mining algorithms are devised for IND inference in a given database. We propose a two-step approach. In the first step, unary INDs are discovered thanks to a new preprocessing stage which leads to a new algorithm and to an efficient implementation. In the second step, n-ary IND inference is achieved. This step fits in the framework of levelwise algorithms used in many data-mining algorithms. Since real-world databases can suffer from some data inconsistencies, approximate INDs, i.e. INDs which almost hold, are considered. We show how they can be safely integrated into our unary and n-ary discovery algorithms. An implementation of these algorithms has been achieved and tested against both synthetic and real-life databases. Up to our knowledge, no other algorithm does exist to solve this data-mining problem.  相似文献   

18.
In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since the PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the comparing algorithms by more than one order of magnitude.  相似文献   

19.
TheNielsen Opportunity Explorer tmproduct can be used by sales and trade marketing personnel within consumer packaged goods manufacturers to understand how their products are performing in the market place and find opportunities to sell more product, more profitably to the retailers. Opportunity Explorer uses data collected at the point-of-sale terminals, and by auditors of A. C. Nielsen. Opportunity Explorer uses a knowledge-base of market research expertise to analyze large databases and generate interactive reports using knowledge discovery templates, converting a large space of data into concise, inter-linkedinformation frames. Each information frame addresses specific business issues, and leads the user to seek related information by means of dynamically created hyperlinks.  相似文献   

20.
A contrast pattern is a set of items (itemset) whose frequency differs significantly between two classes of data. Such patterns describe distinguishing characteristics between datasets, are meaningful to human experts, have strong discriminating ability and can be used for powerful classifiers. Incrementally mining such patterns is very important for evolving datasets, where transactions can be either inserted or deleted and mining needs to be repeated after changes occur. When the change is small, it is undesirable to carry out mining from scratch. Rather, the set of previously mined contrast patterns should be reused where possible to compute the new patterns. A primary example of evolving data is a data stream, where the data is a sequence of continuously arriving transactions (or itemsets). In this paper, we propose an efficient technique for incrementally mining contrast patterns. Our algorithm particularly aims to avoid redundant computation which might occur due to simultaneous transaction insertion and deletion, as is the case for data streams. In an experimental study using real and synthetic data streams, we show our algorithm can be substantially faster than the previous approach.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号