期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Hui-Ling Hu 《Information Sciences》2008,178(19):3683-3696

相似文献

2.

Mining sequential patterns across multiple sequence databases

Wen-Chih Zhung-Xun 《Data & Knowledge Engineering》2009,68(10):1014-1033

In this paper, given a set of sequence databases across multiple domains, we aim at mining multi-domain sequential patterns, where a multi-domain sequential pattern is a sequence of events whose occurrence time is within a pre-defined time window. We first propose algorithm Naive in which multiple sequence databases are joined as one sequence database for utilizing traditional sequential pattern mining algorithms (e.g., PrefixSpan). Due to the nature of join operations, algorithm Naive is costly and is developed for comparison purposes. Thus, we propose two algorithms without any join operations for mining multi-domain sequential patterns. Explicitly, algorithm IndividualMine derives sequential patterns in each domain and then iteratively combines sequential patterns among sequence databases of multiple domains to derive candidate multi-domain sequential patterns. However, not all sequential patterns mined in the sequence database of each domain are able to form multi-domain sequential patterns. To avoid the mining cost incurred in algorithm IndividualMine, algorithm PropagatedMine is developed. Algorithm PropagatedMine first performs one sequential pattern mining from one sequence database. In light of sequential patterns mined, algorithm PropagatedMine propagates sequential patterns mined to other sequence databases. Furthermore, sequential patterns mined are represented as a lattice structure for further reducing the number of sequential patterns to be propagated. In addition, we develop some mechanisms to allow some empty sets in multi-domain sequential patterns. Performance of the proposed algorithms is comparatively analyzed and sensitivity analysis is conducted. Experimental results show that by exploring propagation and lattice structures, algorithm PropagatedMine outperforms algorithm IndividualMine in terms of efficiency (i.e., the execution time). 相似文献

3.

Stream segregation algorithm for pattern matching in polyphonic music databases

Wai Man Szeto Man Hon Wong 《Multimedia Tools and Applications》2006,30(1):109-127

As music can be represented symbolically, most of the existing methods extend some string matching algorithms to retrieve musical patterns in a music database. However, not all retrieved patterns are perceptually significant because some of them are, in fact, inaudible. Music is perceived in groupings of musical notes called streams. The process of grouping musical notes into streams is called stream segregation. Stream-crossing musical patterns are perceptually insignificant and should be pruned from the retrieval results. This can be done if all musical notes in a music database are segregated into streams and musical patterns are retrieved from the streams. Findings in auditory psychology are utilized in this paper, in which stream segregation is modelled as a clustering process and an adapted single-link clustering algorithm is proposed. Supported by experiments on real music data, streams are identified by the proposed algorithm with considerable accuracy.

Man Hon WongEmail:

相似文献

4.

Discovering branching and fractional dependencies in databases

Tsvetanka 《Data & Knowledge Engineering》2008,66(2):311-325

The discovery of dependencies between attributes in databases is an important problem in data mining, and can be applied to facilitate future decision-making. In the present paper some properties of the branching dependencies are examined. We define a minimal branching dependency and we propose an algorithm for finding all minimal branching dependencies between a given set of attributes and a given attribute in a relation of a database. Our examination of the branching dependencies is motivated by their application in a database storing realized sales of products. For example, finding out that arbitrary p products have totally attracted at most q new users can prove to be crucial in supporting the decision making.In addition, we also consider the fractional and the fractional branching dependencies. Some properties of these dependencies are examined. An algorithm for finding all fractional dependencies between a given set of attributes and a given attribute in a database relation is proposed. We examine the general case of an arbitrary relation, as well as a particular case where the problem of discovering the fractional dependencies is considerably simplified. 相似文献

5.

Knowledge discovery from industrial databases

Christine Gertosio Alan Dussauchoy 《Journal of Intelligent Manufacturing》2004,15(1):29-37

The proliferation of large masses of data has created many new opportunities for those working in science, engineering and business. The field of data mining (DM) and knowledge discovery from databases (KDD) has emerged as a new discipline in engineering and computer science. In the modern sense of DM and KDD the focus tends to be on extracting information characterized as knowledge from data that can be very complex and in large quantities. Industrial engineering, with the diverse areas it comprises, presents unique opportunities for the application of DM and KDD, and for the development of new concepts and techniques in this field. Many industrial processes are now automated and computerized in order to ensure the quality of production and to minimize production costs. A computerized process records large masses of data during its functioning. This real-time data which is recorded to ensure the ability to trace production steps can also be used to optimize the process itself. A French truck manufacturer decided to exploit the data sets of measures recorded during the test of diesel engines manufactured on their production lines. The goal was to discover knowledge in the data of the test engine process in order to significantly reduce (by about 25%) the processing time. This paper presents the study of knowledge discovery utilizing the KDD method. All the steps of the method have been used and two additional steps have been needed. The study allowed us to develop two systems: the discovery application is implemented giving a real-time prediction model (with a real reduction of 28%) and the discovery support environment now allows those who are not experts in statistics to extract their own knowledge for other processes. 相似文献

6.

Mining frequent patterns in image databases with 9D-SPA representation

Anthony J.T. Lee^{Author Vitae} Ying-Ho Liu Author Vitae Author Vitae Hsiu-Hui Lin Author Vitae Author Vitae 《Journal of Systems and Software》2009,82(4):603-618

In this paper, we propose a novel algorithm, called 9DSPA-Miner, to mine frequent patterns from an image database, where every image is represented by the 9D-SPA representation. Our proposed method consists of three phases. First, we scan the database once and create an index structure. Next, the index structure is scanned to find all frequent patterns of length two. Finally, we use the frequent k-patterns (k ? 2) to generate candidate (k + 1)-patterns and check if the support of each candidate generated is not less than the user-specified minimum support threshold by using the index structure. Then, the steps in the third phase are repeated until no more frequent patterns can be found. Since the 9DSPA-Miner algorithm uses the characteristics of the 9D-SPA representation to prune most of impossible candidates, the experiment results demonstrate that it is more efficient and scalable than the modified Apriori method. 相似文献

7.

Contrasting temporal trend discovery for large healthcare databases

Goran Hrovat Gregor Stiglic Peter Kokol Milan Ojsteršek 《Computer methods and programs in biomedicine》2014

With the increased acceptance of electronic health records, we can observe the increasing interest in the application of data mining approaches within this field. This study introduces a novel approach for exploring and comparing temporal trends within different in-patient subgroups, which is based on associated rule mining using Apriori algorithm and linear model-based recursive partitioning. The Nationwide Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality was used to evaluate the proposed approach. This study presents a novel approach where visual analytics on big data is used for trend discovery in form of a regression tree with scatter plots in the leaves of the tree. The trend lines are used for directly comparing linear trends within a specified time frame. Our results demonstrate the existence of opposite trends in relation to age and sex based subgroups that would be impossible to discover using traditional trend-tracking techniques. Such an approach can be employed regarding decision support applications for policy makers when organizing campaigns or by hospital management for observing trends that cannot be directly discovered using traditional analytical techniques. 相似文献

8.

Efficient processing of subsequence matching with the Euclidean metric in time-series databases

Sang-Wook Kim Dae-Hyun Park 《Information Processing Letters》2004,90(5):253-260

相似文献

9.

Mining follow-up correlation patterns from time-related databases

Shichao Zhang Zifang Huang Jilian Zhang Xiaofeng Zhu 《Knowledge and Information Systems》2008,14(1):81-100

Research on traditional association rules has gained a great attention during the past decade. Generally, an association rule A → B is used to predict that B likely occurs when A occurs. This is a kind of strong correlation, and indicates that the two events will probably happen simultaneously. However, in real world applications such as bioinformatics and medical research, there are many follow-up correlations between itemsets A and B, such as, B is likely to occur n times after A has occurred m times. That is, the correlative itemsets do not belong to the same transaction. We refer to this relation as a follow-up correlation pattern (FCP). The task of mining FCP patterns brings more challenges on efficient processing than normal pattern discovery because the number of potentially interesting patterns becomes extremely large as the length limit of transactions no longer exists. In this paper, we develop an efficient algorithm to identify FCP patterns in time-related databases. We also experimentally evaluate our approach, and provide extensive results on mining this new kind of patterns. This work is partially supported by Australian large ARC grants (DP0449535, DP0559536 and DP0667060), a China NSF major research Program (60496327), a China NSF grant (60463003), an Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01), an Overseas-Returning High-level Talent Research Program of China Hunan-Resource Ministry, and an Innovation Project of Guangxi Graduate Education (2006106020812M35). 相似文献

10.

Mining unconnected patterns in workflows

Gianluigi Greco Antonella Guzzo Giuseppe Manco Domenico Saccà 《Information Systems》2007

General patterns of execution that have been frequently scheduled by a workflow management system provide the administrator with previously unknown, and potentially useful information, e.g., about the existence of unexpected causalities between subprocesses of a given workflow. This paper investigates the problem of mining unconnected patterns on the basis of some execution traces, i.e., of detecting sets of activities exhibiting no explicit dependency relationships that are frequently executed together. The problem is faced in the paper by proposing and analyzing two algorithms. One algorithm takes into account information about the structure of the control-flow graph only, while the other is a smart refinement where the knowledge of the frequencies of edges and activities in the traces at hand is also accounted for, by means of a sophisticated graphical analysis. Both algorithms have been implemented and integrated into a system prototype, which may profitably support the enactment phase of the workflow. The correctness of the two algorithms is formally proven, and several experiments are reported to evidence the ability of the graphical analysis to significantly improve the performances, by dramatically pruning the search space of candidate patterns. 相似文献

11.

Mining closed patterns in multi-sequence time-series databases

Anthony J.T. Huei-Wen Tzu-Yu Ying-Ho Kuo-Tay 《Data & Knowledge Engineering》2009,68(10):1071-1090

In this paper, we propose an efficient algorithm, called CMP-Miner, to mine closed patterns in a time-series database where each record in the database, also called a transaction, contains multiple time-series sequences. Our proposed algorithm consists of three phases. First, we transform each time-series sequence in a transaction into a symbolic sequence. Second, we scan the transformed database to find frequent patterns of length one. Third, for each frequent pattern found in the second phase, we recursively enumerate frequent patterns by a frequent pattern tree in a depth-first search manner. During the process of enumeration, we apply several efficient pruning strategies to remove frequent but non-closed patterns. Thus, the CMP-Miner algorithm can efficiently mine the closed patterns from a time-series database. The experimental results show that our proposed algorithm outperforms the modified Apriori and BIDE algorithms. 相似文献

12.

A framework for mining interesting high utility patterns with a strong frequency affinity

Chowdhury Farhan Ahmed Ho-Jin Choi 《Information Sciences》2011,181(21):4878-4894

High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant due to the poor correlations among the items inside of them. Hence,the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework to introduce a very useful measure, called frequency affinity, among the items in a HUP and the concept of interesting HUP with a strong frequency affinity for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity. 相似文献

13.

Unary and n-ary inclusion dependency discovery in relational databases

Fabien De Marchi Stéphane Lopes Jean-Marc Petit 《Journal of Intelligent Information Systems》2009,32(1):53-73

Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, the discovery of foreign keys turns out to be an important and challenging task. The underlying problem is known to be the inclusion dependency (IND) inference problem. In this paper, data-mining algorithms are devised for IND inference in a given database. We propose a two-step approach. In the first step, unary INDs are discovered thanks to a new preprocessing stage which leads to a new algorithm and to an efficient implementation. In the second step, n-ary IND inference is achieved. This step fits in the framework of levelwise algorithms used in many data-mining algorithms. Since real-world databases can suffer from some data inconsistencies, approximate INDs, i.e. INDs which almost hold, are considered. We show how they can be safely integrated into our unary and n-ary discovery algorithms. An implementation of these algorithms has been achieved and tested against both synthetic and real-life databases. Up to our knowledge, no other algorithm does exist to solve this data-mining problem. 相似文献

14.

Efficient moving average transform-based subsequence matching algorithms in time-series databases 总被引：1，自引：0，他引：1

Yang-Sae Moon Jinho Kim 《Information Sciences》2007,177(23):5415-5431

Moving average transform is very useful in finding the trend of time-series data by reducing the effect of noise, and has been used in many areas such as econometrics. Previous subsequence matching methods with moving average transform, however, are problematic in that, since they must build multiple indexes in supporting transform of arbitrary order, they incur index overhead both in storage space and in update maintenance. To solve this problem, we propose a single-index approach to subsequence matching that supports moving average transform of arbitrary order in time-series databases. Using the single-index approach, we can reduce both the storage space and the index maintenance overhead. In explaining the single-index approach, we first introduce the notion of poly-order moving average transform by generalizing the original definition of moving average transform. We then formally prove the correctness of poly-order transform-based subsequence matching. We also propose two subsequence matching methods based on poly-order transform that efficiently support moving average transform of arbitrary order. Experimental results for real stock data show that, compared with the sequential scan, our methods improve average performance significantly, by a factor of 22.6-33.6. Also, compared with cases in which an index is built for every moving average order, our methods reduce storage space and maintenance effort significantly while incurring only marginal performance degradation. Our approach entails the additional advantage of being generalized to support many other transforms in addition to moving average transform. Therefore, we believe that our approach will be widely used in many transform-based subsequence matching methods. 相似文献

15.

Finding the least influenced set in uncertain databases

Xiang Lian Lei Chen Guoren Wang 《Information Systems》2011

Due to the inherent existence of uncertainty in many real-world applications, in this paper, we investigate an important query in uncertain databases, namely probabilistic least influenced set (PLIS) query, which retrieves all the uncertain objects in an uncertain database such that they are the least affected by a given query object with high probabilities. Such a PLIS query is useful in applications such as business planning. We propose and tackle both monochromatic and bichromatic versions (i.e. M-PLIS and B-PLIS, respectively) of the PLIS query. In order to efficiently answer PLIS queries, we present three pruning methods, MINMAX, Regional, and Candidate pruning, which can effectively reduce the PLIS search space. The proposed pruning methods can be seamlessly integrated into efficient query procedures. Moreover, we also study important variants of PLIS query with uncertain query object (i.e. UQ-PLIS). Furthermore, we formulate and tackle the PLIS problem on uncertain moving objects (i.e. UMOD-PLIS). Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches under various settings. 相似文献

16.

Emotion-based music recommendation by affinity discovery from film music

Man-Kwan Shan Fang-Fei Kuo Meng-Fen Chiang Suh-Yin Lee 《Expert systems with applications》2009,36(4):7666-7674

With the growth of digital music, the development of music recommendation is helpful for users to pick desirable music pieces from a huge repository of music. The existing music recommendation approaches are based on a user’s preference on music. However, sometimes, it might better meet users’ requirement to recommend music pieces according to emotions. In this paper, we propose a novel framework for emotion-based music recommendation. The core of the recommendation framework is the construction of the music emotion model by affinity discovery from film music, which plays an important role in conveying emotions in film. We investigate the music feature extraction and propose the Music Affinity Graph and Music Affinity Graph-Plus algorithms for the construction of music emotion model. Experimental result shows the proposed emotion-based music recommendation achieves 85% accuracy in average. 相似文献

17.

Opportunity explorer: Navigating large databases using knowledge discovery templates

Tej Anand 《Journal of Intelligent Information Systems》1995,4(1):27-37

TheNielsen Opportunity Explorer ^tmproduct can be used by sales and trade marketing personnel within consumer packaged goods manufacturers to understand how their products are performing in the market place and find opportunities to sell more product, more profitably to the retailers. Opportunity Explorer uses data collected at the point-of-sale terminals, and by auditors of A. C. Nielsen. Opportunity Explorer uses a knowledge-base of market research expertise to analyze large databases and generate interactive reports using knowledge discovery templates, converting a large space of data into concise, inter-linkedinformation frames. Each information frame addresses specific business issues, and leads the user to seek related information by means of dynamically created hyperlinks. 相似文献

18.

Mining frequent closed patterns in pointset databases

Anthony J.T. Lee Wen-Kwang Tsao Po-Yin Chen Ming-Chih Lin Shih-Hui Yang 《Information Systems》2010

In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since the PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the comparing algorithms by more than one order of magnitude. 相似文献

19.

Efficient incremental mining of contrast patterns in changing data

James Bailey Elsa Loekito 《Information Processing Letters》2010,110(3):88-5368

A contrast pattern is a set of items (itemset) whose frequency differs significantly between two classes of data. Such patterns describe distinguishing characteristics between datasets, are meaningful to human experts, have strong discriminating ability and can be used for powerful classifiers. Incrementally mining such patterns is very important for evolving datasets, where transactions can be either inserted or deleted and mining needs to be repeated after changes occur. When the change is small, it is undesirable to carry out mining from scratch. Rather, the set of previously mined contrast patterns should be reused where possible to compute the new patterns. A primary example of evolving data is a data stream, where the data is a sequence of continuously arriving transactions (or itemsets). In this paper, we propose an efficient technique for incrementally mining contrast patterns. Our algorithm particularly aims to avoid redundant computation which might occur due to simultaneous transaction insertion and deletion, as is the case for data streams. In an experimental study using real and synthetic data streams, we show our algorithm can be substantially faster than the previous approach. 相似文献

20.

Finding representative patterns with ordered projections

José C. RiquelmeAuthor VitaeJesús S. Aguilar-RuizAuthor Vitae Miguel ToroAuthor Vitae 《Pattern recognition》2003,36(4):1009-1018

相似文献