
1.

Mining very large databases (Total citations: 1, self-citations: 0, cited by others: 1)

Ganti V. Gehrke J. Ramakrishnan R. 《Computer》1999,32(8):38-45

Established companies have had decades to accumulate masses of data about their customers, suppliers, products and services, and employees. Data mining, also known as knowledge discovery in databases, gives organizations the tools to sift through these vast data stores to find the trends, patterns, and correlations that can guide strategic decision making. Traditionally, algorithms for data analysis assume that the input data contains relatively few records. Current databases, however, are much too large to be held in main memory. To be efficient, the data mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if, given a fixed amount of main memory, its runtime increases linearly with the number of records in the input database. Recent work has focused on scaling data mining algorithms to very large data sets. The authors describe a broad range of algorithms that address three classical data mining problems: market basket analysis, clustering, and classification.

2.

Mining multiple-level association rules in large databases (Total citations: 2, self-citations: 0, cited by others: 2)

Jiawei Han Yongjian Fu 《Knowledge and Data Engineering, IEEE Transactions on》1999,11(5):798-805

A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the Apriori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting rules, and the relaxation of rule conditions for finding “level-crossing” association rules, are also investigated. The study shows that efficient algorithms can be developed for discovering interesting and strong multiple-level association rules from large databases.

3.

Mining frequent itemsets in large databases: The hierarchical partitioning approach

Fan-Chen Tseng 《Expert systems with applications》2013,40(5):1654-1661

Although many methods have been proposed to enhance the efficiency of data mining, little research has been devoted to the issue of scalability, that is, the problem of mining frequent itemsets when the size of the database is very large. This study proposes a methodology, hierarchical partitioning, for mining frequent itemsets in large databases, based on a novel data structure called the Frequent Pattern List (FPL). One of the major features of the FPL is its ability to partition the database, and thus transform the database into a set of sub-databases of manageable sizes. As a result, a divide-and-conquer approach can be developed to perform the desired data-mining tasks. Experimental results show that hierarchical partitioning is capable of mining frequent itemsets and frequent closed itemsets in very large databases.

4.

Closed constrained gradient mining in retail databases

Jianyong Wang Jiawei Han Jian Pei 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(6):764-769

Incorporating constraints into frequent itemset mining not only improves data mining efficiency, but also leads to concise and meaningful results. In this paper, a framework for closed constrained gradient itemset mining in retail databases is proposed by introducing the concept of gradient constraint into closed itemset mining. A tailored version of CLOSET+, LCLOSET, is first briefly introduced, which is designed for efficient closed itemset mining from sparse databases. Then, a weaker but antimonotone measure, the top-X average measure, is proposed and can be adopted to prune the search space effectively. Experiments show that a combination of LCLOSET and the top-X average pruning provides an efficient approach to mining frequent closed gradient itemsets.
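To make the notion of a closed itemset concrete, here is a minimal brute-force sketch. It assumes the usual definition (a frequent itemset is closed if no proper superset has the same support) and is not LCLOSET or CLOSET+; the function name and the toy data are hypothetical, and the enumeration is only practical for a handful of items.

```python
from itertools import combinations


def frequent_closed_itemsets(transactions, min_support):
    """Return {itemset: support} for frequent itemsets with no equally supported proper superset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    # Brute-force support counting for every non-empty combination of items.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                support[candidate] = count

    # A superset with equal support is itself frequent, so checking
    # only against the frequent itemsets is sufficient.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


toy_transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c"}, {"c"}]
closed = frequent_closed_itemsets(toy_transactions, min_support=2)
```

In this toy data, `{a}` and `{b}` are frequent but not closed, because `{a, b}` occurs in exactly the same transactions.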

5.

Discovering causality in large databases

Shichao Zhang Chengqi Zhang 《Applied Artificial Intelligence》2013,27(5):333-358

A causal rule between two variables, X → Y, captures the relationship that the presence of X causes the appearance of Y. Because of its usefulness (compared to association rules), techniques for mining causal rules are beginning to be developed. However, existing methods (such as the LCD and CU-path algorithms) are limited to mining causal rules among simple variables and are inadequate for discovering and representing causal rules among multi-value variables. In this paper, we propose that the causality between variables X and Y be represented in the form X → Y with conditional probability matrix M_{Y|X}. We also propose a new approach to discover causality in large databases based on partitioning. The approach partitions the items into item variables by decomposing "bad" item variables and composing "not-good" item variables. In particular, we establish a method to optimize causal rules that merges the "useless" information in conditional probability matrices of extracted causal rules.
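As an illustration of what a conditional probability matrix for two multi-value variables might look like, the sketch below estimates P(Y = y | X = x) from observed (x, y) pairs by relative frequency. The row/column layout (rows indexed by values of X, columns by values of Y), the function name, and the sample data are assumptions made for this example; the abstract does not specify how the authors build or orient the matrix, and this sketch does not test for causality.

```python
from collections import Counter


def conditional_probability_matrix(pairs, x_values, y_values):
    """Estimate M[i][j] = P(Y = y_values[j] | X = x_values[i]) by relative frequency."""
    joint = Counter(pairs)                    # counts of each (x, y) pair
    x_totals = Counter(x for x, _ in pairs)   # counts of each x value
    matrix = []
    for x in x_values:
        total = x_totals[x]
        # Rows for unseen x values are filled with zeros rather than dividing by zero.
        row = [joint[(x, y)] / total if total else 0.0 for y in y_values]
        matrix.append(row)
    return matrix


# Hypothetical observations of a two-valued X and a two-valued Y.
observations = [
    ("low", "no"), ("low", "no"), ("low", "yes"),
    ("high", "yes"), ("high", "yes"), ("high", "no"), ("high", "yes"),
]
M = conditional_probability_matrix(observations, ["low", "high"], ["no", "yes"])
```

Each row of `M` sums to 1 for every value of X that appears in the data.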

6.

Mining typical patterns from databases

Hui-Ling Hu 《Information Sciences》2008,178(19):3683-3696


7.

Active learning in very large databases

Navneet Panda King-Shy Goh Edward Y. Chang 《Multimedia Tools and Applications》2006,31(3):249-267

Query-by-example and query-by-keyword both suffer from the problem of “aliasing,” meaning that example-images and keywords potentially have variable interpretations or multiple semantics. For discerning which semantic is appropriate for a given query, we have established that combining active learning with kernel methods is a very effective approach. In this work, we first examine active-learning strategies, and then focus on addressing the challenges of two scalability issues: scalability in concept complexity and in dataset size. We present remedies, explain limitations, and discuss future directions that research might take.

8.

Mining interesting association rules from customer databases and transaction databases (Total citations: 1, self-citations: 0, cited by others: 1)

Pauray S. M. Tsai Chien-Ming Chen 《Information Systems》2004,29(8):139-696

In this paper, we examine a new data mining issue of mining association rules from customer databases and transaction databases. The problem is decomposed into two subproblems: identifying all the large itemsets from the transaction database and mining association rules from the customer database and the large itemsets identified. For the first subproblem, we propose an efficient algorithm to discover all the large itemsets from the transaction database. Experimental results show that by our approach, the total execution time can be reduced significantly. For the second subproblem, a relationship graph is constructed according to the identified large itemsets from the transaction database and the priorities of condition attributes from the customer database. Based on the relationship graph, we present an efficient graph-based algorithm to discover interesting association rules embedded in the transaction database and the customer database.

9.

Mining spatiotemporal co-occurrence patterns in non-relational databases

Berkay Aydin Vijay Akkineni Rafal Angryk 《GeoInformatica》2016,20(4):801-828

Spatiotemporal co-occurrence patterns (STCOPs) represent the subsets of feature types whose instances frequently co-occur in both space and time. Spatiotemporal co-occurrences reflect the spatiotemporal overlap relationships among two or more spatiotemporal instances in both spatial and temporal dimensions. STCOPs can potentially be used to predict and understand the generation and evolution of different types of interacting phenomena in various scientific fields such as astronomy, meteorology, biology, and geosciences. Meaningful and statistically significant data analysis for these scientific fields requires processing sufficiently large datasets. Due to the computationally expensive nature of spatiotemporal operations required for mining spatiotemporal co-occurrences, it is increasingly difficult to identify spatiotemporal co-occurrences and discover STCOPs in centralized system settings. As a solution, we developed a cloud-based distributed mining system for discovering STCOPs. Our system uses Accumulo, a column-oriented non-relational database management system, as its backbone. In order to efficiently mine the STCOPs, we propose three data models for managing trajectory-based spatiotemporal data in Accumulo. We introduce an in-memory join-index structure and a join algorithm for effectively performing spatiotemporal join operations on spatiotemporal trajectories in non-relational databases. Lastly, through experiments with artificial and real-life datasets, we evaluate the performance of the proposed models for STCOP mining.

10.

Mining spatio-temporal patterns in object mobility databases

Florian Verhein Sanjay Chawla 《Data mining and knowledge discovery》2008,16(1):5-38

With the increasing use of wireless communication devices and the ability to track people and objects cheaply and easily, the amount of spatio-temporal data is growing substantially. Many of these applications cannot easily locate the exact position of objects, but they can determine the region in which each object is contained. Furthermore, the regions are fixed and may vary greatly in size. Examples include mobile/cell phone networks, RFID tag readers and satellite tracking. This demands techniques to mine such data. These techniques must also correct for the bias produced by different sized regions. We provide a comprehensive definition of Spatio-Temporal Association Rules (STARs) that describe how objects move between regions over time. We also present other patterns that are useful for mobility data: stationary regions and high traffic regions. The latter consist of sources, sinks and thoroughfares. These patterns describe important temporal characteristics of regions and we show that they can be considered as special STARs. We define spatial support to effectively deal with the problem of different sized regions. We provide an efficient algorithm, STAR-Miner, to find these patterns by exploiting several pruning properties. Responsible editors: Charles Perng and Tao Li.

11.

Mining spatial association rules in image databases (Total citations: 2, self-citations: 0, cited by others: 2)

Anthony J.T. Lee Ruey-Wen Hong Wen-Kwang Tsao 《Information Sciences》2007,177(7):1593-1608

In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ≥ 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most impossible candidates, it is more efficient than the Apriori algorithm. The experimental results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm.
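The two-phase, level-wise loop described in this abstract (frequent patterns of length one, then candidates of length k + 1 generated from frequent k-patterns and checked with a database scan) can be sketched on plain itemsets as follows. This is a generic Apriori-style illustration under that simplifying assumption, not 9DLT-Miner: it ignores the 9DLT spatial representation and the paper's own pruning, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def levelwise_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(candidate):
        # One pass over the database per candidate.
        return sum(1 for t in transactions if candidate <= t)

    # Phase 1: frequent patterns of length one.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    # Phase 2: repeat until no more frequent patterns are found.
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        # Join frequent k-itemsets into candidate (k + 1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent


toy_transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
result = levelwise_frequent_itemsets(toy_transactions, min_support=3)
```

With a minimum support of 3, every single item and every pair is frequent in this toy data, while `{a, b, c}` (support 2) is generated as a candidate and then rejected by the support check.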

12.

Mining frequent trajectory patterns in spatial-temporal databases (Total citations: 1, self-citations: 0, cited by others: 1)

Anthony J.T. Lee Yi-An Chen 《Information Sciences》2009,179(13):2218-2796

In this paper, we propose an efficient graph-based mining (GBM) algorithm for mining the frequent trajectory patterns in a spatial-temporal database. The proposed method comprises two phases. First, we scan the database once to generate a mapping graph and trajectory information lists (TI-lists). Then, we traverse the mapping graph in a depth-first search manner to mine all frequent trajectory patterns in the database. By using the mapping graph and TI-lists, the GBM algorithm can localize support counting and pattern extension in a small number of TI-lists. Moreover, it utilizes the adjacency property to reduce the search space. Therefore, our proposed method can efficiently mine the frequent trajectory patterns in the database. The experimental results show that it outperforms the Apriori-based and PrefixSpan-based methods by more than one order of magnitude.

13.

Mining closed patterns in multi-sequence time-series databases

Anthony J.T. Huei-Wen Tzu-Yu Ying-Ho Kuo-Tay 《Data & Knowledge Engineering》2009,68(10):1071-1090

In this paper, we propose an efficient algorithm, called CMP-Miner, to mine closed patterns in a time-series database where each record in the database, also called a transaction, contains multiple time-series sequences. Our proposed algorithm consists of three phases. First, we transform each time-series sequence in a transaction into a symbolic sequence. Second, we scan the transformed database to find frequent patterns of length one. Third, for each frequent pattern found in the second phase, we recursively enumerate frequent patterns by a frequent pattern tree in a depth-first search manner. During the process of enumeration, we apply several efficient pruning strategies to remove frequent but non-closed patterns. Thus, the CMP-Miner algorithm can efficiently mine the closed patterns from a time-series database. The experimental results show that our proposed algorithm outperforms the modified Apriori and BIDE algorithms.

14.

Mining frequent closed patterns in pointset databases

Anthony J.T. Lee Wen-Kwang Tsao Po-Yin Chen Ming-Chih Lin Shih-Hui Yang 《Information Systems》2010

In this paper, we propose an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the compared algorithms by more than one order of magnitude.

15.

Mining coverage patterns from transactional databases

P. Gowtham Srinivas P. Krishna Reddy A. V. Trinath S. Bhargav R. Uday Kiran 《Journal of Intelligent Information Systems》2015,45(3):423-439


16.

Mining sequential patterns from probabilistic databases

Muzammal Muhammad Raman Rajeev 《Knowledge and Information Systems》2015,44(2):325-358

This paper considers the problem of sequential pattern mining (SPM) in probabilistic databases. Specifically, we consider SPM in situations where there is...

17.

Mining itemset utilities from transaction databases (Total citations: 4, self-citations: 0, cited by others: 4)

Hong Howard J. 《Data & Knowledge Engineering》2006,59(3):603-626

The rationale behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, the practical usefulness of frequent itemsets is limited by the significance of the discovered itemsets. A frequent itemset only reflects the statistical correlation between items, and it does not reflect the semantic significance of the items. In this paper, we propose a utility-based itemset mining approach to overcome this limitation. The proposed approach permits users to quantify their preferences concerning the usefulness of itemsets using utility values. The usefulness of an itemset is characterized as a utility constraint. That is, an itemset is interesting to the user only if it satisfies a given utility constraint. We show that the pruning strategies used in previous itemset mining approaches cannot be applied to utility constraints. In response, we identify several mathematical properties of utility constraints. Then, two novel pruning strategies are designed. Two algorithms for utility-based itemset mining are developed by incorporating these pruning strategies. The algorithms are evaluated by applying them to synthetic and real-world databases. Experimental results show that the proposed algorithms are effective on the databases tested.
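A small sketch can show why frequency-style pruning does not carry over to utility constraints. It assumes one common way of defining itemset utility (quantity times a per-item unit utility, summed over the transactions that contain the whole itemset); the abstract does not give the authors' exact definition, and the function name, item names, and numbers below are hypothetical.

```python
def itemset_utility(itemset, transactions, unit_utility):
    """Sum quantity * unit utility of the itemset's items over transactions containing all of them."""
    itemset = set(itemset)
    total = 0
    for quantities in transactions:
        # Each transaction maps item -> purchased quantity.
        if itemset.issubset(quantities):
            total += sum(quantities[item] * unit_utility[item] for item in itemset)
    return total


# Hypothetical data: cheap bread sells often, expensive wine sells less often.
toy_transactions = [
    {"bread": 1, "wine": 2},
    {"bread": 3},
    {"bread": 1, "wine": 1},
]
unit_utility = {"bread": 1, "wine": 10}

u_bread = itemset_utility({"bread"}, toy_transactions, unit_utility)
u_both = itemset_utility({"bread", "wine"}, toy_transactions, unit_utility)
```

Here `{bread}` has utility 5 while its superset `{bread, wine}` has utility 32, so a threshold of, say, 10 is met by the superset but not by the subset. Support can never grow when an itemset is extended, which is what Apriori-style pruning relies on; utility under this definition offers no such guarantee.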

18.

Finding latent variable models in large databases

Richard Scheines Peter Spirtes 《International Journal of Intelligent Systems》1992,7(7):609-621

Structural equation models with latent variables are used widely in psychometrics, econometrics, and sociology to explore the causal relations among latent variables. Since such models often involve dozens of variables, the number of theoretically feasible alternatives can be astronomical. Without computational aids with which to search such a space, researchers can only explore a handful of alternative models. We describe a procedure that can find information about the causal structure among latent, or unmeasured, variables. The procedure is asymptotically reliable, feasible on data sets with as many as a hundred variables, and has already proved useful in modeling an empirical data set collected by the U.S. Navy. © 1992 John Wiley & Sons, Inc.

19.

Outlier detection from large distributed databases

Ji Zhang Xiaohui Tao Hua Wang 《World Wide Web》2014,17(4):539-568

In this paper, we present an innovative system, called DISTROD (a.k.a. DISTRibuted Outlier Detector), for detecting outliers, namely abnormal instances or observations, from multiple large distributed databases. DISTROD is able to effectively detect the so-called global outliers from distributed databases that are consistent with those produced by the centralized detection paradigm. DISTROD is equipped with a number of optimization/boosting strategies that significantly improve its speed and reduce its communication overhead. Experimental evaluation demonstrates the good performance of DISTROD in terms of speed and communication overhead.

20.

Time separation technique for large databases

Kostovetsky Alex 《International journal of parallel programming》1983,12(3):193-209

A time decomposition technique is suggested for large-database (DB) models. The problem of network aggregation is studied and the results used to...