1.
Mining association rules is widely studied in the data mining community. In this paper, we analyze the support–confidence framework for mining association rules and find that it tends to produce many redundant or unrelated rules besides the interesting ones. To improve the criterion, we propose a new measure, match, as a substitute for confidence, and we analyze its properties in detail. Experimental results show that, compared with the rules produced by the support–confidence framework, the rules generated by the improved method exhibit high correlation between the antecedent and the consequent. Furthermore, the improved method reduces the generation of redundant rules.
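For orientation, the two baseline measures that the abstract criticizes can be sketched as follows. This is a minimal illustration of standard support and confidence only; the abstract does not define the proposed match measure, so it is not reproduced here. The function names and the toy transactions are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# Hypothetical toy data, not taken from the paper.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75
```

Confidence ignores how frequent the consequent is on its own, which is one reason a high-confidence rule may still link weakly related items.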
2.
Efficient algorithms for distortion and blocking techniques in association rule hiding (cited by: 1 total, 0 self-citations, 1 by others)
Vassilios S. Verykios Emmanuel D. Pontikakis Yannis Theodoridis Liwu Chang 《Distributed and Parallel Databases》2007,22(1):85-104
Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed
in this context in order to extract this information in the most efficient way. However, efficiency is not our only concern
in this study. The security and privacy issues over the extracted knowledge must be seriously considered as well. By taking
this into consideration, we study the procedure of hiding sensitive association rules in binary data sets by blocking some
data values and we present an algorithm for solving this problem. We also provide a fuzzification of the support and the confidence
of an association rule in order to accommodate blocked/unknown values. In addition, we quantitatively
compare the proposed algorithm with other already published algorithms by running experiments on binary data sets, and we
also qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We utilize the notion of
border rules, by putting weights in each rule, and we use effective data structures for the representation of the rules so
as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions.
Finally, we study the overall security of the modified database, using the C4.5 decision tree algorithm of the WEKA data mining
tool, and we discuss the advantages and the limitations of blocking.
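One way to picture support and confidence in the presence of blocked values is as intervals: a blocked cell might be either 0 or 1, so each measure has a lower and an upper bound. The sketch below is an illustration under that assumption, not the fuzzification defined in the paper; the `None` encoding for a blocked value, the function names, and the deliberately loose confidence bounds are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# 1 = item present, 0 = item absent, None = blocked/unknown value ("?")
Row = Dict[str, Optional[int]]


def support_interval(items: Iterable[str], rows: List[Row]) -> Tuple[float, float]:
    """(minimum, maximum) support over all ways of filling in blocked values."""
    items = list(items)
    n = len(rows)
    certain = sum(1 for r in rows if all(r[i] == 1 for i in items))
    possible = sum(1 for r in rows if all(r[i] in (1, None) for i in items))
    return certain / n, possible / n


def confidence_bounds(antecedent: Iterable[str], consequent: Iterable[str],
                      rows: List[Row]) -> Tuple[float, float]:
    """Conservative (loose) bounds on confidence; an assumption, not the paper's formula."""
    antecedent = list(antecedent)
    both = set(antecedent) | set(consequent)
    lo_both, hi_both = support_interval(both, rows)
    lo_ante, hi_ante = support_interval(antecedent, rows)
    lower = lo_both / hi_ante if hi_ante > 0 else 0.0
    upper = min(1.0, hi_both / lo_ante) if lo_ante > 0 else 1.0
    return lower, upper


# Hypothetical binary data set with two blocked cells.
rows = [
    {"a": 1, "b": 1},
    {"a": 1, "b": None},
    {"a": None, "b": 0},
    {"a": 0, "b": 1},
]
```

A rule counts as hidden in this picture when its lower bounds fall below the mining thresholds, since an observer can no longer be sure the rule holds.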
3.
It has been well recognized that product portfolio planning has a far-reaching impact on a company's business success in competition. In general, product portfolio planning involves two main stages, namely portfolio identification and portfolio evaluation and selection. The former aims to capture and understand customer needs effectively and accordingly to transform them into specifications of product offerings. The latter concerns how to determine an optimal configuration of these identified offerings with the objective of achieving the best profit performance. Current research and industrial practice have mainly focused on the economic justification of a given product portfolio, whereas the portfolio identification issue has received only limited attention. This article intends to develop explicit decision support to improve product portfolio identification by efficient knowledge discovery from past sales and product records. As one of the important applications of data mining, association rule mining lends itself to the discovery of useful patterns associated with requirement analysis enacted among customers, marketing staff, and designers. An association rule mining system (ARMS) is proposed for effective product portfolio identification. Based on a scrutiny of the product definition process, the article studies the fundamental issues underlying product portfolio identification. The ARMS differentiates the customer needs from functional requirements involved in the respective customer and functional domains. Product portfolio identification entails the identification of functional requirement clusters in conjunction with the mappings from customer needs to these clusters. While clusters of functional requirements are identified based on fuzzy clustering analysis, the mapping mechanism between the customer and functional domains is embodied in association rules. The ARMS architecture and implementation issues are discussed in detail. An application of the proposed methodology and system in a consumer electronics company to generate a vibration motor portfolio for mobile phones is also presented.
4.
Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade due to
the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association
rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes which contain much richer information
than the boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives
rise to the generation of an explosively large number of itemsets, thereby severely degrading the mining efficiency. In this
paper, we propose an information-theoretic approach to avoid unrewarding combinations of both the attributes and their value
intervals being generated in the mining process. We study the mutual information between the attributes in a quantitative
database and devise a normalization on the mutual information to make it applicable in the context of QAR mining. To indicate
the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges
are attribute pairs that have normalized mutual information no less than a predefined information threshold. We find that
the cliques in the MI graph represent a majority of the frequent itemsets. We also show that frequent itemsets that do not
form a clique in the MI graph are those whose attributes are not informatively correlated to each other. By utilizing the
cliques in the MI graph, we devise an efficient algorithm that significantly reduces the number of value intervals of the
attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining
process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas
the QARs that are not returned by MIC are shown to be less interesting.
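The MI graph idea can be sketched for discrete attributes as follows. The abstract does not state which normalization the authors devise, so dividing by the larger of the two entropies is an assumption made here purely for illustration; the function names and the toy columns are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(xs: Sequence) -> float:
    """Shannon entropy (in bits) of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    """Empirical mutual information (in bits) between two paired columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mi(xs: Sequence, ys: Sequence) -> float:
    """Assumed normalization: I(X;Y) / max(H(X), H(Y)), which lies in [0, 1]."""
    denom = max(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def mi_graph_edges(columns: Dict[str, List], threshold: float) -> List[Tuple[str, str]]:
    """Attribute pairs whose normalized MI is no less than the threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if normalized_mi(columns[a], columns[b]) >= threshold
    ]


# Hypothetical discretized attributes: age and income move together, noise does not.
columns = {
    "age": [0, 0, 1, 1],
    "income": [0, 0, 1, 1],
    "noise": [0, 1, 0, 1],
}
edges = mi_graph_edges(columns, threshold=0.5)
```

Cliques in the resulting graph would then be the attribute sets worth joining during mining.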
5.
Computing the minimum-support for mining frequent patterns (cited by: 4 total, 4 self-citations, 0 by others)
Shichao Zhang Xindong Wu Chengqi Zhang Jingli Lu 《Knowledge and Information Systems》2008,15(2):233-257
Frequent pattern mining is based on the assumption that users can specify the minimum-support for mining their databases.
It has been recognized that setting the minimum-support is a difficult task for users. This can hinder the widespread application
of these algorithms. In this paper we propose a computational strategy for identifying frequent itemsets, consisting of polynomial
approximation and fuzzy estimation. More specifically, our algorithms (polynomial approximation and fuzzy estimation) automatically
generate actual minimum-supports (appropriate to a database to be mined) according to users’ mining requirements. We experimentally
examine the algorithms using different datasets, and demonstrate that our fuzzy estimation algorithm fittingly approximates
actual minimum-supports from the commonly-used requirements.
This work is partially supported by Australian ARC grants for discovery projects (DP0449535, DP0559536 and DP0667060), a China
NSF Major Research Program (60496327), a China NSF grant (60463003), an Overseas Outstanding Talent Research Program of the
Chinese Academy of Sciences (06S3011S01), and an Overseas-Returning High-level Talent Research Program of China Human-Resource
Ministry.
A preliminary and shortened version of this paper has been published in the Proceedings of the 8th Pacific Rim International
Conference on Artificial Intelligence (PRICAI ’04).
6.
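An artificial immune algorithm based on immune principles is proposed for mining fuzzy association rules. The algorithm performs its optimization by borrowing the clonal selection principle of the biological immune system. Working directly from the given data, it uses the optimization mechanism to automatically determine the fuzzy sets for each attribute so that the number of derived fuzzy association rules satisfying the conditions is maximized. The algorithm is compared with related algorithms on real data sets, and the experimental results demonstrate its effectiveness.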
7.
Bilal Sowan Keshav Dahal M.A. Hossain Li Zhang Linda Spencer 《Expert systems with applications》2013,40(17):6928-6937
This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets, and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used to predict a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies of road traffic data sets of different sizes. The experimental results show the merits and capability of the proposed KD model in FAR-based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base to be utilized within the FIS for prediction evaluation. Experimental results show that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.
8.
Multi-objective PSO algorithm for mining numerical association rules without a priori discretization
《Expert systems with applications》2014,41(9):4259-4273
In the domain of association rules mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most of the popular approaches for numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, often more than one objective (quality measure) is required, and in most cases such objectives include conflicting measures. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper deals with the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (i.e., MOPAR) for numerical ARM that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method to the most popular evolutionary-based proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
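The Pareto-optimality step mentioned above can be illustrated independently of the swarm optimization itself. The sketch below keeps only the rules that no other rule dominates on all objectives; it is a generic non-dominated filter, not MOPAR, and the rule names and score triples (confidence, comprehensibility, interestingness) are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]  # one value per objective; higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored_rules: Dict[str, Scores]) -> List[str]:
    """Names of the rules that are not dominated by any other rule."""
    return [
        name
        for name, scores in scored_rules.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in scored_rules.items()
            if other != name
        )
    ]


# Hypothetical scores: (confidence, comprehensibility, interestingness).
rules = {
    "r1": (0.95, 0.60, 0.40),
    "r2": (0.80, 0.70, 0.50),
    "r3": (0.75, 0.55, 0.35),  # worse than r1 and r2 on every objective
}
front = pareto_front(rules)
```

Here r1 and r2 both survive because each beats the other on at least one objective, which is the trade-off the abstract refers to.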
9.
Most incremental mining and online mining algorithms concentrate on finding association rules or patterns consistent with the entire current set of data. Users cannot easily obtain results from only the interesting portion of the data. This may prevent the use of mining for online decision support on multidimensional data. To provide ad-hoc, query-driven, and online mining support, we first propose a relation called the multidimensional pattern relation to structurally and systematically store context and mining information for later analysis. Each tuple in the relation comes from an inserted dataset in the database. We then develop an online mining approach called three-phase online association rule mining (TOARM), based on this multidimensional pattern relation, to support online generation of association rules under multidimensional considerations. The TOARM approach consists of three phases during which the final sets of patterns satisfying various mining requests are found. It first selects and integrates related mining information in the multidimensional pattern relation, and then, if necessary, re-processes itemsets without sufficient information against the underlying datasets. Some implementation considerations for the algorithm are also stated in detail. Experiments on homogeneous and heterogeneous datasets were conducted, and the results show the effectiveness of the proposed approach.
10.
Laure Berti-Équille 《Knowledge and Information Systems》2007,11(2):191-215
The quality of discovered association rules is commonly evaluated by interestingness measures (commonly support and confidence)
with the purpose of supplying indicators to the user in the understanding and use of the new discovered knowledge. Low-quality
datasets have a very bad impact on the quality of the discovered association rules, and one might legitimately wonder if
a so-called “interesting” rule noted LHS → RHS is meaningful when 30% of the LHS data are not up-to-date anymore, 20% of the RHS data are not accurate, and 15% of the LHS data come from a data source that is well-known for its bad credibility. This paper presents an overview of data quality
characterization and management techniques that can be advantageously employed for improving the quality awareness of the
knowledge discovery and data mining processes. We propose to integrate data quality indicators for quality aware association
rule mining. We propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations on data quality have a great impact on the
cost and quality of discovered association rules and confirm our approach for the integrated management of data quality indicators
into the KDD process that ensures the quality of data mining results.
11.
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rules mining. It contributes to the ongoing research on multidimensional online association rules
mining by proposing a general architecture that utilizes a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to provide users with the flexibility to view data from different
perspectives as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described
in this paper involves introducing fuzzy data cube as a remedy to the problem of handling quantitative values of dimensional
attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed
fuzzy data cube. Then, we investigate combining the concepts of weight and multiple-level to mine fuzzy weighted multi-cross-level
association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single
dimension, multidimensional and hybrid (integrates the other two methods) fuzzy weighted association rules mining. Each of
the three methods utilizes a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this
is the first effort in this direction. We compared the proposed approach to an existing approach that does not utilize fuzziness.
Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States
census in year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach.
OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis.
In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).
12.
Samet Çokpınar Taflan İmre Gündem 《Expert systems with applications》2012,39(8):7503-7511
In recent years, data mining has become one of the most popular techniques for data owners to determine their strategies. Association rule mining is a data mining approach that is widely used in traditional databases, usually to find positive association rules. However, there are other challenging rule mining topics, such as data stream mining and negative association rule mining. In addition, organizations want to concentrate on their own business and outsource the rest of their work. This approach is named the “database as a service” concept; it provides many benefits to the data owner but, at the same time, raises some security problems. In this paper, a rule mining system is proposed that provides an efficient and secure solution to positive and negative association rule computation on XML data streams in the database as a service concept. The system is implemented, and several experiments have been done with different synthetic data sets to show the performance and efficiency of the proposed system.
13.
In this paper, we propose an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. In the first phase, we devise two data structures: a dat-list, which stores the item information used to find frequent inter-transaction patterns; and an ITP-tree, which stores the discovered frequent inter-transaction patterns. In the second phase, we apply an algorithm, called ITP-Miner (Inter-Transaction Patterns Miner), to mine all frequent inter-transaction patterns. By using the ITP-tree, the algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. The experimental results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.
14.
Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining (cited by: 2 total, 1 self-citation, 2 by others)
We present two algorithms for learning large-scale gene regulatory networks from microarray data: a modified information-theory-based Bayesian network algorithm and a modified association rule algorithm. Simulation-based evaluation using six datasets indicated that both algorithms outperformed their unmodified counterparts, especially when analyzing large numbers of genes. Both algorithms learned about 20% (50% if directionality and relation type were not considered) of the relations in the actual models. In our empirical evaluation based on two real datasets, domain experts evaluated subsets of learned relations with high confidence and identified 20–30% to be “interesting” or “maybe interesting” as potential experiment hypotheses.
15.
《Expert systems with applications》2014,41(6):2914-2938
Multilevel knowledge in transactional databases plays a significant role in real-life market basket analysis. Many researchers have mined hierarchical association rules and proposed various approaches. However, some of the existing approaches produce many multilevel and cross-level association rules that fail to convey quality information. From this large number of redundant association rules, it is extremely difficult to extract any meaningful information. There also exist some approaches that mine minimal association rules, but these have many shortcomings due to their naïve-based approaches. In this paper, we focus on the need for generating hierarchical minimal rules that provide maximal information. An algorithm is proposed to derive minimal multilevel association rules and cross-level association rules. Our work makes significant contributions to mining the minimal cross-level association rules, which express the mixed relationship between the generalized and specialized views of the transaction itemsets. We are the first to design an efficient algorithm using a closed itemset lattice-based approach, which can mine the most relevant minimal cross-level association rules. The parent–child relationship of the lattices is exploited while mining cross-level closed itemset lattices. We have extensively evaluated the efficiency of our proposed algorithm using a variety of real-life datasets and a large number of experiments. The proposed algorithm significantly outperformed the existing related work in the performance comparison.
16.
Sheng Zhong 《Information Sciences》2007,177(2):490-503
Standard algorithms for association rule mining are based on identification of frequent itemsets. In this paper, we study how to maintain privacy in distributed mining of frequent itemsets. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party’s portion of the data to the other. The existing solution for vertically partitioned data leaks a significant amount of information, while the existing solution for horizontally partitioned data only works for three parties or more. In this paper, we design algorithms for both vertically and horizontally partitioned data, with cryptographically strong privacy. We give two algorithms for vertically partitioned data; one of them reveals only the support count and the other reveals nothing. Both of them have computational overheads linear in the number of transactions. Our algorithm for horizontally partitioned data works for two parties and above and is more efficient than the existing solution.
17.
Chin-Feng Lee S. Wesley Changchien Wei-Tse Wang Jau-Ji Shen 《Information Systems Frontiers》2006,8(3):147-161
Data mining can dig out valuable information from databases to assist a business in approaching knowledge discovery and improving
business intelligence. Databases store large amounts of structured data. The amount of data increases due to advanced database technology
and extensive use of information systems. Despite the price drop of storage devices, it is still important to develop efficient
techniques for database compression. This paper develops a database compression method by eliminating redundant data, which
often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from
a database. Redundant data will then be replaced by means of compression rules. A heuristic method is designed to resolve
the conflicts of the compression rules. To prove its efficiency and effectiveness, the proposed approach is compared with
two other database compression methods.
Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C.
She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information
Engineering at National Chung Cheng University. Her current research interests include database design, image processing and
data mining techniques.
S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a
BS degree in Mechanical Engineering (1989) and completed his MS (1993) and Ph.D. (1996) degrees in Industrial Engineering
at State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database
marketing, knowledge management, data mining, and decision support systems.
Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University at Taipei, Taiwan
in 1988. From 1988 to 1994, he was the leader of the software group in Institute of Aeronautic, Chung-Sung Institute of Science
and Technology. He is currently an associate professor of information management department in the National Chung Hsing University
at Taichung. His research areas focus on the digital multimedia, database and information security. His current research areas
focus on data engineering, database techniques and information security.
Wei-Tse Wang received the B.A. (2001) and M.B.A (2003) degrees in Information Management at Chaoyang University of Technology, Taiwan,
R.O.C. His research interests include data mining, XML, and database compression.
18.
Privacy preserving association rule mining has recently become an active research area. For this problem, there have been
two different approaches—perturbation based and secure multiparty computation based. One drawback of the perturbation based
approach is that it cannot always fully preserve individual’s privacy while achieving precision of mining results. The secure
multiparty computation based approach works only for distributed environment and needs sophisticated protocols, which constrains
its practical usage. In this paper, we propose a new approach for preserving privacy in association rule mining. The main
idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve
privacy while maintaining the precision of mining results. The tradeoff between mining precision and storage requirement is
investigated. We also propose a δ-folding technique to further reduce the storage requirement without sacrificing mining precision and running time.
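The general idea of a keyed Bloom filter can be sketched as below: bit positions are derived from a keyed hash, so someone without the key cannot test which items a filter encodes. This is a generic illustration, not the paper's construction; the class name, the use of HMAC-SHA-256, the filter size, and the number of hash functions are assumptions, and the δ-folding technique is not shown.

```python
import hashlib
import hmac
from typing import Iterator


class KeyedBloomFilter:
    """Bloom filter whose bit positions depend on a secret key (illustrative only)."""

    def __init__(self, key: bytes, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.key = key
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored as a Python integer

    def _positions(self, item: str) -> Iterator[int]:
        # One HMAC per hash function; the index prefix makes the digests differ.
        for i in range(self.num_hashes):
            message = f"{i}:{item}".encode("utf-8")
            digest = hmac.new(self.key, message, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# A transaction represented as a filter over its items (hypothetical key and items).
transaction = KeyedBloomFilter(key=b"secret-key")
for item in ("bread", "milk"):
    transaction.add(item)
```

A smaller filter saves storage but raises the false-positive rate, which is the precision versus storage trade-off the abstract mentions.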
19.
《Expert systems with applications》2014,41(5):2259-2268
Business rules are an effective way to control data quality. Business experts can directly enter the rules into appropriate software without error-prone communication with programmers. However, not all business situations and possible data quality problems can be considered in advance. In situations where business rules have not been defined yet, patterns of data handling may arise in practice. We apply data mining to accounting transactions in order to discover such patterns. The discovered patterns are represented in the form of association rules. Then, deviations from discovered patterns can be marked as potential data quality violations that need to be examined by humans. Data quality breaches can be expensive, but manual examination of many transactions is also expensive. Therefore, the goal is to find a balance between marking too many and too few transactions as being potentially erroneous. We apply appropriate procedures to evaluate the classification accuracy of the developed association rules and support the decision on the number of deviations to be manually examined based on economic principles.
20.
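An improvement of the Apriori algorithm for association rule mining (cited by: 3 total, 0 self-citations, 3 by others)
Based on an analysis of the Apriori algorithm for association rule mining and several of its improved variants, this paper further improves Apriori and proposes a new idea based on conditional judgment. Depending on the condition, the improved algorithm combines transaction compression with candidate itemset compression, which reduces unnecessary overhead and thereby increases mining speed.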
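To make the two kinds of compression concrete, the sketch below shows a plain Apriori loop with standard candidate pruning and a simple form of transaction reduction. It is a generic textbook-style illustration, not the condition-based algorithm of the paper, whose details the abstract does not give; the function name and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: Iterable[Iterable[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset together with its absolute support count."""
    remaining: List[FrozenSet[str]] = [frozenset(t) for t in transactions]

    # Frequent 1-itemsets.
    counts: Dict[FrozenSet[str], int] = {}
    for t in remaining:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Candidate compression: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Transaction compression: a transaction with fewer than k items
        # cannot contain any k-itemset, so it is dropped from later scans.
        remaining = [t for t in remaining if len(t) >= k]
        counts = {c: sum(1 for t in remaining if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_count=3)
```

With these baskets and a minimum count of 3, all single items and all pairs are frequent, while the triple occurs only twice and is discarded.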