Found 20 similar documents (search time: 15 ms)
1.
Data mining is most commonly used in attempts to induce association rules from transaction data. In the past, we used the fuzzy and GA concepts to discover both useful fuzzy association rules and suitable membership functions from quantitative values. The evaluation of fitness values was, however, quite time-consuming. Due to dramatic increases in available computing power and concomitant decreases in computing costs over the last decade, learning or mining by applying parallel processing techniques has become a feasible way to overcome the slow-learning problem. In this paper, we thus propose a parallel genetic-fuzzy mining algorithm based on the master–slave architecture to extract both association rules and membership functions from quantitative transactions. The master processor uses a single population, as a simple genetic algorithm does, and distributes the tasks of fitness evaluation to slave processors. The evolutionary processes, such as crossover, mutation and reproduction, are performed by the master processor. The proposed algorithm maps naturally and efficiently onto the master–slave architecture. The time complexities of both the sequential and parallel genetic-fuzzy mining algorithms have also been analyzed, with results showing the good effect of the proposed one. When the number of generations is large, the speed-up can be nearly linear, which the experimental results also confirm. Applying the master–slave parallel architecture to speed up the genetic-fuzzy data mining algorithm is thus a feasible way to overcome the low-speed fitness evaluation problem of the original algorithm.
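The master–slave division of labor described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bit-string chromosomes, the toy `fitness` function, the tournament selection, and the 0.1 mutation rate are all assumptions made for illustration, and a thread pool stands in for the slave processors.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def fitness(chromosome):
    # Hypothetical stand-in for the costly fitness evaluation: count the 1-bits.
    return sum(chromosome)

def evaluate_population(population, workers=4):
    # The master hands chromosomes to the worker pool; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

def next_generation(population, scores, rng):
    # Master-side evolutionary step: tournament selection,
    # one-point crossover, and occasional bit-flip mutation.
    def pick():
        a, b = rng.sample(range(len(population)), 2)
        return population[a] if scores[a] >= scores[b] else population[b]

    children = []
    while len(children) < len(population):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        if rng.random() < 0.1:  # assumed mutation rate
            i = rng.randrange(len(child))
            child[i] = 1 - child[i]
        children.append(child)
    return children

if __name__ == "__main__":
    rng = random.Random(0)
    population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
    for _ in range(5):
        scores = evaluate_population(population)   # done by the workers
        population = next_generation(population, scores, rng)  # done by the master
```

Only the fitness evaluation is farmed out; selection, crossover, and mutation stay with the master and a single population is kept, as in the abstract. For CPU-bound fitness functions in CPython, `ProcessPoolExecutor` from the same module offers the same `map` interface and would be the closer analogue of separate processors.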
2.
Time series analysis has always been an important and interesting research field because time series arise in so many applications. In the past, many approaches based on regression, neural networks and other mathematical models were proposed to analyze time series. In this paper, we attempt to use data mining techniques to analyze time series. Many previous studies on data mining have focused on handling binary-valued data. Time series data, however, are usually quantitative values. We thus extend our previous fuzzy mining approach to handle time-series data and find linguistic association rules. The proposed approach first uses a sliding window to generate continuous subsequences from a given time series and then analyzes the fuzzy itemsets from these subsequences. Appropriate post-processing is then performed to remove redundant patterns. Experiments are also conducted to show the performance of the proposed mining algorithm. Since the final results are represented by linguistic rules, they will be friendlier to humans than quantitative representations.
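The first two steps named in this abstract, sliding-window subsequence generation and fuzzification of quantitative values, might look roughly like the sketch below. The triangular membership shape, the parameter triples `(a, b, c)` with a < b < c, and the function names are assumptions for illustration; the abstract does not specify them.

```python
def sliding_windows(series, width):
    # Every run of `width` consecutive points, advancing one step at a time.
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def triangular(x, a, b, c):
    # Triangular membership (assumes a < b < c): 0 outside (a, c), 1 at the peak b.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(window, terms):
    # Map each point of a window to {linguistic term: membership degree}.
    return [{name: triangular(x, *abc) for name, abc in terms.items()}
            for x in window]

if __name__ == "__main__":
    terms = {"low": (0, 5, 10), "high": (5, 10, 15)}  # hypothetical terms
    for window in sliding_windows([2, 6, 9, 7], 2):
        print(window, fuzzify(window, terms))
```

Each fuzzified window can then be treated like a transaction whose items are (position, linguistic term) pairs, which is what makes association-rule mining applicable to the series.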
3.
Data mining is most commonly used in attempts to induce association rules from transaction data. Transactions in real-world applications, however, usually consist of quantitative values. This paper thus proposes a fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions. We present a GA-based framework for finding membership functions suitable for mining problems and then use the final best set of membership functions to mine fuzzy association rules. The fitness of each chromosome is evaluated by the number of large 1-itemsets generated from part of the previously proposed fuzzy mining algorithm and by the suitability of the membership functions. Experimental results also show the effectiveness of the framework.
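One ingredient of the fitness described here, the number of large 1-itemsets produced by a candidate set of membership functions, can be sketched as follows. This is a hedged illustration only: the triangular membership functions, the definition of fuzzy support as the mean membership degree over all transactions, and every name below are assumptions, and the suitability term mentioned in the abstract is not modeled.

```python
def triangular(x, a, b, c):
    # Triangular membership (assumes a < b < c): 0 outside (a, c), 1 at the peak b.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_support(values, abc):
    # Assumed definition: average membership degree over all transactions.
    return sum(triangular(v, *abc) for v in values) / len(values)

def count_large_1_itemsets(transactions, membership, min_support):
    # transactions: list of {item: quantity}
    # membership:   {item: {linguistic term: (a, b, c)}}, i.e. one decoded chromosome
    count = 0
    for item, terms in membership.items():
        values = [tx.get(item, 0) for tx in transactions]
        for abc in terms.values():
            if fuzzy_support(values, abc) >= min_support:
                count += 1
    return count

if __name__ == "__main__":
    transactions = [{"milk": 5}, {"milk": 5}, {"milk": 10}]
    membership = {"milk": {"low": (0, 5, 10), "high": (5, 10, 15)}}
    print(count_large_1_itemsets(transactions, membership, 0.5))
```

A GA would call a function like this once per chromosome, which is why the evaluation step dominates the running time.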
4.
Data mining mechanisms have been widely applied in businesses and manufacturing companies across many industry sectors. Sharing data or sharing mined rules has become a trend among business partnerships, as it is perceived to be a mutually beneficial way of increasing productivity for all parties involved. Nevertheless, this has also increased the risk of unexpected information leaks when releasing data. To conceal restrictive itemsets (patterns) contained in the source database, a sanitization process transforms the source database into a released database from which the counterpart cannot extract sensitive rules. The transformation also conceals some non-restrictive information, an unwanted side effect called the “misses cost”. The problem of finding an optimal sanitization method, which conceals all restrictive itemsets but minimizes the misses cost, is NP-hard. To address this challenging problem, this study proposes the maximum item conflict first (MICF) algorithm. Experimental results demonstrate that the proposed method is effective, has a low sanitization rate, and can generally achieve a significantly lower misses cost than those achieved by the MinFIA, MaxFIA, IGA and Algo2b methods on several real and artificial datasets.
5.
Prediction of liquefaction is an important subject in geotechnical engineering. It is also a complex problem, as it depends on many different physical factors, and the relations between these factors are highly non-linear and complex. Several approaches have been proposed in the literature for modeling and prediction of liquefaction. Most of these are based on classical statistical approaches and neural networks. In this paper, a new approach based on classification data mining is proposed for the first time in the literature for liquefaction prediction. The proposed approach is based on extracting accurate classification rules from neural networks via ant colony optimization. The extracted classification rules are in the form of IF–THEN rules, which can be easily understood by humans. The proposed algorithm is also compared with several other data mining algorithms. It is shown that the proposed algorithm is very effective and accurate in prediction of liquefaction.
6.
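This paper proposes an artificial immune algorithm, based on immune principles, for mining fuzzy association rules. The algorithm performs its optimization by drawing on the clonal selection principle of the biological immune system. Working directly from the given data, it uses the optimization mechanism to automatically determine the fuzzy sets for each attribute so that the number of derived fuzzy association rules satisfying the conditions is maximized. Performance comparisons with related algorithms on real datasets show the effectiveness of the proposed algorithm.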
7.
Mining association rules and mining sequential patterns both aim to discover customer purchasing behaviors from a transaction database, so that the quality of business decisions can be improved. However, the transaction database can be very large. Finding all the association rules and sequential patterns in a large database is very time consuming, and users may be interested in only some of the information. Moreover, different users may have different criteria for the association rules and sequential patterns to be discovered. Traditional mining methods can therefore generate much information that is of no interest to the user. Hence, a data mining language is needed so that users can query only the knowledge that interests them from a large database of customer transactions. In this paper, such a data mining language is presented. With it, users can specify the items of interest and the criteria for the association rules or sequential patterns to be discovered. Efficient data mining techniques are also proposed to extract the association rules and sequential patterns according to the user requirements.
8.
This paper addresses the integration of fuzziness with On-Line Analytical Processing (OLAP) based association rules mining. It contributes to the ongoing research on multidimensional online association rules
mining by proposing a general architecture that utilizes a fuzzy data cube for knowledge discovery. A data cube is mainly constructed to provide users with the flexibility to view data from different
perspectives as some dimensions of the cube contain multiple levels of abstraction. The first step of the process described
in this paper involves introducing fuzzy data cube as a remedy to the problem of handling quantitative values of dimensional
attributes in a cube. This facilitates the online mining of fuzzy association rules at different levels within the constructed
fuzzy data cube. Then, we investigate combining the concepts of weight and multiple-level to mine fuzzy weighted multi-cross-level
association rules from the constructed fuzzy data cube. For this purpose, three different methods are introduced for single
dimension, multidimensional and hybrid (integrates the other two methods) fuzzy weighted association rules mining. Each of
the three methods utilizes a fuzzy data cube constructed to suit the particular method. To the best of our knowledge, this
is the first effort in this direction. We compared the proposed approach to an existing approach that does not utilize fuzziness.
Experimental results obtained for each of the three methods on a synthetic dataset and on the adult data of the United States
census in year 2000 demonstrate the effectiveness and applicability of the proposed fuzzy OLAP based mining approach.
OLAP is one of the most popular tools for on-line, fast and effective multidimensional data analysis.
In the OLAP framework, data is mainly stored in data hypercubes (simply called cubes).
9.
Analyzing bank databases for customer behavior management is difficult since bank databases are multi-dimensional, comprising monthly account records and daily transaction records. This study proposes an integrated data mining and behavioral scoring model to manage existing credit card customers in a bank. A self-organizing map neural network was used to identify groups of customers based on repayment behavior and on recency, frequency, and monetary (RFM) behavioral scoring predictors. It also classified bank customers into three major profitable groups. The resulting groups were then profiled by customer feature attributes determined using an Apriori association rule inducer. This study demonstrates that identifying customers by a behavioral scoring model helps characterize customers and facilitates marketing strategy development.
10.
In recent years, data mining has become one of the most popular techniques for data owners to determine their strategies. Association rule mining is a data mining approach that is widely used in traditional databases, usually to find positive association rules. However, there are other challenging rule mining topics, such as data stream mining and negative association rule mining. In addition, organizations want to concentrate on their own business and outsource the rest of their work. This approach, known as the “database as a service” concept, provides many benefits to the data owner but, at the same time, raises some security problems. In this paper, a rule mining system is proposed that provides an efficient and secure solution to positive and negative association rule computation on XML data streams under the database as a service concept. The system is implemented, and several experiments have been carried out with different synthetic data sets to show the performance and efficiency of the proposed system.
11.
In the field of data mining, an important issue for association rules generation is frequent itemset discovery, which is the key factor in implementing association rule mining. Therefore, this study considers the user’s assigned constraints in the mining process. Constraint-based mining enables users to concentrate on mining itemsets that are interesting to them, which improves the efficiency of mining tasks. In addition, in the real world, users may prefer recording more than one attribute and setting multi-dimensional constraints. Thus, this study intends to solve the multi-dimensional constraints problem for association rules generation. The ant colony system (ACS) is one of the newest meta-heuristics for combinatorial optimization problems, and this study uses the ant colony system to mine a large database to find the association rules effectively. If this system can consider multi-dimensional constraints, the association rules will be generated more effectively. Therefore, this study proposes a novel approach of applying the ant colony system for extracting the association rules from the database. In addition, the multi-dimensional constraints are taken into account. The results using a real case, the National Health Insurance Research Database, show that the proposed method is able to provide more condensed rules than the Apriori method. The computational time is also reduced.
12.
Frequent itemset mining aims at discovering patterns the supports of which are beyond a given threshold. In many applications, including network event management systems, which motivated this work, patterns are composed of items each described by a subset of attributes of a relational table. As it involves an exponential mining space, the efficient implementation of user preferences and mining constraints becomes the first priority for a mining algorithm. User preferences and mining constraints are often expressed using patterns attribute structures. Unlike traditional methods that mine all frequent patterns indiscriminately, we regard frequent itemset mining as a two-step process: the mining of the pattern structures and the mining of patterns within each pattern structure. In this paper, we present a novel architecture that uses pattern structures to organize the mining space. In comparison with the previous techniques, the advantage of our approach is two-fold: (i) by exploiting the interrelationships among pattern structures, execution times for mining can be reduced significantly; and (ii) more importantly, it enables us to incorporate high-level simple user preferences and mining constraints into the mining process efficiently. These advantages are demonstrated by our experiments using both synthetic and real-life datasets.
13.
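Regional differences in Chinese herbal medicine culture introduce many uncertainties into traditional Chinese medicine (TCM) data. To address the data problems of a data-mining-based decision support system for new drug development, a set of processing methods for standardizing raw TCM data is proposed. Data reduction techniques, clustering methods, and fuzzy set theory are applied to improve the quality of the TCM data, so that important rules are successfully mined from the preprocessed database of Chinese medicine prescriptions, providing strong decision support for developing new Chinese medicines.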
14.
The utility of an itemset is considered as the value of this itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. In this paper, we propose a novel method, namely THUI-Mine (Temporal High Utility Itemsets Mine), for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets, so that the execution time of mining all high utility itemsets in data streams can be reduced substantially. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods, such as the Two-Phase algorithm, under various experimental conditions.
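The notion of itemset utility within a time window can be made concrete with a brute-force sketch. It is not THUI-Mine and does no candidate pruning; it assumes the common definition of utility as purchased quantity times unit profit, summed over the transactions that contain the whole itemset, and it applies a utility threshold to a fixed-size window. All names and the sample data are hypothetical.

```python
def itemset_utility(itemset, transactions, unit_profit):
    # Sum quantity * unit profit over the itemset's items,
    # in every transaction that contains all of them.
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total

def high_utility_in_window(candidates, stream, start, size, unit_profit, min_utility):
    # Keep the candidates whose utility inside the window reaches the threshold.
    window = stream[start:start + size]
    return [c for c in candidates
            if itemset_utility(c, window, unit_profit) >= min_utility]

if __name__ == "__main__":
    stream = [{"a": 2, "b": 1}, {"a": 1}, {"a": 1, "b": 3}]  # item -> quantity
    unit_profit = {"a": 5, "b": 2}
    print(itemset_utility(("a", "b"), stream, unit_profit))
    print(high_utility_in_window([("a",), ("a", "b")], stream, 1, 2, unit_profit, 11))
```

Checking every candidate against every window this way is exactly the cost that a method generating fewer candidates is meant to avoid.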
15.
In this paper, we propose a novel integrated framework combining association rule mining, case-based-reasoning and text mining that can be used to continuously improve service and repair in an automotive domain. The developed framework enables identification of anomalies in the field that cause customer dissatisfaction and performs root cause investigation of the anomalies. It also facilitates identification of the best practices in the field and learning from these best practices to achieve lean and effective service. Association rule mining is used for the anomaly detection and the root cause investigation, while case-based-reasoning in conjunction with text mining is used to learn from the best practices. The integrated system is implemented in a web based distributed architecture and has been tested on real life data.
16.
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown in the corresponding sequential algorithm. As a result of this tighter bound and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms, on the same order of magnitude as the optimum.
17.
Time series data mining (TSDM) techniques permit exploring large amounts of time series data in search of consistent patterns and/or interesting relationships between variables. TSDM is becoming increasingly important as a knowledge management tool, where it is expected to reveal knowledge structures that can guide decision making in conditions of limited certainty. Human decision making in problems related to the analysis of time series databases is usually based on perceptions like “end of the day”, “high temperature”, “quickly increasing”, “possible”, etc. Though many effective TSDM algorithms have been developed, the integration of TSDM algorithms with human decision making procedures is still an open problem. In this paper, we consider the architecture of a perception-based decision making system for time series database domains that integrates perception-based TSDM, computing with words and perceptions, and expert knowledge. The new tasks that perception-based TSDM methods must solve to enable their integration in such systems are discussed. These tasks include precisiation of perceptions, shape pattern identification, and pattern retranslation. We show how different methods developed so far in TSDM for manipulating perception-based information can be used to develop a fuzzy perception-based TSDM approach. This approach is grounded in computing with words and perceptions, which makes it possible to formalize human perception-based inference mechanisms. The discussion is illustrated by examples from economics, finance, meteorology, medicine, etc.
18.
In this paper, we propose an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. First, we devise two data structures: a dat-list, which stores the item information used to find frequent inter-transaction patterns; and an ITP-tree, which stores the discovered frequent inter-transaction patterns. In the second phase, we apply an algorithm, called ITP-Miner (Inter-Transaction Patterns Miner), to mine all frequent inter-transaction patterns. By using the ITP-tree, the algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. The experimental results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.
19.
Data mining can dig out valuable information from databases to assist a business in approaching knowledge discovery and improving
business intelligence. Databases store large amounts of structured data. The amount of data increases due to the advanced database technology
and extensive use of information systems. Despite the price drop of storage devices, it is still important to develop efficient
techniques for database compression. This paper develops a database compression method by eliminating redundant data, which
often exist in transaction databases. The proposed approach uses a data mining structure to extract association rules from
a database. Redundant data will then be replaced by means of compression rules. A heuristic method is designed to resolve
the conflicts of the compression rules. To prove its efficiency and effectiveness, the proposed approach is compared with
two other database compression methods.
Chin-Feng Lee is an associate professor with the Department of Information Management at Chaoyang University of Technology, Taiwan, R.O.C.
She received her M.S. and Ph.D. degrees in 1994 and 1998, respectively, from the Department of Computer Science and Information
Engineering at National Chung Cheng University. Her current research interests include database design, image processing and
data mining techniques.
S. Wesley Changchien is a professor with the Institute of Electronic Commerce at National Chung-Hsing University, Taiwan, R.O.C. He received a
BS degree in Mechanical Engineering (1989) and completed his MS (1993) and Ph.D. (1996) degrees in Industrial Engineering
at State University of New York at Buffalo, USA. His current research interests include electronic commerce, internet/database
marketing, knowledge management, data mining, and decision support systems.
Jau-Ji Shen received his Ph.D. degree in Information Engineering and Computer Science from National Taiwan University at Taipei, Taiwan
in 1988. From 1988 to 1994, he was the leader of the software group in Institute of Aeronautic, Chung-Sung Institute of Science
and Technology. He is currently an associate professor of information management department in the National Chung Hsing University
at Taichung. His research areas focus on the digital multimedia, database and information security. His current research areas
focus on data engineering, database techniques and information security.
Wei-Tse Wang received the B.A. (2001) and M.B.A (2003) degrees in Information Management at Chaoyang University of Technology, Taiwan,
R.O.C. His research interests include data mining, XML, and database compression.
20.
An effective incident information management system needs to deal with several challenges. It must support heterogeneous distributed incident data, allow decision makers (DMs) to detect anomalies and extract useful knowledge, assist DMs in evaluating the risks and selecting an appropriate alternative during an incident, and provide differentiated services to satisfy the requirements of different incident management phases. To address these challenges, this paper proposes an incident information management framework that consists of three major components. The first component is a high-level data integration module in which heterogeneous data sources are integrated and presented in a uniform format. The second component is a data mining module that uses data mining methods to identify useful patterns and presents a process to provide differentiated services for pre-incident and post-incident information management. The third component is a multi-criteria decision-making (MCDM) module that utilizes MCDM methods to assess the current situation, find the satisfactory solutions, and take appropriate responses in a timely manner. To validate the proposed framework, this paper conducts a case study on agrometeorological disasters that occurred in China between 1997 and 2001. The case study demonstrates that the combination of data mining and MCDM methods can provide objective and comprehensive assessments of incident risks.