2.
This paper proposes a text mining methodology based on the classical knowledge discovery loop, with a number of adaptations.
First, texts are indexed and prepared for processing by a levelwise frequent itemset search. Association rules are then extracted
and interpreted with respect to a set of quality measures and domain knowledge, under the control of an analyst. The article
includes an experiment on a real-world text corpus in molecular biology.
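The levelwise frequent itemset search mentioned in this abstract can be illustrated with a minimal Apriori-style sketch. It assumes each text has already been indexed as a set of terms; the function name `frequent_itemsets` and the `min_support` parameter are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Levelwise search: build frequent k-itemsets from frequent (k-1)-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions (indexed texts) containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return result
```

Rules would then be derived from the returned itemsets and ranked by whatever quality measures the analyst chooses.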
3.
This paper introduces a new approach to the problem of sharing data among multiple parties without disclosing the data to one another. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or to any other party. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.
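To show why homomorphic encryption suits this setting, here is a toy sketch of an additively homomorphic scheme (Paillier) used to add up local support counts without revealing them. This is not the protocol defined in the paper: the key sizes are far too small to be secure, key management is ignored, and all function names are hypothetical.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid inverse because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

Each party could encrypt its local count of an itemset, the ciphertexts could be combined with `add_encrypted`, and only the aggregated support count would ever be decrypted.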
4.
Traditional temporal association rule mining algorithms cannot dynamically update the temporal association rules within the valid time interval as data grow. In this paper, a new algorithm called incremental fuzzy temporal association rule mining using fuzzy grid table (IFTARMFGT) is proposed, combining the advantages of Boolean matrices with incremental mining. First, multivariate time series data are transformed into discrete fuzzy values that contain the time intervals and fuzzy membership. Second, to improve mining efficiency, the concept of Boolean matrices is applied to the fuzzy membership to generate a fuzzy grid table for mining the frequent itemsets. Finally, building on the Fast UPdate (FUP) algorithm, fuzzy temporal association rules are incrementally mined and updated without repeatedly scanning the original database, by considering the lifespan of each item and inheriting the information from previous mining results. The experiments show that our algorithm provides better efficiency and interpretability in mining temporal association rules than other algorithms.
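The first step, turning time series values into discrete fuzzy values, might look like the sketch below. Triangular membership functions and the choice of keeping only the best-matching label are assumptions made for illustration; the abstract does not say which membership functions IFTARMFGT uses, and the names `triangular` and `fuzzify` are hypothetical.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(series, fuzzy_sets):
    """Map (timestamp, value) pairs to (timestamp, label, membership) triples.

    fuzzy_sets maps a label to its (a, b, c) triangle; only the label with
    the highest membership is kept for each observation.
    """
    out = []
    for t, v in series:
        label, mu = max(
            ((name, triangular(v, *abc)) for name, abc in fuzzy_sets.items()),
            key=lambda pair: pair[1],
        )
        out.append((t, label, mu))
    return out
```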
5.
In this paper, we present an efficient computer-aided mass classification method for digitized mammograms using association rule mining, which performs benign–malignant classification on regions of interest that contain a mass. One of the major mammographic characteristics for mass classification is texture. Association rule mining (ARM) exploits this important factor to classify the mass as benign or malignant. The statistical textural features used to characterize the masses are mean, standard deviation, entropy, skewness, kurtosis and uniformity. The main aim of the method is to increase the effectiveness and efficiency of the classification process in an objective manner, so as to reduce the number of false-positive malignancy findings. Correlated association rule mining is proposed for classifying the marked regions as benign or malignant; it achieves 98.6% sensitivity and 97.4% specificity, which is very promising compared with the radiologist's sensitivity of 75%.
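The six statistical texture features listed above can be computed from the gray-level histogram of a region of interest. The sketch below uses common textbook definitions (standardized skewness and kurtosis, base-2 entropy); the exact normalizations used in the paper are not given in the abstract, so these are assumptions, and `texture_features` is a hypothetical name.

```python
import math
from collections import Counter

def texture_features(pixels):
    """First-order texture statistics of a flat list of gray levels."""
    n = len(pixels)
    # Normalized histogram: probability of each gray level in the region.
    probs = {g: c / n for g, c in Counter(pixels).items()}

    mean = sum(g * p for g, p in probs.items())
    var = sum((g - mean) ** 2 * p for g, p in probs.items())
    std = math.sqrt(var)
    m3 = sum((g - mean) ** 3 * p for g, p in probs.items())
    m4 = sum((g - mean) ** 4 * p for g, p in probs.items())

    return {
        "mean": mean,
        "std": std,
        # Standardized moments; defined as 0 for a perfectly flat region.
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / std ** 4 if std > 0 else 0.0,
        "entropy": -sum(p * math.log2(p) for p in probs.values()),
        "uniformity": sum(p * p for p in probs.values()),
    }
```

Before rule mining, such continuous features would typically be discretized into items.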
6.
Association rule mining (ARM) can be viewed as a combinatorial problem whose purpose is to extract correlations between items in sizeable datasets. The numerous polynomial exact algorithms already proposed for ARM are ill-suited to large databases, especially those found on the web. Because such datasets form a large search space, intelligent algorithms have been used to find high-quality rules and solve the ARM problem. This paper deals with a cooperative multi-swarm bat algorithm for association rule mining. It is based on the bat-inspired algorithm adapted to the rule discovery problem (BAT-ARM). The latter suffers from a lack of communication between bats in the population, which lessens exploration of the search space. However, it has a powerful rule generation process that leads to perfect local search. Therefore, to maintain a good trade-off between diversification and intensification, our proposed approach introduces cooperative strategies between the swarms that have already proved their efficiency in multi-swarm optimization algorithms (Ring, Master-slave). Furthermore, we introduce a new topology called Hybrid that merges the Ring strategy with the Master-slave plan previously developed in our earlier work [23]. A series of experiments is carried out on nine well-known datasets in the ARM field, and the performance of the proposed approach is evaluated and compared with that of other recently published methods. The results show a clear superiority of our proposal over similar approaches in terms of time and rule quality. The analysis also shows competitive outcomes in terms of quality against multi-objective optimization methods.
8.
In recent years, manufacturing processes have become more and more complex, and meeting high-yield target expectations and quickly identifying root-cause machinesets, the most likely sources of defective products, have also become essential issues. In this paper, we first define the root-cause machineset identification problem of analyzing correlations between combinations of machines and defective products. We then propose the Root-cause Machine Identifier (RMI) method, which uses association rule mining to solve the problem efficiently and effectively. Experimental results on real datasets show that almost all of the actual root-cause machinesets are ranked in the top 10 by the proposed RMI method.
9.
We propose a stock market portfolio recommender system based on association rule mining (ARM) that analyzes stock data and suggests a ranked basket of stocks. The objective of this recommender system is to support stock market traders, individual investors and fund managers in their decisions by suggesting investment in a group of equity stocks when strong evidence of possible profit from these transactions is available. Our system differs from existing systems because it finds the correlation between stocks and recommends a portfolio. Existing techniques recommend buying or selling a single stock and do not recommend a portfolio. We have used the support-confidence framework for generating association rules. The use of traditional ARM is infeasible because the number of association rules is exponential and finding relevant rules from this set is difficult. Therefore, ARM techniques have been augmented with domain-specific techniques, such as the formation of thematic sectors and the use of cross-sector and intra-sector rules, to overcome the disadvantages of traditional ARM. We have implemented novel methods, such as using fuzzy logic and the concept of time lags, to generate datasets from actual stock price data. Thorough experimentation has been performed on a variety of datasets, including the BSE-30 Sensitive Index, the S&P CNX Nifty or NSE-50, the S&P CNX-100 and the DOW-30 Industrial Average. We have compared the returns of our recommender system with the returns obtained from the top-5 mutual funds in India. The results of our system have surpassed the results from the mutual funds for all the datasets. Our approach demonstrates the application of soft computing techniques such as ARM and fuzzy classification in the design of recommender systems.
10.
Two parameters, namely support and confidence, in association rule mining, are used to arrange association rules in either
increasing or decreasing order. These two parameters are assigned values by counting the number of transactions satisfying
the rule without considering user perspective. Hence, an association rule, with low values of support and confidence, but
meaningful to the user, does not receive the same importance as is perceived by the user. Reflecting user perspective is of
paramount importance in light of improving user satisfaction for a given recommendation system. In this paper, we propose
a model and an algorithm to extract association rules, meaningful to a user, with an ad-hoc support and confidence by allowing
the user to specify the importance of each transaction. In addition, we apply the characteristics of a concept lattice, a
core data structure of Formal Concept Analysis (FCA) to reflect subsumption relation of association rules when assigning the
priority to each rule. Finally, we describe experimental results to verify the potential and efficiency of the proposed method.
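One simple way to let a user specify the importance of each transaction is to replace transaction counts with sums of user-assigned weights. The sketch below illustrates that idea only; it is an assumption about how such weighting could work, not the model or algorithm of the paper, and the function names are hypothetical.

```python
def weighted_support(itemset, transactions, weights):
    """Share of total transaction weight carried by transactions containing the itemset."""
    total = sum(weights)
    covered = sum(w for t, w in zip(transactions, weights) if itemset <= t)
    return covered / total

def weighted_confidence(antecedent, consequent, transactions, weights):
    """Weighted confidence of the rule antecedent -> consequent."""
    base = sum(w for t, w in zip(transactions, weights) if antecedent <= t)
    if base == 0:
        return 0.0
    both = antecedent | consequent
    hit = sum(w for t, w in zip(transactions, weights) if both <= t)
    return hit / base
```

With all weights equal to 1 these reduce to the usual count-based support and confidence; raising the weight of transactions the user cares about raises the measures of the rules those transactions satisfy.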
11.
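Based on an analysis of the Apriori algorithm for association rule mining and several of its improved variants, this paper further improves Apriori and proposes a new idea based on conditional judgment. Depending on the condition, the improved algorithm combines transaction compression with candidate itemset compression, which reduces unnecessary overhead and thus increases mining speed.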
12.
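To make traditional association rule mining algorithms more adaptable when applied to specific domains, the DS-Apriori algorithm is proposed. Built on a semantic ontology, the algorithm dynamically determines the minimum support of an itemset according to the semantic relatedness within the itemset, and uses an incremental method to compute itemset semantic relatedness. Experimental results show that DS-Apriori substantially improves the efficiency and effectiveness of association rule mining.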
13.
Pattern Analysis and Applications - Rare association rule mining is an imperative field of data mining that attempts to identify rare correlations among the items in a database. Although numerous...
14.
The steady growth in the size of textual document collections is a key progress-driver for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting user needs. In this situation, the query expansion technique offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. This mainly relies on an accurate choice of the terms added to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in applying data mining methods to the extraction of dependencies between terms. In this paper, we present a novel approach for mining knowledge supporting query expansion that is based on association rules. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. Thus, our association rule mining method implements results from Galois connection theory and compact representations of rule sets in order to reduce the huge number of potentially useful associations. An experimental study has examined the application of our approach to some real collections, whereby automatic query expansion has been performed. The results of the study show a significant improvement in the performance of the information retrieval system, in terms of both recall and precision, as highlighted by significance testing carried out with the Wilcoxon test.
16.
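To address the heavy memory consumption incurred when building an FP-Tree, the CCFP (constraint clip FP-tree) algorithm is proposed. The algorithm uses item-presence and item-absence constraints to prune the transaction database, constructs a simplified FP-Tree from the result, and obtains the association rules after one more scan. Experimental results show that, compared with the standard FP-Tree algorithm, CCFP saves a large amount of memory while also running slightly faster.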
17.
Distributed data mining applications, such as those dealing with health care, finance, counter-terrorism and homeland defence, use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual’s need and right to privacy. In this paper, we propose a privacy-preserving distributed association rule mining protocol based on a new semi-trusted mixer model. Our protocol can protect the privacy of each distributed database against a coalition of up to n − 2 other data sites, or even against the mixer, provided the mixer does not collude with any data site. Furthermore, our protocol needs only two communications between each data site and the mixer in one round of data collection.
18.
In sentiment analysis, a finer-grained opinion mining method focuses not only on the view of the product itself, but also on product features, which can be a component or attribute of the product. Previous related research mainly relied on explicit features and ignored implicit features. However, implicit features, which are implied by some words or phrases, are significant enough to express the users’ opinions and help us better understand the users’ comments. Detecting these implicit features in Chinese product reviews is a big challenge, due to the complexity of Chinese. This paper centers on implicit feature identification in Chinese product reviews. A novel hybrid association rule mining method is proposed for this task. The core idea of this approach is to mine as many association rules as possible via several complementary algorithms. First, we extract candidate feature indicators based on word segmentation, part-of-speech (POS) tagging and feature clustering, then compute the co-occurrence degree between the candidate feature indicators and the feature words using five collocation extraction algorithms. Each indicator and the corresponding feature word constitute a rule (feature indicator → feature word). The best rules in the five different rule sets are chosen as the basic rules. Next, three methods are proposed to mine further reasonable rules from the lower co-occurrence feature indicators and non-indicator words. Finally, the resulting rules are used to identify implicit features, and the results are compared with the previous ones. Experimental results demonstrate that our proposed approach is competent at the task, especially when several expanding methods are used. The recall is effectively improved, suggesting that the shortcomings of the basic rules have been overcome to a certain extent. Besides the high co-occurrence degree indicators, the final rules also contain uncommon rules.
19.
Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade due to
the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association
rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes which contain much richer information
than the boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives
rise to the generation of an explosively large number of itemsets, thereby severely degrading the mining efficiency. In this
paper, we propose an information-theoretic approach to avoid unrewarding combinations of both the attributes and their value
intervals being generated in the mining process. We study the mutual information between the attributes in a quantitative
database and devise a normalization on the mutual information to make it applicable in the context of QAR mining. To indicate
the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges
are attribute pairs that have normalized mutual information no less than a predefined information threshold. We find that
the cliques in the MI graph represent a majority of the frequent itemsets. We also show that frequent itemsets that do not
form a clique in the MI graph are those whose attributes are not informatively correlated to each other. By utilizing the
cliques in the MI graph, we devise an efficient algorithm that significantly reduces the number of value intervals of the
attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining
process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas
the QARs that are not returned by MIC are shown to be less interesting.
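The mutual information graph described in this abstract can be sketched as follows. The attributes are assumed to be already discretized, and the normalization shown (mutual information divided by the entropy of the first attribute, which makes edges directed) is an assumption; the abstract does not state the exact normalization, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def normalized_mi(xs, ys):
    """Assumed normalization: I(X;Y) / H(X), which lies in [0, 1]."""
    h = entropy(xs)
    return mutual_information(xs, ys) / h if h > 0 else 0.0

def mi_graph(table, threshold):
    """Edges (a, b) between attributes whose normalized MI reaches the threshold.

    table maps an attribute name to its column of discretized values.
    """
    return {
        (a, b)
        for a, b in permutations(table, 2)
        if normalized_mi(table[a], table[b]) >= threshold
    }
```

Cliques in the resulting graph would then indicate which attribute sets are worth combining during mining.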
20.
It is well recognized that product portfolio planning has a far-reaching impact on a company's business success in competition. In general, product portfolio planning involves two main stages, namely portfolio identification and portfolio evaluation and selection. The former aims to capture and understand customer needs effectively and accordingly to transform them into specifications of product offerings. The latter concerns how to determine an optimal configuration of these identified offerings with the objective of achieving the best profit performance. Current research and industrial practice have mainly focused on the economic justification of a given product portfolio, whereas the portfolio identification issue has received only limited attention. This article intends to develop explicit decision support to improve product portfolio identification by efficient knowledge discovery from past sales and product records. As one of the important applications of data mining, association rule mining lends itself to the discovery of useful patterns associated with requirement analysis enacted among customers, marketing staff, and designers. An association rule mining system (ARMS) is proposed for effective product portfolio identification. Based on a scrutiny of the product definition process, the article studies the fundamental issues underlying product portfolio identification. The ARMS differentiates the customer needs from the functional requirements involved in the respective customer and functional domains. Product portfolio identification entails the identification of functional requirement clusters in conjunction with the mappings from customer needs to these clusters. While clusters of functional requirements are identified based on fuzzy clustering analysis, the mapping mechanism between the customer and functional domains is embodied in association rules. The ARMS architecture and implementation issues are discussed in detail.
An application of the proposed methodology and system in a consumer electronics company to generate a vibration motor portfolio for mobile phones is also presented.