1.
This paper introduces a new approach to the problem of sharing data among multiple parties without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or to any other party. To tackle this demanding problem, we develop a secure protocol that lets multiple parties conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party with access to all the data. Instead, we define a protocol that uses homomorphic encryption techniques to exchange the data while keeping it private.
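The abstract does not say which homomorphic scheme or message flow the protocol uses. As a rough illustration of the general idea only, the sketch below assumes an additively homomorphic scheme (Paillier) with deliberately tiny toy primes, and shows how encrypted local support counts could be summed without any single count being revealed. The function names, the key size, and the sample counts are all hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    # Toy primes for illustration only; real deployments need very large primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # the random mask must be coprime with n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, priv = keygen()
local_counts = [120, 75, 210]  # hypothetical per-party support counts for one itemset

total_cipher = encrypt(n, 0)
for count in local_counts:
    total_cipher = add_encrypted(n, total_cipher, encrypt(n, count))

global_support = decrypt(n, priv, total_cipher)
```

Who holds the private key, and how the parties prevent the key holder from decrypting individual contributions, are protocol design questions that this sketch leaves open.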
2.
Quantitative association rule (QAR) mining has been recognized as an influential research problem over the last decade due to
the popularity of quantitative databases and the usefulness of association rules in real life. Unlike boolean association
rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes which contain much richer information
than the boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives
rise to the generation of an explosively large number of itemsets, thereby severely degrading the mining efficiency. In this
paper, we propose an information-theoretic approach to avoid unrewarding combinations of both the attributes and their value
intervals being generated in the mining process. We study the mutual information between the attributes in a quantitative
database and devise a normalization on the mutual information to make it applicable in the context of QAR mining. To indicate
the strong informative relationships among the attributes, we construct a mutual information graph (MI graph), whose edges
are attribute pairs that have normalized mutual information no less than a predefined information threshold. We find that
the cliques in the MI graph represent a majority of the frequent itemsets. We also show that frequent itemsets that do not
form a clique in the MI graph are those whose attributes are not informatively correlated to each other. By utilizing the
cliques in the MI graph, we devise an efficient algorithm that significantly reduces the number of value intervals of the
attribute sets to be joined during the mining process. Extensive experiments show that our algorithm speeds up the mining
process by up to two orders of magnitude. Most importantly, we are able to obtain most of the high-confidence QARs, whereas
the QARs that are not returned by MIC are shown to be less interesting.
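The MI graph construction described above can be sketched as follows. The abstract does not give the exact normalization, so this sketch assumes mutual information divided by the larger of the two attribute entropies; that choice, the function names, and the sample columns are illustrative assumptions rather than the paper's definitions. Attributes are assumed to be already discretized.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def normalized_mi(xs, ys):
    # Assumed normalization: divide by the larger entropy so the value lies in [0, 1].
    h = max(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0

def mi_graph_edges(columns, threshold):
    # Keep an edge for every attribute pair whose normalized MI reaches the threshold.
    return [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if normalized_mi(columns[a], columns[b]) >= threshold
    ]

columns = {
    "age": ["low", "low", "high", "high"],
    "income": ["low", "low", "high", "high"],
    "coin": ["h", "t", "h", "t"],
}
edges = mi_graph_edges(columns, threshold=0.5)
```

In this toy data, `age` and `income` are perfectly dependent while `coin` is independent of both, so only the age/income pair survives the threshold.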
4.
Identifying irregular file system permissions in large, multi-user systems is challenging due to the complexity of gaining structural understanding from large volumes of permission information. This challenge is exacerbated when file system permissions are allocated in an ad-hoc manner as new access rights are required, and when access rights become redundant as users change job roles or terminate employment. These factors make it challenging to determine what can be classed as an irregular file system permission, and whether a given irregularity exposes a vulnerability. The current way of finding such irregularities is to perform an exhaustive audit of the permission distribution; however, this requires expert knowledge and a significant amount of time. This paper presents a novel method of modelling file system permissions that can be used by association rule mining techniques to identify irregular permissions. This results in the creation of an object-centric model as a by-product. The technique is then implemented and tested on Microsoft’s New Technology File System (NTFS) permissions. Empirical observations are derived by making comparisons with expert knowledge to determine the effectiveness of the proposed technique on five diverse real-world directory structures extracted from different organisations. The results demonstrate that the technique is able to correctly identify irregularities with an average accuracy rate of 91%, minimising the reliance on expert knowledge. Experiments are also performed on synthetic directory structures, which demonstrate an accuracy rate of 95% when the number of irregular permissions constitutes 1% of the total number. This is a significant contribution as it creates the possibility of identifying vulnerabilities without prior knowledge of how file system permissions are implemented within a directory structure.
5.
Recommendation systems have been investigated and implemented in many ways. In particular, in the case of a collaborative filtering system, the most important issue is how to manipulate the personalized recommendation results for better user understandability and satisfaction. A collaborative filtering system predicts items of interest for users based on predictive relationships discovered between each item and others. This paper proposes a categorization for grouping associative items discovered by mining, for the purpose of improving the accuracy and performance of item-based collaborative filtering. It is possible that, if an associative item is required to be simultaneously associated with all other groups in which it occurs, the proposed method can collect associative items into relevant groups. In addition, the proposed method can result in improved predictive performance under circumstances of sparse data and cold-start initiation of collaborative filtering starting from a small number of items. In addition, this method can increase prediction accuracy and scalability because it removes the noise generated by ratings on items of dissimilar content or level of interest. The approach is empirically evaluated by comparison with k-means, average link, and robust, using the MovieLens dataset. The method was found to outperform existing methods significantly.
6.
Collaborative Filtering (CF) is a popular method for personalizing product recommendations for e-Commerce and customer relationship management (CRM). CF utilizes the explicit or implicit product evaluation ratings of customers to develop personalized recommendations. However, there has been no in-depth investigation of the parameters of CF in relation to the number of ratings on the part of an individual customer and the total number of ratings for an item. We empirically investigated the relationships between these two parameters and CF performance, using two publicly available data sets, EachMovie and MovieLens. We conducted three experiments. The first two investigated the relationship between a particular customer’s number of ratings and CF recommendation performance. The third experiment evaluated the relationship between the total number of ratings for a particular item and CF recommendation performance. We found that there are ratings thresholds below which recommendation performance increases monotonically, i.e., when the numbers of customer and item ratings are below threshold levels, CF recommendation performance is affected. In addition, once rating numbers surpass threshold levels, the value of each rating decreases. These results may facilitate operational decisions when applying CF in practice.
7.
World Wide Web - To solve the problem that users’ retrieval intentions are seldom considered by personalized websites, we propose an improved incremental collaborative filtering (CF)-based...
8.
Association Rule Mining (ARM) can be considered a combinatorial problem whose purpose is to extract the correlations between items in sizeable datasets. The numerous polynomial exact algorithms already proposed for ARM are ill-suited to large databases, especially those found on the web. Treating such datasets as a large search space, intelligent algorithms have been used to find high-quality rules and solve the ARM problem. This paper deals with a cooperative multi-swarm bat algorithm for association rule mining. It is based on the bat-inspired algorithm adapted to the rule discovery problem (BAT-ARM). The latter suffers from a lack of communication between bats in the population, which lessens the exploration of the search space. However, it has a powerful rule generation process, which leads to a strong local search. Therefore, to maintain a good trade-off between diversification and intensification, our proposed approach introduces cooperative strategies between the swarms that have already proved their efficiency in multi-swarm optimization algorithms (Ring, Master-slave). Furthermore, we introduce a new topology called Hybrid, which merges the Ring strategy with the Master-slave plan developed in our earlier work [23]. A series of experiments is carried out on nine well-known datasets in the ARM field, and the performance of the proposed approach is evaluated and compared with that of other recently published methods. The results show a clear superiority of our proposal over similar approaches in terms of time and rule quality. The analysis also shows competitive outcomes in terms of quality when compared with multi-objective optimization methods.
9.
This paper proposes a methodology for text mining relying on the classical knowledge discovery loop, with a number of adaptations.
First, texts are indexed and prepared to be processed by frequent itemset levelwise search. Association rules are then extracted
and interpreted, with respect to a set of quality measures and domain knowledge, under the control of an analyst. The article
includes an experiment on a real-world molecular biology text corpus.
11.
Knowledge is a critical resource that organizations use to gain and maintain competitive advantages. In the constantly changing business environment, organizations must exploit effective and efficient methods of preserving, sharing and reusing knowledge in order to help knowledge workers find task-relevant information. Hence, an important issue is how to discover and model the knowledge flow (KF) of workers from their historical work records. The objectives of a knowledge flow model are to understand knowledge workers’ task-needs and the ways they reference documents, and then provide adaptive knowledge support. This work proposes hybrid recommendation methods based on the knowledge flow model, which integrates KF mining, sequential rule mining and collaborative filtering techniques to recommend codified knowledge. These KF-based recommendation methods involve two phases: a KF mining phase and a KF-based recommendation phase. The KF mining phase identifies each worker’s knowledge flow by analyzing his/her knowledge referencing behavior (information needs), while the KF-based recommendation phase utilizes the proposed hybrid methods to proactively provide relevant codified knowledge for the worker. Therefore, the proposed methods use workers’ preferences for codified knowledge as well as their knowledge referencing behavior to predict their topics of interest and recommend task-related knowledge. Using data collected from a research institute laboratory, experiments are conducted to evaluate the performance of the proposed hybrid methods and compare them with the traditional CF method. The results of experiments demonstrate that utilizing the document preferences and knowledge referencing behavior of workers can effectively improve the quality of recommendations and facilitate efficient knowledge sharing.
12.
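This paper introduces the common theory of association rules, studies the standard Apriori algorithm for association rule mining, makes useful improvements that address its shortcomings, proposes a new weighted association rule mining algorithm, and analyzes its main characteristics. Applying the algorithm to e-commerce data mining and comparing it with the standard Apriori algorithm demonstrates the effectiveness of the new weighted association rule mining algorithm.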
13.
In this paper, we present an efficient computer-aided mass classification method for digitized mammograms using association rule mining, which performs benign–malignant classification on regions of interest that contain a mass. One of the major mammographic characteristics for mass classification is texture. Association rule mining (ARM) exploits this important factor to classify the mass as benign or malignant. The statistical textural features used in characterizing the masses are mean, standard deviation, entropy, skewness, kurtosis and uniformity. The main aim of the method is to increase the effectiveness and efficiency of the classification process in an objective manner and to reduce the number of false-positive malignancy findings. Correlated association rule mining is proposed for classifying the marked regions as benign or malignant; it achieves 98.6% sensitivity and 97.4% specificity, which is very promising compared with the radiologist’s sensitivity of 75%.
14.
Traditional temporal association rule mining algorithms cannot dynamically update the temporal association rules within the valid time interval as data increase. In this paper, a new algorithm called incremental fuzzy temporal association rule mining using fuzzy grid table (IFTARMFGT) is proposed by combining the advantages of boolean matrices with incremental mining. First, multivariate time series data are transformed into discrete fuzzy values that contain the time intervals and fuzzy membership. Second, in order to improve the mining efficiency, the concept of boolean matrices is introduced into the fuzzy membership to generate a fuzzy grid table for mining the frequent itemsets. Finally, in view of the Fast UPdate (FUP) algorithm, fuzzy temporal association rules are incrementally mined and updated without repeatedly scanning the original database by considering the lifespan of each item and inheriting the information from previous mining results. The experiments show that our algorithm provides better efficiency and interpretability in mining temporal association rules than other algorithms.
16.
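With the development of the tourism industry, mining the intrinsic, implicit correlations between traveler types and environmental factors from massive travel data is an effective way to analyze the state of the tourism market and predict its impact on related industries. Taking the characteristics of travel data into account and addressing the limitations of existing constraint methods, this paper proposes a parallel association rule mining algorithm based on relation-extension path constraints. The algorithm makes effective use of the MapReduce parallel mechanism: it generates the transaction set under the relation-extension path constraints, which improves the efficiency of the subsequent parallel stage, and it uses parallel methods to improve the level-wise search of the Apriori algorithm, yielding a "second" efficiency gain. This makes it possible to track tourism development trends better and faster and to adjust macro-level tourism policy.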
17.
The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990, and an MSc (first class), also from Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. Since 2000 he has been a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition.
Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as an RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association Rule Mining, classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05).
Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.
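The abstract for this entry describes partitioning the data so that pieces can be processed independently, using tree structures that are not detailed here. As a much simpler stand-in for that idea, the sketch below counts candidate itemsets per partition with plain counters and sums the partial counts. It is not the authors' tree-based method, and the function names and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_partition(transactions, size):
    # Count every itemset of the given size inside one partition.
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts

def frequent_itemsets(partitions, size, min_support):
    # Each partition is counted on its own; the partial counts are then summed.
    total = Counter()
    for partition in partitions:
        total.update(count_partition(partition, size))
    return {itemset: c for itemset, c in total.items() if c >= min_support}

partitions = [
    [["a", "b", "c"], ["a", "b"]],  # partition 1, small enough to fit in memory
    [["a", "b"], ["b", "c"]],       # partition 2
]
pairs = frequent_itemsets(partitions, size=2, min_support=2)
```

Because support counts are additive across disjoint sets of transactions, each partition could in principle be loaded, counted, and discarded in turn, which is what makes this style of processing attractive when the full data set does not fit in primary memory.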
18.
In the area of association rule mining, most previous research has focused on improving computational efficiency. However, determination of the threshold values of support and confidence, which seriously affect the quality of association rule mining, is still under investigation. Thus, this study proposes a novel algorithm for association rule mining in order to improve computational efficiency as well as to automatically determine suitable threshold values. The particle swarm optimization algorithm first searches for the optimum fitness value of each particle and then finds the corresponding support and confidence as minimal threshold values after the data are transformed into binary values. The proposed method is verified by applying it to the FoodMart2000 database of Microsoft SQL Server 2000 and is compared with a genetic algorithm. The results indicate that the particle swarm optimization algorithm can indeed suggest suitable threshold values and obtain quality rules. In addition, a real-world stock market database is employed to mine association rules to measure investment behavior and stock category purchasing. The computational results are also very promising.
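For readers unfamiliar with the two thresholds this abstract refers to, the standard definitions of support and confidence can be sketched as follows. This shows only the textbook measures, not the particle swarm search itself; the function names and sample transactions are hypothetical.

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # support(antecedent and consequent together) / support(antecedent).
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

A rule is typically reported only when both values meet their minimum thresholds, which is why the choice of those thresholds has such a strong effect on the rules that are returned.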
19.
The paper focuses on the adaptive relational association rule mining problem. Relational association rules represent a particular type of association rules which describe frequent relations that occur between the features characterizing the instances within a data set. We aim at re-mining an object set, previously mined, when the feature set characterizing the objects increases. An adaptive relational association rule method, based on the discovery of interesting relational association rules, is proposed. This method, called ARARM (Adaptive Relational Association Rule Mining), adapts the set of rules that was established by mining the data before the feature set changed, preserving the completeness. We aim to reach the result more efficiently than running the mining algorithm again from scratch on the feature-extended object set. Experiments testing the method's performance on several case studies are also reported. The obtained results highlight the efficiency of the ARARM method and confirm the potential of our proposal.
20.
In recent years, manufacturing processes have become more and more complex, and meeting high-yield target expectations and quickly identifying root-cause machinesets, the most likely sources of defective products, have also become essential issues. In this paper, we first define the root-cause machineset identification problem of analyzing correlations between combinations of machines and the defective products. We then propose the Root-cause Machine Identifier (RMI) method, which uses association rule mining to solve the problem efficiently and effectively. The experimental results on real datasets show that the actual root-cause machinesets are almost always ranked in the top 10 by the proposed RMI method.