Similar literature (20 results)
1.
This paper introduces a new approach to the problem of sharing data among multiple parties without disclosing the data between them. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or to any other party. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.
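
The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea of additively homomorphic encryption, using a toy Paillier scheme: each party encrypts its local support count, the ciphertexts are multiplied together, and only the key holder can decrypt the total. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`), the choice of Paillier, and the small primes are all assumptions made for illustration. The parameters are far too small to be secure.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                       # standard simple choice of generator
    mu = pow(lam, -1, n)            # modular inverse, valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)


# Hypothetical local support counts of one itemset at three parties.
pub, priv = keygen()
local_counts = [12, 7, 30]
total = encrypt(pub, 0)
for count in local_counts:
    total = add_encrypted(pub, total, encrypt(pub, count))
global_count = decrypt(pub, priv, total)
```

In this sketch no party sees another party's plaintext count, but whoever holds the private key learns the total. How keys are distributed and which values are revealed in the paper's actual protocol cannot be determined from the abstract.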

2.
Distributed data mining applications, such as those dealing with health care, finance, counter-terrorism and homeland defense, use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual's need for and right to privacy. It is thus of great importance to develop adequate security techniques for protecting the privacy of the individual values used for data mining.

3.
Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. We address secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task.
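
The abstract does not name the cryptographic techniques involved. One building block often described for horizontally partitioned data is a masked "secure sum", sketched below as an assumption rather than as the paper's method: the initiating site adds a random mask, each site adds its local count to the running value, and the initiator removes the mask at the end. The names `secure_sum` and `is_globally_frequent` and the example numbers are hypothetical.

```python
import random


def secure_sum(local_values, modulus=2**32, rng=None):
    """Simulate a ring-style masked sum; intermediate totals look random."""
    rng = rng or random.Random()
    mask = rng.randrange(modulus)        # known only to the initiating site
    running = mask
    for value in local_values:           # each site adds its own value in turn
        running = (running + value) % modulus
    return (running - mask) % modulus    # initiator removes the mask


def is_globally_frequent(local_counts, local_sizes, min_support):
    """Check global support >= min_support using two masked sums."""
    total_count = secure_sum(local_counts)
    total_size = secure_sum(local_sizes)
    return total_count >= min_support * total_size


# Hypothetical counts of one itemset at three sites and the sites' sizes.
counts = [5, 18, 9]
sizes = [100, 200, 100]
frequent = is_globally_frequent(counts, sizes, min_support=0.05)
```

The simulation runs in one process, so it only illustrates the arithmetic. It assumes sites do not collude and that the true sum is smaller than `modulus`.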

4.
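This paper introduces the concept of data mining for holiday tourism information and proposes DS-ARM, an improved distributed sampling algorithm for association rule mining. It describes the implementation of the algorithm, tests its performance, and uses DS-ARM to study the behavior patterns of holiday tourists at their destinations.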

5.
Existing parallel algorithms for association rule mining have a large inter-site communication cost or require a large amount of space to maintain the local support counts of a large number of candidate sets. This study proposes a de-clustering approach for distributed architectures, which eliminates the inter-site communication cost for most of the influential association rule mining algorithms. To de-cluster the database into similar partitions, an efficient algorithm is developed to approximate the shortest spanning path (SSP) that links the transaction data together. The SSP obtained is then used to evenly de-cluster the transaction data into subgroups. The proposed approach guarantees that all subgroups are similar to each other and to the original group. Experimental results show that data size and the number of items are the only two factors that determine the performance of de-clustering. Additionally, based on the approach, most of the influential association rule mining algorithms can be implemented in a distributed architecture to obtain a drastic increase in speed without losing any frequent itemsets. Furthermore, the data distribution in each de-clustered participant is almost the same as that of a single site, which implies that the proposed approach can be regarded as a sampling method for distributed association rule mining. Finally, the experimental results prove that the original inadequate mining results can be improved to an almost perfect level.

6.
In sentiment analysis, a finer-grained opinion mining method focuses not only on the view of the product itself but also on product features, which can be a component or attribute of the product. Previous related research mainly relied on explicit features but ignored implicit features. However, the implicit features, which are implied by some words or phrases, are so significant that they can express the users’ opinion and help us to better understand the users’ comments. It is a big challenge to detect these implicit features in Chinese product reviews, due to the complexity of Chinese. This paper is mainly centered on implicit feature identification in Chinese product reviews. A novel hybrid association rule mining method is proposed for this task. The core idea of this approach is mining as many association rules as possible via several complementary algorithms. Firstly, we extract candidate feature indicators based on word segmentation, part-of-speech (POS) tagging and feature clustering, then compute the co-occurrence degree between the candidate feature indicators and the feature words using five collocation extraction algorithms. Each indicator and the corresponding feature word constitute a rule (feature indicator → feature word). The best rules in five different rule sets are chosen as the basic rules. Next, three methods are proposed to mine some possible reasonable rules from the lower co-occurrence feature indicators and non-indicator words. Finally, the latest rules are used to identify implicit features and the results are compared with the previous ones. Experimental results demonstrate that our proposed approach is competent at the task, especially when several expanding methods are used. The recall is effectively improved, suggesting that the shortcomings of the basic rules have been overcome to a certain extent. Besides those high co-occurrence degree indicators, the final rules also contain uncommon rules.

7.
Standard algorithms for association rule mining are based on identification of frequent itemsets. In this paper, we study how to maintain privacy in distributed mining of frequent itemsets. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party’s portion of the data to the other. The existing solution for vertically partitioned data leaks a significant amount of information, while the existing solution for horizontally partitioned data only works for three parties or more. In this paper, we design algorithms for both vertically and horizontally partitioned data, with cryptographically strong privacy. We give two algorithms for vertically partitioned data; one of them reveals only the support count and the other reveals nothing. Both of them have computational overheads linear in the number of transactions. Our algorithm for horizontally partitioned data works for two parties and above and is more efficient than the existing solution.

8.

Privacy preservation in distributed databases is an active area of research. With the advancement of technology, massive amounts of data are continuously being collected and stored in distributed database applications. Indeed, temporal associations and correlations among items in the large transactional datasets of a distributed database can help in many business decision-making processes. One such task is mining frequent itemsets and computing their association rules, which is a nontrivial issue. In a typical situation, multiple parties may wish to collaborate to extract interesting global information, such as frequent associations, without revealing their respective data to each other. This may be particularly useful in applications such as retail market basket analysis, medical research, and academia. In the proposed work, we aim to find frequent items and to develop a global association rule model based on the genetic algorithm (GA). The GA is used because of its inherent features, such as robustness with respect to local maxima/minima and its domain-independent nature as a large-space search technique for finding exact or approximate solutions to optimization and search problems. For privacy preservation of the data, the concept of a trusted third party with two offsets has been used. The data are first anonymized at the local party's end, and then the aggregation and global association are done by the trusted third party. The proposed algorithms address various types of partitions, such as horizontal, vertical, and arbitrary.


9.
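Support-based association rule mining algorithms cannot find itemsets that are infrequent but have high utility, while utility-based association rules miss itemsets whose utility is not high but which occur fairly frequently, so that the product of their support and utility (the motivation, 激励) is large. This paper poses the problem of motivation-based association rule mining and proposes a bottom-up mining algorithm, HM-miner. Motivation combines the advantages of support and utility and measures both the statistical importance and the semantic importance of an itemset. HM-miner uses the upper-bound property of motivation for pruning and can mine high-motivation itemsets effectively.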
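
As a small worked illustration of the measure described above, the sketch below computes the product of support and utility for an itemset. The abstract does not say how the utility of an itemset is defined, so taking it as the sum of per-item utilities is an assumption, as are the function names and the example data. The sketch does not reproduce HM-miner's pruning.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def motivation(itemset, transactions, utility):
    """Support times utility; itemset utility assumed to be a sum over items."""
    itemset_utility = sum(utility[item] for item in itemset)
    return support(itemset, transactions) * itemset_utility


# Hypothetical transactions and per-item utilities.
transactions = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"c"}]
utility = {"a": 1, "b": 4, "c": 10}

score_ab = motivation({"a", "b"}, transactions, utility)  # 0.5 * 5
score_c = motivation({"c"}, transactions, utility)        # 0.5 * 10
```

Here `{"c"}` and `{"a", "b"}` have the same support, but the single high-utility item scores higher, which is the kind of distinction a support-only measure cannot make.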

10.
11.
Data collection is a necessary step in the data mining process. For privacy reasons, collecting data from different parties becomes difficult. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties can collaboratively conduct data mining without breaching data privacy presents a challenge. The objective of this paper is to provide solutions for privacy-preserving collaborative data mining problems. In particular, we illustrate how to conduct privacy-preserving naive Bayesian classification, which is one of the data mining tasks. To measure the privacy level of privacy-preserving schemes, we propose a definition of privacy and show that our solutions preserve data privacy.

12.
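To make traditional association rule mining algorithms more adaptable when applied to a specific domain, the DS-Apriori algorithm is proposed. Built on a semantic ontology, the algorithm dynamically determines the minimum support of each itemset from the semantic relatedness among its items, and it computes itemset semantic relatedness incrementally. Experimental results show that DS-Apriori substantially improves the efficiency and effectiveness of association rule mining.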

13.
Pattern Analysis and Applications - Rare association rule mining is an imperative field of data mining that attempts to identify rare correlations among the items in a database. Although numerous...

14.
Two parameters in association rule mining, namely support and confidence, are used to arrange association rules in either increasing or decreasing order. These two parameters are assigned values by counting the number of transactions satisfying the rule, without considering the user's perspective. Hence, an association rule with low values of support and confidence that is nevertheless meaningful to the user does not receive the importance that the user perceives it to have. Reflecting the user's perspective is of paramount importance for improving user satisfaction with a given recommendation system. In this paper, we propose a model and an algorithm to extract association rules that are meaningful to a user, with an ad-hoc support and confidence, by allowing the user to specify the importance of each transaction. In addition, we apply the characteristics of a concept lattice, a core data structure of Formal Concept Analysis (FCA), to reflect the subsumption relation of association rules when assigning the priority to each rule. Finally, we describe experimental results to verify the potential and efficiency of the proposed method.
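
The abstract does not give the formulas for the ad-hoc support and confidence. One plausible reading, sketched here purely as an assumption, is to replace transaction counts by sums of user-assigned transaction weights. The function names and the example weights are hypothetical, and the concept-lattice step is not covered.

```python
def weighted_support(itemset, transactions, weights):
    """Weight of transactions containing the itemset over the total weight."""
    items = set(itemset)
    covered = sum(w for t, w in zip(transactions, weights) if items <= set(t))
    return covered / sum(weights)


def weighted_confidence(antecedent, consequent, transactions, weights):
    """Weighted support of the whole rule over that of its antecedent."""
    base = weighted_support(antecedent, transactions, weights)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return weighted_support(both, transactions, weights) / base


# Hypothetical data: the user marks the last transaction as very important.
transactions = [{"milk", "bread"}, {"milk"}, {"bread"}, {"milk", "bread"}]
uniform = [1, 1, 1, 1]
user_weights = [1, 1, 1, 5]

plain = weighted_support({"milk", "bread"}, transactions, uniform)
adhoc = weighted_support({"milk", "bread"}, transactions, user_weights)
conf = weighted_confidence({"milk"}, {"bread"}, transactions, user_weights)
```

With uniform weights the functions reduce to the usual counting-based support and confidence. Raising the weight of one transaction lifts every rule that transaction satisfies.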

15.
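Improvement of the Apriori algorithm for association rule mining   Total citations: 3, self-citations: 0, citations by others: 3
Based on an analysis of the Apriori algorithm for association rule mining and several of its improved variants, this paper improves Apriori further and proposes a new idea based on conditional judgment. Depending on a condition, the improved algorithm combines transaction compression with candidate itemset compression, which reduces unnecessary overhead and thereby speeds up mining.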
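
The abstract does not state the condition that decides when each kind of compression is applied, so the sketch below simply applies both at every level. It is a plain Apriori with the two standard reductions the abstract names: candidates with an infrequent subset are discarded, and transactions that contain no candidate at the current level are dropped before the next pass. The function name `apriori` and the example data are illustrative assumptions, not the paper's algorithm.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset with count >= min_count."""
    db = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Candidate compression: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: 0 for c in candidates}
        kept = []
        for t in db:
            hits = [c for c in candidates if c <= t]
            for c in hits:
                counts[c] += 1
            # Transaction compression: a transaction holding no candidate
            # k-itemset cannot hold a frequent (k+1)-itemset, so drop it.
            if hits:
                kept.append(t)
        db = kept
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transaction database.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = apriori(data, min_count=3)
```

Both reductions are safe because any subset of a frequent itemset is itself frequent, so they shrink the work per pass without changing the result.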

16.
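Research on a secure mining algorithm for association rules in distributed databases   Total citations: 1, self-citations: 0, citations by others: 1
In a distributed environment, mining association rules over distributed databases without disclosing users' private information is a very important problem. This paper proposes PPDMA (Privacy Preserving Distributed Mining Algorithms), a secure algorithm for mining association rules in distributed databases. Cryptographic methods are applied to encrypt the constrained subtrees and other information that sites exchange in order to mine global frequent itemsets, and the receiving site decrypts the encrypted information. In this way user information is not disclosed and user privacy is protected while association rules are mined securely. Analysis shows that the algorithm is correct and feasible.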

17.
Tree-based partitioning of data for association rule mining   Total citations: 1, self-citations: 1, citations by others: 0
The most computationally demanding aspect of Association Rule Mining is the identification and counting of support of the frequent sets of items that occur together sufficiently often to be the basis of potentially interesting rules. The task increases in difficulty with the scale of the data and also with its density. The greatest challenge is posed by data that is too large to be contained in primary memory, especially when high data density and/or low support thresholds give rise to very large numbers of candidates that must be counted. In this paper, we consider strategies for partitioning the data to deal effectively with such cases. We describe a partitioning approach which organises the data into tree structures that can be processed independently. We present experimental results that show the method scales well for increasing dimensions of data and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
Shakil Ahmed received a first class BSc (Hons) degree from Dhaka University, Bangladesh, in 1990, and an MSc (first class), also from Dhaka University, in 1992. He received his PhD from The University of Liverpool, UK, in 2005. Since 2000 he has been a member of the Data Mining Group at the Department of Computer Science of the University of Liverpool, UK. His research interests include data mining, Association Rule Mining and pattern recognition.
Frans Coenen has been working in the field of Data Mining for many years and has written widely on the subject. He received his PhD from Liverpool Polytechnic in 1989, after which he took up a post as an RA within the Department of Computer Science at the University of Liverpool. In 1997, he took up a lecturing post within the same department. His current Data Mining research interests include Association Rule Mining, classification algorithms and text mining. He is on the programme committee for ICDM'05 and was the chair for the UK KDD symposium (UKKDD'05).
Paul Leng is professor of e-Learning at the University of Liverpool and director of the e-Learning Unit, which is responsible for overseeing the University's online degree programmes, leading to degrees of MSc in IT and MBA. Along with e-Learning, his main research interests are in Data Mining, especially in methods of discovering Association Rules. In collaboration with Frans Coenen, he has developed efficient new algorithms for finding frequent sets and is exploring applications in text mining and classification.

18.
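With the development of the tourism industry, mining the intrinsic, implicit correlations between traveler types and environmental factors from massive travel data is an effective way to analyze the state of the tourism market and to predict its impact on related industries. Taking the characteristics of travel data into account and addressing the limitations of existing constraint methods, this paper proposes a parallel association rule mining algorithm based on relation extension path constraints. The algorithm makes effective use of the MapReduce parallel mechanism: it generates the transaction set under relation extension path constraints, which improves the efficiency of the subsequent parallel stage, and it uses parallel methods to improve the level-wise search of the Apriori algorithm, which yields a "second" efficiency gain. This makes it possible to track trends in the tourism industry better and faster and to adjust macro-level tourism policy.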

19.
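To address the heavy memory consumption of building an FP-Tree, the CCFP (constraint clip FP-tree) algorithm is proposed. The algorithm prunes the transaction database using item-presence and item-absence constraints, builds a simplified FP-Tree from the pruned database, and obtains the association rules after one more scan. Experimental results show that, compared with the general FP-Tree algorithm, CCFP saves a large amount of memory and also runs slightly faster.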

20.
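Association rule mining is a typical task in data mining, and its response time is one of the important issues in a data mining system. To address this issue efficiently, this paper gives the concept of materialized views for association rules together with a corresponding cost model, and proposes a materialized view selection algorithm for data mining environments so that good query performance is obtained under a storage space constraint. Experimental results show that the algorithm selects materialized views effectively and thereby greatly improves the efficiency of association rule mining.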
