首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Finding the nearest k objects to a query object is a fundamental operation for many data mining algorithms. With the recent interest in privacy, it is not surprising that there is strong interest in k-NN queries to enable clustering, classification and outlier-detection tasks. However, previous approaches to privacy-preserving k-NN have been costly and can only be realistically applied to small data sets. In this paper, we provide efficient solutions for k-NN queries for vertically partitioned data. We provide the first solution for the L (or Chessboard) metric as well as detailed privacy-preserving computation of all other Minkowski metrics. We enable privacy-preserving L by providing a practical approach to the Yao’s millionaires problem with more than two parties. This is based on a pragmatic and implementable solution to Yao’s millionaires problem with shares. We also provide privacy-preserving algorithms for combinations of local metrics into a global metric that handles the large dimensionality and diversity of attributes common in vertically partitioned data. To manage very large data sets, we provide a privacy-preserving SASH (a very successful data structure for associative queries in high dimensions). Besides providing a theoretical analysis, we illustrate the efficiency of our approach with an empirical evaluation.  相似文献   

2.
Two methods for privacy preserving data mining with malicious participants   总被引:1,自引:0,他引:1  
Privacy preserving data mining addresses the need of multiple parties with private inputs to run a data mining algorithm and learn the results over the combined data without revealing any unnecessary information. Most of the existing cryptographic solutions to privacy-preserving data mining assume semi-honest participants. In theory, these solutions can be extended to the malicious model using standard techniques like commitment schemes and zero-knowledge proofs. However, these techniques are often expensive, especially when the data sizes are large. In this paper, we investigate alternative ways to convert solutions in the semi-honest model to the malicious model. We take two classical solutions as examples, one of which can be extended to the malicious model with only slight modifications while another requires a careful redesign of the protocol. In both cases, our solutions for the malicious model are much more efficient than the zero-knowledge proofs based solutions.  相似文献   

3.
A secure scalar product protocol is a type of specific secure multi-party computation problem.Using this kind of protocol,two involved parties are able to jointly compute the scalar product of their private vectors,but no party will reveal any information about his/her private vector to another one.The secure scalar product protocol is of great importance in many privacy-preserving applications such as privacy-preserving data mining,privacy-preserving cooperative statistical analysis,and privacy-preserving geometry computation.In this paper,we give an efficient and secure scalar product protocol in the presence of malicious adversaries based on two important tools:the proof of knowledge of a discrete logarithm and the verifiable encryption.The security of the new protocol is proved under the standard simulation-based definitions.Compared with the existing schemes,our scheme offers higher efficiency because of avoiding inefficient cut-and-choose proofs.  相似文献   

4.
Distributed data mining applications, such as those dealing with health care, finance, counter-terrorism and homeland defence, use sensitive data from distributed databases held by different parties. This comes into direct conflict with an individual’s need and right to privacy. In this paper, we come up with a privacy-preserving distributed association rule mining protocol based on a new semi-trusted mixer model. Our protocol can protect the privacy of each distributed database against the coalition up to n  2 other data sites or even the mixer if the mixer does not collude with any data site. Furthermore, our protocol needs only two communications between each data site and the mixer in one round of data collection.  相似文献   

5.
Data integration methods enable different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In VLDBJ 2006, Jiang and Clifton (Very Large Data Bases J (VLDBJ) 15(4):316–333, 2006) propose a secure Distributed k-Anonymity (DkA) framework for integrating two private data tables to a k-anonymous table in which each private table is a vertical partition on the same set of records. Their proposed DkA framework is not scalable to large data sets. Moreover, DkA is limited to a two-party scenario and the parties are assumed to be semi-honest. In this paper, we propose two algorithms to securely integrate private data from multiple parties (data providers). Our first algorithm achieves the k-anonymity privacy model in a semi-honest adversary model. Our second algorithm employs a game-theoretic approach to thwart malicious participants and to ensure fair and honest participation of multiple data providers in the data integration process. Moreover, we study and resolve a real-life privacy problem in data sharing for the financial industry in Sweden. Experiments on the real-life data demonstrate that our proposed algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large data sets.  相似文献   

6.
In this paper, we define a class of special two-party private summation (S2PPS) problems and present a common quantum solution to S2PPS problems. Compared to related classical solutions, our solution has advantages of higher security and lower communication complexity, and especially it can ensure the fairness of two parties without the help of a third party. Furthermore, we investigate the practical applications of our proposed S2PPS protocol in many privacy-preserving settings with big data sets, including private similarity decision, anonymous authentication, social networks, secure trade negotiation, secure data mining.  相似文献   

7.
In this paper, a novel multi-party quantum private comparison protocol with a semi-honest third party (TP) is proposed based on the entanglement swapping of d-level cat states and d-level Bell states. Here, TP is allowed to misbehave on his own, but will not conspire with any party. In our protocol, n parties employ unitary operations to encode their private secrets and can compare the equality of their private secrets within one time execution of the protocol. Our protocol can withstand both the outside attacks and the participant attacks on the condition that none of the QKD methods is adopted to generate keys for security. One party cannot obtain other parties’ secrets except for the case that their secrets are identical. The semi-honest TP cannot learn any information about these parties’ secrets except the end comparison result on whether all private secrets from n parties are equal.  相似文献   

8.
Data collection is a necessary step in data mining process. Due to privacy reasons, collecting data from different parties becomes difficult. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties collaboratively conduct data mining without breaching data privacy presents a challenge. The objective of this paper is to provide solutions for privacy-preserving collaborative data mining problems. In particular, we illustrate how to conduct privacy-preserving naive Bayesian classification which is one of the data mining tasks. To measure the privacy level for privacy- preserving schemes, we propose a definition of privacy and show that our solutions preserve data privacy.  相似文献   

9.
This paper presents a clustering (Clustering partitions record into clusters such that records within a cluster are similar to each other, while records in different clusters are most distinct from one another.) based k-anonymization technique to minimize the information loss while at the same time assuring data quality. Privacy preservation of individuals has drawn considerable interests in data mining research. The k-anonymity model proposed by Samarati and Sweeney is a practical approach for data privacy preservation and has been studied extensively for the last few years. Anonymization methods via generalization or suppression are able to protect private information, but lose valued information. The challenge is how to minimize the information loss during the anonymization process. We refer to the challenge as a systematic clustering problem for k-anonymization which is analysed in this paper. The proposed technique adopts group-similar data together and then anonymizes each group individually. The structure of systematic clustering problem is defined and investigated through paradigm and properties. An algorithm of the proposed problem is developed and shown that the time complexity is in O(\fracn2k){O(\frac{n^{2}}{k})}, where n is the total number of records containing individuals concerning their privacy. Experimental results show that our method attains a reasonable dominance with respect to both information loss and execution time. Finally the algorithm illustrates the usability for incremental datasets.  相似文献   

10.
Gao  Jiu-Ru  Chen  Wei  Xu  Jia-Jie  Liu  An  Li  Zhi-Xu  Yin  Hongzhi  Zhao  Lei 《计算机科学技术学报》2019,34(6):1185-1202

With the popularity of storing large data graph in cloud, the emergence of subgraph pattern matching on a remote cloud has been inspired. Typically, subgraph pattern matching is defined in terms of subgraph isomorphism, which is an NP-complete problem and sometimes too strict to find useful matches in certain applications. And how to protect the privacy of data graphs in subgraph pattern matching without undermining matching results is an important concern. Thus, we propose a novel framework to achieve the privacy-preserving subgraph pattern matching in cloud. In order to protect the structural privacy in data graphs, we firstly develop a k-automorphism model based method. Additionally, we use a cost-model based label generalization method to protect label privacy in both data graphs and pattern graphs. During the generation of the k-automorphic graph, a large number of noise edges or vertices might be introduced to the original data graph. Thus, we use the outsourced graph, which is only a subset of a k-automorphic graph, to answer the subgraph pattern matching. The efficiency of the pattern matching process can be greatly improved in this way. Extensive experiments on real-world datasets demonstrate the high efficiency of our framework.

  相似文献   

11.
The similarity join has become an important database primitive for supporting similarity searches and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of the similarity join are well-known, the distance range join, in which the user defines a distance threshold for the join, and the closest pair query or k-distance join, which retrieves the k most similar pairs. In this paper, we propose an important, third similarity join operation called the k-nearest neighbour join, which combines each point of one point set with its k nearest neighbours in the other set. We discover that many standard algorithms of Knowledge Discovery in Databases (KDD) such as k-means and k-medoid clustering, nearest neighbour classification, data cleansing, postprocessing of sampling-based data mining, etc. can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbour join using the multipage index (MuX), a specialised index structure for the similarity join. To reduce both CPU and I/O costs, we develop optimal loading and processing strategies.  相似文献   

12.
k-Anonymity is a privacy preserving method for limiting disclosure of private information in data mining. The process of anonymizing a database table typically involves generalizing table entries and, consequently, it incurs loss of relevant information. This motivates the search for anonymization algorithms that achieve the required level of anonymization while incurring a minimal loss of information. The problem of k-anonymization with minimal loss of information is NP-hard. We present a practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of O(ln k). That algorithm improves an algorithm due to Aggarwal et al. (Proceedings of the international conference on database theory (ICDT), 2005) that offers an approximation guarantee of O(k), and generalizes that of Park and Shim (SIGMOD ’07: proceedings of the 2007 ACM SIGMOD international conference on management of data, 2007) that was limited to the case of generalization by suppression. Our algorithm uses techniques that we introduce herein for mining closed frequent generalized records. Our experiments show that the significance of our algorithm is not limited only to the theory of k-anonymization. The proposed algorithm achieves lower information losses than the leading approximation algorithm, as well as the leading heuristic algorithms. A modified version of our algorithm that issues -diverse k-anonymizations also achieves lower information losses than the corresponding modified versions of the leading algorithms.  相似文献   

13.
The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if it is possible to reconstruct the identity of individuals from sequential data. Therefore, it is important to develop privacy-preserving techniques that support publishing of really anonymous data, without altering the analysis results significantly. In this paper we propose to apply the Privacy-by-design paradigm for designing a technological framework to counter the threats of undesirable, unlawful effects of privacy violation on sequence data, without obstructing the knowledge discovery opportunities of data mining technologies. First, we introduce a k-anonymity framework for sequence data, by defining the sequence linking attack model and its associated countermeasure, a k-anonymity notion for sequence datasets, which provides a formal protection against the attack. Second, we instantiate this framework and provide a specific method for constructing the k-anonymous version of a sequence dataset, which preserves the results of sequential pattern mining, together with several basic statistics and other analytical properties of the original data, including the clustering structure. A comprehensive experimental study on realistic datasets of process-logs, web-logs and GPS tracks is carried out, which empirically shows how, in our proposed method, the protection of privacy meets analytical utility.  相似文献   

14.
在分布式环境下,实现隐私保护的数据挖掘,已成为该领域的研究热点。文中着重研究在垂直分布数据中,实现隐私保护的决策树分类模型。该模型创建新型的隐私保护决策树,即由在茫然半诚实方存储的全局决策表和各站点存储的局部决策树组成,并结合索引数组和秘密数据比较协议,实现在不泄漏原始信息的前提下决策树的生成和分类。经过理论分析和实验验证,证明该模型具有较好的安全性、准确性和适用性。  相似文献   

15.
Traditionally, many data mining techniques have been designed in the centralized model in which all data is collected and available in one central site. However, as more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by business, governments, and other parties increases. Different parties often wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent the parties from sharing their data. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data need not be revealed. In this paper, we present privacy-preserving protocols for a particular data mining task: learning a Bayesian network from a database vertically partitioned among two parties. In this setting, two parties owning confidential databases wish to learn the Bayesian network on the combination of their databases without revealing anything else about their data to each other. We present an efficient and privacy-preserving protocol to construct a Bayesian network on the parties' joint data.  相似文献   

16.
We present a cryptographically t‐private protocol for electronic auctions whose low resource demands make it viable for practical use. Our construction is based on Yao's garbled circuits and pseudorandom number generators (PRNGs). Our protocol involves a field of (t + 1)2 parties for the generation of the garbled circuit and permits an arbitrary large number of bidders. The computational requirements are low: Only t + 1 parties of the field have to use the PRNG, the remaining parties execute only primitive computations (XOR, permutations and sharing). The bidders have to stay active for one round of communication, independent of each other. Each bidder has to compute only t + 1 XOR‐operations. We present an implementation and evaluate its performance. The observed running time of our protocol is linear in the size of the auction circuit and the number of bidders and, as expected, grows quadratically in the parameter t. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   

17.
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004Edited by: S. AbiteboulExtended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765  相似文献   

18.
Federated learning (FL) was created with the intention of enabling collaborative training of models without the need for direct data exchange. However, data leakage remains an issue in FL. Multi-Key Fully Homomorphic Encryption (MKFHE) is a promising technique that allows computations on ciphertexts encrypted by different parties. MKFHE’s aptitude to handle multi-party data makes it an ideal tool for implementing privacy-preserving federated learning.We present a multi-hop MKFHE with compact ciphertext. MKFHE allows computations on data encrypted by different parties. In MKFHE, the compact ciphertext means that the size of the ciphertext is independent of the number of parties. The multi-hop property means that parties can dynamically join the homomorphic computation at any time. Prior MKFHE schemes were limited by their inability to combine these desirable properties. To address this limitation, we propose a multi-hop MKFHE scheme with compact ciphertext based on the random sample common reference string(CRS). We construct our scheme based on the residue number system (RNS) variant CKKS17 scheme, which enables efficient homomorphic computation over complex numbers due to the RNS representations of numbers.We construct a round efficient privacy-preserving federated learning based on our multi-hop MKFHE. In FL, there is always the possibility that some clients may drop out during the computation. Previous HE-based FL methods did not address this issue. However, our approach takes advantage of multi-hop MKFHE that users can join dynamically and constructs an efficient federated learning scheme that reduces interactions between parties. Compared to other HE-based methods, our approach reduces the number of interactions during a round from 3 to 2. Furthermore, in situations where some users fail, we are able to reduce the number of interactions from 3 to just 1.  相似文献   

19.
王洪亚  杨利宏  刘晓强 《软件学报》2016,27(12):3051-3066
相似连接算法在数据清理、数据集成和重复网页检测等领域有着广泛的应用.现有相似连接算法有两种类型:基于相似度阈值的相似连接和Top-k相似连接.Top-k连接算法非常适合于相似度阈值未知的应用场景,目前最为有效的Top-k相似连接算法是Xiao等人提出的Topk-join.为了解决Topk-join中存在的性能问题,提出了一种Top-k相似连接算法Opt-join,该算法将Token批处理技术集成在现有的事件驱动框架中,以降低前缀事件的处理代价;通过置换哈希查找与过滤操作的执行位置来降低哈希查找代价,并理论证明了该置换的正确性.实验结果表明:与Topk-join算法相比,Opt-join取得了1.28倍~3.09倍的性能提升.实验数据还显示:随着数据长度的增加或k值的增长,Opt-join的性能优势有不断增加的趋势.  相似文献   

20.
为了解决分布式环境中多个参与方在不共享各自隐私数据的情况下完成全局属性约简计算的问题,提出了一种水平划分多决策表下基于相对粒度的隐私保护属性约简算法。该算法基于相对粒度约简理论实现了分布式环境下全局属性约简的求解,利用半可信第三方与安全多方基础协议,设计了安全多方计算相对粒度协议,使各参与方在不共享其隐私信息的前提下达到集中式属性约简的效果。分析结果表明,该算法是有效可行的。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号