20 similar articles found; search time: 15 ms
1.
Hillol Kargupta Souptik Datta Qi Wang Krishnamoorthy Sivakumar 《Knowledge and Information Systems》2005,7(4):387-414
Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain and then it develops a random matrix-based spectral-filtering technique to retrieve original data from the dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques. Examples include algorithms that explicitly guard against privacy breaches through linear transformations, exploiting multiplicative and colored noise for preserving privacy in data mining applications.
2.
Shipra Agrawal Jayant R. Haritsa B. Aditya Prakash 《Data mining and knowledge discovery》2009,18(1):101-139
To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of individual
data records have been proposed recently. In this paper, we present FRAPP, a generalized matrix-theoretic framework of random
perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining.
Specifically, FRAPP is used to demonstrate that (a) the prior techniques differ only in their choices for the perturbation
matrix elements, and (b) a symmetric positive-definite perturbation matrix with minimal condition number can be identified,
substantially enhancing the accuracy even under strict privacy requirements. We also propose a novel perturbation mechanism
wherein the matrix elements are themselves characterized as random variables, and demonstrate that this feature provides significant
improvements in privacy at only a marginal reduction in accuracy. The quantitative utility of FRAPP, which is a general-purpose
random-perturbation-based privacy-preserving mining technique, is evaluated specifically with regard to association and classification
rule mining on a variety of real datasets. Our experimental results indicate that, for a given privacy requirement, either
substantially lower modeling errors are incurred as compared to the prior techniques, or the errors are comparable to those
of direct mining on the true database.
A partial and preliminary version of this paper appeared in the Proc. of the 21st IEEE Intl. Conf. on Data Engineering (ICDE),
Tokyo, Japan, 2005, pgs. 193–204.
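The matrix view of random perturbation in the FRAPP abstract can be illustrated with a small sketch. It assumes a symmetric matrix with one value on the diagonal and another everywhere else. The parameter name `gamma`, the function names, and the sample counts are hypothetical, and the abstract does not confirm that this is the exact matrix the authors identify. Each record keeps its value with probability `gamma * x` and moves to any other given value with probability `x`. The miner then recovers the original histogram from the expected perturbed histogram.

```python
def perturbation_matrix(n, gamma):
    """Symmetric n x n matrix: gamma * x on the diagonal, x elsewhere.

    x is chosen so that every column sums to 1 (a valid transition matrix).
    """
    x = 1.0 / (gamma + n - 1)
    return [[gamma * x if i == j else x for j in range(n)] for i in range(n)]


def expected_perturbed_counts(matrix, true_counts):
    """Expected perturbed histogram Y = A X."""
    n = len(true_counts)
    return [sum(matrix[v][u] * true_counts[u] for u in range(n)) for v in range(n)]


def reconstruct_counts(perturbed_counts, gamma):
    """Invert Y = A X in closed form.

    A = (gamma - 1) * x * I + x * J (J is the all-ones matrix), and the total
    count is preserved, so Y_v = (gamma - 1) * x * X_v + x * total.
    """
    n = len(perturbed_counts)
    x = 1.0 / (gamma + n - 1)
    total = sum(perturbed_counts)
    return [(y - x * total) / ((gamma - 1) * x) for y in perturbed_counts]


true_counts = [500.0, 300.0, 150.0, 50.0]   # hypothetical category counts
gamma = 19.0                                # hypothetical privacy parameter
A = perturbation_matrix(len(true_counts), gamma)
observed = expected_perturbed_counts(A, true_counts)
estimate = reconstruct_counts(observed, gamma)
```

With real data the observed histogram is a random sample around `observed`, so the reconstruction is only approximate. A better-conditioned matrix amplifies that sampling error less, which is why the condition number matters in the abstract.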
3.
Shibnath Mukherjee Zhiyuan Chen Aryya Gangopadhyay 《The VLDB Journal The International Journal on Very Large Data Bases》2006,15(4):293-315
Privacy preserving data mining has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes. However, existing techniques such as random perturbation do not fare well for simple yet widely used and efficient Euclidean distance-based mining algorithms. Although original data distributions can be pretty accurately reconstructed from the perturbed data, distances between individual data points are not preserved, leading to poor accuracy for the distance-based mining methods. Besides, they do not generally focus on data reduction. Other studies on secure multi-party computation often concentrate on techniques useful to very specific mining algorithms and scenarios such that they require modification of the mining algorithms and are often difficult to generalize to other mining algorithms or scenarios. This paper proposes a novel generalized approach using the well-known energy compaction power of Fourier-related transforms to hide sensitive data values and to approximately preserve Euclidean distances in centralized and distributed scenarios to a great degree of accuracy. Three algorithms to select the most important transform coefficients are presented, one for a centralized database case, the second one for a horizontally partitioned, and the third one for a vertically partitioned database case. Experimental results demonstrate the effectiveness of the proposed approach.
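The distance-preservation idea can be sketched with an orthonormal discrete cosine transform, one member of the Fourier-related family. This sketch is an illustration under assumptions, not the paper's method. The records are made up, and keeping the first `kept` low-frequency coefficients is a hypothetical stand-in for the paper's three coefficient-selection algorithms. An orthonormal transform preserves Euclidean distance exactly, and dropping coefficients can only shrink the distance.

```python
import math


def dct_orthonormal(values):
    """Orthonormal DCT-II, computed directly in O(n^2)."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        coeffs.append(scale * total)
    return coeffs


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records with smoothly varying attribute values.
record_a = [10.0, 11.0, 12.5, 13.0, 13.5, 15.0, 15.5, 17.0]
record_b = [9.0, 9.5, 11.0, 12.0, 14.0, 14.5, 16.0, 18.0]

coeffs_a = dct_orthonormal(record_a)
coeffs_b = dct_orthonormal(record_b)

true_distance = euclidean(record_a, record_b)
full_distance = euclidean(coeffs_a, coeffs_b)      # all coefficients kept
kept = 3                                           # hypothetical cut-off
approx_distance = euclidean(coeffs_a[:kept], coeffs_b[:kept])
```

Publishing only the retained coefficients hides the original attribute values while leaving an underestimate of the true distance. How close that estimate is depends on how much of the signal energy the retained coefficients capture.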
4.
Privacy preserving technologies are likely to become an essential component of adaptive services in pervasive and mobile computing. Although privacy issues have been studied for a long time in computer science as well as in other fields, most studies are focused on the release of data from large repositories. Mobile and pervasive computing pose new challenges, requiring specific formal models for attacks and new privacy preserving techniques. This paper considers a specific pervasive computing scenario, and shows that the application of state-of-the-art techniques for the anonymization of service requests is insufficient to protect the privacy of users. A specific class of attacks, called shadow attacks, is formally defined and a defense technique is proposed. This defense is formally proved to be correct, and its effectiveness is validated by extensive experiments in a simulated environment.
5.
When a table containing individual data is published, disclosure of sensitive information should be prevented. Simply removing identifiers such as name and social security number may still reveal sensitive information through linking attacks, which join the published table with other tables on some attributes. The notion of k-anonymity, which makes each record in the table indistinguishable from k−1 other records by suppression or generalization, has therefore been proposed. k-anonymizing a table while minimizing information loss has been shown to be NP-hard. Approximation algorithms with approximation ratios of up to O(k) have been proposed for the case where generalization is used for anonymization.
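The k-anonymity condition can be checked mechanically. The sketch below is a minimal illustration. The table, the choice of `age` and `zip` as quasi-identifiers, and the generalization rule (ages to decades, ZIP codes to three-digit prefixes) are all hypothetical. It only verifies the property and makes no attempt to minimize information loss, which is the NP-hard part.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())


def generalize(record):
    """Hypothetical generalization: age to its decade, ZIP to a 3-digit prefix."""
    decade = record["age"] // 10 * 10
    return {**record,
            "age": f"{decade}-{decade + 9}",
            "zip": record["zip"][:3] + "**"}


table = [
    {"age": 34, "zip": "47906", "disease": "flu"},
    {"age": 37, "zip": "47902", "disease": "asthma"},
    {"age": 52, "zip": "47305", "disease": "flu"},
    {"age": 58, "zip": "47301", "disease": "diabetes"},
]
generalized = [generalize(r) for r in table]

raw_ok = is_k_anonymous(table, ["age", "zip"], k=2)
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
```

Here the raw table fails the check because every record is unique on its quasi-identifiers, while the generalized table forms two groups of two records each.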
6.
Maxine M. Denniston Nancy D. Brener Laura Kann Danice K. Eaton Timothy McManus Tonja M. Kyle Alice M. Roberts Katherine H. Flint James G. Ross 《Computers in human behavior》2010
The Youth Risk Behavior Surveillance System (YRBSS) monitors priority health-risk behaviors among US high school students. To better understand the ramifications of changing the YRBSS from paper-and-pencil to Web administration, in 2008 the Centers for Disease Control and Prevention conducted a study comparing these two modes of administration. Eighty-five schools in 15 states agreed to participate in the study. Within each participating school, four classrooms of students in grades 9 or 10 were randomly assigned to complete the Youth Risk Behavior Survey questionnaire in one of four conditions (in-class paper-and-pencil, in-class Web without programmed skip patterns, in-class Web with programmed skip patterns, and “on your own” Web without programmed skip patterns). Findings included less missing data for the paper-and-pencil condition (1.5% vs. 5.3%, 4.4%, 6.4%; p < .001), less perceived privacy and anonymity among respondents for the in-class Web conditions, and a lower response rate for the “on your own” Web condition than for in-class administration by either mode (28.0% vs. 91.2%, 90.1%, 91.4%; p < .001). Although Web administration might be useful for some surveys, these findings do not favor the use of a Web survey for the YRBSS.
7.
The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because personal identifying data, such as names, addresses and dates of birth of individuals, are commonly used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage.
8.
Numerous privacy-preserving data publishing algorithms were proposed to achieve privacy guarantees such as ℓ-diversity. Many of them, however, were recently found to be vulnerable to algorithm-based disclosure, i.e., privacy leakage incurred by an adversary who is aware of the privacy-preserving algorithm being used. This paper describes generic techniques for correcting the design of existing privacy-preserving data publishing algorithms to eliminate algorithm-based disclosure. We first show that algorithm-based disclosure is more prevalent and serious than previously studied. Then, we strictly define Algorithm-SAfe Publishing (ASAP) to capture and eliminate threats from algorithm-based disclosure. To correct the problems of existing data publishing algorithms, we propose two generic tools to be integrated in their design: global look-ahead and local look-ahead. To enhance data utility, we propose another generic tool called stratified pick-up. We demonstrate the effectiveness of our tools by applying them to several popular ℓ-diversity algorithms: Mondrian, Hilb, and MASK. We conduct extensive experiments to demonstrate the effectiveness of our tools in terms of data utility and efficiency.
9.
The concept of anonymity comes into play in a wide range of situations, varying from voting and anonymous donations to postings on bulletin boards and sending emails. The protocols for ensuring anonymity often use random mechanisms which can be described probabilistically, while the agents’ behavior may be totally unpredictable, irregular, and hence expressible only nondeterministically. Formal definitions of the concept of anonymity have been investigated in the past either in a totally nondeterministic framework, or in a purely probabilistic one. In this paper, we investigate a notion of anonymity which combines both probability and nondeterminism, and which is suitable for describing the most general situation in which the protocol and the users can have both probabilistic and nondeterministic behavior. We also investigate the properties of the definition for the particular cases of purely nondeterministic users and purely probabilistic users. We formulate the notions of anonymity in terms of probabilistic automata, and we describe protocols and users as processes in the probabilistic π-calculus, whose semantics is again based on probabilistic automata. Throughout the paper, we illustrate our ideas by using the example of the dining cryptographers.
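The dining cryptographers example mentioned in the abstract can be simulated in a few lines. This is a sketch of the classic protocol only, not of the paper's probabilistic π-calculus model. The function name and its parameters are hypothetical. The coin flips are the probabilistic part of the protocol, and the choice of payer is the part the abstract treats as nondeterministic.

```python
import random


def dining_cryptographers(n, payer=None, rng=random):
    """Run one round of the protocol for n cryptographers seated in a ring.

    coins[i] is the secret coin shared by cryptographer i and neighbour
    (i + 1) % n. Each cryptographer announces the XOR of the two coins they
    can see; the payer (if any) flips their announcement.
    """
    coins = [rng.randint(0, 1) for _ in range(n)]
    announcements = []
    for i in range(n):
        bit = coins[i] ^ coins[(i - 1) % n]
        if i == payer:
            bit ^= 1
        announcements.append(bit)
    # Every coin appears in exactly two announcements, so the coins cancel
    # and the overall XOR is 1 exactly when one cryptographer paid.
    result = 0
    for bit in announcements:
        result ^= bit
    return announcements, result == 1


announcements, someone_paid = dining_cryptographers(3, payer=1, rng=random.Random(0))
_, nobody_paid = dining_cryptographers(3, payer=None, rng=random.Random(0))
```

The public announcements reveal whether a cryptographer paid. With fair coins, an outside observer cannot tell from them which participant it was.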
10.
David Antolino Rivas José M. Barceló-Ordinas Manel Guerrero Zapata Julián D. Morillo-Pozo 《Journal of Network and Computer Applications》2011,34(6):1942-1955
This article is a position paper on the current security issues in Vehicular Ad hoc Networks (VANETs). VANETs face many interesting research challenges in multiple areas, from privacy and anonymity to the detection and eviction of misbehaving nodes and many others in between. Multiple solutions have been proposed to address those issues. This paper surveys the most relevant ones while discussing their benefits and drawbacks. The paper explores the newest trends in privacy, anonymity, misbehaving nodes, the dissemination of false information and secure data aggregation, giving a perspective on how we foresee the future of this research area. First, the paper discusses the use of Public Key Infrastructure (PKI) (and certificate revocation), location privacy, anonymity and group signatures for VANETs. Then, it compares several proposals to identify and evict misbehaving and faulty nodes. Finally, the paper explores the differences between syntactic and semantic aggregation techniques, cluster and non-cluster based with fixed and dynamic based areas, while presenting secure as well as probabilistic aggregation schemes.
11.
An efficient mutual authentication and key agreement protocol preserving user anonymity in mobile networks (total citations: 1; self-citations: 0; citations by others: 1)
We address the problem of mutual authentication and key agreement with user anonymity for mobile networks. Recently, Lee et al. proposed such a scheme, which is claimed to be a slight modification of, but a security enhancement on Zhu et al.’s scheme based on the smart card. In this paper, however, we reveal that both schemes still suffer from certain weaknesses which have been previously overlooked, and thus are far from the desired security. We then propose a new protocol which is immune to various known types of attacks. Analysis shows that, while achieving identity anonymity, key agreement fairness, and user friendliness, our scheme is still cost-efficient for a general mobile node.
12.
In this paper, we propose a hybrid multi-group approach for privacy preserving data mining. We make two contributions in this
paper. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party
computation (SMC) approach. However, these two approaches have complementary features: the randomization approach is much
more efficient but less accurate, while the SMC approach is less efficient but more accurate. We propose a novel hybrid approach, which takes advantage of the strength of both approaches to balance the accuracy and efficiency constraints. Compared
to the two existing approaches, our proposed approach can achieve much better accuracy than randomization approach and much
reduced computation cost than SMC approach. We also propose a multi-group scheme that makes it flexible for the data miner
to control the balance between data mining accuracy and privacy. This scheme is motivated by the fact that existing randomization
schemes that randomize data at individual attribute level can produce insufficient accuracy when the number of dimensions
is high. We partition attributes into groups, and develop a scheme to conduct group-based randomization to achieve better
data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3
decision tree algorithm and association rule mining problem and we also present experimental results.
Wenliang Du
13.
In this paper, we propose a privacy-preserving method to determine the number of distinct users who connected to one or more entry points of a distributed Internet service with multiple service operators. The problem is motivated by the anonymization network Tor, and the difficulties that arise when aiming to estimate the number of Tor users. We present a way to perform distributed user counting with accurate estimates and a high level of privacy protection, based on a probabilistic data structure. We start from a relatively naive approach, and analyze the level of privacy protection that it provides. Subsequently, we improve on this baseline mechanism, building upon the gained insights. In order to assess the privacy properties of the discussed techniques, we use a novel probabilistic analysis approach which compares an attacker’s a priori and a posteriori knowledge.
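The abstract does not name its probabilistic data structure, so the sketch below uses a linear-counting bitmap purely as an assumed stand-in. The class name, the bitmap size, and the user identifiers are hypothetical. It shows why such structures suit distributed counting: bitmaps from different entry points merge with a bitwise OR, so a user seen at several entry points is counted once. It does not reproduce the paper's privacy mechanism, and hashing identifiers alone should not be read as privacy protection.

```python
import hashlib
import math


class LinearCounter:
    """Distinct-count estimator based on a fixed-size bitmap (linear counting)."""

    def __init__(self, num_bits=4096):
        self.num_bits = num_bits
        self.bits = [False] * num_bits

    def add(self, user_id):
        # Hash the identifier to one bit position and set that bit.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        self.bits[int.from_bytes(digest[:8], "big") % self.num_bits] = True

    def merge(self, other):
        # Union of two observations: bitwise OR of the bitmaps.
        merged = LinearCounter(self.num_bits)
        merged.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return merged

    def estimate(self):
        zeros = self.bits.count(False)
        if zeros == 0:
            return float("inf")  # bitmap saturated; estimate undefined
        return -self.num_bits * math.log(zeros / self.num_bits)


# Two hypothetical entry points whose user populations overlap.
entry_a, entry_b = LinearCounter(), LinearCounter()
for i in range(0, 600):
    entry_a.add(f"user-{i}")
for i in range(400, 1000):
    entry_b.add(f"user-{i}")

combined = entry_a.merge(entry_b)
estimated_users = combined.estimate()   # approximates the 1000 distinct users
```

The estimate is approximate by design. Its accuracy depends on the ratio of distinct users to bitmap size, which is the kind of trade-off an analysis of accuracy versus privacy would have to quantify.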
14.
Hidden attribute-based signatures without anonymity revocation (total citations: 1; self-citations: 0; citations by others: 1)
We propose a new notion called hidden attribute-based signature, which is inspired by the recent developments in attribute-based cryptosystems. With this technique, users are able to sign messages with any subset of their attributes issued from an attribute center. In this notion, a signature attests not to the identity of the individual who endorsed a message, but instead to a claim regarding the attributes the underlying signer possesses. Users cannot forge signatures with attributes which they have not been issued. Furthermore, the signer remains anonymous, without the fear of revocation, among all users with the attributes purported in the signature. After formalizing the security model, we propose two constructions of hidden attribute-based signature from pairings. The first construction supports a large universe of attributes and its security proof relies on the random oracle assumption, which can be removed in the second construction. Both constructions are proven to be secure under the standard computational Diffie-Hellman assumption.
15.
16.
This paper describes PayCash, an Internet payment system that was designed to offer strong security and privacy protection. This system is based on the concept of electronic cash, extended to support a flexible anonymity policy so as to accommodate privacy and security laws that differ from nation to nation. PayCash includes novel techniques to generate trustworthy records of all transactions, making it possible to detect many forms of fraud. This system also allows users to send a variable number of “electronic coins” in a single message, so both large and small amounts of money can be transferred efficiently.
17.
Numerical networks completely transform the security problems about files and communications. Nowadays, in data processing, normalised tools are being put to use. The study of new services emphasizes preoccupations that were formerly of secondary interest, such as discretion, identification, and signature. “Garantir” is the French word that covers all these concepts. The link with teleinformatics leads to the coinage of a new word: “the garantics”. The increasing importance of data processing brings dangers for private life and for individual and public freedom. The governmental preoccupations in this matter are expressed by new laws adapted to this new situation. Recent theoretical and technological developments in the field of public cryptology allow one to build the security software procedures and hardware devices necessary for the application of new legislative measures. But, taking into consideration the demands of national security, these recent developments also raise serious legal problems, for the “garantics” clearly favor the enciphering party over the party who tries to break the code. Yet, in the face of the increasing investigative means of big firms, both public and private, and of the ever-increasing automation of files, the “garantics” seem quite necessary to enforce individual freedom and protect private life.
18.
Privacy-preserving is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component to preserve privacy in security-related data mining applications, such as in data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics to measure the difference between the distorted dataset and the original dataset and the degree of the privacy protection. Our experimental results using synthetic and real world datasets show that the sparsified SVD method works well in preserving privacy as well as maintaining utility of the datasets.
Shuting Xu received her PhD in Computer Science from the University of Kentucky in 2005. Dr. Xu is presently an Assistant Professor in the Department of Computer Information Systems at the Virginia State University. Her research interests include data mining and information retrieval, database systems, parallel, and distributed computing.
Jun Zhang received a PhD from The George Washington University in 1997. He is an Associate Professor of Computer Science and Director of the Laboratory for High Performance Scientific Computing & Computer Simulation and Laboratory for Computational Medical Imaging & Data Analysis at the University of Kentucky. His research interests include computational neuroinformatics, data mining and information retrieval, large scale parallel and scientific computing, numerical simulation, iterative and preconditioning techniques for large scale matrix computation. Dr. Zhang is associate editor and on the editorial boards of four international journals in computer simulation and computational mathematics, and is on the program committees of a few international conferences. His research work has been funded by the U.S. National Science Foundation and the Department of Energy. He is recipient of the U.S. National Science Foundation CAREER Award and several other awards.
Dianwei Han received an M.E. degree from Beijing Institute of Technology, Beijing, China, in 1995. From 1995 to 1998, he worked in a Hitachi company (BHH) in Beijing, China. He received an MS degree from Lamar University, USA, in 2003. He is currently a PhD student in the Department of Computer Science, University of Kentucky, USA. His research interests include data mining and information retrieval, computational medical imaging analysis, and artificial intelligence.
Jie Wang received the master's degree in Industrial Automation from Beijing University of Chemical Technology in 1996. She is currently a PhD student and a member of the Laboratory for High Performance Computing and Computer Simulation in the Department of Computer Science at the University of Kentucky, USA. Her research interests include data mining and knowledge discovery, information filtering and retrieval, inter-organizational collaboration mechanism, and intelligent e-Technology.
19.
Anonymity technologies enable Internet users to maintain a level of privacy that prevents the collection of identifying information such as the IP address. Understanding the deployment of anonymity technologies on the Internet is important to analyze the current and future trends. In this paper, we provide a tutorial survey and a measurement study to understand the anonymity technology usage on the Internet from multiple perspectives and platforms. First, we review currently utilized anonymity technologies and assess their usage levels. For this, we cover deployed contemporary anonymity technologies including proxy servers, remailers, JAP, I2P, and Tor with the geo-location of deployed servers. Among these systems, proxy servers, Tor and I2P are actively used, while remailers and JAP have minimal usage. Then, we analyze application-level protocol usage and anonymity technology usage with different applications. For this, we perform a measurement study by collecting data from a Tor exit node, a P2P client, a large campus network, a departmental email server, and publicly available data on spam sources to assess the utilization of anonymizer technologies from various perspectives. Our results confirm previous findings regarding application usage and server geo-location distribution where certain countries utilize anonymity networks significantly more than others. Moreover, our application analysis reveals that Tor and proxy servers are used more than other anonymity techniques.
20.
Privacy preserving clustering on horizontally partitioned data (total citations: 3; self-citations: 0; citations by others: 3)
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites. We show communication and computation complexity of our protocol by conducting experiments over synthetically generated and real datasets. Each experiment is also performed for a baseline protocol with no privacy protection, so that comparing the baseline with our protocol shows the overhead incurred by security and privacy.