首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Tsai  Yu-Chuan  Wang  Shyue-Liang  Ting  I-Hsien  Hong  Tzung-Pei 《World Wide Web》2020,23(4):2391-2406

In recent years, privacy breaches have been a great concern on the published data. Only removing one’s personal identification information is not sufficient to protect individual’s privacy. Privacy preservation technology for published data is devoted to preventing re-identification and retaining the useful information in published data. In this work, we propose a novel algorithm to deal with sensitive and quasi-identifier items, respectively, in transactional data. The proposed algorithm maintains at least the same or a stronger privacy level for transactional data with 1/k. In numerical experiments, our proposed algorithm shows better running time and better data utility.

  相似文献   

2.
Transaction data are increasingly used in applications, such as marketing research and biomedical studies. Publishing these data, however, may risk privacy breaches, as they often contain personal information about individuals. Approaches to anonymizing transaction data have been proposed recently, but they may produce excessively distorted and inadequately protected solutions. This is because these approaches do not consider privacy requirements that are common in real-world applications in a realistic and flexible manner, and attempt to safeguard the data only against either identity disclosure or sensitive information inference. In this paper, we propose a new approach that overcomes these limitations. We introduce a rule-based privacy model that allows data publishers to express fine-grained protection requirements for both identity and sensitive information disclosure. Based on this model, we also develop two anonymization algorithms. Our first algorithm works in a top-down fashion, employing an efficient strategy to recursively generalize data with low information loss. Our second algorithm uses sampling and a combination of top-down and bottom-up generalization heuristics, which greatly improves scalability while maintaining low information loss. Extensive experiments show that our algorithms significantly outperform the state-of-the-art in terms of retaining data utility, while achieving good protection and scalability.  相似文献   

3.
Transactional data collection and sharing currently face the challenge of how to prevent information leakage and protect data from privacy breaches while maintaining high-quality data utilities. Data anonymization methods such as perturbation, generalization, and suppression have been proposed for privacy protection. However, many of these methods incur excessive information loss and cannot satisfy multipurpose utility requirements. In this paper, we propose a multidimensional generalization method to provide multipurpose optimization when anonymizing transactional data in order to offer better data utility for different applications. Our methodology uses bipartite graphs with generalizing attribute, grouping item and perturbing outlier. Experiments on real-life datasets are performed and show that our solution considerably improves data utility compared to existing algorithms.  相似文献   

4.
Multirelational k-Anonymity   总被引:1,自引:0,他引:1  
k-Anonymity protects privacy by ensuring that data cannot be linked to a single individual. In a k-anonymous data set, any identifying information occurs in at least k tuples. Much research has been done to modify a single-table data set to satisfy anonymity constraints. This paper extends the definitions of k-anonymity to multiple relations and shows that previously proposed methodologies either fail to protect privacy or overly reduce the utility of the data in a multiple relation setting. We also propose two new clustering algorithms to achieve multirelational anonymity. Experiments show the effectiveness of the approach in terms of utility and efficiency.  相似文献   

5.
This paper tackles a privacy breach in current location-based services (LBS) where mobile users have to report their exact location information to an LBS provider in order to obtain their desired services. For example, a user who wants to issue a query asking about her nearest gas station has to report her exact location to an LBS provider. However, many recent research efforts have indicated that revealing private location information to potentially untrusted LBS providers may lead to major privacy breaches. To preserve user location privacy, spatial cloaking is the most commonly used privacy-enhancing technique in LBS. The basic idea of the spatial cloaking technique is to blur a user’s exact location into a cloaked area that satisfies the user specified privacy requirements. Unfortunately, existing spatial cloaking algorithms designed for LBS rely on fixed communication infrastructure, e.g., base stations, and centralized/distributed servers. Thus, these algorithms cannot be applied to a mobile peer-to-peer (P2P) environment where mobile users can only communicate with other peers through P2P multi-hop routing without any support of fixed communication infrastructure or servers. In this paper, we propose a spatial cloaking algorithm for mobile P2P environments. As mobile P2P environments have many unique limitations, e.g., user mobility, limited transmission range, multi-hop communication, scarce communication resources, and network partitions, we propose three key features to enhance our algorithm: (1) An information sharing scheme enables mobile users to share their gathered peer location information to reduce communication overhead; (2) A historical location scheme allows mobile users to utilize stale peer location information to overcome the network partition problem; and (3) A cloaked area adjustment scheme guarantees that our spatial cloaking algorithm is free from a “center-of-cloaked-area” privacy attack. Experimental results show that our P2P spatial cloaking algorithm is scalable while guaranteeing the user’s location privacy protection.  相似文献   

6.
Transaction data record various information about individuals, including their purchases and diagnoses, and are increasingly published to support large-scale and low-cost studies in domains such as marketing and medicine. However, the dissemination of transaction data may lead to privacy breaches, as it allows an attacker to link an individual's record to their identity. Approaches that anonymize data by eliminating certain values in an individual's record or by replacing them with more general values have been proposed recently, but they often produce data of limited usefulness. This is because these approaches adopt value transformation strategies that do not guarantee data utility in intended applications and objective measures that may lead to excessive data distortion. In this paper, we propose a novel approach for anonymizing data in a way that satisfies data publishers' utility requirements and incurs low information loss. To achieve this, we introduce an accurate information loss measure and an effective anonymization algorithm that explores a large part of the problem space. An extensive experimental study, using click-stream and medical data, demonstrates that our approach permits many times more accurate query answering than the state-of-the-art methods, while it is comparable to them in terms of efficiency.  相似文献   

7.
Privacy preserving association rule mining has been an active research area since recently. To this problem, there have been two different approaches—perturbation based and secure multiparty computation based. One drawback of the perturbation based approach is that it cannot always fully preserve individual’s privacy while achieving precision of mining results. The secure multiparty computation based approach works only for distributed environment and needs sophisticated protocols, which constrains its practical usage. In this paper, we propose a new approach for preserving privacy in association rule mining. The main idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve privacy while maintaining the precision of mining results. The tradeoff between mining precision and storage requirement is investigated. We also propose δ-folding technique to further reduce the storage requirement without sacrificing mining precision and running time.  相似文献   

8.
In this paper we study the problem of providing controlled access to confidential data stored in semistructured databases. More specifically, we focus on privacy violations via data inferences that occur when domain knowledge is combined with non-private data. We propose a formal model, called Privacy Information Flow Model, to represent the information flow and the privacy requirements. These privacy requirements are enforced by the Privacy Mediator. Privacy Mediator guarantees that users are not be able to logically entail information that violates the privacy requirements. We present an inference algorithm that is sound and complete. The inference algorithm is developed for a tree-like, semistructured data model, selection-projection queries, and domain knowledge, represented as Horn-clause constraints.  相似文献   

9.
Online privacy policies describe organizations’ privacy practices for collecting, storing, using, and protecting consumers’ personal information. Users need to understand these policies in order to know how their personal information is being collected, stored, used, and protected. Organizations need to ensure that the commitments they express in their privacy policies reflect their actual business practices, especially in the United States where the Federal Trade Commission regulates fair business practices. Requirements engineers need to understand the privacy policies to know the privacy practices with which the software must comply and to ensure that the commitments expressed in these privacy policies are incorporated into the software requirements. In this paper, we present a methodology for obtaining requirements from privacy policies based on our theory of commitments, privileges, and rights, which was developed through a grounded theory approach. This methodology was developed from a case study in which we derived software requirements from seventeen healthcare privacy policies. We found that legal-based approaches do not provide sufficient coverage of privacy requirements because privacy policies focus primarily on procedural practices rather than legal practices.  相似文献   

10.
The participatory sensing paradigm, through the growing availability of cheap sensors in mobile devices, enables applications of great social and business interest, e.g., electrosmog exposure measurement and early earthquake detection. However, users’ privacy concerns regarding their activity traces need to be adequately addressed as well. The existing static privacy-enabling approaches, which hide or obfuscate data, offer some protection at the expense of data value. These approaches do not offer privacy guarantees and heterogeneous user privacy requirements cannot be met by them. In this paper, we propose a user-side privacy-protection scheme; it adaptively adjusts its parameters, in order to meet personalized location-privacy protection requirements against adversaries in a measurable manner. As proved by simulation experiments with artificial- and real-data traces, when feasible, our approach not only always satisfies personal location-privacy concerns, but also maximizes data utility (in terms of error, data availability, area coverage), as compared to static privacy-protection schemes.  相似文献   

11.
The publication of microdata is pivotal for medical research purposes, data analysis and data mining. These published data contain a substantial amount of sensitive information, for example, a hospital may publish many sensitive attributes such as diseases, treatments and symptoms. The release of multiple sensitive attributes is not desirable because it puts the privacy of individuals at risk. The main vulnerability of such approach while releasing data is that if an adversary is successful in identifying a single sensitive attribute, then other sensitive attributes can be identified by co-relation. A whole variety of techniques such as SLOMS, SLAMSA and others already exist for the anonymization of multiple sensitive attributes; however, these techniques have their drawbacks when it comes to preserving privacy and ensuring data utility. The extant framework lacks in terms of preserving privacy for multiple sensitive attributes and ensuring data utility. We propose an efficient approach (p, k)-Angelization for the anonymization of multiple sensitive attributes. Our proposed approach protects the privacy of the individuals and yields promising results compared with currently used techniques in terms of utility. The (p, k)-Angelization approach not only preserves the privacy by eliminating the threat of background join and non-membership attacks but also reduces the information loss thus improving the utility of the released information.  相似文献   

12.
The security of computers and their networks is of crucial concern in the world today. One mechanism to safeguard information stored in database systems is an Intrusion Detection System (IDS). The purpose of intrusion detection in database systems is to detect malicious transactions that corrupt data. Recently researchers are working on using data mining techniques for detecting such malicious transactions in database systems. Their approach concentrates on mining data dependencies among data items. However, the transactions not compliant with these data dependencies are identified as malicious transactions. Algorithms that these approaches use for designing their data dependency miner have limitations. For instance, they need to experimentally determine appropriate settings for minimum support and related constraints, which does not necessarily lead to strong data dependencies. In this paper we propose a new data mining algorithm, called the Optimal Data Access Dependency Rule Mining (ODADRM), for designing a data dependency miner for our database IDS. ODADRM is an extension of k-optimal rule discovery algorithm, which has been improved to be suitable in database intrusion detection domain. ODADRM avoids many limitations of previous data dependency miner algorithms. As a result, our approach is able to track normal transactions and detect malicious ones more effectively than existing approaches.  相似文献   

13.
Real-time databases are poised to be an important component of complex embedded real-time systems. In real-time databases (as opposed to real-time systems), transactions must satisfy the ACID properties in addition to satisfying the timing constraints specified for each transaction (or task). Although several approaches have been proposed to combine real-time scheduling and database concurrency control methods, to the best of our knowledge, none of them provide a framework for taking into account the dynamic cost associated with aborts, rollbacks, and restarts of transactions. In this paper, we propose a framework in which both static and dynamic costs of transactions can be taken into account. Specifically, we present: i) a method for pre-analyzing transactions based on the notion of branch-points for data accessed up to a branch point and predicting expected data access to be incurred for completing the transaction, ii) a formulation of cost that includes static and dynamic factors for prioritizing transactions, iii) a scheduling algorithm which uses the above two, and iv) simulation of the algorithm for several operating conditions and workload. Our dynamic priority assignment policy (termed the cost conscious approach or CCA) adapts well to fluctuations in the system load without causing excessive numbers of transaction restarts. Our simulations indicate that i) CCA performs better than the EDF-HP algorithm for both soft and firm deadlines, ii) CCA is more fair than EDF-HP, iii) CCA is better than EDF-CR for soft deadline, even though CCA requires and uses less information, and iv) CCA is especially good for disk-resident data.  相似文献   

14.
Many recent applications depend on time series of data containing personal information. For example, the smart grid collects and distributes time series of energy-consumption data from households. Our concern is information hiding in such data according to individual privacy constraints, considering several constraints at a time. The existing information-hiding approaches we are aware of make limiting assumptions regarding the nature of such constraints. Our approach in turn lets the individuals concerned specify information that must be hidden arbitrarily, and it also lets the data receivers specify characteristics of the data needed to perform a certain task. We use these constraints to formulate an optimization problem that generates perturbed time series that fulfill the constraints of the data receivers and do not contain more sensitive information than allowed. Next, we propose a complexity-reduction approach that speeds up solving this optimization problem for time series by orders of magnitude. Three case studies on real-world data confirm that our approach is applicable to a wide range of application domains, and that it provides more protection against well-known privacy attacks such as re-identification, reconstruction and disaggregation. In addition, we provide a Java implementation of our approach and supplementary material on our web page.1  相似文献   

15.
K-anonymisation is an approach to protecting individuals from being identified from data.Good k-anonymisations should retain data utility and preserve privacy,but few methods have considered these two conflicting requirements together. In this paper,we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness.We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters.Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.  相似文献   

16.
With the ubiquity of handheld devices (such as smart phones and PDAs) and the availability of a wide range of mobile services (such as mobile banking, road traffic updates, and weather forecast), people can nowadays access information and conduct online transactions virtually anywhere and anytime. In such flexible, dynamic but less reliable environment, transaction management technology is believed to provide service reliability and data consistency. Indeed, in mobile and ubiquitous environments where devices as well as services can seamlessly join and leave the ubiquitous network; transaction management can be very helpful during the recovery of services from failure. Current transaction models and commit protocols do not take into account context information. However, in mobile environments, it is imperative to consider context information in the commit of a transaction—i.e., a transaction can be successfully completed if it meets the required context. In this paper, we propose a new model for context-aware transactions and their performance management in mobile environments. Unlike conventional transactions, context-aware transactions adapt to the required context. By context, we mean the service’s context as well as the users’ context that includes users’ needs and preferences. This paper designs and develops the proposed transaction model and evaluates its performance in terms of time and message complexities as well as transaction’s throughput.  相似文献   

17.
Anonymization is a practical approach to protect privacy in data. The major objective of privacy preserving data publishing is to protect private information in data whereas data is still useful for some intended applications, such as building classification models. In this paper, we argue that data generalization in anonymization should be determined by the classification capability of data rather than the privacy requirement. We make use of mutual information for measuring classification capability for generalization, and propose two k-anonymity algorithms to produce anonymized tables for building accurate classification models. The algorithms generalize attributes to maximize the classification capability, and then suppress values by a privacy requirement k (IACk) or distributional constraints (IACc). Experimental results show that algorithm IACk supports more accurate classification models and is faster than a benchmark utility-aware data anonymization algorithm.  相似文献   

18.
Web query logs provide a rich wealth of information, but also present serious privacy risks. We preserve privacy in publishing vocabularies extracted from a web query log by introducing vocabulary k-anonymity, which prevents the privacy attack of re-identification that reveals the real identities of vocabularies. A vocabulary is a bag of query-terms extracted from queries issued by a user at a specified granularity. Such bag-valued data are extremely sparse, which makes it hard to retain enough utility in enforcing k-anonymity. To the best of our knowledge, the prior works do not solve such a problem, among which some achieve a different privacy principle, for example, differential privacy, some deal with a different type of data, for example, set-valued data or relational data, and some consider a different publication scenario, for example, publishing frequent keywords. To retain enough data utility, a semantic similarity-based clustering approach is proposed, which measures the semantic similarity between a pair of terms by the minimum path distance over a semantic network of terms such as WordNet, computes the semantic similarity between two vocabularies by a weighted bipartite matching, and publishes the typical vocabulary for each cluster of semantically similar vocabularies. Extensive experiments on the AOL query log show that our approach can retain enough data utility in terms of loss metrics and in frequent pattern mining.  相似文献   

19.
Data mining technology helps extract usable knowledge from large data sets. The process of data collection and data dissemination may, however, result in an inherent risk of privacy threats. Some sensitive or private information about individuals, businesses and organizations needs to be suppressed before it is shared or published. The privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF for modifying original databases in order to hide sensitive itemsets. It is a greedy approach based on the concept borrowed from the Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. The above concept is used to evaluate the similarity degrees between the items in transactions and the desired sensitive itemsets and then selects appropriate items in some transactions to hide. The proposed algorithm can easily make good trade-offs between privacy preserving and execution time. Experimental results also show the performance of the proposed approach.  相似文献   

20.
Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (e.g., user behavior records, electronic health records), directly releasing the discovered frequent patterns with support counts will carry significant risk to the privacy of individuals. In this paper, we study the problem of how to accurately find the top-k frequent patterns with noisy support counts on transactional databases while satisfying differential privacy. We propose an algorithm, called differentially private frequent pattern (DFP-Growth), that integrates a Laplace mechanism and an exponential mechanism to avoid privacy leakage. We theoretically prove that the proposed method is (λ, δ)-useful and differentially private. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct constrained inference in the post-processing step. Extensive experiments, using several real datasets, confirm that our algorithm generates highly accurate noisy support counts and top-k frequent patterns.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号