20 similar documents found; search took 46 ms.
1.
Transaction data record various information about individuals, including their purchases and diagnoses, and are increasingly published to support large-scale and low-cost studies in domains such as marketing and medicine. However, the dissemination of transaction data may lead to privacy breaches, as it allows an attacker to link an individual's record to their identity. Approaches that anonymize data by eliminating certain values in an individual's record or by replacing them with more general values have been proposed recently, but they often produce data of limited usefulness. This is because these approaches adopt value transformation strategies that do not guarantee data utility in intended applications and objective measures that may lead to excessive data distortion. In this paper, we propose a novel approach for anonymizing data in a way that satisfies data publishers' utility requirements and incurs low information loss. To achieve this, we introduce an accurate information loss measure and an effective anonymization algorithm that explores a large part of the problem space. An extensive experimental study, using click-stream and medical data, demonstrates that our approach permits many times more accurate query answering than the state-of-the-art methods, while it is comparable to them in terms of efficiency.
2.
So far, data anonymization approaches based on k-anonymity and l-diversity have contributed much to privacy protection against record and attribute linkage attacks. However, the existing solutions are not efficient when applied to multimedia Big Data anonymization. This paper analyzes this problem in detail in terms of processing time, memory space, and usability, and presents two schemes to overcome such inefficiency. The first is to reduce processing time and space by minimizing the temporary buffer usage during the anonymization process. The second is to construct an early taxonomy during database design. The idea behind this approach is that database designers should take preliminary actions for anonymization at the early stages of database design to alleviate the burden placed on data publishers. To evaluate the effectiveness and feasibility of these schemes, application tools based on the proposed approaches were implemented and experiments were conducted.
3.
4.
Jiuyong Li, Jixue Liu, Muzammil Baig, Raymond Chi-Wing Wong. 《Data & Knowledge Engineering》, 2011, 70(12): 1030-1045
Anonymization is a practical approach to protecting privacy in data. The major objective of privacy-preserving data publishing is to protect private information while keeping the data useful for intended applications, such as building classification models. In this paper, we argue that data generalization in anonymization should be determined by the classification capability of the data rather than by the privacy requirement. We use mutual information to measure the classification capability of a generalization, and propose two k-anonymity algorithms to produce anonymized tables for building accurate classification models. The algorithms generalize attributes to maximize classification capability and then suppress values according to a privacy requirement k (IACk) or distributional constraints (IACc). Experimental results show that algorithm IACk supports more accurate classification models and is faster than a benchmark utility-aware data anonymization algorithm.
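The measurement idea behind these algorithms can be illustrated with a minimal sketch: estimate the empirical mutual information between an attribute, at a candidate generalization level, and the class label, and prefer generalizations that keep this value high. The sketch below is not the IACk or IACc algorithm itself; the attribute values, labels, and generalization levels are made up for illustration, and the subsequent suppression step driven by k is omitted.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two discrete columns."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical attribute at two generalization levels, plus a class label.
age_exact  = [23, 25, 27, 34, 36, 45, 47, 52]
age_decade = ["20s", "20s", "20s", "30s", "30s", "40s", "40s", "50s"]
label      = ["no", "no", "yes", "yes", "yes", "yes", "yes", "yes"]

print(mutual_information(age_exact, label))   # ~0.81 bits: exact ages fully determine the label here
print(mutual_information(age_decade, label))  # ~0.47 bits: the decade-level generalization keeps less
```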
5.
A framework for condensation-based anonymization of string data
In recent years, privacy-preserving data mining has become an important problem because of the large amount of personal data tracked by many business applications. An important method for privacy-preserving data mining is condensation. This method is often used for multi-dimensional data, where pseudo-data is generated to mask the true values of the records. However, these methods are not easily applicable to string data, since they require multi-dimensional statistics in order to generate the pseudo-data. String data are especially important in the privacy-preserving data-mining domain because most DNA and biological data are coded as strings. In this article, we discuss a new method for privacy-preserving mining of string data with the use of simple template-based models. The template-based model turns out to be effective in practice, and preserves important statistical characteristics of the strings such as intra-record distances. We explore its behavior in the context of a classification application, and show that the accuracy of the application is not affected significantly by the anonymization process. This article is an extended version of the paper presented at the SIAM Conference on Data Mining, 2007 (Aggarwal and Yu 2007).
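The utility criterion mentioned above, preservation of intra-record distances between strings, can be made concrete with a short sketch. The code below is not the authors' template-based model; it only computes the statistic that such an anonymizer would aim to preserve, on a made-up record of short sequences. Comparing this list before and after anonymization gives a simple check of how much intra-record structure survives.

```python
def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution / match
        prev = curr
    return prev[-1]

def intra_record_distances(record):
    """Pairwise distances among the strings of one record -- the statistic a
    condensation-style anonymizer would aim to preserve."""
    return [edit_distance(s, t)
            for i, s in enumerate(record)
            for t in record[i + 1:]]

# Hypothetical record of short sequences (stand-ins for DNA fragments).
record = ["ACGTAC", "ACGTTC", "AGGTAC"]
print(intra_record_distances(record))   # [1, 1, 2]
```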
6.
International Journal of Information Security - The introduction of advanced metering infrastructure (AMI) smart meters has given rise to fine-grained electricity usage data at different levels of...
7.
The solution of the factor analysis problem is discussed. A method of factor analysis that can process data in the form of a transaction base is developed. It involves extracting rules from the given transaction bases, which results in data generalization and, therefore, exclusion of the extracted features, allowing one to reduce the search space and the factor analysis execution time. The computational complexity of the method is analyzed. An experimental investigation on practical and test problems is carried out.
8.
9.
Robert Bredereck, André Nichterlein, Rolf Niedermeier, Geevarghese Philip. 《Data Mining and Knowledge Discovery》, 2014, 28(1): 65-91
A matrix M is said to be k-anonymous if for each row r in M there are at least k - 1 other rows in M which are identical to r. The NP-hard k-Anonymity problem asks, given an n × m matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new "data-driven" parameterizations for k-Anonymity: the number t_in of different input rows and the number t_out of different output rows, both modeling aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t_in, and that it is NP-hard even for t_out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t_in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ℓ-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and to the use of domain generalization hierarchies, where entries are replaced by more general data instead of being completely suppressed.
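A hedged sketch of the underlying definitions (not of the fixed-parameter algorithms in the paper): checking whether a suppressed matrix is k-anonymous, counting the suppressed entries bounded by s, and computing t_in and t_out as the numbers of distinct input and output rows. The toy matrix is invented for illustration.

```python
from collections import Counter

def is_k_anonymous(matrix, k):
    """A matrix is k-anonymous if every row occurs at least k times
    (equivalently, each row has at least k - 1 identical companions)."""
    counts = Counter(tuple(row) for row in matrix)
    return all(c >= k for c in counts.values())

def suppressed_entries(original, output, blank="*"):
    """Number of entries blanked out, i.e. the quantity bounded by s."""
    return sum(1 for ro, rs in zip(original, output)
                 for a, b in zip(ro, rs) if a != b and b == blank)

original = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
output   = [["a", "*"], ["a", "*"], ["b", "*"], ["b", "*"]]

print(is_k_anonymous(original, 2))            # False: every input row is unique
print(is_k_anonymous(output, 2))              # True after suppression
print(suppressed_entries(original, output))   # 4 entries were blanked out
t_in  = len({tuple(r) for r in original})     # 4 distinct input rows
t_out = len({tuple(r) for r in output})       # 2 distinct output rows
print(t_in, t_out)
```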
10.
The concept of cloud computing has emerged as the next generation of computing infrastructure to reduce the costs associated with the management of hardware and software resources. It is vital to its success that cloud computing offer efficient, flexible and secure services. In this paper, we propose an efficient and anonymous data sharing protocol with a flexible sharing style, named EFADS, for outsourcing data onto the cloud. Through formal security analysis, we demonstrate that EFADS provides data confidentiality and data sharer anonymity without requiring any fully trusted party. Experimental results show that EFADS is more efficient than existing competing approaches. Furthermore, the proxy re-encryption scheme we propose in this paper may be of independent interest, i.e., compared to previously reported proxy re-encryption schemes, the proposed scheme is the first pairing-free, anonymous and unidirectional proxy re-encryption scheme in the standard model.
11.
François Rousseau, Jordi Casas-Roma, Michalis Vazirgiannis. 《Knowledge and Information Systems》, 2018, 54(2): 315-343
In this paper, we propose a novel edge modification technique that better preserves the communities of a graph while anonymizing it. By maintaining the core number sequence of a graph, its coreness, we retain most of the information contained in the network while allowing changes in the degree sequence, i.e., obfuscating the visible data an attacker has access to. We reach a better trade-off between data privacy and data utility than existing methods by capitalizing on the slack between apparent degree (node degree) and true degree (node core number). Our extensive experiments on six diverse standard network datasets support this claim. Our framework compares our method to others that are used as proxies for privacy protection in the relevant literature. We demonstrate that our method leads to higher data utility preservation, especially in clustering, for the same levels of randomization and k-anonymity.
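The slack between node degree and core number that the abstract exploits can be seen in a few lines. The sketch below uses networkx (an assumed dependency, not the authors' code) on an invented toy graph: adding one edge changes the degree sequence while every node's core number, and hence the coreness, stays the same. An edge modification chosen this way perturbs what an attacker observes (degrees) while keeping the core decomposition intact.

```python
import networkx as nx

# Toy graph: a 4-cycle (nodes 0-3) with one pendant node 4 attached to node 0.
G = nx.cycle_graph(4)
G.add_edge(0, 4)

degrees_before = sorted(d for _, d in G.degree())
cores_before = nx.core_number(G)

# Candidate edge modification: add a chord inside the existing 2-core.
H = G.copy()
H.add_edge(0, 2)

degrees_after = sorted(d for _, d in H.degree())
cores_after = nx.core_number(H)

print(degrees_before, "->", degrees_after)  # [1, 2, 2, 2, 3] -> [1, 2, 2, 3, 4]
print(cores_before == cores_after)          # True: the core number sequence is preserved
```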
12.
In recent years, online social networks have become a part of everyday life for millions of individuals. Data analysts have also found in them a fertile field for studying user behavior at individual and collective levels, for academic and commercial reasons. On the other hand, there are many risks for user privacy, as information a user may wish to keep private becomes evident upon analysis. However, when data is anonymized to make it safe for publication in the public domain, information is inevitably lost with respect to the original version; a significant aspect of social networks is the local neighborhood of a user and its associated data. Current anonymization techniques are good at identifying and minimizing risks, but not so good at maintaining the local contextual data that relate users in a social network, so improving this aspect will have a high impact on the data utility of anonymized social networks. There is also a lack of systems that facilitate the work of a data analyst in anonymizing this type of data structure and in performing empirical experiments in a controlled manner on different datasets. Hence, in the present work we address these issues by designing and implementing a sophisticated synthetic data generator together with an anonymization processor that provides strict privacy guarantees and takes the local neighborhood into account when anonymizing. All this is done for a complex dataset that can be fitted to a real dataset in terms of data profiles and distributions. In the empirical section we perform experiments to demonstrate the scalability of the method and the reduction in information loss with respect to approaches that do not consider the local neighborhood context when anonymizing.
13.
14.
We propose a method for efficiently processing real-time read-only transactions in a mobile broadcast environment, and present several multi-version broadcast disk organizations. A multi-version mechanism is adopted so that mobile read-only transactions can commit without blocking, and an optimistic approach eliminates conflicts between mobile read-only transactions and mobile update transactions. A multi-version dynamic adjustment of the serialization order avoids unnecessary transaction restarts. If a mobile read-only transaction passes backward validation on the mobile host, it can commit locally without being submitted to the server for processing, which reduces the response time of mobile read-only transactions. The performance of the proposed method was evaluated through simulation, and the experimental results show that the new method outperforms other protocols.
15.
Pietro Parodi, Roberto Fontana. 《International Journal on Document Analysis and Recognition》, 1999, 2(2-3): 67-79
This paper describes a novel method for extracting text from document pages of mixed content. The method works by detecting pieces of text lines in small overlapping columns of fixed width, shifted with respect to each other by a fixed number of image elements, and by merging these pieces in a bottom-up fashion to form complete text lines and blocks of text lines. The algorithm requires about 1.3 s for a 300 dpi image on a PC with a 300 MHz Pentium II CPU and an Intel 440LX motherboard. The algorithm is largely independent of the layout of the document, the shape of the text regions, and the font size and style. The main assumptions are that the background be uniform and that the text sit approximately horizontally. For a skew of up to about 10 degrees no skew correction mechanism is necessary. The algorithm has been tested on the UW English Document Database I of the University of Washington and its performance has been evaluated by a suitable measure of segmentation accuracy. A detailed analysis of the segmentation accuracy achieved by the algorithm as a function of noise and skew has also been carried out.
Received April 4, 1999 / Revised June 1, 1999
16.
A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud
《Journal of Computer and System Sciences》, 2014, 80(5): 1008-1020
In big data applications, data privacy is one of the most pressing concerns because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets for privacy preservation. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to fulfill sub-tree anonymization. However, existing approaches for sub-tree anonymization fall short of parallelization capability, thereby lacking scalability in handling big data in the cloud. Moreover, TDS and BUG each suffer from poor performance for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to gain high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
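As a rough illustration of the TDS component only (sequential, no MapReduce, with an invented one-attribute taxonomy, so not the paper's parallel algorithm), a specialization step replaces a general value by its children and is accepted only if every resulting equivalence class still holds at least k records:

```python
from collections import Counter

# Hypothetical single-attribute taxonomy tree used for sub-tree generalization.
TAXONOMY = {
    "Any-job":       ["Technical", "Non-technical"],
    "Technical":     ["Engineer", "Programmer"],
    "Non-technical": ["Clerk", "Sales"],
}

def child_covering(leaf, parent):
    """Return the child of `parent` whose sub-tree contains `leaf`."""
    for child in TAXONOMY.get(parent, []):
        frontier, subtree = [child], {child}
        while frontier:
            node = frontier.pop()
            kids = TAXONOMY.get(node, [])
            subtree.update(kids)
            frontier.extend(kids)
        if leaf in subtree:
            return child
    return None

def tds_step(leaves, current, parent, k):
    """One Top-Down Specialization step: replace `parent` by its children,
    accepted only if every resulting equivalence class keeps >= k records."""
    proposal = [child_covering(leaf, parent) if value == parent else value
                for leaf, value in zip(leaves, current)]
    return proposal if min(Counter(proposal).values()) >= k else current

leaves  = ["Engineer", "Programmer", "Clerk", "Sales", "Engineer", "Clerk"]
current = ["Any-job"] * len(leaves)                 # fully generalized starting point

step1 = tds_step(leaves, current, "Any-job", k=3)   # accepted: classes of size 3 and 3
step2 = tds_step(leaves, step1, "Technical", k=3)   # rejected: would create a class of size 1
print(step1)
print(step2)
```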
17.
Segmenting customers by transaction data with concept hierarchy
The segmentation of customers is crucial for an organization wishing to develop appropriate promotion strategies for different clusters. Clustering customers provides an in-depth understanding of their behavior. However, previous studies have paid little attention to the similarity of different items in transactions. Lacking item categories and concept levels, item-based segmentation methods do not perform as well as expected. By employing a concept hierarchy of items, this study proposes a segmentation methodology to identify similarities between customers. First, the dissimilarity between transaction sequences is defined. Second, we adopt a hierarchical clustering method to segment customers by their transaction data together with the concept hierarchy of consumed items. After segmentation, three cluster validation indices are used to optimize the number of customer clusters. Based on a comparison of normalized indices, the segmentation method proposed in this study renders better results than other traditional methods.
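Once the concept-hierarchy-based dissimilarities between customers have been computed, the clustering step can be sketched with standard tooling. The snippet below assumes SciPy and a made-up dissimilarity matrix; the dissimilarity definition and the cluster-validation indices used to pick the number of clusters are the paper's contribution and are not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Made-up dissimilarities between five customers, standing in for the
# concept-hierarchy-based comparison of their transaction sequences.
D = np.array([
    [0.00, 0.20, 0.90, 0.80, 0.85],
    [0.20, 0.00, 0.85, 0.90, 0.80],
    [0.90, 0.85, 0.00, 0.10, 0.15],
    [0.80, 0.90, 0.10, 0.00, 0.20],
    [0.85, 0.80, 0.15, 0.20, 0.00],
])

Z = linkage(squareform(D), method="average")      # agglomerative clustering on the dissimilarities
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into two segments
print(labels)                                     # e.g. [1 1 2 2 2]: customers {0,1} vs. {2,3,4}
```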
18.
AdaBoost is a famous ensemble learning method and has achieved successful applications in many fields. Existing studies illustrate that AdaBoost easily suffers...
19.
A new efficient checkpointing strategy is proposed for a transaction-oriented database system that operates in a stable on-line environment. Detailed experimental results are reported to evaluate the performance of the proposed scheme in terms of different system parameters.
20.
Let us consider the following situation: t entities (e.g., hospitals) hold different databases containing different records for the same type of confidential (e.g., medical) data. They want to deliver a protected version of this data to third parties (e.g., pharmaceutical researchers), preserving in some way both the utility and the privacy of the original data. This can be done by applying a statistical disclosure control (SDC) method. One possibility is that each entity protects its own database individually, but this strategy provides less utility and privacy than a collective strategy where the entities cooperate, by means of a distributed protocol, to produce a global protected dataset. In this paper, we investigate the problem of distributed protocols for SDC protection methods. We propose a simple, efficient and secure distributed protocol for the specific SDC method of rank shuffling. We run experiments to evaluate the quality of this protocol and to compare the individual and collective strategies for protecting a distributed database. With respect to other distributed versions of SDC methods, the new protocol provides either more security or more efficiency, as we discuss throughout the paper.
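For context, the sketch below shows a plain, centralized rank-swapping pass over a single numeric column, a simple rank-based masking in the spirit of the rank shuffling discussed above; the secure distributed protocol itself is the paper's contribution and is not reproduced here. The column of incomes and the window parameter p are invented.

```python
import random

def rank_swap(values, p, rng=None):
    """Centralized rank swapping of one numeric column: values are exchanged
    in pairs whose ranks differ by at most p positions. A simplified stand-in
    for rank-based shuffling, not the paper's distributed protocol."""
    rng = rng or random.Random(0)
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # rank -> original row index
    ranked = [values[i] for i in order]                # values sorted by rank
    perm = list(range(n))                              # rank -> rank it takes its value from
    done = [False] * n
    for r in range(n):
        if done[r]:
            continue
        partners = [s for s in range(r + 1, min(n, r + p + 1)) if not done[s]]
        if partners:
            s = rng.choice(partners)
            perm[r], perm[s] = s, r
            done[r] = done[s] = True
    out = [None] * n
    for r, idx in enumerate(order):
        out[idx] = ranked[perm[r]]
    return out

incomes = [35, 80, 42, 58, 61, 29, 74, 50]
print(rank_swap(incomes, p=2))  # a permuted column; every value moves at most p ranks
```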