Similar Documents
A total of 20 similar documents were found.
1.
Singular value decomposition (SVD) is widely used in data processing, reduction, and visualization. Applied to a positive matrix, the regular additive SVD by the first several dual vectors can yield irrelevant negative elements in the approximated matrix. We consider a multiplicative SVD modification that corresponds to minimizing the relative errors and always produces positive matrices at any approximation step. Another, logistic SVD modification can be used to decompose matrices of proportions: a regular SVD can yield elements beyond the zero-one range, while the modified decomposition keeps all elements within the correct range at any step of approximation. Several additional modifications of matrix approximation are also considered.
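A minimal numpy sketch of the idea, assuming the multiplicative variant works in the log domain and the logistic variant in the logit domain (the paper's exact error criteria may differ):

```python
import numpy as np

def truncated_svd(X, k):
    """Rank-k reconstruction from an ordinary (additive) SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def multiplicative_svd(X, k):
    """Approximate a positive matrix via SVD of its logarithm,
    so every reconstructed element stays strictly positive."""
    return np.exp(truncated_svd(np.log(X), k))

def logistic_svd(P, k, eps=1e-6):
    """Approximate a matrix of proportions via SVD of its logit,
    so every reconstructed element stays inside (0, 1)."""
    P = np.clip(P, eps, 1 - eps)
    Z = np.log(P / (1 - P))                     # logit transform
    return 1.0 / (1.0 + np.exp(-truncated_svd(Z, k)))

X = np.random.rand(8, 5) + 0.1                  # positive matrix
print((multiplicative_svd(X, 2) > 0).all())     # True by construction
P_hat = logistic_svd(np.random.rand(8, 5), 2)
print(((P_hat > 0) & (P_hat < 1)).all())        # True by construction
```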

2.
Two methods for privacy preserving data mining with malicious participants
Privacy preserving data mining addresses the need of multiple parties with private inputs to run a data mining algorithm and learn the results over the combined data without revealing any unnecessary information. Most existing cryptographic solutions to privacy-preserving data mining assume semi-honest participants. In theory, these solutions can be extended to the malicious model using standard techniques like commitment schemes and zero-knowledge proofs. However, these techniques are often expensive, especially when the data sizes are large. In this paper, we investigate alternative ways to convert solutions in the semi-honest model to the malicious model. We take two classical solutions as examples, one of which can be extended to the malicious model with only slight modifications, while the other requires a careful redesign of the protocol. In both cases, our solutions for the malicious model are much more efficient than solutions based on zero-knowledge proofs.
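One of the standard building blocks mentioned above is a commitment scheme; a minimal hash-based sketch (illustrative only, not the paper's protocol) looks like this:

```python
import hashlib, secrets

def commit(value: bytes):
    """Commit to a value; reveal (value, nonce) later to open the commitment."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + value).digest()
    return digest, nonce

def verify(digest: bytes, value: bytes, nonce: bytes) -> bool:
    """Check that the opened value matches the earlier commitment."""
    return hashlib.sha256(nonce + value).digest() == digest

# A participant commits to its private input before the protocol starts,
# so it cannot change that input later without being detected.
c, r = commit(b"party-1 private input")
assert verify(c, b"party-1 private input", r)
assert not verify(c, b"tampered input", r)
```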

3.
With the rapid development of the economy and of computational technologies, it is particularly important to build a secure, efficient and reliable smart grid architecture that provides users with high-quality electricity services. However, data collection and energy trading over public networks create security and privacy challenges in smart grids. Blockchain technologies have the attractive characteristics of decentralization, immutability and traceability, which can resolve the security, integration and coordination problems faced by traditional centralized networks for smart grids. The goal of this paper is to introduce and compare blockchain-based technologies that address privacy protection, identity authentication, data aggregation and electricity pricing in the data collection and energy trading processes of smart grids. In addition, existing challenges and future research directions for smart grids are discussed.

4.
5.
There has been relatively little work on privacy preserving techniques for distance based mining. The most widely used techniques are additive perturbation methods and orthogonal-transform-based methods. These methods concentrate on privacy protection in the average case and provide no worst-case privacy guarantee. The lack of such a guarantee makes these techniques difficult to use in practice and can lead to privacy breaches under certain attacks. This paper proposes a novel privacy protection method for distance based mining algorithms that gives worst-case privacy guarantees and protects the data against correlation-based and transform-based attacks. The method has three novel aspects. First, it uses a framework that provides a theoretical bound on privacy breach in the worst case, with easy-to-check conditions for determining whether a method offers a worst-case guarantee. A quick examination shows that special types of noise such as Laplace noise provide a worst-case guarantee, while most existing methods, such as adding normal or uniform noise, as well as the random projection method, do not. Second, the proposed method combines the favorable features of additive perturbation and orthogonal transform methods. It uses principal component analysis to decorrelate the data and thus guards against attacks based on data correlations. It then adds Laplace noise to guard against attacks that can recover the PCA transform. Third, the proposed method improves the accuracy of a popular distance-based classification algorithm, K-nearest neighbor classification, by taking into account the degree of distance distortion introduced by sanitization. Extensive experiments demonstrate the effectiveness of the proposed method.
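A minimal numpy sketch of the second aspect, decorrelating with PCA and then adding Laplace noise in the transformed domain; the noise scale used here is an illustrative parameter, not the paper's calibrated bound:

```python
import numpy as np

def sanitize(X, noise_scale=1.0, rng=None):
    """Decorrelate with PCA, then add Laplace noise in the PCA domain.
    A simplified sketch of the 'decorrelate + heavy-tailed noise' idea."""
    rng = np.random.default_rng(rng)
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal axes
    scores = Xc @ Vt.T                                  # decorrelated coordinates
    noisy = scores + rng.laplace(0.0, noise_scale, scores.shape)
    return noisy @ Vt + mu                              # back to the original space

X = np.random.default_rng(0).normal(size=(500, 4)) @ np.diag([3, 2, 1, .5])
X_priv = sanitize(X, noise_scale=0.5, rng=1)
print(np.abs(X_priv - X).mean())                        # average per-entry distortion
```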

6.
Recently, biometric template protection has received great attention from the research community due to security and privacy concerns about biometric templates. Although a number of biometric template protection methods have been reported, it is still a challenging task to devise a scheme that satisfies all four template protection criteria, namely diversity, revocability, non-invertibility and performance. In this paper, a method is proposed to generate a revocable fingerprint template, in the form of a bit-string, from a set of minutiae points via a polar-grid-based 3-tuple quantization technique. Two merits of the proposed method are outlined, namely that it is alignment-free and that it preserves performance. Four publicly available benchmark datasets, FVC2002 DB1 and DB2 and FVC2004 DB1 and DB2, are used to evaluate the performance of the proposed method. The diversity, revocability and non-invertibility criteria are also analyzed.
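A toy sketch of the polar-grid quantization step, assuming minutiae given as (x, y, θ) triples; a real scheme would also quantize orientation (the third tuple element) and apply a user-specific, revocable permutation, both omitted here:

```python
import numpy as np

def minutiae_to_bitstring(minutiae, n_rings=4, n_sectors=16, r_max=200.0):
    """Toy polar-grid quantization of minutiae into a fixed-length bit-string."""
    ref = minutiae.mean(axis=0)                 # reference point (no alignment step)
    bits = np.zeros(n_rings * n_sectors, dtype=np.uint8)
    for x, y, _theta in minutiae:
        dx, dy = x - ref[0], y - ref[1]
        r = np.hypot(dx, dy)
        ring = min(int(r / (r_max / n_rings)), n_rings - 1)
        sector = int(((np.arctan2(dy, dx) + np.pi) / (2 * np.pi)) * n_sectors) % n_sectors
        bits[ring * n_sectors + sector] = 1     # mark the occupied polar cell
    return bits

template = minutiae_to_bitstring(np.random.rand(30, 3) * [300, 300, 2 * np.pi])
print(template.sum(), "occupied cells out of", template.size)
```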

7.
Random-data perturbation techniques and privacy-preserving data mining
Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data and preserve the privacy of sensitive values. This methodology attempts to hide the sensitive data by randomly modifying the data values, often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain and then develops a random-matrix-based spectral-filtering technique to retrieve original data from a dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques, for example algorithms that explicitly guard against privacy breaches through linear transformations, or that exploit multiplicative and colored noise for preserving privacy in data-mining applications.
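A compact numpy sketch of the spectral-filtering attack, assuming the noise variance is known and using the Marchenko-Pastur upper edge to separate noise eigenvalues from signal eigenvalues:

```python
import numpy as np

def spectral_filter(Y, noise_var):
    """Estimate the original data from additively perturbed data Y = X + N.
    Eigenvalues of the sample covariance below the Marchenko-Pastur bound are
    treated as noise; the data are projected onto the remaining eigenvectors."""
    n, m = Y.shape                                      # n records, m attributes
    mu = Y.mean(axis=0)
    Yc = Y - mu
    cov = Yc.T @ Yc / n
    lam_max = noise_var * (1 + np.sqrt(m / n)) ** 2     # MP upper edge for pure noise
    evals, evecs = np.linalg.eigh(cov)
    signal = evecs[:, evals > lam_max]                  # eigenvectors above the bound
    return Yc @ signal @ signal.T + mu                  # projection estimate of X

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) @ np.array([[1.0, 0.9], [0.9, 1.0]])  # correlated data
Y = X + rng.normal(scale=1.0, size=X.shape)             # additive perturbation
X_hat = spectral_filter(Y, noise_var=1.0)
print(np.mean((X_hat - X) ** 2), "<", np.mean((Y - X) ** 2))  # filtered error is smaller
```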

8.
Privacy preserving clustering on horizontally partitioned data
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. The aim of privacy preserving data mining research is to develop data mining techniques that can be applied to databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data and then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner, which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed over multiple sites. We show the communication and computation complexity of our protocol by conducting experiments over synthetically generated and real datasets. Each experiment is also performed for a baseline protocol with no privacy protection, so that comparing the two protocols shows the overhead that comes with security and privacy.
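One common building block for computing dissimilarity entries across sites without exchanging raw objects is a secure dot product via additively homomorphic encryption; a sketch with the python-paillier package (`phe`) is shown below. This illustrates the general idea rather than the paper's exact protocol, and a real protocol would also mask the revealed intermediate value:

```python
import numpy as np
from phe import paillier  # pip install phe

# Site A holds x, site B holds y; they want d(x, y)^2 without sharing raw vectors.
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 1.5, 2.5])

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

# Site A sends element-wise encryptions of x.
enc_x = [pub.encrypt(float(v)) for v in x]

# Site B computes Enc(x . y) homomorphically: ciphertext * plaintext, then sum.
enc_dot = sum(c * float(v) for c, v in zip(enc_x, y))

# Site A decrypts the dot product and finishes the squared distance locally,
# using the norms each side is willing to share.
dot = priv.decrypt(enc_dot)
dist_sq = float(x @ x) + float(y @ y) - 2 * dot
print(dist_sq, np.sum((x - y) ** 2))   # the two values agree
```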

9.
This paper presents a new lossy image compression technique which uses singular value decomposition (SVD) and wavelet difference reduction (WDR). The two techniques are combined so that SVD compression boosts the performance of WDR compression: SVD compression offers very high image quality but low compression ratios, whereas WDR compression offers high compression. In the proposed technique, an input image is first compressed using SVD and then compressed again using WDR; the WDR stage is used to reach the required compression ratio of the overall system. The proposed technique was tested on several test images and the results compared with those of WDR and JPEG2000. The quantitative and visual results show the superiority of the proposed technique over the aforementioned compression techniques. The PSNR at a compression ratio of 80:1 for Goldhill is 33.37 dB for the proposed technique, which is 5.68 dB and 5.65 dB higher than the JPEG2000 and WDR techniques, respectively.
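A sketch of the first (SVD) stage, keeping only the k largest singular values; the subsequent WDR coding stage is not reproduced here:

```python
import numpy as np

def svd_compress(img, k):
    """Rank-k SVD approximation of an image (first stage of the hybrid scheme)."""
    U, s, Vt = np.linalg.svd(img.astype(float), full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return np.clip(approx, 0, 255)

def psnr(orig, approx):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((orig.astype(float) - approx) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

img = np.random.default_rng(0).integers(0, 256, size=(256, 256))
rank20 = svd_compress(img, 20)
print(round(psnr(img, rank20), 2), "dB at rank 20")
```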

10.
黄凯, 张曦煌. 《计算机应用》 (Journal of Computer Applications), 2017, 37(5): 1392-1396
Traditional time-aware singular value decomposition (SVD) recommendation models consider only the rating matrix when modeling predicted user ratings, and they fit item life cycles and the temporal evolution of user preferences with complex time functions, which makes the models hard to interpret, captures user preferences inaccurately and limits rating-prediction accuracy. To address these problems, an improved recommendation model is proposed that jointly considers the rating matrix, item attributes, user review tags and temporal effects. First, the time axis is divided into periods and a sigmoid function maps each item's period popularity into an influence value in the [0, 1] interval to improve the item bias. Second, a nonlinear function converts the temporal change of the user bias into the temporal change of the deviation between the period rating mean and the overall mean to improve the user bias. Finally, the user's period interest in an item is captured and combined with the positive-rating ratio of similar users for that item in the same period to generate a user-item interaction factor, improving the modeling of user-item interactions. Tests on the MovieLens 10M and 20M movie rating datasets show that the improved model better captures the temporal evolution of user preferences and improves rating-prediction accuracy, reducing the root mean square error by 2.5% on average.
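A small sketch of the improved item bias: per-period popularity counts are mapped through a sigmoid into an influence value in [0, 1]; the centering and slope used here are illustrative assumptions, not the paper's exact formula:

```python
import numpy as np

def stage_influence(stage_counts, alpha=1.0):
    """Map an item's per-period rating counts to an influence value in [0, 1]
    with a sigmoid, as the improved item bias does.  The resulting value would
    scale the item bias term in a biased-SVD prediction such as
    r_hat = mu + b_u + influence * b_i + p_u . q_i."""
    counts = np.asarray(stage_counts, dtype=float)
    z = alpha * (counts - counts.mean()) / (counts.std() + 1e-9)
    return 1.0 / (1.0 + np.exp(-z))

# item rated 5, 40, 120 and 30 times in four consecutive periods
print(np.round(stage_influence([5, 40, 120, 30]), 3))
```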

11.
In data mining and knowledge discovery, there are two conflicting goals: privacy protection and knowledge preservation. On the one hand, we anonymize data to protect privacy; on the other hand, we allow miners to discover useful knowledge from the anonymized data. In this paper, we present an anonymization method which provides both privacy protection and knowledge preservation. Unlike most anonymization methods, in which data are generalized or permuted, our method anonymizes data by randomly breaking the links among attribute values in records. Through this randomization, our method maintains statistical relations among the data to preserve knowledge, whereas in most anonymization methods this knowledge is lost; data anonymized by our method therefore remains useful for statistical study. Furthermore, we propose an enhanced algorithm for extra privacy protection to tackle the situation where a user's prior knowledge of the original data may cause privacy leakage. The privacy levels and the accuracy of knowledge preservation of our method, along with their relations to the method's parameters, are analyzed. Experimental results demonstrate that our method is effective for both privacy protection and knowledge preservation compared with existing methods.
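A simplified sketch of the link-breaking randomization: values are swapped within a column with some probability, so column marginals are preserved exactly while record-level linkage is disturbed. The paper's exact randomization operator differs, this only illustrates the idea:

```python
import numpy as np

def break_links(records, p=0.2, rng=None):
    """Randomly break links between attribute values: each value is swapped
    with the same attribute of another random record with probability p.
    Column marginals are preserved exactly; cross-attribute links are only
    partially disturbed for small p."""
    rng = np.random.default_rng(rng)
    out = records.copy()
    n = out.shape[0]
    for j in range(out.shape[1]):
        for i in range(n):
            if rng.random() < p:
                k = rng.integers(n)
                out[[i, k], j] = out[[k, i], j]   # swap the two cells in column j
    return out

data = np.array([[25, 50000, 1], [40, 80000, 0], [33, 62000, 1], [51, 91000, 0]])
print(break_links(data, p=0.5, rng=0))            # same marginals, perturbed records
```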

12.
Data mining provides the opportunity to extract useful information from large databases, and various techniques have been proposed to extract this information efficiently. However, efficiency is not our only concern in this study: the security and privacy of the extracted knowledge must be seriously considered as well. With this in mind, we study the problem of hiding sensitive association rules in binary data sets by blocking some data values, and we present an algorithm for solving it. We also provide a fuzzification of the support and the confidence of an association rule to accommodate the existence of blocked/unknown values. In addition, we quantitatively compare the proposed algorithm with other published algorithms by running experiments on binary data sets, and we qualitatively compare its efficiency in hiding association rules. We utilize the notion of border rules by putting weights on each rule, and we use effective data structures for representing the rules so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we study the overall security of the modified database using the C4.5 decision tree algorithm of the WEKA data mining tool, and we discuss the advantages and limitations of blocking.
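A minimal sketch of the fuzzified support: with blocked ("unknown") values, the support of an itemset becomes an interval between the transactions that certainly contain it and those that possibly contain it; the confidence interval follows the same idea:

```python
import numpy as np

UNKNOWN = -1   # marker for a blocked value ('?')

def support_interval(db, itemset):
    """Support interval of an itemset over a binary database with blocked values.
    A transaction certainly supports the itemset only if every item is 1, and
    possibly supports it if every item is 1 or unknown."""
    cols = db[:, itemset]
    certain = np.all(cols == 1, axis=1).mean()
    possible = np.all((cols == 1) | (cols == UNKNOWN), axis=1).mean()
    return certain, possible

db = np.array([[1, 1, 0],
               [1, UNKNOWN, 1],
               [0, 1, 1],
               [1, 1, UNKNOWN]])
print(support_interval(db, [0, 1]))   # (0.5, 0.75) for the itemset {A, B}
```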

13.
This paper presents a low-distortion data embedding method using pixel-value differencing and base decomposition schemes. The pixel-value differencing scheme offers the advantage of conveying a large payload while still maintaining the consistency of the image characteristics after data embedding. We introduce the base decomposition scheme, which defines a base pair for each degree in order to construct a two-base notational system; this scheme significantly reduces the pixel variation caused by embedding secret data. We analyze the pixel variation and the expected mean square error caused by concealing secret messages. The mathematical analysis shows that our scheme produces much smaller maximal pixel variation and expected mean square error while yielding a higher PSNR. We evaluate the performance of our method using six categories of metrics, which allows us to compare it with seven other state-of-the-art algorithms. Experimental statistics verify that our algorithm outperforms existing counterparts in terms of lower image distortion and higher image quality. Finally, our scheme survives the RS steganalysis attack and the steganalytic histogram attack on pixel-value differences. We conclude that the proposed method can embed large amounts of message data while still producing an embedded image with very low distortion. To the best of our knowledge, in comparison with the seven current state-of-the-art data embedding algorithms, our scheme produces the lowest image distortion while embedding the same or slightly larger quantities of messages.
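A simplified sketch of pixel-value-differencing embedding for a single pixel pair, using a classic range table; the falling-off-boundary check and the paper's base-decomposition refinement are omitted:

```python
import numpy as np

# Quantization ranges for |d| (widths 8, 8, 16, 32, 64, 128), as in classic PVD.
RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127), (128, 255)]

def embed_pair(p1, p2, bits):
    """Embed bits into one pixel pair by re-quantizing their difference."""
    d = int(p2) - int(p1)
    lo, hi = next(r for r in RANGES if r[0] <= abs(d) <= r[1])
    t = int(np.log2(hi - lo + 1))             # capacity of this pair in bits
    payload, rest = bits[:t], bits[t:]
    b = int(payload, 2) if payload else 0
    d_new = (lo + b) if d >= 0 else -(lo + b) # new difference encodes the payload
    m = d_new - d                             # total change to distribute
    return p1 - m // 2, p2 + (m - m // 2), rest

p1n, p2n, remaining = embed_pair(100, 120, "10110")
print(p1n, p2n, remaining)                    # new difference 27 encodes '1011'
```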

14.
This paper provides algorithms for adding and subtracting eigenspaces, thus allowing incremental updating and downdating of data models. Importantly, and unlike previous work, we keep an accurate track of the mean of the data, which allows our methods to be used in classification applications. The result of adding eigenspaces, each built from a set of data, approximates the eigenspace that would be obtained were the sets of data taken together; subtracting eigenspaces yields a result approximating the eigenspace that would be obtained from a subset of the data. Using our algorithms, it is possible to perform 'arithmetic' on eigenspaces without reference to the original data. Eigenspaces can be constructed using either eigenvalue decomposition (EVD) or singular value decomposition (SVD). We provide addition operators for both methods, but subtraction for EVD only, arguing that there is no closed-form solution for SVD; the methods and discussion surrounding SVD provide the principal novelty of this paper. We illustrate the use of our algorithms in three generic applications, including the dynamic construction of Gaussian mixture models.
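A dense numpy sketch of eigenspace addition in the EVD form, tracking the combined mean; it merges two models through the pooled-scatter identity rather than the paper's reduced-dimension update:

```python
import numpy as np

def build_eigenspace(X, k):
    """Eigenspace model of a data block: (n, mean, top-k eigenpairs of the scatter)."""
    n, mu = len(X), X.mean(axis=0)
    scatter = (X - mu).T @ (X - mu)
    evals, evecs = np.linalg.eigh(scatter)
    return n, mu, evals[-k:], evecs[:, -k:]

def add_eigenspaces(model1, model2, k):
    """Merge two eigenspace models without revisiting the original data, using
    scatter(A+B) = scatter(A) + scatter(B) + (n1*n2/n)*(mu1-mu2)(mu1-mu2)^T."""
    n1, mu1, l1, U1 = model1
    n2, mu2, l2, U2 = model2
    n, d = n1 + n2, mu1 - mu2
    mu = (n1 * mu1 + n2 * mu2) / n
    scatter = (U1 * l1) @ U1.T + (U2 * l2) @ U2.T + (n1 * n2 / n) * np.outer(d, d)
    evals, evecs = np.linalg.eigh(scatter)
    return n, mu, evals[-k:], evecs[:, -k:]

rng = np.random.default_rng(0)
A, B = rng.normal(size=(300, 5)), rng.normal(loc=2.0, size=(200, 5))
merged = add_eigenspaces(build_eigenspace(A, 5), build_eigenspace(B, 5), 5)
direct = build_eigenspace(np.vstack([A, B]), 5)
print(np.allclose(merged[2], direct[2]), np.allclose(merged[1], direct[1]))  # True True
```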

15.
宋健, 许国艳, 夭荣朋. 《计算机应用》 (Journal of Computer Applications), 2016, 36(10): 2753-2757
Among anonymization techniques for protecting data privacy, and to address insufficient anonymization security, namely the privacy leakage caused by homogeneity and background-knowledge attacks on the computed centroids of equivalence classes during anonymization, a data-anonymization privacy protection method based on differential privacy is proposed and a corresponding protection model is constructed. The microaggregation MDAV algorithm is used to partition similar equivalence classes, the SuLQ framework is introduced into the attribute-anonymization process to obtain the ε-MDAV algorithm, and the Laplace mechanism is chosen to allocate the privacy budget appropriately. Comparing the changes in utility and security under different privacy budgets verifies that the method can effectively improve data security while maintaining high data utility.
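A simplified sketch of the ε-MDAV idea: MDAV-style fixed-size grouping followed by Laplace noise on the group centroids. The noise calibration and the SuLQ-based budget allocation shown here are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def eps_mdav(X, k=3, epsilon=1.0, rng=None):
    """Group records MDAV-style into clusters of size k around the record
    farthest from the current centroid, then replace each record by its group
    centroid plus Laplace noise scaled by a crude sensitivity/(k*epsilon) bound."""
    rng = np.random.default_rng(rng)
    remaining = list(range(len(X)))
    out = np.empty_like(X, dtype=float)
    sensitivity = X.max(axis=0) - X.min(axis=0)          # per-attribute range
    while len(remaining) >= k:
        centre = X[remaining].mean(axis=0)
        far = remaining[np.argmax(np.linalg.norm(X[remaining] - centre, axis=1))]
        dists = np.linalg.norm(X[remaining] - X[far], axis=1)
        group = [remaining[i] for i in np.argsort(dists)[:k]]
        out[group] = X[group].mean(axis=0) + rng.laplace(
            0.0, sensitivity / (k * epsilon))
        remaining = [i for i in remaining if i not in group]
    if remaining:                                        # leftover undersized group
        out[remaining] = X[remaining].mean(axis=0) + rng.laplace(
            0.0, sensitivity / (max(len(remaining), 1) * epsilon))
    return out

X = np.random.default_rng(0).normal(size=(20, 2)) * 10
print(eps_mdav(X, k=4, epsilon=0.5, rng=1)[:5])
```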

16.
During the storage and transmission of distributed big data, the data are highly vulnerable to attacks by malicious users, leading to data leakage and loss. To improve the storage and transmission security of distributed big data, a privacy-preserving encryption control model based on attribute classification is designed. Users' private data are mined and stored in a distributed structure, and the attribute types of the data are determined from the characteristics of the distributed private data. A Logistic chaotic map is iterated to generate privacy-protection keys, and the private data are encrypted through anonymization, chaotic mapping and homomorphic encryption. Attribute classification is then used to control access to the privacy-protected data, and privacy-preserving encryption control of distributed big data is realized under the constraints of the transmission protocol. Experimental results show that the plaintext and ciphertext produced by the model have low similarity, the access-revocation control accuracy reaches 98.9%, and the loss of private data is small both with and without attacks; the model thus achieves good encryption and control performance and privacy protection, effectively reducing the risk of privacy leakage and improving the storage and transmission security of distributed big data.
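A toy sketch of the chaotic key generation step: the Logistic map is iterated to produce a keystream, shown here driving a simple XOR cipher. This only illustrates the chaotic-map idea; the model in the paper combines it with anonymization and homomorphic encryption:

```python
def logistic_keystream(x0, r=3.99, n=16):
    """Iterate the Logistic map x -> r*x*(1-x) and quantize each state to a key byte."""
    x, stream = x0, bytearray()
    for _ in range(n):
        x = r * x * (1 - x)
        stream.append(int(x * 256) % 256)
    return bytes(stream)

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    """Illustrative XOR cipher driven by the chaotic keystream
    (encrypting again with the same key decrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = logistic_keystream(x0=0.3141592653, n=32)
ct = xor_encrypt(b"distributed private record", key)
print(xor_encrypt(ct, key))   # b'distributed private record'
```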

17.
Data co-clustering refers to the problem of simultaneously clustering two data types. Typically, the data are stored in a contingency or co-occurrence matrix C whose rows and columns represent the data types to be co-clustered; an entry C_ij signifies the relation between the data type represented by row i and that represented by column j. Co-clustering is the problem of deriving sub-matrices from the larger data matrix by simultaneously clustering its rows and columns. In this paper, we present a novel graph-theoretic approach to data co-clustering. The two data types are modeled as the two sets of vertices of a weighted bipartite graph. We then propose the Isoperimetric Co-clustering Algorithm (ICA), a new method for partitioning the bipartite graph. ICA requires only the solution of a sparse system of linear equations, instead of the eigenvalue or SVD problem used in the popular spectral co-clustering approach. Our theoretical analysis and extensive experiments performed on publicly available datasets demonstrate the advantages of ICA over other approaches in terms of the quality, efficiency and stability of the bipartite-graph partitioning.
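A sketch of one ICA-style bisection step with scipy: build the bipartite graph of rows and columns, ground one vertex, solve the sparse linear system, and threshold the solution. The median threshold used here is a simplification; the isoperimetric algorithm sweeps thresholds to minimize the isoperimetric ratio:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, bmat
from scipy.sparse.linalg import spsolve

def isoperimetric_cocluster(C):
    """One 2-way co-clustering step: rows and columns of C become the two vertex
    sets of a bipartite graph, one vertex is grounded, L0 x = d0 is solved as a
    sparse linear system, and x is thresholded into two joint clusters."""
    r, c = C.shape
    A = csr_matrix(C)
    W = bmat([[None, A], [A.T, None]]).tocsr()      # bipartite adjacency
    d = np.asarray(W.sum(axis=1)).ravel()           # vertex degrees
    L = diags(d) - W                                # graph Laplacian
    g = int(np.argmax(d))                           # ground the highest-degree vertex
    keep = np.array([i for i in range(r + c) if i != g])
    x = np.zeros(r + c)
    x[keep] = spsolve(L[keep][:, keep].tocsc(), d[keep])
    labels = (x > np.median(x[keep])).astype(int)
    return labels[:r], labels[r:]                   # row labels, column labels

# Two noisy diagonal blocks: rows/cols 0-2 belong together, rows/cols 3-5 together.
C = np.kron(np.eye(2), np.ones((3, 3))) * 5 + np.random.default_rng(0).random((6, 6))
rows, cols = isoperimetric_cocluster(C)
print(rows, cols)
```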

18.
An approximate microaggregation approach for microdata protection
Microdata protection is a hot topic in the field of Statistical Disclosure Control, which gained special interest after the disclosure of 658,000 queries by the America Online (AOL) search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with microdata disclosure. One of the emerging concepts in microdata protection is k-anonymity, introduced by Samarati and Sweeney. k-Anonymity provides a simple and efficient approach to protecting private individual information and is gaining increasing popularity; it requires that every record in the released microdata table be indistinguishably related to no fewer than k respondents. In this paper, we apply the concept of entropy to propose a distance metric that evaluates the amount of mutual information among records in microdata, and we propose a method of constructing a dependency tree to find the key attributes, which we then use to perform approximate microaggregation. Further, we adopt this new microaggregation technique to study the k-anonymity problem, and an efficient algorithm is developed. Experimental results show that the proposed microaggregation technique is efficient and effective in terms of running time and information loss.
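A small sketch of the dependency-tree construction: pairwise mutual information between attributes is estimated empirically and a maximum spanning tree is grown over it (Prim's method). The attribute selection and microaggregation steps themselves are not shown:

```python
import numpy as np
from itertools import combinations

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two categorical attributes."""
    n, mi = len(a), 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                p_a, p_b = np.mean(a == va), np.mean(b == vb)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def dependency_tree(data):
    """Maximum spanning tree over pairwise mutual information between attributes."""
    m = data.shape[1]
    mi = np.zeros((m, m))
    for i, j in combinations(range(m), 2):
        mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])
    in_tree, edges = {0}, []
    while len(in_tree) < m:                 # Prim's method: grow the heaviest edge
        i, j = max(((i, j) for i in in_tree for j in range(m) if j not in in_tree),
                   key=lambda e: mi[e])
        edges.append((i, j, mi[i, j]))
        in_tree.add(j)
    return edges

rng = np.random.default_rng(0)
x = rng.integers(0, 3, 300)
data = np.column_stack([x, (x + rng.integers(0, 2, 300)) % 3, rng.integers(0, 3, 300)])
print(dependency_tree(data))   # the strongly dependent pair gets the heaviest edge
```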

19.
Yao Liu  Hui Xiong 《Information Sciences》2006,176(9):1215-1240
A data warehouse stores current and historical records consolidated from multiple transactional systems. Securing data warehouses is of ever-increasing interest, especially in areas where data are sold in pieces to third parties for data mining. In such settings, existing data warehouse security techniques, such as data access control, may not be easy to enforce and can be ineffective. Instead, this paper proposes a data-perturbation-based approach, called the cubic-wise balance method, to provide privacy-preserving range queries on data cubes in a data warehouse. The approach is motivated by the observation that analysts are usually interested in summary data rather than individual data values; indeed, it can provide closely estimated summary data for range queries without giving access to the actual individual values. As demonstrated by our experimental results on the APB benchmark data set from the OLAP Council, the cubic-wise balance method achieves both better privacy preservation and better range-query accuracy than random data perturbation alternatives.
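A one-dimensional illustration of the "balance" idea, assuming zero-sum noise within fixed-size blocks so that range queries covering whole blocks return exact sums; the paper's assignment of balanced perturbations over cube cells is more involved:

```python
import numpy as np

def balanced_perturb(measures, block=2, scale=5.0, rng=None):
    """Perturb a 1-D measure array in fixed-size blocks with zero-sum noise:
    individual cells are distorted, but any range that covers whole blocks
    sums to exactly the original value."""
    rng = np.random.default_rng(rng)
    out = measures.astype(float).copy()
    for start in range(0, len(out) - block + 1, block):
        noise = rng.normal(0, scale, block)
        out[start:start + block] += noise - noise.mean()   # zero-sum within block
    return out

sales = np.array([10.0, 12, 9, 14, 11, 13])
noisy = balanced_perturb(sales, block=2, rng=0)
print(np.isclose(noisy.sum(), sales.sum()),        # whole-range sum preserved
      np.isclose(noisy[:2].sum(), sales[:2].sum()),
      np.round(noisy, 2))                          # individual cells are distorted
```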

20.
Video summarization and retrieval using singular value decomposition
In this paper, we propose novel video summarization and retrieval systems based on properties of singular value decomposition (SVD). Through mathematical analysis, we derive SVD properties that capture both the temporal and spatial characteristics of the input video in the singular-vector space. Using these properties, we can summarize a video by producing a motion video summary of a user-specified length. The motion video summary aims to eliminate visual redundancy while assigning equal show time to equal amounts of visual content in the original video program. The same SVD properties can also be used to categorize and retrieve video shots based on their temporal and spatial characteristics. As an extended application of the derived properties, we propose a system that can retrieve video shots according to their degree of visual change, color-distribution uniformity and visual similarity.
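A small numpy sketch of SVD-based key-frame selection on synthetic frame histograms; the paper's motion summary additionally clusters frames in the singular-vector space and allocates show time per cluster:

```python
import numpy as np

def summarize(frame_features, n_keyframes=3):
    """SVD-based key-frame selection sketch: centre the frame-feature matrix,
    take its SVD, and pick the frame with the strongest projection on each of
    the leading singular directions."""
    A = frame_features - frame_features.mean(axis=0)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    picks = {int(np.argmax(np.abs(U[:, j]))) for j in range(n_keyframes)}
    return sorted(picks)

# 90 synthetic frames: three visually distinct segments of 30 frames each,
# each frame described by a 16-bin colour histogram.
rng = np.random.default_rng(0)
frames = np.vstack([base + 0.05 * rng.random((30, 16))
                    for base in rng.random((3, 16))])
print(summarize(frames, n_keyframes=3))   # indices of the selected key frames
```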
