Similar Documents
20 similar documents retrieved (search time: 125 ms)
1.
Cloud storage provides users with outsourced file storage, and as the volume of outsourced data surges, data deduplication becomes essential. Deduplication is currently one of the most effective and widely used data reduction techniques: redundant data blocks are identified and only one copy of each is stored. Earlier schemes let different users encrypt the same file into the same ciphertext, which exposes file equality; later schemes introduced a centralized server to assist deduplication, but their deduplication efficiency drops as the number of users grows. To address the security weaknesses and low deduplication efficiency of current cloud storage, this paper proposes a cross-region data deduplication scheme. The proposed scheme generates a random tag and a fixed-length random ciphertext for each data item, ensuring data confidentiality under multi-domain deduplication, resisting brute-force attacks, and keeping equality information from being disclosed. The implementation and functionality of the scheme are compared and analyzed in detail. The security analysis shows that, while resisting brute-force attacks, the scheme protects the privacy of data content, equality information, and data integrity; the performance analysis shows that the scheme achieves a lower duplicate-search computation cost and time complexity than existing schemes and is lightweight.

2.
Deduplication is an important technology in cloud storage services. To protect user privacy, sensitive data usually have to be encrypted before outsourcing, which makes secure data deduplication a challenging task. Although convergent encryption is used to securely eliminate duplicate copies of encrypted data, these secure deduplication techniques support only exact deduplication; that is, traditional deduplication schemes tolerate no differences. This requirement is too strict for multimedia data such as images. For images, typical modifications such as resizing and compression change only their binary representation while preserving human visual perception, so such copies should also be eliminated as duplicates. These perceptually similar images occupy a lot of storage space on the remote server and greatly reduce the efficiency of the deduplication system. In this paper, we first formalize and solve the problem of effective fuzzy image deduplication while maintaining user privacy. Our solution eliminates duplicated images based on the measurement of image similarity over encrypted data. The robustness evaluation demonstrates that this fuzzy deduplication system is able to deduplicate perceptually similar images, which greatly reduces the storage and bandwidth overhead in cloud storage services.
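
The scheme above builds on convergent encryption, where the key is derived from the data itself so identical plaintexts produce identical ciphertexts that the server can deduplicate. Below is a minimal sketch of that primitive, assuming SHA-256-derived keys, a deterministic nonce, and the third-party `cryptography` package; it is not the authors' construction, and real message-locked encryption adds further hardening (e.g., server-aided key generation) against brute-force attacks on predictable data.

```python
# Minimal sketch of convergent encryption (message-locked encryption) for
# deduplication over encrypted data: identical plaintexts yield identical
# ciphertexts, so the server can deduplicate without seeing the content.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Return (tag, ciphertext). The key is derived from the data itself."""
    key = hashlib.sha256(data).digest()          # K = H(M)
    nonce = hashlib.sha256(key).digest()[:12]    # deterministic nonce -> deterministic ciphertext
    ciphertext = AESGCM(key).encrypt(nonce, data, None)
    tag = hashlib.sha256(ciphertext).digest()    # dedup tag T = H(C), safe to reveal to the server
    return tag, ciphertext

def convergent_decrypt(data_key: bytes, ciphertext: bytes) -> bytes:
    nonce = hashlib.sha256(data_key).digest()[:12]
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)

# Two owners of the same image produce the same tag and ciphertext, so the
# cloud stores only one copy; each owner keeps key = SHA-256(M) locally.
```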

3.
Provable Data Possession (PDP) and Proofs of Retrievability (POR) are the main techniques that clients use to verify the integrity of data stored on cloud servers. They have been widely applied in academia and industry in recent years, and many PDP and POR schemes have been proposed. However, the particularities and specific requirements of different groups lead to a wide variety of group-oriented PDP/POR schemes, and many important functions in group applications (such as data deduplication) have not been realized, so how to construct efficient PDP/POR schemes that satisfy group-specific functional and security requirements has attracted wide attention. This paper presents a group PDP scheme supporting data deduplication (GPDP). Based on matrix computation and pseudorandom functions, GPDP efficiently accomplishes provable data possession while supporting deduplication, and it resists chosen-member attacks by malicious parties in the group. The security of GPDP is proved in the standard model, and a prototype system is implemented on the Baidu Cloud platform. To evaluate performance, experiments and analysis were conducted on 10 GB of data. The results show that, while achieving in-group deduplication, GPDP efficiently guarantees resistance to chosen attacks and provable data possession: its preprocessing is more efficient than that of private-verification schemes, and its verification is more efficient than that of public-verification schemes (almost the same as private verification). In addition, compared with other group PDP/POR schemes, GPDP minimizes both the extra storage cost and the communication cost.

4.
The explosion of multimedia content brings forth the need for efficient resource utilization using state-of-the-art cloud computing technologies such as data deduplication. In cloud computing environments, achieving both data privacy and integrity is a challenging issue for data outsourcing services. Proof of Storage with Deduplication (POSD) is a promising solution that addresses this issue for cloud storage systems with deduplication enabled. However, the validity of the current POSD scheme rests on the strong assumption that all clients are honest when generating their keys. We demonstrate the insecurity of this approach under a new attack model in which malicious clients exploit dishonestly manipulated keys. We also propose an improved POSD scheme to mitigate our attack.

5.
Secure deduplication and data auditing are significant in improving the resource utilization and data integrity protection of data outsourcing services. However, existing audit-enabled deduplication schemes cannot implement block-level deduplication over audit tags, since the tags contain file information, and they have to bear the consequent storage burden. They also suffer from a low-coupling problem caused by the difference in optimal data chunking between deduplication and auditing. In this paper, we propose a blockchain-based compact audit-enabled deduplication scheme in decentralized storage. Specifically, we introduce the concept of the N-ary tag commitment tree (NTCT), which integrates the information of all data blocks into a file authenticator to support integrity verification and enable block-level deduplication over audit tags. On this basis, we propose a blockchain-based compact audit-enabled deduplication protocol, which adopts an aggregatable vector commitment to generate audit tags, thereby overcoming the low-coupling problem between deduplication and auditing. Notably, data users can outsource the task of tag generation to storage servers. Meanwhile, the capabilities of duplicate detection, proof of ownership (PoW), and integrity verification are integrated into the audit tags to reduce tag redundancy. Furthermore, we present an update protocol that allows data users to achieve a lightweight data update without uploading the entire new data. Finally, security analysis and performance evaluation demonstrate that our scheme is practical.

6.
Cross-user data deduplication improves cloud storage efficiency and user bandwidth utilization by eliminating duplicate uploads at the client side. However, during upload, the deterministic deduplication response returned by the cloud service provider creates a high-risk side channel: an attacker can exploit it to infer whether target data already exists in the cloud (existence privacy). Existing side-channel-resistant cross-user deduplication methods use various obfuscation strategies to confuse the attacker, but complete obfuscation is hard to achieve, and attackers can still steal data through dictionary attacks, appended-block attacks, and similar means. Preventing attackers from exploiting this side channel to learn existence privacy has therefore become an urgent problem for cross-user deduplication. To address this challenge, this paper adopts a new cross-user secure deduplication framework based on generalized deduplication: the original data is decomposed at the byte level into a base and a deviation, the base is deduplicated across users, and the deviation is deduplicated on the cloud side. In particular, base extraction is realized with the encoding idea of Reed-Solomon erasure codes, so that identical bases can be extracted from similar data with high probability. This not only confuses the attacker but also effectively saves communication overhead and cloud storage overhead. Furthermore, to improve efficiency, a data compression algorithm is applied to the deviations before upload to reduce redundancy among them. Experimental results show that, while effectively resisting side-channel attacks, the proposed method has significant advantages in communication and storage efficiency over the latest work in this field.
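
The framework above splits each piece of data into a "base" that is deduplicated across users and a "deviation" stored alongside it, with the base extracted so that similar data yield the same base (the paper uses a Reed-Solomon-style encoding for this step). The sketch below substitutes a hypothetical bit-mask split for the RS-based extraction, purely to show how base/deviation decomposition lets near-identical chunks share storage.

```python
# Toy sketch of generalized deduplication: each chunk is split into a "base"
# (deduplicated across users) and a small "deviation" (stored per copy).
# Here the base keeps the high 6 bits of every byte and the deviation keeps
# the low 2 bits, so near-identical chunks share one base; this mask is an
# illustrative stand-in for the paper's Reed-Solomon-based extraction.
import hashlib

BASE_MASK = 0xFC   # high 6 bits go to the base
DEV_MASK = 0x03    # low 2 bits go to the deviation

def split_chunk(chunk: bytes) -> tuple[bytes, bytes]:
    base = bytes(b & BASE_MASK for b in chunk)
    deviation = bytes(b & DEV_MASK for b in chunk)
    return base, deviation

def reassemble(base: bytes, deviation: bytes) -> bytes:
    return bytes(hb | lb for hb, lb in zip(base, deviation))

class GeneralizedDedupStore:
    """Server-side view: bases are deduplicated by hash, deviations are not."""
    def __init__(self):
        self.bases = {}    # base fingerprint -> base bytes (one copy)
        self.files = {}    # file id -> list of (base fingerprint, deviation)

    def put(self, file_id: str, chunk: bytes) -> None:
        base, dev = split_chunk(chunk)
        fp = hashlib.sha256(base).hexdigest()
        self.bases.setdefault(fp, base)              # cross-user dedup on the base
        self.files.setdefault(file_id, []).append((fp, dev))

    def get(self, file_id: str) -> bytes:
        return b"".join(reassemble(self.bases[fp], dev)
                        for fp, dev in self.files[file_id])
```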

7.
Data deduplication has become an essential part of storage systems for big data. Traditional compare-by-hash (CBH) deduplication does not fully address the challenge of similar files with small changes; delta compression can complement it to further improve storage efficiency. In this paper, we design and implement a distributed storage system called DCDedupe that efficiently and intelligently applies delta compression or deduplication based on the characteristics of the data. Unlike prior studies, the system works well even when data locality is weak or barely exists. In DCDedupe, we propose a pre-processing step to identify content similarity, and data chunks are classified into different categories. An appropriate routing algorithm then sends each data chunk to the right target storage node in the distributed system to boost storage efficiency. Our evaluation shows that the storage space saving achieved by DCDedupe generally outweighs the performance penalties; in some use cases it is worthwhile for DCDedupe to trade some throughput for lower storage costs. The overheads in input/output (I/O) operations and memory usage are also studied, and design recommendations are given.
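
DCDedupe classifies chunks in a pre-processing step and routes them so that identical chunks are deduplicated and similar chunks are delta-compressed on the same node. The sketch below uses an assumed min-hash-style "super-feature" as the similarity signature and a simple modulo routing rule; DCDedupe's actual detector and routing algorithm are not described in this abstract, so treat this only as an illustration of the idea.

```python
# Toy sketch of similarity-aware routing: each chunk gets a compact similarity
# signature ("super-feature"), chunks with the same signature are routed to the
# same storage node, and that node decides between exact dedup, delta
# compression against a stored base, or storing a new base.
import hashlib

def super_feature(chunk: bytes, shingle: int = 8) -> int:
    """Smallest shingle hash; similar chunks are likely to share it."""
    if len(chunk) < shingle:
        return int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big")
    return min(int.from_bytes(hashlib.sha1(chunk[i:i + shingle]).digest()[:8], "big")
               for i in range(0, len(chunk) - shingle + 1, 4))

def route(chunk: bytes, num_nodes: int) -> int:
    """Similar chunks map to the same node, enabling local dedup/delta decisions."""
    return super_feature(chunk) % num_nodes

def store_on_node(node_index: dict, chunk: bytes) -> str:
    fp = hashlib.sha256(chunk).hexdigest()
    sf = super_feature(chunk)
    if fp in node_index.setdefault("exact", {}):
        return "dedup"                      # identical chunk already stored
    if sf in node_index.setdefault("similar", {}):
        return "delta"                      # delta-compress against the stored base
    node_index["exact"][fp] = chunk
    node_index["similar"][sf] = fp
    return "base"                           # store as a new base chunk
```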

8.
Deduplication-based storage optimization for virtual desktop infrastructure
Virtual desktop infrastructure (VDI) relies on the massive cloud infrastructure of data centers to provide users, on demand, with the hardware and software resources required for virtual desktop deployment, but it suffers from low storage utilization and slow virtual machine boot. Exploiting the high data redundancy in virtual desktop storage, deduplication is used to reduce the storage space required by the VDI, and local server disk caches together with solid-state drives (SSDs) in the shared storage pool are used to optimize virtual machine boot performance. A prototype implementation shows that static (fixed-size) chunking is more suitable than content-defined chunking for deduplicating virtual desktop storage, that the optimal chunk size is 4 KB, and that 85% of the storage capacity can be saved; with I/O optimization through local server disk caches and flash-based SSDs, virtual machine boot speed improves by 35%.
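
The prototype above settles on static (fixed-size) 4 KB chunks for virtual desktop images. A minimal sketch of fixed-size chunk fingerprinting and the resulting space saving, assuming SHA-256 fingerprints, could look like this:

```python
# Minimal sketch of fixed-size (static) chunk deduplication for a VM image:
# split the image into 4 KB blocks, fingerprint each block, and count how
# much space a single physical copy per fingerprint would save.
import hashlib

CHUNK_SIZE = 4 * 1024   # the abstract reports 4 KB as the optimal chunk size

def dedup_stats(image: bytes) -> tuple[int, int]:
    """Return (logical_bytes, physical_bytes_after_dedup) for one image."""
    seen: dict[str, int] = {}
    logical = 0
    for offset in range(0, len(image), CHUNK_SIZE):
        chunk = image[offset:offset + CHUNK_SIZE]
        logical += len(chunk)
        seen.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return logical, sum(seen.values())

# Space saving ratio = 1 - physical / logical; the abstract reports roughly
# 85% savings across a pool of virtual desktop images.
```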

9.
Cloud backup has been an important issue ever since large quantities of valuable data began to be stored on personal computing devices. Data reduction techniques such as deduplication, delta encoding, and Lempel-Ziv (LZ) compression, performed at the client side before data transfer, can ease cloud backup by saving network bandwidth and reducing cloud storage space. However, client-side data reduction in cloud backup services faces efficiency and privacy challenges. In this paper, we present Pangolin, a secure and efficient cloud backup service for personal data storage that exploits application awareness. It speeds up backup operations with an application-aware client-side data reduction technique and mitigates data security risks by integrating selective encryption into data reduction for sensitive applications. Our experimental evaluation, based on a prototype implementation, shows that our scheme improves data reduction efficiency over state-of-the-art methods by shortening the backup window to 33%-75% of its original size, and that its security mechanism for sensitive applications has negligible impact on the backup window.

10.
Research progress on key techniques of data deduplication
The continuous growth of enterprise data volume and the rising requirements on data transfer rates make massive storage capacity and high-bandwidth network transfer in data centers a severe challenge for network storage. By exploiting the high redundancy within the data sets of specific applications, deduplication can greatly reduce storage capacity requirements, improve network bandwidth utilization, and lower enterprise IT operating costs, and it has become a research hotspot both in China and abroad. This paper first introduces the concept, classification, and applications of deduplication; it then describes the architecture and basic principles of deduplication systems and compares them with traditional storage systems. The paper then analyzes and summarizes the state of research on the key techniques of deduplication, including data partitioning methods, I/O optimization, highly reliable data placement strategies, and system scalability. Finally, it summarizes the current state of deduplication research and points out possible future research directions.

11.
Data deduplication for file transfer across wide area networks (WANs), in applications such as file synchronization and mirroring in cloud environments, usually achieves significant bandwidth savings at the cost of significant deduplication time overheads. These overheads include the time required for deduplication at the two geographically distributed nodes (e.g., the disk-access bottleneck) and the duplication query/answer operations between the sender and the receiver, since each query or answer introduces at least one round-trip time (RTT) of latency. In this paper, we present a data deduplication system across the WAN with metadata feedback and metadata utilization (MFMU), in order to reduce these deduplication-related time overheads. In the proposed MFMU system, selective metadata feedback from the receiver to the sender is introduced to reduce the number of duplication query/answer operations. In addition, to control the metadata-related disk I/O operations at the receiver, as well as the bandwidth overhead introduced by the metadata feedback, a metadata utilization component based on a hysteresis hash re-chunking mechanism is introduced. Our experimental results demonstrate that MFMU achieves 20%-40% deduplication acceleration on average, with the bandwidth saving ratio not reduced by the metadata feedback, compared with the "baseline" content-defined chunking (CDC) used in LBFS (Low-Bandwidth Network File System) and existing state-of-the-art bimodal-chunking-based deduplication solutions.
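
MFMU is evaluated against the content-defined chunking (CDC) used in LBFS, where a rolling hash over the data stream places chunk boundaries wherever its low bits match a target pattern, so an insertion only shifts nearby boundaries. The sketch below is a generic gear-style CDC chunker with illustrative parameters, not MFMU's hysteresis hash re-chunking mechanism.

```python
# Minimal content-defined chunking (CDC) sketch in the spirit of the LBFS
# baseline: a gear-style rolling hash sets a chunk boundary whenever its low
# bits match a fixed pattern. Parameters (~8 KB average, 2 KB min, 64 KB max)
# are illustrative.
import hashlib

GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]
MASK = (1 << 13) - 1                       # expect a boundary roughly every 8 KB
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

def cdc_chunks(data: bytes):
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF     # old bytes age out of the 32-bit hash
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == MASK) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]                 # trailing chunk
```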

12.
To address the large amount of redundant data in data centers, especially the storage capacity wasted by backup data, a distributed deduplication solution based on the Hadoop platform is proposed. The solution detects and eliminates redundant data within a given data set to significantly reduce stored data volume and improve storage utilization. Using two data management facilities of the Hadoop big data platform, the Hadoop Distributed File System (HDFS) and the non-relational database HBase, a scalable distributed deduplication storage system is designed and implemented: the MapReduce parallel programming framework performs distributed parallel deduplication, HDFS stores the deduplicated data, and an index table built in HBase provides efficient chunk index lookup. Finally, the system is tested with a data set of virtual machine image files; the Hadoop-based distributed deduplication system achieves a high deduplication ratio while delivering high throughput and good scalability.
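
The system above expresses deduplication as a MapReduce job: mappers emit (fingerprint, chunk) pairs, the shuffle groups identical fingerprints, and reducers keep one physical copy per fingerprint, with HBase holding the chunk index and HDFS the chunk data. The sketch below reproduces only that map/shuffle/reduce dataflow locally in plain Python; the HDFS/HBase plumbing is omitted.

```python
# Local sketch of the MapReduce-style dedup dataflow:
# map: chunk -> (fingerprint, chunk); shuffle: group by fingerprint;
# reduce: keep one copy per fingerprint.
import hashlib
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(chunks: Iterable[bytes]) -> Iterator[tuple[str, bytes]]:
    for chunk in chunks:
        yield hashlib.sha256(chunk).hexdigest(), chunk

def shuffle(pairs: Iterable[tuple[str, bytes]]) -> dict[str, list[bytes]]:
    grouped = defaultdict(list)
    for fp, chunk in pairs:
        grouped[fp].append(chunk)
    return grouped

def reduce_phase(grouped: dict[str, list[bytes]]) -> dict[str, bytes]:
    # One physical copy per fingerprint; in the described system this mapping
    # would be written to HBase (index) and HDFS (chunk store).
    return {fp: copies[0] for fp, copies in grouped.items()}
```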

13.
Secure deduplication is a promising solution for greatly reducing cloud storage space. However, the encryption key is deterministically derived from the plaintext, so that whoever owns the plaintext can derive the key and decrypt the ciphertext. How to revoke a user in a deduplication scheme is therefore a critical challenge. In existing work, when updating the data authority, the data owner has to download the data from the cloud, decrypt and re-encrypt it, and finally upload it back to the cloud; this process increases the communication and computation overheads. In this paper, we first propose a multi-user updatable encryption scheme. Specifically, the data owner can update the remote ciphertext under a new group key by sending an update token to the cloud. We then adopt this technique to propose a new secure deduplication scheme that supports efficiently revoking an unauthorized user. In our scheme, the data owner just needs to send a token to the cloud to update the data authority, which saves communication and computation costs. The security and efficiency analysis demonstrates that our proposed deduplication scheme achieves the desired security properties with high efficiency.
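
The core idea above is an update token that lets the cloud re-key a stored ciphertext without downloading it. The toy sketch below shows only the token mechanics with an XOR keystream (SHAKE-256 as a stand-in PRG); it is not the paper's construction and is not a secure updatable-encryption scheme, merely a way to see why the server can produce the re-keyed ciphertext without learning the plaintext.

```python
# Toy illustration of the update-token idea: the cloud re-keys a stored
# ciphertext from group key k_old to k_new using only a token, never seeing
# the plaintext. NOT a secure updatable-encryption construction.
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    return hashlib.shake_256(key).digest(length)

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    ks = keystream(key, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, ks))

def make_update_token(k_old: bytes, k_new: bytes, length: int) -> bytes:
    # token = G(k_old) XOR G(k_new); computed by the data owner
    return bytes(a ^ b for a, b in zip(keystream(k_old, length), keystream(k_new, length)))

def cloud_rekey(ciphertext: bytes, token: bytes) -> bytes:
    # (m XOR G(k_old)) XOR (G(k_old) XOR G(k_new)) = m XOR G(k_new)
    return bytes(c ^ t for c, t in zip(ciphertext, token))
```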

14.
Deduplication is commonly used in both enterprise storage systems and cloud storage. To overcome the performance challenge of restore operations in deduplication systems, a solid-state-drive-based (i.e., SSD-based) read cache can be deployed to speed up restores by dynamically caching popular restore content. Unfortunately, the frequent data changes induced by classical caching schemes (e.g., LRU and LFU) significantly shorten SSD lifetime while slowing down I/O on the SSD. To address this problem, a new cache scheme is proposed that greatly improves both the write endurance of the SSD and I/O performance by enlarging the proportion of long-term popular data among the data written into the SSD-based cache. The scheme keeps long-term popular data in the SSD cache for extended periods to reduce the number of cache replacements, and it prevents unpopular or unnecessary data in deduplication containers from being written into the SSD cache. The scheme was implemented in a prototype deduplication system to evaluate its performance. Experimental results show that it shortens the latency of selective restore by 37.3% on average at the cost of a small SSD-based cache whose capacity is only 5.56% of the deduplicated data. Importantly, it improves SSD lifetime by a factor of 9.77. The evidence shows that the scheme provides a cost-efficient SSD-based read-cache solution for boosting the performance of selective restore in deduplication systems.
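
The cache described above admits only long-term popular data into the SSD so that cache replacements, and therefore SSD writes, stay rare. The sketch below captures that admission-control idea with an access-count threshold in front of an LRU-managed SSD cache; the threshold, capacity, and interfaces are illustrative, not taken from the paper.

```python
# Toy sketch of the cache-admission idea: only chunks that have proven
# popularity are written into the SSD read cache, so cache replacements
# (and hence SSD writes) become rare, extending SSD lifetime.
from collections import OrderedDict, defaultdict

class PopularityAdmissionCache:
    def __init__(self, capacity: int, admit_after: int = 3):
        self.capacity = capacity
        self.admit_after = admit_after      # accesses needed before a chunk enters the SSD
        self.access_count = defaultdict(int)
        self.ssd = OrderedDict()            # chunk id -> data, LRU order among admitted chunks

    def read(self, chunk_id: str, fetch_from_hdd) -> bytes:
        if chunk_id in self.ssd:
            self.ssd.move_to_end(chunk_id)          # cache hit, no SSD write
            return self.ssd[chunk_id]
        data = fetch_from_hdd(chunk_id)             # restore path: read from the container store
        self.access_count[chunk_id] += 1
        if self.access_count[chunk_id] >= self.admit_after:
            if len(self.ssd) >= self.capacity:
                self.ssd.popitem(last=False)        # evict the least recently used chunk
            self.ssd[chunk_id] = data               # the only path that writes to the SSD
        return data
```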

15.
With the continuous development of network technology and power-industry informatization, the amount of information on the network keeps expanding, leading to massive redundant web pages on the Internet and in power information networks. This redundancy increases the complexity of data mining and fast retrieval and poses great challenges to the performance of network and storage equipment, so research on fast deduplication of massive web pages is necessary. Web page deduplication is the process of detecting redundant pages in a given large data set and removing them from the set. Research on URL-based deduplication of same-origin pages has made considerable progress, but there is still no good solution for deduplicating massive web pages. Building on an MD5 fingerprint-database deduplication algorithm and combining it with the properties of the Counting Bloom filter, this paper proposes a fast deduplication algorithm, IMP-CMFilter. The algorithm improves the efficiency of massive web page deduplication by reducing frequent I/O operations. Experiments show the effectiveness of IMP-CMFilter.
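
IMP-CMFilter pairs MD5 page fingerprints with a Counting Bloom filter so that most "seen before?" checks are answered in memory and only likely hits touch the on-disk fingerprint database. A minimal Counting Bloom filter sketch, with arbitrarily chosen size and hash count, follows; the counters (unlike plain Bloom bits) also permit deletions.

```python
# Minimal Counting Bloom filter sketch for web-page fingerprints: membership
# checks run in memory, and counters allow deletions. Parameters and hashing
# choices are illustrative.
import hashlib

class CountingBloomFilter:
    def __init__(self, size: int = 1 << 20, num_hashes: int = 4):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item: bytes):
        digest = hashlib.md5(item).digest()              # MD5, matching the fingerprint database
        for i in range(self.num_hashes):
            extra = hashlib.md5(digest + bytes([i])).digest()
            yield int.from_bytes(extra[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: bytes) -> None:
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_contain(self, item: bytes) -> bool:
        # False positives possible; false negatives are not (if only added items are removed).
        return all(self.counters[pos] > 0 for pos in self._positions(item))

# Usage: check the filter first, and only query the on-disk MD5 fingerprint
# database when might_contain() returns True, cutting frequent I/O.
```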

16.
Deduplication has attracted wide attention from industry and academia. Researchers aim to securely remove redundant data from cloud servers; deduplicating plaintext data is relatively simple, but to protect privacy, users encrypt their data with their own keys before uploading, producing different ciphertexts. Deduplicating encrypted data while preserving security is therefore difficult, and most existing schemes rely on an online trusted third party. This paper proposes an encrypted-data deduplication scheme based on offline key distribution. By constructing a bilinear map, it verifies whether encrypted data originate from the same plaintext without leaking data privacy. Broadcast encryption is used for the secure storage and delivery of the data encryption keys. The initial uploader of any data item can, with the help of the cloud server, verify the legitimacy of subsequent uploaders and deliver the data encryption key offline. Deduplication of encrypted data on the cloud server is thus achieved without an online trusted third party. The security of the scheme is analyzed and proved, and simulation experiments verify its feasibility and efficiency.

17.
刘峰  王越 《计算机工程》2011,37(9):95-97
To improve the reliability of multi-version data storage in collaborative design and the efficiency of multi-version query and retrieval, this paper builds on the version-tree storage mechanism and, taking the differences between versions into account, proposes an algorithm that uses inter-version similarity coefficients to determine the storage mode of intermediate versions. Experimental results show that the algorithm keeps data redundancy controllable and, compared with similar algorithms, restores versions more efficiently and yields better overall version-tree performance.
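
The algorithm above decides per version whether to store a full copy or only a delta, based on an inter-version similarity coefficient. The sketch below, assuming difflib-based deltas and an arbitrary threshold, shows that decision rule; storing a full copy when similarity is low keeps version-restore chains short.

```python
# Sketch of the storage-mode decision: if the new version is similar enough to
# its predecessor, store only a delta; otherwise store a full copy so later
# restores do not have to replay a long delta chain. difflib stands in for the
# paper's delta encoding; the threshold value is illustrative.
import difflib

SIMILARITY_THRESHOLD = 0.6   # illustrative value, not from the paper

def similarity(old: str, new: str) -> float:
    return difflib.SequenceMatcher(None, old, new).ratio()

def store_version(old: str, new: str) -> tuple[str, object]:
    if similarity(old, new) >= SIMILARITY_THRESHOLD:
        delta = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
        return "delta", delta          # compact, but restoring needs the previous version
    return "full", new                 # redundant, but restorable on its own
```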

18.
The exponential growth of digital data in cloud storage systems is a critical issue at present, as the large amount of duplicate data in these systems exerts an extra load on them. Deduplication is an efficient technique that has gained attention in large-scale storage systems: it eliminates redundant data, improves storage utilization, and reduces storage cost. This paper presents a broad, methodical literature review of existing data deduplication techniques, along with various existing taxonomies of deduplication techniques for cloud data storage. Furthermore, the paper investigates deduplication techniques based on text and multimedia data, along with their corresponding taxonomies, as these techniques pose different challenges for duplicate data detection. This work is useful for identifying deduplication techniques for text, image, and video data. It also discusses existing challenges and significant research directions in deduplication for future researchers, and the article concludes with a summary of valuable suggestions for future enhancements in deduplication.

19.
The development of IaaS enables cloud services to deploy virtual machine clusters rapidly. However, version control of virtual machine clusters during deployment is inefficient: current version control methods suffer from heavy network load and slow operations. This paper proposes a novel version control method for virtual machine clusters, called FlatVC. FlatVC generates virtual machine versions incrementally on the compute nodes to avoid transferring version data to the storage nodes, and it transfers version data on demand when a virtual machine version is restored, thereby reducing network transfer load and accelerating version control. By sharing transferred data through a cache tree structure, FlatVC relieves the data transfer pressure on the root node. In addition, I/O optimization is applied to the version chain formed by incremental versions, avoiding the performance degradation the chain would otherwise cause. Experimental results show that FlatVC effectively implements version control for virtual machine clusters and accelerates both version creation and restoration.

20.
STEP-based version storage control strategy for collaborative design
付喜梅 《计算机工程》2008,34(24):61-63
Version management in computer-supported collaborative design should preserve all versions produced during the design process for later query and retrieval. Among version storage methods, full-version storage is fast to access but consumes much storage space and contains much redundant information, while delta (difference) storage needs little space but is slow to access and less safe. Based on the STEP data exchange standard, this paper proposes two delta storage models and, by constructing a version-storage threshold function, combines delta storage with full-version storage, effectively balancing space efficiency and time efficiency.
