Similar literature
19 similar articles found (search time: 546 ms)
1.
肖雄  唐卓  肖斌  李肯立 《计算机学报》2023,(5):1019-1044
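Federated learning is an emerging technique in artificial intelligence that addresses both the "data silo" problem and privacy protection: it brings dispersed data holders together to train a global model while each party's data stays local. Federated learning offers considerable hope for data-sensitive applications that need to process fused data, but it still carries potential privacy-leakage risks and data-security problems. To further examine the state of research on privacy-preserving and security-defense techniques for federated learning, this paper gives a clearer classification of its privacy and security problems based on the latest research results, and grades the means of threatening privacy and security by threat strength. The paper first introduces the root causes of the privacy and security threats in federated learning and lists, from several angles, how they do damage and how threatening they are. Second, it summarizes the challenges facing privacy and security in federated learning. For privacy protection, it analyzes both attacks by a single malicious participant or the central server and scenarios in which several malicious parties collude to leak private data, and discusses the corresponding state-of-the-art protection techniques. For security, it focuses on the various malicious attacks that degrade global model performance and systematically describes advanced defense schemes, to help avoid potential risks when building a secure large-scale distributed federated learning environment. In addition, compared with other surveys on federated learning, this paper also covers the multi-party malicious collusion problem and gives a comparative analysis of existing federated secure aggregation algorithms and...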

2.
王飞  秦小麟  刘亮  沈尧 《计算机科学》2015,42(5):204-210
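The k-nearest-neighbor (kNN) join is a common operation in spatial databases, and processing it involves two complex operations: a join and a nearest-neighbor query. Traditional centralized kNN join algorithms can no longer cope with today's explosively growing data volumes, so designing distributed kNN join algorithms has become an urgent problem. Existing distributed kNN join algorithms all consist of several rounds of serial MapReduce jobs, each of which must read from and write to the distributed file system; MapReduce therefore cannot effectively express the dependencies among multiple jobs, and the algorithms are inefficient. The paper first proposes a dataflow-based computing framework, built on top of MapReduce, that models data processing as a dataflow graph. On this framework it then proposes an efficient kNN join algorithm that uses a space-filling curve to map multidimensional data to one-dimensional data, thereby turning the kNN join query into a one-dimensional range query. Experimental results show that the algorithm scales well and is more efficient than existing algorithms.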
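
The mapping from multidimensional points to one-dimensional keys can be illustrated with a small sketch. The abstract does not say which space-filling curve the authors use; the Z-order (Morton) curve below is an assumption chosen for brevity, and the function names `interleave_bits` and `z_range_candidates` are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Map a 2-D grid cell (x, y) to a 1-D Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


def z_range_candidates(points, lo, hi):
    """Return the points whose Z-order key falls in the 1-D range [lo, hi]."""
    return [p for p in points if lo <= interleave_bits(*p) <= hi]


points = [(0, 0), (1, 1), (3, 5)]
print([interleave_bits(x, y) for x, y in points])
print(z_range_candidates(points, 0, 3))
```

Points that are close in space tend to receive nearby keys, so a range scan over the keys yields candidate neighbors; a real kNN join would still verify the candidates with exact distances.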

3.
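Federated learning offers a new approach to multi-party joint modeling under "data silos". A federated support vector machine (SVM) allows cross-device SVM modeling without data leaving its local site, but existing work suffers from insufficient privacy protection during training and a lack of research on nonlinear federated SVMs. To address these problems, a privacy-preserving nonlinear federated SVM training algorithm (PPNLFedSVM) is proposed, using the random Fourier feature method and the CKKS homomorphic encryption scheme. First, based on random Fourier features, the same approximate Gaussian-kernel mapping function is generated locally at every participant, and each participant's training data is explicitly mapped from the low-dimensional space to a high-dimensional space. Second, a secure aggregation algorithm for model parameters based on the CKKS cryptosystem protects the privacy of each participant's model parameters and contributions during aggregation, and the aggregation process is tuned to the characteristics of CKKS to improve the efficiency of secure aggregation. Theoretical security analysis and experimental results show that PPNLFedSVM preserves the privacy of participants' model parameters and contributions during training without loss of model accuracy.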
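
A minimal sketch of the random Fourier feature step follows. It assumes the kernel is k(x, y) = exp(-gamma * ||x - y||^2) and that the parties obtain identical maps by sharing a random seed; the abstract confirms neither detail, and `make_rff_map` is a hypothetical helper, not part of PPNLFedSVM. The CKKS aggregation step is not shown.

```python
import math
import random


def make_rff_map(dim: int, num_features: int, gamma: float, seed: int):
    """Build z(x) such that z(x) . z(y) approximates exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)          # assumed: a shared seed gives every party the same map
    sigma = math.sqrt(2.0 * gamma)     # the kernel's spectral density is N(0, 2*gamma*I)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        # Explicit map from the input space to a num_features-dimensional space.
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


party_a = make_rff_map(dim=2, num_features=2000, gamma=0.5, seed=7)
party_b = make_rff_map(dim=2, num_features=2000, gamma=0.5, seed=7)
x, y = [0.1, 0.2], [0.3, -0.1]
approx = sum(p * q for p, q in zip(party_a(x), party_b(y)))
exact = math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))
print(approx, exact)
```

Because the map is explicit, each party can train a linear model on the mapped features, which is what makes linear-style parameter aggregation applicable to a nonlinear kernel.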

4.
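In recent years, malicious users of online social networks have become dispersed, latent, and complex. How to fuse data from multiple parties for modeling and analysis, and so detect malicious users accurately while protecting the data privacy of ordinary users, has become a focus of research. This paper proposes a cross-platform malicious user detection scheme for social networks based on vertical federated learning. First, by preprocessing multi-source heterogeneous data and applying encrypted sample alignment and encrypted model training, a hierarchical architecture for cross-platform malicious user detection based on vertical federated learning is built. Second, the secure federated boosting tree algorithm is analyzed and improved, and a malicious user detection algorithm oriented to multi-party privacy protection is proposed. Finally, experiments on real social network platforms show that the proposed scheme is secure and that its model improves accuracy over two baseline models by 14.03% and 1.918%, respectively.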

5.
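In the era of big data, data has great value as a factor of production, so analyzing, mining, and utilizing large-scale data through data sharing is of great significance. However, increasingly strict privacy and security requirements in recent years prevent parties whose data is dispersed and heterogeneous from sharing it freely, which aggravates the "data silo" problem. Data federation lets multiple data owners carry out joint queries while preserving privacy. Therefore, based on the federated computation idea that "data stays, computation moves", the authors implement a multi...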

6.
CHMJ: An Efficient Join Query Processing Algorithm Based on Hadoop   Total citations: 3, self-citations: 0, citations by others: 3
赵彦荣  王伟平  孟丹  张书彬  李均 《软件学报》2012,23(8):2032-2041
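A parallel join query processing algorithm, CoLocationHashMapJoin (CHMJ), is proposed. First, a multi-replica consistent hashing algorithm is designed that distributes tables with join relationships across the cluster according to the hash values of their join attributes, which improves data locality in join processing while guaranteeing data availability. Second, based on this multi-replica consistent hash distribution, the HashMapJoin parallel join algorithm is proposed, which effectively improves join processing efficiency. CHMJ has been deployed in Tencent's data warehouse system, and the results show that its join processing is nearly 5 times as efficient as that of Hive.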
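
The co-location idea can be sketched with a generic consistent-hash ring. This is not the CHMJ implementation: the virtual-node count, the MD5 hash, and the class name `ReplicatedHashRing` are assumptions made for illustration. Rows from different tables that share a join-key value map to the same set of nodes, so the join can run locally on any of them.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable hash so that every node computes the same placement."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ReplicatedHashRing:
    def __init__(self, nodes, replicas=2, vnodes=64):
        unique = sorted(set(nodes))
        self.replicas = min(replicas, len(unique))
        # Each physical node appears at several ring positions (virtual nodes).
        self.ring = sorted((_hash(f"{n}#{i}"), n) for n in unique for i in range(vnodes))
        self.positions = [h for h, _ in self.ring]

    def nodes_for(self, join_key):
        """Return the distinct nodes that hold rows with this join-key value."""
        start = bisect.bisect(self.positions, _hash(str(join_key)))
        owners = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners


ring = ReplicatedHashRing(["node1", "node2", "node3"], replicas=2)
# Rows of two different tables with join key 42 land on the same two nodes.
print(ring.nodes_for(42))
```

Keeping more than one owner per key is what provides availability: if one node fails, another replica still holds both sides of the join for that key.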

7.
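To address the risk that outsourcing shortest-path computation in location-based services (LBS) may leak user privacy, a privacy-preserving algorithm for obstacle-aware shortest-path navigation in the cloud is proposed, based on homomorphic encryption and secure multi-party computation, which protects the privacy of both users and data owners. The algorithm uses secure multi-party computation to solve the privacy problem of computing shortest paths under two conditions, with and without obstacles on the road, and proposes two protocols based on homomorphic encryption: one for queries with obstacles and one for queries without. Finally, the effectiveness of the proposed framework is demonstrated both theoretically and in practice according to these protocols.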

8.
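Secure multi-party computation (MPC) allows a joint computation to be completed without revealing any participant's private data. However, today's computing tasks often involve analyzing and processing massive datasets from multiple parties, which significantly reduces the practical usability of MPC. Increasing the volume of data MPC can handle is therefore one of the main current research directions. To improve MPC's ability to process large-scale data, the MPC algorithm is combined with a data-parallel analysis framework and, following the idea of minimizing the multi-party computation task, an efficiency optimization technique for secure multi-party computation is proposed. A directed acyclic graph of the algorithm is created with MPC and non-MPC nodes annotated, and static analysis, query rewriting, and partitioning heuristics are used to minimize the amount of MPC computation and increase concurrency. Multi-party linear regression is used as an example to discuss secure multi-party computation suited to big data analysis. Experimental results show that the proposed optimization significantly reduces computation time while preserving accuracy, improving system efficiency and the practicality of MPC.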

9.
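In recent years, with the rapid development of artificial intelligence, people have paid increasing attention to data privacy and security, and countries around the world have introduced laws and regulations to protect user privacy. Federated learning emerged as a new distributed machine learning technique in response to the data silos and the data privacy and security problems that constrain the development of artificial intelligence. However, high communication overhead hinders its further development. This paper therefore proposes an efficient federated learning algorithm based on a selective communication strategy. Specifically, exploiting the network structure of federated learning, the algorithm uses the maximum mean discrepancy on the client side to measure the correlation between the local model and the global model and filters out local models with low correlation, and on the server side it aggregates the local models with weights based on that correlation. In this way the algorithm effectively reduces communication overhead while ensuring fast model convergence. Simulation results show that, compared with FedAvg and FedProx, the proposed algorithm reduces the number of communication rounds by about 54% and 60%, respectively, without sacrificing accuracy.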

10.
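In fields such as hydrology, meteorology, and insurance claim assessment, the dependent variable is usually assumed to follow a Gamma distribution, and Gamma regression built on this assumption fits better than multiple linear regression. Gamma regression models have traditionally been obtained by pooling the data for training; when the data is supplied by multiple parties, training a privacy-preserving Gamma regression model without exchanging data becomes a problem to be solved. To this end, a multi-party secure vertical federated Gamma regression algorithm is proposed. The algorithm first uses an iterative method to derive the log-likelihood estimation expression of the vertical federated Gamma regression model, then determines the model's link function based on engineering practice, then constructs the loss function and establishes a gradient update strategy for the parameters, and finally fuses and updates the homomorphically encrypted parameters of all parties to obtain the federated Gamma regression model. Performance tests on two public datasets show that the proposed algorithm can effectively exploit the value of multi-party data to produce a Gamma regression model without exchanging data; the model's fit approaches that of a Gamma regression model learned on centralized data and is better than that of a model learned independently by a single party.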

11.
In the era of big data, data is of great value as an essential factor of production, so analyzing, mining, and utilizing large-scale data via data sharing is of great significance. However, due to the heterogeneous dispersion of data and increasingly rigorous privacy protection regulations, data owners cannot arbitrarily share data, and thus data owners are turned into data silos. Since data federation can achieve collaborative queries while preserving the privacy of data silos, we present in this paper a secure multi-party relational data federation system based on the idea of federated computation that "data stays, computation moves." The system is compatible with a variety of relational databases and can shield users from the heterogeneity of the underlying data from multiple data owners. On the basis of secret sharing, the system implements a secure multi-party operator library supporting basic secure multi-party operations, and the result reconstruction process of the operators is optimized for higher execution efficiency. On this basis, the system supports query operations such as summation (SUM), averaging (AVG), minimization/maximization (MIN/MAX), equi-join, and θ-join, and makes full use of multi-party features to reduce data interactions among data owners and security overhead, thus effectively supporting efficient data sharing. Finally, experiments are conducted on the benchmark dataset TPC-H. The experimental results show that the system can support more data owners than the current data federation systems SMCQL and Conclave and has higher execution efficiency in a variety of query operations, exceeding the existing systems by as much as 3.75 times.
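
How a SUM query can be answered on top of secret sharing is sketched below with plain additive sharing. This is a textbook construction, not the system's operator library; the modulus, the function names, and the assumption of honest-but-curious parties holding non-negative integers smaller than the modulus are all illustrative choices.

```python
import secrets

MODULUS = 2**61 - 1  # assumed: every private value and the total are below this bound


def share(value: int, num_parties: int):
    """Split value into num_parties random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate a federated SUM in which no owner reveals its own value."""
    n = len(private_values)
    # Owner i splits its value and sends share j to owner j.
    distributed = [share(v, n) for v in private_values]
    # Owner j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Reconstruction: the published partial sums add up to the true total.
    return sum(partial_sums) % MODULUS


print(secure_sum([10, 20, 30]))  # prints 60
```

Each individual share is uniformly random, so a single partial sum reveals nothing about any one owner's input; only the final reconstruction exposes the aggregate.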

12.
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can be also effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. 
The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively.
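
The difference between a simple join and a semijoin-based join can be sketched for one batch of tuples. This is a one-shot, two-site simplification with hypothetical names; the paper's algorithms run continuously over windows and handle network delay, none of which is modeled here.

```python
def semijoin_based_join(local_tuples, remote_tuples, key):
    """Join two batches while shipping only the remote tuples that can match."""
    # Step 1: the processing site sends only its join-key values (partial tuples).
    local_keys = {t[key] for t in local_tuples}
    # Step 2: the remote site applies the semijoin and ships matching tuples only.
    shipped = [t for t in remote_tuples if t[key] in local_keys]
    # Step 3: the processing site completes the join on the shipped tuples.
    by_key = {}
    for t in shipped:
        by_key.setdefault(t[key], []).append(t)
    result = [{**left, **right}
              for left in local_tuples
              for right in by_key.get(left[key], [])]
    return result, len(shipped)


local = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
remote = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 2, "b": "r"}]
joined, shipped_count = semijoin_based_join(local, remote, "id")
print(joined, shipped_count)
```

A simple join would ship all three remote tuples; here only the two with `id` 2 cross the network, at the price of first sending the key set in the other direction.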

13.
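Existing single-server Skyline query algorithms are ill-suited to distributed multi-hop self-organizing networks such as wireless sensor networks. The cluster-based Skyline query algorithm is proposed for this specific kind of network structure. The algorithm adopts a cluster-based routing structure and, to reduce the communication overhead of sensor nodes during Skyline query processing, selects the data tuple with the greatest dominating power as a global filter tuple to filter out data that cannot satisfy the Skyline condition. A sliding-window mechanism is also introduced into Skyline query processing, which further reduces communication overhead. Extensive simulation results show that the proposed Skyline query algorithm achieves good performance while keeping energy consumption in check.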
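
The filter-tuple idea can be illustrated as follows. The sketch assumes that smaller values are better in every dimension and that "dominating power" is measured by the volume of the region a tuple dominates up to a known upper bound; the abstract defines neither, and all function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pick_filter(points, upper):
    """Choose the tuple with the largest dominated volume (assumed measure of power)."""
    def volume(p):
        v = 1.0
        for x, u in zip(p, upper):
            v *= (u - x)
        return v
    return max(points, key=volume)


def prune(points, filter_tuple):
    """Drop tuples the filter dominates before they are sent up the routing tree."""
    return [p for p in points if not dominates(filter_tuple, p)]


readings = [(1, 5), (2, 2), (4, 4), (5, 1), (3, 3)]
f = pick_filter(readings, upper=(6, 6))
print(f, prune(readings, f), skyline(readings))
```

In a sensor network the filter tuple would be broadcast to the clusters so that dominated readings are discarded locally instead of being transmitted.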

14.
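To address the inefficiency of existing public key infrastructure cross-domain authentication schemes, a cross-domain authentication scheme based on a consortium blockchain is proposed, exploiting blockchain's distributed and tamper-resistant nature. On the one hand, the scheme extends the traditional practical Byzantine fault tolerance (PBFT) consensus algorithm for the consortium chain: it adds dynamic joining and removal of nodes, improves the primary node election method, and reduces the three-phase broadcast to a two-phase broadcast, cutting communication overhead. On the other hand, the scheme designs a consortium-chain cross-domain authentication protocol, gives the blockchain certificate format, describes the cross-domain authentication protocol, and analyzes its security and efficiency. The analysis shows that, in terms of security, the scheme has properties such as resistance to distributed attacks; in terms of efficiency, it has advantages in both computation and communication overhead over existing cross-domain authentication schemes.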

15.
Massive XML data are increasingly generated for the representation, storage and exchange of web information. Twig query processing over massive XML data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some of the existing distributed algorithms generate a lot of useless intermediate results and execute many join operations of partial results in most cases; others require a priori knowledge of the query pattern before XML partitioning, storage and query processing, which is impractical in the cases of large-scale data or frequent incoming new queries. To improve efficiency and scalability, in this paper, we propose a 3-phase distributed algorithm DisT3 based on a node distribution mechanism to avoid unnecessary intermediate results. Furthermore, we propose a lightweight local index ReP with an enhanced XML partitioning approach using an arbitrary partitioning strategy, and based on ReP we propose an improved 2-phase distributed algorithm DisT2ReP to further reduce the communication cost. After the performance guarantees are analyzed, extensive experiments are conducted to verify the efficiency and scalability of our proposed algorithms in distributed twig query applications.

16.
With the prevalence of cloud computing, data owners are motivated to outsource their databases to the cloud server. However, to preserve data privacy, sensitive private data have to be encrypted before outsourcing, which makes data utilization a very challenging task. Existing work either focuses on keyword searches and single-dimensional range queries, or suffers from inadequate security guarantees and inefficiency. In this paper, we consider the problem of multidimensional private range queries over encrypted cloud data. To solve the problem, we systematically establish a set of privacy requirements for multidimensional private range queries, and propose a multidimensional private range query (MPRQ) framework based on private block retrieval (PBR), in which data owners keep the query private from the cloud server. To achieve both efficiency and privacy goals, we present an efficient and fully privacy-preserving private range query (PPRQ) protocol by using batch codes and a multiplication-avoiding technique. To the best of our knowledge, PPRQ is the first to protect the query, access pattern and single-dimensional privacy simultaneously while achieving efficient range queries. Moreover, PPRQ is cryptographically secure against semi-honest adversaries. Experiments on real-world datasets show that the computation and communication overhead of PPRQ is modest.

17.
Aiming at the problem of top-k spatial join query processing in cloud computing systems, a Spark-based top-k spatial join (STKSJ) query processing algorithm is proposed. In this algorithm, the whole data space is divided into grid cells of the same size by a grid partitioning method, and each spatial object in one data set is projected into a grid cell. The Minimum Bounding Rectangle (MBR) of all spatial objects in each grid cell is computed. The spatial objects overlapping with these MBRs in another spatial data set are replicated to the corresponding grid cells, thereby filtering out spatial objects for which there are no join results, thus reducing the cost of subsequent spatial join processing. An improved plane sweeping algorithm is also proposed that speeds up the scanning mode and applies threshold filtering, thus greatly reducing the communication and computation costs of intermediate join results in subsequent top-k aggregation operations. Experimental results on synthetic and real data sets show that the proposed algorithm has clear advantages, and better performance than existing top-k spatial join query processing algorithms.

18.
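The chosen-ciphertext security model effectively captures active attacks and is closer to real-world conditions. Existing cryptographic algorithms that resist chosen-ciphertext attacks are mostly foreign designs, and there is a lack of domestically designed algorithms in China that resist such attacks. Although generic transformations for achieving chosen-ciphertext security exist, they come at the cost of increased computation and communication overhead. Based on the Chinese national standard SM9 identity-based encryption algorithm, an identity-based broadcast encryption scheme with chosen-ciphertext security is proposed. The design inherits the structure of the SM9 identity-based encryption algorithm, and the sizes of the user key and the ciphertext are both constant: the user key consists of one group element and the ciphertext of three elements, independent of the number of receivers involved in encryption. With the help of a random oracle, the scheme is proven CCA-secure under the GDDHE hardness assumption. The encryption algorithm introduces a dummy identity, which makes it possible to answer decryption queries successfully and thus achieve CCA security. Analysis shows that the proposed scheme is comparable to existing efficient identity-based broadcast encryption schemes in computational and storage efficiency.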

19.
Data warehouses are very large databases usually designed using the star schema. Queries defined on data warehouses are generally complex due to join operations involved. The performance of star schema queries in data warehouses is highly critical and its optimization is hard in general. Several query performance optimization methods exist, such as indexes and table partitioning. In this paper, we propose a new approach based on binary particle swarm optimization for solving the bitmap join index selection problem in data warehouses. This approach selects the optimal set of bitmap join indexes based on a mathematical cost model. Several experiments are performed to demonstrate the effectiveness of the proposed method on the bitmap join index selection problem. Further testing of the method is performed using a database environment specific cost function. The binary particle swarm optimization is found to be more effective than both the genetic algorithm and data mining based approaches.
