Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
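Privacy-preserving federated learning frameworks that operate with an untrusted central server face two problems. (1) When the central server aggregates the distributed models, it uses fixed weights, typically each participant's dataset size. However, participants hold non-independent and identically distributed (non-IID) data, so fixed aggregation weights keep the global model from reaching its best utility. (2) Existing frameworks assume that the central server is honest and do not consider the leakage of participants' private data that an untrusted central server can cause. To address these problems, this work builds on the widely used DP-FedAvg algorithm and proposes DP-DFL, a privacy-preserving federated learning framework with dynamic aggregation weights for the untrusted-server setting. DP-DFL learns the model aggregation weights directly from the participants' data, which makes it suitable for non-IID data. In addition, noise is injected during the local model privacy-protection stage to protect the model parameters, which satisfies the untrusted-server setting and reduces the risk of privacy leakage when local participants upload their model parameters. Experiments on the CIFAR-10 dataset show that DP-DFL provides a local privacy guarantee while achieving higher accuracy: its average model accuracy is 2.09% higher than that of DP-FedAvg.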
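The local-noise and weighted-aggregation steps described in this abstract can be illustrated with a minimal sketch. It is not the DP-DFL implementation: the function names, the clipping bound `max_norm`, and the noise level `noise_std` are assumptions made for illustration, the aggregation weights are passed in by hand rather than learned from data, and the calibration of the noise to a concrete privacy budget is not shown.

```python
import math
import random


def clip(update, max_norm):
    """Scale the update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, max_norm, noise_std, rng):
    """Clip on the client, then add Gaussian noise before uploading.

    Because the noise is added locally, the server only ever sees
    the perturbed update (the untrusted-server assumption).
    """
    clipped = clip(update, max_norm)
    return [x + rng.gauss(0.0, noise_std) for x in clipped]


def aggregate(updates, weights):
    """Weighted average of client updates; weights are normalized here.

    Hypothetical weights: a fixed scheme would use dataset sizes,
    whereas a dynamic scheme would supply learned values instead.
    """
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    client_updates = [[3.0, 4.0], [0.5, -0.5], [1.0, 1.0]]
    noisy = [privatize(u, max_norm=1.0, noise_std=0.1, rng=rng) for u in client_updates]
    print(aggregate(noisy, weights=[0.2, 0.5, 0.3]))
```

Swapping the `weights` argument is the only change needed to compare fixed, size-based weights with any other weighting scheme in this sketch.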

2.
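刘艺璇, 陈红, 刘宇涵, 李翠平. 《软件学报》 (Journal of Software), 2022, 33(3): 1057-1092
Federated learning is a mechanism for coordinating multiple participants to train a model jointly; it has emerged with the era of big data and the development of artificial intelligence. It allows each participant to keep its data local, breaking down data silos while preserving participants' control over their data. However, federated learning introduces a large amount of parameter exchange. It is therefore exposed not only to threats from model users, as centralized training is, but also to attacks from untrusted participating devices, so stronger privacy mechanisms are urgently needed to protect each party's ...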

3.
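In recent years, federated learning has become a new approach to the problems of data silos and privacy leakage in machine learning. A federated learning architecture does not require the parties to share their data. Each participant trains a local model on its own data and periodically uploads parameters to a server to update the global model, which yields a machine learning model built on large-scale global data. Because the architecture protects data privacy by design, it is a promising scheme for future large-scale machine learning. However, its parameter-exchange mechanism can itself leak private data, and strengthening the privacy protection mechanisms of federated learning has become a new research focus. Starting from the privacy leakage problems in federated learning, this survey discusses the attack models and the channels through which sensitive information leaks, and reviews several classes of privacy protection techniques: those based on differential privacy, those based on homomorphic encryption, and those based on secure multi-party computation (SMC). Finally, it discusses several key issues in privacy protection for federated learning and outlines future research directions.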
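As one concrete instance of the SMC-based techniques this survey covers, the sketch below shows additive secret sharing used to sum participants' values without revealing any single value. It is an illustration under stated assumptions, not a method taken from the survey: the modulus, the function names, and the use of integer (for example, quantized) parameters are all hypothetical choices, and Python's `random` module stands in for a cryptographically secure generator such as `secrets`.

```python
import random

# Hypothetical modulus; all arithmetic is done modulo this prime.
MODULUS = 2**61 - 1


def share(value, n_parties, rng):
    """Split an integer into n_parties additive shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by adding all shares mod MODULUS."""
    return sum(shares) % MODULUS


def secure_sum(values, rng):
    """Each party shares its value; party j adds up the j-th shares it receives.

    Only the partial sums are combined, so no party sees another
    party's raw value, yet the total is recovered exactly.
    """
    n = len(values)
    all_shares = [share(v, n, rng) for v in values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    return reconstruct(partial_sums)


if __name__ == "__main__":
    rng = random.Random(1)  # not secure; for a reproducible demo only
    print(secure_sum([3, 5, 10], rng))
```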

4.
5.
Traditional data-driven energy consumption forecasting models, including machine learning and deep learning methods, have shown outstanding forecasting accuracy and efficiency. This performance depends on having enough training samples. Moreover, a derived forecasting model applies only to its training dataset and is usually applied to a specific household. In real-world smart city development, a centralized forecasting model is required to model and forecast energy consumption patterns for multiple households, and traditional data-driven forecasting approaches may become invalid. A consistent model that captures multiple households' energy consumption patterns is needed in this scenario. Privacy is also a major concern in such scenarios. Accurate energy consumption forecasting with privacy preservation has therefore become a key point of state-of-the-art research. In this study, we adopt an innovative privacy-preserving structure that combines deep learning and federated learning. While guaranteeing forecasting accuracy and privacy preservation, this structure forecasts the energy consumption of various households with a consistent model that simultaneously forecasts multiple households' energy consumption data, exchanged over the Transmission Control Protocol (TCP).

6.
The Internet of Things (IoT) plays a crucial role in the design of smart environments. Security and privacy are the major challenges in designing IoT-enabled real-time environments. Security weaknesses in IoT-based systems pose threats that affect smart environment applications. Intrusion detection systems (IDS) can be used in IoT environments to mitigate IoT-related attacks that exploit a few security vulnerabilities. This paper introduces a modified garden balsan optimization-based machine learning model for intrusion detection (MGBO-MLID) in the IoT cloud environment. The presented MGBO-MLID technique focuses on the identification and classification of intrusions in the IoT cloud environment. Initially, the MGBO-MLID model applies min-max normalization to scale the features into a uniform format. In addition, the model uses the MGBO algorithm to choose the optimal subset of features. Moreover, an attention-based bidirectional long short-term memory (ABiLSTM) method is used for the detection and classification of intrusions. Finally, the Aquila optimization (AO) algorithm is applied as a hyperparameter optimizer to fine-tune the ABiLSTM method. The MGBO-MLID method is validated experimentally on a benchmark dataset. An extensive comparative study reports that the MGBO-MLID algorithm outperforms recent approaches.
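The min-max normalization step mentioned in this abstract maps each feature to the range [0, 1] via x' = (x - min) / (max - min). A minimal sketch follows; the function name and the choice to map a constant column to zeros are assumptions, since the abstract does not say how that edge case is handled.

```python
def min_max_normalize(column):
    """Rescale one feature column to [0, 1] using its own min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumed convention: a constant feature carries no information,
        # so map it to zeros instead of dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


if __name__ == "__main__":
    packet_sizes = [2.0, 4.0, 6.0]  # hypothetical feature values
    print(min_max_normalize(packet_sizes))
```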

7.
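Existing encrypted traffic detection techniques lack privacy protection for data and models, which not only violates privacy protection laws and regulations but also leads to serious leakage of sensitive information. This work studies an encrypted traffic detection model based on the gradient boosting decision tree (GBDT) algorithm and, combining it with differential privacy, designs and implements a privacy-preserving encrypted traffic detection system. Malicious traffic from DDoS attacks and port scans is detected on the CICIDS2017 dataset, and the system performance...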

8.
Machine learning is now widely applied in fields such as e-commerce, mobile data processing, health analytics, and behavioral analytics. Word vector training is commonly deployed in machine learning to provide a model architecture and optimization, for example to learn word embeddings from large datasets. Training word vectors requires many datasets, some of which might contain private and sensitive information, and the training phase can expose both the trained model and the user datasets. Offering usable, plausible, and personalized alternatives to users therefore usually entails a breach of their privacy. For instance, user data might contain faces, irises, and personal identities, which poses a serious problem for machine learning. In this article, we investigate the problem of training high-quality word vectors on encrypted datasets by using privacy-preserving learning algorithms. First, we use a pseudo-random function to generate a statistical token for each word to help build the vocabulary of the word vector. Then we employ functional inner-product encryption to compute the inner product needed by the activation function securely. Finally, we use the BGN cryptosystem to encrypt and hide the sensitive datasets, and perform homomorphic operations over the ciphertexts to carry out the training procedure. To preserve privacy in word vector training, we propose four privacy-preserving machine learning schemes. We analyze the security and efficiency of our protocols and report numerical experiments. Compared with existing solutions, our scheme provides higher efficiency and lower communication overhead.
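The first step of this scheme, deriving a token for each word with a pseudo-random function and counting tokens to build the vocabulary, can be sketched as follows. The abstract does not name the pseudo-random function, so HMAC-SHA256 is an assumed stand-in, and the function names and the key are hypothetical.

```python
import hashlib
import hmac


def word_token(key: bytes, word: str) -> str:
    """Derive a deterministic token for a word with a keyed PRF.

    HMAC-SHA256 is an assumption here; without the key, a token
    does not reveal which word produced it.
    """
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def build_vocabulary(key: bytes, words):
    """Count token frequencies, so the vocabulary is built over tokens, not words."""
    vocab = {}
    for word in words:
        token = word_token(key, word)
        vocab[token] = vocab.get(token, 0) + 1
    return vocab


if __name__ == "__main__":
    secret_key = b"demo-key-not-for-production"  # hypothetical key
    corpus = ["privacy", "model", "privacy"]
    for token, count in build_vocabulary(secret_key, corpus).items():
        print(token[:16], count)
```

Because the same word always maps to the same token under a fixed key, word frequencies survive tokenization, which is what makes the tokens usable for building a vocabulary.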

9.

Code smell detection is essential to improve software quality, enhance software maintainability, and decrease the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning model's predictions and improve interpretability. The datasets obtained from Fontana et al. were restructured and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) perform better than kernel-based and network-based algorithms. Feature selection based on a genetic algorithm improves the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on grid search significantly improves the accuracy of all these algorithms. Finally, machine learning techniques have high potential for predicting code smells, which contributes to detecting these smells and improving software quality.


10.
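A survey of privacy protection research and methods for the Internet of Things   Total citations: 3, self-citations: 1, citations by others: 2
By establishing an architecture for the Internet of Things (IoT), this survey analyzes in detail the privacy and security threats faced by the perception layer and the processing layer of that architecture. It systematically reviews existing privacy protection methods related to IoT technology, focusing on the basic principles and characteristics of anonymization methods, encryption techniques, and routing-protocol methods, and on that basis points out future research directions for IoT privacy protection.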

11.
Internet of Things (IoT) devices now produce massive quantities of data from distinct sources, and these data are transmitted over public networks. Cybersecurity has become a challenging issue in the IoT environment, where existing cyber threats need to be resolved. Automated tools for cyber threat detection and classification based on machine learning (ML) and artificial intelligence (AI) have become essential for securing the IoT environment and for minimizing the security issues of IoT devices effectively. Therefore, this article introduces a new model that combines Mayfly optimization (MFO) with a regularized extreme learning machine (RELM), named MFO-RELM, for cybersecurity threat detection and classification in the IoT environment. The presented MFO-RELM technique identifies cybersecurity threats in the IoT environment effectively. To do so, the MFO-RELM model first pre-processes the raw IoT data into a meaningful format. The RELM model then receives the pre-processed data and carries out the classification. To boost the performance of the RELM model, the MFO algorithm is applied to it. The MFO-RELM model is validated on standard datasets, and the results show that it outperforms other models under several measures.

12.
In current software defect prediction (SDP) research, most previous empirical studies use only datasets provided by the PROMISE repository, which may threaten the external validity of their results. Instead of sharing SDP datasets, sharing SDP models is a potential way to alleviate this problem and can encourage researchers and industrial practitioners to share more models. However, directly sharing models may result in privacy disclosure, for example through a model inversion attack. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing, and we propose a novel method, DP-Share, since DP mechanisms can prevent this attack when the privacy budget is carefully selected. In particular, DP-Share first preprocesses the dataset, for example by over-sampling minority instances (i.e., defective modules) and discretizing continuous features to optimize privacy budget allocation. Then, it uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees based on these training sets, and these decision trees form a random forest (i.e., the model). The last phase of DP-Share uses the Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects. We use AUC (area under the ROC curve) as the performance measure and holdout as the model validation technique. After privacy and utility analysis, we find that DP-Share achieves better performance than a baseline method, DF-Enhance, in most cases when using the same privacy budget. Moreover, we provide guidelines for using the proposed method effectively. Our work attempts to fill the research gap in differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and thus advance the state of the art of SDP.
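The Laplace mechanism named in this abstract adds noise with scale sensitivity / epsilon to a numeric query result. The sketch below applies it to a single count. It is not the DP-Share procedure: the function name, the choice of a count query, and the default sensitivity of 1 are assumptions. The noise is drawn as the difference of two exponential samples, which follows a Laplace distribution.

```python
import random


def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return true_count plus Laplace noise of scale sensitivity / epsilon.

    A smaller epsilon (a tighter privacy budget) gives a larger scale
    and therefore a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical query: number of defective modules in one tree leaf.
    for eps in (0.1, 1.0, 10.0):
        print(eps, laplace_count(25, eps, rng))
```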

13.
Software quality engineering comprises several quality assurance activities, such as testing, formal verification, inspection, fault tolerance, and software fault prediction. Many researchers have developed and validated fault prediction models by using machine learning and statistical techniques. Different kinds of software metrics and diverse feature reduction techniques have been used to improve the models' performance. However, these studies did not investigate the effect of dataset size, metrics set, and feature selection techniques on software fault prediction. This study focuses on high-performance fault predictors based on machine learning, such as Random Forests, and on algorithms based on a new computational intelligence approach called Artificial Immune Systems. We used public NASA datasets from the PROMISE repository to make our predictive models repeatable, refutable, and verifiable. The research questions concerned the effects of dataset size, metrics set, and feature selection techniques. To answer these questions, seven test groups were defined. Additionally, nine classifiers were examined for each of the five public NASA datasets. According to this study, Random Forests provides the best prediction performance for large datasets, and Naive Bayes is the best prediction algorithm for small datasets, in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). The parallel implementation of the Artificial Immune Recognition Systems algorithm (AIRS2Parallel) is the best Artificial Immune Systems paradigm-based algorithm when method-level metrics are used.
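The AUC measure used to rank the predictors equals the probability that a randomly chosen faulty module receives a higher score than a randomly chosen non-faulty one, with ties counted as one half. A minimal pairwise sketch of that definition follows; the function name and the 0/1 label encoding are assumptions, and the quadratic loop is chosen for clarity, not speed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical fault labels (1 = faulty) and predicted scores.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```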

14.

The algorithm selection problem is defined as identifying the best-performing machine learning (ML) algorithm for a given combination of dataset, task, and evaluation measure. The human expertise required to evaluate the increasing number of ML algorithms available has resulted in the need to automate the algorithm selection task. Various approaches have emerged to handle the automatic algorithm selection challenge, including meta-learning. Meta-learning is a popular approach that leverages accumulated experience for future learning and typically involves dataset characterization. Existing meta-learning methods often represent a dataset using predefined features and thus cannot be generalized across different ML tasks, or alternatively, learn a dataset’s representation in a supervised manner and therefore are unable to deal with unsupervised tasks. In this study, we propose a novel learning-based task-agnostic method for producing dataset representations. Then, we introduce TRIO, a meta-learning approach, that utilizes the proposed dataset representations to accurately recommend top-performing algorithms for previously unseen datasets. TRIO first learns graphical representations for the datasets, using four tools to learn the latent interactions among dataset instances and then utilizes a graph convolutional neural network technique to extract embedding representations from the graphs obtained. We extensively evaluate the effectiveness of our approach on 337 datasets and 195 ML algorithms, demonstrating that TRIO significantly outperforms state-of-the-art methods for algorithm selection for both supervised (classification and regression) and unsupervised (clustering) tasks.


15.
With the advances of machine learning algorithms and the pervasiveness of network terminals, online medical primary diagnosis, which can provide a primary diagnosis service anywhere and at any time, has attracted considerable interest recently. However, the flourishing of online medical primary diagnosis still faces many challenges, including information security and privacy preservation. In this paper, we propose an efficient and privacy-preserving medical primary diagnosis scheme, called PDiag, based on naive Bayes classification. With PDiag, sensitive personal health information can be processed without privacy disclosure during the online medical primary diagnosis service. Specifically, based on an improved expression for the naive Bayes classifier, an efficient and privacy-preserving classification scheme is introduced with a lightweight polynomial aggregation technique. The encrypted user query is processed directly at the service provider without decryption, and the diagnosis result can be decrypted only by the user. Through extensive analysis, we show that PDiag keeps users' health information and the service provider's prediction model confidential, and has significantly less computation and communication overhead than existing schemes. In addition, performance evaluations of PDiag implemented on a smartphone and a computer demonstrate its effectiveness in a real environment.
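For orientation, the plaintext naive Bayes decision rule that such a scheme protects can be written as a sum of log-probabilities followed by an argmax. The sketch below shows only that unencrypted rule; it does not implement PDiag's encryption or its polynomial aggregation, and the function name, the disease labels, and the probability tables are hypothetical.

```python
import math


def classify(priors, likelihoods, features):
    """Return the label maximizing log P(label) + sum_i log P(feature_i | label).

    priors:      {label: P(label)}
    likelihoods: {label: [ {value: P(value | label)} for each feature ]}
    features:    observed value of each feature, in the same order
    """
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for table, value in zip(likelihoods[label], features):
            score += math.log(table[value])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical model: two symptoms (fever, cough), two diagnoses.
    priors = {"flu": 0.5, "cold": 0.5}
    likelihoods = {
        "flu": [{"yes": 0.9, "no": 0.1}, {"yes": 0.8, "no": 0.2}],
        "cold": [{"yes": 0.2, "no": 0.8}, {"yes": 0.6, "no": 0.4}],
    }
    print(classify(priors, likelihoods, ("yes", "yes")))
```

Working in log space turns the product of probabilities into a sum, which is the kind of additive form that aggregation over encrypted values can operate on.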

16.
Since smartphones embedded with positioning systems and digital maps are widely used, location-based services (LBSs) are rapidly growing in popularity and providing unprecedented convenience in people's daily lives; however, they also cause great concern about privacy leakage. In particular, location queries can be used to infer users' sensitive private information, such as home addresses, places of work, and appointment locations. Hence, many schemes providing query anonymity have been proposed, but they typically ignore the fact that an adversary can infer real locations from the correlations between consecutive locations in a continuous LBS. To address this challenge, a novel dual privacy-preserving scheme (DPPS) is proposed that includes two privacy protection mechanisms. First, to prevent privacy disclosure caused by correlations between locations, a correlation model is proposed based on a hidden Markov model (HMM) to simulate users' mobility and the adversary's prediction probability. Second, to provide query probability anonymity for each single location, an advanced k-anonymity algorithm is proposed to construct cloaking regions, in which realistic and indistinguishable dummy locations are generated. To validate the effectiveness and efficiency of DPPS, theoretical analysis and experimental verification are performed on a real-life dataset published by Microsoft, the GeoLife dataset.

17.
The Internet of Things (IoT) has gained popularity in research because of its large-scale challenges and implementations. Security, however, has become the main concern given the fast growth in its applications and size. It is tedious to set up security systems independently in every IoT device and to upgrade them as new threats appear. Additionally, machine learning (ML) techniques make good use of the colossal volume of data generated by IoT devices, and deep learning (DL) based systems have been modelled for attack detection in IoT. However, current security systems address only a restricted set of attacks and may rely on outdated datasets for evaluation. This study develops an Artificial Algae Optimization Algorithm with Optimal Deep Belief Network (AAA-ODBN) for ransomware detection in an IoT environment. The presented AAA-ODBN technique aims to recognize and categorize ransomware in the IoT environment. It follows a three-stage process: feature selection, classification, and parameter tuning. In the first stage, the AAA-ODBN technique uses an AAA-based feature selection (AAA-FS) technique to select feature subsets. In the second stage, it employs the DBN model for ransomware detection. Finally, the dragonfly algorithm (DFA) is used to tune the hyperparameters of the DBN model. A series of simulations is carried out to demonstrate the improved performance of the AAA-ODBN algorithm. The experimental values indicate that the AAA-ODBN model outperforms the other models.

18.
Record linkage is a process of identifying records that refer to the same real-world entity. Many existing approaches to record linkage apply supervised machine learning techniques to generate a classification model that classifies a pair of records as either match or non-match. The main requirement of such an approach is a labelled training dataset. In many real-world applications no labelled dataset is available hence manual labelling is required to create a sufficiently sized training dataset for a supervised machine learning algorithm. Semi-supervised machine learning techniques, such as self-learning or active learning, which require only a small manually labelled training dataset have been applied to record linkage. These techniques reduce the requirement on the manual labelling of the training dataset. However, they have yet to achieve a level of accuracy similar to that of supervised learning techniques. In this paper we propose a new approach to unsupervised record linkage based on a combination of ensemble learning and enhanced automatic self-learning. In the proposed approach an ensemble of automatic self-learning models is generated with different similarity measure schemes. In order to further improve the automatic self-learning process we incorporate field weighting into the automatic seed selection for each of the self-learning models. We propose an unsupervised diversity measure to ensure that there is high diversity among the selected self-learning models. Finally, we propose to use the contribution ratios of self-learning models to remove those with poor accuracy from the ensemble. We have evaluated our approach on 4 publicly available datasets which are commonly used in the record linkage community. Our experimental results show that our proposed approach has advantages over the state-of-the-art semi-supervised and unsupervised record linkage techniques. 
In 3 out of 4 datasets it also achieves comparable results to those of the supervised approaches.

19.

Continuous authentication modalities collect and utilize users’ sensitive data to authenticate them continuously. Such data contain information about user activities, behaviors, and other demographic information, which causes privacy concerns. In this paper, we propose two privacy-preserving protocols that enable continuous authentication while preventing the disclosure of user-sensitive information to an authentication server. We utilize homomorphic cryptographic primitives that protect the privacy of biometric features with an oblivious transfer protocol that enables privacy-preserving information retrieval. We performed the biometric evaluation of the proposed protocols on two datasets, a swipe gesture dataset and a keystroke dynamics dataset. The biometric evaluation shows that the protocols have very good performance. The execution time of the protocols is measured by considering continuous authentication using: only swipe gestures, keystroke dynamics, and hybrid modalities. The execution time proves the protocols are very efficient, even on high-security levels.


20.
In this paper we develop a local, distributed, privacy-preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning for data compaction and efficient learning by mitigating the curse of dimensionality. Many solutions exist for feature selection when the data are located at a central location. However, the task becomes extremely challenging when the data are distributed across a large number of peers or machines. Centralizing the entire dataset or portions of it can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, the dynamic nature of the data and the network, and privacy concerns. The solution proposed in this paper performs feature selection asynchronously with low communication overhead, and each peer can specify its own privacy constraints. The algorithm works through local interactions among participating nodes. We present results on a real-world dataset to test the performance of the proposed algorithm.
