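Privacy preserving clustering algorithm based on crow search
Cite this article:Xia Xuewei, Zhang Lei, Li Jing, Deng Yukang. Privacy preserving clustering algorithm based on crow search[J]. Application Research of Computers, 2023, 40(12)
Authors:Xia Xuewei  Zhang Lei  Li Jing  Deng Yukang
Affiliation:School of Information and Electronic Technology, Jiamusi University (all four authors)
Funding:Joint Guidance Project of the Natural Science Foundation of Heilongjiang Province (LH2021F054); Outstanding Innovation Team Construction Project of the Basic Scientific Research Operating Expenses of Heilongjiang Provincial Universities (2022-KYYWF-0654); Heilongjiang Province Philosophy and Social Science Research Planning Project (22GLH084); Jiamusi University National Fund Cultivation Project (JMSUGPZR2022-014)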
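Abstract:To address the poor data utility of differentially private K-means clustering, this paper proposes a privacy preserving clustering algorithm based on crow search and the silhouette coefficient (CS-PCA). On the one hand, the algorithm uses the silhouette coefficient to evaluate the clustering quality of every cluster in every iteration, adds a different amount of noise according to that quality, and applies cluster merging to reduce the effect of noise on the clustering. On the other hand, it uses crow search to optimize the selection of initial centroids in differentially private K-means, which keeps the algorithm from becoming trapped in a local optimum. Experimental results show that CS-PCA achieves higher clustering validity and is also applicable to large-scale data. Overall, as the privacy budget increases, the F-measure of CS-PCA is about 0 to 281.3312% higher than that of DP-KCCM and 4.5876% to 470.3704% higher than that of PADC. Under the same privacy budget, the clustering results of CS-PCA are more usable than those of the comparison algorithms in the vast majority of cases.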
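The silhouette-guided noise step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's allocation rule, so the mapping in `laplace_scale` (a higher silhouette gives a larger budget share and therefore less noise), the `floor` parameter, and all function names are assumptions made for illustration only. The sketch also makes no claim of a formal privacy guarantee, since the silhouette here is computed from the raw data.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_silhouettes(points, labels, k):
    """Mean silhouette coefficient of each cluster, each value in [-1, 1]."""
    totals, members = [0.0] * k, [0] * k
    for i, p in enumerate(points):
        # Accumulate distances from p to every other point, grouped by cluster.
        dist_sum, size = [0.0] * k, [0] * k
        for j, q in enumerate(points):
            if i != j:
                dist_sum[labels[j]] += euclidean(p, q)
                size[labels[j]] += 1
        own = labels[i]
        others = [dist_sum[c] / size[c] for c in range(k)
                  if c != own and size[c] > 0]
        if size[own] == 0 or not others:
            s = 0.0  # singleton cluster or no other cluster: use 0 by convention
        else:
            a = dist_sum[own] / size[own]  # cohesion: mean distance inside own cluster
            b = min(others)                # separation: nearest other cluster
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        totals[own] += s
        members[own] += 1
    return [totals[c] / members[c] if members[c] else 0.0 for c in range(k)]


def laplace_scale(silhouette, epsilon, sensitivity, floor=0.1):
    """ASSUMED rule (not from the paper): map silhouette in [-1, 1] to a
    budget share in [floor, 1]; the Laplace scale is sensitivity / budget."""
    share = max(floor, (silhouette + 1.0) / 2.0)
    return sensitivity / (epsilon * share)


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroid(cluster_points, scale, rng):
    """Centroid from a noisy coordinate sum and a noisy count.
    Assumes `cluster_points` is non-empty."""
    dim = len(cluster_points[0])
    # Clip the noisy count so the division stays well defined.
    noisy_count = max(len(cluster_points) + laplace_noise(scale, rng), 1.0)
    return [
        (sum(p[d] for p in cluster_points) + laplace_noise(scale, rng)) / noisy_count
        for d in range(dim)
    ]
```

In use, one would call `cluster_silhouettes` once per iteration, derive a scale per cluster with `laplace_scale`, and pass that scale to `noisy_centroid` when recomputing each centroid. For data normalized to the unit hypercube, a common choice for `sensitivity` is the dimension plus one (sum plus count), which is likewise an assumption here.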

Keywords:crow search   silhouette coefficient   K-means clustering   differential privacy   optimal initial centroid
Received:2023-04-19
Revised:2023-11-10

Privacy preserving clustering algorithm based on crow search
Xia Xuewei,Zhang Lei,Li Jing and Deng Yukang. Privacy preserving clustering algorithm based on crow search[J]. Application Research of Computers, 2023, 40(12)
Authors:Xia Xuewei  Zhang Lei  Li Jing  Deng Yukang
Affiliation:School of Information & Electronic Technology, Jiamusi University
Abstract:Differentially private K-means clustering suffers from poor data utility. This paper proposes a privacy preserving clustering algorithm (CS-PCA) based on crow search and the silhouette coefficient. On the one hand, the algorithm uses the silhouette coefficient to evaluate the clustering quality of each cluster in each iteration, adds a different amount of noise according to that quality, and uses cluster merging to reduce the influence of noise on the clustering. On the other hand, it uses crow search to optimize the selection of initial centroids in differentially private K-means, which prevents the algorithm from falling into a local optimum. The experimental results show that CS-PCA clusters more effectively and is also suitable for large-scale data. Overall, as the privacy budget grows, the F-measure of CS-PCA is 0 to 281.3312% higher than that of DP-KCCM and 4.5876% to 470.3704% higher than that of PADC. With the same privacy budget, CS-PCA outperforms the comparison algorithms in the usability of its clustering results in most cases.
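The use of crow search to choose initial centroids can be sketched as follows. This is a generic crow search loop (each crow either follows another crow's remembered position or jumps to a random one), not the paper's exact procedure. The fitness function (sum of squared errors), the parameter defaults, the in-loop memory update, and every name in the code are assumptions made for illustration.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids)
        for p in points
    )


def crow_search_centroids(points, k, n_crows=10, iters=50,
                          flight_length=2.0, awareness=0.1, seed=0):
    """Search for k initial centroids with low SSE (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def random_position():
        # One candidate solution is a full set of k centroids.
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]

    def in_bounds(pos):
        return all(lo[d] <= cen[d] <= hi[d] for cen in pos for d in range(dim))

    positions = [random_position() for _ in range(n_crows)]
    memory = [[cen[:] for cen in pos] for pos in positions]  # best position seen
    memory_cost = [sse(pos, points) for pos in positions]

    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i picks a random crow j to follow
            if rng.random() >= awareness:
                # Crow j is unaware: move toward (or past) its remembered position.
                r = rng.random()
                candidate = [
                    [x + r * flight_length * (m - x) for x, m in zip(cen, mem)]
                    for cen, mem in zip(positions[i], memory[j])
                ]
            else:
                # Crow j is aware of being followed: crow i is sent somewhere random.
                candidate = random_position()
            if in_bounds(candidate):  # infeasible moves are discarded
                positions[i] = candidate
                cost = sse(candidate, points)
                if cost < memory_cost[i]:
                    memory[i] = [cen[:] for cen in candidate]
                    memory_cost[i] = cost

    best = min(range(n_crows), key=lambda i: memory_cost[i])
    return memory[best], memory_cost[best]
```

The returned centroids would then seed the K-means iterations in place of a purely random initialization. Because each crow's memory only ever improves, the returned cost is never worse than the best of the random starting positions.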
Keywords:crow search   silhouette coefficient   K-means clustering   differential privacy   optimal initial centroid