首页 | 本学科首页   官方微博 | 高级检索  
     

基于黑盒水印的NLP神经网络版权保护
引用本文:代龙,张静,樊雪峰,周晓谊. 基于黑盒水印的NLP神经网络版权保护[J]. 网络与信息安全学报, 2023, 9(1): 140-149. DOI: 10.11959/j.issn.2096-109x.2023009
作者姓名:代龙  张静  樊雪峰  周晓谊
作者单位:海南大学网络空间安全学院,海南 海口 570100
基金项目:海南省重点研发计划(ZDYF2022GXTS224)
摘    要:随着自然语言处理(NLP,natural language processing)技术的快速发展,语言模型在文本分类和情感分析中的应用不断增加。然而,语言模型容易遭到盗版再分发,对模型所有者的知识产权造成严重威胁。因此,研究者着手设计保护机制来识别语言模型的版权信息。现有的适用于文本分类任务的语言模型水印无法与所有者身份相关联,且鲁棒性不足以及无法再生成触发集。为了解决这些问题,提出一种新的适用于文本分类任务模型的黑盒水印方案,可以远程快速验证模型所有权。将模型所有者的版权消息和密钥通过密钥相关的哈希运算消息认证码(HMAC,hash-based message authentication code)得到版权消息摘要,由HMAC得到的消息摘要可以防止被伪造,具有很强的安全性。从原始训练集各个类别中随机挑选一定的文本数据,将摘要与文本数据结合构建触发集,并在训练过程中对语言模型嵌入水印。为了评估水印的性能,在IMDB电影评论、CNEWS中文新闻文本分类数据集上对3种常见的语言模型嵌入水印。实验结果表明,在不影响原始模型测试精度的情况下,所提出的水印验证方案的准确率可以达到 100%。即使在模型微调和剪枝等常见攻击下,也能表现出较强的鲁棒性,并且具有抗伪造攻击的能力。同时,水印的嵌入不会影响模型的收敛时间,具有较高的嵌入效率。

关 键 词:自然语言处理  文本分类  版权保护  语言模型  黑盒水印  

NLP neural network copyright protection based on black box watermark
Long DAI,Jing ZHANG,Xuefeng FAN,Xiaoyi ZHOU. NLP neural network copyright protection based on black box watermark[J]. Chinese Journal of Network and Information Security, 2023, 9(1): 140-149. DOI: 10.11959/j.issn.2096-109x.2023009
Authors:Long DAI  Jing ZHANG  Xuefeng FAN  Xiaoyi ZHOU
Affiliation:School of Cyberspace Security, Hainan University, Haikou 570100, China
Abstract:With the rapid development of natural language processing techniques, the use of language models in text classification and sentiment analysis has been increasing.However, language models are susceptible to piracy and redistribution by adversaries, posing a serious threat to the intellectual property of model owners.Therefore, researchers have been working on designing protection mechanisms to identify the copyright information of language models.However, existing watermarking of language models for text classification tasks cannot be associated with the owner’s identity, and they are not robust enough and cannot regenerate trigger sets.To solve these problems, a new model, namely black-box watermarking scheme for text classification tasks, was proposed.It was a scheme that can remotely and quickly verify model ownership.The copyright message and the key of the model owner were obtained through the Hash-based Message Authentication Code (HMAC), and the message digest obtained by HMAC can prevent forgery and had high security.A certain amount of text data was randomly selected from each category of the original training set and the digest was combined with the text data to construct the trigger set, then the watermark was embedded on the language model during the training process.To evaluate the performance of the proposed scheme, watermarks were embedded on three common language models on the IMDB’s movie reviews and CNews text classification datasets.The experimental results show that the accuracy of the proposed watermarking verification scheme can reach 100% without affecting the original model.Even under common attacks such as model fine-tuning and pruning, the proposed watermarking scheme shows strong robustness and resistance to forgery attacks.Meanwhile, the embedding of the watermark does not affect the convergence time of the model and has high embedding efficiency.
Keywords:natural language processing  text classification  copyright protection  language model  black box watermarking  
点击此处可从《网络与信息安全学报》浏览原始摘要信息
点击此处可从《网络与信息安全学报》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号