
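Research on Recognition Methods for Variant Forms of Chinese Sensitive Words
Cite this article: Fu Cong, Yu Dunhui, Zhang Lingli. Research on recognition methods for variant forms of Chinese sensitive words[J]. Application Research of Computers, 2019, 36(4).
Authors: Fu Cong, Yu Dunhui, Zhang Lingli
Affiliations: School of Computer Science and Information Engineering, Hubei University, Wuhan 430062; Hubei Province Engineering Technology Center for Education Informatization, Wuhan 430062
Funding: National "973" Program (2014CB340404); National Natural Science Foundation of China (61373037, 61672387)
Abstract: Purifying the online environment requires reviewing network information. Recognizing the sensitive words that this information contains, and especially the variant forms of Chinese sensitive words, has become an urgent problem. By analyzing features of Chinese characters such as their structure and pronunciation, this paper proposes a method for recognizing variant forms of Chinese sensitive words. The method targets three kinds of variants: a word written in pinyin, a word replaced by its abbreviation, and a word whose characters are split into components. For these it designs, respectively, a sensitive word recognition algorithm based on confusable pinyin grouping (SPGR), a string abbreviation recognition algorithm (SNR), and a KMP-based Chinese character split recognition algorithm (WS-KMP), which effectively improve the accuracy and efficiency of sensitive word review. Experimental results show that the method achieves relatively high recall and precision when recognizing variant forms of Chinese sensitive words.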

Keywords: variant forms; sensitive word recognition; edit distance; KMP algorithm
Received: 2017/11/11
Revised: 2019/3/2
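
The confusable-pinyin grouping idea behind SPGR, together with the edit distance listed among the keywords, can be illustrated with a minimal Python sketch. It is not the paper's SPGR algorithm, whose details the abstract does not give. The groups in `INITIAL_GROUPS` and `FINAL_GROUPS`, the function names, and the sample syllables are all assumptions chosen for illustration. Each syllable is mapped to a canonical representative of its confusable group, and two syllable sequences are then compared with an ordinary Levenshtein distance.

```python
# Hypothetical confusable groups (illustrative only, not taken from the paper).
# Each pair maps a spelling to the canonical form of its group.
INITIAL_GROUPS = [("zh", "z"), ("ch", "c"), ("sh", "s")]      # retroflex vs. flat initials
FINAL_GROUPS = [("ang", "an"), ("eng", "en"), ("ing", "in")]  # back vs. front nasal finals


def normalize(syllable: str) -> str:
    """Map one pinyin syllable to the canonical form of its confusable group."""
    s = syllable.lower()
    for variant, canonical in INITIAL_GROUPS:
        if s.startswith(variant):
            s = canonical + s[len(variant):]
            break
    for variant, canonical in FINAL_GROUPS:
        if s.endswith(variant):
            s = s[: -len(variant)] + canonical
            break
    return s


def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def pinyin_distance(candidate, target) -> int:
    """Edit distance between two syllable lists after group normalization."""
    return edit_distance([normalize(s) for s in candidate],
                         [normalize(s) for s in target])


# "ming gang" falls into the same groups as "min gan", so the distance is 0.
print(pinyin_distance(["ming", "gang"], ["min", "gan"]))
```

A candidate would then be flagged when `pinyin_distance` falls at or below some threshold. The threshold, like the groups themselves, would have to be tuned on real data.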

Study on identification method for change form of Chinese sensitive words
Fu Cong, Yu Dunhui, Zhang Lingli. Study on identification method for change form of Chinese sensitive words[J]. Application Research of Computers, 2019, 36(4).
Authors: Fu Cong, Yu Dunhui, Zhang Lingli
Affiliation: Hubei University
Abstract: To purify the network environment, network information needs to be reviewed. Recognizing the sensitive words in this information, especially the variant (changed) forms of Chinese sensitive words, is an urgent problem. By analyzing the structure and pronunciation of Chinese characters, this paper proposes a method for recognizing variant forms of Chinese sensitive words. For three kinds of variants, namely the pinyin of a word, the abbreviation of a word, and the split of a word, the method designs a sensitive word recognition algorithm based on the grouping of confusable pinyin, a string abbreviation recognition algorithm, and a KMP-based character split recognition algorithm, respectively, improving the accuracy and efficiency of the review. The experimental results show that the proposed method achieves relatively high recall and precision when recognizing variant forms of Chinese sensitive words.
Keywords:change form  sensitive word recognition  edit distance  KMP algorithm
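
The KMP-based character split recognition mentioned in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's own algorithm: the split table `SPLIT_TABLE`, the function names, and the sample word and text are hypothetical. The sketch expands each character of a target word into its components and then searches the text for the expanded string with a standard Knuth-Morris-Pratt (KMP) matcher.

```python
# Hypothetical split table (illustrative only): character -> its side-by-side components.
SPLIT_TABLE = {"明": "日月", "好": "女子"}


def build_failure(pattern: str) -> list:
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: str, pattern: str) -> list:
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep scanning so overlapping matches are found
    return hits


def expand_split(word: str, table: dict) -> str:
    """Replace each character by its components when the table has an entry."""
    return "".join(table.get(ch, ch) for ch in word)


def find_split_variant(text: str, word: str, table: dict) -> list:
    """Find positions where the fully split form of `word` occurs in `text`."""
    return kmp_find_all(text, expand_split(word, table))


# The placeholder word "明好" expands to "日月女子", which starts at index 2 here.
print(find_split_variant("这是日月女子的例子", "明好", SPLIT_TABLE))
```

This sketch only matches the form in which every listed character is split. Handling partially split words would mean generating several expanded patterns per word, which is omitted here.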
This article is indexed in Wanfang Data (万方数据) and other databases.