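Feature extraction algorithm for text plagiarism detection
Cite this article: Jin Biao, Zhao Mengmeng, Wu Guohua. Feature extraction algorithm for text plagiarism detection[J]. Application Research of Computers, 2018, 35(9).
Authors: Jin Biao, Zhao Mengmeng, Wu Guohua
Affiliations: National Secrecy Science and Technology Evaluation Center; School of Computer Science, Hangzhou Dianzi University; School of Cyberspace Security, Hangzhou Dianzi University
Funding: Secrecy Research Project of the National Administration for the Protection of State Secrets (BMKY2016AT02); supported by the Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education.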
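Abstract: Feature extraction is a key step in text plagiarism detection; the number and quality of text features strongly affect detection accuracy. To address the shortcomings of existing methods, this paper proposes a text plagiarism detection algorithm based on dependency syntax. Building on dependency parsing, the algorithm analyzes the relations between the words of a sentence and merges short words to build a syntactic frame, from which text features are then extracted. Merging short words lets words that carry no meaning on their own combine into meaningful entities that represent text features, which makes the features more comprehensive. Experimental results show that the feature extraction algorithm accurately selects the feature set of a text, resolves the problem of an excessive number of text features, and also improves detection accuracy.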

Keywords: text feature extraction; plagiarism detection; dependency syntax; syntactic frame
Received: 2017/5/15
Revised: 2018/8/6

Feature extraction algorithm for text plagiarism detection
Jin Biao, Zhao Mengmeng, Wu Guohua. Feature extraction algorithm for text plagiarism detection[J]. Application Research of Computers, 2018, 35(9).
Authors: Jin Biao, Zhao Mengmeng, Wu Guohua
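Affiliation: National Secrecy Science and Technology Evaluation Center; School of Computer Science, Hangzhou Dianzi University; School of Cyberspace Security, Hangzhou Dianzi University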
Abstract: Feature extraction is an important part of text plagiarism detection, and the quantity and quality of text features strongly affect detection accuracy. To address the shortcomings of existing methods, this paper proposes a text plagiarism detection algorithm based on dependency syntax. Building on dependency parsing, the algorithm establishes a syntactic frame by analyzing the dependency relations between the words of a sentence and merging short words, and then extracts text features from that frame. Merging short words turns words that are meaningless on their own into meaningful entities that represent text features, which makes the features more comprehensive. Experimental results show that the proposed feature extraction algorithm accurately selects text features and improves detection accuracy.
Keywords:feature extraction  plagiarism detection  dependency syntax  syntactic frame  
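
The abstract gives only an outline of the method, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a dependency parse is already available as a list of hypothetical `Token` records, treats any token of at most `max_len` characters as a "short word", folds each short word into its nearest longer ancestor, and emits (head unit, relation, dependent unit) triples as features. The `max_len` threshold, the triple format, and the Jaccard comparison are all illustrative assumptions that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int   # 1-based position in the sentence
    text: str  # surface form
    head: int  # idx of the governing token; 0 marks the root
    rel: str   # dependency relation to the head


def extract_features(tokens, max_len=3):
    """Return a set of (head unit, relation, dependent unit) triples.

    Assumption: tokens of at most max_len characters count as short words
    and are folded into their nearest longer ancestor in the tree.
    """
    by_idx = {t.idx: t for t in tokens}

    def anchor(i):
        # Climb the tree until a token that is long enough, or the root.
        t = by_idx[i]
        while t.head != 0 and len(t.text) <= max_len:
            t = by_idx[t.head]
        return t.idx

    # Group every token under its anchor, keeping sentence order.
    members = {}
    for t in sorted(tokens, key=lambda t: t.idx):
        members.setdefault(anchor(t.idx), []).append(t.text.lower())
    unit = {i: " ".join(words) for i, words in members.items()}

    # One triple per merged unit that has a governing unit.
    features = set()
    for i in unit:
        t = by_idx[i]
        if t.head != 0:
            features.add((unit[anchor(t.head)], t.rel, unit[i]))
    return features


def jaccard(a, b):
    """Set overlap, used here only as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hand-written parses for illustration; a real system would use a parser.
source = [
    Token(1, "The", 2, "det"), Token(2, "student", 3, "nsubj"),
    Token(3, "copied", 0, "root"), Token(4, "a", 5, "det"),
    Token(5, "paragraph", 3, "obj"), Token(6, "from", 8, "case"),
    Token(7, "the", 8, "det"), Token(8, "paper", 3, "obl"),
]
suspect = [
    Token(1, "The", 2, "det"), Token(2, "student", 3, "nsubj"),
    Token(3, "copied", 0, "root"), Token(4, "a", 5, "det"),
    Token(5, "paragraph", 3, "obj"),
]

for triple in sorted(extract_features(source)):
    print(triple)
print(jaccard(extract_features(source), extract_features(suspect)))  # 0.5
```

In this toy example the determiners never become features of their own: "The" is absorbed into "the student" and "a" into "a paragraph", which is one way to read the abstract's claim that merging short words reduces the number of features while keeping them meaningful.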