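Large-scale document forward detection algorithm based on agglomerate-term
Cite this article: ZHANG Jing-yang, ZHANG Hua-ping, LIU Jin-gang. Large-scale document forward detection algorithm based on agglomerate-term[J]. Journal of Computer Applications, 2010, 30(6): 1661-1663.
Authors: ZHANG Jing-yang  ZHANG Hua-ping  LIU Jin-gang
Affiliations: 1. Beijing Zhongke Tianji Information Technology Co., Ltd.; 2. 3. Joint Faculty of Computer Scientific Research, Capital Normal University
Funding: National 863 Program project (2007AA01Z438); 2008 Knowledge Innovation Fund of the Institute of Computing Technology, Chinese Academy of Sciences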
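Abstract: Document forward detection is the task of detecting, within a large-scale text collection, sets of documents whose content is identical or similar. It is widely needed in applications such as hot-topic detection, condensing search-engine results, and plagiarism detection for academic articles. To cope with the increasingly diverse forms in which web text is forwarded, and to further improve the efficiency of practical systems, various text features and comparison algorithms were studied and analyzed, and a large-scale document forward detection algorithm based on agglomerate terms is proposed. According to the distribution properties of words, high-scoring agglomerate terms are identified and extracted to represent each text. The text collection then goes through two operations, extensive linear comparison and multi-dimensional comparison, which finally filter out the forward detection results. Comparative experiments show that the algorithm achieves high overall performance in precision, recall, and efficiency.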
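
The abstract does not give the scoring formula or the details of the two comparison passes, so the following Python sketch is only an illustration under assumed definitions, not the authors' method. The score (occurrence count divided by the mean gap between occurrences), the inverted-index candidate filter standing in for the extensive linear comparison, the cosine similarity standing in for the multi-dimensional comparison, and the names `term_scores`, `top_terms`, `detect_forwards`, `k`, `min_shared`, and `threshold` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def term_scores(tokens):
    """Assumed score: terms that are frequent and tightly clustered score high."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    scores = {}
    for tok, pos in positions.items():
        if len(pos) < 2:
            continue  # a single occurrence cannot form a cluster
        mean_gap = (pos[-1] - pos[0]) / (len(pos) - 1)  # always >= 1
        scores[tok] = len(pos) / mean_gap
    return scores


def top_terms(scores, k):
    """Keep the k highest-scoring terms as the document's representation."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def detect_forwards(docs, k=5, min_shared=2, threshold=0.8):
    """docs maps a document id to its token list; returns sorted id pairs."""
    vectors = {d: top_terms(term_scores(toks), k) for d, toks in docs.items()}

    # Pass 1 (assumed cheap filter): index documents by representative term
    # and keep only pairs that share at least min_shared of those terms.
    index = defaultdict(set)
    for d, vec in vectors.items():
        for t in vec:
            index[t].add(d)
    shared = defaultdict(int)
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1
    candidates = [p for p, n in shared.items() if n >= min_shared]

    # Pass 2 (assumed vector-space check): confirm candidates by cosine similarity.
    return sorted(p for p in candidates
                  if cosine(vectors[p[0]], vectors[p[1]]) >= threshold)


docs = {
    "a": "stock market fell stock market fell today as traders sold".split(),
    "b": "breaking stock market fell stock market fell today".split(),
    "c": "rain rain wind wind rain wind in the city".split(),
}
print(detect_forwards(docs))  # [('a', 'b')]
```

The two-pass shape is the point of the sketch: the cheap filter keeps the expensive vector comparison away from most document pairs, which is one plausible way to trade a small amount of recall for efficiency on a large collection.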

Keywords: forward detection  agglomerate term  feature selection  extensive linear comparison  vector space model
Received: 2009-12-15
Revised: 2010-02-10

Large-scale document forward detection algorithm based on agglomerate-term
ZHANG Jing-yang, ZHANG Hua-ping, LIU Jin-gang. Large-scale document forward detection algorithm based on agglomerate-term[J]. Journal of Computer Applications, 2010, 30(6): 1661-1663.
Authors: ZHANG Jing-yang  ZHANG Hua-ping  LIU Jin-gang
Affiliation: 1. Joint Faculty of Computer Scientific Research, Capital Normal University, Beijing 100037, China; 2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100080; 3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Document forward detection is the task of finding sets of articles with identical or similar content in a large-scale text library. It is widely needed in popular-article discovery, search-engine result organization, copy detection, and so on. To handle the increasingly diverse forms of Internet text forwarding and to improve system efficiency, this paper discusses certain text features and examines several comparison algorithms. Then, the large-scale document forward detection algorithm based on agglomerate-term was introdu...
Keywords: forward detection  Agglomerate-Term (AgT)  feature selection  extensive linear comparison  Vector Space Model (VSM)
This article is indexed in CNKI, Wanfang Data, and other databases.