
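Near-Duplicate Web Page Detection Algorithm Based on Concept and Semantic Network
Cite this article: CAO Yu-Juan, NIU Zhen-Dong, ZHAO Kun, PENG Xue-Ping. Near-duplicate web page detection algorithm based on concept and semantic network [J]. Journal of Software, 2011, 22(8): 1816-1826.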
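Authors: CAO Yu-Juan, NIU Zhen-Dong, ZHAO Kun, PENG Xue-Ping
Affiliations: 1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081; Beijing Aerospace Command Centre, Beijing 100094
2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081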
Funding: National Natural Science Foundation of China (60803050, 60705022); Program for New Century Excellent Talents in University (NCET-06-0161)
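Abstract: Search engine result pages often present users with web pages whose content is nearly identical. To improve overall retrieval performance and user satisfaction, this paper proposes DWDCS (near-duplicate webpages detection based on concept and semantic network), a near-duplicate web page detection algorithm based on concepts and semantic networks. It improves the classic small-world-theory-based...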

Keywords: web page de-duplication algorithm; small world network; near-duplicate web pages; standard deviation
Received: 2009/10/9
Revised: 2010/1/20

Near Duplicated Web Pages Detection Based on Concept and Semantic Network
CAO Yu-Juan, NIU Zhen-Dong, ZHAO Kun and PENG Xue-Ping. Near Duplicated Web Pages Detection Based on Concept and Semantic Network [J]. Journal of Software, 2011, 22(8): 1816-1826.
Authors: CAO Yu-Juan, NIU Zhen-Dong, ZHAO Kun and PENG Xue-Ping
Affiliation: CAO Yu-Juan 1,2; NIU Zhen-Dong 1; ZHAO Kun 1; PENG Xue-Ping 1. 1 (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China); 2 (Beijing Aerospace Command Centre, Beijing 100094, China)
Abstract: Reprinting by websites and blogs produces a great deal of redundant web pages. To improve search efficiency and user satisfaction, Near-Duplicate WebPages Detection based on Concept and Semantic network (DWDCS) is proposed. In the course of developing a near-duplicate detection system for a multi-billion page repository, this paper makes two research contributions. First, key concepts are extracted, instead of keyphrases, to build a Small World Network (SWN). This not only reduces the complexity of the semantic networ...
Keywords: duplicate removal algorithm; small world network; near duplicated Web page; standard deviation
This article is indexed in CNKI, Wanfang Data, and other databases.
