
Parallel Recognition of Illegal Web Pages Based on an Improved K-Nearest Neighbors Classification Algorithm
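Cite this article: XU Yabin, LI Zhuo, CHEN Junyi. Parallel recognition of illegal Web pages based on improved KNN classification algorithm[J]. Journal of Computer Applications (计算机应用), 2013, 33(12): 3368-3371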
Authors: XU Yabin (徐雅斌), LI Zhuo (李卓), CHEN Junyi (陈俊伊)
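Affiliations: 1. Computer School, Beijing Information Science and Technology University, Beijing 100101; 2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101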
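Funding: National Natural Science Foundation of China; National Natural Science Foundation of China; National Natural Science Foundation of China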
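Abstract: The Internet contains a large number of illegal Web pages with pornographic, violent, gambling, and reactionary content. If they are not filtered effectively, they will have a harmful impact on search services. An improved K-Nearest Neighbors (KNN) classification algorithm is used to raise recognition accuracy, and distributed parallel processing is carried out on a virtualized platform using the MapReduce model provided by the open-source Hadoop software. Comparative experiments show that the adopted recognition method achieves considerable improvements in both recognition accuracy and recognition efficiency.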
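The abstract does not specify how the K-Nearest Neighbors classifier was improved. Purely as a reference point, the following is a minimal sketch of standard KNN text classification, assuming each page is reduced to a whitespace-tokenised term-frequency vector, neighbours are ranked by cosine similarity, and the k nearest neighbours vote with weights equal to their similarity. Similarity-weighted voting is a common KNN variant and is not claimed to be the authors' improvement; the function names `tf_vector`, `cosine`, `knn_classify` and the toy "illegal"/"normal" training pages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Term-frequency vector of a whitespace-tokenised page (hypothetical preprocessing)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3):
    """Label `query` by a similarity-weighted vote of its k most similar training pages.

    `training` is a non-empty list of (text, label) pairs. Weighting each vote by
    similarity is one common KNN variant, NOT the paper's improved algorithm.
    """
    q = tf_vector(query)
    scored = sorted(
        ((cosine(q, tf_vector(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy training pages, invented for illustration only.
    sample = [
        ("online casino bet win money", "illegal"),
        ("casino poker bet jackpot", "illegal"),
        ("university course schedule lecture", "normal"),
        ("library opening hours lecture notes", "normal"),
    ]
    print(knn_classify("bet money at the casino", sample, k=3))
```

In practice a TF-IDF weighting and a proper Chinese word segmenter would replace the raw term counts and whitespace split used here.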

Keywords: illegal Web page; text classification; K-Nearest Neighbors (KNN) classification algorithm; Hadoop; MapReduce
Received: 2013-07-30

Parallel recognition of illegal Web pages based on improved KNN classification algorithm
XU Yabin, LI Zhuo, CHEN Junyi. Parallel recognition of illegal Web pages based on improved KNN classification algorithm[J]. Journal of Computer Applications, 2013, 33(12): 3368-3371
Authors: XU Yabin, LI Zhuo, CHEN Junyi
Affiliations: 1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101, China; 2. Computer School, Beijing Information Science and Technology University, Beijing 100101, China
Abstract: There are many illegal Web pages on the Internet with pornographic, violent, gambling, or reactionary content. If they are not filtered effectively, they will have a harmful influence on search services. An improved K-Nearest Neighbors (KNN) classification algorithm is proposed to raise recognition accuracy, and it is implemented on a virtualized platform using the MapReduce model provided by the open-source software Hadoop, which makes the processing distributed and parallel. Experiments and comparison with existing work show that the proposed recognition method greatly improves both accuracy and efficiency.
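The abstract says recognition is parallelised with Hadoop's MapReduce model but does not describe the job design. One plausible decomposition, sketched here in plain Python rather than against the Hadoop API, assumes the training set is split into shards: each map task scores a page against its own shard and emits only its local top-k neighbours, and a reduce step merges the partial lists into the global top-k before a majority vote. The sharding scheme, the Jaccard similarity (used only to keep the sketch short), and the names `map_shard`, `reduce_topk`, `parallel_knn` are all hypothetical.

```python
import heapq
from collections import Counter
from functools import reduce


def jaccard(a, b):
    """Jaccard similarity of two token sets (stand-in for the real similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def map_shard(query_tokens, shard, k):
    """Map step: score the query against one training shard and keep the local top-k.

    Returns (similarity, label) pairs. Every global top-k neighbour is also in the
    top-k of its own shard, so emitting k pairs per shard loses no candidate.
    """
    scored = ((jaccard(query_tokens, set(text.split())), label) for text, label in shard)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def reduce_topk(k):
    """Return a reducer that merges two partial top-k lists into one top-k list."""
    def merge(left, right):
        return heapq.nlargest(k, left + right, key=lambda pair: pair[0])
    return merge


def parallel_knn(query, shards, k=3):
    """Simulate a MapReduce KNN job: map over shards, reduce partial top-k lists, vote."""
    query_tokens = set(query.split())
    # On a cluster each map_shard call would be a separate map task; here they run sequentially.
    partials = [map_shard(query_tokens, shard, k) for shard in shards]
    neighbours = reduce(reduce_topk(k), partials, [])
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Two toy shards, invented for illustration only.
    demo_shards = [
        [("online casino bet win money", "illegal"), ("university course schedule lecture", "normal")],
        [("casino poker bet jackpot", "illegal"), ("library opening hours lecture notes", "normal")],
    ]
    print(parallel_knn("bet money at the casino", demo_shards, k=3))
```

The design choice that makes this scale is that each map task ships at most k pairs to the reducer, so network traffic grows with the number of shards times k rather than with the size of the training set.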
Keywords:illegal Web page   text classification   K-Nearest Neighbors (KNN) classification algorithm   Hadoop   MapReduce