基于朴素贝叶斯算法的主题爬虫的研究

Cite this article:	皮靖, 邵雄凯, 肖雅夫. 基于朴素贝叶斯算法的主题爬虫的研究[J]. 计算机与数字工程, 2012, 40(6): 76-78, 123.

Authors:	皮靖, 邵雄凯, 肖雅夫

Affiliation:	School of Computer Science, Hubei University of Technology, Wuhan 430068

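Abstract:	The focused crawler is the key component in building a topic-specific search engine. This paper proposes a method that uses the Naive Bayes algorithm for topic identification and describes the key parts involved in implementing a focused crawler, including generation of the seed URL set, page analysis and feature extraction, and topic identification. The Naive Bayes-based focused crawler is compared with a focused crawler based on link analysis and with one based on a topic thesaurus. The experiments show that the Naive Bayes-based focused crawler is more accurate, which demonstrates that the method is feasible and lays a good foundation for collecting topic-specific information.
Keywords:	Naive Bayes algorithm; focused crawler; topic relevance; information collection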
Research on Focused Crawler Based on Naive Bayes Algorithm

PI Jing, SHAO Xiongkai, XIAO Yafu. Research on Focused Crawler Based on Naive Bayes Algorithm[J]. Computer and Digital Engineering, 2012, 40(6): 76-78, 123.

Authors:	PI Jing, SHAO Xiongkai, XIAO Yafu

Affiliation:	School of Computer Science, Hubei University of Technology, Wuhan 430068

Abstract:	The focused crawler is a key part of a focused search engine. This paper proposes a method that uses the Naive Bayes algorithm to identify topics and introduces the core parts of the focused crawler, including generation of the seed URL collection, page analysis and feature extraction, and topic identification. The focused crawler based on the Naive Bayes algorithm is compared with focused crawlers based on link analysis and on a thesaurus. The experimental results show that the Naive Bayes-based focused crawler has better accuracy and that the method is feasible, laying a good foundation for topic information collection.

Keywords:	Naive Bayes algorithm; focused crawler; topic correlativity; information collection
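
The topic identification step named in the abstract can be illustrated with a minimal sketch of a Naive Bayes page filter. The abstract does not give the paper's feature extraction, training corpus, or decision rule, so everything here is an assumption made for illustration: the class `NaiveBayesTopicFilter`, the labels `on_topic` and `off_topic`, the word-level tokenizer, the add-one smoothing, and the toy training texts are all hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicFilter:
    """Hypothetical multinomial Naive Bayes filter with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()  # number of training pages per label
        self.word_counts = {}        # label -> Counter of token frequencies
        self.vocabulary = set()      # all tokens seen during training

    def train(self, text, label):
        """Add one labeled page text to the training statistics."""
        self.doc_counts[label] += 1
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.vocabulary.update(tokens)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(token | label) per label."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_tokens = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for token in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            scores[label] = score
        return scores

    def is_relevant(self, text, topic_label="on_topic"):
        """A page counts as relevant when the topic label scores highest."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get) == topic_label


# Toy training data, invented for this example only.
topic_filter = NaiveBayesTopicFilter()
topic_filter.train("crawler search engine index web page link", "on_topic")
topic_filter.train("focused crawler topic relevance classifier", "on_topic")
topic_filter.train("recipe chicken soup kitchen dinner", "off_topic")
topic_filter.train("football match score league goal", "off_topic")

print(topic_filter.is_relevant("a web crawler for a topic search engine"))
print(topic_filter.is_relevant("chicken soup recipe for dinner"))
```

With this toy data the first call prints `True` and the second prints `False`. In a crawler, a filter of this kind would presumably decide whether a fetched page is kept and whether its outgoing links are queued, but how the paper wires the classifier into the crawl loop is not stated in the abstract.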
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
