
Application of Support Vector Machines in a Chemistry Focused Crawler
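Cite this article: Zhu Yu, Xia Zhaojie, Nie Fengguang, Guo Li. Application of support vector machines in a chemistry focused crawler[J]. Computers and Applied Chemistry, 2006, 23(4): 329-332.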
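Authors: Zhu Yu, Xia Zhaojie, Nie Fengguang, Guo Li
Affiliations: 1. Lab of Multi-Phase Reaction, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100080; Graduate University of Chinese Academy of Sciences, Beijing, 100049
2. Lab of Multi-Phase Reaction, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100080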
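Abstract: A crawler is an essential component of a search engine: it automatically follows the hyperlinks in Web pages and collects the resources it finds. To improve how efficiently a crawler collects resources on a specific topic, text categorization techniques have been used to guide its traversal. This paper applies SVM-based automatic text categorization to a chemistry focused crawler: an SVM classifier scores each crawled page, and the scores guide the crawler toward chemistry-related pages. Comparison with a non-focused crawler based on breadth-first search and a focused crawler based on keyword matching shows that the SVM-guided focused crawler effectively improves the efficiency of collecting chemistry Web resources.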
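The abstract describes an SVM classifier that assigns a score to each crawled page. A minimal sketch of such a scorer follows, assuming a bag-of-words page representation and a linear SVM trained with the Pegasos stochastic sub-gradient method; Pegasos is a stand-in chosen here to keep the example dependency-free, since the abstract does not state which SVM solver, kernel, or features were used. The training texts, the `PageScorer` class, and all function names are hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase + whitespace split. A real crawler
    # would strip HTML and might need Chinese word segmentation.
    return text.lower().split()


def featurize(text, vocab):
    # L2-normalized term-frequency vector over a fixed vocabulary, with a
    # constant 1.0 appended so the bias is learned as an ordinary weight.
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = sum(v * v for v in vec) ** 0.5
    if norm:
        vec = [v / norm for v in vec]
    return vec + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    # Pegasos: stochastic sub-gradient descent on the regularized hinge loss
    #   lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)),  y_i in {+1, -1}.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


class PageScorer:
    # Hypothetical wrapper: trains on labeled pages, then returns the signed
    # SVM decision value <w, x>, used here as the page's relevance score.
    def __init__(self, texts, labels):
        self.vocab = sorted({tok for t in texts for tok in tokenize(t)})
        X = [featurize(t, self.vocab) for t in texts]
        self.w = train_linear_svm(X, labels)

    def score(self, text):
        x = featurize(text, self.vocab)
        return sum(wj * xj for wj, xj in zip(self.w, x))


# Hypothetical toy training set: +1 = chemistry page, -1 = other topic.
TRAIN = [
    ("benzene reaction catalyst molecule", +1),
    ("polymer synthesis catalyst solvent", +1),
    ("molecule spectrum reaction yield", +1),
    ("football match goal team", -1),
    ("team player coach match", -1),
    ("football stadium player goal", -1),
]
scorer = PageScorer([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

Under these assumptions, `scorer.score(...)` is positive for pages on the chemistry side of the separating hyperplane and negative otherwise; a crawler could use the value directly as a priority, so that links found on higher-scoring pages are fetched first.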

Keywords: support vector machine (SVM); chemistry focused crawler; text categorization; search engine
Article ID: 1001-4160(2006)04-329-332
Received: 2005-08-16
Revised: 2005-08-16, 2005-12-28

Research on chemistry focused crawler with support vector machine classifier
Zhu Yu, Xia Zhaojie, Nie Fengguang, Guo Li. Research on chemistry focused crawler with support vector machine classifier[J]. Computers and Applied Chemistry, 2006, 23(4): 329-332.
Authors: Zhu Yu, Xia Zhaojie, Nie Fengguang, Guo Li
Affiliation:1. Lab of Multi-Phase Reaction, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100080, China; 2. Graduate University of Chinese Academy of Sciences, Beijing, 100049, China
Abstract: A crawler is an important component of a search engine; it collects Web pages by following the hyperlinks between them. To improve the performance of topic-specific search engines, text categorization techniques can be used to direct the crawling of focused crawlers. This paper proposes a new chemistry focused crawler based on a Support Vector Machine (SVM): the SVM classifier guides the crawler to collect chemistry Web pages while ignoring irrelevant information. Experimental results show that, compared with crawlers based on breadth-first search and on keyword matching, the focused crawler with the SVM classifier collects chemistry-relevant pages more effectively.
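The comparison above pits a score-guided crawler against a breadth-first crawler and a keyword-matching crawler. A minimal sketch of the two crawl orders follows, assuming a best-first frontier in which each discovered link inherits the score of the page it was found on, and assuming harvest rate (the fraction of fetched pages that are relevant) as the efficiency measure; the paper's own link-priority rule and evaluation metric may differ. The toy graph `WEB`, the keyword list, and all function names are hypothetical. The `score` argument is where a classifier's decision value, such as an SVM's, could be plugged in in place of keyword matching.

```python
import heapq
from collections import deque

# Hypothetical toy Web graph: url -> (page text, outgoing links).
WEB = {
    "seed": ("portal news chemistry links", ["a", "b"]),
    "a": ("football match report", ["a1", "a2"]),
    "b": ("organic chemistry catalyst reaction", ["b1", "b2"]),
    "a1": ("team transfer rumors", []),
    "a2": ("stadium tickets", []),
    "b1": ("polymer synthesis chemistry", []),
    "b2": ("catalyst molecule spectrum", []),
}
RELEVANT = {"b", "b1", "b2"}  # ground-truth labels, used only for evaluation
KEYWORDS = {"chemistry", "catalyst", "reaction", "molecule", "polymer"}


def keyword_score(text):
    # Keyword-matching baseline: fraction of tokens found in a fixed list.
    toks = text.lower().split()
    return sum(t in KEYWORDS for t in toks) / len(toks) if toks else 0.0


def crawl_bfs(seed, budget):
    # Non-focused crawler: FIFO frontier, page content is ignored.
    seen, order, queue = {seed}, [], deque([seed])
    while queue and len(order) < budget:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def crawl_best_first(seed, budget, score):
    # Focused crawler: links inherit their parent page's score and the
    # highest-priority link is fetched next (heapq is a min-heap, so the
    # score is negated; a counter breaks ties in discovery order).
    seen, order = {seed}, []
    frontier, counter = [(0.0, 0, seed)], 1
    while frontier and len(order) < budget:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        text, links = WEB[url]
        s = score(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-s, counter, link))
                counter += 1
    return order


def harvest_rate(pages):
    # Fraction of fetched pages that are on-topic.
    return sum(p in RELEVANT for p in pages) / len(pages)
```

With a budget of four fetches on this toy graph, the breadth-first order is `seed, a, b, a1` (harvest rate 0.25), while the keyword-guided order is `seed, a, b, b1` (harvest rate 0.5), because the chemistry page `b` pushes its out-links ahead of the sports page's out-links.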
Keywords: support vector machine; chemistry focused crawler; text categorization; search engine
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.