
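基于共现词查询的主题爬虫研究 (Research on a Topic Crawler Based on Co-occurrence Word Queries)
Cite this article:葛玲,蒋宗礼.基于共现词查询的主题爬虫研究[J].计算机工程,2010,36(8):286-288.
Authors:葛玲 (GE Ling)  蒋宗礼 (JIANG Zong-li)
Affiliation:College of Computer, Beijing University of Technology, Beijing 100124
Abstract:A co-occurrence word database is built to improve the topic model, raising the topical relevance and quality of downloaded Web pages. The improved model can also describe the context in which a term appears, infer user intent, and adjust the ranking of retrieval results. On this basis, an FDC topic crawler system is designed and implemented; it uses an improved topic-sensitive FDC-PageRank algorithm to compute the priority of Web pages. Experiments show that the system performs well.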

Keywords:topic crawler  co-occurrence words  FDC topic model  FDC_Topic Sensitive PageRank algorithm

Research of Co-occurrence Words Search-based Topic Crawler
GE Ling, JIANG Zong-li. Research of Co-occurrence Words Search-based Topic Crawler[J]. Computer Engineering, 2010, 36(8): 286-288.
Authors:GE Ling  JIANG Zong-li
Affiliation:(College of Computer, Beijing University of Technology, Beijing 100124)
Abstract:This paper improves the topic model by building a co-occurrence word database, which raises the topical relevance and quality of downloaded Web pages. The improved model can also describe the context of keywords, infer user intent, and adjust the ranking of search results. On this basis, an FDC topic crawler system is designed and implemented; it uses an improved topic-sensitive FDC-PageRank algorithm to compute Web page priority. Experiments show that the system performs well.
Keywords:topic crawler  co-occurrence words  FDC topic model  FDC_Topic Sensitive PageRank algorithm
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.