The Term Vector Database: fast access to indexing terms for Web pages
Authors: Raymie Stata, Krishna Bharat, Farzin Maghoul
Abstract: We have built a database that provides term vector information for large numbers of pages (hundreds of millions). The basic operation of the database is to take URLs and return term vectors. Compared to computing vectors by downloading pages via HTTP, the Term Vector Database is several orders of magnitude faster, enabling a large class of applications that would be impractical without such a database. This paper describes the Term Vector Database in detail. It also reports on two applications built on top of the database. The first application is an optimization of connectivity-based topic distillation. The second application is a Web page classifier used to annotate results returned by a Web search engine.
Keywords: Page classification; Term vectors; Topic distillation; Web connectivity; Web search
This article is indexed in ScienceDirect and other databases.