The Term Vector Database: fast access to indexing terms for Web pages |
| |
Authors: | Raymie Stata, Krishna Bharat, Farzin Maghoul |
| |
Abstract: | We have built a database that provides term vector information for large numbers of pages (hundreds of millions). The basic operation of the database is to take URLs and return term vectors. Compared to computing vectors by downloading pages via HTTP, the Term Vector Database is several orders of magnitude faster, enabling a large class of applications that would be impractical without such a database. This paper describes the Term Vector Database in detail. It also reports on two applications built on top of the database. The first application is an optimization of connectivity-based topic distillation. The second application is a Web page classifier used to annotate results returned by a Web search engine. |
| |
Keywords: | Page classification; Term vectors; Topic distillation; Web connectivity; Web search |
This article is indexed in ScienceDirect and other databases. |