Visualizing the structure of Web communities based on data acquired from a search engine |
| |
Authors: | Murata T. |
| |
Affiliation: | Nat. Inst. of Informatics, Tokyo, Japan |
| |
Abstract: | Discovery of Web communities, groups of Web pages sharing common interests, is important for assisting users' information retrieval from the Web. This paper describes a method for visualizing Web communities and their internal structures. Visualization of Web communities in the form of graphs enables users to access related pages easily, and it often reflects the characteristics of the Web communities. Since related Web pages are often co-referred from the same Web page, the number of co-occurrences of references in a search engine is used to measure the relation among pages. Two URLs are given to a search engine as keywords, and the number of pages retrieved for both URLs divided by the number of pages retrieved for either URL, which is called the Jaccard coefficient, is calculated as the criterion for evaluating the relation between the two URLs. The value is used to determine the length of an edge in a graph so that vertices of related pages are located close to each other. Our visualization system based on the method succeeds in clarifying various genres of Web communities, although the system does not interpret the contents of the pages. The Jaccard coefficient is easy to compute automatically, and it is well suited to visualization using data acquired from a search engine. |
| |
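The Jaccard computation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the hit counts, and the linear mapping from coefficient to edge length are all assumptions, since the abstract only says that the coefficient determines edge length so that related pages end up close together. The count for "either URL" is derived here from the two single-URL counts by inclusion-exclusion; a search engine that supports OR queries could return it directly.

```python
def jaccard(hits_a: int, hits_b: int, hits_both: int) -> float:
    """Jaccard coefficient from search-engine hit counts.

    hits_a    -- pages retrieved for URL A alone (hypothetical input)
    hits_b    -- pages retrieved for URL B alone (hypothetical input)
    hits_both -- pages retrieved when both URLs are given together
    """
    # Pages retrieved for either URL, by inclusion-exclusion.
    hits_either = hits_a + hits_b - hits_both
    if hits_either <= 0:
        # No pages refer to either URL: treat the pair as unrelated.
        return 0.0
    return hits_both / hits_either


def edge_length(coefficient: float,
                min_len: float = 1.0,
                max_len: float = 10.0) -> float:
    """Map a coefficient in [0, 1] to an edge length.

    Assumed mapping (not from the abstract): linear interpolation,
    so a higher coefficient gives a shorter edge.
    """
    return max_len - coefficient * (max_len - min_len)


# Made-up hit counts for two URLs, for illustration only.
j = jaccard(hits_a=120, hits_b=80, hits_both=40)
print(j, edge_length(j))
```

With these made-up counts, 40 of the 160 pages that mention either URL mention both, so the pair receives a fairly long edge. A graph layout routine would then place strongly co-referenced pages near each other by using such lengths as target distances.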