Web image indexing by using associated texts |
| |
Authors: | Zhiguo Gong, Leong Hou U, Chan Wa Cheang |
| |
Affiliation: | (1) Faculty of Science and Technology, University of Macau, Macao, P.R. China |
| |
Abstract: | To index Web images, the whole associated text is partitioned into a sequence of text blocks, and the local relevance of a term to the corresponding image is calculated from both its local occurrence in a block and the distance of that block to the image. The overall relevance of a term is then the sum of all its local weight values multiplied by the distance factors of the corresponding text blocks. In the present approach, the associated text of a Web image is first partitioned into three parts: a page-oriented text (TM), a link-oriented text (LT), and a caption-oriented text (BT). Because of its large size and semantic divergence, the caption-oriented text is further partitioned into finer blocks based on the tree structure of the tag elements within the BT text. During processing, all heading nodes are pulled up so that they correlate with their semantic scopes, and a collapse algorithm is used to remove empty blocks. In our system, the relevance factors of the text blocks are determined by a greedy Two-Way-Merging algorithm.
Zhiguo Gong is an Associate Professor in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao, China. He received his BS, MS, and PhD from Hebei Normal University, Peking University, and the Chinese Academy of Sciences in 1983, 1988, and 1998, respectively. His research interests include Distributed Database, Multimedia Database, Digital Library, Web Information Retrieval, and Web Mining.
Leong Hou U is currently a Master's candidate in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao, China. He received his BS from National Chi Nan University, Taiwan, in 2003. His research interests include Web Information Retrieval and Web Mining.
Chan Wa Cheang is currently a Master's candidate in the Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macao, China. He received his BS from National Taiwan University, Taiwan, in 2003. His research interests include Web Information Retrieval and Web Mining. |
| |
Keywords: | Web images; Text-based indexing; Segmentation; Retrieval |
This article is indexed in SpringerLink and other databases. |
|
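The scoring rule stated in the abstract (overall relevance of a term as the sum of its local weights, each multiplied by the distance factor of its text block) can be illustrated with a minimal Python sketch. The function name `term_relevance`, the use of normalized term frequency as the local weight, and the example distance factors are all assumptions made for illustration; the abstract does not specify the local weighting formula, and the paper derives its block factors with a greedy Two-Way-Merging algorithm that is not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def term_relevance(blocks: List[Tuple[List[str], float]]) -> Dict[str, float]:
    """Sum, over all text blocks, local term weight times the block's distance factor.

    blocks: (tokens, distance_factor) pairs, one per text block.
    The local weight is assumed to be the term frequency normalized by
    block length; this choice is hypothetical, not taken from the paper.
    """
    scores: Dict[str, float] = {}
    for tokens, factor in blocks:
        if not tokens:
            continue  # an empty block contributes nothing
        counts = Counter(tokens)
        for term, count in counts.items():
            local_weight = count / len(tokens)
            scores[term] = scores.get(term, 0.0) + local_weight * factor
    return scores


# Hypothetical blocks: the first is assumed closer to the image (factor 1.0)
# than the second (factor 0.5).
example_blocks = [
    (["macau", "tower", "macau"], 1.0),
    (["tower", "photo"], 0.5),
]
example_scores = term_relevance(example_blocks)
```

In this toy input, a term that appears in a block close to the image outweighs one that appears only in a more distant block, which is the behavior the distance factors are meant to produce.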