Logical structure analysis: From HTML to XML
Affiliation: Department of Computer Science, Yonsei University, 134, Shinchon-dong, Sudaemoon-ku, Seoul, 120-749, Korea; Department of Information and Communication Technology, University of Trento, Trento, Italy; Department of Engineering, University of Sannio, Benevento, Italy
Abstract: This paper presents an efficient method for extracting the logical structure of a Web document. The method consists of three phases: visual grouping, element identification, and logical grouping. To produce a more accurate logical structure, it defines a document model that describes the logical structure information of a specific document class. Because the method relies both on the visual structure produced by the visual grouping phase and on a document model that describes the logical structure of a document type, it supports sophisticated structure analysis. Experimental results on HTML documents from the Web show that the method performs logical structure analysis successfully in comparison with previous work. In particular, the method outputs the result of structure analysis as XML documents, which enhances the reusability of the documents.
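
The abstract names three phases but gives no algorithmic detail, so the following is only a toy sketch of how such a pipeline might be wired together, not the authors' method. It assumes that visual grouping can be approximated by collecting text blocks per HTML tag, and that the document model is a simple tag-to-element mapping. The names `DOCUMENT_MODEL`, `BlockCollector`, `identify_elements`, `group_logically`, and `html_to_xml` are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical document model (an assumption, not from the paper):
# maps a visual cue, here just the HTML tag, to a logical element name.
DOCUMENT_MODEL = {"h1": "title", "h2": "section", "p": "paragraph"}


class BlockCollector(HTMLParser):
    """Phase 1 (visual grouping, heavily simplified): collect text blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # list of (tag, text) pairs in document order
        self._tag = None   # tag of the block currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in DOCUMENT_MODEL:
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.blocks.append((tag, "".join(self._text).strip()))
            self._tag = None


def identify_elements(blocks):
    """Phase 2 (element identification): label each block via the model."""
    return [(DOCUMENT_MODEL[tag], text) for tag, text in blocks]


def group_logically(elements):
    """Phase 3 (logical grouping): nest paragraphs under their section."""
    root = ET.Element("document")
    current = root
    for name, text in elements:
        if name == "section":
            # A section heading opens a new container under the root.
            current = ET.SubElement(root, "section")
            ET.SubElement(current, "heading").text = text
        else:
            ET.SubElement(current, name).text = text
    return root


def html_to_xml(html):
    """Run the three simplified phases and return the XML as a string."""
    collector = BlockCollector()
    collector.feed(html)
    root = group_logically(identify_elements(collector.blocks))
    return ET.tostring(root, encoding="unicode")
```

Calling `html_to_xml("<h1>Report</h1><h2>Intro</h2><p>Hello.</p>")` yields a `document` element containing a `title` and one `section` that wraps its `heading` and `paragraph`. A real system would base the first phase on rendered layout features rather than tags alone.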
Keywords:
This article is indexed in ScienceDirect and other databases.