KPS: a Web information mining algorithm
Affiliation: 1. Department of Marketing, Clemson University, United States; 2. Department of Management, Clemson University, United States; 1. Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong Special Administrative Region; 2. School of Information, Renmin University of China, Beijing 100872, PR China; 3. Smart City Research Center, Renmin University of China, Beijing 100872, PR China
Abstract: The Web mostly contains semi-structured information. It is, however, not easy to search for and extract the structured data hidden in a Web page. Current practices address this problem through either (1) syntax analysis (i.e., of HTML tags) or (2) wrappers or user-defined declarative languages. The former is suitable only for highly structured Web sites, while the latter is time-consuming and scales poorly: wrappers can handle tens, but certainly not thousands, of information sources. In this paper, we present KPS, a novel algorithm for mining semi-structured information on the Web. KPS employs keywords, patterns and/or samples to mine the desired information. Experimental results show that KPS is more efficient than existing Web extraction methods.
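The abstract does not spell out how KPS combines keywords, patterns and samples. As a rough illustration of the general idea of keyword-anchored pattern matching over tag-stripped page text (as opposed to a site-specific wrapper), here is a minimal Python sketch. The function names `page_text` and `mine`, the `window` parameter and the search strategy are hypothetical assumptions, not the authors' algorithm, and sample-based mining is not modelled at all.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the visible text nodes of an HTML page, discarding tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text(html):
    """Return the non-empty text nodes of a page, in document order."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def mine(html, keyword, pattern, window=2):
    """Hypothetical keyword + pattern extractor (illustrative, not KPS).

    For every text node containing `keyword` (case-insensitive), search that
    node and the next `window` nodes for the first match of regex `pattern`.
    """
    regex = re.compile(pattern)
    chunks = page_text(html)
    results = []
    for i, chunk in enumerate(chunks):
        if keyword.lower() not in chunk.lower():
            continue
        # Look at the keyword's own node and a few following nodes.
        for candidate in chunks[i:i + window + 1]:
            match = regex.search(candidate)
            if match:
                results.append(match.group(0))
                break
    return results


if __name__ == "__main__":
    # The same "price" field laid out two different ways on one page.
    html = ("<table><tr><td>Title</td><td>KPS</td></tr>"
            "<tr><td>Price:</td><td>$12.50</td></tr></table>"
            "<p>Price <b>US$ 9.99</b></p>")
    print(mine(html, "price", r"\$\s?\d+(?:\.\d{2})?"))
```

Because the extractor keys on a word and a value pattern rather than on a fixed tag path, the same call finds both the table cell and the inline paragraph value; a hand-written wrapper would need one rule per layout.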
This article is indexed in ScienceDirect and other databases.