Towards a theory of tables |
| |
Authors: | Matthew Hurst |
| |
Affiliation: | (1) Nielsen BuzzMetrics, 56, West 22nd Street, 3rd Floor, New York, NY 10010, USA |
| |
Abstract: | Tables appearing in natural language documents provide a compact method for presenting relational information in an immediate and intuitive manner, while simultaneously organizing and indexing that information. Despite their ubiquity and obvious utility, tables have not received the same level of formal characterization enjoyed by sentential text. Rather, they are modeled in terms of geometry, simple hierarchies of strings, and database-like relational structures. Tables have been the focus of a large volume of research in the document image analysis field and have lately received particular attention from researchers interested in extracting information from non-trivial elements of web pages. This paper provides a framework for representing tables at both the semantic and structural levels. It presents a representation of the indexing structures present in tables and the relationship between these structures and the underlying categories. Matthew Hurst graduated from Edinburgh University in 1992 and completed an MPhil at Cambridge in Computer Speech and Language Processing. He then worked at The University of Edinburgh on a number of projects involving text and document analysis before enrolling in the PhD programme. While studying for his PhD, he completed a European Science and Technology Fellowship in Japan. After working for IBM Research, Tokyo, he moved to the United States of America to work for a number of companies with unique applications of applied natural language processing and document analysis. He is currently the Director of Science and Innovation at Nielsen BuzzMetrics. |
| |
Keywords: | Table understanding; Information extraction |
This article is indexed in SpringerLink and other databases. |
|