Towards a theory of tables
Authors:Matthew Hurst
Affiliation:(1) Nielsen BuzzMetrics, 56, West 22nd Street, 3rd Floor, New York, NY 10010, USA
Abstract:Tables appearing in natural language documents provide a compact method for presenting relational information in an immediate and intuitive manner, while simultaneously organizing and indexing that information. Despite their ubiquity and obvious utility, tables have not received the same level of formal characterization enjoyed by sentential text. Rather, they are modeled in terms of geometry, simple hierarchies of strings and database-like relational structures. Tables have been the focus of a large volume of research in the document image analysis field and have lately received particular attention from researchers interested in extracting information from non-trivial elements of web pages. This paper provides a framework for representing tables at both the semantic and structural levels. It presents a representation of the indexing structures present in tables and the relationship between these structures and the underlying categories.
Author biography:Matthew Hurst graduated from Edinburgh University in 1992 and completed an MPhil at Cambridge in Computer Speech and Language Processing. He then worked at The University of Edinburgh on a number of projects involving text and document analysis before enrolling in the PhD programme. While studying for his PhD, he completed a European Science and Technology Fellowship in Japan. After working for IBM Research, Tokyo, he moved to the United States of America to work for a number of companies with unique applications utilizing applied natural language processing and document analysis. He is currently the Director of Science and Innovation at Nielsen BuzzMetrics.
Keywords:Table understanding; Information extraction
This article is indexed in SpringerLink and other databases.