Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms |
| |
Authors: | Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Menietti, Jason Crusan, Ivan Metelsky, Karim R. Lakhani |
| |
Affiliation: | 1. D’Amore-McKim School of Business, and College of Computer and Information Science, Northeastern University, Boston, USA; 2. Department of Computer Science, Rochester Institute of Technology, Rochester, USA; 3. School of Information, UC Berkeley, Berkeley, USA; 4. Center for Imaging Science, Rochester Institute of Technology, Rochester, USA; 5. Institute for Quantitative Social Science, Harvard University, Cambridge, USA; 6. Advanced Exploration Systems Division, NASA, Washington, USA; 7. TopCoder Inc., Glastonbury, USA; 8. Department of Technology and Operations Management, Harvard Business School, Boston, USA |
| |
Abstract: | Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward the automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a month-long online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30%) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing the patent text, allowing participant algorithms to integrate text processing with graphics recognition. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first-place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57% for figure region detection, 78.81% for figure regions with correctly recognized figure titles, and 70.98% for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community. |
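The F-measure reported in the abstract is the harmonic mean of precision and recall. A minimal sketch in Python (the function name is illustrative, not from the paper):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)

# Example: a detector with 80% precision and 70% recall
print(round(f_measure(0.80, 0.70), 4))  # → 0.7467
```

The harmonic mean penalizes imbalance: a system with very high recall but low precision (or vice versa) scores much lower than the arithmetic mean would suggest.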
| |
Keywords: | |
This document is indexed in SpringerLink and other databases.
|