Making scanned Arabic documents machine accessible using an ensemble of SVM classifiers

Elanwar, Randa; Qin, Wenda; Betke, Margrit. International Journal on Document Analysis and Recognition, 2018, 21(1-2): 59-75

Raster-image PDF files originating from scanning or photographing paper documents are inaccessible to both text search engines...

Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model

Elanwar, Randa; Qin, Wenda; Betke, Margrit; Wijaya, Derry. International Journal on Document Analysis and Recognition, 2021, 24(4): 349-362

Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more than 9000 high-quality scanned images from over 700 Arabic books. Among these, 1500 images have been manually segmented into regions and labeled by their functionality. BE-Arabic-9K includes book pages with a wide variety of complex layouts and page contents, making it suitable for various document layout analysis and text recognition research tasks. The paper also presents a page layout segmentation and text extraction baseline model based on a fine-tuned Faster R-CNN structure (FFRA). This baseline model yields cross-validation results with an average accuracy of 99.4% and an F1 score of 99.1% for text versus non-text block classification on the 1500 annotated images of BE-Arabic-9K. These results are remarkably better than those of the state-of-the-art Arabic book page segmentation system ECDP. FFRA also outperforms three other prior systems when tested on a competition benchmark dataset, making it an outstanding baseline model to challenge.
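For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how accuracy and F1 are conventionally computed for a binary text versus non-text labeling. It is an illustration only: the function name `accuracy_and_f1`, the label strings, and the toy data are assumptions, and the abstract does not state the paper's exact evaluation protocol (for example, how predicted blocks are matched to ground-truth regions).

```python
def accuracy_and_f1(y_true, y_pred, positive="text"):
    """Return (accuracy, F1) for a binary labeling task.

    y_true and y_pred are equal-length sequences of labels; `positive`
    names the class treated as the positive one when computing F1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels for five blocks; not data from the paper.
truth = ["text", "text", "text", "non-text", "non-text"]
preds = ["text", "text", "non-text", "non-text", "text"]
acc, f1 = accuracy_and_f1(truth, preds)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

A cross-validated figure such as the one reported would typically be the mean of these per-fold values, though the averaging scheme used in the paper is not given here.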
