iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing

doi:10.1007/S10766-020-00690-Y

Open Access, Journal Article

Menbere Kina Tekleyohannes, +5 more

01 Apr 2021

International Journal of Parallel Progra...

Vol. 49, Iss. 2, pp. 253-284

TLDR

The authors propose iDocChip, a low-power, energy-efficient accelerator with real-time capabilities. It is a configurable hybrid hardware-software programmable system-on-chip (SoC), based on anyOCR, for digitizing historical documents.

Abstract:

In recent years, optical character recognition (OCR) systems have been used to digitally preserve historical archives. To transcribe historical archives into a machine-readable form, the documents are first scanned and then OCR is applied. To digitize documents without removing them from where they are archived, it is valuable to have a portable device that combines scanning and OCR capabilities. Many commercial and open-source document digitization techniques exist today, but they are optimized for contemporary documents. They fail to give sufficient text recognition accuracy when transcribing historical documents because of the severe quality degradation of such documents. In contrast, the anyOCR system, which is designed mainly to digitize historical documents, provides high accuracy. However, this comes at the cost of high computational complexity, resulting in long runtime and high power consumption. To tackle these challenges, we propose a low-power, energy-efficient accelerator with real-time capabilities called iDocChip, which is a configurable hybrid hardware-software programmable System-on-Chip (SoC) based on anyOCR for digitizing historical documents. In this paper, we focus on one of the most crucial processing steps in the anyOCR system: Text and Image Segmentation, which makes use of a multi-resolution morphology-based algorithm. Moreover, an optimized FPGA-based hybrid architecture of this anyOCR step is presented, along with its optimized software implementations. We demonstrate our results on multiple embedded and general-purpose platforms with respect to runtime and power consumption. The resulting hardware accelerator outperforms the existing anyOCR by 6.2×, while achieving 207× higher energy efficiency and maintaining its high accuracy.
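The two headline ratios in the abstract can be related with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: that "energy efficiency" means energy consumed per processed document, that both ratios are measured against the same baseline, and that energy equals average power times runtime. The function name `implied_power_reduction` is hypothetical.

```python
def implied_power_reduction(speedup: float, energy_gain: float) -> float:
    """Factor by which average power drops, assuming E = P * t.

    If runtime shrinks by `speedup` and energy per document shrinks by
    `energy_gain`, then average power shrinks by energy_gain / speedup.
    """
    if speedup <= 0 or energy_gain <= 0:
        raise ValueError("ratios must be positive")
    return energy_gain / speedup


# Ratios quoted in the abstract; reading them as runtime and
# energy-per-document ratios against one baseline is an assumption.
SPEEDUP = 6.2
ENERGY_GAIN = 207.0

power_reduction = implied_power_reduction(SPEEDUP, ENERGY_GAIN)
print(f"{power_reduction:.1f}x lower average power")  # prints "33.4x lower average power"
```

Under these assumptions, most of the reported energy gain would come from lower average power rather than from shorter runtime alone.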
