
Showing papers on "Document processing published in 2022"


Journal ArticleDOI
TL;DR: This article proposed a novel OCR-free VDU model named Donut, which stands for Document understanding transformer, and achieved state-of-the-art performance in terms of both speed and accuracy.
Abstract: Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of documents; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model, and synthetic data are available at https://github.com/clovaai/donut .

3 citations


Proceedings ArticleDOI
22 Jun 2022
TL;DR: The primary objective of this paper is to explain the significance of preprocessing techniques for offline HWCR in Tamil script.
Abstract: Handwritten Character Recognition (HWCR) is one of the difficult tasks in the field of pattern recognition and machine learning. In an HWCR application, human handwritten characters are recognized by the computer. Moreover, it can also be utilized in other applications such as postal processing, script recognition, banking security, and scripting language identification. In handwritten character recognition, the preprocessing phase is of great significance for improving character recognition accuracy. In this paper, various preprocessing techniques used for offline handwritten character recognition are discussed. Additionally, the various challenges concerning offline handwritten recognition are also addressed. The primary objective of this paper is to explain the significance of preprocessing techniques for offline HWCR in Tamil script.

2 citations


Book ChapterDOI
12 Feb 2022
TL;DR: This paper proposed a recognition-free QA approach for handwritten document image collections, which outperformed the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.
Abstract: In recent years, considerable progress has been made in the research area of Question Answering (QA) on document images. Current QA approaches from the Document Image Analysis community are mainly focusing on machine-printed documents and perform rather limited on handwriting. This is mainly due to the reduced recognition performance on handwritten documents. To tackle this problem, we propose a recognition-free QA approach, especially designed for handwritten document image collections. We present a robust document retrieval method, as well as two QA models. Our approaches outperform the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.

2 citations


Journal ArticleDOI
TL;DR: Several algorithms for data pre-processing including image deskewing, table and document layout analysis to improve the accuracy of the OCR model are investigated and an end-to-end scanned document management system is built.
Abstract: The quality of the document images is a crucial factor for the performance of an Optical Character Recognition (OCR) model. Various issues in the input data hinder recognition success, such as heterogeneous layouts, skewness, and proportional fonts. This paper investigated several algorithms for data pre-processing, including image deskewing and table and document layout analysis, to improve the accuracy of the OCR model, and then built an end-to-end scanned document management system. We verified the algorithms using well-known OCR software, namely Tesseract. The experiments on a real dataset showed that our methods can accurately process document images with arbitrary angles of rotation and different layouts. As a result, the word-level accuracy of Tesseract improved by 23% for documents with complex structures. The quality of the output text makes it possible to build a system that stores and searches documents efficiently.

2 citations
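The image deskewing discussed in the paper above is often implemented with the projection-profile method: project the foreground pixels at candidate angles and pick the angle that maximizes the variance of the resulting horizontal profile. A minimal NumPy sketch of that idea (the angle range and binning are illustrative assumptions, not the paper's actual algorithm):

```python
import numpy as np

def estimate_skew(binary, angles=np.linspace(-10, 10, 81)):
    """Estimate page skew (degrees) of a binary image (True = ink)
    by maximizing the variance of the horizontal projection profile."""
    ys, xs = np.nonzero(binary)
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        t = np.deg2rad(a)
        # project each ink pixel onto the axis perpendicular to text lines
        proj = ys * np.cos(t) - xs * np.sin(t)
        hist, _ = np.histogram(proj, bins=binary.shape[0])
        score = np.var(hist)
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

At the true skew angle the profile has sharp peaks at the text lines and near-empty gaps between them, which is exactly what the variance score rewards.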


Proceedings ArticleDOI
10 Jan 2022
TL;DR: In this article, the authors proposed CNN and BiLSTM models for text recognition, achieved 92% character recognition accuracy on the IAM dataset, and deployed the model to Firebase as a custom model to increase usability.
Abstract: Nowadays, storing information from handwritten documents for future use is becoming necessary. An easy way to store information is to capture handwritten documents and save them in image format. Recognizing the text or characters present in an image is called Optical Character Recognition. Text extraction from images remains challenging in recent research due to stroke variation, inconsistent writing styles, cursive handwriting, etc. We have proposed CNN and BiLSTM models for text recognition in this work. The model is evaluated on the IAM dataset and achieved 92% character recognition accuracy. It is deployed to Firebase as a custom model to increase usability. We have developed an Android application that allows the user to capture or browse an image and extract the text from the picture by calling the Firebase model and saving the text in a file. To store the text file, the user can browse to an appropriate location. The proposed model works on both printed and handwritten text.

1 citation


Proceedings ArticleDOI
12 Mar 2022
TL;DR: In this paper, a multilingual character recognition system using character image geometry features and Artificial Neural Networks was developed to recognize printed Sinhala and English scripts together, achieving an 85% success rate with a database containing around 800 images.
Abstract: The optical character recognition technique is used to convert information, mainly printed or handwritten text in paper materials, into an electronic format that computers can edit. According to the literature, there are few competent OCR systems for recognizing multilingual characters in the form of Sinhala and English characters together. The lack of an appropriate technology to recognize multilingual text still remains a problem that the current research community must address, and it has been designated as the key problem for this study. The main goal of this research is to develop a multilingual character recognition system that uses character image geometry features and Artificial Neural Networks to recognize printed Sinhala and English scripts together. It is intended that the solution will be improved to cover Sri Lanka's three most commonly spoken languages, with the addition of Tamil as a later upgrade. The primary technologies for this study were character geometry features and Artificial Neural Networks. At the moment, a success rate of almost 85% has been achieved with a database containing around 800 images, divided into 46 characters (20 Sinhala and 26 English), with each character represented in 20 different forms of character images. Recognition of text from printed bilingual documents is tested by extracting individual character data from such printed text documents and feeding them to the system.

1 citation
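The "character image geometry features" mentioned in the paper above are often realized as zoning features: the character bitmap is divided into a fixed grid, and the ink density of each zone forms the feature vector fed to the neural network. A hedged sketch of that idea (the 4x4 grid size and density-only features are assumptions, not necessarily the paper's exact feature set):

```python
import numpy as np

def zoning_features(char_img, grid=(4, 4)):
    """Ink-density zoning features: split a binary character image
    (True = ink) into a grid of zones and return the fraction of
    ink pixels in each zone as a flat feature vector."""
    h, w = char_img.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            zone = char_img[i * h // gh:(i + 1) * h // gh,
                            j * w // gw:(j + 1) * w // gw]
            feats.append(zone.mean())
    return np.array(feats)
```

The resulting fixed-length vector (here 16 values) can be fed directly to a small feed-forward classifier regardless of the original character size.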


Proceedings ArticleDOI
23 Sep 2022
TL;DR: In this paper, OCR-related works and the methods used within this framework are reviewed to support further research; in general, these methods can be classified into Image Pre-processing, Text Segmentation/Localization, Feature Extraction, Text Recognition, and Post-Processing.
Abstract: Most organizations worldwide still rely on paper-based documents. The use of paper-based documents makes it difficult to extract the required data from those documents. Heavy paper usage also hurts cost and time efficiency, not to mention the environmental impact of the deforestation required to produce the paper. These are some of the reasons that motivate the need to digitalize paper-based documents. The conversion from paper-based to paperless documents cannot be done in an instant. In the transition, paper-based documents are usually scanned into image format to reduce the usage of paper. From this comes a need for technology that is able to recognize and extract data in scanned images of paper-based documents. Optical Character Recognition makes it possible to recognize text appearing in images. However, despite its long history of development, OCR for text recognition has yet to achieve 100% accuracy. In general, the OCR process can be divided into Image Pre-processing, Text Segmentation/Localization, Feature Extraction, Text Recognition, and Post-Processing. Thus, this research reviews OCR-related works and the methods used within this framework to support further research.

1 citation


Posted ContentDOI
Giouli Korobili
19 May 2022
TL;DR: In this paper, the authors outline the steps and stages used in the recognition of Kannada handwritten words, aiming to identify handwritten answers written in answer booklets and to solve the recognition problem by using machine learning algorithms.
Abstract: Handwriting recognition has been an issue of concern for many researchers and analysts throughout the previous few decades. Different applications need a solution that recognizes the cursive nature of handwritten text. To build an efficient working OCR, the main challenges are to preprocess the noisy document, segment the words and characters, and then recognize the written text. This paper covers the needs, the relevant research towards handwritten recognition, and how to process it. We outline the steps and stages used in the recognition of Kannada handwritten words. The main aim of the proposed work is to identify Kannada handwritten answers written in answer booklets and to solve the recognition problem by using machine learning algorithms. The system provides a detailed concept of the pre-processing, segmentation, and classifier used to develop a systematic OCR tool. The achieved accuracy is 90% for Kannada handwritten words.

1 citation


Journal ArticleDOI
TL;DR: In this article, the statistical linguistic features obtained from text corpora and OCR text datasets and employed in OCR post-processing approaches are discussed, and two statistical language models based on these linguistic features are presented along with their OCR error correction performance on two published databases attracting research efforts in text recognition and correction.
Abstract: Low-quality Optical Character Recognition systems often result in different kinds of errors in OCR-generated texts. Hence, OCR error detection and correction are essential OCR post-processing tasks for improving the readability and usability of OCR text. In this paper, we present and discuss the statistical linguistic features obtained from text corpora and OCR text datasets and employed in OCR post-processing approaches. In addition, we show our two statistical language models based on these linguistic features and their OCR error correction performance on two published databases attracting research efforts in text recognition and correction: one from the ICDAR 2017 OCR post-correction competition and the other from the Vietnamese online handwriting recognition competition.

1 citation
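A minimal illustration of the kind of statistical post-correction the paper above studies is unigram-frequency candidate selection over edit-distance-1 variants (a Norvig-style spelling corrector; the paper's actual language models are considerably more sophisticated, so this is only a sketch of the principle):

```python
from collections import Counter

def candidates(word, vocab):
    """All edit-distance-1 variants of `word` (deletion, substitution,
    insertion) that appear in the vocabulary."""
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    edits = {L + R[1:] for L, R in splits if R}                        # deletions
    edits |= {L + c + R[1:] for L, R in splits if R for c in letters}  # substitutions
    edits |= {L + c + R for L, R in splits for c in letters}           # insertions
    return {w for w in edits if w in vocab}

def correct(word, freqs):
    """Replace an out-of-vocabulary OCR token with its most frequent
    in-vocabulary neighbour; leave it unchanged if none exists."""
    if word in freqs:
        return word
    cands = candidates(word, freqs)
    return max(cands, key=freqs.get) if cands else word
```

For example, with a frequency table built from a clean corpus, an OCR confusion like "tne" is mapped back to "the" because "the" is the highest-frequency word within one edit.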


Proceedings ArticleDOI
20 May 2022
TL;DR: Optical character recognition (OCR) as mentioned in this paper is a rapidly growing topic of research aimed at creating a computer system that can automatically extract and interpret text from images, such as handwritten text, printed text, or scanned text images.
Abstract: Almost all institutions and organizations rely substantially on data to run their operations. Data is necessary for making informed decisions, adapting to change, and defining strategic objectives. Data administration has always relied on manual data entry. Manual entry used to entail transferring data from various documents into record books, ledger books, and other such books. Manual data entry, as used in recent years, comprises manually entering particular and predetermined data, such as customer name, business type, and monetary amount, into a target program from various sources such as paper bills, invoices, orders, and receipts. Depending on the sort of business, the target program can be handwritten records, spreadsheets, or computer databases. Several businesses require manual data entry, which has a high rate of mistakes. This is because the manual approach places far too much reliance on the ability of humans to comprehend handwritten documents. As a result, a method for retrieving and storing information from images, particularly text, is required. OCR (optical character recognition) is a rapidly growing topic of research aimed at creating a computer system that can automatically extract and interpret text from images. OCR converts any type of text or text-containing document, such as handwritten text, printed text, or scanned text images, into an editable digital format for deeper and more complex processing. As a result, OCR enables a machine to recognize text in such documents without the need for human intervention. In order to achieve successful automation, a few significant difficulties must be identified and resolved. One of the most pressing issues is the quality of character typefaces in paper documents, as well as image quality. The computer system may not correctly recognize characters as a result of these difficulties. We examine OCR utilizing four different approaches in this research. We begin by laying out all of the possible issues that may arise during the OCR stages. We then go over the pre-processing, segmentation, normalization, feature extraction, classification, and post-processing aspects of an OCR system. As a result, this discussion paints a rather complete picture of the current state of the text recognition domain.

1 citation


Proceedings ArticleDOI
17 Aug 2022
TL;DR: In this article, the authors used an optical character recognition (OCR) tool to convert an image of a handwritten response sheet into a text document; the algorithm looks for red-colored digits on the answer sheet, extracts the numbers from the response sheet, and then adds the digits to get the total marks.
Abstract: The educational system went online during the outbreak, which resulted in computerized test submissions. Because of the Covid-19 outbreak, the paper assessment process in schools and universities has moved online. Hand-counting the marks on a large number of documents is a time-consuming and laborious operation. Evaluators digitally correct the script before manually counting the marks as they return to previous pages, a task that has become one of the most time-consuming. The time it takes to count the marks increases as the number of papers increases. This programme captures images with a scanner device, such as a smartphone, and converts them to a portable document format. Using an optical character recognition (OCR) tool, the proposed system converts an image of a handwritten response sheet into a text document. The algorithm looks for red-colored digits on the answer sheet, extracts the numbers from the response sheet, and then adds the digits to get the total marks for the final content. The accuracies obtained by comparing the three different OCR approaches in the proposed system are as follows: Intelligent character recognition (ICR), 89%; Convolutional neural network (CNN), 86%; and Intelligent word recognition (IWR), 84%.
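The red-digit extraction step described above amounts to a colour filter applied before recognition. A simple RGB mask sketch (the threshold values are illustrative assumptions, not the paper's; real scans would need tuning or an HSV-based rule):

```python
import numpy as np

def red_digit_mask(rgb):
    """Boolean mask of 'red ink' pixels in an H x W x 3 uint8 image:
    the red channel must dominate both green and blue by a margin."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (r > 120) & (r - g > 60) & (r - b > 60)
```

The mask isolates the evaluator's red marks from black handwriting and the white page, so only those pixels are passed on to the digit recognizer.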

Proceedings ArticleDOI
15 May 2022
TL;DR: In this article, a new approach for common parameter tuning is proposed, which reduces the number of required parameters and narrows their adjustment range; it shows a recall of 94% in the experimental results.
Abstract: Extracting personal information from the documents is important to protect personal data. Personal information in a form or document with a standard structure can be determined by using the methods applied in the fields of machine learning, natural language processing, image processing, optical character recognition, etc. However, if the document does not have a standard structure, analyzing the layout of this document and processing the different document or form structures in the document separately may be necessary. There are different methods used for document layout analysis in the literature. However, each of these methods has its own parameters. In this study, a new approach for common parameter tuning is proposed. With the proposed approach, the number of required parameters is reduced and the adjustment range of the parameters is decreased. The proposed approach showed a recall of 94% in the experimental results.

Book ChapterDOI
23 Jun 2022
TL;DR: In this article, an algorithm is introduced that utilizes projection profile, bounding box, and connected component labeling techniques for the development of a Gujarati handwritten dataset and the segmentation of handwritten text from Gujarati documents into lines, words, and characters.
Abstract: Modern innovations make great impacts on the human lifestyle and way of working. They boost the efficiency and productivity of people by reducing effort, which helps them handle several tasks at a time. Nowadays, all government offices, banks, businesses, and education systems are influenced by paperless technology. It improves documentation and the secure sharing of information while saving space and resources. Paperless technology relies on Optical Character Recognition (OCR) to convert physical documents into machine-readable documents. OCR consists of two main steps: segmentation and recognition. The success rate of character recognition depends on the segmentation of the required regions of interest. This paper introduces an algorithm that utilizes projection profile, bounding box, and connected component labeling techniques for the development of a Gujarati handwritten dataset and the segmentation of handwritten text from Gujarati documents into lines, words, and characters.
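The projection-profile line segmentation this paper builds on can be sketched in a few lines: sum the ink per row and split the page at empty rows. A minimal version (whitespace-gap splitting only; the paper combines this with bounding boxes and connected component labeling for word and character segmentation):

```python
import numpy as np

def segment_lines(binary):
    """Split a deskewed binary page (True = ink) into text-line row
    ranges using the horizontal projection profile: runs of rows with
    any ink become lines, zero-ink rows are the gaps between them."""
    profile = binary.sum(axis=1)
    lines, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i                    # line begins
        elif v == 0 and start is not None:
            lines.append((start, i))     # line ends at the first empty row
            start = None
    if start is not None:                # page ends mid-line
        lines.append((start, len(profile)))
    return lines
```

The same idea applied to column sums within each line range yields word boundaries, which is how projection-profile word segmentation typically proceeds.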

Book ChapterDOI
01 Jan 2022
TL;DR: In this article, a generic and integrated bilingual English-Hindi document classification system is proposed, which classifies heterogeneous documents using a dual class feeder and two character corpora.
Abstract: Today, rapid digitization requires efficient bilingual non-image and image document classification systems. Although many bilingual NLP and image-based systems provide solutions for real-world problems, they primarily focus on text extraction, identification, and recognition tasks with limited document types. This article discusses a journey of these systems and provides an overview of their methods, feature extraction techniques, document sets, classifiers, and accuracy for English-Hindi and other language pairs. The gaps found lead toward the idea of a generic and integrated bilingual English-Hindi document classification system, which classifies heterogeneous documents using a dual class feeder and two character corpora. Its non-image and image modules include pre- and post-processing stages and pre- and post-segmentation stages to classify documents into predefined classes. This article discusses many real-life applications on societal and commercial issues. The analytical results show important findings of existing and proposed systems.

Posted ContentDOI
12 Feb 2022
TL;DR: The authors proposed a recognition-free QA approach for handwritten document image collections, which outperformed the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.
Abstract: In recent years, considerable progress has been made in the research area of Question Answering (QA) on document images. Current QA approaches from the Document Image Analysis community are mainly focusing on machine-printed documents and perform rather limited on handwriting. This is mainly due to the reduced recognition performance on handwritten documents. To tackle this problem, we propose a recognition-free QA approach, especially designed for handwritten document image collections. We present a robust document retrieval method, as well as two QA models. Our approaches outperform the state-of-the-art recognition-free models on the challenging BenthamQA and HW-SQuAD datasets.


Journal ArticleDOI
TL;DR: A thorough analysis of the handwritten character recognition field can be found in this article, where several strategies proposed for character recognition in handwriting recognition systems are reviewed, along with the substantial number of studies and papers that outline methods for transforming the text of a paper document into a machine-readable format.
Abstract: Due to its widespread use, handwriting recognition has drawn a lot of interest in the domains of pattern recognition and machine learning. The application domain for optical character recognition (OCR) and handwritten character recognition (HCR) is specific. For character recognition in a system for handwriting recognition, several strategies have been proposed. Despite this, a substantial number of studies and papers outline the methods for transforming the text of a paper document into a machine-readable format. Character recognition (CR) technology may be crucial in the near future in order to process and digitize existing paper documents in order to establish a paperless environment. This essay offers a thorough analysis of the handwritten character recognition field.

Proceedings ArticleDOI
07 Oct 2022
TL;DR: In this article, the authors proposed a system which takes an image as input and automatically detects and displays the digit present in it, saving time and effort and reducing the chance of human error in the system.
Abstract: The handwritten recognition of digits is a major step in many document processing and analysis applications using digital image processing technology. Image processing and machine learning are fast-growing domains with growing applications in engineering and computer science. Pre-processing and analysis of documents are becoming more popular in many pattern recognition applications. The aim of this paper is to propose a system which takes an image as input and automatically detects and displays the digit present in it. This will save time and effort and will reduce the chance of human error in the system. The system can be further used in many applications like reading postal addresses, bank cheque amounts, forms, etc. In the proposed system, an input image with a handwritten digit is taken as input, and after pre-processing, the image is given to the model for prediction. The output is saved in a text file for future use.

Journal ArticleDOI
TL;DR: The main objective is to bridge the gap between the actual paper and the digital world; in doing so, one can operate on the digital data much faster as compared to the actual data.
Abstract: Image processing is a vital tool when one is dealing with several images and wishes to perform complex actions on them. With advances in technology, one can now compress and manipulate images, extract required information, etc. One such application of image processing is detecting handwritten text and converting it into a digital text format. The main objective is to bridge the gap between the actual paper and the digital world; in doing so, one can operate on the digital data much faster as compared to the actual data. The detection of handwritten text is performed via Optical Character Recognition.

Posted ContentDOI
05 Dec 2022
TL;DR: Universal Document Processing (UDOP) as mentioned in this paper unifies text, image, and layout modalities together with varied task formats, including document understanding and generation, and enables document editing and content customization.
Abstract: We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora using innovative self-supervised objectives and diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state-of-the-art on 8 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark.

Journal ArticleDOI
TL;DR: A Dogra handwritten character dataset is presented which contains around 38,690 character images grouped into 73 character classes, extracted from 530 one-page handwriting samples of 265 individuals of varying age, sex, qualification, and location.
Abstract: Handwritten text recognition is an important area of research because of the growing demand to process and convert the huge amount of data and information available in handwritten form into digital form. Digital data, rather than handwritten form, can prove to be highly useful in different fields. Handwritten text recognition plays an important role in applications in postal services, cheque processing in banks, information search, and organizations dealing with such applications. In text recognition applications, a dataset of the specified script is required for training purposes. Datasets of different languages can be found online, but a dataset of Dogra script characters is still not available. This paper presents a Dogra handwritten character dataset which contains around 38,690 character images grouped into 73 character classes, extracted from 530 one-page handwriting samples of 265 individuals of varying age, sex, qualification, and location. The Dogra character dataset will be freely accessible to scholars and researchers, and could also be used for further recognition improvement, for extension with more characters and words, for writer identification, and for Dogra word segmentation. The Dogra dataset could also be used to study variation in handwriting according to age and gender.

Proceedings ArticleDOI
15 Sep 2022
TL;DR: In this paper, the authors focused on extracting the characters of an ID card image into an editable format by implementing Optical Character Recognition (OCR) on the mobile platform, achieving a best accuracy of 87.5% on 50 samples of Indonesian National ID cards with 2,137 characters recognized.
Abstract: The Indonesian Identity Card (ID card) has been widely used as authentication in various fields. One of the most common uses of ID cards is registration in various business processes that require personal data, including an ID number, full name, date of birth, address, and more. This research focuses on extracting the characters of an ID card image into an editable format by implementing Optical Character Recognition (OCR) on the mobile platform. There are three significant steps in the proposed algorithm: image pre-processing, character recognition, and character extraction. The output of this research is an application that enables the user to recognize important information on the ID card image and save the data into a file by utilizing an Android-based smartphone. The proposed system achieved a best accuracy of 87.5% on 50 samples of Indonesian National ID cards, with 2,137 characters recognized (an average of 43 words per card).

Proceedings ArticleDOI
28 Apr 2022
TL;DR: In this article, an approach for identifying electronic ID cards using image detection and Optical Character Recognition (OCR) was realized, which achieved 99 percent ID detection in a bank application.
Abstract: A broad range of computer vision studies have been conducted on the recognition of identity documents using mobile devices. A portfolio of techniques and algorithms has been developed for solving problems like face recognition, document detection and correction, and text field recognition, and the lack of datasets has become a critical challenge. The electronic ID card has been commonly used to verify or recognize an individual since its introduction in 2011. Several issues must be noted, such as the high requirements for detecting the ID card fields and the challenges of recognizing the data in the ID card. In this research, we realized an approach for identifying electronic ID cards using image detection and Optical Character Recognition (OCR). This paper provides a short outline of how banking institutions and other fintech companies utilize highly developed OCR tools to automate their processes, enhancing not just service quality but also the precision, speed, and protection of their transaction data. As an outcome of our image processing and OCR techniques, we can achieve 99 percent ID detection. This research was based on the interface of a mobile application, such as those owned by one of Kazakhstan's banks.

Book ChapterDOI
22 Sep 2022
TL;DR: In this article, a solution is proposed which uses open-source components to automate the process of data extraction from scanned documents with minimal user input; it is capable of extracting data from tables and stamps present in documents in a well-structured format.
Abstract: Digital technologies are now becoming part of all sectors, be it banking, automobile, infrastructure, and more. These technologies are empowered by data. This raises the need for the digitization of documents to fulfill the need for data driving the digital transformation throughout the sectors. Digitization requires the extraction of a huge amount of data from paper-based documents. Automating data extraction from paper-based documents can help in dealing with large volumes of data at a lower cost and with less effort. A solution is proposed which uses open-source components to automate the process of data extraction from scanned documents with minimal user input. The solution is capable of generating structured output reflecting the document layout along with the data in a document. It is capable of extracting data from tables and stamps present in documents in a well-structured format. The solution is driven by a configuration file, which can help in fine-tuning the different processes to improve the extracted data. The solution generates an XML file for the scanned document, which can be used further for storing and processing the data present in paper-based documents by different digital processes. Keywords: Automation, Image processing, Document extraction, Template-less extraction, Optical character recognition, Structured data extraction, Table extraction, Scanned document XML.
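The structured XML output this solution produces can be sketched with the standard library: extracted fields and table cells are written as nested elements. The element and field names below are hypothetical, not the solution's actual schema:

```python
import xml.etree.ElementTree as ET

def fields_to_xml(fields, tables):
    """Serialize extracted key-value fields and tables (lists of rows
    of cell strings) into a simple XML document string."""
    doc = ET.Element('document')
    for name, value in fields.items():
        f = ET.SubElement(doc, 'field', name=name)
        f.text = value
    for rows in tables:
        t = ET.SubElement(doc, 'table')
        for row in rows:
            r = ET.SubElement(t, 'row')
            for cell in row:
                c = ET.SubElement(r, 'cell')
                c.text = cell
    return ET.tostring(doc, encoding='unicode')
```

In a configuration-driven pipeline like the one described, the tag names and nesting would come from the configuration file rather than being hard-coded as here.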

Book ChapterDOI
M. Malini1
01 Jan 2022
TL;DR: In this article, a survey of the techniques used to recognize handwritten characters in South Indian regional languages is presented, along with the datasets used to train and test the models and statistics on the number of works published over two consecutive years in four regional languages of South India.
Abstract: Handwritten characters in documents are used in every part of life. A document is proof of communication and needs to be preserved digitally. When a handwritten document is scanned, the system should be able to recognize each character in the document and store it. This pattern recognition needs to be performed effectively with deep learning, in which the model adapts multiple neural networks to learn from and be tested on an enormous number of observations. The scope of the paper is a survey of the techniques used to recognize handwritten characters in South Indian regional languages; the datasets used in training and testing the models are presented, along with statistics on the number of works published over two consecutive years in four regional languages of South India.

Journal ArticleDOI
TL;DR: The most important action in the pre-processing step is the conversion of a colour image into a binary image, which separates the text from the backdrop, as discussed by the authors; character separation is then aided by the segmentation process.
Abstract: The method of extracting text from photographs is crucial in the current environment. The amount of information stored in digital form rather than on paper has greatly increased in recent years. This makes it easier to store information and allows for easy retrieval of that information as needed. Pre-processing, segmentation, feature extraction, classification, and post-processing are the stages of text recognition. The most important action in the pre-processing step is the conversion of a colour image into a binary image, which separates the text from the backdrop. Character separation is aided by the segmentation process. The most important data can then be extracted from the image as features to help in text recognition. The classification method makes it possible to locate the text in accordance with clearly stated guidelines. After that, post-processing is carried out to minimise errors. Many applications depend heavily on text recognition. This paper presents the different uses of text recognition from photographs as well as a discussion of the text recognition module, and also examines related literature.
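The binarization step this abstract highlights is commonly done with Otsu's method, which picks the gray level maximizing between-class variance. The abstract names no specific algorithm, so the following is a minimal pure-Python sketch on a flat list of grayscale values, assuming dark text on a light background.

```python
def otsu_threshold(pixels):
    """Find the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels at or below threshold t (background class)
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixels above threshold t (foreground class)
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels, threshold):
    """Map gray pixels to 0 (text) or 255 (background)."""
    return [0 if p <= threshold else 255 for p in pixels]

# Bimodal toy image: dark text pixels around 40, light paper around 220.
gray = [40, 42, 38, 41, 220, 218, 222, 221, 219, 223]
t = otsu_threshold(gray)
binary = binarize(gray, t)
print(t, binary)
```

In practice a library routine (e.g. an OpenCV threshold call with the Otsu flag) would replace this loop, but the variance criterion is the same.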

Proceedings ArticleDOI
23 Sep 2022
TL;DR: In this article, the authors use YOLOv4 for detection and recognition with Darknet as the backbone; the mean average precision (mAP) of the proposed implementation is found to be around 68.43%.
Abstract: The prime objective of Optical Character Recognition (OCR) is to convert printed or handwritten words or characters into electronic text. The main aim of this paper is to detect and recognize the labels or part numbers that are engraved on materials. For further processing, the recognized data can be stored in a database with the time and other related information as per the end user's requirements. This implementation uses YOLOv4 for detection and recognition, with Darknet as the backbone. The mean average precision (mAP) of the proposed implementation is found to be around 68.43%.
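The mAP figure reported above rests on intersection-over-union (IoU): a detection counts as a true positive only when its IoU with a ground-truth box exceeds the evaluation threshold (commonly 0.5). The paper gives no code, so this is a generic sketch of the IoU computation on axis-aligned boxes.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero when boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Intersection is half of each box: IoU = 50 / (100 + 100 - 50) = 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Average precision is then the area under the precision-recall curve built from these true/false-positive decisions, and mAP averages that over classes.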

Book ChapterDOI
01 Jan 2023
TL;DR: In this article, the authors propose a multilanguage recognition translator (MLRT) mobile app to help the education system, especially students who are new to Arabic, Malay, and English, learn the languages and translate between them.
Abstract: A mobile translator is a phone app that lets the user translate between languages. In this paper, we propose a multilanguage recognition translator (MLRT) mobile app to help the education system, especially students who are new to Arabic, Malay, and English, learn the languages and translate between them. The OCR methodology was chosen for this project because it is the most appropriate methodology for developing a mobile application; data acquisition, pre-processing, segmentation, feature extraction, classification, and post-processing are its six phases. A Convolutional Neural Network (CNN), a deep learning algorithm, is used to identify objects in images, and Optical Character Recognition (OCR) is used for feature extraction to process Arabic words and translate them into Malay. A system architecture was created to provide an overview of how the application runs, its functionality, and the framework of the output showing how the application works.
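The six OCR phases listed above form a linear pipeline, which can be sketched as a chain of functions. The stand-ins below are deliberately trivial placeholders; each real phase (CNN classification, Arabic-to-Malay translation, etc.) is far more involved.

```python
# Toy stand-ins for the six OCR phases; each real phase is far more involved.
def acquire(source):            # 1. data acquisition (camera capture)
    return source

def preprocess(image):          # 2. pre-processing (e.g. binarization, denoising)
    return image.strip().lower()

def segment(image):             # 3. segmentation into individual characters
    return list(image)

def extract_features(chars):    # 4. feature extraction (OCR features in the paper)
    return chars

def classify(features):         # 5. classification (a CNN in the paper)
    return "".join(features)

def postprocess(text):          # 6. post-processing (e.g. spell correction)
    return text.capitalize()

def ocr_pipeline(source):
    """Run all six phases in order on one input."""
    return postprocess(classify(extract_features(segment(preprocess(acquire(source))))))

print(ocr_pipeline("  Salam  "))
```

The value of framing the app this way is that each phase can be swapped out (a better segmenter, a retrained CNN) without touching the others.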

Posted ContentDOI
30 Mar 2022
TL;DR: This paper introduces Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods; it receives a document image and a task string as input and autoregressively generates arbitrary text as output.
Abstract: We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to the document understanding, it does not require an external recognition model as prior methods do. Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks. We show that this model is effective at 9 different dataset-task combinations.

Proceedings ArticleDOI
09 Nov 2022
TL;DR: In this article, a convolutional neural network (CNN) is used to recognize handwritten characters: a high-definition picture of a handwritten document is fed into the system, which separates out the individual characters and recognizes them.
Abstract: Handwritten character recognition has recently been one of the most difficult problems in image processing and pattern recognition. It digitally repurposes a picture of a paper document, such as a handwritten signature: OCR takes an optical picture of a character as input and generates an output character based on it. A Convolutional Neural Network (CNN), a prominent deep neural network architecture, is used to recognize written characters in our suggested system. When a high-definition picture of a handwritten document is fed into the system, the system separates out the individual characters and recognizes them. The desired result is an image format for the input picture that has been provided.
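The step where the system "separates out the individual characters" before the CNN sees them is often done with a vertical projection profile: columns with no ink mark the gaps between characters. The paper does not specify its segmentation method, so this pure-Python sketch is one common, assumed approach on an already-binarized image.

```python
def segment_characters(binary_image):
    """Split a binary page into character slices via vertical projection.

    `binary_image` is a list of rows; 1 marks an ink pixel, 0 background.
    Columns whose ink count is zero are treated as gaps between characters.
    Returns (start, end) column ranges, end-exclusive.
    """
    width = len(binary_image[0])
    col_ink = [sum(row[x] for row in binary_image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(col_ink):
        if ink and start is None:       # entering an inked run
            start = x
        elif not ink and start is not None:  # leaving an inked run
            segments.append((start, x))
            start = None
    if start is not None:               # run extends to the right edge
        segments.append((start, width))
    return segments

# Two tiny "characters" separated by one blank column.
page = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
print(segment_characters(page))
```

Each returned column slice would then be cropped, resized to the CNN's input shape, and classified one character at a time.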