Showing papers on "Document processing" published in 2021

Book Chapter•DOI•

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis.

Zejiang Shen¹, Ruochen Zhang², Melissa Dell³, Benjamin Charles Germain Lee⁴, Jacob C. Carlson³, Weining Li⁵•Institutions (5)

Allen Institute for Artificial Intelligence¹, Brown University², Harvard University³, University of Washington⁴, University of Waterloo⁵

05 Sep 2021

TL;DR: The LayoutParser library as mentioned in this paper is an open-source library for streamlining the usage of deep learning in document image analysis research and applications, which includes a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.

Abstract: Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been ongoing efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces LayoutParser, an open-source library for streamlining the usage of DL in DIA research and applications. The core LayoutParser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks. To promote extensibility, LayoutParser also incorporates a community platform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that LayoutParser is helpful for both lightweight and large-scale digitization pipelines in real-world use cases. The library is publicly available at https://layout-parser.github.io.

51 citations

Journal Article•DOI•

A small samples training framework for deep Learning-based automatic information extraction: Case study of construction accident news reports analysis

Dan Feng¹, Hainan Chen²•Institutions (2)

Wuhan University¹, Sun Yat-sen University²

01 Jan 2021-Advanced Engineering Informatics

TL;DR: In this article, a natural language data augmentation-based small-sample training framework for automatic information extraction modeling is proposed, where the cross combination-based text augmentation algorithm is employed to build automatic information extraction models without large-scale raw data and manual annotations.

30 citations

Journal Article•DOI•

Transformers-based information extraction with limited data for domain-specific business documents

Minh-Tien Nguyen¹, Dung Tien Le, Linh Le²•Institutions (2)

Hung Yen University of Technology and Education¹, University of Queensland²

01 Jan 2021-Engineering Applications of Artificial Intelligence

TL;DR: The model takes into account the contextual aspect of pre-trained language models trained on a huge amount of data on general domains for word representation and employs transfer learning by stacking Convolutional Neural Networks to learn hidden representation for classification.

19 citations

Journal Article•DOI•

Beyond document object detection: instance-level segmentation of complex layouts

Sanket Biswas¹, Pau Riba¹, Josep Lladós¹, Umapada Pal²•Institutions (2)

Autonomous University of Barcelona¹, Indian Statistical Institute²

01 Sep 2021-International Journal on Document Analysis and Recognition

TL;DR: In this article, the task of instance segmentation on the document image domain is defined, which is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image.

Abstract: Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we importantly contribute to prior arts by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art.

10 citations

Posted Content•

Deep learning-based NLP Data Pipeline for EHR Scanned Document Information Extraction.

Enshuo Hsu, Ioannis Malagaris, Yong Fang Kuo, Rizwana Sultana¹, Kirk Roberts²•Institutions (2)

University of Texas Medical Branch¹, University of Texas Health Science Center at Houston²

14 Sep 2021-arXiv: Computation and Language

TL;DR: In this paper, the authors evaluated the use of image preprocessing and document layout for extracting two key indicators of sleep apnea, the apnea-hypopnea index (AHI) and oxygen saturation (SaO2), from scanned sleep study reports.

Abstract: Scanned documents in electronic health records (EHR) have been a challenge for decades, and are expected to remain one for the foreseeable future. Current approaches for processing often include image preprocessing, optical character recognition (OCR), and text mining. However, there is limited work that evaluates the choice of image preprocessing methods, the selection of NLP models, and the role of document layout. The impact of each element remains unknown. We evaluated these elements on a use case of two key indicators for sleep apnea, apnea-hypopnea index (AHI) and oxygen saturation (SaO2) values, from scanned sleep study reports. Our data, comprising 955 manually annotated reports, were reused from a previous study at the University of Texas Medical Branch. We performed image preprocessing: gray-scaling followed by 1 iteration of dilation and erosion, and a 20% contrast increase. The OCR was implemented with the Tesseract OCR engine. A total of seven Bag-of-Words models (Logistic Regression, Ridge Regression, Lasso Regression, Support Vector Machine, k-Nearest Neighbor, Naïve Bayes, and Random Forest) and three deep learning-based models (BiLSTM, BERT, and Clinical BERT) were evaluated. We also evaluated the combinations of image preprocessing methods (gray-scaling, dilate & erode, increased contrast by 20%, increased contrast by 60%), and two deep learning architectures (with and without structured input that provides document layout information). Our proposed method using Clinical BERT reached an AUROC of 0.9743 and document accuracy of 94.76% for AHI, and an AUROC of 0.9523 and document accuracy of 91.61% for SaO2. We demonstrated that the proper use of image preprocessing and document layout could be beneficial to scanned document processing.
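
The preprocessing chain named in the abstract (gray-scaling, one iteration of dilate and erode, a 20% contrast increase) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the 3x3 neighbourhood, the luminance weights, the mid-gray pivot of 128, and all function names are assumptions made for illustration, and the abstract does not say which library or kernel was used.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 gray values (assumed luminance weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def _morph(img, pick):
    """Apply `pick` (max or min) over an assumed 3x3 neighbourhood, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img):
    return _morph(img, max)


def erode(img):
    return _morph(img, min)


def increase_contrast(img, factor=1.2, pivot=128):
    """Stretch values away from an assumed mid-gray pivot; factor=1.2 models a 20% increase."""
    return [[min(255, max(0, round(pivot + factor * (p - pivot)))) for p in row] for row in img]


def preprocess(rgb):
    """Gray-scale, one dilate followed by one erode, then a 20% contrast increase."""
    gray = to_gray(rgb)
    cleaned = erode(dilate(gray))
    return increase_contrast(cleaned, factor=1.2)
```

On a white page with a single isolated dark pixel, the dilate step removes the speck before the erode step restores the remaining shapes, which is the usual motivation for this pairing on scanned text.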

10 citations

Proceedings Article•DOI•

Hierarchical Recurrent Neural Network for Handwritten Strokes Classification

Illya Degtyarenko¹, Ivan Deriuga¹, Andrii Grygoriev¹, Serhii Polotskyi¹, Volodymyr Melnyk¹, Dmytro Zakharchuk¹, Olga Radyvonenko¹•Institutions (1)

Samsung¹

06 Jun 2021

TL;DR: In this paper, a hierarchical recurrent neural network (RNN) architecture is proposed to address the hierarchical structure inherent to the handwritten document, and the novelty of feature aggregation pooling technique for transferring data between hierarchical levels allows achieving higher computational efficiency for using the suggested approach in on-device mobile computing.

Abstract: The paper presents an original solution for processing free-form online handwritten documents, aimed at separating multi-class handwritten documents into texts, tables, formulas, drawings, etc. Stroke classification is an important step in automatic document layout analysis (DLA) in handwritten document recognition systems. Major DLA challenges arise due to a wide diversity of handwritten content, various writing styles, a lack of contextual knowledge, and the complicated structure of freeform handwritten documents. In this paper, we propose the hierarchical recurrent neural network (RNN) architecture to address the hierarchical structure inherent to the handwritten document. The novel feature aggregation pooling technique for transferring data between hierarchical levels yields higher computational efficiency, making the suggested approach suitable for on-device mobile computing. The presented approach achieves new state-of-the-art results in multi-class classification, with an accuracy of 97.25% on the IAMonDo dataset. This result can serve as the basis for efficient mobile applications for freeform handwriting document recognition.

9 citations

Book Chapter•DOI•

HCRNN: A Novel Architecture for Fast Online Handwritten Stroke Classification

Andrii Grygoriev¹, Illya Degtyarenko¹, Ivan Deriuga¹, Serhii Polotskyi¹, Volodymyr Melnyk¹, Dmytro Zakharchuk¹,², Olga Radyvonenko¹•Institutions (2)

Samsung¹, Taras Shevchenko National University of Kyiv²

05 Sep 2021

TL;DR: In this paper, a hierarchical deep neural network (HDNN) architecture with high computational efficiency is proposed for handwritten document processing and particularly for multi-class stroke classification, which uses a stack of 1D convolutional neural networks (CNN) on the lower level and a stacked RNN on the upper level.

Abstract: Stroke classification is an essential task for applications with free-form handwriting input. Implementation of this type of application for mobile devices places stringent requirements on different aspects of embedded machine learning models, which results in finding a trade-off between model performance and model complexity. In this work, a novel hierarchical deep neural network (HDNN) architecture with high computational efficiency is proposed. It is adopted for handwritten document processing and particularly for multi-class stroke classification. The architecture uses a stack of 1D convolutional neural networks (CNN) on the lower (point) hierarchical level and a stack of recurrent neural networks (RNN) on the upper (stroke) level. The novel fragment pooling techniques for feature transition between hierarchical levels are presented. On-device implementation of the proposed architecture establishes new state-of-the-art results in the multi-class handwritten document processing with a classification accuracy of 97.58% on the IAMonDo dataset. Our method is also more efficient in both processing time and memory consumption than the previous state-of-the-art RNN-based stroke classifier.

3 citations

Posted Content•

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis.

Zejiang Shen¹, Ruochen Zhang², Melissa Dell³, Benjamin Charles Germain Lee⁴, Jacob C. Carlson³, Weining Li⁵•Institutions (5)

Allen Institute for Artificial Intelligence¹, Brown University², Harvard University³, University of Washington⁴, University of Waterloo⁵

29 Mar 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: The layoutparser library as mentioned in this paper provides a set of simple and intuitive interfaces for applying and customizing deep learning models for layout detection, character recognition, and many other document processing tasks.

Abstract: Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been ongoing efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces layoutparser, an open-source library for streamlining the usage of DL in DIA research and applications. The core layoutparser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks. To promote extensibility, layoutparser also incorporates a community platform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that layoutparser is helpful for both lightweight and large-scale digitization pipelines in real-world use cases. The library is publicly available at this https URL.

3 citations

Book Chapter•DOI•

Detection and Localisation of Struck-Out-Strokes in Handwritten Manuscripts

Arnab Poddar¹, Akash Chakraborty¹, Jayanta Mukhopadhyay¹, Prabir Kumar Biswas¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

05 Sep 2021

TL;DR: In this article, a system for simultaneous detection of struck-out words and localisation of the struck out strokes using a single network architecture based on Generative Adversarial Network (GAN) is introduced.

Abstract: The presence of struck-out texts in handwritten manuscripts adversely affects the performance of state-of-the-art automatic handwritten document processing systems. Information about struck-out words (STW) is often important for real-time applications like handwritten character recognition, writer identification, digital transcription, forensic applications, historical document analysis, etc. Hence, the detection of STW and localisation of struck-out strokes (SS) are crucial tasks. In this paper, we introduce a system for simultaneous detection of STWs and localisation of the SS using a single network architecture based on Generative Adversarial Network (GAN). The system requires no prior information about the type of SS and is also able to robustly handle variants of strokes such as straight, slanted, criss-cross, multiple-line, and underline strokes, as well as partial STW. We also present a methodology to generate STW with high variability of SS for network learning. We have evaluated the proposed pipeline on the publicly available IAM dataset and also on struck-out words collected from real-world writers with high variability factors like age, gender, stroke-width, stroke-type, etc. The evaluation metrics show robustness and applicability in real-world scenarios.

2 citations

Proceedings Article•DOI•

Linecounter: Learning Handwritten Text Line Segmentation By Counting

Deng Li¹, Yue Wu², Yicong Zhou¹•Institutions (2)

University of Macau¹, Amazon.com²

19 Sep 2021

TL;DR: A novel Line Counting formulation for HTLS – that involves counting the number of text lines from the top at every pixel location – is proposed that helps learn an end-to-end HTLS solution that directly predicts per-pixel line number for a given document image.

Abstract: Handwritten Text Line Segmentation (HTLS) is a low-level but important task for many higher-level document processing tasks like handwritten text recognition. It is often formulated in terms of semantic segmentation or object detection in deep learning. However, both formulations have serious shortcomings. The former requires heavy post-processing of splitting/merging adjacent segments, while the latter may fail on dense or curved texts. In this paper, we propose a novel Line Counting formulation for HTLS -- that involves counting the number of text lines from the top at every pixel location. This formulation helps learn an end-to-end HTLS solution that directly predicts per-pixel line number for a given document image. Furthermore, we propose a deep neural network (DNN) model LineCounter to perform HTLS through the Line Counting formulation. Our extensive experiments on the three public datasets (ICDAR2013-HSC, HIT-MW, and VML-AHTE) demonstrate that LineCounter outperforms state-of-the-art HTLS approaches. Source code is available at this https URL.
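
To make the idea of a per-pixel line number concrete, here is a toy sketch of one plausible reading of the counting target: for every pixel, count how many distinct text lines have been met in the same column from the top of the page down to that pixel. This is an illustration only, not the LineCounter model or its actual label definition; the input format (an instance mask with 0 for background) and the function name `line_count_map` are assumptions.

```python
def line_count_map(mask):
    """Build a toy per-pixel counting target from a line-instance mask.

    mask[y][x] is 0 for background or a positive id for the text line covering
    that pixel (assumed input format). The result holds, for each pixel, the
    number of distinct lines seen in column x at or above row y.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for x in range(w):
        seen = set()
        for y in range(h):
            label = mask[y][x]
            if label != 0:
                seen.add(label)
            out[y][x] = len(seen)
    return out
```

In this toy setting the count increases by one each time a new line is crossed, so a dense prediction of this map encodes the line segmentation without a separate merge or split step.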

2 citations

Patent•

Automation and digitizalization of document processing systems

Ghatage Prakash¹, Viswanathan Kumar¹, Fernandes Sebastian, Thangaraj Naveen Kumar•Institutions (1)

Accenture¹

07 Jan 2021

TL;DR: In this article, a machine learning model was used to detect languages utilized in the digitized documents, and to translate the digitised documents, in other languages that are different than a common language, into the common language and to generate translated documents.

Abstract: A device receives documents from various sources, and processes the documents, with an optical character recognition engine, to generate digitized documents. The device processes the digitized documents, with a first machine learning model, to detect languages utilized in the digitized documents, and processes the digitized documents, in other languages that are different than a common language and with a second machine learning model, to translate the digitized documents, in the other languages, into the common language and to generate translated digitized documents. The device processes the translated digitized documents and untranslated digitized documents, with a classification model, to generate classified documents, and processes the classified documents, with a third machine learning model, to generate extracted information from the classified documents. The device validates the extracted information based on business rules to generate validated extracted information, and generates a smart contract for a transaction based on the validated extracted information.

Proceedings Article•DOI•

Engineering of an artificial intelligence safety data sheet document processing system for environmental, health, and safety compliance

Kevin Fenton¹, Steven J. Simske¹•Institutions (1)

Colorado State University¹

16 Aug 2021

TL;DR: This research focuses on the reverse engineering of SDS document types to adapt to various layouts and the harnessing of meta-algorithmic and neural network approaches to provide a means of moving industrial institutions towards a digital universal SDS processing methodology.

Abstract: Chemical Safety Data Sheets (SDS) are the primary method by which chemical manufacturers communicate the ingredients and hazards of their products to the public. These SDSs are used for a wide variety of purposes ranging from environmental calculations to occupational health assessments to emergency response measures. Although a few companies have provided direct digital data transfer platforms using xml or equivalent schemata, the vast majority of chemical ingredient and hazard communication to product users still occurs through the use of millions of PDF documents that are largely loaded through manual data entry into downstream user databases. This research focuses on the reverse engineering of SDS document types to adapt to various layouts and the harnessing of meta-algorithmic and neural network approaches to provide a means of moving industrial institutions towards a digital universal SDS processing methodology. The complexities of SDS documents including the lack of format standardization, text and image combinations, and multi-lingual translation needs, combined, limit the accuracy and precision of optical character recognition tools. The approach in this document is to translate entire SDSs from thousands of chemical vendors, each with distinct formatting, to machine-encoded text with a high degree of accuracy and precision. Then the system will "read" and assess these documents as a human would; that is, ensuring that the documents are compliant, determining whether chemical formulations have changed, ensuring reported values are within expected thresholds, and comparing them to similar products for more environmentally friendly alternatives.

Journal Article•DOI•

Multi-Layout Invoice Document Dataset (MIDD): A Dataset for Named Entity Recognition

Dipali Baviskar¹, Swati Ahirrao¹, Ketan Kotecha¹•Institutions (1)

Symbiosis International University¹

20 Jul 2021

TL;DR: In this article, the authors provide a high-quality, highly diverse, multi-layout, and annotated invoice documents dataset for extracting key information from unstructured documents and develop an artificial intelligence (AI)-based tool to identify and extract named entities in the invoice documents.

Abstract: The day-to-day working of an organization produces a massive volume of unstructured data in the form of invoices, legal contracts, mortgage processing forms, and many more. Organizations can utilize the insights concealed in such unstructured documents for their operational benefit. However, analyzing and extracting insights from such numerous and complex unstructured documents is a tedious task. Hence, the research in this area is encouraging the development of novel frameworks and tools that can automate the key information extraction from unstructured documents. However, the availability of standard, best-quality, and annotated unstructured document datasets is a serious challenge for accomplishing the goal of extracting key information from unstructured documents. This work expedites the researcher’s task by providing a high-quality, highly diverse, multi-layout, and annotated invoice documents dataset for extracting key information from unstructured documents. Researchers can use the proposed dataset for layout-independent unstructured invoice document processing and to develop an artificial intelligence (AI)-based tool to identify and extract named entities in the invoice documents. Our dataset includes 630 invoice document PDFs with four different layouts collected from diverse suppliers. As far as we know, our invoice dataset is the only openly available dataset comprising high-quality, highly diverse, multi-layout, and annotated invoice documents.

Proceedings Article•DOI•

Binarisation of photographed documents image quality and processing time assessment

Rafael Dueire Lins¹, Steven J. Simske², Rodrigo Barros Bernardino•Institutions (2)

Universidade Federal Rural de Pernambuco¹, Colorado State University²

16 Aug 2021

TL;DR: In this article, the authors evaluated the quality and time performance of 13 new algorithms and 50 existing algorithms for document binarization using a dataset of offset, laser, and deskjet printed documents, photographed using four widely used mobile devices with the strobe flash on and off, under two different angles and places of capture.

Abstract: Smartphones with cameras are omnipresent in today's world and are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This competition on binarizing photographed documents assessed the quality and time performance of 13 new algorithms and 50 existing algorithms. The evaluation dataset is composed of offset, laser, and deskjet printed documents, photographed using four widely-used mobile devices with the strobe flash on and off, under two different angles and places of capture.

Posted Content•

Vec2GC - A Graph Based Clustering Method for Text Representations.

Rajesh N. Rao, Manojit Chakraborty

15 Apr 2021-arXiv: Information Retrieval

TL;DR: Vec2GC (Vector to Graph Communities) as mentioned in this paper is an end-to-end pipeline to cluster terms or documents for any given text corpus using community detection on a weighted graph, created using text representation learning.

Abstract: NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that supports hierarchical clustering as well.
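
The pipeline shape described above (embed items, connect similar ones in a weighted graph, then group the graph) can be sketched as follows. This is not the Vec2GC algorithm: connected components are used here as a deliberately simple stand-in for community detection, and the cosine-similarity threshold of 0.8 and all function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(vectors, threshold=0.8):
    """Return {(i, j): weight} for pairs whose similarity meets the assumed threshold."""
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges


def group_nodes(n, edges):
    """Group nodes by connected component (stand-in for community detection)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())
```

A real community-detection step would also use the edge weights, which this stand-in ignores once the threshold has been applied.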

Proceedings Article•DOI•

Analytical Study of Handwritten Character Recognition: A Deep Learning Way

Pooja Raundale¹, Hadi Maredia¹•Institutions (1)

Sardar Patel Institute of Technology¹

25 Jun 2021

TL;DR: In this paper, the authors compared several neural networks viz: Simple (Artificial) Neural Network, Convolutional Neural Network and Recurrent Neural Network that use deep learning to implement Handwritten Character Recognition.

Abstract: In the current tech-savvy world, software systems face a rising challenge in recognizing characters. A lot of crucial and sensitive data exists only in paper-based documents such as newspapers, books, theses, articles, and other printed documents. There is an ever-increasing demand for storing this data digitally and reusing it whenever necessary through a predefined search process. A simple way to transfer data from paper documents into digital storage systems is to scan the documents and store them as images. The challenge arises when the data must be reused, because reading specific data from these images is difficult. A major cause is that the font properties of characters in paper documents differ from those of the fonts used in computing systems. Hence, a computer fails to recognize these characters while reading them. This concept of moving data from paper documents into digital storage and then reading it is called Document Processing. In Document Processing, Optical Character Recognition is used for this purpose. To further our understanding of how these systems work, this paper analyzes and compares several neural networks, namely a Simple (Artificial) Neural Network, a Convolutional Neural Network, and a Recurrent Neural Network, that use Deep Learning to implement Handwritten Character Recognition.

Journal Article•DOI•

Learning from similarity and information extraction from structured documents

Martin Holeček¹•Institutions (1)

Charles University in Prague¹

11 Jun 2021-International Journal on Document Analysis and Recognition

TL;DR: In this paper, the authors used Siamese networks, concepts of similarity, one-shot learning, and context/memory awareness to improve the performance of document classification in the huge real-world document dataset.

Abstract: The automation of document processing has recently gained attention owing to its great potential to reduce manual work. Any improvement in information extraction systems or reduction in their error rates aids companies working with business documents because lowering reliance on cost-heavy and error-prone human work significantly improves the revenue. Neural networks have been applied to this area before, but they have been trained only on relatively small datasets with hundreds of documents so far. To successfully explore deep learning techniques and improve information extraction, we compiled a dataset with more than 25,000 documents. We expand on our previous work in which we proved that convolutions, graph convolutions, and self-attention can work together and exploit all the information within a structured document. Taking the fully trainable method one step further, we now design and examine various approaches to using Siamese networks, concepts of similarity, one-shot learning, and context/memory awareness. The aim is to improve micro $F_1$ of per-word classification in the huge real-world document dataset. The results verify that trainable access to a similar (yet still different) page, together with its already known target information, improves the information extraction. The experiments confirm that all proposed architecture parts (Siamese networks, employing class information, query-answer attention module and skip connections to a similar page) are all required to beat the previous results. The best model yields an 8.25% gain in the $F_1$ score over the previous state-of-the-art results. Qualitative analysis verifies that the new model performs better for all target classes. Additionally, multiple structural observations about the causes of the underperformance of some architectures are revealed, since all the techniques used in this work are not problem-specific and can be generalized for other tasks and contexts.
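
As a reminder of what a micro-averaged $F_1$ over per-word labels measures, the sketch below pools true positives, false positives, and false negatives across all target classes before computing a single score. The treatment of a background label (`"O"` here) as a non-target class, the label names, and the function name are assumptions for illustration; the abstract does not state how the paper's evaluation handles words with no target class.

```python
def micro_f1(y_true, y_pred, background="O"):
    """Micro-averaged F1 over per-word labels, ignoring an assumed background class."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred and truth != background:
            tp += 1  # correct target-class prediction
        else:
            if pred != background:
                fp += 1  # predicted a target class that is wrong for this word
            if truth != background:
                fn += 1  # missed (or mislabeled) a true target-class word
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

For example, with true labels `["total", "O", "date", "O", "vendor"]` and predictions `["total", "date", "O", "O", "vendor"]`, two words are correct, one is a false positive, and one is a false negative, giving 4 / 6.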

Patent•

Device and method for processing value documents, more particularly bank notes, and value document processing system

Derks Hendrik, Hanussek Marja

10 Jun 2021

TL;DR: In this article, a device for processing value documents, more particularly bank notes, is proposed, having at least one image capture unit which is configured to capture at least one image (4) of at least two character strings (2, 3) located on a value document, and an evaluation unit which detects, in the at least one image (4), one or more first characters contained in the at least one first character string (2) and one or more second characters contained in the at least one second character string (3), to form a concatenated character string (5) from at least some of the first characters and/or at least some of the second characters.

Abstract: The invention relates to a device for processing value documents, more particularly bank notes, having at least one image capture unit which is configured to capture at least one image (4) of at least two character strings (2, 3) located on a value document, and an evaluation unit which is configured to detect, in the at least one image (4), one or more first characters contained in the at least one first character string (2) and one or more second characters contained in at least one second character string (3), to form, from at least some of the first characters and/or at least some of the second characters, a concatenated character string (5) and to store, in a storage unit (30), image sections (7, 8) of the at least one image (4), said image sections showing the first and/or second characters contained in the concatenated character string (5), together with the concatenated character string (5). The invention also relates to a corresponding method for processing value documents, and a value document processing system.

Patent•

Associating biometric user characteristics with document processing jobs

Holland Steven, Williams Kathryn Rachael, Teres Nieto Marcos, Achuthan Rajendrababu Anoop

21 Jan 2021

TL;DR: In this article, a biometric user characteristic associated with a request to perform a document processing job is captured via a biometric authentication component, and a log entry comprising the biometric user characteristic and a plurality of details associated with the job is created.

Abstract: Examples disclosed herein relate to receiving a request to perform a document processing job, capturing a biometric user characteristic associated with the request via a biometric authentication component, and creating a log entry comprising the biometric user characteristic and a plurality of details associated with the document processing job.

Posted Content•

Donut: Document Understanding Transformer without OCR

Geewook Kim, Teakgyu Hong, Moonbin Yim, Jin Young Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park

30 Nov 2021-arXiv: Learning

TL;DR: In this article, the authors propose a novel Visual Document Understanding (VDU) model that is end-to-end trainable without an underlying OCR framework, together with a synthetic document image generator for pre-training the model, which mitigates the dependency on large-scale real document images.

Abstract: Understanding document images (e.g., invoices) has been an important research topic and has many applications in document processing automation. Through the latest advances in deep learning-based Optical Character Recognition (OCR), current Visual Document Understanding (VDU) systems have come to be designed based on OCR. Although such OCR-based approaches promise reasonable performance, they suffer from critical problems induced by the OCR, e.g., (1) expensive computational costs and (2) performance degradation due to OCR error propagation. In this paper, we propose a novel VDU model that is end-to-end trainable without an underpinning OCR framework. To this end, we propose a new task and a synthetic document image generator to pre-train the model and mitigate the dependency on large-scale real document images. Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets. Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed model, especially with consideration for real-world applications.

Posted Content•

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models.

Hongyu Gong, Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán

07 Jun 2021-arXiv: Computation and Language

TL;DR: The authors propose unsupervised Language-Agnostic Weighted Document Representations (LAWDR), which leverage the geometry of pre-trained sentence embeddings to derive document representations without fine-tuning.

Abstract: Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level. Recently, large pre-trained language models such as BERT, XLM and XLM-RoBERTa have achieved great success when fine-tuned on sentence-level downstream tasks. It is tempting to apply these cross-lingual models to document representation learning. However, there are two challenges: (1) these models impose high costs on long document processing, and thus many of them have a strict length limit; (2) model fine-tuning requires extra data and computational resources, which is not practical in resource-limited settings. In this work, we address these challenges by proposing unsupervised Language-Agnostic Weighted Document Representations (LAWDR). We study the geometry of pre-trained sentence embeddings and leverage it to derive document representations without fine-tuning. Evaluated on cross-lingual document alignment, LAWDR demonstrates comparable performance to state-of-the-art models on benchmark datasets.

DOI•

Legal Drafting and Automation

Benjamin Werthmann

01 Nov 2021

Patent•

Image processing apparatus and non-transitory computer readable medium for document processing

Murakami Takashi¹•Institutions (1)

Fuji Xerox¹

01 Jun 2021

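TL;DR: An image processing apparatus includes a camera configured to capture an image of a person's face, an image reading unit configured to read a document and output a document image, and a controller that permits certain processing on the document image if the captured face matches a face image extracted from the document image.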

Abstract: An image processing apparatus includes a camera, an image reading unit, and a controller. The camera is configured to capture an image of a face of a person. The image reading unit is configured to read a document and to output a document image. The controller is configured to perform control to permit certain processing on the document image if the image of the face captured by the camera matches a face image extracted from the document image.

Patent•

Document processing system using augmented reality and virtual reality, and method therefor

Jang Wonseok, Jang Jun, Jang Heok, Jang Yun

18 Mar 2021

TL;DR: In this article, a document processing system using augmented reality and virtual reality, and a processing method therefor, is presented, in which one user can write and store various types of virtual documents and other users can view the virtual documents in the augmented reality or virtual reality image.

Abstract: The present invention relates to a document processing system using augmented reality and virtual reality, and a processing method therefor. The document processing system of the present invention shares contents of an object so that one user can write the contents of the object at the location of the object displayed in an augmented reality or virtual reality image to write and store various types of virtual documents and allow other users to view the virtual documents in the augmented reality or virtual image. The document processing system writes and shares the virtual documents by using a mobile terminal capable of expressing augmented reality or virtual reality. The document processing system shares the virtual documents between mobile terminals in a P2P method or a method of using a server. According to the present invention, a new document sharing platform can be implemented to provide a differentiated service to a user by displaying a virtual document written by using augmented reality or virtual reality to be shared in real time.

Posted Content•

LineCounter: Learning Handwritten Text Line Segmentation by Counting

Deng Li¹, Yue Wu², Yicong Zhou¹•Institutions (2)

University of Macau¹, Amazon.com²

24 May 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: Li et al. propose a novel line counting formulation for handwritten text line segmentation (HTLS) that involves counting the number of text lines from the top at every pixel location, which helps learn an end-to-end HTLS solution that directly predicts a per-pixel line number.

Patent•

Systems and methods for document processing

Yang Yishu, Agrawal Raj

18 Feb 2021

TL;DR: In this article, the authors describe a method or a system able to process documents to extract features, predict outcomes, and visualize feature relations.

Abstract: Among other things, technologies disclosed herein include a method or a system able to process documents to extract features, predict outcomes and visualize feature relations.

Patent•

Electronic document segmentation using deep learning

Sarkar Mausoom¹, Jain Arneh¹•Institutions (1)

Adobe Systems¹

18 Feb 2021

TL;DR: In this paper, a document processing application segments an electronic document image into overlapping strips, generates a first and a second mask of elements and element types for two overlapping strips using a predictive model network, and computes, from a combined mask derived from the first mask and the second mask, an output electronic document that identifies elements in the electronic document and the respective element types.

Abstract: Techniques for document segmentation. In an example, a document processing application segments an electronic document image into strips. A first strip overlaps a second strip. The application generates a first mask indicating one or more elements and element types in the first strip by applying a predictive model network to image content in the first strip and a prior mask generated from image content of the first strip. The application generates a second mask indicating one or more elements and element types in the second strip by applying the predictive model network to image content in the second strip and the first mask. The application computes, from a combined mask derived from the first mask and the second mask, an output electronic document that identifies elements in the electronic document and the respective element types.
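The strip-wise procedure in this abstract can be sketched roughly as follows. This is only an illustration of the control flow under stated assumptions, not the patented implementation: `split_into_strips`, `segment_strips`, the `predict_mask` callable standing in for the predictive model network, and the strip height and overlap values are all hypothetical names and parameters.

```python
def split_into_strips(height, strip_height, overlap):
    """Return (top, bottom) row ranges that cover rows [0, height).

    Consecutive ranges share `overlap` rows, mirroring the statement that a
    first strip overlaps a second strip.
    """
    if height <= 0 or not 0 <= overlap < strip_height:
        raise ValueError("need height > 0 and 0 <= overlap < strip_height")
    step = strip_height - overlap
    strips = []
    top = 0
    while True:
        bottom = min(top + strip_height, height)
        strips.append((top, bottom))
        if bottom == height:
            break
        top += step
    return strips


def segment_strips(strips, predict_mask, initial_mask=None):
    """Run a (hypothetical) model over strips in order.

    Each call receives the current strip and the mask produced for the
    previous strip, so context carries across the overlap.
    """
    masks = []
    prior = initial_mask
    for strip in strips:
        mask = predict_mask(strip, prior)
        masks.append(mask)
        prior = mask
    return masks


# A 100-row page cut into 40-row strips that overlap by 10 rows.
print(split_into_strips(100, 40, 10))
```

Combining the per-strip masks into one page-level mask is left out, since the abstract does not say how overlapping predictions are merged.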

Patent•

Metamodeling for confidence prediction in machine learning based document extraction

Torres Terrence J¹, Ravichandran Venkatesh Coimbatore, Lowe Karen Kraemer•Institutions (1)

Intuit¹

18 Mar 2021

TL;DR: In this article, a document extraction system may efficiently route tasks to the manual and automated systems based on a predicted probability that the results generated by the automated system meet some baseline level of accuracy.

Abstract: A document extraction system executed by a processor may process documents using manual and automated systems. The document extraction system may efficiently route tasks to the manual and automated systems based on a predicted probability that the results generated by the automated system meet some baseline level of accuracy. To increase document processing speed, documents having a high likelihood of accurate automated processing may be routed to an automated system. To ensure a baseline level of accuracy, documents having a smaller likelihood of accurate automated processing may be routed to a manual system.
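The routing rule described above amounts to a threshold on a predicted probability. A minimal sketch follows, assuming that a separate metamodel has already produced that probability for each document; the `Extraction` class, the `route` function, and the 0.9 threshold are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    document_id: str
    # Assumed output of a metamodel: the probability that the automated
    # extraction for this document meets the baseline accuracy.
    predicted_accuracy_probability: float


def route(extractions, threshold=0.9):
    """Split documents into automated and manual queues by confidence."""
    automated, manual = [], []
    for item in extractions:
        if item.predicted_accuracy_probability >= threshold:
            automated.append(item.document_id)  # likely accurate: keep automated
        else:
            manual.append(item.document_id)  # uncertain: send to human review
    return automated, manual


batch = [Extraction("invoice-1", 0.97), Extraction("invoice-2", 0.42)]
print(route(batch))
```

Raising the threshold trades processing speed for accuracy, because more documents fall through to the manual queue.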

Posted Content•

SDL: New data generation tools for full-level annotated document layout.

Son Nguyen Truong¹•Institutions (1)

Tokyo Institute of Technology¹

29 Jun 2021-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper presents a novel data generation tool for document processing, which focuses on providing a maximal level of visual information in a normal type document, ranging from character position to paragraph-level position, and enables working with a large dataset on low-resource languages as well as providing a means of processing thorough full-level information of the documented text.

Abstract: We present a novel data generation tool for document processing. The tool focuses on providing a maximal level of visual information in a normal type document, ranging from character position to paragraph-level position. It also enables working with a large dataset on low-resource languages as well as providing a means of processing thorough full-level information of the documented text. The data generation tools come with a dataset of 320,000 Vietnamese synthetic document images and instructions for generating a dataset of similar size in other languages. The repository can be found at: this https URL

Proceedings Article•DOI•

Chinese Government Official Document Named Entity Recognition Based on Albert

Ziqi Xiong, Dezhi Kong, Zhichao Xia, Yankai Xue, Ziyu Song, Peng Wang

24 Apr 2021

TL;DR: Xiong et al. propose GovAlbert, a pre-trained language model based on Albert, for the processing of Chinese government official documents, and find that a combined GovAlbert-CRF model achieves the best F1 value.

Abstract: The automated processing of Chinese government documents is in its early stage, and information extraction based on Named Entity Recognition (NER) plays an important role in the automated processing and analysis of these documents. This paper proposes and implements GovAlbert, a pre-trained language model based on Albert for the processing of Chinese government official documents. We study and analyze NER tasks on Chinese government official documents based on the pre-trained language model, annotate an entity recognition corpus of Chinese government official documents, and construct four named entity recognition models based on GovAlbert. The experimental results show that the GovAlbert model for government official document processing achieves a higher macro-average F1 value (the harmonic mean of precision and recall) than Albert. The four named entity recognition models based on GovAlbert all outperform the public pre-trained model on multiple NER tasks for government official documents, and the experiments show that the combined GovAlbert-CRF model achieves the best F1 value, making it better suited to NER tasks on government official documents.
