Author

Yulia S. Chernyshova

Bio: Yulia S. Chernyshova is an academic researcher. The author has contributed to research in the topics of computer science and physics, has an h-index of 5, and has co-authored 8 publications receiving 73 citations.

Papers
Journal ArticleDOI
TL;DR: An “on the device” text line recognition framework designed for mobile or embedded systems, based on two separate artificial neural networks (ANNs) and dynamic programming instead of image processing methods for the segmentation step or an end-to-end ANN.
Abstract: In this paper, we introduce an “on the device” text line recognition framework designed for mobile or embedded systems. We consider per-character segmentation as a language-independent problem and individual character recognition as a language-dependent one. Thus, the proposed solution is based on two separate artificial neural networks (ANNs) and dynamic programming, instead of employing image processing methods for the segmentation step or an end-to-end ANN. To satisfy the tight constraints on memory size imposed by embedded systems and to avoid overfitting, we employ ANNs with a small number of trainable parameters. The primary purpose of our framework is the recognition of low-quality images of identity documents with complex backgrounds and a variety of languages and fonts. We demonstrate that our solution achieves high recognition accuracy on natural datasets even when trained on purely synthetic data. We use the MIDV-500 and Census 1961 Project datasets for text line recognition. The proposed method considerably surpasses the algorithmic method implemented in Tesseract 3.05, the LSTM method (Tesseract 4.00), and the unpublished method used in the ABBYY FineReader 15 system. Our framework is also faster than the other compared solutions. We show the language-independence of our segmenter in an experiment with Cyrillic, Armenian, and Chinese text lines.
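The paper's own code is not shown here, but the idea of combining per-position segmentation scores from an ANN with dynamic programming can be sketched as follows. The boundary scores, width limits, and function names are illustrative assumptions, not the authors' implementation.

```python
def best_segmentation(boundary_scores, min_w=2, max_w=5):
    """Pick character boundaries maximizing the summed boundary scores
    (as a segmentation ANN might emit them), subject to per-character
    width constraints -- a dynamic-programming sketch."""
    n = len(boundary_scores)
    NEG = float("-inf")
    best = [NEG] * n          # best[i]: best total score with a cut at i
    prev = [None] * n         # back-pointer to the previous cut
    best[0] = boundary_scores[0]          # the line starts with a cut
    for i in range(1, n):
        for w in range(min_w, max_w + 1):  # try all admissible widths
            j = i - w
            if j >= 0 and best[j] != NEG:
                cand = best[j] + boundary_scores[i]
                if cand > best[i]:
                    best[i], prev[i] = cand, j
    cuts, i = [], n - 1                    # backtrack from the last cut
    while i is not None:
        cuts.append(i)
        i = prev[i]
    return list(reversed(cuts))

# Toy scores: peaks at positions 0, 3, 6, 9 mark likely boundaries.
scores = [1.0, 0.1, 0.2, 0.9, 0.0, 0.1, 0.8, 0.2, 0.1, 1.0]
print(best_segmentation(scores))  # → [0, 3, 6, 9]
```

The DP runs in O(n · max_w) time, which is what makes the approach cheap enough for an on-device setting.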

53 citations

Proceedings ArticleDOI
13 Apr 2018
TL;DR: An algorithm is described that allows the creation of artificial training datasets for OCR systems, using the Russian passport as a case study, in order to reduce the gap between natural and synthetic data distributions.
Abstract: This paper addresses one of the fundamental problems of machine learning: acquiring training data. Obtaining enough natural training data is rather difficult and expensive. In recent years, the use of synthetic images has become more beneficial, as it saves human time and provides a huge number of images that would otherwise be difficult to obtain. However, for successful learning on an artificial dataset, one should try to reduce the gap between the natural and synthetic data distributions. In this paper we describe an algorithm for creating artificial training datasets for OCR systems, using the Russian passport as a case study.

23 citations

Proceedings ArticleDOI
15 Mar 2019
TL;DR: The most common label-preserving deformations, which can be useful in many practical tasks, are considered, and a custom real-time augmentation system is developed; experiments demonstrate the effectiveness of the suggested approach.
Abstract: In this paper we study real-time augmentation, a method of increasing the variability of a training dataset during the learning process. We consider the most common label-preserving deformations, which can be useful in many practical tasks. Due to the limitations of existing augmentation tools, such as increased learning time or dependence on a specific platform, we developed our own real-time augmentation system. Experiments on the MNIST and SVHN datasets demonstrated the effectiveness of the suggested approach: the quality of the trained models improves, while learning time remains the same as if augmentation were not used.
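The essence of real-time augmentation is that deformations are applied per batch during training rather than precomputed. A minimal sketch, assuming a circular horizontal shift as the label-preserving deformation (the deformation choice and names are illustrative, not the authors' system):

```python
import random

def shift(img, dx):
    """Circular horizontal shift, a typical label-preserving deformation."""
    w = len(img[0])
    return [[row[(c - dx) % w] for c in range(w)] for row in img]

def augmented_batches(dataset, batch_size, rng=None):
    """Yield batches forever, applying a fresh random deformation to
    every sample on the fly -- nothing is precomputed or stored."""
    rng = rng or random.Random(0)
    while True:
        batch = rng.sample(dataset, batch_size)
        yield [(shift(img, rng.choice([-1, 0, 1])), label)
               for img, label in batch]

dataset = [([[0, 1, 0], [0, 1, 0]], "I")] * 8
gen = augmented_batches(dataset, batch_size=4)
batch = next(gen)
print(len(batch), batch[0][1])  # 4 "I"
```

Because each epoch sees freshly deformed copies, the effective dataset variability grows without any extra disk storage, matching the "learning time remains the same" claim as long as the deformations are cheap.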

23 citations

Proceedings ArticleDOI
TL;DR: It is concluded that the proposed method is sufficient for authentication of the fonts and can be used as part of a forgery detection system for images acquired with a smartphone camera.
Abstract: In this paper, we consider the problem of detecting counterfeit identity documents in images captured with smartphones. As a number of documents contain special fonts, we study the applicability of convolutional neural networks (CNNs) for detecting the conformance of the fonts used with those corresponding to government standards. Here, we use multi-task learning to differentiate samples by both fonts and characters, and we compare the resulting classifier with its analogue trained for binary font classification. We train neural networks for authenticity estimation of the fonts used in the machine-readable zones and ID numbers of the Russian national passport and test them on samples of individual characters acquired from 3238 images of the Russian national passport. Our results show that the use of multi-task learning increases the sensitivity and specificity of the classifier. Moreover, the resulting CNNs demonstrate high generalization ability, as they correctly classify fonts which were not present in the training set. We conclude that the proposed method is sufficient for authentication of the fonts and can be used as part of a forgery detection system for images acquired with a smartphone camera.
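The multi-task setup described above trains one shared backbone with two classification heads, font and character, whose losses are combined. A minimal sketch of that loss combination (the weighting scheme, probabilities, and names are assumptions for illustration, not the paper's exact formulation):

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

def multitask_loss(font_probs, char_probs, font_t, char_t, alpha=0.5):
    """Weighted sum of the two task losses; sharing one backbone while
    training both heads is what multi-task learning means here."""
    return alpha * cross_entropy(font_probs, font_t) \
        + (1 - alpha) * cross_entropy(char_probs, char_t)

# One sample: the net is fairly sure of both the font and the character.
loss = multitask_loss([0.9, 0.1], [0.05, 0.8, 0.15], font_t=0, char_t=1)
print(round(loss, 4))  # → 0.1643
```

The character head acts as an auxiliary task: its gradient forces the shared features to encode glyph shape, which plausibly explains the reported gain in sensitivity and specificity of the font head.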

11 citations

Proceedings ArticleDOI
31 Jan 2020
TL;DR: A per-character segmentation method based on a lightweight convolutional neural network (CNN), suitable for on-premise applications on various mobile devices, which decreases the segmentation error rate for the majority of test datasets.
Abstract: Character segmentation is one of the crucial problems of modern text line recognition methods. In this paper, we propose a per-character segmentation method based on a lightweight convolutional neural network (CNN), which is suitable for on-premise applications on various mobile devices. The distinctive feature of our method is that it provides the coordinates of the start and end points of each character, not the coordinates of the “cut” between two characters. This allows us to utilize known geometrical properties of glyphs efficiently. Consequently, the target character images are not flawed by character intersections or wide spaces. We present results measured for text lines with various letter spacing, which illustrate that the proposed method decreases the segmentation error rate for the majority of test datasets.
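The start/end-point idea can be sketched as post-processing of per-column character-presence scores (as a CNN might emit them): thresholded runs become explicit intervals, so the gap between characters never ends up inside a glyph crop. The scores, threshold, and names below are illustrative assumptions.

```python
def char_intervals(presence, threshold=0.5):
    """Turn per-column 'character present' scores into explicit
    (start, end) intervals, instead of single cut positions between
    characters -- so wide inter-character gaps are excluded."""
    intervals, start = [], None
    for x, score in enumerate(presence):
        if score >= threshold and start is None:
            start = x                         # a character begins
        elif score < threshold and start is not None:
            intervals.append((start, x - 1))  # it ended at x - 1
            start = None
    if start is not None:                     # character runs to the edge
        intervals.append((start, len(presence) - 1))
    return intervals

# Two characters separated by a wide gap of low scores.
scores = [0.9, 0.8, 0.1, 0.0, 0.1, 0.7, 0.9, 0.8]
print(char_intervals(scores))  # → [(0, 1), (5, 7)]
```

With a cut-point representation, the whole gap between columns 2 and 4 would be assigned to one of the neighbouring characters; the interval representation simply drops it.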

7 citations


Cited by
Journal ArticleDOI
TL;DR: A new neural network-based audio processing framework with graphics processing unit (GPU) support that leverages 1D convolutional neural networks to perform time-domain to frequency-domain conversion, allowing on-the-fly spectrogram extraction, due to its fast speed, without the need to store any spectrograms on disk.
Abstract: In this paper, we present nnAudio, a new neural network-based audio processing framework with graphics processing unit (GPU) support that leverages 1D convolutional neural networks to perform time-domain to frequency-domain conversion. It allows on-the-fly spectrogram extraction, due to its fast speed, without the need to store any spectrograms on disk. Moreover, this approach also allows back-propagation through the waveform-to-spectrogram transformation layer; hence, the transformation can be made trainable, further optimizing it for the specific task the neural network is trained on. All spectrogram implementations scale linearly with respect to the input length. nnAudio, however, leverages PyTorch's compute unified device architecture (CUDA) implementation of 1D convolution, so its short-time Fourier transform (STFT), Mel spectrogram, and constant-Q transform (CQT) implementations are an order of magnitude faster than other implementations using only the central processing unit (CPU). We tested our framework on three different machines with NVIDIA GPUs, and it significantly reduces the spectrogram extraction time from the order of seconds (using the popular Python library librosa) to the order of milliseconds for audio recordings of the same length. When applying nnAudio to variable input audio lengths, an average of 11.5 hours is required to extract 34 spectrogram types with different parameters from the MusicNet dataset using librosa, while an average of 2.8 hours is required for nnAudio, which is still four times faster than librosa. Our proposed framework also outperforms existing GPU processing libraries such as Kapre and torchaudio in terms of processing speed.
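The key trick, computing an STFT as a bank of 1D convolutions, can be shown in pure Python: each frequency bin is a cosine/sine kernel slid over the waveform, which is exactly the operation nnAudio maps onto GPU conv1d kernels. This is a CPU sketch of the principle, not nnAudio's API; frame and hop sizes are arbitrary.

```python
import math

def stft_conv(signal, frame=8, hop=4):
    """Magnitude STFT computed as a bank of 1D convolutions: one
    cosine kernel and one sine kernel per frequency bin."""
    bins = frame // 2 + 1
    kernels = [([math.cos(2 * math.pi * k * n / frame) for n in range(frame)],
                [-math.sin(2 * math.pi * k * n / frame) for n in range(frame)])
               for k in range(bins)]
    frames = []
    for start in range(0, len(signal) - frame + 1, hop):
        window = signal[start:start + frame]
        # Dot products with the kernels = real/imaginary DFT parts.
        frames.append([math.hypot(sum(c * s for c, s in zip(cos_k, window)),
                                  sum(c * s for c, s in zip(sin_k, window)))
                       for cos_k, sin_k in kernels])
    return frames  # one magnitude spectrum per hop

# A pure tone whose period is 4 samples lands exactly in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft_conv(tone)
peak = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak)  # → 2
```

Because the kernels are just weights, a framework like PyTorch can also treat them as trainable parameters, which is the back-propagation point the abstract makes.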

55 citations

Journal ArticleDOI
TL;DR: The Systematic Literature Review discovered that AI-based approaches have strong potential to extract useful information from unstructured documents automatically; however, they face certain challenges in processing the multiple layouts of unstructured documents.
Abstract: Unstructured data impacts 95% of organizations and costs them millions of dollars annually; if managed well, it can significantly improve business productivity. Traditional information extraction techniques are limited in their functionality, but AI-based techniques can provide a better solution. A thorough investigation of AI-based techniques for automatic information extraction from unstructured documents is missing in the literature. The purpose of this Systematic Literature Review (SLR) is to recognize and analyze research on the techniques used for automatic information extraction from unstructured documents and to provide directions for future research. The SLR guidelines proposed by Kitchenham and Charters were followed to conduct a literature search on various databases between 2010 and 2020. We found that: (1) the existing information extraction techniques are template-based or rule-based; (2) the existing methods lack the capability to tackle complex document layouts in real-time situations such as invoices and purchase orders; (3) the publicly available datasets are task-specific and of low quality, hence there is a need to develop a new dataset that reflects real-world problems. Our SLR discovered that AI-based approaches have strong potential to extract useful information from unstructured documents automatically; however, they face certain challenges in processing multiple layouts of unstructured documents. Our SLR proposes a framework for the construction of a high-quality unstructured-document dataset with strong data validation techniques for automated information extraction, and it also reveals the need for close association between businesses and researchers to handle the various challenges of unstructured data analysis.

22 citations

Journal ArticleDOI
TL;DR: In this article, an integrated intelligent approach based on natural language processing (NLP) technology, involving three main stages, is proposed: construction on-site reports are classified by analyzing and extracting report text features, and the classified report texts are then analyzed with a term frequency-inverse document frequency measure improved by mutual information in order to identify and mine construction knowledge.
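The term-weighting step can be illustrated with plain TF-IDF; the cited article additionally re-weights these scores by term/class mutual information, which this sketch omits, and the toy report vocabulary is invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Plain TF-IDF over tokenized documents: term frequency within a
    document times the log inverse document frequency across the corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (tf[t] / len(d)) * math.log(n / df[t]) for t in tf})
    return out

# Toy on-site report corpus.
reports = [["crack", "wall", "crack"], ["wall", "paint"], ["crack", "delay"]]
w = tf_idf(reports)
print(round(w[0]["crack"], 3))  # "crack" scores highest in report 0
```

Terms frequent in one report but rare across the corpus get the highest weights, which is what makes the measure useful for surfacing report-specific construction knowledge.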

20 citations