Author

Vladimir V. Arlazarov

Bio: Vladimir V. Arlazarov is an academic researcher at the Moscow Institute of Physics and Technology. He has contributed to research in topics including computer science and artificial neural networks. The author has an h-index of 10 and has co-authored 57 publications receiving 327 citations.

Papers published on a yearly basis

Papers
Journal ArticleDOI
TL;DR: The Mobile Identity Document Video dataset (MIDV-500) presented in this paper is a collection of 500 video clips covering 50 different identity document types with ground truth, enabling research on a wide range of document analysis problems.
Abstract: A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but more specialized datasets are required to facilitate a comprehensive scientific and technical approach to identity document recognition. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips for 50 different identity document types with ground truth, which enables research on a wide scope of document analysis problems. The paper presents the characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset; in addition, and as a baseline, we present evaluation results for existing methods of face detection, text line recognition, and document data extraction using the presented dataset.
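To make the evaluation setting concrete, below is a minimal sketch of a field-level exact-match evaluation loop over MIDV-500-style ground truth. The directory layout, the JSON key "fields", the image naming convention, and the `recognize_field` callable are illustrative assumptions for this sketch, not the dataset's documented schema.

```python
# A minimal sketch of a field-level exact-match evaluation over MIDV-500-style data.
# The layout and JSON keys used here are assumptions for illustration only.
import json
from pathlib import Path


def exact_match_rate(dataset_root: str, recognize_field) -> float:
    """Fraction of ground-truth field values reproduced exactly by `recognize_field`.

    `recognize_field(image_path, field_name)` is a user-supplied OCR callable
    (hypothetical interface).
    """
    root = Path(dataset_root)
    correct, total = 0, 0
    for gt_file in root.rglob("*.json"):          # assumed: one JSON file per annotated frame
        annotation = json.loads(gt_file.read_text(encoding="utf-8"))
        image_path = gt_file.with_suffix(".tif")  # assumed image naming convention
        for field_name, true_value in annotation.get("fields", {}).items():
            predicted = recognize_field(image_path, field_name)
            correct += int(predicted == true_value)
            total += 1
    return correct / total if total else 0.0
```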

71 citations

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This work is devoted to the design of an identity document recognition system for mobile phones and tablets that uses the computational capabilities of the device itself; experimental results are presented for "Smart IDReader", an implemented commercial identity document recognition system.
Abstract: This work is devoted to the design of an identity document recognition system for mobile phones and tablets that uses the computational capabilities of the device itself. Key differences are discussed in relation to conventional cloud recognition systems, which commonly use single images as input by design. A mobile recognition system scheme is presented which is constructed with computational limitations in mind and which is implemented in a commercial solution. An original approach is described that improves recognition precision and reliability through post-OCR result integration over the video stream, as opposed to approaches that rely on frame image integration using "super-resolution" algorithms. Interactive feedback between the system and its operator is discussed, such as the decision to automatically stop video stream recognition. Experimental results are presented for "Smart IDReader", an implemented commercial system designed for identity document recognition.
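As an illustration of per-frame result integration with a stopping decision, the following is a simplified sketch: it accumulates confidence-weighted votes for candidate field values across frames and signals a stop once the leading candidate has been stable for a few frames. This is a stand-in for the general idea, not the combination rule used in Smart IDReader; the class, parameters, and threshold are hypothetical.

```python
# A minimal sketch of combining per-frame OCR results in a video stream and
# deciding when to stop capturing. Simplified illustration, not the published method.
from collections import defaultdict


class FieldResultIntegrator:
    def __init__(self, stable_frames_to_stop: int = 3):
        self.scores = defaultdict(float)   # candidate string -> accumulated confidence
        self.stable_frames_to_stop = stable_frames_to_stop
        self._best = None
        self._stable_count = 0

    def add_frame_result(self, text: str, confidence: float) -> bool:
        """Accumulate one frame's OCR result; return True when recognition may stop."""
        self.scores[text] += confidence
        best = max(self.scores, key=self.scores.get)
        if best == self._best:
            self._stable_count += 1
        else:
            self._best, self._stable_count = best, 1
        return self._stable_count >= self.stable_frames_to_stop

    @property
    def result(self) -> str:
        return self._best or ""
```

In use, each frame's OCR output for a field would be passed to `add_frame_result`, and capture for that field would stop once it returns True.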

57 citations

Journal ArticleDOI
TL;DR: An “on the device” text line recognition framework designed for mobile or embedded systems, based on two separate artificial neural networks (ANNs) and dynamic programming rather than image processing methods for the segmentation step or an end-to-end ANN.
Abstract: In this paper, we introduce an “on the device” text line recognition framework that is designed for mobile or embedded systems. We consider per-character segmentation as a language-independent problem and individual character recognition as a language-dependent one. Thus, the proposed solution is based on two separate artificial neural networks (ANNs) and dynamic programming instead of employing image processing methods for the segmentation step or an end-to-end ANN. To satisfy the tight constraints on memory size imposed by embedded systems and to avoid overfitting, we employ ANNs with a small number of trainable parameters. The primary purpose of our framework is the recognition of low-quality images of identity documents with complex backgrounds and a variety of languages and fonts. We demonstrate that our solution shows high recognition accuracy on natural datasets even when trained on purely synthetic data. We use the MIDV-500 and Census 1961 Project datasets for text line recognition. The proposed method considerably surpasses the algorithmic method implemented in Tesseract 3.05, the LSTM method (Tesseract 4.00), and the unpublished method used in the ABBYY FineReader 15 system. Our framework is also faster than the other compared solutions. We demonstrate the language independence of our segmenter in experiments with Cyrillic, Armenian, and Chinese text lines.
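The following sketch illustrates the overall shape of such a two-stage pipeline: a segmentation network scores candidate cut positions along the line image, dynamic programming selects a consistent set of cuts under character-width bounds, and a separate classifier recognizes each resulting crop. The network interfaces, width bounds, and scoring scheme are illustrative assumptions rather than the paper's actual models.

```python
# Schematic two-stage text line recognition: cut-point selection by dynamic
# programming over per-column segmentation scores, then per-character classification.
import numpy as np


def select_cuts(cut_scores: np.ndarray, min_w: int = 4, max_w: int = 40) -> list:
    """Pick cut positions maximizing the summed cut score, with character-width bounds."""
    n = len(cut_scores)                       # one segmentation score per image column
    best = np.full(n, -np.inf)
    prev = np.full(n, -1, dtype=int)
    best[0] = cut_scores[0]                   # assume a cut at the first column
    for j in range(1, n):
        lo, hi = max(0, j - max_w), j - min_w
        if hi < lo:
            continue
        k = lo + int(np.argmax(best[lo:hi + 1]))
        if best[k] > -np.inf:
            best[j], prev[j] = best[k] + cut_scores[j], k
    # Trace back from the best-scoring final cut to recover the cut sequence.
    j = int(np.argmax(best))
    cuts = []
    while j >= 0:
        cuts.append(j)
        j = prev[j]
    return cuts[::-1]


def recognize_line(image: np.ndarray, segmenter, classifier) -> str:
    """`segmenter(image)` -> per-column cut scores; `classifier(crop)` -> character."""
    cuts = select_cuts(segmenter(image))
    return "".join(classifier(image[:, a:b]) for a, b in zip(cuts, cuts[1:]))
```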

53 citations

Journal ArticleDOI
TL;DR: This paper presents the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips for 50 different identity document types with ground truth, which enables research on a wide scope of document analysis problems.
Abstract: A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but more specialized datasets are required to facilitate a comprehensive scientific and technical approach to identity document recognition. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips for 50 different identity document types with ground truth, which enables research on a wide scope of document analysis problems. The paper presents the characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset; in addition, and as a baseline, we present evaluation results for existing methods of face detection, text line recognition, and document data extraction using the presented dataset. (The dataset is available for download at this ftp URL.)

41 citations

Proceedings ArticleDOI
31 Jan 2020
TL;DR: In this article, the authors present MIDV-2019, a new dataset containing video clips shot with modern high-resolution mobile cameras under strong projective distortions and low lighting conditions.
Abstract: Recognition of identity documents using mobile devices has become a topic of a wide range of computer vision research. The portfolio of methods and algorithms for solving such tasks as face detection, document detection and rectification, text field recognition, and others is growing, and the scarcity of datasets has become an important issue. One of the openly accessible datasets for evaluating such methods is MIDV-500, containing video clips of 50 identity document types in various conditions. However, the variability of capturing conditions in MIDV-500 does not address some of the key issues, mainly significant projective distortions and different lighting conditions. In this paper we present the MIDV-2019 dataset, containing video clips shot with modern high-resolution mobile cameras, with strong projective distortions and low lighting conditions. A description of the added data is presented, and experimental baselines for text field recognition in different conditions are provided.

29 citations


Cited by
Reference EntryDOI
15 Oct 2004

2,118 citations

01 Jan 2018
TL;DR: The talk "Les politiques d'Open Data / Open Acces: Implicacions a la recerca", aimed at researchers and managers of European projects, took place on 20 September 2018 at the Universitat Autonoma de Barcelona.
Abstract: Presentation on the UAB Personal Data Protection Office and its Open Science policy. It formed part of the talk "Les politiques d'Open Data / Open Acces: Implicacions a la recerca", aimed at researchers and managers of European projects, which took place on 20 September 2018 at the Universitat Autonoma de Barcelona.

665 citations

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This work is devoted to the design of an identity document recognition system for mobile phones and tablets that uses the computational capabilities of the device itself; experimental results are presented for "Smart IDReader", an implemented commercial identity document recognition system.
Abstract: This work is devoted to the design of an identity document recognition system for mobile phones and tablets that uses the computational capabilities of the device itself. Key differences are discussed in relation to conventional cloud recognition systems, which commonly use single images as input by design. A mobile recognition system scheme is presented which is constructed with computational limitations in mind and which is implemented in a commercial solution. An original approach is described that improves recognition precision and reliability through post-OCR result integration over the video stream, as opposed to approaches that rely on frame image integration using "super-resolution" algorithms. Interactive feedback between the system and its operator is discussed, such as the decision to automatically stop video stream recognition. Experimental results are presented for "Smart IDReader", an implemented commercial system designed for identity document recognition.

57 citations

Book ChapterDOI
23 Aug 2020
TL;DR: This work reduces the dependency on labeled data by building on the classic knowledge-based priors while using deep networks to learn features, and shows that adding prior knowledge improves data efficiency as line priors no longer need to be learned from data.
Abstract: Classical work on line segment detection is knowledge-based; it uses carefully designed geometric priors based on image gradients, pixel groupings, or Hough transform variants. Instead, current deep learning methods do away with all prior knowledge and replace priors by training deep networks on large manually annotated datasets. Here, we reduce the dependency on labeled data by building on the classic knowledge-based priors while using deep networks to learn features. We add line priors through a trainable Hough transform block inserted into a deep network. The Hough transform provides the prior knowledge about global line parameterizations, while the convolutional layers can learn the local gradient-like line features. On the Wireframe (ShanghaiTech) and York Urban datasets we show that adding prior knowledge improves data efficiency, as line priors no longer need to be learned from data.
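As a simplified, non-trainable illustration of the line prior such a block encodes, the sketch below accumulates feature-map activations into a global (theta, rho) line-parameter space. The block described in the paper is differentiable and embedded inside the network; this numpy version only shows the global parameterization itself, with the bin counts chosen arbitrarily.

```python
# Weighted Hough voting of a 2D feature map into (theta, rho) line-parameter space.
# A non-trainable illustration of the global line prior, not the paper's trainable block.
import numpy as np


def hough_vote(feature_map: np.ndarray, n_theta: int = 180, n_rho: int = 128) -> np.ndarray:
    """Accumulate feature-map activations into an (n_theta, n_rho) accumulator."""
    h, w = feature_map.shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.hypot(h, w)                             # largest possible line offset
    accumulator = np.zeros((n_theta, n_rho), dtype=np.float64)
    ys, xs = np.nonzero(feature_map)                     # vote only where there is activation
    weights = feature_map[ys, xs]
    for i, theta in enumerate(thetas):
        rho = xs * np.cos(theta) + ys * np.sin(theta)    # signed offset of each point's line
        bins = np.clip(((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int), 0, n_rho - 1)
        np.add.at(accumulator[i], bins, weights)         # confidence-weighted soft votes
    return accumulator
```

Peaks in the returned accumulator correspond to dominant lines, i.e. the global structure that convolutional features alone would otherwise have to learn from annotated data.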

53 citations