Author

Ekaterina Emelianova

Bio: Ekaterina Emelianova is an academic researcher. The author has contributed to research in topics: Identification (biology) & Artificial intelligence. The author has co-authored 2 publications.

Papers
Posted Content
TL;DR: The MIDV-2020 dataset as discussed by the authors contains 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation.
Abstract: Identity document recognition is an important sub-field of document analysis, which deals with the tasks of robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of an identity document capture. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, since the subject matter is protected by security requirements. The few identity document datasets that are available lack diversity of document types, capturing conditions, or variability of document field values. In addition, the published datasets were typically designed only for a subset of document recognition problems, not for complex identity document analysis. In this paper, we present MIDV-2020, a dataset consisting of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and a unique artificially generated face, with rich annotation. For the presented benchmark dataset, baselines are provided for tasks such as document location and identification, text field recognition, and face detection. With 72409 annotated images in total, as of the date of publication the proposed dataset is the largest publicly available identity document dataset with variable artificially generated data, and we believe it will prove invaluable for the advancement of document analysis and recognition. The dataset is available for download at this ftp URL and this http URL.
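The document location baseline mentioned in the abstract is conventionally scored with intersection-over-union (IoU) between predicted and ground-truth document regions. Below is a minimal sketch for the axis-aligned case; the function name and `(x1, y1, x2, y2)` box format are illustrative assumptions, not part of MIDV-2020's tooling (which annotates document quadrangles rather than boxes):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp negative extents to zero so disjoint boxes score 0.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For arbitrary quadrangles, the same ratio is computed over polygon intersection areas instead.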

6 citations

Book ChapterDOI
05 Sep 2021
TL;DR: In this article, the authors presented a new dataset for identity documents (ID) recognition called MIDV-LAIT, which includes textual fields in Perso-Arabic, Thai, and Indian scripts.
Abstract: In this paper, we present a new dataset for identity document (ID) recognition called MIDV-LAIT. The main feature of the dataset is textual fields in Perso-Arabic, Thai, and Indian scripts. Since open datasets with real IDs may not be published, we synthetically generated all the images and data. Even the faces are generated and do not belong to any particular person. Several datasets have recently appeared for the evaluation of ID detection, type identification, and recognition, but they cover only Latin-based and Cyrillic-based languages. The proposed dataset is intended to fix this issue and to make it easier to evaluate and compare various methods. As a baseline, we process all the textual field images in MIDV-LAIT with Tesseract OCR. The resulting recognition accuracy shows that the dataset is challenging and of use for further research.
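Recognition accuracy for an OCR baseline like the one above is commonly reported as character accuracy derived from edit distance. The following is a minimal sketch of that metric, not necessarily the authors' exact formulation:

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_accuracy(recognized, ground_truth):
    """1 minus normalized edit distance, clipped at 0."""
    if not ground_truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - levenshtein(recognized, ground_truth) / len(ground_truth))
```

A per-field score of 1.0 means an exact match; scripts with complex shaping (such as Perso-Arabic) are typically compared after Unicode normalization.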

3 citations


Cited by
Journal ArticleDOI
TL;DR: The DLC-2021 dataset is presented, which consists of 1424 video clips captured in a wide range of real-world conditions, focused on tasks relating to ID document forensics, and contains images of synthetic IDs with generated owner photos and artificial personal information.
Abstract: Various government and commercial services, including, but not limited to, e-government, fintech, banking, and sharing-economy services, widely use smartphones to simplify service access and user authorization. Many organizations in these areas use identity document analysis systems to improve user personal data input processes. The tasks of such systems include not only ID document data recognition and extraction but also fraud prevention, by detecting document forgery or checking whether the document is genuine. Modern systems of this kind are often expected to operate in unconstrained environments. A significant amount of research has been published on mobile ID document analysis, but the main difficulty for such research is the lack of public datasets, because the subject is protected by security requirements. In this paper, we present the DLC-2021 dataset, which consists of 1424 video clips captured in a wide range of real-world conditions and focused on tasks relating to ID document forensics. The novelty of the dataset is that it contains video shots of color laminated mock ID documents, color unlaminated copies, grayscale unlaminated copies, and screen recaptures of the documents. The proposed dataset complies with the GDPR because it contains images of synthetic IDs with generated owner photos and artificial personal information. For the presented dataset, benchmark baselines are provided for tasks such as screen recapture detection and glare detection. The data presented are openly available in Zenodo.
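A common first-pass heuristic for the glare-detection task mentioned above flags frames whose fraction of near-saturated pixels exceeds a threshold. The sketch below illustrates the idea only; the threshold values are assumptions and this is not the DLC-2021 baseline method:

```python
import numpy as np

def glare_fraction(gray, bright=240):
    """Fraction of pixels at or above the brightness threshold in a grayscale image."""
    return float(np.mean(gray >= bright))

def has_glare(gray, bright=240, min_fraction=0.02):
    """Flag an image as glare-affected if enough pixels are near saturation."""
    return glare_fraction(gray, bright) >= min_fraction
```

In practice, published baselines also consider the spatial compactness of the bright region, since uniformly bright paper should not be mistaken for specular glare.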

1 citation

Proceedings ArticleDOI
20 Feb 2023
TL;DR: In this article, a computer-vision-based system was proposed to detect changes in the background of the aforementioned documents as a result of manipulations made to their contents, through the use of image subtraction.
Abstract: Document manipulation is a rapidly growing problem, especially with the spread of fabrication technology. The tools to alter documents are now publicly available and can produce high-quality forgeries indistinguishable from genuine documents. Forged documents may wreak havoc on many processes that depend on document validity, leading to lasting consequences such as financial loss. Therefore, the process of identifying an altered document is essential. A system capable of scrutinizing documents as either forged or genuine through discriminative features (such as distortions or character misalignment) can assist industries with heavy reliance on documents for processes such as identity verification. Most of the documents involved in such processes have sufficiently complex backgrounds. We present a computer-vision-based system that detects changes in the backgrounds of such documents resulting from manipulations of their contents, through the use of image subtraction. The system takes an image as input and classifies the document as genuine or forged. Our proposed system achieves an accuracy of 95% using a CNN on unaligned images, and 100% on aligned images.
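The image-subtraction step described above amounts to taking an absolute pixel difference between a reference (genuine template) image and the questioned image, then deciding from the changed-pixel fraction. The sketch below is a simplified threshold-based stand-in for the paper's CNN classifier, with illustrative threshold values:

```python
import numpy as np

def changed_fraction(reference, questioned, diff_thresh=30):
    """Fraction of pixels whose absolute difference exceeds diff_thresh."""
    # Widen to int16 so uint8 subtraction cannot wrap around.
    diff = np.abs(reference.astype(np.int16) - questioned.astype(np.int16))
    return float(np.mean(diff > diff_thresh))

def classify(reference, questioned, max_changed=0.01):
    """Label a document 'forged' if too many background pixels changed."""
    return "forged" if changed_fraction(reference, questioned) > max_changed else "genuine"
```

This presumes the two images are already aligned; the paper's reported accuracy gap between aligned (100%) and unaligned (95%) images shows why registration matters before subtraction.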
Journal ArticleDOI
TL;DR: In this paper, a data-driven approach was proposed to train a memory-efficient local feature descriptor for identity document location and classification on mobile and embedded devices, based on the specifics of document detection in smartphone camera-captured images with a template matching approach.
Abstract: In this paper, we propose a data-driven approach to training a memory-efficient local feature descriptor for identity document location and classification on mobile and embedded devices. The proposed algorithm for retrieving a dataset of patches is based on the specifics of document detection in smartphone camera-captured images with a template matching approach. The retrieved dataset of domain-relevant patches, which includes splits for feature training, feature selection, and testing, is made public. We train a binary descriptor using the retrieved dataset of patches; each bit of the descriptor relies on a single computationally efficient feature. To estimate the influence of different feature spaces on descriptor performance, we perform descriptor training experiments using gradient-based and intensity-based features. Extensive experiments on identity document location and classification benchmarks showed that the resulting 128- and 192-bit descriptors using gradient-based features outperformed a state-of-the-art 512-bit BEBLID descriptor for arbitrary keypoint matching in all cases except extreme projective distortions, while being significantly more efficient in low-lighting conditions. The 64-bit gradient-based descriptor obtained within the approach showed better quality than the 128- and 256-bit BinBoost descriptors on scanned document images. To evaluate the influence of descriptor size on matching speed, we propose a model based on the required number of processor instructions for computing the Hamming distance between a pair of descriptors on various energy-efficient processor architectures.
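The Hamming distance whose instruction cost the speed model above is built on reduces to an XOR followed by a population count. A minimal sketch for descriptors packed into Python integers follows; the packing format is an illustrative assumption:

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary descriptors."""
    # XOR leaves a 1 exactly where the descriptors disagree; counting those
    # ones is a single popcount instruction per machine word on real hardware.
    return bin(a ^ b).count("1")
```

This is why descriptor width matters for matching speed: a 512-bit descriptor needs four times as many XOR-plus-popcount word operations per comparison as a 128-bit one.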