Showing papers by "Michael Blumenstein" published in 2022


Journal ArticleDOI
TL;DR: The authors exploit the property of the DCT for finding significant information in images by selecting multiple channels, and propose a method that studies texture distribution based on statistical measurement to extract features.
Abstract: Text detection from natural scene images is an active research area in computer vision, signal, and image processing because of several real-time applications such as automatic driving and tracking person behaviors during sports or marathon events. In these situations, there is a high probability of missing text information due to the occlusion of different objects/persons while capturing images. Unlike most existing methods, which focus only on text detection and ignore the effect of missing texts, this work detects and predicts missing texts so that the performance of OCR improves. The proposed method exploits the property of the DCT for finding significant information in images by selecting multiple channels. For the chosen DCT channels, the proposed method studies texture distribution based on statistical measurement to extract features. We propose to adopt a Bayesian classifier for categorizing text pixels using the extracted features. Then a deep learning model is proposed for eliminating false positives to improve text detection performance. Further, the proposed method employs a Natural Language Processing (NLP) model for predicting missing text information by using detected and recognized texts. Experimental results on our dataset, which contains texts occluded by objects, show that the proposed method is effective in predicting missing text information. To demonstrate the effectiveness and objectivity of the proposed method, we also tested it on the standard datasets of natural scene images, namely, ICDAR 2017-MLT, Total-Text, and CTW1500.
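
As a rough illustration of the pipeline sketched in this abstract, the following Python snippet selects DCT channels, derives simple statistical texture features, and feeds them to a Bayesian classifier. It is a minimal sketch under stated assumptions (square coefficient blocks, 3x3 local statistics, Gaussian naive Bayes), not the authors' implementation.

    import numpy as np
    from scipy.fft import dctn, idctn
    from scipy.ndimage import uniform_filter
    from sklearn.naive_bayes import GaussianNB

    def dct_channel(gray, band):
        """Keep one square block of DCT coefficients (a simplification of
        true band selection) and map it back to the spatial domain."""
        coeffs = dctn(gray, norm='ortho')
        lo, hi = band
        mask = np.zeros_like(coeffs)
        mask[lo:hi, lo:hi] = 1.0
        return idctn(coeffs * mask, norm='ortho')

    def texture_features(gray, bands=((0, 16), (16, 64), (64, 128))):
        """Per-pixel local mean/variance over several DCT channels."""
        feats = []
        for band in bands:
            ch = dct_channel(gray, band)
            mu = uniform_filter(ch, size=3)                           # local mean
            feats.append(mu)
            feats.append(uniform_filter(ch ** 2, size=3) - mu ** 2)  # local variance
        return np.dstack(feats).reshape(-1, 2 * len(bands))

    # With annotated text masks (assumed available), a Gaussian naive Bayes
    # classifier labels each pixel as text / non-text:
    # clf = GaussianNB().fit(texture_features(train_gray), train_mask.ravel())
    # text_prob = clf.predict_proba(texture_features(test_gray))[:, 1]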

6 citations



Journal ArticleDOI
TL;DR: In this paper, the authors give a detailed introduction to instance segmentation technology based on deep learning, reinforcement learning, and transformers, and discuss its development along with the most common datasets used.
Abstract: In recent years, instance segmentation has become a key research area in computer vision. This technology has been applied in varied applications such as robotics, healthcare, and intelligent driving. Instance segmentation not only detects the location of an object but also marks the edges of each single instance, thereby solving object detection and semantic segmentation concurrently. Our survey gives a detailed introduction to instance segmentation technology based on deep learning, reinforcement learning, and transformers. Further, we discuss its development along with the most common datasets used. We also focus on the different challenges and the future development scope for instance segmentation. This survey will provide a strong reference for future researchers in this field.

4 citations


Proceedings ArticleDOI
24 Jul 2022
TL;DR: This paper first presents the shortcomings of current pose transfer algorithms and then proposes a novel text-based pose transfer technique to address those issues; the technique generates promising results with significant qualitative and quantitative scores in the authors' experiments.
Abstract: In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying process challenging to apply in real-world scenarios, as the generation of the target image is the actual aim. In this paper, we first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. We divide the problem into three independent stages: (a) text to pose representation, (b) pose refinement, and (c) pose rendering. To the best of our knowledge, this is one of the first attempts to develop a text-based pose transfer framework, for which we also introduce a new dataset, DF-PASS, created by adding descriptive pose annotations for the images of the DeepFashion dataset. The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.
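
A hypothetical skeleton of the three-stage pipeline named in the abstract (text to pose, pose refinement, pose rendering) might look as follows in PyTorch; every module body here is a placeholder assumption, not the authors' architecture.

    import torch
    import torch.nn as nn

    class TextToPose(nn.Module):          # stage (a): text -> keypoint estimates
        def __init__(self, text_dim=768, n_joints=18):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_joints * 2))
            self.n_joints = n_joints
        def forward(self, text_emb):
            return self.mlp(text_emb).view(-1, self.n_joints, 2)

    class PoseRefiner(nn.Module):         # stage (b): shallow linear refinement
        def __init__(self, n_joints=18):
            super().__init__()
            self.lin = nn.Linear(n_joints * 2, n_joints * 2)
        def forward(self, pose):
            flat = pose.flatten(1)
            return (flat + self.lin(flat)).view_as(pose)  # residual correction

    class PoseRenderer(nn.Module):        # stage (c): render person in target pose
        def forward(self, src_image, pose):
            raise NotImplementedError("a conditional generator would go here")

    # text_emb: (B, 768) sentence embedding of the pose description (assumed)
    # pose = PoseRefiner()(TextToPose()(text_emb))
    # image = PoseRenderer()(source_image, pose)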

4 citations


Journal ArticleDOI
TL;DR: A new model based on conformable moments and deep ensemble neural networks for forged handwriting detection in noisy and blurry environments is presented, and experimental results demonstrate that the proposed method outperforms the existing methods in terms of classification rate.
Abstract: Detecting forged handwriting is important in a wide variety of machine learning applications, and it is challenging when the input images are degraded by noise and blur. This article presents a new model based on conformable moments (CMs) and deep ensemble neural networks (DENNs) for forged handwriting detection in noisy and blurry environments. Since CMs involve fractional calculus, with the ability to model nonlinearities and geometric moments while preserving spatial relationships between pixels, fine details in images are preserved. This motivates us to introduce a DENN classifier, which integrates stenographic kernels and spatial features to classify input images as normal (original, clean images), altered (handwriting changed through copy-paste and insertion operations), noisy (noise added to the original image), blurred (blur added to the original image), altered-noisy (noise added to the altered image), and altered-blurred (blur added to the altered image). To evaluate our model, we use a newly introduced dataset, which comprises handwritten words altered at the character level, as well as several standard datasets, namely ACPR 2019, ICPR 2018-FDC, and the IMEI dataset. The first two of these datasets include handwriting samples that are altered at the character and word levels, and the third dataset comprises forged International Mobile Equipment Identity (IMEI) numbers. Experimental results demonstrate that the proposed method outperforms existing methods in terms of classification rate.
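
A minimal sketch of the deep-ensemble idea over the six classes listed in the abstract; the conformable-moment feature extraction is not reproduced, and the head design and ensemble size are assumptions.

    import torch
    import torch.nn as nn

    CLASSES = ['normal', 'altered', 'noisy', 'blurred',
               'altered-noisy', 'altered-blurred']

    class SmallHead(nn.Module):
        """One ensemble member: a small classifier over precomputed features."""
        def __init__(self, feat_dim, n_classes=len(CLASSES)):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_classes))
        def forward(self, x):
            return self.net(x)

    def ensemble_predict(heads, features):
        """Average softmax probabilities across ensemble members."""
        probs = torch.stack([head(features).softmax(dim=-1) for head in heads])
        return probs.mean(dim=0)  # (B, 6)

    # heads = [SmallHead(feat_dim=256) for _ in range(5)]  # 5 members, assumed
    # pred = ensemble_predict(heads, moment_features).argmax(dim=-1)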

2 citations


Proceedings ArticleDOI
06 Jun 2022
TL;DR: This work proposes a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics, achieving high-resolution photo-realistic generation results that preserve the general context of the scene.
Abstract: Person image generation is an intriguing yet challenging problem. However, this task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the person being inserted blend in with the existing persons in the scene. Our method uses three individual networks in a sequential pipeline. First, we predict the potential location and the skeletal structure of the new person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on the existing human skeletons present in the scene. Next, the predicted skeleton is refined through a shallow linear network to achieve higher structural accuracy in the generated image. Finally, the target image is generated from the refined skeleton using another generative network conditioned on a given image of the target person. In our experiments, we achieve high-resolution photo-realistic generation results while preserving the general context of the scene. We conclude our paper with multiple qualitative and quantitative benchmarks on the results.
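
The Wasserstein GAN objective used to condition skeleton generation on the scene context could look like the standard WGAN losses below; the generator, critic, and context encoding are placeholders, not the authors' code.

    import torch

    def critic_loss(critic, real_skel, fake_skel, context):
        # Wasserstein critic: score real skeletons high, generated ones low.
        return critic(fake_skel, context).mean() - critic(real_skel, context).mean()

    def generator_loss(critic, fake_skel, context):
        # The generator tries to raise the critic's score of its skeletons.
        return -critic(fake_skel, context).mean()

    # Training loop outline (weight clipping as in the original WGAN formulation):
    # for real_skel, context in loader:
    #     fake_skel = generator(torch.randn(real_skel.size(0), z_dim), context)
    #     c_loss = critic_loss(critic, real_skel, fake_skel.detach(), context)
    #     ... optimize the critic, then clamp its weights to [-0.01, 0.01] ...
    #     g_loss = generator_loss(critic, fake_skel, context)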

2 citations


Journal ArticleDOI
TL;DR: This work proposes a novel deep network that takes two inputs (the grayscale image and the respective encoded text description) to predict the relevant color gamut, and finds that it outperforms the state-of-the-art colorization algorithms both qualitatively and quantitatively.
Abstract: Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to make the colorization pipeline automatic, these processes often produce unrealistic results due to a lack of conditioning. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of the colorization process. To the best of our knowledge, this is one of the first attempts to incorporate textual conditioning in the colorization pipeline. To do so, we propose a novel deep network that takes two inputs (the grayscale image and the respective encoded text description) and tries to predict the relevant color gamut. As the respective textual descriptions contain color information about the objects present in the scene, the text encoding helps to improve the overall quality of the predicted colors. We have evaluated our proposed model using different metrics and found that it outperforms the state-of-the-art colorization algorithms both qualitatively and quantitatively.
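
A minimal sketch of a two-input colorization network in the spirit of the abstract, assuming Lab color space (the network sees the L channel plus a text embedding and predicts the ab channels); layer choices and dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TextConditionedColorizer(nn.Module):
        def __init__(self, text_dim=768):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
            self.text_proj = nn.Linear(text_dim, 128)
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1), nn.Tanh())

        def forward(self, gray, text_emb):
            f = self.encoder(gray)                       # (B, 128, H/4, W/4)
            t = self.text_proj(text_emb)                 # (B, 128)
            t = t[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
            return self.decoder(torch.cat([f, t], 1))    # predicted ab channels

    # ab = TextConditionedColorizer()(L_channel, caption_embedding)
    # rgb = lab_to_rgb(torch.cat([L_channel, ab], dim=1))  # hypothetical helper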

2 citations


Journal ArticleDOI
TL;DR: The authors use focused and defocused information in the input images to extract contextual information, fusing the two to estimate cross-covariance and define a linear relationship between them.
Abstract: Classification of photos captured by different photographers is an important and challenging problem in knowledge-based systems and image processing. Monitoring and authenticating images uploaded on social media are essential, and verifying the source is a key piece of evidence. We present a novel framework for classifying photos by different photographers based on the combination of local features and deep learning models. The proposed work uses focused and defocused information in the input images to extract contextual information. The model estimates the weighted gradient and calculates entropy to strengthen the context features. The focused and defocused information is fused to estimate the cross-covariance and define a linear relationship between the two. This relationship results in a feature matrix that is fed to a Knowledge Enforcement Network (KEN) to obtain representative features. Owing to the strong discriminative ability of deep learning models, we employ the lightweight and accurate MobileNetV2. The outputs of KEN and MobileNetV2 are sent to a classifier for photographer classification. Experimental results of the proposed model on our dataset of 46 photographer classes (46,234 images) and publicly available datasets of 41 photographer classes (218,303 images) show that the method outperforms existing techniques by 5%–10% on average. The dataset created for the experiments will be made available upon publication.
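
The fusion step described here, estimating the cross-covariance between focused and defocused features, can be written directly; the snippet below is a sketch, with feature extraction and the KEN module assumed to exist elsewhere.

    import numpy as np

    def cross_covariance(focused, defocused):
        """focused, defocused: (n_samples, d) feature matrices for one image."""
        f = focused - focused.mean(axis=0, keepdims=True)
        g = defocused - defocused.mean(axis=0, keepdims=True)
        return f.T @ g / (f.shape[0] - 1)   # (d, d) cross-covariance matrix

    # The (d, d) matrix would then be flattened and fed to the KEN module,
    # whose output is combined with MobileNetV2 features for classification.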


Journal ArticleDOI
TL;DR: An Omni-Scale block (OS-block) is proposed for 1D-CNNs, in which the kernel sizes are decided by a simple and universal rule: a set of kernel sizes consisting of multiple prime numbers, chosen according to the length of the time series, that can cover the best RF size across different datasets.
Abstract: The Receptive Field (RF) size has been one of the most important factors for One-Dimensional Convolutional Neural Networks (1D-CNNs) on time series classification tasks. Large efforts have been made to choose the appropriate size because it has a huge influence on performance and differs significantly for each dataset. In this paper, we propose an Omni-Scale block (OS-block) for 1D-CNNs, where the kernel sizes are decided by a simple and universal rule. Specifically, it is a set of kernel sizes, consisting of multiple prime numbers chosen according to the length of the time series, that can efficiently cover the best RF size across different datasets. Experimental results show that models with the OS-block achieve performance similar to that of models with the searched optimal RF size, and, owing to this strong ability to capture the optimal RF size, simple 1D-CNN models with the OS-block achieve state-of-the-art performance on four time series benchmarks, including both univariate and multivariate data from multiple domains. Comprehensive analysis and discussion shed light on why the OS-block can capture optimal RF sizes across different datasets. Code is publicly available.
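
An OS-block-style layer can be sketched as parallel 1D convolutions with prime kernel sizes. The selection rule below (kernel size 1 plus all odd primes up to a cap tied to the series length) is a simplification of the paper's rule, which also uses size 2; odd sizes are kept here so branch outputs align without extra padding logic.

    import torch
    import torch.nn as nn

    def primes_up_to(n):
        return [p for p in range(2, n + 1)
                if all(p % q for q in range(2, int(p ** 0.5) + 1))]

    class OSBlock(nn.Module):
        def __init__(self, in_ch, out_ch, series_len):
            super().__init__()
            # Cap on kernel sizes relative to series length: an assumption.
            sizes = [1] + [p for p in primes_up_to(max(3, series_len // 4)) if p > 2]
            self.branches = nn.ModuleList(
                nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in sizes)
        def forward(self, x):                      # x: (B, C, L)
            # Odd kernels with padding k//2 keep length L, so outputs concatenate.
            return torch.cat([b(x) for b in self.branches], dim=1)

    # block = OSBlock(in_ch=1, out_ch=8, series_len=128)
    # y = block(torch.randn(4, 1, 128))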



Journal ArticleDOI
TL;DR: In this paper, a novel method for document age classification at the text line level is presented, which extracts structural, contrast, and spatial features to study degradations at different wavelet decomposition levels.
Abstract: Document age estimation using handwritten text line images is useful for several pattern recognition and artificial intelligence applications, such as forged signature verification, writer identification, gender identification, personality trait identification, and fraudulent document identification. This paper presents a novel method for document age classification at the text line level. For segmenting text lines from handwritten document images, wavelet decomposition is used in a novel way. We explore multiple levels of wavelet decomposition, which introduce increasing blur as the number of levels increases, for detecting word components. The detected components are then used in a direction-guided growing approach with linearity and nonlinearity criteria for segmenting text lines. For the classification of text line images of different ages, inspired by the observation that as the age of a document increases, the quality of its image degrades, the proposed method extracts structural, contrast, and spatial features to study degradations at different wavelet decomposition levels. The specific advantages of DenseNet, namely strong feature propagation, mitigation of the vanishing gradient problem, feature reuse, and a reduced number of parameters, motivated us to use DenseNet121 along with a Multi-Layer Perceptron (MLP) for the classification of text lines of different ages, feeding the features and the original image as input. To demonstrate the efficacy of the proposed model, experiments were conducted on our own as well as standard datasets for both text line segmentation and document age classification. The results show that the proposed method outperforms existing methods for text line segmentation in terms of precision, recall, and F-measure, and for document age classification in terms of average classification rate.
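
The multi-level wavelet effect the abstract exploits, with deeper decomposition levels yielding blurrier approximations that merge characters into word-like blobs, can be illustrated with pywt; the level count and wavelet choice below are assumptions.

    import pywt

    def wavelet_approximations(gray, levels=3, wavelet='haar'):
        """Return the approximation (LL) image at each decomposition level;
        `gray` is a 2D array. Each level is blurrier and half the size."""
        approx, out = gray.astype(float), []
        for _ in range(levels):
            approx, _ = pywt.dwt2(approx, wavelet)  # keep the LL band only
            out.append(approx)
        return out

    # Candidate word components could then be found by thresholding a chosen
    # level and extracting connected components, e.g. with scipy.ndimage.label.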