Proceedings ArticleDOI

Writer Identification in Noisy Handwritten Documents

01 Mar 2017 - pp. 1177-1186
TL;DR: This work exceeds the state of the art in writer identification of noisy handwritten documents by over 10% and blends both deep learning and traditional computer vision approaches, exploring deep convolutional neural networks for denoising in conjunction with hand-crafted descriptor features.
Abstract: Identifying the writer of a handwritten document based on visual features is difficult, as evidenced by the limited number of subject matter experts proficient in forensic document analysis. Automating writer identification would be beneficial for such experts' workloads. Academic work in identifying writers has focused on clean benchmark datasets: plain white documents with uniform writing instruments. Solutions on this type of data have achieved hit-in-top-10 accuracy rates upwards of 98%. Unfortunately, transferring competitive techniques to handwritten documents with noise is nontrivial. This work highlights efforts in unconstrained writer identification in diverse conditions, including but not limited to lined and graph paper, coffee stains, stamps, and different writing implements. The proposed methodology blends both deep learning and traditional computer vision approaches, exploring deep convolutional neural networks (CNNs) for denoising in conjunction with hand-crafted descriptor features. Our identification algorithms are trained on existing clean datasets artificially augmented with noise, and we evaluate them on a commissioned dataset, which features a diverse but balanced set of writers, writing implements, and writing substrates (incorporating various types of noise). Experimenting with mixtures of segmentation methods, novel denoisers, specialized CNNs, and handcrafted features, we exceed the state of the art in writer identification of noisy handwritten documents by over 10%.
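The denoise-then-describe pipeline the abstract outlines can be sketched as a toy NumPy illustration. This is not the authors' implementation: the 3x3 mean filter merely stands in for the learned CNN denoiser, and the gradient-orientation histogram is one generic example of a hand-crafted descriptor.

```python
import numpy as np

def extract_patches(img, size=32, stride=32):
    """Slide a window over the page and collect square patches."""
    patches = []
    h, w = img.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return np.stack(patches)

def denoise(patch):
    """Stand-in for the learned CNN denoiser: a simple 3x3 mean filter.
    The paper trains a convolutional network for this step; the mean
    filter only marks where that network would plug in."""
    padded = np.pad(patch, 1, mode="edge")
    out = np.zeros_like(patch, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + patch.shape[0], dx:dx + patch.shape[1]]
    return out / 9.0

def gradient_histogram(patch, bins=8):
    """Hand-crafted descriptor: magnitude-weighted histogram of
    gradient orientations, L2-normalized."""
    gy, gx = np.gradient(patch)
    angles = np.arctan2(gy, gx)  # range [-pi, pi]
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                           weights=np.hypot(gx, gy))
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Describe a noisy page: patches -> denoise -> descriptors -> page vector.
rng = np.random.default_rng(0)
page = rng.random((128, 128))
patches = extract_patches(page)
descriptors = np.stack([gradient_histogram(denoise(p)) for p in patches])
page_vector = descriptors.mean(axis=0)  # pooled per-page writer descriptor
```

Identification then reduces to comparing `page_vector` against enrolled writers' vectors, e.g. by nearest neighbor.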
Citations
Journal ArticleDOI
TL;DR: This paper proposes a new benchmark study for writer identification based on word or text block images which approximately contain one word and proposes a deep neural network, named FragNet, which is used to extract powerful features on these word images.
Abstract: Writer identification based on a small amount of text is a challenging problem. In this paper, we propose a new benchmark study for writer identification based on word or text block images which approximately contain one word. In order to extract powerful features on these word images, a deep neural network, named FragNet, is proposed. The FragNet has two pathways: feature pyramid which is used to extract feature maps and fragment pathway which is trained to predict the writer identity based on fragments extracted from the input image and the feature maps on the feature pyramid. We conduct experiments on four benchmark datasets, which show that our proposed method can generate efficient and robust deep representations for writer identification based on both word and page images.
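The two-pathway idea in the FragNet abstract (a feature pyramid plus a fragment pathway that reads spatially aligned crops from the input and each feature map) can be sketched in miniature. This is an assumed toy, not FragNet itself: average pooling stands in for the convolutional feature pathway, and the concatenated crops stand in for the fragment branch's input.

```python
import numpy as np

def feature_pyramid(img, levels=3):
    """Toy feature pyramid: repeated 2x average-pooling stands in for
    the convolutional feature-extraction pathway."""
    maps, cur = [img], img
    for _ in range(levels - 1):
        h, w = cur.shape
        cur = cur[:h - h % 2, :w - w % 2]
        cur = 0.25 * (cur[0::2, 0::2] + cur[1::2, 0::2]
                      + cur[0::2, 1::2] + cur[1::2, 1::2])
        maps.append(cur)
    return maps

def fragment_features(maps, y, x, size=16):
    """Fragment pathway: crop the window at (y, x) from the input and
    the spatially corresponding (stride-adjusted) window from each
    pyramid level, then concatenate into one fragment descriptor."""
    feats = []
    for level, fmap in enumerate(maps):
        s = size >> level            # window shrinks with the stride
        yy, xx = y >> level, x >> level
        feats.append(fmap[yy:yy + s, xx:xx + s].ravel())
    return np.concatenate(feats)

rng = np.random.default_rng(1)
word_img = rng.random((64, 64))
pyr = feature_pyramid(word_img, levels=3)
frag = fragment_features(pyr, y=16, x=16, size=16)
```

In FragNet proper, each such fragment descriptor feeds a classifier that predicts the writer identity.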

38 citations


Cites methods from "Writer Identification in Noisy Hand..."

  • ...In [34], a denoising network is used to extract deep features on small patches....


Journal ArticleDOI
TL;DR: An end-to-end system to identify writers in medieval manuscripts based on deep neural networks trained with transfer learning techniques and specialized to solve the task in hand, which proves to be very effective in identifying page writers.

29 citations

Journal ArticleDOI
TL;DR: This work proposes an end-to-end system that relies on a straightforward yet well-designed deep network and very efficient feature extraction, emphasizing feature engineering, and empirically demonstrates that the conjugated network outperforms the original ResNet and can work well for real-world applications in which patches with few letters exist.

25 citations

Journal ArticleDOI
01 Jan 2020
TL;DR: This review paper covers the forensic-relevant literature in questioned documents from 2016 to 2019 as a part of the 19th Interpol International Forensic Science Managers Symposium.
Abstract: This review paper covers the forensic-relevant literature in questioned documents from 2016 to 2019 as a part of the 19th Interpol International Forensic Science Managers Symposium. The review papers are also available at the Interpol website at: https://www.interpol.int/content/download/14458/file/Interpol Review Papers 2019.pdf .

12 citations


Cites background from "Writer Identification in Noisy Hand..."

  • ...et al. [163] proposed a methodology for denoising handwritten documents to improve writer identification on noisy handwritten documents (lined and graph paper, coffee stains, stamps, ...).


Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper presents a deep learning based approach to effectively characterize the year of production of sample documents from the Medieval Paleographical Scale (MPS) dataset, and significantly reduced the Mean Absolute Error (MAE) reported in previous studies.
Abstract: Digitization of historical manuscripts from premodern eras, has captivated the document analysis and pattern recognition community in recent years. Estimation of the period of production of such documents is a challenging yet favored research problem. In this paper, we present a deep learning based approach to effectively characterize the year of production of sample documents from the Medieval Paleographical Scale (MPS) dataset. By employing transfer learning on a number of popular pre-trained Convolutional Neural Network (CNN) models, we have significantly reduced the Mean Absolute Error (MAE) reported in previous studies.
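The transfer-learning recipe this abstract describes (freeze a pre-trained feature extractor, train only a small head to predict the year, report MAE) can be sketched with synthetic data. This is a hedged toy: the fixed random projection with ReLU stands in for frozen pre-trained CNN features, and ridge regression stands in for the trained head; none of it is the authors' actual setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Frozen "pretrained backbone": a fixed random projection plus ReLU
# stands in for the convolutional features of a pre-trained CNN.
W_frozen = rng.standard_normal((256, 64))
def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)

# Toy manuscript "images" flattened to vectors, with production years.
n = 200
X = rng.standard_normal((n, 256))
true_w = rng.standard_normal(64)
years = 1300.0 + 50.0 * np.tanh(backbone(X) @ true_w / 10.0)

# Transfer learning: the backbone stays frozen; only the linear head
# is trained, here by ridge regression on centered targets.
F = backbone(X)
lam = 1e-2
head = np.linalg.solve(F.T @ F + lam * np.eye(64),
                       F.T @ (years - years.mean()))
pred = F @ head + years.mean()
mae = np.mean(np.abs(pred - years))  # Mean Absolute Error, as reported
```

Because only the head is trained, very little labeled data is needed, which is the point of transfer learning on small historical datasets.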

10 citations


Cites methods from "Writer Identification in Noisy Hand..."

  • ...As discussed earlier, ConvNets outperform the traditional feature extraction techniques and the same has been validated for a number of tasks on handwriting images [11], [14], [25], [26] as well....


References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieves state-of-the-art performance on ImageNet classification.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
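The "dropout" regularizer the abstract mentions is simple to state in code. Below is a minimal NumPy sketch of the common inverted-dropout formulation (not the paper's exact implementation): at training time each activation is zeroed with probability p and survivors are rescaled so the expected activation is unchanged; at test time the layer is the identity.

```python
import numpy as np

def dropout(x, p_drop=0.5, train=True, rng=None):
    """Inverted dropout: at training time, zero each activation with
    probability p_drop and scale survivors by 1/(1 - p_drop) so the
    expected activation is unchanged; at test time, pass through."""
    if not train:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

rng = np.random.default_rng(3)
acts = rng.random((4, 8)) + 0.5   # positive activations in [0.5, 1.5)
train_out = dropout(acts, 0.5, train=True, rng=rng)
eval_out = dropout(acts, 0.5, train=False)
```

Randomly deleting units this way discourages co-adaptation: no fully-connected unit can rely on any particular other unit being present.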

73,978 citations

Proceedings ArticleDOI
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
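The convolutional layers this abstract credits with handling the variability of 2D shapes are built on one operation: a small kernel slid over every image position with shared weights. A minimal NumPy sketch (an illustration, not the paper's networks) of that building block:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D cross-correlation: the same small kernel is applied at
    every position, so a feature detector's weights are shared across
    the whole image."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel responds wherever a stroke edge appears,
# regardless of where the character sits in the image.
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])
img = np.zeros((6, 6))
img[:, 2:4] = 1.0                 # a vertical "stroke"
response = conv2d(img, edge_kernel)
```

Weight sharing is what gives convolutional networks their tolerance to translation: the response pattern shifts with the stroke instead of breaking.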

42,067 citations

Journal ArticleDOI
TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures, including FCN and DeconvNet.
Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1] . The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3] , DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/ .
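SegNet's key mechanism, upsampling by reusing the encoder's max-pooling indices, is easy to demonstrate on a single channel. The NumPy sketch below (an illustration of the mechanism, not SegNet code) records the argmax position of each 2x2 pooling window and later scatters the pooled values back to exactly those positions, leaving zeros elsewhere.

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max-pooling that also records the argmax position of each
    window, as the SegNet encoder does."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    idx = np.zeros((h // 2, w // 2), dtype=int)  # flat index into x
    for i in range(h // 2):
        for j in range(w // 2):
            win = x[2*i:2*i + 2, 2*j:2*j + 2]
            k = int(np.argmax(win))
            pooled[i, j] = win.flat[k]
            idx[i, j] = (2*i + k // 2) * w + (2*j + k % 2)
    return pooled, idx

def max_unpool(pooled, idx, shape):
    """SegNet-style non-linear upsampling: scatter each pooled value
    back to its recorded argmax position; everywhere else stays zero."""
    out = np.zeros(shape)
    out.flat[idx.ravel()] = pooled.ravel()
    return out

rng = np.random.default_rng(4)
x = rng.random((4, 4))
pooled, idx = max_pool_with_indices(x)
up = max_unpool(pooled, idx, x.shape)
```

Because the indices are remembered rather than learned, no parameters are spent on upsampling; the sparse output is then densified by ordinary trainable convolutions.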

13,468 citations


"Writer Identification in Noisy Hand..." refers to methods in this paper

  • ...Secondly, because the application space differs from those of [25, 26], the proposed denoising methodology necessitates different cost functions and strategies, as will be apparent in the next section....


  • ...While our initial cues were taken from fully convolutional networks without dense layers like [25, 26], there was a quick realization that while smaller in parameter space, training times for fully convolutional networks have the potential to be 3–4× slower using our CUDA kernels and Maxwell class GPUs....


  • ...The denoising algorithm that we propose is based on stacked convolutional autoencoders [22] and is most similar to many networks used for deconvolution [23, 24, 25], which has shown promise in applications like image deblurring [25] and segmentation [26]....


Book ChapterDOI
06 Sep 2014
TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark.
Abstract: Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark Krizhevsky et al. [18]. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we explore both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark. We also perform an ablation study to discover the performance contribution from different model layers. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.
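One diagnostic from this line of work, checking which image regions a classifier actually depends on, can be sketched with occlusion sensitivity: slide a gray square over the image and record how much the class score drops at each position. The NumPy toy below uses an assumed stand-in scorer (mean intensity of the top-left quadrant) rather than a real network.

```python
import numpy as np

def occlusion_map(img, score_fn, patch=2):
    """Slide a gray occluder over the image and record how the model's
    score drops at each position: large drops mark regions the
    classifier depends on."""
    base = score_fn(img)
    h, w = img.shape
    heat = np.zeros((h - patch + 1, w - patch + 1))
    for y in range(heat.shape[0]):
        for x in range(heat.shape[1]):
            occluded = img.copy()
            occluded[y:y + patch, x:x + patch] = 0.5  # gray square
            heat[y, x] = base - score_fn(occluded)
    return heat

# Toy "classifier": the score is the mean intensity of the top-left
# 4x4 quadrant, so only occlusions overlapping it should register.
def score_fn(img):
    return img[:4, :4].mean()

img = np.ones((8, 8))
heat = occlusion_map(img, score_fn, patch=2)
```

With a real CNN, `score_fn` would return the softmax probability of the true class, and the heat map reveals whether the model attends to the object or to background context.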

12,783 citations
