scispace - formally typeset
Author

Saikat Roy

Other affiliations: University of Bonn
Bio: Saikat Roy is an academic researcher from Jadavpur University. The author has contributed to research in the topics of convolutional neural networks and deep learning, has an h-index of 4, and has co-authored 6 publications receiving 152 citations. Previous affiliations of Saikat Roy include the University of Bonn.

Papers
Journal ArticleDOI
TL;DR: A novel deep learning technique for the recognition of handwritten Bangla isolated compound characters is presented, and a new benchmark recognition accuracy on the CMATERdb 3.3.1.3 dataset is reported.

113 citations

Proceedings ArticleDOI
29 Jan 2018
TL;DR: The proposed region-based Deep Convolutional Neural Network framework for document structure learning achieves state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding the benchmarks set by the existing algorithms.
Abstract: In this article, a region-based Deep Convolutional Neural Network framework is presented for document structure learning. The contribution of this work involves efficient training of region-based classifiers and effective ensembling for document image classification. A primary level of ‘inter-domain’ transfer learning is used by exporting weights from a VGG16 architecture pre-trained on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region-based influence modelling, a secondary level of ‘intra-domain’ transfer learning is used for rapid training of deep learning models for image segments. Finally, stacked-generalization-based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding the benchmarks set by the existing algorithms.
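The two-level transfer-learning idea in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's code: the tiny `DocCNN` stands in for the VGG16-initialised holistic model, and the "intra-domain" step is just copying its trained weights into a region-level classifier and freezing the shared convolutional features for rapid training.

```python
import torch
import torch.nn as nn

# Stand-in for the holistic document classifier ("inter-domain" transfer
# would initialise its features from VGG16/ImageNet; omitted here).
class DocCNN(nn.Module):
    def __init__(self, num_classes=16):  # RVL-CDIP has 16 classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

holistic = DocCNN()  # assumed already trained on whole document images
region = DocCNN()    # e.g. a classifier for the header region

# "Intra-domain" transfer: start the region model from the holistic weights,
# then freeze the shared convolutional features for rapid training.
region.load_state_dict(holistic.state_dict())
for p in region.features.parameters():
    p.requires_grad = False
```

Only the region model's linear classifier would then be fine-tuned on image segments, which is what makes the secondary training stage fast.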

60 citations

Posted Content
TL;DR: In this paper, a region-based deep convolutional neural network framework is proposed for document structure learning, which involves efficient training of region based classifiers and effective ensembling for document image classification.
Abstract: In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning. The contribution of this work involves efficient training of region-based classifiers and effective ensembling for document image classification. A primary level of `inter-domain' transfer learning is used by exporting weights from a VGG16 architecture pre-trained on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region-based influence modelling, a secondary level of `intra-domain' transfer learning is used for rapid training of deep learning models for image segments. Finally, stacked-generalization-based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding benchmarks set by existing algorithms.

24 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: Results of the experiments show that the proposed strategy, involving a considerably smaller network architecture, can produce document classification accuracies comparable to the state-of-the-art architectures, making it more suitable for use in comparatively low-configuration mobile devices.
Abstract: This article presents our recent study of a lightweight Deep Convolutional Neural Network (DCNN) architecture for document image classification. Here, we concentrate on training a committee of generalized, compact and powerful base DCNNs. A support vector machine (SVM) is used to combine the outputs of the individual DCNNs. The main novelty of the present study is the introduction of supervised layerwise training of the DCNN architecture in document classification tasks for better initialization of the weights of individual DCNNs. Each DCNN of the committee is trained on a specific part of the document or on the whole document. We also use the principle of generalized stacking for combining the normalized outputs of all the members of the DCNN committee. The proposed document classification strategy has been tested on the well-known Tobacco3482 document image dataset. Results of our experiments show that the proposed strategy, involving a considerably smaller network architecture, can produce document classification accuracies comparable to the state-of-the-art architectures, making it more suitable for use in comparatively low-configuration mobile devices.
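The committee-plus-SVM combination described above is an instance of stacked generalization. A toy sketch, with the base DCNNs replaced by simulated softmax outputs (all numbers here are made up for illustration): the normalized class probabilities of each committee member are concatenated into one feature vector per document and an SVM meta-classifier is trained on top.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_classes, n_members = 200, 3, 4
y = rng.integers(0, n_classes, n_samples)

def member_probs():
    """Simulate one committee member's softmax outputs, weakly
    correlated with the true labels (a stand-in for a trained DCNN)."""
    logits = rng.normal(size=(n_samples, n_classes))
    logits[np.arange(n_samples), y] += 2.0  # better than chance
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Stacked generalization: concatenate all members' normalized outputs
# and train an SVM meta-classifier on them.
stacked = np.hstack([member_probs() for _ in range(n_members)])
svm = SVC(kernel="rbf").fit(stacked[:150], y[:150])
acc = svm.score(stacked[150:], y[150:])
```

In the paper's setting the meta-features would come from DCNNs trained on different document regions, so the SVM learns which member to trust for which class.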

14 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: A scalable supervised prediction model based on convolutional regression framework that is particularly suitable for short time series data is proposed and various schemes to model social influence for health behavior change are proposed.
Abstract: Understanding the propagation of human health behaviors, such as smoking and obesity, and identifying the factors that control such phenomena has become an important area of research in recent years, mainly because in industrialized countries a substantial proportion of mortality and reduced quality of life is due to particular behavior patterns, and these behavior patterns are modifiable. Predicting which individuals will become overweight or obese in the future, as overweight and obesity propagate over a dynamic human interaction network, is an important problem in this area. However, the problem has received limited attention from the network analysis and machine learning perspective to date. In this work, we propose a scalable supervised prediction model based on a convolutional regression framework that is particularly suitable for short time series data. We propose various schemes to model social influence for health behavior change. Further, we study the contribution of the primary factors of overweight and obesity, such as unhealthy diet, recent weight gain and inactivity, to the prediction task. A thorough experiment shows the superiority of the proposed method over the state-of-the-art.
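A convolutional regressor for short time series can be sketched as follows. This is an illustrative PyTorch stand-in, not the paper's actual model: the series length, channel counts and single regression target are all assumptions, but the shape of the idea — 1-D convolutions over a short per-individual history feeding a regression head — is the one the abstract describes.

```python
import torch
import torch.nn as nn

# 1-D convolutional regression over a short behavioral time series
# (e.g. a few yearly weight/BMI observations per individual).
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),  # pool over time: robust to short series
    nn.Flatten(),
    nn.Linear(8, 1),          # regression output, e.g. future BMI
)

# A batch of 4 individuals, 1 channel, 6 time steps each.
pred = model(torch.zeros(4, 1, 6))
```

Because the convolutions share weights across time steps and the pooling collapses whatever length remains, the same network handles the very short histories that defeat most sequence models.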

5 citations


Cited by
Proceedings ArticleDOI
23 Aug 2020
TL;DR: The LayoutLM is proposed to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
Abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
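The way LayoutLM injects layout into a text model can be sketched conceptually: each token embedding is summed with embeddings of its bounding-box coordinates (x0, y0, x1, y1). The sketch below is a pure-PyTorch illustration with made-up dimensions, not the released LayoutLM code; coordinates are assumed to be normalized to a 0–1000 grid as in the paper.

```python
import torch
import torch.nn as nn

class LayoutEmbedding(nn.Module):
    """Token embedding + 2-D position embeddings from the word's
    bounding box, summed as in LayoutLM's input representation."""
    def __init__(self, vocab=1000, coord_buckets=1001, dim=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.x = nn.Embedding(coord_buckets, dim)  # shared for x0 and x1
        self.y = nn.Embedding(coord_buckets, dim)  # shared for y0 and y1

    def forward(self, ids, bbox):  # bbox: (B, T, 4), values in [0, 1000]
        return (self.tok(ids)
                + self.x(bbox[..., 0]) + self.y(bbox[..., 1])
                + self.x(bbox[..., 2]) + self.y(bbox[..., 3]))

emb = LayoutEmbedding()
ids = torch.randint(0, 1000, (2, 5))
bbox = torch.randint(0, 1001, (2, 5, 4))
out = emb(ids, bbox)
```

The summed embeddings then feed a standard Transformer encoder, which is what lets pre-training see text and layout jointly.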

388 citations

Journal Article
TL;DR: This paper surveys current topics in document image understanding from a technical point of view, covering the methods and approaches proposed for the recognition of various kinds of documents.
Abstract: The subject of document image understanding is to extract and classify individual data meaningfully from paper-based documents. To date, many methods and approaches have been proposed for the recognition of various kinds of documents, for various technical problems in extending OCR, and for the requirements of practical usage. Although the technical research issues of the early stage can be seen as complementary attacks on the problems of traditional OCR, which depends on character recognition techniques, the application ranges and related issues are being widely investigated and should be established progressively. This paper addresses current topics in document image understanding from a technical point of view, as a survey.
Keywords: document model, top-down, bottom-up, layout structure, logical structure, document types, layout recognition

222 citations

Book ChapterDOI
09 Sep 2019
TL;DR: A saliency-based fully-convolutional neural network performing multi-scale reasoning on visual cues followed by a fully-connected conditional random field (CRF) for localizing tables and charts in digital/digitized documents is proposed.
Abstract: Within the realm of information extraction from documents, detection of tables and charts is particularly needed as they contain a visual summary of the most valuable information contained in a document. For a complete automation of the visual information extraction process from tables and charts, it is necessary to develop techniques that localize them and identify precisely their boundaries. In this paper we aim at solving the table/chart detection task through an approach that combines deep convolutional neural networks, graphical models and saliency concepts. In particular, we propose a saliency-based fully-convolutional neural network performing multi-scale reasoning on visual cues followed by a fully-connected conditional random field (CRF) for localizing tables and charts in digital/digitized documents. Performance analysis, carried out on an extended version of the ICDAR 2013 (with annotated charts as well as tables) dataset, shows that our approach yields promising results, outperforming existing models.

100 citations

Journal ArticleDOI
TL;DR: In the present work, a non-explicit feature based approach, more specifically a multi-column multi-scale convolutional neural network (MMCNN) based architecture, has been proposed for this purpose, and a deep quad-tree based staggered prediction model has been proposed for faster character recognition.

88 citations

Book ChapterDOI
16 Sep 2019
TL;DR: A multimodal neural network is designed that learns from word embeddings, computed on text extracted by OCR, and from the image, boosting pure-image accuracy by 3% on Tobacco3482 and on RVL-CDIP augmented by the new QS-OCR text dataset, even without clean text information.
Abstract: Classification of document images is a critical step for accelerating archival of old manuscripts, online subscription and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based on their visual appearance. However, the fine-grained classification that is required in real-world settings cannot be achieved by visual analysis alone. Often, the relevant information is in the actual text content of the document, although this text is not available in digital form. In this work, we introduce a novel pipeline based on off-the-shelf architectures to deal with document classification by taking into account both text and visual information. We design a multimodal neural network that is able to learn both from the image and from word embeddings, computed on noisy text extracted by OCR. We show that this approach allows us to improve single-modality classification accuracy by several points on the small Tobacco3482 and large RVL-CDIP datasets, even without clean text information. We release a post-OCR text classification dataset (https://github.com/Quicksign/ocrized-text-dataset) that complements the Tobacco3482 and RVL-CDIP ones, to encourage researchers to look into multi-modal text/image classification.
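The fusion idea in this abstract — an image branch and a text branch joined before the classifier — can be sketched as follows. This is an illustrative PyTorch sketch with invented dimensions, not the authors' pipeline: the averaged word embeddings stand in for whatever text encoder is used on the noisy OCR output, and the small CNN stands in for the image backbone.

```python
import torch
import torch.nn as nn

class MultimodalDoc(nn.Module):
    """Late fusion of an image branch and an OCR-text branch
    (mean of word embeddings) before a shared classifier head."""
    def __init__(self, vocab=1000, emb=64, img_feat=128, num_classes=10):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, emb)  # default mode: mean
        self.image = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 16, img_feat), nn.ReLU(),
        )
        self.head = nn.Linear(emb + img_feat, num_classes)

    def forward(self, image, token_ids):
        # Concatenate visual and textual features, then classify.
        fused = torch.cat([self.image(image), self.embed(token_ids)], dim=1)
        return self.head(fused)

model = MultimodalDoc()
image = torch.zeros(2, 1, 32, 32)            # batch of 2 page crops
tokens = torch.randint(0, 1000, (2, 20))     # 20 OCR token ids per page
logits = model(image, tokens)
```

Because the text branch only sees token ids, garbled OCR degrades it gracefully rather than breaking the pipeline, which is consistent with the paper's "even without clean text" finding.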

77 citations