Author

Arindam Das

Bio: Arindam Das is an academic researcher from Valeo. The author has contributed to research in the topics of computer science and object detection. The author has an h-index of 4 and has co-authored 26 publications receiving 99 citations. Previous affiliations of Arindam Das include HCL Technologies and Xerox.

Papers
Proceedings ArticleDOI
29 Jan 2018
TL;DR: The proposed region-based Deep Convolutional Neural Network framework for document structure learning achieves state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding the benchmarks set by the existing algorithms.
Abstract: In this article, a region-based Deep Convolutional Neural Network framework is presented for document structure learning. The contribution of this work involves efficient training of region based classifiers and effective ensembling for document image classification. A primary level of ‘inter-domain’ transfer learning is used by exporting weights from a pre-trained VGG16 architecture on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region based influence modelling, a secondary level of ‘intra-domain’ transfer learning is used for rapid training of deep learning models for image segments. Finally, a stacked generalization based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves state-of-the-art accuracy of 92.21% on the popular RVL-CDIP document image dataset, exceeding the benchmarks set by the existing algorithms.
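To make the two-level transfer-learning scheme concrete, here is a minimal PyTorch sketch, assuming the 16-class RVL-CDIP setup; the function names and the header-region example are illustrative, not the authors' code:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 16  # RVL-CDIP document categories

def make_document_classifier():
    # Level 1, 'inter-domain' transfer: start from ImageNet-pretrained VGG16.
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    model.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # replace the head
    return model

# Train on whole document images first (training loop omitted)...
holistic_model = make_document_classifier()

def make_region_classifier(trained_holistic):
    # Level 2, 'intra-domain' transfer: seed a region model (e.g. for a
    # header crop) from the document-level weights for rapid training.
    region_model = make_document_classifier()
    region_model.load_state_dict(trained_holistic.state_dict())
    return region_model

header_model = make_region_classifier(holistic_model)
# A stacked-generalization step then combines the base models' outputs.
```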

60 citations

Posted Content
TL;DR: In this paper, a region-based deep convolutional neural network framework is proposed for document structure learning, which involves efficient training of region based classifiers and effective ensembling for document image classification.
Abstract: In this work, a region-based Deep Convolutional Neural Network framework is proposed for document structure learning. The contribution of this work involves efficient training of region based classifiers and effective ensembling for document image classification. A primary level of 'inter-domain' transfer learning is used by exporting weights from a pre-trained VGG16 architecture on the ImageNet dataset to train a document classifier on whole document images. Exploiting the nature of region based influence modelling, a secondary level of 'intra-domain' transfer learning is used for rapid training of deep learning models for image segments. Finally, stacked generalization based ensembling is utilized for combining the predictions of the base deep neural network models. The proposed method achieves state-of-the-art accuracy of 92.2% on the popular RVL-CDIP document image dataset, exceeding benchmarks set by existing algorithms.

24 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: Results of the experiments show that the proposed strategy, involving a considerably smaller network architecture, can produce document classification accuracies comparable with state-of-the-art architectures, making it more suitable for use in comparatively low-configuration mobile devices.
Abstract: This article presents our recent study of a lightweight Deep Convolutional Neural Network (DCNN) architecture for document image classification. Here, we concentrated on training a committee of generalized, compact and powerful base DCNNs. A support vector machine (SVM) is used to combine the outputs of the individual DCNNs. The main novelty of the present study is the introduction of supervised layerwise training of the DCNN architecture in document classification tasks for better initialization of the weights of individual DCNNs. Each DCNN of the committee is trained for a specific part or the whole document. Also, we used the principle of generalized stacking for combining the normalized outputs of all members of the DCNN committee. The proposed document classification strategy has been tested on the well-known Tobacco3482 document image dataset. Results of our experiments show that the proposed strategy, involving a considerably smaller network architecture, can produce document classification accuracies comparable with state-of-the-art architectures, making it more suitable for use in comparatively low-configuration mobile devices.
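The generalized-stacking step, an SVM meta-classifier combining the committee's normalized outputs, might look roughly like the following scikit-learn sketch; the committee size, class count, and random stand-in data are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
K, n_docs, n_classes = 3, 200, 10  # committee size, samples, Tobacco3482 classes

# Held-out softmax outputs of each base DCNN (random stand-ins here).
train_probs = [rng.dirichlet(np.ones(n_classes), size=n_docs) for _ in range(K)]
y_train = rng.integers(0, n_classes, size=n_docs)

# Generalized stacking: concatenate the committee's normalized outputs
# per document and let an SVM meta-classifier learn to combine them.
meta_X = np.hstack(train_probs)               # shape (n_docs, K * n_classes)
meta_clf = SVC(kernel="rbf").fit(meta_X, y_train)

# At test time, build meta-features the same way and call meta_clf.predict.
```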

14 citations

Posted Content
TL;DR: This work proposes a novel method to regress the area of each soiling type within a tile directly and integrates it into an object detection and semantic segmentation multi-task model using an asynchronous back-propagation algorithm.
Abstract: Automotive cameras, particularly surround-view cameras, tend to get soiled by mud, water, snow, etc. For higher levels of autonomous driving, it is necessary to have a soiling detection algorithm which will trigger an automatic cleaning system. Localized detection of soiling in an image is necessary to control the cleaning system. It is also necessary to enable partial functionality in unsoiled areas while reducing confidence in soiled areas. Although this can be solved using a semantic segmentation task, we explore a more efficient solution targeting deployment on low-power embedded systems. We propose a novel method to regress the area of each soiling type within a tile directly. We refer to this as coverage. The proposed approach is better than learning the dominant class in a tile, as multiple soiling types commonly occur within a tile. It also has the advantage of dealing with coarse polygon annotation, which would otherwise introduce label noise into a segmentation task. The proposed soiling coverage decoder is an order of magnitude faster than an equivalent segmentation decoder. We also integrated it into an object detection and semantic segmentation multi-task model using an asynchronous back-propagation algorithm. A portion of the dataset used will be released publicly as part of our WoodScape dataset to encourage further research.
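As a rough illustration of the coverage idea, the following PyTorch sketch regresses per-tile area fractions for each soiling type; the channel counts and the four-class set are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class CoverageDecoder(nn.Module):
    """Per-tile soiling coverage head: regresses, for every tile cell,
    the fraction of its area covered by each soiling type."""
    def __init__(self, in_channels=256, num_soiling_types=4):
        super().__init__()
        # e.g. {clean, transparent, semi-transparent, opaque} (assumed)
        self.head = nn.Conv2d(in_channels, num_soiling_types, kernel_size=1)

    def forward(self, feats):
        # feats: (B, C, H_tiles, W_tiles), one spatial cell per image tile.
        logits = self.head(feats)
        # Softmax over soiling types so the per-tile fractions sum to 1.
        return torch.softmax(logits, dim=1)

decoder = CoverageDecoder()
feats = torch.randn(1, 256, 8, 16)   # e.g. an 8x16 tile grid
coverage = decoder(feats)            # (1, 4, 8, 16) fractions per tile
# Train with e.g. MSE against annotated per-tile coverage targets.
```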

11 citations

Posted Content
TL;DR: In this paper, the authors proposed an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images, which consists of two distinct deformable ResNeXt-50 encoders for feature extraction from the two modalities.
Abstract: Pedestrian detection is the most critical module of an autonomous driving system. Although a camera is commonly used for this purpose, its quality degrades severely in low-light night-time driving scenarios. On the other hand, the quality of a thermal camera image remains unaffected in similar conditions. This paper proposes an end-to-end multimodal fusion model for pedestrian detection using RGB and thermal images. Its novel spatio-contextual deep network architecture is capable of exploiting the multimodal input efficiently. It consists of two distinct deformable ResNeXt-50 encoders for feature extraction from the two modalities. Fusion of these two encoded features takes place inside a multimodal feature embedding module (MuFEm) consisting of several groups of a pair of Graph Attention Network and a feature fusion unit. The output of the last feature fusion unit of MuFEm is subsequently passed to two CRFs for spatial refinement. Further enhancement of the features is achieved by applying channel-wise attention and extraction of contextual information with the help of four RNNs traversing in four different directions. Finally, these feature maps are used by a single-stage decoder to generate the bounding box of each pedestrian and the score map. We have performed extensive experiments with the proposed framework on three publicly available multimodal pedestrian detection benchmark datasets, namely KAIST, CVC-14, and UTokyo. The results on each of them improved upon the respective state-of-the-art performance. A short video giving an overview of this work along with its qualitative results can be seen at this https URL.
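A heavily simplified two-stream skeleton conveys the overall shape of such a fusion model; the actual MuFEm uses graph attention, CRFs, and directional RNNs, all omitted here, and fusion by a 1x1 convolution is a stand-in assumption:

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoStreamFusion(nn.Module):
    """Two ResNeXt-50 encoders (RGB and thermal) with naive mid-fusion."""
    def __init__(self):
        super().__init__()
        def encoder():
            m = models.resnext50_32x4d(weights=None)
            return nn.Sequential(*list(m.children())[:-2])  # drop pool/fc
        self.rgb_enc, self.thermal_enc = encoder(), encoder()
        # Stand-in for the paper's MuFEm: concatenate, then 1x1 conv.
        self.fuse = nn.Conv2d(2048 * 2, 2048, kernel_size=1)

    def forward(self, rgb, thermal):
        f = torch.cat([self.rgb_enc(rgb), self.thermal_enc(thermal)], dim=1)
        return self.fuse(f)  # fused map for a single-stage detection decoder

model = TwoStreamFusion()
fused = model(torch.randn(1, 3, 512, 640), torch.randn(1, 3, 512, 640))
```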

10 citations


Cited by
Proceedings ArticleDOI
23 Aug 2020
TL;DR: The LayoutLM is proposed to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
Abstract: Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42). The code and pre-trained LayoutLM models are publicly available at https://aka.ms/layoutlm.
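For readers who want to try LayoutLM, a minimal document-classification sketch using the Hugging Face transformers port is shown below; the toy words, boxes, and 16-label setup are illustrative, and the official release linked above contains the authors' full pipeline:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMForSequenceClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForSequenceClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=16)  # e.g. RVL-CDIP labels

# Toy OCR output: words with normalized [x0, y0, x1, y1] boxes on a 0-1000 grid.
words = ["Invoice", "Total", "42.00"]
boxes = [[60, 50, 200, 80], [60, 700, 150, 730], [400, 700, 520, 730]]

input_ids, token_boxes = [tokenizer.cls_token_id], [[0, 0, 0, 0]]
for word, box in zip(words, boxes):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    input_ids += ids
    token_boxes += [box] * len(ids)  # each sub-token inherits its word's box
input_ids.append(tokenizer.sep_token_id)
token_boxes.append([1000, 1000, 1000, 1000])

batch = {
    "input_ids": torch.tensor([input_ids]),
    "bbox": torch.tensor([token_boxes]),
    "attention_mask": torch.ones(1, len(input_ids), dtype=torch.long),
}
logits = model(**batch).logits  # (1, 16) document-class scores
```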

388 citations

Book ChapterDOI
09 Sep 2019
TL;DR: A saliency-based fully-convolutional neural network performing multi-scale reasoning on visual cues followed by a fully-connected conditional random field (CRF) for localizing tables and charts in digital/digitized documents is proposed.
Abstract: Within the realm of information extraction from documents, detection of tables and charts is particularly needed as they contain a visual summary of the most valuable information contained in a document. For a complete automation of the visual information extraction process from tables and charts, it is necessary to develop techniques that localize them and identify precisely their boundaries. In this paper we aim at solving the table/chart detection task through an approach that combines deep convolutional neural networks, graphical models and saliency concepts. In particular, we propose a saliency-based fully-convolutional neural network performing multi-scale reasoning on visual cues followed by a fully-connected conditional random field (CRF) for localizing tables and charts in digital/digitized documents. Performance analysis, carried out on an extended version of the ICDAR 2013 dataset (with annotated charts as well as tables), shows that our approach yields promising results, outperforming existing models.
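The general pattern of multi-scale saliency prediction can be sketched as follows in PyTorch; the backbone taps and averaging fusion are assumptions, and the paper's fully-connected CRF refinement is only indicated in a comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MultiScaleSaliency(nn.Module):
    """Predicts per-pixel saliency maps (e.g. table vs. chart) from
    features tapped at three depths of a VGG16 backbone."""
    def __init__(self, num_maps=2):
        super().__init__()
        vgg = models.vgg16(weights=None).features
        self.stages = nn.ModuleList([vgg[:16], vgg[16:23], vgg[23:30]])
        self.heads = nn.ModuleList(
            nn.Conv2d(c, num_maps, kernel_size=1) for c in (256, 512, 512))

    def forward(self, x):
        h, w = x.shape[-2:]
        maps = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)  # reuse x so each stage deepens the features
            maps.append(F.interpolate(head(x), size=(h, w),
                                      mode="bilinear", align_corners=False))
        # Average the per-scale cues; a dense CRF would refine this output.
        return torch.sigmoid(torch.stack(maps).mean(0))

net = MultiScaleSaliency()
sal = net(torch.randn(1, 3, 384, 384))  # (1, 2, 384, 384) saliency maps
```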

100 citations

Book ChapterDOI
16 Sep 2019
TL;DR: A multimodal neural network is designed that learns both from word embeddings, computed on text extracted by OCR, and from the image; it boosts pure-image accuracy by 3% on Tobacco3482 and on RVL-CDIP augmented by the new QS-OCR text dataset, even without clean text information.
Abstract: Classification of document images is a critical step for accelerating archival of old manuscripts, online subscription and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based on their visual appearance. However, the fine-grained classification that is required in real-world settings cannot be achieved by visual analysis alone. Often, the relevant information is in the actual text content of the document, although this text is not available in digital form. In this work, we introduce a novel pipeline based on off-the-shelf architectures to deal with document classification by taking into account both text and visual information. We design a multimodal neural network that is able to learn both from the image and from word embeddings computed on noisy text extracted by OCR. We show that this approach allows us to improve single-modality classification accuracy by several points on the small Tobacco3482 and large RVL-CDIP datasets, even without clean text information. We release a post-OCR text dataset (https://github.com/Quicksign/ocrized-text-dataset) that complements the Tobacco3482 and RVL-CDIP ones to encourage researchers to look into multi-modal text/image classification.
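A minimal sketch of this kind of text/image fusion, assuming a small image CNN and mean-pooled word embeddings; the paper's exact backbones and embedding choices may differ:

```python
import torch
import torch.nn as nn
from torchvision import models

class TextImageClassifier(nn.Module):
    """Late fusion: concatenate an image embedding with a mean-pooled
    embedding of the (noisy) OCR tokens, then classify."""
    def __init__(self, vocab_size=30000, emb_dim=300, num_classes=16):
        super().__init__()
        cnn = models.mobilenet_v2(weights=None)
        cnn.classifier = nn.Identity()            # 1280-d image feature
        self.image_enc = cnn
        self.word_emb = nn.EmbeddingBag(vocab_size, emb_dim)  # mean pooling
        self.head = nn.Linear(1280 + emb_dim, num_classes)

    def forward(self, image, token_ids):
        img_f = self.image_enc(image)
        txt_f = self.word_emb(token_ids)          # averaged OCR token embeddings
        return self.head(torch.cat([img_f, txt_f], dim=1))

clf = TextImageClassifier()
logits = clf(torch.randn(2, 3, 224, 224), torch.randint(0, 30000, (2, 50)))
```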

77 citations

Journal ArticleDOI
TL;DR: Comparisons to many state-of-the-art epileptic classification methods are provided to show the superiority of the proposed SCNN+AWF algorithm.
Abstract: Scalp electroencephalogram (EEG)-based epileptic seizure/nonseizure detection has been comprehensively studied, and fruitful achievements have been reported in the past. Yet, little attention has been paid to preictal stage detection, which is practically more crucial because it allows epileptics to take precautions before seizure onset. In this article, a novel epileptic preictal state classification and seizure detection algorithm based on deep features learned by stacked convolutional neural networks (SCNNs) is developed. The mean amplitude of sub-band spectrum map (MAS), obtained from the average sub-band spectra of multichannel EEGs, is adopted for representation. Probability feature vectors are extracted in the softmax layer of the stacked convolutional neural networks (CNNs), where an adaptive and discriminative feature weighting fusion (AWF) is developed for performance enhancement. Following the deep extraction layer, an effective kernel extreme learning machine (KELM) is adopted for feature learning and epileptic classification. Experiments on the benchmark CHB-MIT database and a real recorded epileptic database are conducted for performance demonstration. Comparisons to many state-of-the-art epileptic classification methods are provided to show the superiority of the proposed SCNN+AWF algorithm.
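The final KELM stage admits a compact closed-form sketch in NumPy; the RBF width, regularization constant, and toy data are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise squared distances, then the Gaussian (RBF) kernel.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, y, n_classes, C=100.0):
    T = np.eye(n_classes)[y]                  # one-hot targets
    K = rbf_kernel(X, X)
    # Standard KELM closed form: beta = (I/C + K)^(-1) T
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(X_new, X_train, beta):
    return rbf_kernel(X_new, X_train) @ beta  # argmax over columns = class

rng = np.random.default_rng(1)
X, y = rng.normal(size=(60, 8)), rng.integers(0, 2, size=60)  # toy features
beta = kelm_fit(X, y, n_classes=2)
pred = kelm_predict(X[:5], X, beta).argmax(1)
```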

68 citations
