scispace - formally typeset
Author

Imran Malik

Bio: Imran Malik is an academic researcher from the University of the Sciences. The author has contributed to research in the topics Deep learning & Table (database), has an h-index of 1, and has co-authored 2 publications receiving 113 citations.

Papers
Proceedings ArticleDOI
01 Nov 2017
TL;DR: The proposed method works with high precision on document images with varying layouts, including documents, research papers, and magazines, and beats Tesseract's state-of-the-art table detection system by a significant margin.
Abstract: Table detection is a crucial step in many document analysis applications, as tables present essential information to the reader in a structured manner. It is a hard problem due to the varying layouts and encodings of tables. Researchers have proposed numerous techniques for table detection based on layout analysis of documents. Most of these techniques fail to generalize because they rely on hand-engineered features that are not robust to layout variations. In this paper, we present a deep learning-based method for table detection. In the proposed method, document images are first pre-processed. These images are then fed to a Region Proposal Network followed by a fully connected neural network for table detection. The proposed method works with high precision on document images with varying layouts, including documents, research papers, and magazines. We have evaluated it on the publicly available UNLV dataset, where it beats Tesseract's state-of-the-art table detection system by a significant margin.
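A common way to score a table detector like the one described above (a sketch of the general evaluation idea, not necessarily the paper's exact protocol) is to count a predicted box as correct when its intersection-over-union (IoU) with some ground-truth table exceeds a threshold, then report precision over all detections:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision(detections, ground_truth, threshold=0.5):
    """Fraction of detected boxes that match some ground-truth table."""
    if not detections:
        return 0.0
    hits = sum(
        any(iou(d, g) >= threshold for g in ground_truth) for d in detections
    )
    return hits / len(detections)

# One correct detection and one false positive -> precision 0.5
dets = [(10, 10, 110, 60), (300, 300, 350, 330)]
gts = [(12, 8, 108, 62)]
print(precision(dets, gts))  # 0.5
```

The 0.5 IoU threshold is a conventional default; stricter thresholds penalize loose table boundaries more heavily.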

159 citations

Book ChapterDOI
16 Jul 2020
TL;DR: Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are used to extract the percentage of attentiveness and non-attentiveness of students based on the student emotions in the classroom using deep learning techniques along with computer vision.
Abstract: This research aims to investigate workers in an industrial environment and can be used as an alternative for monitoring operator attention in real time. Detecting the attentiveness and non-attentiveness of people working in an industry could help to identify the weaknesses and strengths of any industrial organization. The human factor is the main and most critical part of any industrial organization. As a special case, we have established how to detect student attention in the classroom using deep learning techniques along with computer vision. A Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are used to extract the percentage of attentiveness and non-attentiveness of students based on student emotions in the classroom. We used the FER-2013 dataset for this paper. As per the study, humans have a finite number of emotions, so it is straightforward to place some emotions in an attentive (Happy, Anger, Surprised, and Neutral) domain and some in a non-attentive (Sad, Fear, and Disgust) domain. This helps the teacher easily evaluate class attentiveness. It is also an evaluation of the teacher's teaching methodology: if the students are engaged in the lecture, the methodology is working, and if most of the students are not engaged, the teacher needs to revise the methodology in order to engage the class during the lecture.
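The emotion-to-attentiveness grouping described above reduces to a simple mapping once per-student emotion labels are available (here as plain strings; in the paper they would come from the CNN/LSTM classifier). A minimal sketch, using the attentive/non-attentive split stated in the abstract:

```python
# Grouping follows the abstract's split of FER-2013 emotion labels.
ATTENTIVE = {"happy", "anger", "surprised", "neutral"}
NON_ATTENTIVE = {"sad", "fear", "disgust"}

def attentiveness_percentage(emotions):
    """Percentage of emotion labels that fall in the attentive set."""
    if not emotions:
        return 0.0
    attentive = sum(1 for e in emotions if e.lower() in ATTENTIVE)
    return 100.0 * attentive / len(emotions)

classroom = ["happy", "neutral", "sad", "surprised", "fear", "anger"]
print(round(attentiveness_percentage(classroom), 1))  # 66.7
```

With four of six labels in the attentive set, the class would be reported as roughly 66.7% attentive for that frame; averaging over a lecture gives the per-session figure a teacher could act on.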

1 citation


Cited by
Posted Content
TL;DR: The authors developed the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central, where typical document layout elements are annotated.
Abstract: Recognizing the layout of unstructured digital documents is an important step when parsing the documents into a structured machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images, where typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base model for transfer learning on a different document domain. We release the dataset (this https URL) to support development and evaluation of more advanced models for document layout analysis.
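The automatic-matching idea behind PubLayNet (aligning each XML element's text with the text extracted from the PDF page) can be illustrated with a toy sketch. This is not the authors' pipeline; it only shows the core step of locating an element's text within the page tokens so the covering region can inherit the element's layout label:

```python
from difflib import SequenceMatcher

def match_element(xml_text, page_tokens, min_ratio=0.8):
    """Return the (start, end) token span best matching xml_text, or None."""
    words = xml_text.split()
    n = len(words)
    best_span, best_ratio = None, 0.0
    for i in range(len(page_tokens) - n + 1):
        window = " ".join(page_tokens[i:i + n])
        ratio = SequenceMatcher(None, xml_text, window).ratio()
        if ratio > best_ratio:
            best_span, best_ratio = (i, i + n), ratio
    return best_span if best_ratio >= min_ratio else None

tokens = "Introduction Deep learning has advanced document analysis".split()
print(match_element("Deep learning has advanced", tokens))  # (1, 5)
```

In a real pipeline the matched token span would be mapped to bounding boxes from the PDF extraction, producing an annotation without any manual labeling.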

177 citations

Proceedings ArticleDOI
16 Aug 2019
TL;DR: The PubLayNet dataset for document layout analysis is developed by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central, and it is demonstrated that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles.
Abstract: Recognizing the layout of unstructured digital documents is an important step when parsing the documents into a structured machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images, where typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base model for transfer learning on a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support development and evaluation of more advanced models for document layout analysis.

160 citations

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper proposed an architecture based on graph networks as a better alternative to standard neural networks for table recognition, which combines the benefits of convolutional neural network for visual feature extraction and graph networks for dealing with the problem structure.
Abstract: Document structure analysis, such as zone segmentation and table recognition, is a complex problem in document processing and is an active area of research. The recent success of deep learning in solving various computer vision and machine learning problems has not been reflected in document structure analysis since conventional neural networks are not well suited to the input structure of the problem. In this paper, we propose an architecture based on graph networks as a better alternative to standard neural networks for table recognition. We argue that graph networks are a more natural choice for these problems, and explore two gradient-based graph neural networks. Our proposed architecture combines the benefits of convolutional neural networks for visual feature extraction and graph networks for dealing with the problem structure. We empirically demonstrate that our method outperforms the baseline by a significant margin. In addition, we identify the lack of large scale datasets as a major hindrance for deep learning research for structure analysis and present a new large scale synthetic dataset for the problem of table recognition. Finally, we open-source our implementation of dataset generation and the training framework of our graph networks to promote reproducible research in this direction.
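The hybrid design described above pairs CNN-extracted visual features per vertex (e.g. per word or cell) with message passing over a graph whose edges connect spatial neighbours. A pure-Python toy of one mean-aggregation message-passing step (illustrative only; the paper's gradient-based graph networks are learned, not fixed averaging):

```python
def message_pass(features, edges):
    """One mean-aggregation step: each vertex averages itself and neighbours."""
    n = len(features)
    neighbours = {i: [i] for i in range(n)}  # self-loop keeps own feature
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for j in neighbours[i]:
            for d in range(dim):
                acc[d] += features[j][d]
        out.append([x / len(neighbours[i]) for x in acc])
    return out

# Three vertices in a chain 0-1-2 with 2-d visual features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
print(message_pass(feats, edges))
```

After one step, each vertex's feature mixes in its neighbours', which is what lets structural cues (e.g. row/column alignment encoded in the edges) influence per-cell predictions.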

111 citations

Book ChapterDOI
09 Sep 2019
TL;DR: A saliency-based fully-convolutional neural network performing multi-scale reasoning on visual cues followed by a fully-connected conditional random field (CRF) for localizing tables and charts in digital/digitized documents is proposed.
Abstract: Within the realm of information extraction from documents, detection of tables and charts is particularly needed as they contain a visual summary of the most valuable information contained in a document. For a complete automation of the visual information extraction process from tables and charts, it is necessary to develop techniques that localize them and identify precisely their boundaries. In this paper we aim at solving the table/chart detection task through an approach that combines deep convolutional neural networks, graphical models and saliency concepts. In particular, we propose a saliency-based fully-convolutional neural network performing multi-scale reasoning on visual cues followed by a fully-connected conditional random field (CRF) for localizing tables and charts in digital/digitized documents. Performance analysis, carried out on an extended version of the ICDAR 2013 (with annotated charts as well as tables) dataset, shows that our approach yields promising results, outperforming existing models.
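The localization step in a saliency-based detector like the one above can be pictured as thresholding the saliency map and turning each connected component into a table/chart region proposal. A toy sketch of that box-extraction idea (the CRF refinement stage is omitted, and real saliency maps come from the network, not hand-written grids):

```python
def saliency_to_boxes(saliency, threshold=0.5):
    """Connected components of saliency > threshold -> (x1, y1, x2, y2) boxes."""
    h, w = len(saliency), len(saliency[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if saliency[y][x] > threshold and not seen[y][x]:
                # Flood-fill one component, tracking its bounding box.
                stack, x1, y1, x2, y2 = [(y, x)], x, y, x, y
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    x1, y1 = min(x1, cx), min(y1, cy)
                    x2, y2 = max(x2, cx), max(y2, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                                and saliency[ny][nx] > threshold:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((x1, y1, x2, y2))
    return boxes

sal = [
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.9, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.8],
]
print(saliency_to_boxes(sal))  # [(1, 0, 2, 1), (3, 2, 3, 2)]
```

In the paper's pipeline the fully connected CRF sharpens the saliency map before this kind of component extraction, which is what yields the precise table/chart boundaries the abstract emphasizes.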

100 citations