Author

J.B. Srivastava

Bio: J.B. Srivastava is an academic researcher from the Indian Institute of Technology Delhi. The author has contributed to research in the topics of view synthesis and affine transformation, has an h-index of 3, and has co-authored 7 publications receiving 24 citations.

Papers
Proceedings ArticleDOI
07 Apr 2014
TL;DR: A novel learning-based framework extracts articles from newspaper images using a Fixed-Point Model; the model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping that assigns a unique label to every block.
Abstract: This paper presents a novel learning-based framework to extract articles from newspaper images using a Fixed-Point Model. The input to the system comprises blocks of text and graphics, obtained using standard image processing techniques. The fixed-point model uses contextual information and features of each block to learn the layout of newspaper images and attains a contraction mapping to assign a unique label to every block. We use a hierarchical model which works in two stages. In the first stage, a semantic label (heading, sub-heading, text-block, image, or caption) is assigned to each segmented block. The labels are then used as input to the next stage to group the related blocks into news articles. Experimental results show the applicability of our algorithm in newspaper labeling and article extraction.
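
The first-stage block labelling can be pictured as an iterative procedure that repeatedly reclassifies each block from its own features plus a summary of its neighbours' current labels, until the labelling stops changing. The sketch below is a minimal illustration of that fixed-point iteration, assuming a pre-trained per-block classifier `clf`, per-block feature vectors, and a neighbour graph; the names and feature choices are illustrative, not the authors' implementation.

```python
# Minimal sketch of a fixed-point labelling loop for page blocks.
# Assumes: `blocks` is a list of per-block feature vectors, `neighbors[i]`
# lists the indices of blocks adjacent to block i, and `clf` is a
# pre-trained classifier whose input is the block's own features
# concatenated with a histogram of its neighbours' current labels.
import numpy as np

LABELS = ["heading", "sub-heading", "text", "image", "caption"]

def context_histogram(labels, neighbor_ids):
    """Histogram of the current labels of a block's neighbours."""
    hist = np.zeros(len(LABELS))
    for j in neighbor_ids:
        hist[LABELS.index(labels[j])] += 1
    return hist / max(len(neighbor_ids), 1)

def fixed_point_labeling(blocks, neighbors, clf, max_iters=20):
    labels = ["text"] * len(blocks)           # arbitrary initial labelling
    for _ in range(max_iters):
        new_labels = []
        for i, feat in enumerate(blocks):
            ctx = context_histogram(labels, neighbors[i])
            x = np.concatenate([feat, ctx]).reshape(1, -1)
            new_labels.append(clf.predict(x)[0])
        if new_labels == labels:              # reached the fixed point
            break
        labels = new_labels
    return labels
```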

17 citations

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A classifier unifying local-feature-based representation and subspace-based learning is presented; the system supports hierarchical classification by merging kernel eigenspaces (KES) in the feature space, demonstrated on a dataset of videos collected over the internet.
Abstract: We present a classifier unifying local-feature-based representation and subspace-based learning. We also propose a novel method to merge kernel eigenspaces (KES) in feature space. Subspace methods have traditionally been used with the full appearance of the image. Recently, the local-feature-based bag-of-features (BoF) representation has performed impressively on classification tasks. We use KES with BoF vectors to construct class-specific subspaces and use the distance of a query vector from the database KESs as the classification criterion. The use of local features makes our approach invariant to illumination, rotation, scale, small affine transformations and partial occlusions. The system allows hierarchy by merging the KES in the feature space. The classifier performs competitively on the challenging Caltech-101 dataset under normal and simulated occlusion conditions. We demonstrate hierarchical classification on a dataset of videos collected over the internet.
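
As a rough illustration of the classification criterion, the sketch below builds a class-specific kernel eigenspace for each category from BoF histograms and assigns a query to the class whose eigenspace reconstructs it best, using scikit-learn's KernelPCA as a stand-in for the paper's KES construction and merging. It is a sketch under those assumptions, not the authors' code.

```python
# Class-specific kernel eigenspace (KES) classification over bag-of-features
# histograms. The input-space reconstruction error acts as the distance of
# a query to each class subspace.
import numpy as np
from sklearn.decomposition import KernelPCA

def fit_class_subspaces(bof_by_class, n_components=20):
    """bof_by_class: dict mapping class name -> (n_samples, n_bins) array."""
    subspaces = {}
    for cls, X in bof_by_class.items():
        kpca = KernelPCA(n_components=n_components, kernel="rbf",
                         fit_inverse_transform=True)
        kpca.fit(X)
        subspaces[cls] = kpca
    return subspaces

def classify(query_bof, subspaces):
    """Assign the class whose KES reconstructs the query best."""
    q = query_bof.reshape(1, -1)
    errors = {cls: np.linalg.norm(q - k.inverse_transform(k.transform(q)))
              for cls, k in subspaces.items()}
    return min(errors, key=errors.get)
```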

8 citations

Journal ArticleDOI
TL;DR: A method for synthesis of views corresponding to translational motion of the camera; the method can handle occlusions and changes in visibility in the synthesized views, and gives a characterisation of the viewpoints for which views can be synthesized.

3 citations

Journal ArticleDOI
01 Feb 2007
TL;DR: Two techniques are proposed for novel view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras under the assumption of availability of the correspondence of three vanishing points.
Abstract: We have attempted the problem of novel view synthesis of scenes containing man-made objects from images taken by arbitrary, uncalibrated cameras. Under the assumption of availability of the correspondence of three vanishing points, in general position, we propose two techniques. The first is a transfer-based scheme which synthesizes new views with only a translation of the virtual camera and computes z-buffer values for handling occlusions in synthesized views. The second is a reconstruction-based scheme which synthesizes arbitrary new views in which the camera can undergo rotation as well as translation. We present experimental results to establish the validity of both formulations.
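
To make the transfer-based idea concrete, the sketch below shows generic forward warping with a z-buffer: pixels with known depth are re-projected into a camera translated by `t`, and the nearest surface wins at each target pixel. It assumes calibrated intrinsics and a per-pixel depth map, so it only illustrates the z-buffer occlusion handling, not the paper's uncalibrated, vanishing-point-based formulation.

```python
# Forward-warp a view with known per-pixel depth into a virtual camera
# translated by t, resolving occlusions with a z-buffer. K is the 3x3
# intrinsics matrix; depth is an (h, w) array. Purely illustrative.
import numpy as np

def translate_view(image, depth, K, t):
    h, w = depth.shape
    K_inv = np.linalg.inv(K)
    out = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)
    ys, xs = np.mgrid[0:h, 0:w]
    ys_flat, xs_flat = ys.ravel(), xs.ravel()
    pix = np.stack([xs_flat, ys_flat, np.ones(h * w)])     # 3 x N homogeneous pixels
    pts = (K_inv @ pix) * depth.ravel()                    # back-project to 3D
    pts_new = pts - np.asarray(t, float).reshape(3, 1)     # express in translated camera
    proj = K @ pts_new
    z = proj[2]
    for i in range(h * w):
        if z[i] <= 0:                                      # behind the virtual camera
            continue
        ui = int(round(proj[0, i] / z[i]))
        vi = int(round(proj[1, i] / z[i]))
        if 0 <= ui < w and 0 <= vi < h and z[i] < zbuf[vi, ui]:
            zbuf[vi, ui] = z[i]                            # nearest surface wins
            out[vi, ui] = image[ys_flat[i], xs_flat[i]]
    return out
```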

1 citation

Proceedings ArticleDOI
16 Dec 2008
TL;DR: A novel framework for object detection and localization in images containing appreciable clutter and occlusions is proposed and a method similar to the recently proposed spatial scan statistic is used to refine the object localization estimates obtained from the sampling process.
Abstract: We propose a novel framework for object detection and localization in images containing appreciable clutter and occlusions. The problem is cast in a statistical hypothesis testing framework. The image under test is converted into a set of local features using affine invariant local region detectors, described using the popular SIFT descriptor. Due to clutter and occlusions, this set is expected to contain features which do not belong to the object. We sample subsets of local features from this set and test for the alternate hypothesis of object present against the null hypothesis of object absent. Further, we use a method similar to the recently proposed spatial scan statistic to refine the object localization estimates obtained from the sampling process. We demonstrate the results of our method on the two datasets TUD Motorbikes and TUD Cars. TUD Cars database has background clutter. TUD Motorbikes dataset is recognized to have substantial variation in terms of scale, background, illumination, viewpoint and occlusions.
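
A toy version of the subset-sampling test might look like the sketch below: groups of local descriptors are sampled, each group is scored against an object codebook, and the best score is compared with a null distribution estimated from background images. The scoring function, codebook, and threshold procedure are illustrative stand-ins, not the paper's exact statistic or the spatial scan refinement.

```python
# Subset-sampling hypothesis test for object presence (toy version).
# H0: object absent; H1: object present.
import numpy as np

rng = np.random.default_rng(0)

def match_score(descriptors, codebook):
    """Mean similarity of each descriptor to its nearest codebook entry."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return float(np.mean(-d.min(axis=1)))

def detect(descriptors, codebook, null_scores, n_samples=200, subset=10, alpha=0.05):
    """Reject H0 if the best sampled subset's score exceeds the
    (1 - alpha) quantile of scores from background (null) images."""
    best = -np.inf
    for _ in range(n_samples):
        idx = rng.choice(len(descriptors),
                         size=min(subset, len(descriptors)), replace=False)
        best = max(best, match_score(descriptors[idx], codebook))
    return best > np.quantile(null_scores, 1 - alpha)
```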

Cited by
Journal ArticleDOI
TL;DR: An algorithm is proposed that is able to accurately remove data from a kernel eigenspace without performing a batch recomputation and an adaptive version determines an appropriately sized sliding window of data and when a model update is necessary.
Abstract: Kernel principal component analysis and the reconstruction error is an effective anomaly detection technique for non-linear data sets. In an environment where a phenomenon is generating data that is non-stationary, anomaly detection requires a recomputation of the kernel eigenspace in order to represent the current data distribution. Recomputation is a computationally complex operation and reducing computational complexity is therefore a key challenge. In this paper, we propose an algorithm that is able to accurately remove data from a kernel eigenspace without performing a batch recomputation. Coupled with a kernel eigenspace update, we demonstrate that our technique is able to remove and add data to a kernel eigenspace more accurately than existing techniques. An adaptive version determines an appropriately sized sliding window of data and when a model update is necessary. Experimental evaluations on both synthetic and real-world data sets demonstrate the superior performance of the proposed approach in comparison to alternative incremental KPCA approaches and alternative anomaly detection techniques.
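
For orientation, the sketch below shows the baseline that the incremental method accelerates: a KPCA reconstruction-error anomaly score in which the eigenspace is naively refit on every sliding window (the batch recomputation the paper avoids). Window size, kernel, and threshold are illustrative; the paper's downdating and update steps are not shown.

```python
# Baseline KPCA reconstruction-error anomaly scoring over a sliding window,
# refitting the eigenspace at every step (the costly batch recomputation).
import numpy as np
from sklearn.decomposition import KernelPCA

def reconstruction_error(model, x):
    x = x.reshape(1, -1)
    return float(np.linalg.norm(x - model.inverse_transform(model.transform(x))))

def sliding_window_scores(stream, window=200, n_components=10):
    """Yield an anomaly score for each new point, using only past data."""
    buf = []
    for x in stream:
        if len(buf) >= window:
            model = KernelPCA(n_components=n_components, kernel="rbf",
                              fit_inverse_transform=True)
            model.fit(np.array(buf[-window:]))
            yield reconstruction_error(model, np.asarray(x))
        else:
            yield 0.0                        # not enough history yet
        buf.append(x)
```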

24 citations

Proceedings ArticleDOI
14 Dec 2014
TL;DR: A novel learning-based framework identifies tables in scanned document images by casting the task as a structured labeling problem: the model learns the layout of the document and labels its entities as table header, table trailer, table cell or non-table region.
Abstract: The paper presents a novel learning-based framework to identify tables from scanned document images. The approach is designed as a structured labeling problem, which learns the layout of the document and labels its various entities as table header, table trailer, table cell and non-table region. We develop features which encode the foreground block characteristics and the contextual information. These features are provided to a fixed-point model which learns the inter-relationships between the blocks. The fixed-point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed-point model captures the context information in terms of the neighbourhood layout more efficiently. Experiments on images from the UW-III (University of Washington) dataset, the UNLV dataset, and our own dataset of document images with multi-column page layouts show the applicability of our algorithm in layout analysis and table detection.
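
A minimal example of the kind of per-block features such a model might consume is sketched below: simple foreground statistics plus page-relative geometry, to be concatenated with a context vector of neighbouring labels and fed to the same fixed-point labelling loop sketched earlier, now over the labels header, trailer, cell and non-table. The feature set is illustrative, not the paper's exact encoding.

```python
# Illustrative per-block features for table detection: foreground
# characteristics plus page-relative geometry.
import numpy as np

TABLE_LABELS = ["table-header", "table-trailer", "table-cell", "non-table"]

def block_features(binary_patch, bbox, page_size):
    """binary_patch: foreground mask of the block; bbox: (x, y, w, h);
    page_size: (page_width, page_height)."""
    x, y, w, h = bbox
    W, H = page_size
    return np.array([
        binary_patch.mean(),        # ink density
        w / W, h / H,               # relative size
        x / W, y / H,               # position on the page
        w / max(h, 1),              # aspect ratio
    ])
```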

23 citations

Proceedings ArticleDOI
17 May 2011
TL;DR: A non-linear χ²-kernel support vector machine is used as the learning classifier, and a bag-of-features representation is used to represent the image features in this paper.
Abstract: Classifying an unknown image into the correct class is the aim of object class recognition systems. Two main points should be kept in mind when implementing such a system: which descriptors with high discriminative power should be extracted from the images, and which classifier can classify these descriptors successfully? The most popular image descriptor is the Scale Invariant Feature Transform (SIFT). Although SIFT performs well, it is only partially illumination invariant. Adding local color information to SIFT descriptors has therefore been suggested to improve illumination invariance; the resulting descriptors are called color SIFT descriptors. In this paper, different color SIFT descriptors were implemented to evaluate their performance in object class recognition systems, since a descriptor may perform well on one class and poorly on another. All possible combinations of these descriptors were used, and some combinations of color SIFT descriptors achieved remarkable classification accuracy. A non-linear χ²-kernel support vector machine is used as the learning classifier, and a bag-of-features representation is used to represent the image features.
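
The classifier side of such a pipeline can be sketched with scikit-learn: an SVM over a precomputed exponential χ² kernel between bag-of-features histograms. The descriptor extraction (the color SIFT variants) is assumed to have already produced the histograms; the gamma and C values below are placeholders.

```python
# Chi-squared-kernel SVM over bag-of-features histograms (non-negative),
# using a precomputed Gram matrix.
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def train_chi2_svm(train_hists, labels, gamma=0.5, C=10.0):
    gram = chi2_kernel(train_hists, gamma=gamma)          # (n_train, n_train)
    return SVC(kernel="precomputed", C=C).fit(gram, labels)

def predict_chi2_svm(clf, train_hists, test_hists, gamma=0.5):
    gram_test = chi2_kernel(test_hists, train_hists, gamma=gamma)  # (n_test, n_train)
    return clf.predict(gram_test)
```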

13 citations

Proceedings ArticleDOI
01 Sep 2019
TL;DR: An innovative 2D Markov model is developed and evaluated that encodes reading order and substantially outperforms the current state-of-the-art, reaching similar accuracy to human annotators.
Abstract: Document analysis and recognition is increasingly used to digitise collections of historical books, newspapers and other periodicals. In the digital humanities, it is often the goal to apply information retrieval (IR) and natural language processing (NLP) techniques to help researchers analyse and navigate these digitised archives. The lack of article segmentation is impairing many IR and NLP systems, which assume text is split into ordered, error-free documents. We define a document analysis and image processing task for segmenting digitised newspapers into articles and other content, e.g. adverts, and we automatically create a dataset of 11602 articles. Using this dataset, we develop and evaluate an innovative 2D Markov model that encodes reading order and substantially outperforms the current state-of-the-art, reaching similar accuracy to human annotators.
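
As a toy analogue of a 2D Markov model over page layout, the sketch below estimates separate label-transition tables for the block below and the block to the right of each block on a labelled grid of blocks, a crude proxy for reading order. The decoding that groups blocks into articles is not shown, and the two-direction factorisation and label grid are illustrative assumptions.

```python
# Estimate 2D (down/right) label-transition probabilities from labelled
# pages, each represented as a rectangular grid of block labels.
from collections import defaultdict

def estimate_transitions(pages):
    """pages: iterable of rectangular 2D lists of block labels."""
    down = defaultdict(lambda: defaultdict(int))
    right = defaultdict(lambda: defaultdict(int))
    for grid in pages:
        for r, row in enumerate(grid):
            for c, lab in enumerate(row):
                if r + 1 < len(grid):
                    down[lab][grid[r + 1][c]] += 1     # block directly below
                if c + 1 < len(row):
                    right[lab][row[c + 1]] += 1        # block to the right

    def normalise(table):
        return {a: {b: n / sum(nbrs.values()) for b, n in nbrs.items()}
                for a, nbrs in table.items()}

    return normalise(down), normalise(right)
```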

10 citations

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A deep learning solution for semantic segmentation of the main newspaper page elements (articles, advertisements, and page headers), employing the instance segmentation method Mask R-CNN to create a language-agnostic model that logically deconstructs a raw newspaper page image into its main elements based only on its visual features.
Abstract: Newspaper digitization has gained wide interest around the world. Archives of digitized newspapers contain a wealth of information that spans decades. To extract this abundance of information, optical character recognition (OCR) techniques are used. However, as a first step, the newspaper pages should be logically deconstructed into articles to gain meaningful knowledge. This is difficult due to the complex layout of newspapers and the various styles, shapes, and languages of newspaper articles. Newspaper pages also contain other elements besides articles, such as advertisements that come in multiple shapes and forms, and top headers that contain information about the newspaper's issue and page. Therefore, it is important to detect these elements before information extraction begins. In this paper, we present a deep learning solution for the problem of newspaper page semantic segmentation of the main newspaper elements (articles, advertisements, and page headers). We employ the instance segmentation method Mask R-CNN to create a language-agnostic model that logically deconstructs a raw newspaper page image into its main elements based only on its visual features. We show the results of experiments that demonstrate the accuracy and robustness of our model.
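
A common way to set up such a model with torchvision (not necessarily the authors' configuration) is to start from a pretrained Mask R-CNN and replace its box and mask heads with ones sized for the newspaper classes, e.g. background, article, advertisement, and page header:

```python
# Fine-tuning setup for Mask R-CNN on newspaper-page classes, following the
# standard torchvision recipe. Class list and hidden size are illustrative.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_model(num_classes=4):  # background + article + advertisement + page header
    # weights="DEFAULT" requires torchvision >= 0.13; older versions use pretrained=True
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box classification head for our number of classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # Replace the mask prediction head as well.
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
    return model
```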

9 citations