Author

Syed Saqib Bukhari

Bio: Syed Saqib Bukhari is an academic researcher at the German Research Centre for Artificial Intelligence. The author has contributed to research topics including optical character recognition and eye tracking, has an h-index of 20, and has co-authored 95 publications receiving 1,150 citations. Previous affiliations of Syed Saqib Bukhari include Kaiserslautern University of Technology.


Papers
Proceedings ArticleDOI
09 Jun 2010
TL;DR: This work introduces connected-component-based classification by training a self-tunable multi-layer perceptron (MLP) classifier to distinguish text from non-text connected components, using shape and context information as a feature vector.
Abstract: Segmentation of a document image into text and non-text regions is an important preprocessing step for a variety of document image analysis tasks, such as OCR improvement and document compression. Most state-of-the-art document image segmentation approaches perform segmentation using pixel-based or zone (block)-based classification. Pixel-based classification approaches are time consuming, whereas block-based methods depend heavily on the accuracy of the block segmentation step. In contrast to these approaches, our segmentation approach introduces connected-component-based classification and therefore does not require block segmentation beforehand. We train a self-tunable multi-layer perceptron (MLP) classifier to distinguish between text and non-text connected components, using shape and context information as a feature vector. Experimental results demonstrate the effectiveness of our proposed algorithm. We evaluated our method on subsets of the UW-III, ICDAR 2009 page segmentation competition, and circuit diagram datasets, and compared its results with the state-of-the-art page segmentation algorithm in Leptonica.
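The idea of classifying connected components rather than pixels or blocks can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the shape features (bounding-box size, aspect ratio, foreground density) and the toy labels are assumptions chosen for the example, and the paper's self-tunable MLP and context features are not reproduced.

```python
# Sketch: connected-component text/non-text classification with an MLP.
# Features and data are illustrative, not the paper's.
import numpy as np
from scipy import ndimage
from sklearn.neural_network import MLPClassifier

def component_features(binary_img):
    """Return one simple shape-feature vector per connected component."""
    labels, n = ndimage.label(binary_img)
    feats = []
    for i, s in enumerate(ndimage.find_objects(labels)):
        comp = labels[s] == i + 1
        h, w = comp.shape
        feats.append([
            h, w,          # bounding-box height and width
            w / h,         # aspect ratio
            comp.mean(),   # foreground density inside the box
        ])
    return np.array(feats)

# Toy page: a small dense blob stands in for text, a large hollow
# frame stands in for a non-text element (e.g. a figure border).
page = np.zeros((64, 64), dtype=bool)
page[5:9, 5:12] = True                       # "text-like" component
page[30:60, 30] = page[30:60, 59] = True     # "non-text-like" frame
page[30, 30:60] = page[59, 30:60] = True

X = component_features(page)
y = [1, 0]  # 1 = text, 0 = non-text (hand-labelled for this toy case)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

In a real system the classifier would be trained on thousands of labelled components, and each prediction can be refined using the labels of neighbouring components (the "context" part of the feature vector).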

81 citations

Proceedings ArticleDOI
26 Jul 2009
TL;DR: A novel, script-independent textline segmentation approach for handwritten documents that is robust against multi-oriented, touching, and overlapping textlines; it uses a matched-filter-bank approach for smoothing and does not require heuristic post-processing steps for merging or splitting segmented textlines.
Abstract: Handwritten document images contain textlines with multiple orientations, touching and overlapping characters across consecutive textlines, and small inter-line spacing, making textline segmentation a difficult task. In this paper we propose a novel, script-independent textline segmentation approach for handwritten documents that is robust against the above-mentioned problems. We model textline extraction as a general image segmentation task. We compute the central line of parts of textlines using ridges over the smoothed image, then adapt state-of-the-art active contours (snakes) over the ridges, which yields the textline segmentation. Unlike the "Level Set" and "Mumford-Shah model" based handwritten textline segmentation methods, our method uses a matched-filter-bank approach for smoothing and does not require heuristic post-processing steps for merging or splitting segmented textlines. Experimental results demonstrate the effectiveness of the proposed algorithm. We evaluated our algorithm on the ICDAR 2007 handwriting segmentation contest dataset and obtained an accuracy of 96.3%.
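The matched-filter idea behind the smoothing step can be sketched in a few lines. This is a simplified, horizontal-only illustration with assumed filter parameters; the paper's filter bank covers multiple orientations and its ridge detection is more sophisticated than the row-profile proxy used here.

```python
# Sketch: anisotropic Gaussian smoothing joins the characters of a
# textline into one band whose ridge approximates the line's centre.
# Parameters and the toy page are illustrative, not the paper's.
import numpy as np
from scipy import ndimage

def enhance_textlines(binary_img, sigma_y=1.0, sigma_x=8.0):
    """Smooth with a filter that is wide along the writing direction."""
    return ndimage.gaussian_filter(binary_img.astype(float),
                                   sigma=(sigma_y, sigma_x))

# Toy page: two "textlines" made of disconnected character-like blobs.
page = np.zeros((40, 60))
for x in range(2, 58, 6):
    page[8:12, x:x + 3] = 1.0    # line 1
    page[26:30, x:x + 3] = 1.0   # line 2

smooth = enhance_textlines(page)
# Row-wise response maxima mark candidate textline centres
# (a crude stand-in for ridge detection on the smoothed image).
row_profile = smooth.sum(axis=1)
peaks = [y for y in range(1, 39)
         if row_profile[y] >= row_profile[y - 1]
         and row_profile[y] >= row_profile[y + 1]
         and row_profile[y] > 0.5 * row_profile.max()]
```

A full implementation would apply such filters at several orientations and take the maximum response, so that curved or skewed textlines are enhanced as well.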

77 citations

Proceedings ArticleDOI
18 Sep 2012
TL;DR: This work introduces an approach that segments text appearing in page margins (side-notes) from manuscripts with complex layouts, independent of both block segmentation and pixel-level analysis.
Abstract: Page layout analysis is a fundamental step of any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-notes text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, from which robust feature vectors are generated. A multilayer perceptron classifier is used to assign connected components to the relevant class of text. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of both block segmentation and pixel-level analysis. The proposed method has been trained and tested on a dataset containing a variety of complex side-notes layout formats, achieving a segmentation accuracy of about 95%.

65 citations

Proceedings ArticleDOI
24 Jan 2011
TL;DR: Modifications to the text/non-text segmentation algorithm presented by Bloomberg are described that result in significant improvements, achieving better segmentation accuracy than the original algorithm on the UW-III, UNLV, ICDAR 2009 page segmentation competition, and circuit diagram datasets.
Abstract: Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). In case of poor segmentation, an OCR classification engine produces garbage characters due to the presence of non-text elements. This paper describes modifications to the text/non-text segmentation algorithm presented by Bloomberg, which is also available in his open-source Leptonica library. The modifications result in significant improvements and achieve better segmentation accuracy than the original algorithm on the UW-III, UNLV, ICDAR 2009 page segmentation competition, and circuit diagram datasets.
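The core of the Bloomberg-style approach is morphological: large solid regions survive an aggressive opening and act as a seed, and binary reconstruction inside a mask then recovers the full non-text regions. The sketch below illustrates that idea with assumed structuring-element sizes; the actual algorithm (and the paper's modifications to it) operates on reduced-resolution images with carefully tuned morphology.

```python
# Sketch: seed/mask morphological separation of text from image regions.
# Structuring-element sizes and the toy page are illustrative.
import numpy as np
from scipy import ndimage

def split_text_nontext(binary_img, seed_size=9, mask_size=3):
    # Aggressive opening: only large solid (image/halftone) regions survive.
    seed = ndimage.binary_opening(binary_img, np.ones((seed_size, seed_size)))
    # Gentle closing: a mask that still contains everything.
    mask = ndimage.binary_closing(binary_img, np.ones((mask_size, mask_size)))
    # Grow the seed within the mask to recover full non-text regions.
    nontext = ndimage.binary_propagation(seed, mask=mask)
    text = binary_img & ~nontext
    return text, nontext

page = np.zeros((50, 80), dtype=bool)
page[5:9, 5:70] = True      # thin "textline" stroke
page[20:45, 20:60] = True   # large solid "image" block
text, nontext = split_text_nontext(page)
```

The thin stroke is erased by the opening and so never enters the seed, while the solid block survives and is reconstructed in full; text is simply what remains.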

62 citations

Proceedings ArticleDOI
28 Aug 2018
TL;DR: This work creates multiple classifiers for document classification and compares their accuracy on raw and processed data; hierarchical classifiers for classes and subclasses are also explored.
Abstract: Any organization working with documents eventually faces the task of classifying large numbers of them, which is the first step toward information retrieval and data mining. Classifying such volumes of documents into multiple classes by hand requires considerable time and labor, so a system that can classify them with acceptable accuracy is of great help in document engineering. We have created multiple classifiers for document classification and compared their accuracy on raw and processed data. We gathered data used in a corporate organization as well as publicly available data for comparison. The data is processed by removing stop-words, and stemming is applied to produce root words. Traditional machine learning techniques, including Naive Bayes, Logistic Regression, Support Vector Machines, Random Forest, and Multi-Layer Perceptron, are used to classify the documents. The classifiers are applied to raw and processed data separately and their accuracy is recorded. In addition, a deep learning technique, the Convolutional Neural Network, is used to classify the data, and its accuracy is compared with that of the traditional machine learning techniques. We also explore hierarchical classifiers for classifying classes and subclasses. The system classifies the data faster and with better accuracy than manual classification. The results are discussed in the results and evaluation section.
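One of the traditional pipelines the paper compares can be sketched in a few lines: stop-word removal, term weighting, and a Naive Bayes classifier. The documents, labels, and settings below are invented for illustration (and stemming is omitted for brevity); they are not the paper's corporate or public datasets.

```python
# Sketch: stop-word removal + TF-IDF + Naive Bayes document classification.
# The toy corpus and labels are illustrative, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "invoice payment due amount total",       # finance document
    "payment invoice billing account",        # finance document
    "patient diagnosis treatment clinic",     # medical document
    "clinic doctor patient medical record",   # medical document
]
labels = ["finance", "finance", "medical", "medical"]

# stop_words="english" drops words like "for" and "the" before weighting.
clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
clf.fit(docs, labels)
pred = clf.predict(["billing amount for the account"])
```

Swapping `MultinomialNB` for `LogisticRegression`, `SVC`, `RandomForestClassifier`, or `MLPClassifier` in the same pipeline is how the traditional techniques in the paper can be compared on identical features.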

59 citations


Cited by
01 Jan 2010
TL;DR: A passive stereo system for capturing the 3D geometry of a face in a single shot under standard light sources is described; standard stereo refinement methods are modified to capture pore-scale geometry, using a qualitative approach that produces visually realistic results.
Abstract: This paper describes a passive stereo system for capturing the 3D geometry of a face in a single shot under standard light sources. The system is low-cost and easy to deploy. Results are submillimeter accurate and commensurate with those from state-of-the-art systems based on active lighting, and the models meet the quality requirements of a demanding domain like the movie industry. Recovered models are shown for captures from both high-end cameras in a studio setting and from a consumer binocular stereo camera, demonstrating scalability across a spectrum of camera deployments, and showing the potential for 3D face modeling to move beyond the professional arena and into the emerging consumer market in stereoscopic photography. Our primary technical contribution is a modification of standard stereo refinement methods to capture pore-scale geometry, using a qualitative approach that produces visually realistic results. The second technical contribution is a calibration method suited to face capture systems. The systemic contribution includes multiple demonstrations of system robustness and quality: capture in a studio setup, capture with a consumer binocular stereo camera, scanning of faces of varying gender, ethnicity, and age, capture of highly transient facial expressions, and scanning a physical mask to provide ground-truth validation.

254 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images, which treats document semantic structure extraction as a pixel-wise segmentation task and proposes a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of the underlying text.
Abstract: We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.

199 citations

Journal ArticleDOI
TL;DR: A key outcome from this review is the realization of a need to develop standardized methodologies for the performance evaluation of gaze tracking systems and achieve consistency in their specification and comparative evaluation.
Abstract: In this paper, a review is presented for the research on eye gaze estimation techniques and applications, which has progressed in diverse ways over the past two decades. Several generic eye gaze use-cases are identified: desktop, TV, head-mounted, automotive, and handheld devices. Analysis of the literature leads to the identification of several platform specific factors that influence gaze tracking accuracy. A key outcome from this review is the realization of a need to develop standardized methodologies for the performance evaluation of gaze tracking systems and achieve consistency in their specification and comparative evaluation. To address this need, the concept of a methodological framework for practical evaluation of different gaze tracking systems is proposed.

193 citations

Journal ArticleDOI
TL;DR: A selective review of the psychophysiological measures that can be utilized to assess cognitive states in real-world driving environments to advance the development of effective human-machine driving interfaces and driver support systems is provided.
Abstract: As driving functions become increasingly automated, motorists run the risk of becoming cognitively removed from the driving process. Psychophysiological measures may provide added value not captured through behavioral or self-report measures alone. This paper provides a selective review of the psychophysiological measures that can be utilized to assess cognitive states in real-world driving environments. First, the importance of psychophysiological measures within the context of traffic safety is discussed. Next, the most commonly used physiology-based indices of cognitive states are considered as potential candidates relevant for driving research. These include: electroencephalography and event-related potentials, optical imaging, heart rate and heart rate variability, blood pressure, skin conductance, electromyography, thermal imaging, and pupillometry. For each of these measures, an overview is provided, followed by a discussion of the methods for measuring it in a driving context. Drawing from recent empirical driving and psychophysiology research, the relative strengths and limitations of each measure are discussed to highlight each measure's unique value. Challenges and recommendations for valid and reliable quantification from the lab to (less predictable) real-world driving settings are considered. Finally, we discuss measures that may be better candidates for near real-time assessment of motorists' cognitive states in applied settings outside the lab. This review synthesizes the literature on in-vehicle psychophysiological measures to advance the development of effective human-machine driving interfaces and driver support systems.

174 citations