
Showing papers by "Partha Pratim Roy published in 2018"


Journal ArticleDOI
TL;DR: Addition of 5 wt% HA is found effective in reducing the corrosion rate and improving the compressive yield strength of a biodegradable magnesium alloy by 23%; the Mg-HA composite structure shows impressive potential for use in orthopaedic fracture-fixing accessories.
Abstract: Development of biodegradable implants has grown into one of the important areas in medical science. Degradability is especially important for orthopaedic accessories used to support fractured and damaged bones, in order to avoid a second surgery for their removal after healing. Clinically available biodegradable orthopaedic materials are mainly made of polymers or ceramics, and these accessories have unsatisfactory mechanical strength when used in load-bearing parts. Magnesium and its alloys are suitable candidates for this purpose, due to their outstanding strength-to-weight ratio, biodegradability, non-toxicity and mechanical properties similar to natural bone. The major drawback of magnesium is its low corrosion resistance, which also influences its mechanical and physical characteristics in service conditions. In this research, an effort has been made to improve the corrosion resistance, bioactivity and mechanical strength of biodegradable magnesium alloys by synthesizing an Mg-3 wt% Zn matrix composite, reinforced with thermally treated hydroxyapatite (HA) [Ca10(PO4)6(OH)2], a bioactive and osteogenic ceramic. Addition of 5 wt% HA is found effective in reducing the corrosion rate by 42% and improving the compressive yield strength of the biodegradable magnesium alloy by 23%. In-vitro evaluation, up to 56 days, reveals improved resistance to degradation with HA reinforcement of Mg. Osteoblast cells show better growth and proliferation on HA-reinforced surfaces of the composite. The Mg-HA composite structure shows impressive potential for use in orthopaedic fracture-fixing accessories.

92 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: A coarse-to-fine envisioned speech recognition framework using EEG signals is proposed that outperforms existing research work in terms of accuracy and robustness.
Abstract: Recent advances in EEG technology make the brain-computer interface (BCI) an exciting field of research. BCI is primarily used to assist people with paralysis. However, BCI for envisioned speech recognition using electroencephalogram (EEG) signals has not been studied in detail. Therefore, we propose the development of a robust speech recognition system using EEG signals. In this paper, we propose a coarse-to-fine-level envisioned speech recognition framework based on EEG signals that can be thought of as a serious contribution to this field of research. Coarse-level classification is used to differentiate/categorize text and non-text classes using a random forest (RF) classifier. Next, finer-level imagined speech recognition of each class is carried out. EEG data of 30 text and non-text classes, including characters, digits, and object images, were imagined by 23 participants in this study. Recognition accuracies of 85.20% and 67.03% have been recorded at the coarse- and fine-level classifications, respectively. The proposed framework outperforms existing research work in terms of accuracy. We also show the robustness of envisioned speech recognition.
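The coarse-to-fine split maps naturally onto two stages of classifiers. Below is a minimal illustrative sketch in scikit-learn (not the authors' code), assuming EEG feature vectors `X` with coarse labels (text vs. non-text) and fine labels (the specific imagined symbol); the fine stage here reuses random forests for simplicity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_coarse_to_fine(X, y_coarse, y_fine):
    """Fit a coarse RF (text vs. non-text), then one fine RF per category."""
    coarse = RandomForestClassifier(n_estimators=100).fit(X, y_coarse)
    fine = {}
    for cat in np.unique(y_coarse):
        mask = y_coarse == cat
        fine[cat] = RandomForestClassifier(n_estimators=100).fit(
            X[mask], y_fine[mask])
    return coarse, fine

def predict(coarse, fine, x):
    """Coarse stage picks the category, fine stage the specific class."""
    x = np.asarray(x).reshape(1, -1)
    cat = coarse.predict(x)[0]
    return cat, fine[cat].predict(x)[0]
```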

71 citations


Journal ArticleDOI
TL;DR: In this paper, a new texture descriptor based on local neighborhood intensity difference is proposed for content-based image retrieval (CBIR); it considers the relative intensity difference between a particular pixel and the center pixel by considering its adjacent neighbors, and generates a sign and a magnitude pattern.
Abstract: In this paper, a new texture descriptor based on local neighborhood intensity difference is proposed for content-based image retrieval (CBIR). In the computation of texture features like the Local Binary Pattern (LBP), the center pixel in a 3 × 3 window of an image is compared with all the remaining neighbors, one pixel at a time, to generate a binary bit pattern. This ignores the effect of the adjacent neighbors of a particular pixel on its binary encoding and on the texture description. The proposed method is based on the concept that the neighbors of a particular pixel hold a significant amount of texture information that can be exploited for efficient texture representation in CBIR. The main impact of utilizing the mutual relationship among adjacent neighbors is that we do not rely only on the sign of the intensity difference between the central pixel and one of its neighbors (Ii); rather, we take into account the signs of the difference values between Ii and its adjacent neighbors, along with those between the central pixel and the same set of neighbors of Ii. This makes our pattern more resistant to illumination changes. Moreover, most local patterns including LBP concentrate mainly on the sign information and thus ignore the magnitude. The magnitude information, which plays an auxiliary role in supplying complementary information to a texture descriptor, is integrated in our approach by considering the mean absolute deviation of each pixel Ii from its adjacent neighbors. Taking this into account, we develop a new texture descriptor, named Local Neighborhood Intensity Pattern (LNIP), which considers the relative intensity difference between a particular pixel and the center pixel by considering its adjacent neighbors and generates a sign and a magnitude pattern. Finally, the sign pattern (LNIPS) and the magnitude pattern (LNIPM) are concatenated into a single, more effective feature descriptor. The proposed descriptor has been tested for image retrieval on four databases, including three texture image databases (the Brodatz texture image database, MIT VisTex database and Salzburg texture database) and one face database (the AT&T face database). The precision and recall values observed on these databases are compared with some state-of-the-art local patterns. The proposed method shows a significant improvement over many existing methods.
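For context, here is a minimal numpy sketch of the standard LBP baseline that LNIP builds on; LNIP additionally compares each neighbor Ii with Ii's own adjacent neighbors and derives a magnitude pattern from the mean absolute deviation, which this simplified sketch omits.

```python
import numpy as np

def lbp_3x3(img):
    """Standard 8-neighbor Local Binary Pattern over 3x3 windows.

    Each neighbor contributes one bit: 1 if its intensity is >= the
    center pixel, 0 otherwise (LNIP refines exactly this comparison
    by also consulting the neighbor's adjacent neighbors).
    """
    img = img.astype(np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # (row, col) offsets of the 8 neighbors in a fixed circular order
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h - 1, 1:w - 1]
    for bit, (dr, dc) in enumerate(offsets):
        neigh = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        out |= (neigh >= center).astype(np.uint8) << bit
    return out
```

A 256-bin histogram of the resulting codes is the texture feature; the LNIPS and LNIPM patterns are histogrammed and concatenated analogously.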

65 citations


Journal ArticleDOI
TL;DR: A novel multimodal framework for an SLR system is presented, incorporating facial expressions with sign gestures using two different sensors, namely Leap motion and Kinect.

64 citations


Journal ArticleDOI
TL;DR: An investigation has been made to analyze the impact of positive and negative emotions using electroencephalogram (EEG) signals, showing that emotion recognition is possible from EEG signals.

62 citations


Journal ArticleDOI
TL;DR: A robust position invariant SLR framework is presented that is capable of recognizing occluded sign gestures and has been tested on a dataset of 2700 gestures.
Abstract: Sign language is the only means of communication for speech- and hearing-impaired people. Using machine translation, Sign Language Recognition (SLR) systems provide a medium of communication between the speech- and hearing-impaired and others who have difficulty in understanding such languages. However, most SLR systems require the signer to sign in front of the capturing device/sensor. Such systems fail to recognize some gestures when the relative position of the signer changes or when body occlusion occurs due to position variations. In this paper, we present a robust position-invariant SLR framework. A depth-sensor device (Kinect) has been used to obtain the signer's skeleton information. The framework is capable of recognizing occluded sign gestures and has been tested on a dataset of 2700 gestures. The recognition process has been performed using Hidden Markov Models (HMM), and the results show the efficiency of the proposed framework with an accuracy of 83.77% on occluded gestures.
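HMM-based gesture recognition of this kind is commonly realized by fitting one HMM per gesture class and choosing the class with the highest sequence likelihood. Below is an illustrative hmmlearn sketch, assuming per-frame skeleton feature vectors have already been extracted; the variable names and state count are placeholders, not the paper's code.

```python
import numpy as np
from hmmlearn import hmm

def train_gesture_hmms(train_seqs, n_states=5):
    """Fit one Gaussian HMM per gesture class.

    train_seqs: dict mapping gesture label -> list of (T_i, D) arrays,
    each a sequence of per-frame skeleton features.
    """
    models = {}
    for label, seqs in train_seqs.items():
        X = np.vstack(seqs)               # stacked observations
        lengths = [len(s) for s in seqs]  # per-sequence boundaries
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Pick the class whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda lbl: models[lbl].score(seq))
```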

59 citations


Journal ArticleDOI
TL;DR: A novel coarse-to-fine framework for continuous Human Activity Recognition (HAR) using Microsoft Kinect is proposed to develop a robust and continuous HAR system, and it has been compared with existing approaches.

46 citations


Journal ArticleDOI
TL;DR: A novel approach to video text detection using Fourier–Laplacian filtering in the frequency domain, including a Hidden Markov Model (HMM) based verification technique, that surpasses existing text detection methods.

38 citations


Posted Content
TL;DR: The inter-channel relationship between the Hue and Saturation channels in the HSV color space has been explored, and the proposed descriptors provide a significant improvement over existing descriptors for content-based color image retrieval.
Abstract: In this paper, we propose novel feature descriptors combining color and texture information. In our proposed color descriptor component, the inter-channel relationship between the Hue (H) and Saturation (S) channels in the HSV color space is explored, which had not been done earlier. We quantize the H channel into a number of bins and perform voting with the saturation values, and vice versa, following a principle similar to that of the HOG descriptor, where the orientation of the gradient is quantized into a certain number of bins and voting is done with the gradient magnitude. This lets us study how saturation varies with Hue and how Hue varies with saturation. The texture component of our descriptor considers the co-occurrence relationship between the pixels symmetric about both diagonals of a 3x3 window. Our work is inspired by the work of Dubey et al. [1]. These two components, i.e., color and texture information, individually perform better than existing texture and color descriptors. Moreover, when concatenated, the proposed descriptors provide a significant improvement over existing descriptors for content-based color image retrieval. The proposed descriptor has been tested for image retrieval on five databases, including two texture image databases (MIT VisTex and the Salzburg texture database) and three natural scene databases (Corel 1K, Corel 5K and Corel 10K). The precision and recall values obtained on these databases are compared with some state-of-the-art local patterns. The proposed method provided satisfactory results in the experiments.
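The H-S voting component can be sketched in a few lines of numpy/OpenCV. This is an illustrative reading of the described scheme with an assumed bin count and normalization, not the authors' exact configuration.

```python
import numpy as np
import cv2

def hue_saturation_vote_hist(bgr_img, n_bins=16):
    """Inter-channel H-S voting histogram (HOG-style voting).

    Hue is quantized into n_bins and every pixel votes into its hue
    bin with its saturation value; the symmetric histogram quantizes
    saturation and votes with hue. The two are concatenated.
    """
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
    h = hsv[..., 0].ravel().astype(np.float64)  # OpenCV hue: 0..179
    s = hsv[..., 1].ravel().astype(np.float64)  # saturation: 0..255
    h_hist, _ = np.histogram(h, bins=n_bins, range=(0, 180), weights=s)
    s_hist, _ = np.histogram(s, bins=n_bins, range=(0, 256), weights=h)
    feat = np.concatenate([h_hist, s_hist])
    return feat / (feat.sum() + 1e-8)           # L1 normalization
```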

36 citations


Journal ArticleDOI
TL;DR: In this paper, a color channel selection approach is proposed for text recognition from scene images and video frames; the recognition framework is based on a Hidden Markov Model (HMM) using Pyramidal Histogram of Oriented Gradient features extracted from the selected color channel.
Abstract: In recent years, recognition of text from natural scene images and video frames has received increased attention among researchers due to its various complexities and challenges. Because of low resolution, blurring effects, complex backgrounds, different fonts, colors and variable alignment of text within images and video frames, text recognition in such scenarios is difficult. Most current approaches apply a binarization algorithm to convert the input into a binary image and then apply OCR to get the recognition result. In this paper, we present a novel approach based on color channel selection for text recognition from scene images and video frames. In this approach, a color channel is first selected automatically, and the selected color channel is then used for text recognition. Our text recognition framework is based on a Hidden Markov Model (HMM) which uses Pyramidal Histogram of Oriented Gradient features extracted from the selected color channel. For each sliding window, our color-channel selection approach analyzes the image properties and then applies a multi-label Support Vector Machine (SVM) classifier to select the color channel that will provide the best recognition results in that window. This per-window color channel selection has been found to be more fruitful than considering a single color channel for the whole word image. Five different features have been analyzed for the multi-label SVM based color channel selection, among which a wavelet transform based feature outperforms the others. Our color channel selection framework is script-independent. It has been tested on English (Roman) and Devanagari (Indic) scripts. We have tested our approach on publicly available English datasets (ICDAR 2003, ICDAR 2013, MSRA-TD500, IIIT5K, SVT, YVT) for both video and scene images. For the Devanagari script, we collected our own dataset. The performances obtained from the experimental results are encouraging and show the advantage of the proposed method.
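In code, the per-window selection step might look like the sketch below: the multi-label SVM is trained on a binary indicator matrix marking which channels recognized each training window correctly, and at test time the channel with the highest decision score is kept. `X_train`, `Y_train` and the feature extraction are assumed inputs, not the authors' implementation.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def fit_channel_selector(X_train, Y_train):
    """X_train: per-window features (e.g., wavelet statistics).
    Y_train: binary matrix, Y[i, c] = 1 if channel c gave correct
    recognition for window i (multi-label targets)."""
    return OneVsRestClassifier(LinearSVC()).fit(X_train, Y_train)

def best_channel(selector, window_feats):
    """Return the index of the channel with the highest SVM score."""
    scores = selector.decision_function(
        np.asarray(window_feats).reshape(1, -1))
    return int(np.argmax(scores))
```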

32 citations


Journal ArticleDOI
TL;DR: A novel multimodal user identification and verification scheme combining two inter-linked biometric traits, i.e., signature and brain signals (Electroencephalography, EEG), is proposed and its effectiveness is evaluated.

Journal ArticleDOI
01 Dec 2018 - Displays
TL;DR: The proposed methodology works on the principle of augmenting 3D virtual objects over English alphabet letters used as printed markers, which is believed to create an engaging experience for kids, especially the kindergarten age group.

Journal ArticleDOI
TL;DR: A compartmental chemosensor probe HL has been designed and synthesized for the selective recognition of zinc ions over other transition metal ions via fluorescence "ON" strategy and a dual response was established based on "OFF-ON-OFF" strategy for detection of both cation and anion.
Abstract: A compartmental chemosensor probe HL has been designed and synthesized for the selective recognition of zinc ions over other transition metal ions via a fluorescence “ON” strategy. The chemosensing behaviour of HL was demonstrated through fluorescence, absorption and NMR spectroscopic techniques. The molecular structure of the zinc complex derived from HL was determined by X-ray crystallography. A probable mechanism of this selective sensing behavior was described on the basis of spectroscopic results and theoretical studies by density functional theory (DFT). The biological applicability of the chemosensor HL was examined via cell imaging on HeLa cells. The HL-zinc complex served as a secondary fluorescent probe responding to the pyrophosphate anion specifically over other anions. The fluorescence enhancement of HL in association with Zn2+ ions was quenched in the presence of pyrophosphate (PPi). Thus, a dual response was established based on an “OFF–ON–OFF” strategy for detection of both cation and anion. This phenomenon was utilized in the construction of an “INHIBIT” logic gate.

Journal ArticleDOI
TL;DR: In this article, a cross-language platform for handwritten word recognition and spotting for such low-resource scripts where training is performed with a sufficiently large dataset of an available script and testing is done on other scripts (considered as target script).

Journal ArticleDOI
TL;DR: A lexicon-free approach for the recognition of 3D handwritten words in Latin and Devanagari scripts, combining multiple classifiers using the Recognizer Output Voting Error Reduction (ROVER) framework.

Journal ArticleDOI
TL;DR: This paper proposes a sequential classifier comprising a bidirectional long short-term memory (BLSTM) classifier followed by convexity defect-based arrowhead detection, which outperforms existing state-of-the-art arrow detection techniques.
Abstract: Biomedical images are often complex and contain several regions that are annotated using arrows. Annotated arrow detection is a critical precursor to region-of-interest (ROI) labeling, which is useful in content-based image retrieval (CBIR). In this paper, we propose a sequential classifier comprising a bidirectional long short-term memory (BLSTM) classifier followed by convexity defect-based arrowhead detection. Different image layers are first segmented via fuzzy binarization. Candidate regions are then checked by the BLSTM classifier, using Npen++ features, to determine whether they are arrows. In case of a low confidence score (i.e., BLSTM classifier score), we fall back on the convexity defect-based arrowhead detection technique. Our test results on biomedical images from the imageCLEF 2010 collection outperform the existing state-of-the-art arrow detection techniques by approximately 3% in precision, 12% in recall, and therefore 8% in F1 score.
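Convexity defect-based arrowhead detection exploits the deep concave notches between an arrowhead's barbs. The OpenCV sketch below shows the general technique on a binarized layer; the depth threshold and candidate filtering are illustrative assumptions, not the paper's exact detector.

```python
import cv2
import numpy as np

def arrowhead_candidates(binary_img, min_depth=10.0):
    """Return contour points at deep convexity defects.

    Deep notches between the barbs of an arrowhead show up as
    high-depth convexity defects of the shape's contour.
    """
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    tips = []
    for cnt in contours:
        if len(cnt) < 4:
            continue
        hull = cv2.convexHull(cnt, returnPoints=False)
        defects = cv2.convexityDefects(cnt, hull)
        if defects is None:
            continue
        for start, end, far, depth in defects[:, 0]:
            if depth / 256.0 >= min_depth:  # depth is fixed-point (1/256 px)
                tips.append(tuple(int(v) for v in cnt[far][0]))
    return tips
```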

Journal ArticleDOI
TL;DR: This work proposes a supervised trajectory classification approach using a combination of global and segmental Hidden Markov Model (HMM) based classifiers and shows the effectiveness of the system, which outperforms traditional HMM-based systems on the SVC2004 signature dataset.
Abstract: Trajectory classification techniques face various challenges due to varying trajectory lengths and the lack of clear boundaries among trajectory classes. To overcome such challenges, a trajectory shrinking framework using Adaptive Multi-Kernel based Shrinkage (AMKS) can be used. However, such a strategy often results in over-shrinking of trajectories, leading to poor classification. To improve classification performance, we introduce two additional kernels based on the convex hull and the Ramer–Douglas–Peucker (RDP) algorithm. Next, we propose a supervised trajectory classification approach using a combination of global and segmental Hidden Markov Model (HMM) based classifiers. In the first stage, an HMM is used globally for classification of the trajectory to provide a state-wise distribution of trajectory segments. In the second stage, state-wise trajectory segments are classified and combined with the global recognition performance to improve the classification results. The combination of the global HMM and the segmental HMM is performed using a genetic algorithm (GA) based framework in the final stage. We have conducted experiments on two publicly available datasets, popularly known as T15 and MIT, achieving accuracies of 94.80% and 96.75%, respectively. We also analyzed the robustness of the proposed framework by adding Gaussian noise. To show the effectiveness of the system, we have performed recognition of on-line signatures using the proposed segmental HMM based combination model. On the SVC2004 signature dataset, it outperforms traditional HMM-based systems.
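The RDP kernel relies on the standard Ramer–Douglas–Peucker polyline simplification, shown below as a plain-Python textbook implementation (not code from the paper): it recursively keeps the point farthest from the chord between the endpoints while that distance exceeds a tolerance epsilon.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a list of (x, y) points."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1e-12
    # find the interior point farthest from the chord
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:                      # split and recurse
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]          # drop all interior points
```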

Journal ArticleDOI
TL;DR: A graph-based representation of a given surveillance scene is presented, in which learned features including origin, destination, path, speed and size are combined and correlated with target behaviors to detect abnormalities in moving object trajectories; an aggregation method that reduces the number of missed alarms is also proposed.
Abstract: Use of CCTV is growing rapidly in surveillance applications. Rapid advancement in machine learning and camera hardware has opened up ample scope to build the next generation of expert systems that understand surveillance environments automatically by detecting trajectory abnormality through analysis of object behavior. Such intelligent surveillance systems should be able to learn and combine multiple concepts of abnormality in real-life scenarios and classify the events of interest as normal or abnormal. The primary challenges of such systems are to represent and learn patterns in surveillance scenes and to combine multiple concepts of abnormality to activate the alarm system. This paper presents a graph-based representation of a given surveillance scene and the learning of relevant features including origin, destination, path, speed, size, etc. These features are combined and correlated with target behaviors to detect abnormalities in moving object trajectories. We also propose an aggregation method that reduces the number of missed alarms during aggregation. Several cases using publicly available surveillance video datasets are presented, and the results indicate that the proposed method can be useful for designing intelligent expert surveillance systems.

Journal ArticleDOI
TL;DR: This paper presents a video summarization framework based on users' emotions while they watch videos, analyzed via cerebral activity through Electroencephalogram (EEG) signals, and uses a crowdsourcing model for effective summarization of the videos.

Proceedings ArticleDOI
01 Aug 2018
TL;DR: In this article, an encoder-decoder sequence-to-sequence model is proposed to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition.
Abstract: In this paper, we introduce a novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition. Generally, the online acquisition approach has an advantage over its offline counterpart, as the online technique keeps track of the pen movement. Hence, pen-tip trajectory retrieval from offline text can bridge the gap between online and offline methods. Our proposed framework employs a sequence-to-sequence model consisting of an encoder-decoder LSTM module. The proposed encoder module consists of a Convolutional LSTM network, which takes an offline character image as input and encodes the feature sequence into a hidden representation. The output of the encoder is fed to a decoder LSTM, and we obtain the successive coordinate points from every time step of the decoder LSTM. Although the sequence-to-sequence model is a popular paradigm in various computer vision and language translation tasks, the main contribution of our work lies in designing an end-to-end network for a decades-old popular problem in the document image analysis community. Tamil, Telugu and Devanagari characters from the LIPI Toolkit dataset are used for our experiments. Our proposed method achieves superior performance compared to other conventional approaches.
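The encoder-decoder idea can be illustrated with a stripped-down PyTorch sketch that reads image columns as a sequence and autoregressively decodes (x, y) pen coordinates. Two caveats: the paper's encoder is a Convolutional LSTM, replaced here by a plain LSTM over pixel columns, and the fixed step count, dimensions and zero start token are assumptions.

```python
import torch
import torch.nn as nn

class PenTrajectorySeq2Seq(nn.Module):
    """Simplified seq2seq: encode image columns, decode (x, y) points."""

    def __init__(self, img_height=64, hidden=256, max_steps=100):
        super().__init__()
        self.max_steps = max_steps
        self.encoder = nn.LSTM(img_height, hidden, batch_first=True)
        self.decoder = nn.LSTM(2, hidden, batch_first=True)
        self.to_xy = nn.Linear(hidden, 2)    # hidden state -> coordinate

    def forward(self, img):                  # img: (B, H, W)
        cols = img.permute(0, 2, 1)          # width as time: (B, W, H)
        _, state = self.encoder(cols)        # keep final (h, c)
        pt = img.new_zeros(img.size(0), 1, 2)  # start token at origin
        outputs = []
        for _ in range(self.max_steps):      # autoregressive decoding
            out, state = self.decoder(pt, state)
            pt = self.to_xy(out)             # next predicted point
            outputs.append(pt)
        return torch.cat(outputs, dim=1)     # (B, max_steps, 2)
```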

Journal ArticleDOI
TL;DR: It is shown that the 3rd dimension, which essentially represents instantaneous pressure during writing, can improve the accuracy of biometric systems, suggesting that Leap motion can be an alternative to existing biometric setups.
Abstract: Signature recognition identifies the signature's owner, whereas verification is the process of finding whether a signature is genuine or forged. Though both are important in the field of forensic sciences, verification is more important to banks and credit card companies. In this paper, we propose a methodology to analyze 3D signatures captured using the Leap motion sensor. We have extended existing 2D features into 3D from raw signatures and applied well-known classifiers for recognition as well as verification. We show that the 3rd dimension, which essentially represents instantaneous pressure during writing, can improve the accuracy of biometric systems. We have created a large dataset containing more than 2000 signatures registered by 100 volunteers using the Leap motion interface, which has been made available online for the research community. Our analysis shows that the proposed 3D extension is better than its original 2D version. Recognition and verification accuracy increased by 6.8% and 9.5%, respectively, using k-NN. Similarly, accuracy increased by 9.9% (recognition) and 6.5% (verification) when HMM is used as the classifier. Similar results have been recorded on benchmark datasets. A comparison with a 2D tablet-stylus interface has also been carried out, which supports our claims. We believe Leap motion can be an alternative to existing biometric setups.

Journal ArticleDOI
TL;DR: A word-familiarity prediction approach based on EEG signals from the user's brain has been developed, which can detect the characteristics of brain waves at the time of unknown-word perception.

Proceedings ArticleDOI
01 Aug 2018
TL;DR: In this article, the authors propose a novel approach for staff line removal based on Generative Adversarial Networks, which converts staff-line images into patches and feeds them into a U-Net used as the generator.
Abstract: Staff line removal is a crucial pre-processing step in Optical Music Recognition. In this paper, we propose a novel approach for staff line removal based on Generative Adversarial Networks. We convert staff-line images into patches and feed them into a U-Net used as the Generator, which is intended to produce staff-less images at its output. The Discriminator then performs binary classification, differentiating between the generated fake staff-less image and the real ground-truth staff-less image. For training, we use a loss function that is a weighted combination of L2 loss and adversarial loss: the L2 loss minimizes the difference between the real and fake staff-less images, while the adversarial loss helps to retrieve higher-quality textures in the generated images. Thus our architecture favors solutions that are closer to the ground truth, and this is reflected in our results. For evaluation we use the ICDAR/GREC 2013 staff removal database. Our method achieves superior performance in comparison to other conventional approaches on the same dataset.
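The weighted generator objective can be written down directly; the PyTorch sketch below shows one common form of such a combination, where the relative weight `lam` is an assumed value (the paper states only that the combination is weighted).

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # adversarial term
l2 = nn.MSELoss()              # pixel-wise reconstruction term

def generator_loss(disc_logits_fake, fake_img, real_img, lam=100.0):
    """Weighted combination of adversarial and L2 losses.

    disc_logits_fake: discriminator logits for the generated patches.
    lam: relative weight of the L2 term (illustrative value).
    """
    real_labels = torch.ones_like(disc_logits_fake)
    adv = bce(disc_logits_fake, real_labels)  # try to fool the discriminator
    rec = l2(fake_img, real_img)              # stay close to ground truth
    return adv + lam * rec
```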

Journal ArticleDOI
TL;DR: A line-based date spotting approach using hidden Markov models (HMM) is proposed to detect date information in a given text, and the results show the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: Two series of diversified heterocyclic molecules, tetracyclic benzimidazoles and perimidines have been synthesized in good yields by condensation of acid anhydrides and diacids with various diamines using microwave irradiation for in vitro antiproliferative activity against five human cancer cell lines.
Abstract: Benzimidazoles and perimidines are subsidiary structures for research and development of new biologically active molecules and have established prominence because of their promising biological activities. Two series of diversified heterocyclic molecules, tetracyclic benzimidazole derivatives and tetracyclic and pentacyclic perimidine derivatives, have been synthesized in good yields by condensation of acid anhydrides and diacids with various diamines using microwave irradiation. All synthesized derivatives were fully characterized and evaluated for in vitro antiproliferative activity against five human cancer cell lines. Compounds 3a (breast T47D, lung NCI H-522), 3b (colon HCT-15), 3d (lung NCI H-522, ovary PA-1), 3f (breast T47D, liver HepG2) and 5a (breast T47D) exhibited good anticancer activity, with IC50 values ranging from 7.5 ± 0.3 μM to 14.6 ± 0.4 μM.

Journal ArticleDOI
TL;DR: This paper presents a methodology to analyse 3D signatures captured using the Leap motion sensor, with the help of a new feature-set extracted from the convex hull vertices enclosing the signature; k-NN and HMM classifiers are used to classify signatures.
Abstract: Recognition of a signature is a method of identification, whereas verification decides its genuineness. Though recognition and verification both play important roles in forensic sciences, recognition is of special importance to the banking sector. In this paper, we present a methodology to analyse 3D signatures captured using the Leap motion sensor, with the help of a new feature-set extracted from the convex hull vertices enclosing the signature. We have used k-NN and HMM classifiers to classify signatures. Experiments carried out using our dataset as well as publicly available datasets reveal that the proposed feature-set can reduce the computational burden significantly compared to existing features. A 10-fold computational gain can be achieved, with no noticeable loss in performance, using the proposed feature-set as compared with existing high-level features, due to the significant reduction in feature vector size. On a large dataset of 1600 samples, two of the existing features take approximately 60 s and 3 s to recognise signatures using the k-NN and HMM classifiers, whereas features constructed from convex hull vertices take 1.9 s and 0.4 s, respectively. Our proposed system can be used in applications where recognition and verification need to be performed quickly on large datasets comprising billions of samples.
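A minimal scipy sketch of the hull-vertex feature idea is shown below; keeping only the convex hull vertices (typically far fewer than the raw samples) is what shrinks the feature vector, though the normalization and flattening here are illustrative choices, not necessarily the paper's.

```python
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull_features(points):
    """Compact signature descriptor from convex hull vertices.

    points: (N, 3) array of Leap-motion pen-tip samples.
    """
    hull = ConvexHull(points)
    verts = points[hull.vertices]             # hull vertices only
    centered = verts - verts.mean(axis=0)     # translation invariance
    scale = max(np.linalg.norm(centered, axis=1).max(), 1e-12)
    return (centered / scale).ravel()         # flattened feature vector
```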

Posted Content
TL;DR: In this paper, a multimodal deep network which takes both offline and online modality of the data as input in order to explore the information from both the modalities jointly for script identification task is proposed.
Abstract: In this paper, we propose a novel approach to word-level Indic script identification using only character-level data in the training stage. The advantages of using character-level data for training are outlined in Section I. Our method uses a multimodal deep network which takes both offline and online modalities of the data as input in order to jointly exploit the information from both modalities for the script identification task. We take handwritten data in either modality as input, and the opposite modality is generated through intermodality conversion. Thereafter, we feed this offline-online modality pair to our network. Hence, along with the advantage of utilizing information from both modalities, it can work as a single framework for both offline and online script identification simultaneously, which alleviates the need to design two separate script identification modules for the individual modalities. One more major contribution is that we propose a novel conditional multimodal fusion scheme to combine the information from the offline and online modalities, which takes into account the real origin of the data being fed to the network and thus combines adaptively. An exhaustive experiment has been done on a dataset consisting of English and six Indic scripts. Our proposed framework clearly outperforms frameworks based on traditional classifiers with handcrafted features as well as deep learning based methods, by a clear margin. Extensive experiments show that using only character-level training data can achieve state-of-the-art performance similar to that obtained with traditional training using word-level data in our framework.

Posted Content
TL;DR: The proposed method can successfully identify some of the important traffic anomalies such as vehicles not following lane driving, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in wrong directions.
Abstract: Classifying time series data using neural networks is a challenging problem when the length of the data varies. Video object trajectories, which are key to many visual surveillance applications, are often found to be of varying length. If such trajectories are used to understand the behavior (normal or anomalous) of moving objects, they need to be represented correctly. In this paper, we propose video object trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and Variational Autoencoder (VAE) architecture. First, we introduce a high-level representation of object trajectories in color gradient form. In the next stage, a semi-supervised way to annotate moving object trajectories extracted using Temporal Unknown Incremental Clustering (TUIC) is applied for trajectory class labeling. Anomalous trajectories are separated using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture is used for trajectory classification and anomaly detection. The results obtained using publicly available surveillance video datasets reveal that the proposed method can successfully identify some important traffic anomalies, such as vehicles not following lane driving, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in the wrong direction. The proposed method detects the above anomalies with higher accuracy compared to existing anomaly detection methods.
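One way to realize the color-gradient representation is to draw each trajectory segment with a hue proportional to its time index, so that direction and temporal order become visible to the CNN as a color progression. The sketch below is an illustrative reading of that idea; the image size and hue mapping are assumptions.

```python
import numpy as np
import cv2

def trajectory_to_color_gradient(traj, size=64):
    """Render a (T, 2) trajectory as a color-gradient image."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    t = traj - traj.min(axis=0)                       # shift to origin
    t = (t / (t.max() + 1e-8) * (size - 1)).astype(int)
    T = len(t)
    for i in range(T - 1):
        hue = int(179 * i / max(T - 2, 1))            # time index -> hue
        color = cv2.cvtColor(np.uint8([[[hue, 255, 255]]]),
                             cv2.COLOR_HSV2BGR)[0, 0]
        cv2.line(img,
                 tuple(int(v) for v in t[i]),
                 tuple(int(v) for v in t[i + 1]),
                 tuple(int(c) for c in color), 1)
    return img
```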

Proceedings ArticleDOI
01 Aug 2018
TL;DR: The proposed Convolutional Recurrent Generative model is the first of its kind that can handle images of varying widths; it is compared with some state-of-the-art methods for image translation.
Abstract: Conversion of one font to another is very useful in real-life applications. In this paper, we propose a Convolutional Recurrent Generative model to solve the word-level font transfer problem. Our network is able to convert the font style of any printed text image from its current font to a required font. The network is trained end-to-end on complete word images, thus eliminating pre-processing steps such as character segmentation. We extend our model to a conditional setting that helps to learn a one-to-many mapping function. We employ a novel convolutional recurrent architecture in the Generator that efficiently deals with word images of arbitrary width and helps to maintain the consistency of the final images after concatenating the generated image patches of the target font. Besides the Generator and the Discriminator networks, we employ a Classification network to classify the generated word images of the converted font style into their respective font categories. Most earlier works on image translation are performed on square images; our proposed architecture is the first of its kind that can handle images of varying widths. Word images generally have varying width depending on the number of characters present. We test our model on a synthetically generated font dataset and compare our method with some state-of-the-art methods for image translation. The superior performance of our network on the same dataset demonstrates the ability of our model to learn font distributions.

Posted Content
TL;DR: A novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition, achieving superior performance compared to other conventional approaches.
Abstract: In this paper, we introduce a novel technique to recover the pen trajectory of offline characters, which is a crucial step for handwritten character recognition. Generally, the online acquisition approach has an advantage over its offline counterpart, as the online technique keeps track of the pen movement. Hence, pen-tip trajectory retrieval from offline text can bridge the gap between online and offline methods. Our proposed framework employs a sequence-to-sequence model consisting of an encoder-decoder LSTM module. Our encoder module consists of a Convolutional LSTM network, which takes an offline character image as input and encodes the feature sequence into a hidden representation. The output of the encoder is fed to a decoder LSTM, and we obtain the successive coordinate points from every time step of the decoder LSTM. Although the sequence-to-sequence model is a popular paradigm in various computer vision and language translation tasks, the main contribution of our work lies in designing an end-to-end network for a decades-old popular problem in the Document Image Analysis community. Tamil, Telugu and Devanagari characters from the LIPI Toolkit dataset are used for our experiments. Our proposed method achieves superior performance compared to other conventional approaches.