Proceedings Article

Generating synthetic handwriting using n-gram letter glyphs

TL;DR: A non-parametric, data-driven generation scheme that mimics the variation observed in handwritten glyph samples to synthesize natural-looking glyphs, with applications in text personalization and in generating synthetic data for recognition systems.
Abstract: We propose a framework for the synthesis of natural semi-cursive handwritten Latin script that can find application in text personalization, or in the generation of synthetic data for recognition systems. Our method is based on the generation of synthetic n-gram letter glyphs and their subsequent concatenation. We propose a non-parametric, data-driven generation scheme that is able to mimic the variation observed in handwritten glyph samples and thereby synthesize natural-looking synthetic glyphs. These synthetic glyphs are then stitched together to form complete words, using a spline-based concatenation scheme. As a further refinement, our method is able to generate pen-lifts, giving our results a natural semi-cursive look. Through subjective experiments and detailed analysis of the results, we demonstrate the effectiveness of our formulation in generating natural-looking synthetic script.
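
To make the concatenation step concrete, the following Python sketch shows one way a spline-based stitch could work; the function name stitch_glyphs, the 3-point junction context, and the use of SciPy's CubicSpline are illustrative assumptions on our part, not the authors' implementation.

    # Illustrative sketch (not the paper's code): join two glyph stroke
    # trajectories with a smooth cubic-spline bridge around the junction.
    import numpy as np
    from scipy.interpolate import CubicSpline

    def stitch_glyphs(left, right, n_bridge=10):
        # left, right: (N, 2) arrays of pen-tip (x, y) samples for two glyphs.
        pts = np.vstack([left[-3:], right[:3]])  # local context at the junction
        t = np.arange(len(pts))                  # simple index parameterization
        spline = CubicSpline(t, pts)             # C2-continuous through the gap
        bridge = spline(np.linspace(t[0], t[-1], n_bridge))
        return np.vstack([left[:-3], bridge, right[3:]])

A pen-lift, as described above, would then correspond to omitting the bridge at a sampled junction and starting a new stroke.
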
Citations
Proceedings ArticleDOI
01 Aug 2018
TL;DR: A method for generating clones of a target writer's handwritten character images (called handwritten character clones, or HCCs) from an incomplete seed character set, which contains at most one example (possibly none) of the writer's actual handwriting per character.
Abstract: In this paper, we propose a method for generating clones of a target writer's handwritten character images (called handwritten character clones, or HCCs) using an incomplete seed character set, which contains at most one example (possibly none) of the writer's actual handwriting per character. In HCC generation, not a single HCC but a distribution of HCCs should be created for each character, because actual handwriting varies even when the same writer writes the same character; this is difficult to achieve from an incomplete seed character set. To solve the problem, the proposed method first creates a number of HCC distributions for each character by clustering a set of handwritten character images offered by other writers. Next, for each character contained in the seed character set, we choose the distribution that best fits its example. Finally, for the remaining characters, we estimate the best distribution using collaborative filtering. In pilot experiments on Japanese character images, the proposed method successfully generated varied HCCs with a reasonable level of quality for each character.
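
As a rough sketch of the cluster-then-select step described above, assuming character images have already been reduced to feature vectors; k-means, the Gaussian fit, and all names here are our illustrative choices, not necessarily the authors' setup.

    # Illustrative sketch: build candidate style distributions for one character
    # from other writers' samples, then pick the one closest to the target
    # writer's single seed example.
    import numpy as np
    from sklearn.cluster import KMeans

    def best_fit_distribution(other_feats, seed_feat, n_clusters=8):
        # other_feats: (N, D) features from other writers; seed_feat: (D,).
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(other_feats)
        dists = np.linalg.norm(km.cluster_centers_ - seed_feat, axis=1)
        members = other_feats[km.labels_ == np.argmin(dists)]
        # Treat the winning cluster as the writer's distribution for this
        # character; e.g. fit a Gaussian to it and sample clones from that.
        return members.mean(axis=0), np.cov(members, rowvar=False)

For characters absent from the seed set, the paper instead estimates the best distribution via collaborative filtering, rather than by direct distance as above.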

4 citations


Cites background or methods from "Generating synthetic handwriting using n-gram letter glyphs"

  • ...In a typical procedure of the sentence-wise methods [6], [7], [8], which have been more widely studied in the past decade [15], a sentence is treated as a sequence of 2-dimensional pen-tip locations, and a generator of such sequences is learned....


  • ...In addition, with the development of deep network technologies such as autoencoders (AE) [4] and generative adversarial networks (GAN) [5], handwritten character generation [6], [7], [8], [9] has also become a hot topic in recent years; its purpose is to generate synthetic (or clone) images of handwritten characters that resemble a target writer's actual handwriting....


References
Book
01 Jan 1967

6,827 citations


"Generating synthetic handwriting us..." refers methods in this paper

  • ...This same approach was taken by Guyon et al. [10], where they used a lexicon of approximately 1000 entries processed from the Brown Corpus [13]....


  • ...This was developed by n-gram frequency analysis of Google's Trillion Word Corpus, while the Brown Corpus consists of about one million words....


Posted Content
TL;DR: This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.
Abstract: This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
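
To give a flavor of the one-point-at-a-time idea, here is a simplified PyTorch sketch; note that Graves' actual network uses a mixture-density output layer, whereas this version regresses pen offsets directly, and all names here are our own.

    # Simplified sketch of one-step-ahead handwriting generation: an LSTM
    # consumes (dx, dy, pen-up) points and predicts the next one.
    import torch
    import torch.nn as nn

    class PenLSTM(nn.Module):
        def __init__(self, hidden=256):
            super().__init__()
            self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 3)   # (dx, dy, pen-up logit)

        def forward(self, seq, state=None):
            out, state = self.lstm(seq, state)
            return self.head(out), state

    # Sampling: feed each predicted point back in as the next input.
    model, state = PenLSTM(), None
    point, stroke = torch.zeros(1, 1, 3), []
    with torch.no_grad():
        for _ in range(100):
            pred, state = model(point, state)
            dx, dy, pen_logit = pred[0, -1]
            pen = torch.bernoulli(torch.sigmoid(pen_logit))
            point = torch.stack([dx, dy, pen]).reshape(1, 1, 3)
            stroke.append(point.flatten().tolist())

Conditioning on a text sequence, as in the full handwriting-synthesis model, would additionally feed an attention-weighted character encoding into the recurrence at each step.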

3,551 citations


"Generating synthetic handwriting us..." refers background in this paper

  • ...In a recent work, handwriting synthesis was also attempted as a pure sequence-generation task [9], with rather promising results....


Book
01 Dec 2005
TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and includes detailed algorithms for the supervised-learning problem for both regression and classification.
Abstract: Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
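
A minimal numpy sketch of the core computation the book formalizes (essentially its Algorithm 2.1): the GP posterior mean and covariance under an RBF kernel. Hyperparameters are fixed by hand here; in practice they are learned, e.g. by maximizing the marginal likelihood.

    # Minimal GP regression: posterior mean and covariance at test inputs.
    import numpy as np

    def rbf(a, b, ell=1.0, sf=1.0):
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return sf**2 * np.exp(-0.5 * d2 / ell**2)

    def gp_posterior(X, y, Xs, noise=0.1):
        # X: (N, D) training inputs, y: (N,) targets, Xs: (S, D) test inputs.
        K = rbf(X, X) + noise**2 * np.eye(len(X))
        L = np.linalg.cholesky(K)                    # O(N^3) for the full GP
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        Ks = rbf(X, Xs)
        v = np.linalg.solve(L, Ks)
        return Ks.T @ alpha, rbf(Xs, Xs) - v.T @ v   # mean, covariance

The cubic cost of the Cholesky factorization is what motivates the sparse approximations discussed in the following two references.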

2,732 citations


"Generating synthetic handwriting us..." refers methods in this paper

  • ...With these in mind, we opted for Gaussian Process (GP) regression [18]....


Journal Article
TL;DR: A new unifying view that includes all existing proper probabilistic sparse approximations for Gaussian process regression; it relies on expressing the effective prior that each method uses, and it highlights the relationships between existing methods.
Abstract: We provide a new unifying view that includes all existing proper probabilistic sparse approximations for Gaussian process regression. Our approach relies on expressing the effective prior which each method uses. This allows new insights to be gained, and highlights the relationships between existing methods. It also allows for a clear, theoretically justified ranking of how closely the known approximations match the corresponding full GPs. Finally, we point directly to designs of new, better sparse approximations that combine the best of the existing strategies within attractive computational constraints.
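
The common ingredient in this unifying view is a low-rank "effective prior" covariance built from m inducing inputs; the sketch below shows that matrix under our own notation and a toy RBF kernel.

    # The rank-m matrix Q_nn = K_nm K_mm^{-1} K_mn underlying the sparse
    # approximations; DTC, FITC, PITC, etc. differ mainly in how they
    # correct its (block-)diagonal toward the exact K_nn.
    import numpy as np

    def rbf(a, b, ell=1.0):
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-0.5 * d2 / ell**2)

    def effective_prior_cov(X, Z, jitter=1e-8):
        # X: (n, d) training inputs, Z: (m, d) inducing inputs, m << n.
        Kmm = rbf(Z, Z) + jitter * np.eye(len(Z))
        Knm = rbf(X, Z)
        return Knm @ np.linalg.solve(Kmm, Knm.T)   # rank <= m surrogate for K_nn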

1,881 citations


"Generating synthetic handwriting us..." refers background in this paper

  • ...Thus the joint posterior X_syn as given by [15] is, ...


Proceedings Article
05 Dec 2005
TL;DR: It is shown that this new Gaussian process (GP) regression model can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
Abstract: We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by gradient-based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M²N) training cost and O(M²) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input-dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and that it significantly outperforms other approaches in this regime.
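
To illustrate where the O(M²N) cost comes from, here is a sketch of a FITC-style predictive mean with M pseudo-inputs; the variable names and fixed kernel are our assumptions, and in the actual method the pseudo-input locations Z would themselves be optimized against the marginal likelihood rather than given.

    # Sketch of SPGP/FITC-style prediction: only M x M systems are solved,
    # so training costs O(M^2 N) rather than the full GP's O(N^3).
    import numpy as np

    def rbf(a, b, ell=1.0):
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-0.5 * d2 / ell**2)

    def fitc_mean(X, y, Z, Xs, noise=0.1, jitter=1e-8):
        # X: (N, D) data, Z: (M, D) pseudo-inputs, Xs: test inputs, M << N.
        Kmm = rbf(Z, Z) + jitter * np.eye(len(Z))
        Knm = rbf(X, Z)
        # Input-dependent noise: exact prior variance minus its low-rank part.
        q_diag = np.einsum('ij,ij->i', Knm @ np.linalg.inv(Kmm), Knm)
        lam = 1.0 - q_diag + noise**2            # rbf(x, x) == 1 here
        A = Kmm + Knm.T @ (Knm / lam[:, None])   # M x M system
        return rbf(Xs, Z) @ np.linalg.solve(A, Knm.T @ (y / lam))

The per-point noise term lam is the "particular input-dependent noise" the abstract mentions: it grows where the pseudo-inputs explain the prior variance poorly.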

1,708 citations


"Generating synthetic handwriting us..." refers methods in this paper

  • ...Under the Bayesian framework, the non-parametric GP model provides a flexible and elegant method for non-linear regression [20, 26]....
