
Showing papers by "Sandra Maria Aluísio" published in 2019


Journal ArticleDOI
TL;DR: It is shown that gross errors present even in state-of-the-art systems can be avoided, that an accurate acoustic model can be built in a hierarchical fashion, and that accurate, robust recognition rates can be obtained even with a small amount of data.
Abstract: In low-resource scenarios, for example, small datasets or a lack of available computational resources, state-of-the-art deep learning methods for speech recognition have been known to fail. It is possible to achieve more robust models if care is taken to ensure the learning guarantees provided by statistical learning theory. This work presents a shallow, hybrid approach using a convolutional neural network feature extractor fed into a hierarchical tree of support vector machines for classification. Here, we show that gross errors present even in state-of-the-art systems can be avoided and that an accurate acoustic model can be built in a hierarchical fashion. Furthermore, we prove that our algorithm adheres to the learning guarantees provided by statistical learning theory. The acoustic model produced in this work outperforms traditional hidden Markov models, and the hierarchical support vector machine tree outperforms a multi-class multilayer perceptron classifier using the same features. More importantly, we isolate the performance of the acoustic model and provide results at both the frame and phoneme level, assessing the true robustness of the model. We show that accurate and robust recognition rates can be obtained even with a small amount of data.
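The pipeline described lends itself to a compact sketch: a small CNN acts purely as a feature extractor, and its output vectors feed a two-level tree of SVMs (a root classifier picking a broad phoneme group, then a per-group classifier picking the phoneme). This is a minimal illustration with a hypothetical phoneme grouping, toy input shapes, and default hyperparameters; it is not the authors' architecture or configuration.

```python
# Sketch: CNN feature extractor feeding a hierarchical tree of SVMs.
# Groupings, shapes, and hyperparameters below are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class FeatureCNN(nn.Module):
    """Left untrained here for brevity; in practice the CNN is trained first."""
    def __init__(self, n_features=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> fixed-size vector
        )
        self.proj = nn.Linear(32, n_features)

    def forward(self, x):
        return self.proj(self.net(x).flatten(1))

# Hypothetical phoneme groups: the root SVM picks a broad class,
# then a per-group SVM picks the phoneme within that class.
GROUPS = {"vowel": [0, 1, 2], "fricative": [3, 4], "plosive": [5, 6, 7]}

def train_svm_tree(feats, labels):
    group_of = {ph: g for g, phs in GROUPS.items() for ph in phs}
    root = SVC(kernel="rbf").fit(feats, [group_of[y] for y in labels])
    leaves = {}
    for g, phs in GROUPS.items():
        mask = np.isin(labels, phs)
        leaves[g] = SVC(kernel="rbf").fit(feats[mask], labels[mask])
    return root, leaves

def predict(root, leaves, feats):
    groups = root.predict(feats)  # stage 1: broad class
    return np.array([leaves[g].predict(f[None])[0]  # stage 2: phoneme
                     for g, f in zip(groups, feats)])

# Toy run on random "spectrogram" frames: (batch, 1 channel, 40 mels, 11 frames).
cnn = FeatureCNN()
with torch.no_grad():
    X = cnn(torch.randn(120, 1, 40, 11)).numpy()
y = np.tile(np.arange(8), 15)  # 15 toy frames per phoneme class
root, leaves = train_svm_tree(X, y)
print("predicted phonemes:", predict(root, leaves, X[:5]))
```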

9 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: This work presents a two-fold novelty: a carefully designed CNN architecture, together with a knowledge-driven classifier, achieves nearly state-of-the-art phoneme recognition results with absolutely no pretraining or external weight initialization.
Abstract: A common belief in the community is that deep learning requires large datasets to be effective. We show that, with careful parameter selection, deep feature extraction can be applied even to small datasets. We also explore exactly how much data is necessary to guarantee learning, via convergence analysis and by calculating the shattering coefficient of the algorithms used. Another problem is that state-of-the-art results are rarely reproducible because they use proprietary datasets, pretrained networks, and/or weight initializations from other, larger networks. We present a two-fold novelty for this situation: a carefully designed CNN architecture, together with a knowledge-driven classifier, achieves nearly state-of-the-art phoneme recognition results with absolutely no pretraining or external weight initialization. We also beat the best replication study of the state of the art with a 28% frame error rate (FER). More importantly, we achieve transparent, reproducible frame-level accuracy and, additionally, perform a convergence analysis to show the generalization capacity of the model, providing statistical evidence that our results are not obtained by chance. Furthermore, we show how algorithms with strong learning guarantees can not only benefit from raw data extraction but also contribute more robust results.
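The sample-size question above can be made concrete with one classical form of the deviation bound, P(sup_f |R_emp(f) − R(f)| > ε) ≤ 2·N(F, 2n)·exp(−nε²/4), where N(F, m) is the shattering coefficient (the exact constants vary across textbook statements). A minimal sketch, assuming a shattering coefficient bounded via Sauer's lemma with a placeholder VC dimension rather than the coefficient derived in the paper:

```python
# Sketch: how many samples n are needed before the deviation bound drops
# below a target confidence delta? VC dimension and bound constants here
# are placeholder assumptions, not the paper's derived values.
import math

def log_shatter(m, vc_dim):
    """log N(F, m): 2^m while m <= d (everything can be shattered),
    then Sauer's lemma bound (e*m/d)^d."""
    if m <= vc_dim:
        return m * math.log(2.0)
    return vc_dim * math.log(math.e * m / vc_dim)

def log_bound(n, eps, vc_dim):
    """log of 2 * N(F, 2n) * exp(-n * eps^2 / 4), computed in log space
    to avoid overflow for large n."""
    return math.log(2.0) + log_shatter(2 * n, vc_dim) - n * eps * eps / 4.0

def samples_needed(eps=0.05, delta=0.05, vc_dim=50):
    """Smallest n (doubling, then bisection) with bound <= delta."""
    target = math.log(delta)
    hi = 2
    while log_bound(hi, eps, vc_dim) > target:
        hi *= 2
    lo = hi // 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if log_bound(mid, eps, vc_dim) <= target:
            hi = mid
        else:
            lo = mid
    return hi

print(samples_needed())          # n guaranteeing eps=0.05 deviation w.p. >= 95%
print(samples_needed(eps=0.10))  # looser tolerance -> far fewer frames needed
```

Plugging in the shattering coefficient actually derived for an algorithm (in place of the Sauer placeholder) turns this into the kind of statistical evidence of generalization the abstract describes.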

2 citations


Journal ArticleDOI
01 Sep 2019
TL;DR: A lexicon is presented that will be used to support the task of automatically detecting and correcting discourse marker errors; the evaluated methods identify some types of errors and can potentially identify many others, provided new lexical inputs are incorporated.
Abstract: Discourse markers are words and expressions (such as: firstly, then, for example, because, as a result, likewise, in comparison, in contrast) that explicitly state the relational structure of the information in the text, i.e., signalling a sequential relationship between the current message and the previous discourse. Using these markers improves the cohesion and coherence of texts, facilitating reading comprehension. Although often included in tools that support the rhetorical structuring of texts, discourse markers have hardly been explored in writing support tools for learners of a second language. However, learners of a second language, including those at advanced levels, have trouble producing these lexical items, frequently replacing them with items from their native language or with literal translations of them, which often do not yield proper lexical items in the second language. In addition, students learn a single marker per function and use it repeatedly, producing monotonous texts. With the aim of helping to reduce these difficulties, this paper presents a lexicon that will be used to support the task of automatically detecting and correcting discourse marker errors. Several heuristics have been evaluated to generate different types of errors. Automatic translation methods were used to semi-automatically compile the lexicon used in these heuristics. Similarity measures were also combined with these heuristics to correct discourse marker errors. The evaluated methods proved to be suitable for the task of identifying some types of discourse marker errors and can potentially identify many others, as long as new lexical inputs are incorporated into them.
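A minimal sketch of the detection-and-correction step described above: a candidate marker is checked against the lexicon and, if absent, replacements are ranked by string similarity. The tiny lexicon, the use of difflib's ratio as the similarity measure, and the literal-translation error below are all illustrative assumptions; the paper's lexicon is compiled semi-automatically via machine translation and its similarity measures may differ.

```python
# Sketch: flag a candidate discourse marker missing from the lexicon and
# rank replacements by string similarity. Lexicon and example are hypothetical.
import difflib

# Hypothetical lexicon fragment: marker -> rhetorical function.
LEXICON = {
    "on the other hand": "contrast",
    "in contrast": "contrast",
    "as a result": "result",
    "for example": "exemplification",
    "firstly": "sequence",
}

def correct_marker(candidate, n=3, cutoff=0.6):
    """Return the candidate if it is a known marker; otherwise return None
    plus the closest lexicon entries by difflib's similarity ratio
    (a stand-in for the paper's similarity measures)."""
    if candidate.lower() in LEXICON:
        return candidate, []
    suggestions = difflib.get_close_matches(
        candidate.lower(), LEXICON.keys(), n=n, cutoff=cutoff)
    return None, suggestions

# A learner error of the "literal translation" type the abstract mentions.
marker, fixes = correct_marker("in the other hand")
print(marker, fixes)  # None ['on the other hand']
```

New markers added to LEXICON immediately become detectable and suggestible, which mirrors the abstract's point that the methods can cover further error types as new lexical inputs are incorporated.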

1 citation