Showing papers by "Brno University of Technology" published in 2011


Proceedings Article
01 Jan 2011
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.
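The core acoustic-model operation such a toolkit implements is scoring feature frames against Gaussian mixture models. Below is a minimal numpy sketch of per-frame log-likelihoods under a diagonal-covariance GMM, purely for illustration; this is not Kaldi code or its API, and all names and values are invented.

```python
import numpy as np

def diag_gmm_loglike(frames, weights, means, variances):
    """Log-likelihood of feature frames under a diagonal-covariance GMM.

    frames:    (T, D) feature vectors (e.g. MFCCs)
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) component variances (diagonal covariances)
    """
    T, D = frames.shape
    # log N(x; mu_k, diag(var_k)) for every frame and component
    diff = frames[:, None, :] - means[None, :, :]                        # (T, K, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))              # (K,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp    # (T, K)
    # log-sum-exp over components gives the per-frame log-likelihood
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

# toy usage: 3 frames of 2-dim features scored against a 2-component GMM
frames = np.array([[0.0, 0.1], [1.0, 1.2], [-0.5, 0.3]])
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.array([[1.0, 1.0], [0.5, 0.5]])
print(diag_gmm_loglike(frames, weights, means, variances))
```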

5,857 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to a speedup of more than 15 times for both the training and testing phases, and possibilities for reducing the number of parameters in the model.
Abstract: We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to a speedup of more than 15 times for both the training and testing phases. Next, we show the importance of using the backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster both during training and testing, and more accurate than the basic one.
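One widely cited way to obtain this kind of speedup is to factorize the output layer with word classes, so the softmax is computed over the classes plus the words of a single class rather than over the whole vocabulary. A toy sketch of that factorization follows; it is illustrative only, not the authors' implementation, and all names and values are invented.

```python
import numpy as np

# Factored softmax: P(w | h) = P(class(w) | h) * P(w | class(w), h).
# Instead of normalizing over the full vocabulary V, we normalize over
# C classes plus the words inside one class, which is much cheaper when
# C and the class sizes are both around sqrt(V).

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def factored_word_prob(h, word, word2class, class2words, W_class, W_word):
    """h: hidden state (H,); W_class: (C, H); W_word: dict class -> (|class|, H)."""
    c = word2class[word]
    p_class = softmax(W_class @ h)[c]
    in_class = class2words[c]
    p_word_given_class = softmax(W_word[c] @ h)[in_class.index(word)]
    return p_class * p_word_given_class

# toy example: 4-word vocabulary split into 2 classes, hidden size 3
rng = np.random.default_rng(0)
word2class = {"the": 0, "a": 0, "cat": 1, "dog": 1}
class2words = {0: ["the", "a"], 1: ["cat", "dog"]}
W_class = rng.normal(size=(2, 3))
W_word = {0: rng.normal(size=(2, 3)), 1: rng.normal(size=(2, 3))}
h = rng.normal(size=3)
print(factored_word_prob(h, "cat", word2class, class2words, W_class, W_word))
```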

1,675 citations


Journal ArticleDOI
TL;DR: In this article, the authors review the history, types, structure and especially the different synthesis methods for CNT preparation, including arc discharge, laser ablation and chemical vapour deposition.
Abstract: Carbon nanotubes (CNTs) have been under scientific investigation for more than fifteen years because of their unique properties that predestine them for many potential applications. The fields of nanotechnology and nanoscience push their investigation forward to produce CNTs with parameters suitable for future applications. It is evident that new approaches to their synthesis need to be developed and optimized. In this paper we review the history, types, structure and especially the different synthesis methods for CNT preparation, including arc discharge, laser ablation and chemical vapour deposition. Moreover, we mention some rarely used variants of arc discharge deposition that involve arc discharge in liquid solutions, in contrast to the standard deposition in a gas atmosphere. In addition, methods for the synthesis of uniform vertically aligned CNTs using lithographic techniques for catalyst deposition, as well as a method utilizing nanoporous anodized aluminium oxide as a pattern for selective CNT growth, are also reported.

648 citations


Proceedings ArticleDOI
01 Dec 2011
TL;DR: This work describes how to effectively train neural network based language models on large data sets and introduces a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model.
Abstract: We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance is observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around 10% relative reduction of word error rate on an English Broadcast News speech recognition task against a large 4-gram model trained on 400M tokens.
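The hash-based maximum entropy idea can be pictured as mapping n-gram features into a fixed-size weight table instead of keeping an explicit feature dictionary. A toy sketch follows; it is not the authors' code, and the table size and interface are illustrative assumptions.

```python
import numpy as np

# Feature hashing for an n-gram maximum entropy model: every (history, word)
# n-gram is mapped by a hash function into a fixed-size parameter table, so
# the model never has to store an explicit feature dictionary.

TABLE_SIZE = 2 ** 20
weights = np.zeros(TABLE_SIZE)

def ngram_feature_indices(history, word, max_order=3):
    """Hash the word together with its last 0..max_order-1 history words."""
    idx = []
    for n in range(max_order):
        context = tuple(history[len(history) - n:]) if n else ()
        idx.append(hash((context, word)) % TABLE_SIZE)
    return idx

def maxent_score(history, word):
    """Unnormalized log-score; a softmax over the vocabulary would normalize it."""
    return sum(weights[i] for i in ngram_feature_indices(history, word))

# toy usage: bump the weights for one trigram and score it
for i in ngram_feature_indices(["we", "describe"], "how"):
    weights[i] += 0.5
print(maxent_score(["we", "describe"], "how"))
```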

539 citations


Proceedings Article
01 Aug 2011
TL;DR: It is concluded that for both small and moderately sized tasks, combining the models yields new state-of-the-art results that are significantly better than the performance of any individual model.
Abstract: We present results obtained with several advanced language modeling techniques, including a class-based model, a cache model, a maximum entropy model, a structured language model, a random forest language model and several types of neural network based language models. We show results obtained after combining all these models by using linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state-of-the-art results with a combination of models that is significantly better than the performance of any individual model. The perplexity reductions obtained against a Good-Turing trigram baseline are over 50%, and over 40% against a modified Kneser-Ney smoothed 5-gram. Index Terms: language modeling, neural networks, model combination, speech recognition
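Linear interpolation, as used for the combination above, simply mixes the component models' probabilities with weights that sum to one; perplexity is then measured on the combined model. A toy sketch with dummy component models (the models, weights and data are illustrative):

```python
import math

# Linear interpolation of language models: the combined probability is a
# weighted sum of the component models' probabilities, with weights that
# sum to one (in practice tuned on held-out data, e.g. by EM).

def interpolate(prob_fns, lambdas, history, word):
    return sum(lam * p(history, word) for lam, p in zip(lambdas, prob_fns))

def perplexity(prob_fn, sentences):
    log_prob, n_words = 0.0, 0
    for sent in sentences:
        history = []
        for word in sent:
            log_prob += math.log(prob_fn(history, word))
            history.append(word)
            n_words += 1
    return math.exp(-log_prob / n_words)

# toy example with two dummy "models" over a 4-word vocabulary
uniform = lambda h, w: 0.25
unigram = lambda h, w: {"the": 0.4, "cat": 0.3, "sat": 0.2, "dog": 0.1}[w]
combined = lambda h, w: interpolate([uniform, unigram], [0.3, 0.7], h, w)
print(perplexity(combined, [["the", "cat", "sat"]]))
```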

326 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used jute, flax, and hemp to develop a new insulating material from renewable resources with building-physics and mechanical properties comparable to those of commonly used insulation materials.

309 citations


Journal ArticleDOI
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.

304 citations


Proceedings Article
01 Jan 2011
TL;DR: To recognize language in the iVector space, three different linear classifiers are experimented with: one based on a generative model, where classes are modeled by Gaussian distributions with a shared covariance matrix, and two discriminative classifiers, namely a linear Support Vector Machine and Logistic Regression.
Abstract: The concept of so-called iVectors, where each utterance is represented by a fixed-length, low-dimensional feature vector, has recently become very successful in speaker verification. In this work, we apply the same idea in the context of Language Recognition (LR). To recognize language in the iVector space, we experiment with three different linear classifiers: one based on a generative model, where classes are modeled by Gaussian distributions with a shared covariance matrix, and two discriminative classifiers, namely a linear Support Vector Machine and Logistic Regression. The tests were performed on the NIST LRE 2009 dataset and the results were compared with state-of-the-art LR based on Joint Factor Analysis (JFA). While the iVector system offers better performance, it also seems to be complementary to JFA, as their fusion shows a further improvement.
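The generative classifier mentioned above reduces to a linear decision rule because all classes share one covariance matrix. A small numpy sketch of that idea on made-up "i-vectors" follows; the dimensions and data are illustrative, not the paper's setup.

```python
import numpy as np

# Generative linear classifier for language recognition in the i-vector space:
# each language is modeled by a Gaussian with its own mean and a single
# covariance matrix shared by all classes, which makes the decision
# function linear in the i-vector.

def train_shared_cov_gaussian(ivectors, labels):
    classes = sorted(set(labels))
    means = {c: ivectors[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([ivectors[labels == c] - means[c] for c in classes])
    shared_cov = np.cov(centered, rowvar=False)
    return means, np.linalg.inv(shared_cov)

def classify(ivec, means, prec):
    # With a shared covariance the quadratic term cancels; this is equivalent
    # to picking the class with the highest linear discriminant score.
    scores = {c: m @ prec @ ivec - 0.5 * m @ prec @ m for c, m in means.items()}
    return max(scores, key=scores.get)

# toy usage: 2-dimensional "i-vectors" for two languages
rng = np.random.default_rng(1)
x = np.vstack([rng.normal([0, 0], 1.0, (50, 2)), rng.normal([3, 3], 1.0, (50, 2))])
y = np.array(["en"] * 50 + ["cz"] * 50)
means, prec = train_shared_cov_gaussian(x, y)
print(classify(np.array([2.8, 3.1]), means, prec))
```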

248 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly experimentally tested and dimensionality reduction of i-vectors before entering the PLDA-HT modeling is investigated.
Abstract: In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly experimentally tested. The i-vectors are scored using a simple cosine distance and advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and a heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate dimensionality reduction of i-vectors before entering the PLDA-HT modeling. The results are very competitive: on the NIST 2010 SRE task, the results of a single full-covariance LDA-PLDA-HT system approach those of a complex fused system.
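Cosine scoring, the simplest of the techniques listed, compares the enrollment and test i-vectors by their angle alone. A minimal sketch follows; the threshold and vectors are illustrative, and in practice channel compensation such as LDA/WCCN would precede the scoring.

```python
import numpy as np

# Cosine scoring of i-vectors: the verification score for a trial is simply
# the cosine similarity between the enrollment and test i-vectors, compared
# against a decision threshold.

def cosine_score(w_enroll, w_test):
    return float(w_enroll @ w_test /
                 (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))

# toy usage
w1 = np.array([0.2, -1.3, 0.7, 0.1])
w2 = np.array([0.25, -1.1, 0.6, 0.0])
threshold = 0.5                      # illustrative value, tuned on dev data
score = cosine_score(w1, w2)
print(score, "accept" if score > threshold else "reject")
```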

194 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: The speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model, which provides up to 40% relative improvement on the NIST SRE 2010 evaluation task.
Abstract: Recently, i-vector extraction and Probabilistic Linear Discriminant Analysis (PLDA) have proven to provide state-of-the-art speaker verification performance. In this paper, the speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model. In our case, however, parameters of this function are estimated based on a discriminative training criterion. We propose to use the objective function to directly address the task in speaker verification: discrimination between same-speaker and different-speaker trials. Compared with a baseline which uses a generatively trained PLDA model, discriminative training provides up to 40% relative improvement on the NIST SRE 2010 evaluation task.
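The scoring function can be pictured as a symmetric quadratic form of the two i-vectors whose parameters are trained discriminatively. The sketch below shows one such form; the exact parameterization used in the paper may differ, and all values are illustrative.

```python
import numpy as np

# A symmetric quadratic scoring function of an i-vector pair (x1, x2):
#   s = x1'Lam x2 + x2'Lam x1 + x1'Gam x1 + x2'Gam x2 + c'(x1 + x2) + k
# In the discriminative setting, Lam, Gam, c and k are trained directly to
# separate same-speaker from different-speaker trials (the exact
# parameterization used in the paper may differ from this sketch).

def pair_score(x1, x2, Lam, Gam, c, k):
    return float(x1 @ Lam @ x2 + x2 @ Lam @ x1
                 + x1 @ Gam @ x1 + x2 @ Gam @ x2
                 + c @ (x1 + x2) + k)

# toy usage with illustrative parameters
d = 3
rng = np.random.default_rng(2)
Lam = rng.normal(size=(d, d))
Lam = 0.5 * (Lam + Lam.T)            # keep the cross term symmetric
Gam = -0.1 * np.eye(d)
c = np.zeros(d)
k = 0.0
x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(pair_score(x1, x2, Lam, Gam, c, k))
```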

193 citations


Book
20 Dec 2011
TL;DR: In this article, a combination of literature review, face-to-face interviews and focus group meetings was applied to complete the research objective, and six specific skills and associated behaviours were identified as being most important.
Abstract: It is recognized by academics and the community of practice that the management of people plays an important role in project management. Recent people skills research expresses the need to develop a better understanding of what good people management is. This paper proposes what project management practitioners consider to be skills and behaviours of an effective people project manager. A combination of literature review, face-to-face interviews and focus group meetings was applied to complete the research objective. Six specific skills and associated behaviours were identified as being most important. The results suggest that project managers would benefit from adopting these skills and behaviours to strengthen their managing people skills and behaviours to improve the successful delivery of projects. The findings also suggest that some skill sets and behaviours may be more appropriate for application in certain project environments such as IT or the construction industry.

Journal ArticleDOI
TL;DR: This paper presents a new approach to inverting (fitting) models of coupled dynamical systems based on state-of-the-art Kalman filtering, which promises to provide a significant advance in characterizing the functional architectures of distributed neuronal systems, even in the absence of known exogenous input.

Journal ArticleDOI
TL;DR: Three models of a constant-phase element consisting of passive R and C components are described, which can be used for practical realization of fractional analog differentiators and integrators, fractional oscillators, chaotic networks or for analog simulation of fractional control systems.
Abstract: The paper describes models of a constant-phase element consisting of passive R and C components. The models offer any input impedance argument (phase) between −90° and 0° over a selectable frequency band covering several decades. The design procedure makes it possible to choose values of average phase, phase ripple, frequency bandwidth, and total number of R and C elements. The model can cover three frequency decades with as few as five resistors and five capacitors. The models can be used for practical realization of fractional analog differentiators and integrators, fractional oscillators, chaotic networks or for analog simulation of fractional control systems. Copyright © 2011 John Wiley & Sons, Ltd.
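The behaviour of such an R-C approximation of a constant phase element can be checked numerically by evaluating the network impedance over frequency and looking at its phase. The sketch below uses a parallel bank of series R-C branches with geometrically spaced values; the topology and component values are illustrative, not the specific networks from the paper.

```python
import numpy as np

# Numerical sketch of approximating a constant phase element (CPE) with
# ordinary R and C parts: a parallel bank of series R-C branches whose
# time constants are spread geometrically over several decades.  In the
# mid band the impedance phase stays roughly constant between -90 and 0
# degrees; the component values below are illustrative only.

def cpe_network_impedance(freqs, R0=1e4, C0=1e-6, a=2.0, b=2.0, m=10):
    w = 2 * np.pi * freqs
    Y = np.zeros_like(w, dtype=complex)
    for k in range(m):
        Rk, Ck = R0 / a**k, C0 / b**k                  # geometric progression of values
        Y += 1j * w * Ck / (1 + 1j * w * Rk * Ck)      # admittance of one series R-C branch
    return 1 / Y

freqs = np.logspace(2, 5, 7)                           # mid-band frequencies in Hz
for f, z in zip(freqs, cpe_network_impedance(freqs)):
    print(f"{f:9.1f} Hz  |Z| = {abs(z):10.1f} ohm  phase = {np.degrees(np.angle(z)):6.1f} deg")
```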

Proceedings Article
01 Jan 2011
TL;DR: This work uses recurrent neural network (RNN) based language models to improve the BUT English meeting recognizer, examines the influence of word history on WER, and shows how to speed up rescoring by caching common prefix strings.
Abstract: We use recurrent neural network (RNN) based language models to improve the BUT English meeting recognizer. On the baseline setup using the original language models, we decrease the word error rate (WER) by more than 1% absolute by n-best list rescoring and language model adaptation. When n-gram language models are trained on the same moderately sized data set as the RNN models, the improvements are higher, yielding a system which performs comparably to the baseline. A noticeable improvement was observed with unsupervised adaptation of the RNN models. Furthermore, we examine the influence of word history on WER and show how to speed up rescoring by caching common prefix strings. Index Terms: automatic speech recognition, language modeling, recurrent neural networks, rescoring, adaptation
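Prefix caching during n-best rescoring exploits the fact that hypotheses of one utterance share long common prefixes, so the RNN state and partial score after each prefix can be stored and reused. A toy sketch follows with an invented interface; the ToyRNN class and its step() method are illustrative assumptions, not a real toolkit API.

```python
import math

# Rescoring an n-best list with an RNN language model while caching the RNN
# state and cumulative log-probability after every prefix, so that shared
# prefixes are evaluated only once.

def rescore_nbest(hypotheses, rnn):
    cache = {}                      # prefix tuple -> (rnn state, cumulative log-prob)
    scores = []
    for words in hypotheses:
        state, logp = rnn.initial_state(), 0.0
        start = 0
        # reuse the longest cached prefix of this hypothesis
        for i in range(len(words), 0, -1):
            if tuple(words[:i]) in cache:
                state, logp = cache[tuple(words[:i])]
                start = i
                break
        # extend from the end of the cached prefix, caching as we go
        for i in range(start, len(words)):
            state, word_logp = rnn.step(state, words[i])
            logp += word_logp
            cache[tuple(words[:i + 1])] = (state, logp)
        scores.append(logp)
    return scores

class ToyRNN:
    """Stand-in "RNN" whose state is just the word count; only for demonstrating caching."""
    def initial_state(self):
        return 0
    def step(self, state, word):
        # pretend every word has probability 1/(state+2) given its history
        return state + 1, math.log(1.0 / (state + 2))

print(rescore_nbest([["the", "cat", "sat"], ["the", "cat", "sits"]], ToyRNN()))
```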

Journal ArticleDOI
TL;DR: In this paper, the correlation between grain size, optical birefringence, and transparency is discussed for tetragonal zirconia (ZrO2) ceramics using the Mie, Rayleigh, and Rayleigh-Gans-Debye scattering models.
Abstract: The correlation between grain size, optical birefringence, and transparency is discussed for tetragonal zirconia (ZrO2) ceramics using the Mie, Rayleigh, and Rayleigh–Gans–Debye scattering models. Our results demonstrate that at the degree of mean birefringence in the range (0.03–0.04) expected for tetragonal ZrO2, only the Mie theory provides reasonable results. At small particle size (<50 nm) the more straightforward Rayleigh approximation correlates with the Mie model. A real in-line transmission of ~50% at visible light and 1 mm thickness is expected at a mean grain size <40 nm and ~70% at a mean grain size <20 nm. At an infrared (IR) wavelength of 5 μm there should not be any scattering caused by birefringence for grain sizes <200 nm. Our simulations were validated with experimental data for tetragonal ZrO2 (3 mol% Y2O3) ceramics made from a powder with an initial particle size of ~10 nm by sintering in air and using hot-isostatic pressing. The maximum in-line transmission of about 77% was observed at IR wavelengths of 3–5 μm.

Proceedings ArticleDOI
22 May 2011
TL;DR: Under certain assumptions, the formulas for i-vector extraction—also used in i-vector extractor training—can be simplified, leading to faster and more memory-efficient code.
Abstract: This paper introduces some simplifications to i-vector based speaker recognition systems. I-vector extraction as well as training of the i-vector extractor can be an expensive task both in terms of memory and speed. Under certain assumptions, the formulas for i-vector extraction—also used in i-vector extractor training—can be simplified, leading to faster and more memory-efficient code. The first assumption is that the GMM component alignment is constant across utterances and is given by the UBM GMM weights. The second assumption is that the i-vector extractor matrix can be linearly transformed so that its per-Gaussian components are orthogonal. We use PCA and HLDA to estimate this transform.
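For context, the standard i-vector point estimate is computed from zero- and first-order statistics as w = (I + TᵀΣ⁻¹NT)⁻¹ TᵀΣ⁻¹f; the paper's assumptions make the matrix being inverted precomputable and, after the linear transform of T, nearly diagonal. Below is a numpy sketch of the standard formula with illustrative shapes, not the simplified extractor itself.

```python
import numpy as np

# Standard i-vector point estimate from Baum-Welch statistics
# (C Gaussians, F-dim features, R-dim i-vector):
#   w = (I + T' Sigma^-1 N T)^-1  T' Sigma^-1 f
# where N stacks the per-Gaussian occupation counts and f the centered
# first-order statistics.  The paper's simplifications amount to fixing the
# occupation counts from the UBM weights and orthogonalizing T so that the
# matrix being inverted can be precomputed.

def extract_ivector(T, Sigma_inv, N_c, f):
    """T: (C*F, R); Sigma_inv: (C*F,) diagonal precision; N_c: (C,) counts; f: (C*F,)."""
    CF, R = T.shape
    C = N_c.shape[0]
    F = CF // C
    N_diag = np.repeat(N_c, F)                                  # expand counts to feature dims
    precision = np.eye(R) + T.T @ (Sigma_inv[:, None] * N_diag[:, None] * T)
    return np.linalg.solve(precision, T.T @ (Sigma_inv * f))

# toy usage: 4 Gaussians, 3-dim features, 2-dim "i-vector"
rng = np.random.default_rng(3)
T = rng.normal(size=(12, 2))
Sigma_inv = np.ones(12)
N_c = np.array([10.0, 5.0, 0.5, 4.5])
f = rng.normal(size=12)                                          # centered first-order stats
print(extract_ivector(T, Sigma_inv, N_c, f))
```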

Journal ArticleDOI
TL;DR: In this paper, it was shown that the non-crossing-type hysteretic loop cannot occur in ideal memristors, memcapacitors and meminductors which are defined axiomatically via the corresponding constitutive relations or via other equivalent characteristics.
Abstract: Recently, novel findings have been published, according to which some mem-systems excited by harmonic signals can be characterised by the so-called "non-crossing-type pinched hysteretic loops". Presented is a proof that this phenomenon cannot occur in ideal memristors, memcapacitors and meminductors which are defined axiomatically via the corresponding constitutive relations or via other equivalent characteristics, and that the "crossing-type hysteretic loop" is thus one of their typical fingerprints.
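The crossing-type pinched loop of an ideal memristor follows directly from v = M(q)·i: whenever the current is zero, the voltage is zero too, whatever the instantaneous memristance. A short numerical sketch with an arbitrary, illustrative memristance function:

```python
import numpy as np

# Ideal charge-controlled memristor driven by a sinusoidal current: the v-i
# trajectory always passes through the origin, because v = M(q) * i and i = 0
# forces v = 0 regardless of the instantaneous memristance M(q).
# The memristance function below is an arbitrary illustrative choice.

def simulate_memristor(f=1.0, I0=1e-3, steps=2000, periods=2):
    t = np.linspace(0, periods / f, steps)
    i = I0 * np.sin(2 * np.pi * f * t)
    q = np.cumsum(i) * (t[1] - t[0])          # charge = integral of current
    M = 1e3 + 5e5 * q                          # memristance as a function of charge
    v = M * i
    return t, i, v

t, i, v = simulate_memristor()
k = np.argmin(np.abs(i[500:1500])) + 500       # a current zero-crossing away from t = 0
print(i[k], v[k])                              # both are numerically close to zero: the loop is pinched at the origin
```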

Book ChapterDOI
06 Sep 2011
TL;DR: The basic Ramsey-based approach to checking language inclusion between two nondeterministic Büchi automata A and B is built on, with the following new techniques: a larger subsumption relation based on a combination of backward and forward simulations, and abstraction techniques that can speed up the computation and lead to early detection of counterexamples.
Abstract: Checking language inclusion between two nondeterministic Büchi automata A and B is computationally hard (PSPACE-complete). However, several approaches which are efficient in many practical cases have been proposed. We build on one of these, which is known as the Ramsey-based approach. It has recently been shown that the basic Ramsey-based approach can be drastically optimized by using powerful subsumption techniques, which allow one to prune the search space when looking for counterexamples to inclusion. While previous works only used subsumption based on set inclusion or forward simulation on A and B, we propose the following new techniques: (1) A larger subsumption relation based on a combination of backward and forward simulations on A and B. (2) A method to additionally use forward simulation between A and B. (3) Abstraction techniques that can speed up the computation and lead to early detection of counterexamples. The new algorithm was implemented and tested on automata derived from real-world model checking benchmarks, and on the Tabakov-Vardi random model, thus showing the usefulness of the proposed techniques.

Journal ArticleDOI
TL;DR: In this article, the thermal decomposition of kaolinite was studied by the differential thermogravimetry (DTG) technique under non-isothermal conditions, and the apparent activation energy and frequency factor for the dehydroxylation of kaolinite were evaluated by the Kissinger method.
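The Kissinger method extracts the apparent activation energy from the shift of the DTG peak temperature with heating rate, via a straight-line fit of ln(β/Tp²) against 1/Tp. A small sketch with made-up peak temperatures (not the paper's data):

```python
import numpy as np

# Kissinger method: for several heating rates beta, the DTG peak temperature
# Tp shifts so that  ln(beta / Tp^2) = ln(A*R/Ea) - Ea/(R*Tp).
# A straight-line fit of ln(beta/Tp^2) against 1/Tp therefore gives the
# apparent activation energy Ea from the slope and the frequency factor A
# from the intercept.  The peak temperatures below are made-up numbers.

R = 8.314                                               # J mol^-1 K^-1

def kissinger(betas_K_per_min, peak_temps_K):
    beta = np.asarray(betas_K_per_min) / 60.0           # convert to K/s
    Tp = np.asarray(peak_temps_K)
    y = np.log(beta / Tp**2)
    x = 1.0 / Tp
    slope, intercept = np.polyfit(x, y, 1)
    Ea = -slope * R                                      # J/mol
    A = (Ea / R) * np.exp(intercept)                     # s^-1
    return Ea, A

betas = [2, 5, 10, 20]                                   # heating rates, K/min
peaks = [770.0, 785.0, 797.0, 810.0]                     # illustrative DTG peak temperatures, K
Ea, A = kissinger(betas, peaks)
print(f"Ea = {Ea/1000:.0f} kJ/mol, A = {A:.2e} 1/s")
```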

01 Sep 2011
TL;DR: A possible realization of such a model is described that is quite simple and, in spite of its simplicity, makes it possible to simulate the properties of ideal CPEs.
Abstract: Analysis of fractal systems (i.e. systems described by fractional differential equations) necessitates creating an electrical analog model of a crucial subsystem called the Constant Phase Element (CPE). The paper describes a possible realization of such a model that is quite simple and, in spite of its simplicity, makes it possible to simulate the properties of ideal CPEs. The paper also deals with the effect of component tolerances on the resultant responses of the model and describes several typical model applications.

Journal ArticleDOI
22 May 2011
TL;DR: It is shown that it is possible to train a gender-independent discriminative model that achieves state-of-the-art accuracy, comparable to that of a gender-dependent system, saving memory and execution time both in training and in testing.
Abstract: This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker or to different speakers. This approach is an alternative to the usual discriminative setup that discriminates between a speaker and all the other speakers. We use a discriminative classifier based on a Support Vector Machine (SVM) that is trained to estimate the parameters of a symmetric quadratic function approximating a log-likelihood ratio score, without explicit modeling of the i-vector distributions as in the generative Probabilistic Linear Discriminant Analysis (PLDA) models. Training these models is feasible because it is not necessary to expand the i-vector pairs, which would be expensive or even impossible for medium-sized training sets. The results of experiments performed on the tel-tel extended core condition of the NIST 2010 Speaker Recognition Evaluation are competitive with the ones obtained by generative models, in terms of normalized Detection Cost Function and Equal Error Rate. Moreover, we show that it is possible to train a gender-independent discriminative model that achieves state-of-the-art accuracy, comparable to that of a gender-dependent system, saving memory and execution time both in training and in testing.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: A novel technique for discriminative feature-level adaptation of an automatic speech recognition system is presented and found to be complementary to common adaptation techniques.
Abstract: We present a novel technique for discriminative feature-level adaptation of an automatic speech recognition system. The concept of iVectors, popular in Speaker Recognition, is used to extract information about the speaker or acoustic environment from a speech segment. An iVector is a low-dimensional, fixed-length vector representing such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information from the iVectors and to compensate the speech features. The approach was tested on standard CTS data. We found it to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation we reached a 0.8% additive absolute WER improvement.

Proceedings ArticleDOI
01 Dec 2011
TL;DR: A Convolutive Bottleneck Network is proposed as an extension of the current state-of-the-art Universal Context Network and leads to a 5.5% relative reduction of WER, compared to the Universal Context ANN baseline.
Abstract: In this paper, we focus on improvements of the bottleneck ANN in a Tandem LVCSR system. First, the influence of the training set size and the ANN size is evaluated. Second, a very positive effect of a linear bottleneck is shown. Finally, a Convolutive Bottleneck Network is proposed as an extension of the current state-of-the-art Universal Context Network. The proposed training method leads to a 5.5% relative reduction of WER, compared to the Universal Context ANN baseline. The relative improvement compared to the 5-layer single-bottleneck network is 17.7%. The dataset ctstrain07, composed of more than 2000 hours of English Conversational Telephone Speech, was used for the experiments. The TNet toolkit with a CUDA GPGPU implementation was used for fast training.

Journal ArticleDOI
TL;DR: A simple model for realistic simulations of nanoparticle deposition is presented and employed for modeling nanoparticles on rough substrates, and it is shown that eliminating user influence on the data processing algorithm is a key step for obtaining accurate results when analyzing nanoparticles measured in non-ideal conditions.
Abstract: Nanoparticles are often measured using atomic force microscopy or other scanning probe microscopy methods. For isolated nanoparticles on flat substrates, this is a relatively easy task. However, in real situations, we often need to analyze nanoparticles on rough substrates or nanoparticles that are not isolated. In this article, we present a simple model for realistic simulations of nanoparticle deposition and we employ this model for modeling nanoparticles on rough substrates. Different modeling conditions (coverage, relaxation after deposition) and convolution with different tip shapes are used to obtain a wide spectrum of virtual AFM nanoparticle images similar to those known from practice. Statistical parameters of nanoparticles are then analyzed using different data processing algorithms in order to show their systematic errors and to estimate uncertainties for atomic force microscopy analysis of nanoparticles under non-ideal conditions. It is shown that the elimination of user influence on the data processing algorithm is a key step for obtaining accurate results while analyzing nanoparticles measured in non-ideal conditions.

Proceedings Article
01 Jan 2011
TL;DR: This work proposes a mixture of Probabilistic Linear Discriminant Analysis models (PLDA) as a solution for making systems independent of speaker gender and shows the effectiveness of the mixture model on microphone speech.
Abstract: The Speaker Recognition community that participates in NIST evaluations has concentrated on designing gender- and channel-conditioned systems. In the real world, this conditioning is not feasible. Our main purpose in this work is to propose a mixture of Probabilistic Linear Discriminant Analysis (PLDA) models as a solution for making systems independent of speaker gender. In order to show the effectiveness of the mixture model, we first experiment on 2010 NIST telephone speech (det5), where we prove that there is no loss of accuracy compared with a baseline gender-dependent model. We also successfully test the mixture model in a more realistic situation where there are cross-gender trials. Furthermore, we report results on microphone speech for the det1, det2, det3 and det4 tasks to confirm the effectiveness of the mixture model.

Journal ArticleDOI
TL;DR: In this paper, a working electrical scheme for modeling the memristor was presented and the user can use it to verify the theoretical presumptions about the properties of memristors.
Abstract: The paper presents a working electrical scheme modeling the memristor. The scheme allows experimenting with the model under various testing signals. The user can use it to verify the theoretical presumptions about the memristor properties. Examples of several typical measurements are shown. Copyright © 2010 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the authors report on multi-wavelength observations of the corona taken simultaneously in broadband white light and in seven spectral lines, including H-alpha 656.3 nm, Fe IX 435.9 nm and Fe X 637.4 nm.
Abstract: We report on multi-wavelength observations of the corona taken simultaneously in broadband white light, and in seven spectral lines, H-alpha 656.3 nm, Fe IX 435.9 nm, Fe X 637.4 nm, Fe XI 789.2 nm, Fe XIII 1074.7 nm, Fe XIV 530.3 nm and Ni XV 670.2 nm. The observations were made during the total solar eclipse of 11 July 2010 from the atoll of Tatakoto in French Polynesia. Simultaneous imaging with narrow bandpass filters in each of these spectral lines and in their corresponding underlying continua maximized the observing time during less than ideal observing conditions and yielded outstanding quality data. The application of two complementary image processing techniques revealed the finest details of coronal structures at 1" resolution in white light, and 6.5" in each of the spectral lines. This comprehensive wavelength coverage confirmed earlier eclipse findings that the solar corona has a clear two-temperature structure: The open field lines, expanding outwards from the solar surface, are characterized by electron temperatures near 1 × 10^6 K, while the hottest plasma around 2 × 10^6 K resides in loop-like structures forming the bulges of streamers. The first images of the corona in the forbidden lines of Fe IX and Ni XV showed that there was very little coronal plasma at temperatures below 5 × 10^5 K and above 2.5 × 10^6 K. The data also enabled temperature differentiations as low as 0.2 × 10^6 K in different density structures. These observations showed how the passage of CMEs through the corona, prior to totality, produced large scale ripples and very sharp streaks, which could be identified with distinct temperatures for the first time. The ripples were most prominent in emission from spectral lines associated with temperatures around 10^6 K. The most prominent streak was associated with a conical-shaped void in the emission from the coolest line of Fe IX and from the hottest line of Ni XV. A prominence, which erupted prior to totality, appeared in the shape of a hook in the cooler lines of Fe X and Fe XI, spanning 0.5 R_⊙ in extent starting at a heliocentric distance of 1.3 R_⊙, with a complex trail of hot and cool twisted structures connecting it to the solar surface. Simultaneous Fe X 17.4 nm observations from space by Proba2/SWAP provided an ideal opportunity for comparing emission from a coronal forbidden line, namely Fe X 637.4 nm, with a space-based EUV allowed line. Comparison of the Fe X 17.4 nm and 637.4 nm emission provided the first textbook example of the role of radiative excitation in extending the detectability of coronal emission to much larger heliocentric distances than its collisionally excited component. These eclipse observations demonstrate the unique capabilities of coronal forbidden lines for exploring the evolution of the coronal magnetic field in the heliocentric distance range of 1–3 R_⊙, which is currently inaccessible to any space-borne or ground-based observatory.

Journal ArticleDOI
TL;DR: In this paper, a system where a municipal solid waste incinerator is integrated with a combined gas-steam cycle is evaluated in the same manner, and the conditions under which the incinerator is classified as highly efficient are specified and analyzed.
Abstract: The discussion about the utilization of waste for energy production (waste-to-energy, WTE) has moved on to the next development phase. Waste-fired power plants are discussed and investigated. These facilities focus on electricity production, whereas heat supply is diminished and operations are not limited by insufficient heat demand. Present simulation results show that increasing the net electrical efficiency above 20% for units processing 100 kt/year (the most common ones) is problematic and tightly bound to increased investment. Very low useful heat production in a Rankine-cycle based cogeneration system with standard steam parameters leads to ineffective utilization of energy. This is documented in this article with the help of a newly developed methodology based on primary energy savings evaluation. This approach is confronted with the common method for energy recovery efficiency evaluation required by EU legislation (Energy Efficiency—R1 Criteria). A new term, highly efficient WTE, is proposed, and the conditions under which an incinerator is classified as highly efficient are specified and analyzed. When sole electricity production is compelled by limited local heat demand, the application of non-conventional arrangements is highly beneficial to secure effective energy utilization. In the paper, a system where a municipal solid waste incinerator is integrated with a combined gas–steam cycle is evaluated in the same manner.

Book ChapterDOI
14 Jul 2011
TL;DR: Predator is a new open source tool for verification of sequential C programs with dynamic linked data structures based on separation logic with inductive predicates although it uses a graph description of heaps.
Abstract: Predator is a new open source tool for verification of sequential C programs with dynamic linked data structures. The tool is based on separation logic with inductive predicates although it uses a graph description of heaps. Predator currently handles various forms of lists, including singly-linked as well as doubly-linked lists that may be circular, hierarchically nested and that may have various additional pointer links. Predator is implemented as a gcc plug-in and it is capable of handling lists in the form they appear in real system code, especially the Linux kernel, including a limited support of pointer arithmetic. Collaboration on further development of Predator is welcome.

Journal ArticleDOI
TL;DR: In this paper, the authors present the sensitivity and statistical analyses of the load-carrying capacity of a steel portal frame, and apply the Sobol sensitivity analysis to identify the dominant input random imperfections and their higher-order interaction effects on the load-carrying capacity.
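First-order Sobol indices measure the share of output variance explained by each input alone, S_i = Var(E[Y|X_i])/Var(Y). Below is a brute-force Monte Carlo sketch on a made-up capacity model; the model, its inputs and all values are illustrative, and efficient Saltelli-type estimators would be used in practice.

```python
import numpy as np

# Brute-force Monte Carlo estimate of first-order Sobol sensitivity indices,
#   S_i = Var( E[Y | X_i] ) / Var(Y),
# for a toy "load-carrying capacity" model with independent random inputs.
# This double loop only illustrates the definition.

def first_order_sobol(model, sample_inputs, n_outer=200, n_inner=200, seed=0):
    rng = np.random.default_rng(seed)
    d = sample_inputs(rng, 1).shape[1]
    y_all = model(sample_inputs(rng, n_outer * n_inner))
    var_y = y_all.var()
    S = np.zeros(d)
    for i in range(d):
        cond_means = np.empty(n_outer)
        for j in range(n_outer):
            x = sample_inputs(rng, n_inner)
            x[:, i] = sample_inputs(rng, 1)[0, i]      # freeze X_i at one random value
            cond_means[j] = model(x).mean()
        S[i] = cond_means.var() / var_y                 # variance of the conditional means
    return S

# toy model: "capacity" reduced by an initial crookedness e0 and driven by the yield stress fy
def capacity(x):
    fy, e0 = x[:, 0], x[:, 1]
    return fy * (1.0 - 2.0 * np.abs(e0))

def sample_inputs(rng, n):
    fy = rng.normal(235.0, 20.0, n)                    # yield stress, MPa (illustrative)
    e0 = rng.normal(0.0, 0.05, n)                      # illustrative imperfection
    return np.column_stack([fy, e0])

print(first_order_sobol(capacity, sample_inputs))
```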