
Showing papers on "Linear predictive coding published in 1988"


Journal ArticleDOI
TL;DR: The basic principles of linear predictive coding (LPC) are presented and least-squares methods for obtaining the LPC coefficients characterizing the all-pole filter are described.
Abstract: The basic principles of linear predictive coding (LPC) are presented. Least-squares methods for obtaining the LPC coefficients characterizing the all-pole filter are described. Computational factors, instantaneous updating, and spectral estimation are discussed.

224 citations
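The least-squares analysis described above can be illustrated with the classic autocorrelation method, where the Levinson-Durbin recursion solves the Toeplitz normal equations for the all-pole coefficients. This is a generic sketch of the standard technique, not code from the paper; function names and parameters are ours.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    Returns ([1, a1, ..., ap], final prediction-error power)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # order-update of the predictor
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # error power shrinks each order
    return a, err
```

Fitting an order-2 model to data generated by a known second-order all-pole filter recovers that filter's coefficients, which is a convenient sanity check.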


Journal ArticleDOI
TL;DR: A low-bit-rate linear predictive coder (LPC) based on variable-length segment quantization is presented, and its performance for voice coding is compared to that of fixed-length segment quantization and vector quantization.
Abstract: A low-bit-rate linear predictive coder (LPC) that is based on variable-length segment quantization is presented. In this vocoder, the speech spectral-parameter sequence is represented as the concatenation of variable-length spectral segments generated by linearly time-warping fixed-length code segments. Both the sequence of code segments and the segment lengths are efficiently determined using a dynamic programming procedure. This procedure minimizes the spectral distance measured between the original and the coded spectral sequence in a given interval. An iterative algorithm is developed for designing fixed-length code segments for the training spectral sequence. It updates the segment boundaries of the training spectral sequence using an a priori codebook and updates the codebook using these segment sequences. The convergence of this algorithm is discussed theoretically and experimentally. In experiments, the performance of variable-length segment quantization for voice coding is compared to that of fixed-length segment quantization and vector quantization.

209 citations
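The linear time-warping step above, which stretches a fixed-length code segment to a variable-length spectral segment, can be sketched as linear interpolation along the time axis (an illustrative sketch only; the paper embeds this in a dynamic-programming search over segment boundaries, which is not shown here):

```python
def time_warp(segment, out_len):
    """Linearly time-warp a fixed-length segment to out_len samples
    by linear interpolation along the time axis."""
    n = len(segment)
    if out_len == 1:
        return [segment[0]]
    out = []
    for i in range(out_len):
        t = i * (n - 1) / (out_len - 1)   # fractional source position
        j = int(t)
        frac = t - j
        if j >= n - 1:
            out.append(segment[-1])
        else:
            out.append(segment[j] * (1 - frac) + segment[j + 1] * frac)
    return out
```

Warping to the original length reproduces the segment, and endpoints are always preserved, so concatenated segments stay anchored at their boundaries.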


Proceedings Article
John Moody1
01 Jan 1988
TL;DR: A class of fast, supervised learning algorithms inspired by Albus's CMAC model that use local representations, hashing, and multiple scales of resolution to approximate functions which are piece-wise continuous are presented.
Abstract: A class of fast, supervised learning algorithms is presented. They use local representations, hashing, and multiple scales of resolution to approximate functions which are piece-wise continuous. Inspired by Albus's CMAC model, the algorithms learn orders of magnitude more rapidly than typical implementations of back propagation, while often achieving comparable qualities of generalization. Furthermore, unlike most traditional function approximation methods, the algorithms are well suited for use in real time adaptive signal processing. Unlike simpler adaptive systems, such as linear predictive coding, the adaptive linear combiner, and the Kalman filter, the new algorithms are capable of efficiently capturing the structure of complicated non-linear systems. As an illustration, the algorithm is applied to the prediction of a chaotic time series.

171 citations


Journal ArticleDOI
TL;DR: An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals; good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown.
Abstract: An LPC (linear predictive coding) cepstrum distance measure (CD) is introduced as an objective measure for estimating the subjective quality of speech signals. Good correspondence between LPC CD and the subjective quality, expressed in terms of both opinion equivalent Q and mean opinion score, is shown. Good repeatability of objective quality evaluation using LPC CD is also shown. A method for generating an artificial voice signal that reflects the characteristics of real speech signals is described. The LPC CD values calculated using this artificial voice are almost the same as those calculated using real speech signals. The speaker-dependency of the coded-speech quality is shown to be an important factor in low-bit-rate speech coding. Even taking this factor into consideration, LPC CD is shown to be effective for estimating the subjective quality.

151 citations
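The LPC cepstrum distance can be sketched as follows: the cepstrum of the all-pole model is obtained recursively from the predictor coefficients, and CD is then a root-mean-square log-spectral difference expressed in dB. This is the standard recursion for an all-pole model, offered as an illustration; it is not code from the paper.

```python
import math

def lpc_to_cepstrum(a, n_cep):
    """Cepstrum of the all-pole model 1/A(z), with
    A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p (gain term c0 omitted)."""
    p = len(a) - 1
    c = [0.0] * (n_cep + 1)
    for n in range(1, n_cep + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

def cepstral_distance_db(a1, a2, n_cep=16):
    """LPC cepstrum distance (CD) in dB between two all-pole models."""
    c1 = lpc_to_cepstrum(a1, n_cep)
    c2 = lpc_to_cepstrum(a2, n_cep)
    s = sum((u - v) ** 2 for u, v in zip(c1, c2))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * s)
```

For a first-order model the recursion matches the power series of -ln(1 + a1 z^-1) term by term, which makes it easy to verify.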


PatentDOI
TL;DR: A speech synthesizer is described that synthesizes speech by actuating a voice source and a filter that processes the voice-source output in each successive short interval of time according to feature vectors including formant frequencies, formant bandwidths, speech rate, and so on.
Abstract: A speech synthesizer that synthesizes speech by actuating a voice source and a filter which processes output of the voice source according to speech parameters in each successive short interval of time according to feature vectors which include formant frequencies, formant bandwidth, speech rate and so on. Each feature vector, or speech parameter, is defined by two target points (r1, r2), a value at each target point, and a connection curve between target points. A speech rate is defined by a speech rate curve which defines elongation or shortening of the speech rate, by a start point (d1) of elongation (or shortening), an end point (d2), and the elongation ratio between d1 and d2. The ratios between the relative time of each speech parameter and absolute time are preliminarily calculated according to the speech rate table in each predetermined short interval.

137 citations


Journal ArticleDOI
TL;DR: The most effective estimation technique for packets containing 16 ms of speech in a pulse-code-modulation format is pitch waveform replication, which extends the acceptable ratio of missing packets to 10%.
Abstract: Missing packets are a major cause of impairment in packet voice networks. While it is easiest to allow these gaps in received speech to appear as silent intervals in reconstructed speech, speech quality is improved by filling the gaps with estimates of the transmitted waveform. Several estimation techniques have been investigated for packets containing 16 ms of speech in a pulse-code-modulation format. The simplest method, packet repetition, extends the acceptable ratio of missing packets from 2% to 5%. Here, acceptability is defined as a mean opinion score midway between fair and good on a five-point opinion scale. The most effective estimation technique (although not the most complex) is pitch waveform replication. It extends the acceptable ratio of missing packets to 10%.

131 citations
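Pitch waveform replication can be sketched as: estimate the pitch period from the last correctly received samples by autocorrelation, then fill the gap by cyclically repeating the final pitch period. This is a generic sketch of the idea, not the paper's implementation; the lag range, window length, and function names are our assumptions.

```python
def conceal_gap(history, gap_len, min_lag=20, max_lag=120, win=200):
    """Fill a lost packet by repeating the last pitch period of `history`.
    The pitch lag maximizes the autocorrelation of the recent samples."""
    n = len(history)
    best_lag, best_c = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        c = sum(history[n - 1 - i] * history[n - 1 - i - lag]
                for i in range(win))
        if c > best_c:
            best_c, best_lag = c, lag
    period = history[-best_lag:]              # last full pitch period
    return [period[i % best_lag] for i in range(gap_len)]
```

On a purely periodic signal the fill is an exact continuation of the waveform, which is the ideal case the technique approximates for voiced speech.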


Journal ArticleDOI
TL;DR: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise and proposes maximum likelihood estimation solutions based upon the EM algorithm and its derivatives.
Abstract: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise. The estimation of the clean speech waveform, and of the parameters of autoregressive (AR) models for the clean speech, given the noisy speech, is considered. The two problems are demonstrated to be closely related in the sense that a good solution to one of them can be used for achieving a satisfactory solution for the other. The difficulties in solving these estimation problems are mainly due to the lack of explicit knowledge of the statistics of the clean speech signal and of the noise process. Maximum likelihood estimation solutions that are based upon the EM algorithm and its derivatives are proposed. For estimating the speech waveform, the statistics of the clean speech signal and of the noise process are first estimated by training a pair of Gaussian AR hidden Markov models, one for the clean speech and the other for the noise, using long training sequences from the two sources. Then, the speech waveform is reestimated by applying the EM algorithm to the estimated statistics. An approximation to the EM algorithm is interpreted as being an iterative procedure in which Wiener filtering and AR modeling are alternately applied. The different algorithms considered here will be compared and demonstrated.

115 citations


Journal ArticleDOI
Chin-Hui Lee1
TL;DR: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals; it takes into account the non-Gaussian nature of the excitation for voiced speech and gives a more efficient and less biased estimate of the prediction coefficients than conventional methods.
Abstract: A robust linear prediction (LP) algorithm is proposed that minimizes the sum of appropriately weighted residuals. The weight is a function of the prediction residual, and the cost function is selected to give more weight to the bulk of small residuals while deemphasizing the small portion of large residuals. In contrast, the conventional LP procedure weights all prediction residuals equally. The robust algorithm takes into account the non-Gaussian nature of the excitation for voiced speech and gives a more efficient (lower-variance) and less biased estimate of the prediction coefficients than conventional methods. The algorithm can be used in the front-end feature extractor for a speech recognition system and as an analyzer for a speech coding system. Testing on synthetic vowel data demonstrates that the robust LP procedure is able to reduce the formant and bandwidth error rate by more than an order of magnitude compared to the conventional LP procedures and is relatively insensitive to the placement of the LPC (LP coding) analysis window and to the value of the pitch period, for a given section of speech signal.

112 citations
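The weighted-residual idea can be sketched with iteratively reweighted least squares (IRLS) using a Huber-style weight: residuals below a threshold get full weight, larger ones are downweighted. The weight function, threshold, and iteration count here are our illustrative choices, not the paper's cost function.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for small systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def robust_lp(x, p, delta=1.0, iters=3):
    """IRLS estimate of coefficients b minimizing weighted squared
    residuals of x[n] - sum_k b[k] * x[n-1-k]."""
    rows = [[x[n - 1 - k] for k in range(p)] for n in range(p, len(x))]
    y = x[p:]
    w = [1.0] * len(y)
    b = [0.0] * p
    for _ in range(iters):
        # weighted normal equations
        A = [[sum(w[i] * rows[i][j] * rows[i][k] for i in range(len(y)))
              for k in range(p)] for j in range(p)]
        rhs = [sum(w[i] * rows[i][j] * y[i] for i in range(len(y)))
               for j in range(p)]
        b = solve(A, rhs)
        res = [y[i] - sum(b[k] * rows[i][k] for k in range(p))
               for i in range(len(y))]
        # Huber-style weights: downweight the small portion of large residuals
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in res]
    return b
```

The first pass is ordinary least squares; subsequent passes reduce the influence of large residuals, which for voiced speech correspond to the pitch-pulse instants.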


Journal ArticleDOI
TL;DR: It is shown that postfilters based on higher-order LPC (linear predictive coding) models can provide very low distortion in terms of spectral tilt and can provide better speech enhancement than circuits based on the backward-adaptive pole-zero predictor in ADPCM (adaptive differential pulse-code modulation).
Abstract: It is shown that postfiltering circuits based on higher-order LPC (linear predictive coding) models can provide very low distortion in terms of spectral tilt. Thus, they can provide better speech enhancement than circuits based on the backward-adaptive pole-zero predictor in ADPCM (adaptive differential pulse-code modulation). Quantitative criteria for designing postfiltering circuits based on higher-order LPC models are discussed. These postfilters are particularly attractive for systems where high-order LPC analysis is an integral part of the coding algorithm. In a subjective test that used a computer-simulated version of these circuits, enhanced ADPCM obtained a mean opinion score of 3.6 at 16 kb/s.

65 citations
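An LPC-based postfilter is commonly given the pole-zero form H(z) = A(z/beta) / A(z/alpha) with 0 < beta < alpha < 1, which emphasizes formant regions while limiting spectral tilt. The sketch below implements that widely used form as a direct-form IIR filter; the paper's exact design criteria differ, and the default alpha/beta values are our assumptions.

```python
def postfilter(x, a, alpha=0.8, beta=0.5):
    """Apply H(z) = A(z/beta) / A(z/alpha) to signal x, where
    A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are the LPC coefficients."""
    p = len(a) - 1
    num = [a[k] * beta ** k for k in range(p + 1)]   # zeros: A(z/beta)
    den = [a[k] * alpha ** k for k in range(p + 1)]  # poles: 1/A(z/alpha)
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(min(n, p) + 1))
        acc -= sum(den[k] * y[n - k] for k in range(1, min(n, p) + 1))
        y.append(acc)                                 # den[0] = a[0] = 1
    return y
```

Setting beta equal to alpha makes H(z) = 1, so the filter passes the signal through unchanged; unequal values reshape it. Practical designs often add a first-order tilt-compensation term as well.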


Proceedings ArticleDOI
11 Apr 1988
TL;DR: The synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system.
Abstract: A technique for sine-wave synthesis is described that uses the fast Fourier transform overlap-add method at a 100 Hz rate based on sine-wave parameters coded at a 50 Hz rate. This technique leads to an implementation requiring less than one-half the computational power of a digital-signal-processor chip. The synthesis method implicitly introduces a frequency jitter which renders the encoded synthetic speech more natural. For speech corrupted by additive acoustic noise, the synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system. More recent architecture studies of the STC algorithm suggest that an entire implementation requires no more than two ADSP-2100 chips.

62 citations


Proceedings ArticleDOI
P. Kroon1, B. Atal1
11 Apr 1988
TL;DR: It is found that the coder does not reproduce high frequencies well and that rapid changes in the speech signal are not adequately tracked.
Abstract: Some of the distortions produced by CELP (code-excited linear prediction) coders are characterized. It is found that the coder does not reproduce high frequencies well and that rapid changes in the speech signal are not adequately tracked. Within the framework of the current CELP concept, strategies are discussed that can reduce these distortions. Nonstationarities in the speech signal can be better followed by allowing a flexible allocation of the bits used for the excitation. However, the bit allocation procedures and the way the bits are used need further improvement. The reproduction of higher frequencies can be improved by changing the error-weighting procedure or by shaping the code-book excitation functions.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: It has been found that the LPC parameter bit rate required to achieve high-quality synthetic speech is only 1300 b/s, and when SIVP is combined with scalar quantization, the bit rate can be reduced even further without introducing any perceivable quantization noise in the reconstructed speech.
Abstract: An efficient, low-complexity method called switched-adaptive interframe vector prediction (SIVP) has been developed for linear predictive coding (LPC) of spectral parameters in the development of low-bit-rate speech coding systems. SIVP utilizes vector linear prediction to exploit the high frame-to-frame redundancy present in the successive frames of LPC parameters. When SIVP is combined with scalar quantization, it has been found that the LPC parameter bit rate required to achieve high-quality synthetic speech is only 1300 b/s. With vector quantization, the bit rate can be reduced even further (to 1000 b/s) without introducing any perceivable quantization noise in the reconstructed speech.
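Interframe vector prediction of LPC parameters can be sketched as a per-component first-order linear predictor trained by least squares across successive frames, so that only the prediction residual needs to be quantized. This is a minimal single-predictor sketch of our own; the paper's SIVP switches among several predictors adaptively.

```python
def train_predictor(frames):
    """Per-component first-order predictor frame_t[i] ~ m[i]*frame_{t-1}[i],
    with m fit by least squares over the training frames."""
    dim = len(frames[0])
    m = []
    for i in range(dim):
        num = sum(frames[t][i] * frames[t - 1][i] for t in range(1, len(frames)))
        den = sum(frames[t - 1][i] ** 2 for t in range(1, len(frames)))
        m.append(num / den if den else 0.0)
    return m

def residual_energy(frames, m):
    """Mean squared interframe prediction residual (what gets quantized)."""
    dim = len(frames[0])
    total = count = 0
    for t in range(1, len(frames)):
        for i in range(dim):
            total += (frames[t][i] - m[i] * frames[t - 1][i]) ** 2
            count += 1
    return total / count
```

Because spectral parameters evolve slowly, the residual energy is well below the raw parameter energy, which is exactly the redundancy SIVP exploits.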

Proceedings ArticleDOI
11 Apr 1988
TL;DR: The results of informal listening tests indicate that this system can achieve high quality for both clean and noisy speech, as the MBE model is extremely robust to the presence of background noise in speech.
Abstract: A speech model, referred to as the multiband excitation (MBE) speech model, has been shown to be capable of synthesizing speech without the artifacts common to model-based speech systems and has been used to develop a 4.8 kb/s speech coder. This system was developed using several new approaches to quantize the MBE model parameters. These techniques were designed to utilize additional redundancy amongst these parameters, thereby permitting more efficient quantization. The results of informal listening tests indicate that this system can achieve high quality for both clean and noisy speech, as the MBE model is extremely robust to the presence of background noise in speech.

Journal ArticleDOI
TL;DR: Improvements to the SELP algorithm are described which result in better speech quality and higher computational efficiency, including a new recursive algorithm that performs a very fast search through the adaptive codebook.

PatentDOI
TL;DR: A speech recognizer is described which utilizes hypothesis testing to determine formant frequencies for use in speech recognition; an optimum formant selector operates with a comparator to select from the formant candidates those formants which best match stored reference formants.
Abstract: A speech recognizer which utilizes hypothesis testing to determine formant frequencies for use in speech recognition. A pre-processor (36) receives speech signal frames and utilizes linear predictive coding to generate all formant frequency candidates. An optimum formant selector (38) operates with a comparator (40) to select from the formant candidates those formants which best match stored reference formants. A dynamic time warper (42) and high level recognition logic (44) operate to determine whether or not to declare a recognized word.

Patent
Kumar Swaminathan1
30 Sep 1988
TL;DR: In this article, a sub-band speech coding arrangement was proposed, which divides the speech spectrum into sub-bands and allocates bits to encode the time frame interval samples of each subband responsive to the speech energies of the subbands.
Abstract: A sub-band speech coding arrangement divides the speech spectrum into sub-bands and allocates bits to encode the time frame interval samples of each sub-band responsive to the speech energies of the sub-bands. The sub-band samples are quantized according to the sub-band energy bit allocation and the time frame quantized samples and speech energy signals are coded. A signal representative of the residual difference between the each time frame interval speech sample of the sub-band and the corresponding quantized speech sample of the sub-band is generated. The quality of the sub-band coded signal is improved by selecting the sub-bands with the largest residual differences, producing a vector signal from the sequence of residual difference signals of each selected sub-band, and matching the sub-band vector signal to one of a set of stored Gaussian codebook entries to generate a reduced bit code for the selected vector signal. The coded time frame interval quantized signals, speech energy signals and reduced bit codes for the selected residual differences are combined to form a multiplexed stream for the speech pattern of the time frame interval.
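Energy-driven bit allocation across sub-bands can be sketched greedily: repeatedly grant one bit to the band with the largest remaining energy per quantizer level, E_i / 4^{b_i}, since each added bit reduces quantizer noise power by a factor of about 4 (~6 dB). This is a generic textbook scheme, not the patent's exact procedure; the cap on bits per band is our assumption.

```python
def allocate_bits(energies, total_bits, max_bits_per_band=8):
    """Greedy bit allocation: each bit goes to the sub-band with the
    largest remaining energy-to-quantizer-noise ratio E / 4**bits."""
    bits = [0] * len(energies)
    for _ in range(total_bits):
        candidates = [i for i in range(len(energies))
                      if bits[i] < max_bits_per_band]
        i = max(candidates, key=lambda j: energies[j] / 4 ** bits[j])
        bits[i] += 1
    return bits
```

Higher-energy bands receive more bits, and the total always matches the frame budget exactly, which a closed-form log-ratio allocation only achieves after rounding adjustments.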

Journal ArticleDOI
TL;DR: Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed.
Abstract: The concept of speech quality assessment is examined. Quality assessment methodologies for speech waveform coding, source coding, and speech synthesis by rule from the viewpoints of naturalness and intelligibility are reviewed. Both subjective and objective measures are considered.

Journal ArticleDOI
F.K. Soong1, Man Mohan Sondhi1
TL;DR: The authors propose an adaptively weighted Itakura distortion measure and study its effects on the performance of a conventional dynamic time-warping (DTW)-based speech recognizer in a series of speaker-independent, isolated-digit-recognition experiments.
Abstract: The authors propose an adaptively weighted Itakura distortion measure. They studied its effects on the performance of a conventional dynamic time-warping (DTW)-based speech recognizer in a series of speaker-independent, isolated-digit-recognition experiments. The equivalent SNR improvement achieved by using the proposed weighted Itakura distortion at low SNRs is about 5-7 dB.

Proceedings ArticleDOI
D. Mansour1, Biing-Hwang Juang1
11 Apr 1988
TL;DR: Experimental results show that the new measures cause no degradation in recognition accuracy at high SNR, but perform significantly better when tested under noisy conditions using only clean reference templates.
Abstract: The authors aim at the formulation of similarity measures for robust speech recognition. Their consideration focuses on the speech cepstrum derived from linear prediction coefficients (the LPC cepstrum). By using common models for noisy speech, they analytically and empirically show how the ambient noise can affect some important attributes of the LPC cepstrum such as the vector norm, coefficient order, and the direction perturbation. The new findings led them to propose a family of distortion measures based on the projection between two cepstral vectors. Performance evaluation of these measures has been conducted in both speaker-dependent and speaker-independent isolated word recognition tasks. Experimental results show that the new measures cause no degradation in recognition accuracy at high SNR, but perform significantly better when tested under noisy conditions using only clean reference templates. At an SNR of 5 dB, the new measures are shown to be able to achieve a recognition rate equivalent to that obtained by the filtered cepstral measure at 20 dB SNR, demonstrating a gain of 15 dB.
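A projection-style measure between cepstral vectors can be sketched as one minus the normalized inner product (the cosine of the angle between them), which is insensitive to the vector-norm shrinkage that additive noise induces. This is our generic formulation of the idea, not necessarily the exact family of measures in the paper.

```python
import math

def projection_distortion(c1, c2):
    """1 - cos(angle) between two cepstral vectors; depends only on
    their directions, not on noise-induced norm shrinkage."""
    dot = sum(a * b for a, b in zip(c1, c2))
    n1 = math.sqrt(sum(a * a for a in c1))
    n2 = math.sqrt(sum(b * b for b in c2))
    return 1.0 - dot / (n1 * n2)
```

Scaling either vector leaves the measure unchanged, while a Euclidean cepstral distance would grow with any norm mismatch between clean templates and noisy input.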

Proceedings ArticleDOI
11 Apr 1988
TL;DR: If the number of VQ stages is increased sufficiently, MSVXC can be expressed as a form of transform coding in which the computationally intensive excitation codebook search is completely eliminated.
Abstract: An approach to vector-excitation-coding (VXC) speech compression utilizing multiple-stage vector quantization (VQ) is considered. Called multiple-stage VXC (MSVXC), this technique facilitates the use of high-dimensional excitation vectors at medium-band rates without substantially increasing computation. The basic approach consists of successively approximating the input speech vector in several cascaded VQ stages, where the input vector for each stage is the quantization error vector from the preceding stage. It is shown that if the number of VQ stages is increased sufficiently, MSVXC can be expressed as a form of transform coding, in which the computationally intensive excitation codebook search is completely eliminated.
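Multiple-stage VQ can be sketched as cascaded nearest-neighbor quantizers, each operating on the error vector left by the previous stage; the reconstruction is the sum of the selected codewords. This is a generic sketch with toy codebooks of our own; the paper's contribution is the limiting transform-coding interpretation, which is not reproduced here.

```python
def nearest(v, codebook):
    """Nearest codeword under squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(v, c)))

def msvq_encode(v, stages):
    """Quantize v through cascaded stages; each stage quantizes the
    residual left by the previous one. Returns the chosen codewords."""
    residual = list(v)
    chosen = []
    for codebook in stages:
        c = nearest(residual, codebook)
        chosen.append(c)
        residual = [a - b for a, b in zip(residual, c)]
    return chosen

def msvq_decode(chosen):
    """Reconstruction is the sum of the per-stage codewords."""
    return [sum(parts) for parts in zip(*chosen)]
```

Including the zero vector in each stage's codebook guarantees that every added stage can only reduce (never increase) the reconstruction error.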

Proceedings ArticleDOI
D. Mansour1, Biing-Hwang Juang1
11 Apr 1988
TL;DR: The short-time modified coherence (SMC) representation, proposed here, is an all-pole modeling of the autocorrelation sequence followed by a spectral shaper, essentially a square-root operator in the frequency domain that compensates for the inherent spectral distortion introduced by the autocorrelation operation.
Abstract: A technique for robust spectral representation of all-pole sequences is proposed. It is shown that the autocorrelation of an all-pole sequence, obtained by passing white noise through an all-pole filter 1/A(z), is an all-pole sequence of the form 1/A^2(z). The short-time modified coherence (SMC) representation, proposed here, is an all-pole modeling of the autocorrelation sequence followed by a spectral shaper. The spectral shaper, essentially a square-root operator in the frequency domain, compensates for the inherent spectral distortion introduced by the autocorrelation operation. The properties of the SMC representation, especially its robustness to additive white noise, are analyzed. Initial implementation of the SMC in a speaker-dependent isolated-word recognizer shows a considerable improvement over the standard linear predictive coding (LPC) representation. The SMC recognizer achieved an improvement in recognition accuracy equivalent to an increase in input SNR of approximately 13 dB, as compared to the LPC recognizer.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Analysis parameters and various distance measures are investigated for a template matching scheme for speaker identity verification (SIV) and performance varies significantly across vocabulary, and average performance is approximately 5% EER for the better algorithms on telephone speech.
Abstract: Analysis parameters and various distance measures are investigated for a template matching scheme for speaker identity verification (SIV). Two parameters are systematically varied: the length of the signal analysis window, and the order of the linear predictive coding/cepstrum analysis. Computational costs associated with the choice of parameters are also considered. The distance measures tested are the Euclidean, inverse variance weighting, differential mean weighting, Kahn's simplified weighting, the Mahalanobis distance, and the Fisher linear discriminant. Using the equal error rate (EER) of pairwise utterance dissimilarity distributions, performance is estimated for prespecified and (a simulation of) user-determined input vocabulary. Performance varies significantly across vocabulary, and average performance is approximately 5% EER for the better algorithms on telephone speech.

Journal ArticleDOI
TL;DR: Two algorithms for the estimation of these time-varying log area ratios are proposed; the first one is an approximation using a lattice filter, while the second one minimizes a least-squares criterion.
Abstract: A large class of nonstationary signals, including speech signals but not restricted to them, can be represented by time-varying models, the coefficients of which are finite linear combinations of known time functions. Such models have been found useful for speech recognition and speech synthesis, but they suffer in this last application from a lack of stability. A time-varying autoregressive (AR) model, into which the time-dependency is coded through log-area ratios to ensure stability, is described. Two algorithms for the estimation of these time-varying log-area ratios are proposed; the first one is an approximation using a lattice filter, while the second one minimizes a least-squares criterion. The evaluation of their performance is obtained by a set of simulations. An example of a speech signal modeled with these time-varying log-area ratios shows the usefulness of this approach for speech synthesis and recognition.
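Coding through log-area ratios guarantees stability because any real-valued LAR maps back to a reflection coefficient strictly inside (-1, 1), which is the lattice-filter stability condition. A minimal sketch of the standard conversion (not code from the paper):

```python
import math

def k_to_lar(k):
    """Reflection coefficient (|k| < 1) -> log-area ratio."""
    return math.log((1.0 - k) / (1.0 + k))

def lar_to_k(g):
    """Log-area ratio -> reflection coefficient; always in (-1, 1),
    so any interpolated or time-varying LAR yields a stable filter."""
    e = math.exp(g)
    return (1.0 - e) / (1.0 + e)
```

This is why a linear combination of time functions in the LAR domain can vary freely over time without ever producing an unstable synthesis filter.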

01 Jan 1988
TL;DR: In this paper, a set of iterative speech enhancement techniques employing spectral constraints is extended and evaluated; the authors apply inter- and intraframe spectral constraints to ensure optimum speech quality across all classes of speech.
Abstract: A set of iterative speech enhancement techniques employing spectral constraints is extended and evaluated in this paper. The original unconstrained technique attempts to solve for the maximum likelihood estimate of a speech waveform in additive noise. The new approaches (presented in ICASSP-87 [3]) apply inter- and intraframe spectral constraints to ensure optimum speech quality across all classes of speech. Constraints are applied based on the presence of perceptually important speech characteristics found during the enhancement procedure. The techniques are evaluated in white and colored noise to determine their performance in extremely noisy environments.

Journal ArticleDOI
C.K. Gan1, R.W. Donaldson
TL;DR: An algorithm that uses two adaptive-amplitude thresholds and zero-crossing rate was used to delete nonspeech material from speech waveforms which have been digitally encoded and then decoded using PCM, adaptive-differential PCM and adaptive-delta-modulation.
Abstract: An algorithm that uses two adaptive-amplitude thresholds and the zero-crossing rate was used to delete nonspeech material from speech waveforms which had been digitally encoded and then decoded using PCM, adaptive-differential PCM, and adaptive delta modulation. Typically, compression rates of 35% resulted. Subjective evaluations were used to assess reconstructed speech quality, which improves significantly when absolute silence on playback is replaced with prerecorded background noise.
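The two-threshold-plus-zero-crossing idea can be sketched per frame: a frame is kept as speech if its peak magnitude exceeds a high threshold, or exceeds a low threshold while its zero-crossing rate is high (suggesting low-energy unvoiced sounds such as fricatives). The thresholds here are fixed constants of our choosing for clarity; the paper adapts them to the signal.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def is_speech(frame, t_high=0.2, t_low=0.05, zcr_min=0.25):
    """Two-threshold speech/nonspeech decision for one frame."""
    peak = max(abs(s) for s in frame)
    if peak >= t_high:                      # loud enough: clearly speech
        return True
    zcr = zero_crossings(frame) / (len(frame) - 1)
    return peak >= t_low and zcr >= zcr_min  # quiet but noisy: fricative-like
```

Frames failing both tests are the nonspeech material the algorithm deletes, yielding the roughly 35% compression reported above.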

Proceedings ArticleDOI
11 Apr 1988
TL;DR: A system to improve the intelligibility of noisy speech through the use of vector quantization of linear predictive coding (LPC) spectra and a distance measure involving formants is described, appearing to be a promising way to transform noisy speech into more intelligible signals.
Abstract: A system to improve the intelligibility of noisy speech through the use of vector quantization of linear predictive coding (LPC) spectra and a distance measure involving formants is described. Based on experiments using the system on natural speech degraded by additive white noise, the approach appears to be a promising way to transform noisy speech into more intelligible signals. As the noise corruption increases (and SNR decreases), the output speech becomes more distorted in terms of spectral jumps and mismatches, but remains free of noise. Good intelligibility remains at SNRs as low as 9 dB, although the speech is unnatural due to the LPC synthesis.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: Experimental results indicate that high-quality synthesized speech can be obtained using the LSP parameters at relatively low rates.
Abstract: The performance of several algorithms for the quantization of the line spectrum pair (LSP) parameters is studied. An adaptive method which utilizes the ordering property of the LSP parameters is presented. The performance of the different quantization schemes is studied on a long sequence of speech samples. For the spectral distortion measure, appropriate performance comparisons between the different quantization schemes are rendered. Experimental results indicate that high-quality synthesized speech can be obtained using the LSP parameters at relatively low rates.
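The ordering property (0 < w1 < w2 < ... < pi) suggests quantizing the positive differences between adjacent LSP frequencies; forcing each quantized difference to stay positive then preserves the ordering, and hence filter stability, after reconstruction. A minimal sketch of this idea; the uniform step size and codeword convention are our choices, not the paper's scheme.

```python
def quantize_lsp(lsp, step=0.01):
    """Encode LSP frequencies as quantized positive differences."""
    codes = []
    prev = 0.0
    for w in lsp:
        d = w - prev
        codes.append(max(1, round(d / step)))  # >= 1 step: keeps ordering
        prev = w
    return codes

def dequantize_lsp(codes, step=0.01):
    """Rebuild LSPs by accumulating the quantized differences."""
    out, acc = [], 0.0
    for q in codes:
        acc += q * step
        out.append(acc)
    return out
```

Because every decoded difference is at least one step, the reconstructed frequencies are strictly increasing no matter how coarse the quantizer.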

Proceedings ArticleDOI
F. Charpentier1, Eric Moulines1
11 Apr 1988
TL;DR: The authors present FFT synthesis algorithms for a French text-to-speech system based on diphone concatenation, and an experiment to reduce the computational cost by performing all the FFTs off-line is described.
Abstract: The authors present FFT synthesis algorithms for a French text-to-speech system based on diphone concatenation. FFT synthesis techniques are capable of producing high-quality prosodic modifications of natural speech. Several approaches are presented to reduce the distortions due to diphone concatenation. They are based on appropriate manipulations of the phase spectrum, either by phase equalization across all the diphones or by phase smoothing between successive diphones. The resulting speech is of significantly better quality than with conventional LPC synthesis. An experiment to reduce the computational cost by performing all the FFTs off-line is described. The resulting speech is slightly degraded with respect to 'full' FFT synthesized speech, but it remains more natural in comparison with the LPC speech.

Proceedings ArticleDOI
A. Nadas1, David Nahamoo1, Michael Picheny1
11 Apr 1988
TL;DR: A general technique termed adaptive labeling is presented for the normalization of the speech signal that combines the familiar labeling process executed by a vector quantizer with an adaptive renormalization transformation of the feature vectors proposed here.
Abstract: A general technique termed adaptive labeling is presented for the normalization of the speech signal. In principle, adaptive labeling is applicable to any sequence of feature vectors of a given dimension. It combines the familiar labeling process executed by a vector quantizer with an adaptive renormalization transformation of the feature vectors proposed here. Adaptive labeling is applied to speech recognition, where the particular interest lies in diminishing the degradation of performance that occurs as a result of changes in the signal characteristics following changes in ambient noise and other recording environment conditions or in response to a change in the characteristics of the talker. Results are presented for a series of experiments using soft and loud noises as well as environments in which microphone-to-speaker distances were allowed to vary. A 5000-word vocabulary with isolated word input was used.

Patent
27 Sep 1988
TL;DR: In this paper, a line connection switching apparatus performs connection switching of a communication line used both for speech communication performed by a telephone set and data communication by a facsimile apparatus.
Abstract: A line connection switching apparatus performs connection switching of a communication line used both for speech communication performed by a telephone set and for data communication performed by a facsimile apparatus, and is inserted between the telephone set and the facsimile apparatus, and the communication line. When predetermined speech command information is input from a caller side through the communication line, a speech signal is detected by a switching unit. When a voice/silence discriminator discriminates that the speech signal represents voice, a voice interval monitor monitors the duration of the speech signal. If the duration of the speech signal falls within a predetermined range, the speech signal is stored in a speech signal storage unit. A pattern matching unit verifies whether a standard speech pattern registered in a standard pattern dictionary unit matches the speech pattern of the input speech signal stored in the speech signal storage unit. In accordance with the verification result, if the speech pattern of the input speech signal coincides with the standard pattern, the switching unit switches connection of the communication line to the facsimile apparatus.