Showing papers on "Audio signal processing published in 1998"


Patent
02 Apr 1998
TL;DR: In this article, a reconfigurable image processing system with a toroidal topology, distributed memory, and wide bandwidth I/O is described, which is capable of solving real applications at real-time speeds.
Abstract: A powerful, scaleable, and reconfigurable image processing system and method of processing data therein is described. This general purpose, reconfigurable engine with toroidal topology, distributed memory, and wide bandwidth I/O is capable of solving real applications at real-time speeds. The reconfigurable image processing system can be optimized to efficiently perform specialized computations, such as real-time video and audio processing. This reconfigurable image processing system provides high performance via high computational density, high memory bandwidth, and high I/O bandwidth. Generally, the reconfigurable image processing system and its control structure include a homogeneous array of 16 field programmable gate arrays (FPGA) and 16 static random access memories (SRAM) arranged in a partial torus configuration. The reconfigurable image processing system also includes a PCI bus interface chip, a clock control chip, and a datapath chip. It can be implemented on a single board. It receives data from its external environment, computes correspondence, and uses the results of the correspondence computations for various post-processing industrial applications. The reconfigurable image processing system determines correspondence by using non-parametric local transforms followed by correlation. These non-parametric local transforms include the census and rank transforms. Other embodiments involve a combination of correspondence, rectification, a left-right consistency check, and the application of an interest operator.
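The census-plus-correlation pipeline is simple enough to sketch in software. Below is a minimal NumPy illustration of the idea, assuming a 3x3 census window and a Hamming-distance correlation step; it is a toy model of the general technique, not the patented FPGA implementation.

```python
import numpy as np

def census_transform(img, radius=1):
    """Non-parametric census transform: each pixel becomes a bit string
    recording whether each neighbour in the window is darker than the
    centre pixel (comparisons, not intensities, so the result is robust
    to gain and bias changes)."""
    out = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out = (out << np.uint64(1)) | (neighbour < img).astype(np.uint64)
    return out

def hamming_cost(census_left, census_right, disparity):
    """Correlation step: per-pixel Hamming distance between the left
    census image and the disparity-shifted right census image."""
    diff = census_left ^ np.roll(census_right, disparity, axis=1)
    as_bytes = diff.view(np.uint8).reshape(diff.shape[0], diff.shape[1], 8)
    return np.unpackbits(as_bytes, axis=2).sum(axis=2)
```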

537 citations


BookDOI
01 Mar 1998
TL;DR: There are whole classes of algorithms that the speech community is not interested in pursuing or using in digital signal processing of sound; these algorithms and techniques are revealed in this book.
Abstract: With the advent of 'multimedia', digital signal processing (DSP) of sound has emerged from the shadow of bandwidth limited speech processing to become a research field of its own. To date, most research in DSP applied to sound has been concentrated on speech, which is bandwidth limited to about 4 kilohertz. Speech processing is also limited by the low fidelity typically expected in the telephone network. Today, the main applications of audio DSP are high quality audio coding and the digital generation and manipulation of music signals. They share common research topics including perceptual measurement techniques and analysis/synthesis methods. Additional important topics are hearing aids using signal processing technology and hardware architectures for digital signal processing of audio. In all these areas the last decade has seen a significant amount of application-oriented research. The frequency range of wideband audio has an upper limit of 20 kilohertz and the resulting difference in frequency range and Signal to Noise Ratio (SNR) due to sample size must be taken into account when designing DSP algorithms. There are whole classes of algorithms that the speech community is not interested in pursuing or using. These algorithms and techniques are revealed in this book. This book is suitable for advanced level courses and serves as a valuable reference for researchers in the field. Interested and informed engineers will also find the book useful in their work.

300 citations


Proceedings ArticleDOI
22 Apr 1998
TL;DR: This paper explains how the Informedia system takes advantage of the closed captioning frequently broadcast with the news, how it extracts timing information by aligning the closed-captions with the result of the speech recognition, and how the system integrates closed-caption cues with the results of image and audio processing.
Abstract: The Informedia Digital Library Project allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio and that we can segment the broadcast into video paragraphs, or stories, that are useful for information retrieval. In previous papers we have shown that speech recognition is sufficient for information retrieval of pre-segmented video news stories. We now address the issue of segmentation and demonstrate that a fully automatic system can extract story boundaries using available audio, video and closed-captioning cues. The story segmentation step for the Informedia Digital Video Library splits full-length news broadcasts into individual news stories. During this phase the system also labels commercials as separate "stories". We explain how the Informedia system takes advantage of the closed captioning frequently broadcast with the news, how it extracts timing information by aligning the closed-captions with the result of the speech recognition, and how the system integrates closed-caption cues with the results of image and audio processing.
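The caption-to-transcript alignment step can be illustrated with a small sketch. Here difflib stands in for the more robust dynamic-programming alignment a production system would use; the word lists and timestamps are made-up examples.

```python
import difflib

def align_captions(caption_words, asr_words, asr_times):
    """Align the closed-caption word stream with the speech-recognition
    output and copy the recognizer's timestamps onto matching caption
    words, recovering timing information for the captions."""
    matcher = difflib.SequenceMatcher(a=caption_words, b=asr_words)
    timing = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            timing[block.a + k] = asr_times[block.b + k]
    return timing  # caption word index -> time in seconds

# Usage with toy data (the recognizer inserted a filler word "uh"):
captions = "good evening the president said today".split()
asr_out  = "good evening uh the president said today".split()
asr_time = [0.0, 0.4, 0.9, 1.1, 1.3, 1.8, 2.1]
print(align_captions(captions, asr_out, asr_time))
```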

224 citations


Proceedings Article
01 Jan 1998
TL;DR: The fiddle and bonk objects are low tech; the algorithms would be easy to re-code in another language or for other environments than the ones considered here. The main concern is to get predictable and acceptable behavior using easy-to-understand techniques which won't place an unacceptable computational load on a late-model computer.
Abstract: Two "objects," which run under Max/MSP or Pd, do different kinds of real-time analysis of musical sounds. Fiddle is a monophonic or polyphonic maximum-likelihood pitch detector similar to Rabiner's, which can also be used to obtain a raw list of a signal's sinusoidal components. Bonk does a bounded-Q analysis of an incoming sound to detect onsets of percussion instruments in a way which outperforms the standard envelope following technique. The outputs of both objects appear as Max-style control messages.

1 Tools for real-time audio analysis

The new real-time patchable software synthesizers have finally brought audio signal processing out of the ivory tower and into the homes of working computer musicians. Now audio can be placed at the center of real-time computer music production, and MIDI, which for a decade was the backbone of the electronic music studio, can be relegated to its appropriate role as a low-bandwidth I/O solution for keyboards and other input devices. Many other sources of control "input" can be imagined than are provided by MIDI devices. This paper, for example, explores two possibilities for deriving a control stream from an incoming audio stream.

First, the sound might contain quasi-sinusoidal "partials" and we might wish to know their frequencies and amplitudes. In the case that the audio stream comes from a monophonic or polyphonic pitched instrument, we would like to be able to determine the pitch(es) and loudness(es) of the components. It's clear that we'll never have a perfect pitch detector, but the fiddle object described here does fairly well in some cases. For the many sounds which don't lend themselves to sinusoidal decomposition, we can still get useful information from the overall spectral envelope. For instance, rapid changes in the spectral envelope turn out to be a much more reliable indicator of percussive attacks than are changes in the overall power reported by a classical envelope follower. The bonk object applies a bounded-Q filterbank to an incoming sound and can either output the raw analysis or detect onsets, which can then be compared to a collection of known spectral templates in order to guess which of several possible kinds of attack has occurred.

The fiddle and bonk objects are low tech; the algorithms would be easy to re-code in another language or for other environments than the ones considered here. Our main concern is to get predictable and acceptable behavior using easy-to-understand techniques which won't place an unacceptable computational load on a late-model computer. Some effort was taken to make fiddle and bonk available on a variety of platforms. They run under Max/MSP (Macintosh) and Pd (Wintel, SGI, Linux), and fiddle also runs under FTS (available on several platforms). Both are distributed with source code; see http://man104nfs.ucsd.edu/~mpuckett/ for details.

2 Analysis of discrete spectra

Two problems are of interest here: getting the frequencies and amplitudes of the constituent partials of a sound, and then guessing the pitch. Our program follows the ideas of [Noll 69] and [Rabiner 78]. Whereas the earlier pitch~ object reported in [Puckette 95] departs substantially from the earlier approaches, the algorithm used here adheres more closely to them. First we wish to get a list of peaks with their frequencies and amplitudes. The incoming signal is broken into segments of N samples, with N a power of two typically between 256 and 2048. A new analysis is made every N/2 samples.

For each analysis the N samples are zero-padded to 2N samples and a rectangular-window DFT is taken. An interesting trick reduces the computation time by roughly half for this setup; see the source code for how this is done. If we let X[k] denote the zero-padded DFT, we can do a three-point convolution in the frequency domain to get the Hanning-windowed DFT:

$$X_H[k] = \frac{X[k]}{2} - \frac{X[k+2] + X[k-2]}{4}$$

Any of the usual criteria can be applied to identify peaks in this spectrum. We then go back to the non-windowed spectrum to find the peak frequency using the phase vocoder with hop 1:

$$\omega = \frac{\pi}{N}\left(k + \operatorname{Re}\left[\frac{X[k-2] - X[k+2]}{2X[k] - X[k-2] - X[k+2]}\right]\right)$$

This is a special case of a more general formula derived in [Puckette 98]. The amplitude estimate is simply the windowed peak strength at the strongest bin, which because of the zero-padding won't differ by more than about 1 dB from the true peak strength. The phase could be obtained in the same way, but we won't bother with that here.

2.1 Guessing fundamental frequencies

Fundamental frequencies are guessed using a scheme somewhat suggestive of the maximum-likelihood estimator. Our "likelihood function" is a non-negative function L(f), where f is frequency. The presence of peaks at or near multiples of f increases L(f) in a way which depends on the peak's amplitude and frequency.
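As a concrete illustration of the analysis recipe, here is a minimal NumPy sketch of the zero-padded DFT, the three-point convolution, and the peak-frequency formula above. It is a toy re-implementation for clarity, not the distributed fiddle source, and the test signal is arbitrary.

```python
import numpy as np

def windowed_spectra(x):
    """Zero-pad N samples to 2N, take a rectangular-window DFT, and
    derive the Hanning-windowed spectrum by the three-point convolution
    X_H[k] = X[k]/2 - (X[k+2] + X[k-2])/4."""
    N = len(x)
    X = np.fft.fft(x, 2 * N)
    XH = X / 2 - (np.roll(X, -2) + np.roll(X, 2)) / 4
    return X, XH

def peak_frequency(X, k, N):
    """Phase-vocoder (hop 1) frequency estimate at bin k of the
    non-windowed spectrum, in radians per sample."""
    c = (X[k - 2] - X[k + 2]) / (2 * X[k] - X[k - 2] - X[k + 2])
    return (np.pi / N) * (k + c.real)

# Usage: estimate the frequency of a 440 Hz sinusoid sampled at 44.1 kHz
sr, N = 44100, 1024
x = np.sin(2 * np.pi * 440 * np.arange(N) / sr)
X, XH = windowed_spectra(x)
k = int(np.argmax(np.abs(XH[:N])))                 # strongest positive-frequency bin
print(peak_frequency(X, k, N) * sr / (2 * np.pi))  # close to 440
```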

213 citations


Patent
TL;DR: In this paper, an audio signal is decomposed into lower and upper sub-bands and at least the noise component of the upper sub-band is encoded; at the decoder, a decoding means utilises a synthesised noise excitation signal and a filter to reproduce the noise component in the upper sub-band.

Abstract: An audio signal is decomposed into lower and upper sub-bands and at least the noise component of the upper sub-band is encoded. At the decoder the audio signal is synthesised by a decoding means which utilises a synthesised noise excitation signal and a filter to reproduce the noise component in the upper sub-band.
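A rough sketch of the encode/decode split may help. The Butterworth filters, the 4 kHz split point, and the single transmitted gain are illustrative assumptions standing in for the patent's unspecified details.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def encode(x, sr, split_hz=4000):
    """Encoder side: split into lower and upper sub-bands; keep the
    lower band (to be coded by some waveform coder) and reduce the
    upper band's noise component to a single gain parameter."""
    lo = sosfilt(butter(8, split_hz, 'low', fs=sr, output='sos'), x)
    hi = sosfilt(butter(8, split_hz, 'high', fs=sr, output='sos'), x)
    return lo, float(np.std(hi))        # lower band + upper-band noise level

def decode(lo, gain, sr, split_hz=4000, seed=0):
    """Decoder side: regenerate the upper band by filtering a
    synthesised noise excitation and scaling it to the coded level."""
    noise = np.random.default_rng(seed).standard_normal(len(lo))
    hi = sosfilt(butter(8, split_hz, 'high', fs=sr, output='sos'), noise)
    return lo + gain * (hi / (np.std(hi) + 1e-12))
```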

160 citations


Patent
Chang-hwan Oh
16 Oct 1998
TL;DR: In this paper, a multiple language text-to-speech (TTS) processing apparatus capable of processing a text expressed in multiple languages, and a corresponding text-to-speech processing method, were proposed.
Abstract: A multiple language text-to-speech (TTS) processing apparatus capable of processing a text expressed in multiple languages, and a multiple language text-to-speech processing method. The multiple language text-to-speech processing apparatus includes a multiple language processing portion receiving multiple language text and dividing the input text into sub-texts according to language and a text-to-speech engine portion having a plurality of text-to-speech engines, one for each language, for converting the sub-texts divided by the multiple language processing portion into audio wave data. The processing apparatus also includes an audio processor for converting the audio wave data converted by the text-to-speech engine portion into an analog audio signal, and a speaker for converting the analog audio signal converted by the audio processor into sound and outputting the sound. Thus, the text expressed in multiple languages, which is common in dictionaries or the Internet, can be properly converted into sound.
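The core dispatch logic is easy to sketch. The per-character Unicode test below is a toy stand-in for real language identification, and the engine interface is an assumption; Korean/English is chosen purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Divide mixed-language text into runs of a single language,
    here by a crude per-character test (Hangul syllables vs. other)."""
    runs: List[Tuple[str, str]] = []
    current, current_lang = [], None
    for ch in text:
        lang = 'ko' if '\uac00' <= ch <= '\ud7a3' else 'en'
        if current and lang != current_lang:
            runs.append((current_lang, ''.join(current)))
            current = []
        current_lang = lang
        current.append(ch)
    if current:
        runs.append((current_lang, ''.join(current)))
    return runs

def text_to_speech(text: str,
                   engines: Dict[str, Callable[[str], bytes]]) -> bytes:
    """Route each sub-text to the engine for its language and
    concatenate the returned audio wave data."""
    return b''.join(engines[lang](sub) for lang, sub in split_by_language(text))
```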

155 citations


Patent
11 Jun 1998
TL;DR: In this article, an information processing system has signal processors that are interconnected by processing junctions that simulate and extend biological neural networks. And the response of each processing junction is determined by internal junction processes and is continuously changed with temporal variation in the received signal.
Abstract: An information processing system having signal processors that are interconnected by processing junctions that simulate and extend biological neural networks. As shown in the figure, each processing junction receives signals from one signal processor and generates a new signal to another signal processor. The response of each processing junction is determined by internal junction processes and is continuously changed with temporal variation in the received signal. Different processing junctions connected to receive a common signal from a signal processor respond differently to produce different signals to downstream signal processors. This transforms a temporal pattern of a signal train of spikes into a spatio-temporal pattern of junction events and provides an exponential computational power to signal processors. Each signal processing junction can receive a feedback signal from a downstream signal processor so that an internal junction process can be adjusted to learn certain characteristics embedded in received signals.

153 citations


PatentDOI
TL;DR: A subband audio coder employs perfect/nonperfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square error (mmse) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high fidelity reconstructed audio as mentioned in this paper.
Abstract: A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (mmse) bit allocation over time, frequency and the multiple audio channels to encode/decode a data stream to generate high fidelity reconstructed audio. The audio coder windows the multi-channel audio signal such that the frame size, i.e. number of bytes, is constrained to lie in a desired range, and formats the encoded data so that the individual subframes can be played back as they are received, thereby reducing latency. Furthermore, the audio coder processes the baseband portion (0-24 kHz) of the audio bandwidth for sampling frequencies of 48 kHz and higher with the same encoding/decoding algorithm so that the audio coder architecture is future compatible.

153 citations


Patent
Gilad Cohen, Yossef Cohen, Doron Hoffman, Hagai Krupnik, Aharon Satt
04 Mar 1998
TL;DR: In this paper, a method for adaptively switching between a transform audio coder and a CELP coder is presented, which makes use of the superior performance of CELP coders for speech signal coding while enjoying the benefits of a transform coder for other audio signals.

Abstract: Apparatus is described for digitally encoding an input audio signal for storage or transmission. A distinguishing parameter is measured from the input signal. It is determined from the measured distinguishing parameter whether the input signal contains an audio signal of a first type or a second type. First and second coders are provided for digitally encoding the input signal using first and second coding methods respectively, and a switching arrangement directs, at any particular time, the generation of an output signal by encoding the input signal using either the first or second coder according to whether the input signal contains an audio signal of the first type or the second type at that time. A method for adaptively switching between a transform audio coder and a CELP coder is presented. In a preferred embodiment, the method makes use of the superior performance of CELP coders for speech signal coding, while enjoying the benefits of a transform coder for other audio signals. The combined coder is designed to handle both speech and music and achieves improved quality.
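A toy version of the switching arrangement is shown below. Zero-crossing rate is used as the "distinguishing parameter" purely for illustration; the patent does not say which parameter is measured, and real speech/music discriminators combine several features.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame).astype(np.int8)
    return float(np.mean(np.abs(np.diff(signs))))

def choose_coder(frame, threshold=0.15):
    """Route a frame to the CELP (speech) coder or the transform
    (general audio) coder based on the measured parameter."""
    return 'celp' if zero_crossing_rate(frame) < threshold else 'transform'

def encode_stream(frames, coders):
    """coders: dict with 'celp' and 'transform' encoding callables."""
    return [coders[choose_coder(f)](f) for f in frames]
```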

148 citations


Book
08 May 1998
TL;DR: This book fuses signal processing algorithms and VLSI circuit design to assist digital signal processing architecture developers and shows how this technique can be used in applications such as: signal transmission and storage, manufacturing process quality control and assurance, autonomous mobile system control and biomedical process analysis.
Abstract: From the Publisher: Digital Signal Processing is a rapidly expanding area for evaluation and development of efficient measures for representation, transformation and manipulation of signals. This book fuses signal processing algorithms and VLSI circuit design to assist digital signal processing architecture developers. The author also shows how this technique can be used in applications such as: signal transmission and storage, manufacturing process quality control and assurance, autonomous mobile system control and biomedical process analysis.

143 citations


Proceedings ArticleDOI
07 Dec 1998
TL;DR: A technique for classifying TV broadcast video using hidden Markov models (HMMs) with clip-based audio features as observation vectors, for discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts.
Abstract: This paper describes a technique for classifying TV broadcast video using a hidden Markov model (HMM). Here we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. Eight frame-based audio features are used to characterize the low-level audio properties, and fourteen clip-based audio features are extracted based on these frame-based features to characterize the high-level audio properties. For each type of these five TV programs, we build an ergodic HMM using the clip-based features as observation vectors. The maximum likelihood method is then used for classifying testing data using the trained models.
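The classification scheme maps naturally onto off-the-shelf HMM tooling. Below is a sketch with hmmlearn (an assumed modern stand-in; the paper predates the library): one ergodic Gaussian HMM per program type, trained on sequences of 14-dimensional clip-based feature vectors, with maximum-likelihood classification at test time.

```python
import numpy as np
from hmmlearn import hmm  # assumed stand-in for the paper's own HMM code

def train_models(train_data, n_states=4):
    """train_data: dict mapping class name -> (n_clips, 14) feature array."""
    models = {}
    for name, X in train_data.items():
        m = hmm.GaussianHMM(n_components=n_states, covariance_type='diag',
                            n_iter=50, random_state=0)  # ergodic by default
        m.fit(X)
        models[name] = m
    return models

def classify(models, X):
    """Assign the clip-feature sequence X to the model with the
    highest log-likelihood (the maximum-likelihood decision rule)."""
    return max(models, key=lambda name: models[name].score(X))
```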

PatentDOI
TL;DR: In this article, a binaural digital hearing aid system comprises two hearing aid units (1, 2) for arrangement in a user's left and right ear, respectively; each unit comprises input signal transducer means (3r, 3l), A/D conversion means (4r, 4l), digital signal processing means (5r-13r, 5l-13l), D/A conversion means (14r, 14l), and output signal transducer means (15r, 15l).

Abstract: A binaural digital hearing aid system comprises two hearing aid units (1, 2) for arrangement in a user's left and right ear, respectively. Each unit comprises input signal transducer means (3r, 3l), A/D conversion means (4r, 4l), digital signal processing means (5r-13r, 5l-13l), D/A conversion means (14r, 14l) and output signal transducer means (15r, 15l). A bi-directional communication link (7) is provided between the units. The digital signal processing means of each unit is arranged to effect substantially full digital signal processing, including individual processing of signals from the input transducer means of the actual unit and simulated processing of signals from the input transducer means of the other unit, as well as binaural signal processing, and includes at least a first digital signal processor part (5r, 5l) for processing said internally supplied signal, a second digital signal processor part (6l, 6r) for processing the signal supplied via said communication link (7) and a third digital signal processor part (9r, 9l) to effect common binaural digital signal processing of information derived from the signals processed in said first and second digital signal processor parts, said second digital signal processor part (6l, 6r) in each unit simulating the first digital signal processor part (5l, 5r) in the other unit with respect to adjustment parameters controlling the performance of said first signal processor part in said other unit.

Patent
Erik J. Gilbert
18 Dec 1998
TL;DR: In this article, a method and apparatus for audio compensation is proposed to compensate for the timing irregularities caused by clock synchronization differences and/or routing changes, the present invention adjusts periods of silence in the digital audio data being output.
Abstract: A method and apparatus for audio compensation is disclosed. If audio input components and audio output components are not driven by a common clock (e.g., input and output systems are separated by a network, different clock signals in a single computer system), input and output sampling rates may differ. Also, network routing of the digital audio data may not be consistent. Both clock synchronization and routing considerations can affect the digital audio output. To compensate for the timing irregularities caused by clock synchronization differences and/or routing changes, the present invention adjusts periods of silence in the digital audio data being output. The present invention thereby provides an improved digital audio output.
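The silence-adjustment idea can be sketched in a few lines. The amplitude threshold and the choice of which silent samples to duplicate or drop are assumptions; the patent covers the general technique, not this specific policy.

```python
import numpy as np

def adjust_silence(x, n_extra, threshold=1e-3):
    """Resize the stream by n_extra samples (positive: the output clock
    is starved, so pad; negative: the output clock is ahead, so trim),
    touching only near-silent samples so the correction is inaudible."""
    silent = np.flatnonzero(np.abs(x) < threshold)
    if n_extra == 0 or silent.size == 0:
        return x
    if n_extra > 0:
        idx = silent[:n_extra]
        return np.insert(x, idx, x[idx])    # duplicate silent samples
    idx = silent[:-n_extra]
    return np.delete(x, idx)                # drop silent samples
```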

Patent
04 Sep 1998
TL;DR: In this paper, the authors presented an audio enhancement apparatus and method which spectrally shapes harmonics of the low-frequency information in a pair of audio signals so that when reproduced by a loudspeaker, a listener perceives the loudspeaker as having more acoustic bandwidth than is actually provided by the speaker.
Abstract: The present invention provides an audio enhancement apparatus and method which spectrally shapes harmonics of the low-frequency information in a pair of audio signals so that when reproduced by a loudspeaker, a listener perceives the loudspeaker as having more acoustic bandwidth than is actually provided by the loudspeaker. The perception of extra bandwidth is particularly pronounced at low frequencies, especially frequencies at which the loudspeaker system produces less acoustic output energy. In one embodiment, the invention also shifts signal from one audio signal to the other audio signal in order to obtain more bandwidth for the available loudspeaker to reduce clipping. In one embodiment, the invention also provides a combined signal path for spectral shaping of the desired harmonics and a feedforward signal path for each pair of audio signals.
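The spectral-shaping trick can be approximated in a few lines. The rectifier nonlinearity, filter orders, cutoff, and mix gain below are illustrative guesses at one common way to do this, not the patent's specific circuit.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def psychoacoustic_bass(x, sr, cutoff=120.0):
    """Isolate the low band, generate harmonics with a nonlinearity
    (full-wave rectification here), band-pass the harmonics into the
    range the loudspeaker can reproduce, and mix them back in so the
    listener perceives bass the speaker cannot physically produce."""
    lp = sosfilt(butter(4, cutoff, 'low', fs=sr, output='sos'), x)
    harmonics = np.abs(lp)              # rectifier: doubles frequencies
    shaped = sosfilt(butter(4, [cutoff, 4 * cutoff], 'bandpass',
                            fs=sr, output='sos'), harmonics)
    return x + 0.5 * shaped
```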

Patent
16 Apr 1998
TL;DR: In this article, a method is provided for programming a digital hearing aid using a program encoded in an audio-band (20 Hz - 20 kHz) signal, to transmit and verify programs and algorithm parameters.
Abstract: A method is provided for programming a digital hearing aid using a program encoded in an audio band (20 Hz - 20 kHz) signal, to transmit and verify programs and algorithm parameters. Preferably, this is in a digital hearing aid including filterbanks, filtering the audio signal into different frequency bands. The signal is encoded by the presence and absence of a signal in each frequency band or by other well-known modulation techniques used by computer modems. Special programming signals are provided alternating between the frequency bands in a manner to clearly distinguish the program data from any other interfering or normally present audio signal. The method does not require additional hardware, and offers reduced power consumption, as compared to some known wireless programming interfaces. It enables remote programming over a network using standard multimedia computer hardware.
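Detecting "signal present / signal absent" per band is cheap with the Goertzel algorithm, which fits the filterbank structure the patent mentions. The band frequencies, frame length, and threshold below are assumptions.

```python
import numpy as np

def goertzel_power(frame, freq, sr):
    """Power at a single frequency via the Goertzel recurrence."""
    coeff = 2.0 * np.cos(2.0 * np.pi * freq / sr)
    s1 = s2 = 0.0
    for sample in frame:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def decode_symbol(frame, band_freqs, sr, threshold):
    """One data symbol: a bit per band, set when that band carries a tone."""
    return [int(goertzel_power(frame, f, sr) > threshold) for f in band_freqs]

# Usage: read 4 bits from a 10 ms frame at 16 kHz (values are examples)
sr = 16000
t = np.arange(160) / sr
frame = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 3000 * t)
print(decode_symbol(frame, [1000, 2000, 3000, 4000], sr, threshold=100.0))
```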

Patent
27 Mar 1998
TL;DR: In this article, an MPEG video decoder and an MPEG audio decoder decode MPEG video and audio data separated by a demultiplexer; a digital interface detects a flag indicating a discontinuity in a program from an input signal, and a microcomputer performs control so as to initialize the buffer memories.

Abstract: An MPEG video decoder and MPEG audio decoder respectively decode MPEG video data and MPEG audio data separated by a demultiplexer. A digital interface sends and receives MPEG video data, MPEG audio data and supplementary data between the demultiplexer and external devices. The digital interface detects a flag indicating a discontinuity in a program from an input signal, and a microcomputer performs control so as to initialize the buffer memories.

Patent
17 Feb 1998
TL;DR: In this article, the authors propose a system that allows music synthesis and audio processing tasks to dynamically scale from a default processor to additional processors in a heterogeneous array of processors, in a manner transparent to the user.
Abstract: The present invention provides apparatus and methods which allow music synthesis and audio processing tasks to dynamically scale from a default processor to additional processors in a heterogeneous array of processors in a manner transparent to the user. A router running on one of the processors in the array knows or estimates the load on each processor, and dynamically allocates processing tasks based upon the current load on each processor and its capacity. Processing parameters are shared between all the audio processors to ensure that perceived audio quality is independent of where a task is running.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: It is proposed to use audio information along with image and motion information to accomplish segmentation at different levels, with promising results on videos digitized from TV programs.
Abstract: A video sequence usually consists of separate scenes, and each scene includes many shots. For video understanding purposes, it is most important to detect scene breaks. To analyze the content of each scene, detection of shot breaks is also required. Usually, a scene break is associated with a simultaneous change of image, motion, and audio characteristics, while a shot break is only accompanied with changes in image or motion or both. We propose to use audio information along with image and motion information to accomplish segmentation at different levels. Promising results have been obtained with videos digitized from TV programs.

Patent
06 Oct 1998
TL;DR: In this article, a system for mitigating intermittent interruptions in an audio radio broadcast system is provided where a primary radio signal and a redundant radio signal are transmitted from a transmitter subsystem (120) and received by a receiver subsystem (140).
Abstract: A system (100) for mitigating intermittent interruptions in an audio radio broadcast system is provided wherein a primary radio signal and a redundant radio signal are transmitted from a transmitter subsystem (120) and received by a receiver subsystem (140). The output (112) of an audio source (110) is coupled to a modulator (160) for modulating a radio frequency signal (162) for coupling to a transmit antenna (172). A second output (114) of audio source (110) is coupled to a delay circuit (116), for adding a predetermined time delay thereto. The delayed audio source signal is coupled to a modulator (164) for modulating a second radio frequency signal (166) that is also coupled to the transmit antenna (172). The receiver subsystem (140) receives both the primary radio signal and the delayed redundant radio signal and couples each to a respective demodulator (180, 182). At least demodulator (180) includes a circuit (181) for determining the degradation in the primary radio signal and provides a quality measurement output signal (186) to a blend control circuit (190). The recovered primary audio signal from demodulator (180) is coupled to a second delay circuit (184), the time delay of second delay circuit (184) being substantially equal to the time delay of delay circuit (116). The audio output from delay circuit (184) and the redundant audio output from demodulator (182) are coupled to a blending subsystem (135), wherein each is combined with a weighting factor and then combined together to form a composite audio signal for coupling to the audio output circuit (150).
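The receive-side realign-and-blend step is simple to express. Below, `quality` plays the role of the blend control's weighting factor; the linear cross-fade law is an assumption.

```python
import numpy as np

def blend_streams(primary, redundant_delayed, delay, quality):
    """Delay the recovered primary audio by the same number of samples
    that the redundant stream was delayed at the transmitter, then
    cross-fade: quality is a per-sample weight in [0, 1] matching the
    length of redundant_delayed (1 = primary undegraded)."""
    realigned = np.concatenate([np.zeros(delay), primary])
    realigned = realigned[:len(redundant_delayed)]
    return quality * realigned + (1.0 - quality) * redundant_delayed
```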

Patent
12 Aug 1998
TL;DR: In this paper, the authors describe a communications terminal that adapts to different operating standards by using software algorithms and digital processing instead of physically dedicated hardware, such as physical hardware, software and software.
Abstract: A communications terminal adapts to different operating standards by using software algorithms and digital processing instead of physically dedicated hardware. The communications terminal includes digital processing circuits (103) having a digital signal processor and microprocessor circuits, volatile and non-volatile memory, signal characteristics stored in memory and receiver circuitry to receive and digitize a radio signal. The communications terminal receives the radio signal, converts the radio signal into a digital signal and compares the signal characteristics of the digital signal to signal characteristics of stored signals. The comparison of the signals identifies the standard of the radio signal and determines the format and protocol of the signal. The hardware is then reconfigured to operate according to the identified standard, format and protocol.

PatentDOI
TL;DR: In this article, a fast Fourier transform of the input signal is generated, to allow processing in the frequency domain, and the output signal is then provided to the listener with appropriate amplification to insure audible speech across the usable frequency range.
Abstract: Apparatus and methods for audio compression and frequency shifting retain the spectral shape of an audio input signal while compressing and shifting its frequency. The fast Fourier transform of the input signal is generated, to allow processing in the frequency domain. The input audio signal is divided into small time segments, and each is subjected to frequency analysis. Frequency processing includes compression and optional frequency shifting. The inverse fast Fourier transform function is performed on the compressed and frequency shifted spectrum, to compose an output audio signal, equal in duration to the original signal. The output signal is then provided to the listener with appropriate amplification to insure audible speech across the usable frequency range.
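One short-time segment of the compress-and-shift processing might look like the sketch below. The nearest-bin remapping is a simplification (real systems interpolate and respect phase), and the ratio/shift values are illustrative.

```python
import numpy as np

def compress_segment(frame, ratio=0.8, shift_bins=0):
    """FFT one time segment, move each positive-frequency bin k to
    round(ratio * k) + shift_bins, then inverse-FFT to a segment of the
    original duration (rfft/irfft keep the output real-valued)."""
    n = len(frame)
    X = np.fft.rfft(frame)
    Y = np.zeros_like(X)
    for k in range(len(X)):
        j = int(round(ratio * k)) + shift_bins
        if 0 <= j < len(Y):
            Y[j] += X[k]
    return np.fft.irfft(Y, n)

# Usage: a 1 kHz tone at 16 kHz comes out near 800 Hz with ratio = 0.8
sr, n = 16000, 512
x = np.sin(2 * np.pi * 1000 * np.arange(n) / sr)
y = compress_segment(x, ratio=0.8)
```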

Patent
04 Mar 1998
TL;DR: In this paper, a method and apparatus for sample accurate parameter update management in digital audio recording/playback is presented, where a digital signal processor, memory, recording controller, and appropriate interfacing facilitate storage and retrieval of digital audio data samples continuously to achieve substantially seamless flow and sample accurate audio state updated at precisely designated times, all as required in such critical synchronizing applications as film dubbing.
Abstract: A method and apparatus for sample accurate parameter update management in digital audio recording/playback wherein a digital signal processor, memory, recording controller, and appropriate interfacing facilitate storage and retrieval of digital audio data samples continuously to achieve substantially seamless flow and sample accurate audio state updated at precisely designated times, all as required in such critical synchronizing applications as film dubbing.

Patent
05 Mar 1998
TL;DR: In this paper, a system and method for automatically adjusting characteristics of a television receiver, such as the video and audio settings, based on characteristics of the program being viewed, is presented.
Abstract: A system and method for automatically adjusting characteristics of a television receiver, such as the video and audio settings, based on characteristics of the program being viewed. The system accesses (16) a pre-defined list of program topics and themes stored in a television database. For each topic and theme, settings for picture quality such as contrast, color and brightness and settings for audio such as audio processor type, bass, and treble are stored. When the system is able to match (20) the currently viewed program with one from the database the audio and video settings are automatically adjusted (22) for that program. If the viewer switches to another program or if one program ends and another begins then the acquisition and adjustment process is repeated. The system can be disabled by the viewer if desired.

Patent
09 Mar 1998
TL;DR: In this article, a television converter uses digital signal processing (DSP) to provide compatibility with different television standards including NTSC and PAL video standards and FM, BTSC, DIN, Home Theatre, NICAM and independent digital audio standards.
Abstract: A television converter uses digital signal processing (DSP) to provide compatibility with different television standards including NTSC and PAL video standards and FM, BTSC, DIN, Home Theatre, NICAM and independent digital audio standards. Audio processing is accomplished without passing the audio through a Nyquist filter used for video. This eliminates AM to PM conversion improving luminance linearity and differential gain and phase. It also prevents video information from phase modulating the audio intercarrier, thereby eliminating video "buzz" components in the audio. The audio processing includes a synchronous FM demodulator and a separate synchronous FM/QPSK demodulator for handling the different audio standards. Handling historical analog TV standards with DSP also enables the advantageous combination of analog and digital television reception within a single digital VLSI ASIC.

Patent
30 Apr 1998
TL;DR: In this article, an audio data playback clock signal generating apparatus for a digital VCR synchronizes an audio frame playback synchronization signal with a video frame synchronization signal when audio data recorded in a magnetic tape is played back.
Abstract: An audio data playback clock signal generating apparatus of a digital VCR synchronizes an audio frame playback synchronization signal with a video frame synchronization signal when audio data recorded on a magnetic tape is played back, based on a phase error of an audio data frame size AF-SIZE and an audio data playback clock signal of the previous video frame. The audio data playback clock signal generating apparatus comprises an audio data frame size decoder, a low pass filter, a voltage-controlled oscillator, and a phase error generating block, and can be implemented as a large-scale integration (LSI) circuit to thereby provide a non-complex circuit and accurate synchronization between the audio frame playback synchronization signal and the video frame synchronization signal.

Patent
01 Jun 1998
TL;DR: In this paper, a digital audio/video archive and distribution system and method utilize a signal capture and compression encoding subsystem to translate an analog signal from at least one media source into at least one digital signal segment; the digital signal segment is automatically correlated with identifying information input via an Internet connection, and both are stored in a searchable database subsystem.
Abstract: A digital audio/video archive and distribution system (10) and method utilize a signal capture and compression encoding subsystem (12) to translate an analog signal from at least one media source (14) into at least one digital signal segment. The digital signal segment is automatically correlated with identifying information (18) input via an Internet connection, and both are stored in a searchable database subsystem (24). Upon a user request (28) received via an Internet connection, or programming prompt, a graphical user interface (GUI), such as a web-based user interface, is created to distribute one or more of the stored digital signal segments.

Patent
Karl J. Kuhn, John M. Zetts
29 Dec 1998
TL;DR: In this paper, the A/V test set signal generator includes a Video Blanking Interval (VBI) test signal generator and a white noise generator, the former injecting a marker into the video signal and the latter injecting an audio marker into the audio signal.

Abstract: An apparatus and method provide non-intrusive in-service testing of audio/video synchronization without using traditional audio marker tones. The network includes an A/V synchronous test signal generator which injects video and audio markers into the video and audio non-intrusively and routes the two signals into a switch where they are switched into a channel for encoding and transmission via the ATM network. At the distant end the signal is decoded and routed by a switch into the A/V test generator and measurement set where the markers are detected and the A/V skew calculated, after which the audio and video are routed to the subscriber. The A/V test set signal generator includes a Video Blanking Interval (VBI) test signal generator and a white noise generator, the former injecting a marker into the video signal and the latter injecting an audio marker into the audio signal. The video marker is injected into the VBI, and broadband background audio noise is used to measure the delay between the audio and video components of a broadcast. The marking of the audio is accomplished by gradually injecting white noise into the audio channel until the noise level is 6 dB above the noise floor of the audio receiver. As a precursor A/V sync signal, a small spectrum of the white noise is notched or removed. This signature precludes inadvertent recognition of program audio noise as the audio marker.

Patent
17 Jul 1998
TL;DR: In this paper, an improved information processing apparatus, which accomplishes power-saving of an audio amplifier depending on the activity of each peripheral device having an audio signal output, is presented.
Abstract: An improved information processing apparatus, which accomplishes power-saving of an audio amplifier depending on the activity of each peripheral device having an audio signal output. This information processing apparatus includes one or more peripheral devices, each having an audio signal output and a mute signal output, the mute signal output indicating a mute state in which the targeted peripheral device does not output an audio signal; an audio amplifier for receiving the audio signal from each of the one or more peripheral devices; a speaker for generating an audible output in accordance with an output of said audio amplifier; and an AND gate for receiving the mute signal from each of the one or more peripheral devices to perform a logical AND operation of the mute signals, the AND gate outputting a control signal for disabling the audio amplifier when all of the mute signals indicate the mute state. Thus, it is determined whether or not each of the peripheral devices for outputting an audio signal stays at the mute state, and the audio amplifier is disabled only when all of these peripheral devices are at the mute state.

Proceedings ArticleDOI
04 Oct 1998
TL;DR: A novel technique using a joint audio-visual analysis for scene identification and characterization is proposed; an outlook on a possible implementation strategy for the overall scene identification task is suggested and validated through a series of experimental simulations on real audio-visual data.
Abstract: A novel technique, which uses a joint audio-visual analysis for scene identification and characterization, is proposed. The paper defines four different scene types: dialogues, stories, actions, and generic scenes. It then explains how any audio-visual material can be decomposed into a series of scenes obeying the previous classification, by properly analyzing and then combining the underlying audio and visual information. A rule-based procedure is defined for such purpose. Before such rule-based decision can take place, a series of low-level pre-processing tasks are suggested to adequately measure audio and visual correlations. As far as visual information is concerned, it is proposed to measure the similarities between non-consecutive shots using a learning vector quantization approach. An outlook on a possible implementation strategy for the overall scene identification task is suggested, and validated through a series of experimental simulations on real audio-visual data.

Patent
14 Oct 1998
TL;DR: In this paper, a record controller is coupled to the video switch, the graphics processing module and the audio processing module to determine a focus video source from among the physical video input, the graphical processing module, and the remote source interface responsive to receiving the event information.
Abstract: A videoconference system includes a video switch for selecting focus video information, a physical video input node coupled to provide physical video information to the video switch, a graphics processing module coupled to provide graphical video information to the video switch, and a remote source interface coupled to provide remote video information to the video switch. The videoconference system further includes an audio processing module for processing audio information. A record controller is coupled to the video switch, the graphics processing module and the audio processing module. The record controller is coupled to receive event information from the audio processing module and the graphics processing module. The record controller automatically determines a focus video source from among the physical video input, the graphics processing module and the remote source interface responsive to receiving the event information. The record controller controls the video switch to couple the focus video source to a video switch output responsive to determining the focus video source.