
Showing papers by "Kazuya Takeda published in 2008"


BookDOI
04 Dec 2008
TL;DR: In-Vehicle Corpus and Signal Processing for Driver Behavior, as discussed by the authors, is a collection of expanded papers from the third biennial DSPinCARS workshop, held in Istanbul in June 2007.
Abstract: In-Vehicle Corpus and Signal Processing for Driver Behavior comprises expanded papers from the third biennial DSPinCARS, held in Istanbul in June 2007. The goal is to bring together scholars working on the latest techniques, standards, and emerging deployments in this field, which is central to life in an age of wireless communications, smart vehicles, and human-machine-assisted safer, more comfortable driving. Topics covered in this book include: improved vehicle safety; safe driver-assistance systems; smart vehicles; wireless LAN-based vehicular location information processing; EEG emotion recognition systems; and new methods for predicting driving actions using driving signals. In-Vehicle Corpus and Signal Processing for Driver Behavior is appropriate for researchers, engineers, and professionals working in signal processing technologies, next-generation vehicle design, and networks for mobile platforms.

57 citations


Proceedings ArticleDOI
22 Sep 2008
TL;DR: The results of evaluation experiments proved that CENSREC-4 is an effective database for evaluating new dereverberation methods, since the traditional dereverberation process had difficulty sufficiently improving recognition performance.
Abstract: In this paper, we introduce a collection of databases and evaluation tools called CENSREC-4, an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition in various environments. The data contained in CENSREC-4 are connected digit utterances, as in CENSREC-1. Two subsets are included: basic data sets and extra data sets. The basic data sets provide the evaluation environment for speech data convolved with the room impulse responses. The extra data sets consist of simulated and recorded data. Evaluation tools are provided only for the basic data sets. The results of evaluation experiments proved that CENSREC-4 is an effective database for evaluating new dereverberation methods, since the traditional dereverberation process had difficulty sufficiently improving recognition performance. Index Terms: Various environments, Impulse response, Convolution, Real recorded data, Evaluation framework
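The core simulation step behind a framework like this is convolving clean utterances with measured room impulse responses. Below is a minimal sketch of that step; the file names and normalization are illustrative assumptions, not details from CENSREC-4.

```python
# Minimal sketch: simulate reverberant evaluation data by convolving clean
# speech with a measured room impulse response (RIR). File names and the
# sampling rate are hypothetical, not taken from CENSREC-4.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs_speech, clean = wavfile.read("clean_digits.wav")      # hypothetical clean utterance
fs_rir, rir = wavfile.read("room_impulse_response.wav")  # hypothetical measured RIR
assert fs_speech == fs_rir, "resample first if the rates differ"

clean = clean.astype(np.float64)
rir = rir.astype(np.float64)
rir /= np.max(np.abs(rir))                               # normalize the RIR peak

reverberant = fftconvolve(clean, rir)[:len(clean)]       # truncate to input length

# rescale into 16-bit range before writing
reverberant *= 0.9 * 32767 / np.max(np.abs(reverberant))
wavfile.write("reverberant_digits.wav", fs_speech, reverberant.astype(np.int16))
```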

28 citations


Proceedings ArticleDOI
12 May 2008
TL;DR: Subjective evaluation shows that there is no significant difference between natural and reconstructed sound when more than 6 virtual sources are used, and the effectiveness of the encoding algorithm as well as the virtual source representation is confirmed.
Abstract: A sound field reproduction method that uses blind source separation and head-related transfer functions is proposed. In the proposed system, multichannel acoustic signals captured at distant microphones are encoded into a set of location/signal pairs of virtual sound sources based on frequency-domain ICA. After the locations and signals of the virtual sources are estimated, the spatial sound at a selected point is constructed by convolving the controlled acoustic transfer functions with each signal. In the evaluation, the sound field created by 6 sound sources is captured using 48 distant microphones and encoded into a set of virtual sound sources. Subjective evaluation shows that there is no significant difference between natural and reconstructed sound when more than 6 virtual sources are used, confirming the effectiveness of the encoding algorithm as well as the virtual source representation.
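A sketch of the decoding step only, under stated assumptions: each virtual source signal is convolved with its acoustic transfer function to the listening point and the results are summed. The ICA-based encoding that estimates the virtual sources is omitted, and all arrays below are placeholders.

```python
# Decoding sketch: reconstruct the sound at a listening point by convolving
# each virtual source signal with its transfer function and summing.
import numpy as np
from scipy.signal import fftconvolve

def reconstruct(virtual_sources, transfer_functions):
    """virtual_sources: list of 1-D signals; transfer_functions: matching
    list of impulse responses from each virtual source to the listener."""
    n = max(len(s) + len(h) - 1 for s, h in zip(virtual_sources, transfer_functions))
    out = np.zeros(n)
    for s, h in zip(virtual_sources, transfer_functions):
        y = fftconvolve(s, h)
        out[:len(y)] += y
    return out

rng = np.random.default_rng(0)
sources = [rng.standard_normal(16000) for _ in range(6)]   # 6 toy virtual sources
irs = [rng.standard_normal(512) * np.exp(-np.arange(512) / 64.0)
       for _ in range(6)]                                  # toy impulse responses
listening_point_signal = reconstruct(sources, irs)
```

For binaural output, the same loop would run once per ear with ear-specific transfer functions.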

26 citations


Proceedings ArticleDOI
10 Oct 2008
TL;DR: A statistical driver model is proposed that assumes a driver plans various vehicle trajectories depending on the surrounding vehicles and then selects a safe and comfortable one; trajectories are generated from an HMM.
Abstract: This paper describes a method to generate vehicle trajectories of lane change paths for individual drivers. Although each driver has consistent preferences in lane change behavior, lane-changing time and vehicle trajectory are uncertain due to the presence of surrounding vehicles. To model this uncertainty, we propose a statistical driver model. We assume that a driver plans various vehicle trajectories depending on the surrounding vehicles and then selects a safe and comfortable trajectory. Lane change patterns of each driver are modeled with a hidden Markov model (HMM), which is trained using longitudinal vehicle velocity, lateral vehicle position, and their dynamic features. Vehicle trajectories are generated from the HMM under a maximum likelihood criterion at random lane-changing times and state durations. Experimental results show that the vehicle trajectories generated from the HMM included a trajectory similar to that of the target driver.
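A toy sketch of the generation idea: a left-to-right HMM whose states carry means for [velocity, lateral position] emits a trajectory by drawing a random duration per state and outputting the state means. The per-state means and durations below are invented, and the paper's dynamic (delta) features and proper ML parameter-generation algorithm are omitted.

```python
# Toy left-to-right HMM trajectory generation for a lane change.
import numpy as np

rng = np.random.default_rng(1)

# hypothetical per-state means: [velocity (m/s), lateral position (m)]
state_means = np.array([[25.0, 0.0],    # keep lane
                        [26.0, 0.9],    # start moving
                        [27.0, 2.5],    # crossing
                        [26.5, 3.5]])   # settled in new lane

def generate_trajectory(means, mean_duration=15):
    frames = []
    for mu in means:
        d = max(1, rng.poisson(mean_duration))   # random state duration
        frames.append(np.tile(mu, (d, 1)))       # emit the state mean d times
    return np.vstack(frames)                     # (T, 2) velocity / lateral pos

traj = generate_trajectory(state_means)
print(traj.shape)
```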

21 citations


Proceedings ArticleDOI
04 Jun 2008
TL;DR: This paper proposes a transcription protocol based on six major groups: driver mental state, driver actions, driver's secondary task, driving environment, vehicle status, and speech/background noise, and integrates transcriptions, driving behavior, and physiological signals using a Bayesian network.
Abstract: In this paper we present our ongoing collection of multi-modal real-world driving data. Video, speech, driving behavior, and physiological signals from 150 drivers have already been collected. To provide a more meaningful description of the collected data, we propose a transcription protocol based on six major groups: driver mental state, driver actions, driver's secondary task, driving environment, vehicle status, and speech/background noise. Data from 30 drivers have been transcribed. We then show how transcription reliability can be improved by properly training annotators. Finally, we integrate transcriptions, driving behavior, and physiological signals using a Bayesian network for estimating a driver's level of irritation. Estimations are compared to actual values assessed by the drivers themselves. Preliminary results are very encouraging.
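As a naive-Bayes simplification of the fusion idea: discretized evidence (e.g., heart-rate level, traffic density, secondary-task activity) is combined into a posterior over the driver's irritation level. The conditional probability tables below are invented for illustration; the paper learns a full Bayesian network from the annotated corpus.

```python
# Naive-Bayes style fusion of discretized multimodal evidence into a
# posterior over irritation level. All probabilities are made up.
import numpy as np

levels = ["calm", "mild", "irritated"]
prior = np.array([0.6, 0.3, 0.1])

# P(observation is present | irritation level), one row per evidence variable
cpt = {
    "heart_rate_high": np.array([0.1, 0.3, 0.7]),
    "dense_traffic":   np.array([0.3, 0.5, 0.8]),
    "secondary_task":  np.array([0.2, 0.4, 0.5]),
}

def posterior(evidence):
    """evidence: dict var -> bool; returns P(level | evidence)."""
    p = prior.copy()
    for var, present in evidence.items():
        p *= cpt[var] if present else 1.0 - cpt[var]
    return p / p.sum()

print(dict(zip(levels, posterior(
    {"heart_rate_high": True, "dense_traffic": True, "secondary_task": False}))))
```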

18 citations


Proceedings Article
01 May 2008
TL;DR: The results of evaluation experiments proved that CENSREC-4 is an effective database suitable for evaluating new dereverberation methods, since the traditional dereverberation process had difficulty sufficiently improving recognition performance.
Abstract: Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases. Attention is now focused on improving performance in realistic environments such as noisy conditions. Since October 2001, our working group of the Information Processing Society of Japan has been working on evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks, each including databases and evaluation tools, called CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we introduce a new collection of databases and evaluation tools named CENSREC-4, an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition. The results of evaluation experiments proved that CENSREC-4 is an effective database suitable for evaluating new dereverberation methods, since the traditional dereverberation process had difficulty sufficiently improving recognition performance. The framework was released in March 2008, and many studies are being conducted with it in Japan.

17 citations


Proceedings ArticleDOI
05 Nov 2008
TL;DR: Two novel methods are proposed for arbitrary listening-point generation for 3D audio-video (3DAV) integration in a large-scale multipoint camera and microphone system that can process and display information from any recorded 3D scene in real time.
Abstract: In this paper, we propose two novel methods for arbitrary listening-point generation for 3D audio-video (3DAV) integration in a large-scale multipoint camera and microphone system able to process and display information from any recorded 3D scene in real time. With this system, users can freely control their own viewpoint/listening-point position. An arbitrary listening point can be generated either by (i) a ray-space representation of the sound wave field (source-sound independent) for multiple frequency layers, or by (ii) acoustic transfer function estimation (source-sound dependent) together with blind separation of the sound sources. Arbitrary viewpoint generation is based on the ray-space method, enhanced by multipass dynamic programming for geometry compensation. Integration is done either by (i) a joint ray-space representation of sound wave and image, or by (ii) combining each camera's video signal with the acoustic transfer function of the same location as integrated 3DAV data. The prototype integrated audio-visual viewer achieves both good image and sound quality at 15 frames/second.

13 citations


Proceedings ArticleDOI
20 Oct 2008
TL;DR: An integrative recognition method for speech accompanied by gestures such as pointing is proposed, based on the probability distribution of the time gap between the starting times of an utterance and its accompanying gesture.
Abstract: We propose an integrative recognition method for speech accompanied by gestures such as pointing. Simultaneously generated speech and pointing complementarily help the recognition of both, so integrating these modalities may improve recognition performance. As an example of such multimodal speech, we selected the explanation of a geometry problem. While the problem was being solved, speech and fingertip movements were recorded with a close-talking microphone and a 3D position sensor. To find the correspondence between utterance and gestures, we model the probability distribution of the time gap between the starting times of an utterance and its accompanying gesture. We also propose an integrative recognition method using this distribution. With this method, we obtained an improvement of approximately 3 points for both speech and fingertip movement recognition performance.
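A schematic sketch of how such a time-gap distribution can rescore joint hypotheses: each speech/gesture hypothesis pair combines the two recognizers' log scores with the log probability of the observed onset gap. A Gaussian gap model and all scores below are invented for illustration; the paper estimates the distribution from data.

```python
# Rescore joint speech/gesture hypothesis pairs with a time-gap prior.
import math

GAP_MEAN, GAP_STD = 0.3, 0.4   # hypothetical: gesture starts ~0.3 s after speech

def log_gap_prob(gap):
    z = (gap - GAP_MEAN) / GAP_STD
    return -0.5 * z * z - math.log(GAP_STD * math.sqrt(2 * math.pi))

def joint_score(speech_hyp, gesture_hyp, weight=1.0):
    gap = gesture_hyp["onset"] - speech_hyp["onset"]
    return speech_hyp["logp"] + gesture_hyp["logp"] + weight * log_gap_prob(gap)

speech_nbest = [{"text": "this angle", "onset": 1.00, "logp": -4.2},
                {"text": "this apple", "onset": 1.00, "logp": -4.0}]
gesture_nbest = [{"target": "vertex A", "onset": 1.25, "logp": -1.1},
                 {"target": "vertex B", "onset": 2.60, "logp": -0.9}]

best = max(((s, g) for s in speech_nbest for g in gesture_nbest),
           key=lambda pair: joint_score(*pair))
print(best[0]["text"], "+", best[1]["target"])
```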

8 citations


Proceedings Article
01 May 2008
TL;DR: Comparing utterance length, speaking rate, and filler rate of driver utterances in human-human and human-machine dialogs, it is found that drivers tended to use longer and faster utterances with more fillers when talking with humans than with machines.
Abstract: In this paper, a large-scale real-world speech database is introduced along with other multimedia driving data. We designed a data collection vehicle equipped with various sensors to synchronously record twelve-channel speech, three-channel video, driving behavior (including gas and brake pedal pressures, steering angles, and vehicle velocities), and physiological signals (including driver heart rate, skin conductance, and emotion-based sweating on the palms and soles). These multimodal data are collected while driving on city streets and expressways under four different driving task conditions: two kinds of monologue, human-human dialog, and human-machine dialog. We investigated the response timing of drivers to navigator utterances and found that most responses overlapped with the preceding utterance, due to the task characteristics and the features of Japanese. Comparing utterance length, speaking rate, and the filler rate of driver utterances in human-human and human-machine dialogs, we found that drivers tended to use longer and faster utterances with more fillers when talking with humans than with machines.

7 citations


Proceedings ArticleDOI
22 Sep 2008
TL;DR: The key idea of the proposed system is to train a linear transformation between document and music spaces so that query documents can be mapped onto a music space in which similarities based on acoustic characteristics are represented.
Abstract: Building and combining document and music spaces of songs are discussed for a new music recommendation application that uses commonly read texts, such as Web logs, as query input. The most important application of this flexible recommendation system is music query-by-Webpage, in which a song that appropriately matches a Webpage is automatically played. The key idea of the proposed system is to train a linear transformation between document and music spaces so that query documents can be mapped onto a music space in which similarities based on acoustic characteristics are represented. The basic system has been trained using 2,650 pairs of songs and review texts. Through experimental evaluations, we show the effectiveness of the system, which performs three times better than the previous system. Web text as a training corpus and a bigram representation of the document vector are also investigated for improving the system, and their effectiveness is likewise confirmed.
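A minimal sketch of the core idea, assuming placeholder features: learn a linear map W from document vectors to music-space vectors on paired (review text, song) data via least squares, then project a query document and retrieve the nearest song by cosine similarity. Dimensions and data below are invented.

```python
# Learn a linear document-to-music-space map on paired data, then retrieve.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, d_doc, d_music = 2650, 300, 60
D = rng.standard_normal((n_pairs, d_doc))     # document vectors (e.g., tf-idf/LSA)
M = rng.standard_normal((n_pairs, d_music))   # acoustic music-space vectors

# least-squares solution of D @ W ≈ M
W, *_ = np.linalg.lstsq(D, M, rcond=None)

def recommend(query_doc_vec, song_vectors):
    q = query_doc_vec @ W                      # map the query into music space
    sims = (song_vectors @ q) / (
        np.linalg.norm(song_vectors, axis=1) * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(sims))                # index of the best-matching song

print(recommend(D[0], M))                      # ideally retrieves a song near pair 0
```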

6 citations



Proceedings ArticleDOI
18 Jun 2008
TL;DR: Experimental results show that the percentages of risky steering operations estimated for individual drivers correlate with driver risk evaluation scores given by a risk consulting expert.
Abstract: Risky steering operations are detected based on the relationship between the radius of road curvature and road design speed defined in the road construction ordinance. Vehicle motion while steering is approximated as a circular motion, and the vehicle trajectory radius is estimated from lateral acceleration and vehicle velocity captured with a drive recorder based on a circular motion equation. Steering operation behaviors are evaluated for 203 drivers. Experimental results show that the percentages of risky steering operations estimated for individual drivers correlate with driver risk evaluation scores given by a risk consulting expert. We also observed situations of risky steering by recording video along with driving data using a data collection vehicle.
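The detection idea reduces to basic circular-motion kinematics: estimate the trajectory radius R = v^2 / a_lat from velocity and lateral acceleration, and flag the sample if R falls below the minimum curve radius for the road's design speed. A minimal sketch follows; the radius table is illustrative, not the actual ordinance values.

```python
# Flag risky steering by comparing the estimated turning radius against a
# minimum curve radius per design speed (threshold values are hypothetical).
def trajectory_radius(v_mps, a_lat_mps2):
    """Circular-motion estimate of turning radius in meters: R = v^2 / |a_lat|."""
    return float("inf") if abs(a_lat_mps2) < 1e-6 else v_mps**2 / abs(a_lat_mps2)

# hypothetical minimum curve radius (m) keyed by design speed (km/h)
MIN_RADIUS = {40: 60.0, 50: 100.0, 60: 150.0, 80: 280.0}

def is_risky(v_kmh, a_lat_mps2, design_speed_kmh):
    r = trajectory_radius(v_kmh / 3.6, a_lat_mps2)
    return r < MIN_RADIUS[design_speed_kmh]

print(is_risky(v_kmh=70.0, a_lat_mps2=3.0, design_speed_kmh=60))  # True here
```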

Proceedings Article
01 Aug 2008
TL;DR: Estimation methods for all sound source directions on the horizontal plane, based on a Gaussian mixture model (GMM) of binaural signals, are proposed; results indicate that all sound source directions can be estimated with a small amount of known information.
Abstract: We propose and evaluate methods for estimating all sound source directions on the horizontal plane based on a Gaussian mixture model (GMM) using binaural signals. A GMM-based method can estimate a sound source direction on which the GMM has already been trained, but it cannot estimate directions for which no corresponding model exists. Three methods with interpolation techniques are investigated. Two generate GMMs for all directions by interpolating either an acoustic transfer function or the statistical values of the GMM; the third calculates the posterior probability for all directions with a limited number of GMMs. In our experiments, we investigated six interval conditions. The interpolation methods based on the acoustic transfer function and on the statistical values of the GMM achieve better performance. Although only 12 GMMs were trained at 30° intervals, the method interpolating the statistical values of the GMM achieved 62.5% accuracy (45/72) with 2.8° estimation error. These results indicate that the proposed method can estimate all sound source directions with a small amount of known information.
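A simplified sketch of one interpolation idea: models trained only every 30° are densified by linearly interpolating their means to intermediate directions, and the direction whose model best explains the binaural feature is chosen. Single Gaussians with identity covariance stand in for the paper's GMMs, and all features are synthetic.

```python
# Interpolate trained model means to a fine direction grid, then pick the
# direction with the highest (Gaussian, identity-covariance) log-likelihood.
import numpy as np

rng = np.random.default_rng(2)
trained_dirs = np.arange(0, 360, 30)                       # 12 trained models
dim = 8                                                    # binaural feature dim
trained_means = rng.standard_normal((len(trained_dirs), dim))

def interpolated_mean(direction):
    """Circular linear interpolation of trained means to any direction."""
    lo = (direction // 30) % 12
    hi = (lo + 1) % 12
    w = (direction % 30) / 30.0
    return (1 - w) * trained_means[lo] + w * trained_means[hi]

def estimate_direction(feature, grid_step=5):
    grid = np.arange(0, 360, grid_step)
    scores = [-np.sum((feature - interpolated_mean(d)) ** 2) for d in grid]
    return grid[int(np.argmax(scores))]

true_dir = 75                                              # between trained models
feat = interpolated_mean(true_dir) + 0.05 * rng.standard_normal(dim)
print(estimate_direction(feat))                            # expected near 75
```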

Proceedings ArticleDOI
22 Sep 2008
TL;DR: A novel representation of F0 contours is proposed that provides a computationally efficient algorithm for automatically estimating the parameters of an F0 control model for singing voices and can identify both the target musical note sequence and the dynamics of singing behaviors included in the F0 contours.
Abstract: In this paper, we propose a novel representation of F0 contours that provides a computationally efficient algorithm for automatically estimating the parameters of an F0 control model for singing voices. Although the best-known F0 control model, based on a second-order system with a piece-wise constant function as its input, can generate the F0 contours of natural singing voices, it has no means of learning its parameters automatically from observed F0 contours. Therefore, by modeling the piece-wise constant function with a hidden Markov model (HMM) and approximating the second-order differential equation by a difference equation, we estimate the model parameters optimally through iteration of Viterbi training and an LPC-like solver. Our representation is a generative model and can identify both the target musical note sequence and the dynamics of singing behaviors included in the F0 contours. Our experimental results show that the proposed method can separate the dynamics from the target musical note sequence and generate F0 contours using the estimated model parameters.
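A sketch of the generative view: the note sequence is a piece-wise constant target u[n], and the F0 contour y[n] is its response through a second-order system written as a difference equation (a critically damped filter here). The pole location and note values are illustrative; the paper estimates such parameters from data via Viterbi training and an LPC-like solver.

```python
# Generate an F0-like contour from a piece-wise constant note target through
# a second-order difference equation (double real pole at z = a, DC gain 1).
import numpy as np
from scipy.signal import lfilter, lfilter_zi

# piece-wise constant target: three notes on a log-F0 (semitone-like) scale
u = np.concatenate([np.full(50, 60.0), np.full(50, 64.0), np.full(50, 62.0)])

a = 0.9                             # pole closer to 1 = slower, smoother transitions
b = [(1 - a) ** 2]                  # numerator chosen so the DC gain is exactly 1
den = [1.0, -2 * a, a ** 2]         # denominator: (1 - a z^-1)^2

zi = lfilter_zi(b, den) * u[0]      # start in steady state at the first note
y, _ = lfilter(b, den, u, zi=zi)    # smoothed F0-like contour following the notes
```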

25 Nov 2008
TL;DR: A Bayesian network based stochastic model was built that predicts the subjective score of system usability from personal profiles and several objective metrics; each user's satisfaction index could be predicted for 35.2% of the subjects using the trained Bayesian network.
Abstract: As Internet-based voice communication tools continue to spread, people have more chances to use microphones on their private PCs in a variety of acoustic environments. When PC-based speech input applications are used, this variety of environments causes speech recognition performance to degrade. To improve speech recognition accuracy, it is crucial to collect speech data in the environment in which the system is used[1]. We collected speech interactions with PC-based applications in a wide range of user environments through a field test, obtaining 488 hours of recorded data including 29 hours of speech segments, corresponding to about sixty thousand utterances. In addition to collecting data, we assessed system usability by a questionnaire that asked about usability and the subjective impression of speech recognition performance. Using the system data log and the questionnaire results, we analyzed the relationship between subjective performance and objective metrics. Through this analysis, a Bayesian network based stochastic model was built that predicts the subjective score of system usability from personal profiles and several objective metrics. Experimental results showed that each user's satisfaction index could be predicted for 35.2% of the subjects using the trained Bayesian network.

Journal ArticleDOI
TL;DR: In this paper, the authors measured HRTFs for about 2,300 directions in sagittal and frontal coordinates and constructed a database of head-related transfer functions (HRTFs).
Abstract: 3D sounds can be generated by using a head-related transfer function (HRTF), which is defined as the acoustic transfer function between a sound source and the entrance of the ear canal. Since an HRTF depends on the subject and the sound source direction, many HRTF measurements have been conducted. In most cases, HRTFs were measured in horizontal coordinates; however, measurements in other coordinates are also useful. In previous research, HRTFs measured in sagittal coordinates were used to investigate the relation between spectral cues and vertical angle perception. Although HRTF measurement in frontal coordinates is rarely conducted, it has the advantage of sampling HRTFs densely in the front and rear, where sound localization is very sensitive. We therefore measured HRTFs for about 2,300 directions in sagittal and frontal coordinates and constructed a database. The measurements were conducted in a soundproof chamber with two head-and-torso simulators (B&K 4128 and KEMAR). The HRTF database can be downloaded at http://www.sp.m.is.nagoya-u.ac.jp/HRTF/ .


Journal ArticleDOI
TL;DR: A multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech frame by frame, resulting in better performance of the speech enhancement algorithm.
Abstract: We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. Using a more general prior distribution with online adaptive estimation of its parameters is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multichannel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner noise, and open windows.
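A stand-in sketch of the frame-by-frame spectral pipeline only: STFT, noise estimation from leading frames, a spectral gain, and reconstruction. A plain Wiener gain replaces the paper's MAP magnitude estimator under an adaptive generalized gamma prior (and the multichannel cross-channel statistics), which are substantially more involved.

```python
# Single-channel spectral-gain enhancement skeleton (Wiener gain stand-in).
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.default_rng(3).standard_normal(fs)   # placeholder noisy audio

f, t, X = stft(noisy, fs=fs, nperseg=512)
noise_psd = np.mean(np.abs(X[:, :10]) ** 2, axis=1, keepdims=True)  # leading frames

snr = np.maximum(np.abs(X) ** 2 / noise_psd - 1.0, 1e-3)  # a priori SNR estimate
gain = snr / (snr + 1.0)                                  # Wiener gain per bin/frame
_, enhanced = istft(gain * X, fs=fs, nperseg=512)         # back to the time domain
```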

01 Jan 2008
TL;DR: New evaluation frameworks for bimodal speech recognition in noisy conditions and real environments are introduced; a baseline method and its recognition results will also be provided with these corpora.
Abstract: This paper introduces upcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. To develop speech recognition that is robust in noisy environments, bimodal speech recognition, which uses both acoustic and visual information, has attracted particular attention over the past decade. Since many methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and a baseline system, is needed to evaluate and compare these techniques and bimodal recognition schemes. The audio-visual evaluation frameworks CENSREC-1-AV and CENSREC-2-AV are being built by the CENSREC project in Japan; CENSREC-1-AV includes artificially noise-added waveforms and image sequences, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will also be provided with these corpora. Index Terms: evaluation framework, audio-visual speech corpus, bimodal speech recognition, noisy environments.

Journal ArticleDOI
TL;DR: An information retrieval system for telephone dialogue in a load dispatch center is developed; it realizes information retrieval with arbitrary keywords and is verified by telephone dialogue transcription and information retrieval experiments.
Abstract: We have developed an information retrieval system for telephone dialogue in a load dispatch center. In load dispatching operations, the need to record and retrieve telephone dialogues is high. The proposed system gives a solution for this task and realizes information retrieval with arbitrary keywords. The effectiveness of the system is verified by telephone dialogue transcription and information retrieval experiments. On 30 telephone dialogues in a load dispatch center, we obtain 59.5% average word correctness and 44.4% average word accuracy. In the information retrieval experiment, with 20 keywords, we obtain 87.3% average precision and 67.2% average recall. © 2007 Wiley Periodicals, Inc. Electr Eng Jpn, 162(3): 44–50, 2008; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/eej.20402
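A toy sketch of keyword retrieval over ASR transcripts, with the precision and recall measures used in the evaluation. Transcripts, keywords, and relevance labels below are invented for illustration.

```python
# Keyword search over transcripts plus per-keyword precision/recall.
transcripts = {
    1: "please switch the transformer at substation three to standby",
    2: "confirm the breaker status before restoring the feeder line",
    3: "weather report says heavy rain near the north substation",
}
relevant = {"substation": {1, 3}, "breaker": {2}}   # invented ground-truth labels

def search(keyword):
    return {doc_id for doc_id, text in transcripts.items() if keyword in text}

def precision_recall(keyword):
    hits, truth = search(keyword), relevant[keyword]
    tp = len(hits & truth)
    precision = tp / len(hits) if hits else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

for kw in relevant:
    print(kw, precision_recall(kw))
```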