
Showing papers by "Kazuya Takeda published in 2012"


Journal ArticleDOI
TL;DR: The proposed self-coaching system establishes a cooperative relationship between the driver, the vehicle, and the driving environment, leading to the development of the next generation of safety systems and paving the way for an alternative form of driving education that could further reduce the number of fatal accidents.
Abstract: This paper describes the development of a self-coaching system to improve driving behavior by allowing drivers to review a record of their own driving activity. By employing stochastic driver-behavior modeling, the proposed system is able to detect a wide range of potentially hazardous situations, which conventional event data recorders are not able to capture, including those involving latent risks, of which drivers themselves are unaware. By utilizing these automatically detected hazardous situations, our web-based system offers a user-friendly interface for drivers to navigate and review each hazardous situation in detail (e.g., driving scenes are categorized into different types of hazardous situations and are displayed with corresponding multimodal driving signals). Furthermore, the system provides feedback on each risky driving behavior and suggests how users can safely respond to such situations. The proposed system establishes a cooperative relationship between the driver, the vehicle, and the driving environment, leading to the development of the next generation of safety systems and paving the way for an alternative form of driving education that could further reduce the number of fatal accidents. The system's potential benefits are demonstrated through an extensive preliminary evaluation of an on-road experiment, showing that safe-driving behavior can be significantly improved when drivers use the proposed system.

43 citations


Proceedings ArticleDOI
25 Oct 2012
TL;DR: There were individual differences in gaze behaviors, and expert drivers showed a higher degree of awareness than non-expert drivers.
Abstract: We investigated a method to estimate the degree of driver awareness of surrounding vehicles based on the correlation between driver gaze direction and the risks caused by surrounding vehicles. The risks posed by surrounding vehicles were represented by their time to collision (TTC) from the driver's vehicle. We recorded driving data from five expert and five non-expert drivers while passing other vehicles on expressways, using an instrumented vehicle. We manually labeled the drivers' gaze directions using video of the drivers' faces, and detected the positions of surrounding vehicles and calculated TTC using laser scanners mounted on the front and back of the vehicle. We focused on driver's gaze behavior for five seconds before the driver began moving into the right hand lane at the beginning of the passing maneuver and calculated the correlation index between vectors representing the distribution of gaze resources and risk levels of surrounding vehicles for eight zones around the vehicle. We found that there were individual differences in gaze behaviors, and that expert drivers showed a higher degree of awareness than non-expert drivers.
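
The two quantities this abstract relies on, TTC and a correlation between per-zone gaze and risk vectors, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how TTC is mapped to a risk level or which correlation index is used, so the inverse-TTC risk and the Pearson coefficient below are assumptions, and all function names are hypothetical.

```python
import math


def time_to_collision(gap_m, closing_speed_mps):
    """Return TTC in seconds; infinite when the gap is not closing."""
    if closing_speed_mps <= 0:
        return math.inf
    return gap_m / closing_speed_mps


def risk_from_ttc(ttc_s):
    """Assumed risk mapping: inverse TTC, zero for a non-closing vehicle."""
    return 0.0 if math.isinf(ttc_s) else 1.0 / ttc_s


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumed index)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (norm_x * norm_y)


# Hypothetical example: eight zones around the vehicle.
gaze_share = [0.40, 0.05, 0.05, 0.10, 0.30, 0.05, 0.03, 0.02]
risk = [risk_from_ttc(time_to_collision(g, v)) for g, v in [
    (20.0, 5.0), (50.0, 0.0), (40.0, -2.0), (30.0, 2.0),
    (15.0, 3.0), (60.0, 1.0), (80.0, 0.0), (70.0, -1.0)]]
awareness = pearson(gaze_share, risk)
```

Under these assumptions, a value of `awareness` near 1 would mean the driver's gaze was distributed in proportion to the risk posed by each zone.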

23 citations


Proceedings ArticleDOI
03 Jun 2012
TL;DR: The proposed driver-behavior model was employed to anticipate pedal-operation behavior during car-following maneuvers involving several drivers on the road and the experimental results showed advantages of the combined model over the adapted model previously proposed.
Abstract: In this paper, we propose a stochastic driver-behavior modeling framework which takes into account both individual and general driving characteristics as one aggregate model. Patterns of individual driving styles are modeled using a Dirichlet process mixture model, a nonparametric Bayesian approach which automatically selects the optimal number of model components to fit sparse observations of each particular driver's behavior. In addition, general or background driving patterns are also captured with a Gaussian mixture model using a reasonably large amount of observed development data from several drivers. By combining both probability distributions, the aggregate driver-dependent model can better emphasize driving characteristics of each particular driver, while also backing off to exploit general driving behavior in cases of unmatched parameter spaces from individual training observations. The proposed driver-behavior model was employed to anticipate pedal-operation behavior during car-following maneuvers involving several drivers on the road. The experimental results showed advantages of the combined model over the adapted model previously proposed.
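
The back-off idea can be illustrated with a small sketch. The abstract does not state how the two distributions are combined, so the linear interpolation and the weight `lam` below are assumptions, as are the one-dimensional Gaussian components and every name in the code; fitting the Dirichlet process mixture itself is outside the scope of this sketch.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a mixture given as a list of (weight, mean, std)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def combined_pdf(x, individual, background, lam=0.7):
    """Assumed combination: linear interpolation of the two densities."""
    return lam * mixture_pdf(x, individual) + (1.0 - lam) * mixture_pdf(x, background)


# Hypothetical models: the individual model only covers values near 0,
# while the background model also covers values near 5.
individual = [(1.0, 0.0, 1.0)]
background = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]

p_individual_only = mixture_pdf(5.0, individual)
p_combined = combined_pdf(5.0, individual, background)
```

At `x = 5.0`, where the individual model has seen almost no data, the combined density is dominated by the background term, which is the back-off behavior the abstract describes.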

22 citations


Proceedings ArticleDOI
25 Mar 2012
TL;DR: This paper proposes using a Muscle Tension Ratio (MTR) to identify speech under stress and shows that MTR is more effective than a conventional method of stress measurement.
Abstract: We focus on variations in the glottal source of speech production, which is essential for understanding the generation of speech under psychological stress. In this paper, a two-mass vocal fold model is fitted to estimate the stiffness parameters of vocal folds during speech, and the stiffness parameters are then analyzed in order to classify recorded samples into neutral and stressed speech. Mechanisms of vocal folds under stress are derived from the experimental results. We propose using a Muscle Tension Ratio (MTR) to identify speech under stress. Our results show that MTR is more effective than a conventional method of stress measurement.

18 citations



Proceedings ArticleDOI
25 Oct 2012
TL;DR: This work focused on a car driver's cognitive distraction by analyzing a driver's internal state induced during a music-retrieval task using an automatic speech-recognition system and showed that the temporal relationship between the driver's eye gaze and the peripheral vehicle behavior depends on the driver's state.
Abstract: One's state of mind is subconsciously revealed in one's reactions to external stimuli. In this work, we focus on a car driver's cognitive distraction, specifically by analyzing a driver's internal state induced during a music-retrieval task using an automatic speech-recognition system. A visual event that occurs in front of the driver when a peripheral vehicle overtakes the driver's vehicle is regarded as the external stimulus. The analysis result showed that the temporal relationship between the driver's eye gaze and the peripheral vehicle behavior depends on the driver's state. Specifically, we confirmed that the timing of the gaze toward the stimulus under the distracted state is later than under the neutral state without the secondary cognitive task. This temporal feature can contribute to the automatic detection of cognitive distraction. A detector based on a Bayesian framework using this feature achieves better accuracy than one based on the percentage road center method.

13 citations


Journal ArticleDOI
TL;DR: When Gram staining shows only Gram-positive cocci, penicillin is the treatment of choice, and in other cases, antibiotics effective for the penicillin-resistant organisms should be used.
Abstract: Objective. To examine whether Gram staining can influence the choice of antibiotic for the treatment of peritonsillar abscess. Methods. Between 2005 and 2009, a total of 57 cases of peritonsillar abscess were analyzed with regard to cultured bacteria and Gram staining. Results. Only aerobes were cultured in 16% of cases, and only anaerobes were cultured in 51% of cases. Mixed growth of aerobes and anaerobes was observed in 21% of cases. The cultured bacteria were mainly aerobic Streptococcus, anaerobic Gram-positive cocci, and anaerobic Gram-negative rods. Phagocytosis of bacteria on Gram staining was observed in 9 cases. The bacteria cultured from these cases were aerobic Streptococcus, anaerobic Gram-positive cocci, and anaerobic Gram-negative rods. The sensitivity of Gram staining for the Gram-positive cocci and Gram-negative rods was 90% and 64%, respectively. The specificity of Gram staining for the Gram-positive cocci and Gram-negative rods was 62% and 76%, respectively. Most of the Gram-positive cocci were sensitive to penicillin, but some of the anaerobic Gram-negative rods were resistant to penicillin. Conclusion. When Gram staining shows only Gram-positive cocci, penicillin is the treatment of choice. In other cases, antibiotics effective for the penicillin-resistant organisms should be used.
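
For readers unfamiliar with the two metrics reported here, the sketch below shows how sensitivity and specificity are computed when culture is taken as the reference and Gram staining as the test. The counts in the example are hypothetical and are not taken from the paper, which reports only percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of culture-positive cases that the stain also flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of culture-negative cases that the stain also cleared."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for one organism class (not the study's data).
stain_sensitivity = sensitivity(true_pos=9, false_neg=1)
stain_specificity = specificity(true_neg=3, false_pos=1)
```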

11 citations


Book ChapterDOI
01 Jan 2012
TL;DR: A signal-processing approach for modeling vehicle trajectory during lane changes while driving with a hidden Markov model and a cognitive distance space represented with a hazard-map function is discussed.
Abstract: A signal-processing approach for modeling vehicle trajectory during lane changes while driving is discussed. Since individual driving habits are not a deterministic process, we develop a stochastic method to model them. The proposed model consists of two parts: a dynamic system represented by a hidden Markov model and a cognitive distance space represented with a hazard-map function. The first part models the local dynamics of vehicular movements and generates a set of probable trajectories. The second part selects an optimal trajectory by stochastically evaluating the distances from surrounding vehicles. Through experimental evaluation, we show that the model can predict vehicle trajectory in given traffic conditions with a prediction error of 17.6 m.

8 citations


Proceedings ArticleDOI
25 Oct 2012
TL;DR: The results showed the promise of this framework, which exploits estimated time-to-collision (TTC) information with both negative and positive values as a criticality indicator of driving situations perceived by the driver, for estimating deceleration probability during car following.
Abstract: Driver deceleration behavior contains a large amount of information regarding individual driving characteristics, driving environment, and situations perceived as potentially hazardous by a driver. This paper focuses on deceleration behavior involving both release of the gas pedal and depression of the brake pedal during on-the-road car following. A Bayesian framework was employed to calculate the probability of a driver decelerating at a given point in time, using only low-level driving signals. A stochastic driver-behavior model based on a Dirichlet process mixture model was employed to capture distinct characteristics of different drivers' driving behavior. In addition, this framework exploits estimated time-to-collision (TTC) information, using both negative and positive values as a criticality indicator of driving situations perceived by the driver. Experimental validation was conducted using the on-the-road car-following behavior of sixty-four drivers. The results showed the promise of this framework for estimating deceleration probability during car following.

7 citations


Proceedings ArticleDOI
24 Jul 2012
TL;DR: The results showed that under two particular context conditions, drivers demonstrated distinct driving characteristics that could be efficiently recognized by stochastic driver-behavior models, and yet, some context-specific models could be exploited to predict driving behavior in other driving contexts.
Abstract: Driving context plays an essential role in driving behavior and driving performance of a driver. The contextual information surrounding a driving activity involves several factors and dimensions that influence a driver's behavior. However, a driver may or may not need to adopt a particular driving pattern for every distinct driving context condition. In this paper, we statistically investigate the impact of various driving context conditions on the behavior prediction and context recognition performance of stochastic driver-behavior models. We employed a Dirichlet process mixture modeling framework to capture the underlying distributions of observed driving parameters under different driving context conditions. Experimental validation was conducted using the on-the-road car-following behavior of sixty-four drivers. The results showed that under two particular context conditions, drivers demonstrated distinct driving characteristics that could be efficiently recognized by stochastic driver-behavior models, and yet, some context-specific models could be exploited to predict driving behavior in other driving contexts.

6 citations


Proceedings ArticleDOI
09 Sep 2012
TL;DR: Physical parameters which can be used to classify speech as either stressed or neutral based on a two-mass vocal fold model are investigated and proposed.
Abstract: In this study, we investigate physical parameters which can be used to classify speech as either stressed or neutral based on a two-mass vocal fold model. The model attempts to characterize the behavior of the vocal folds and fluid airflow properties when stress is present. The two-mass model is fitted to real speech to estimate the values of physical parameters that represent the stiffness of vocal folds, vocal fold viscosity loss, and subglottal pressure coming from the lungs. The estimated parameters can be used to distinguish stressed speech from neutral speech because these parameters can represent the mechanisms of vocal folds under stress. We propose combinations of physical parameters as features for classification. Experimental results show that our proposed features achieved better classification performance than features derived from traditional methods.

Proceedings Article
18 Oct 2012
TL;DR: A method of selecting temporal frames which are effective for training the separation filters is proposed and evaluated, and the proposed method can achieve faster computation with lower computational complexity, and its effectiveness can be confirmed.
Abstract: A faster computational method for performing frequency domain independent component analysis (FDICA) using a dodecahedral microphone array is proposed. Source separation with FDICA uses the spectrum of observed signals and estimates separation filters for each frequency. However, this technique is complex and requires high computational resources. In this paper, a method of selecting temporal frames which are effective for training the separation filters is proposed and evaluated. The log power spectrum and the kurtosis of amplitude distribution are employed as selection criteria. Performance was evaluated by comparing signal-to-interference performance with that of the conventional method. Experimental results showed that the proposed method reduced computation to 17.1 % of that required by the conventional method, and that separation performance of the proposed method is superior. Therefore, the proposed method can achieve faster computation with lower computational complexity, and its effectiveness can be confirmed.
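
A frame-selection step of the kind described can be sketched as below. The abstract names log power and amplitude kurtosis as the criteria but does not say how they are combined or thresholded, so the simple "both above a threshold" rule, the time-domain power computation, and all names are assumptions made for illustration.

```python
import math


def log_power_db(frame):
    """Mean power of one frame in dB (small floor avoids log of zero)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def kurtosis(frame):
    """Kurtosis (fourth standardized moment) of the frame amplitudes."""
    n = len(frame)
    mean = sum(frame) / n
    var = sum((s - mean) ** 2 for s in frame) / n
    if var == 0:
        return 0.0  # silent or constant frame
    fourth = sum((s - mean) ** 4 for s in frame) / n
    return fourth / (var * var)


def select_frames(frames, min_log_power_db, min_kurtosis):
    """Assumed rule: keep frames that pass both thresholds."""
    return [i for i, f in enumerate(frames)
            if log_power_db(f) >= min_log_power_db and kurtosis(f) >= min_kurtosis]


# Hypothetical frames: silence, a flat square wave, and a sparse spiky frame.
frames = [
    [0.0] * 8,
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, -4.0],
]
selected = select_frames(frames, min_log_power_db=-10.0, min_kurtosis=3.0)
```

Only the selected frames would then be passed to the filter-training stage, which is where the reported reduction in computation would come from.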

Proceedings ArticleDOI
01 Dec 2012
TL;DR: CENSREC-2-AV is one of the databases of the CENSREC project; it is provided as a database for bimodal speech recognition for additive noises and there are speech data and lip images in these corpora.
Abstract: In this paper, we introduce a bimodal speech recognition corpus in real environments. In recent years, speech recognition technology has been used in noisy conditions. Therefore, it becomes necessary to achieve higher recognition accuracy in real environments. As one of the solutions, bimodal speech recognition using audio and non-audio information is being studied. However, there are few databases which can be used to evaluate bimodal speech recognition in real environments. In this paper, we introduce CENSREC-2-AV, which we have been building as a new bimodal speech recognition corpus. CENSREC-2-AV is one of the databases of the CENSREC project; we provided a similar corpus, CENSREC-1-AV, as a database for bimodal speech recognition for additive noises. In these corpora, there are speech data and lip images. Researchers can use CENSREC-2-AV to evaluate, in real environments, a bimodal speech recognition method built using CENSREC-1-AV, which consists of clean data.

Journal ArticleDOI
TL;DR: A band selection method based on magnitude squared coherence is proposed for small agglomerative microphone array systems and shows improvement in performance compared to the use of uniformly spaced frequency bands.
Abstract: Small agglomerative microphone array systems have been proposed for use with speech communication and recognition systems. Blind source separation methods based on frequency domain independent component analysis have shown significant separation performance, and the microphone arrays are small enough to make them portable. However, the level of computational complexity involved is very high because the conventional signal collection and processing method uses 60 microphones. In this paper, we propose a band selection method based on magnitude squared coherence. Frequency bands are selected based on the spatial and geometric characteristics of the microphone array device, which are strongly related to its dodecahedral shape, and the selected bands are nonuniformly spaced. The estimated reduction in the computational complexity is 90% with a 68% reduction in the number of frequency bands. Separation performance achieved during our experimental evaluation was 7.45 dB (signal-to-noise ratio) and 2.30 dB (cepstral distortion). These results show improvement in performance compared to the use of uniformly spaced frequency bands.
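
For reference, the magnitude squared coherence between two microphone signals x and y is conventionally defined from their cross- and auto-power spectral densities as shown below. This is the standard textbook definition; how the paper turns it into a band-selection rule is not stated in the abstract.

```latex
\[
C_{xy}(f) = \frac{\lvert P_{xy}(f) \rvert^{2}}{P_{xx}(f)\, P_{yy}(f)},
\qquad 0 \le C_{xy}(f) \le 1
\]
```

Here $P_{xy}(f)$ is the cross-spectral density of the two signals and $P_{xx}(f)$, $P_{yy}(f)$ are their auto-spectral densities.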

Proceedings Article
01 Dec 2012
TL;DR: Evaluation results show that generating pseudo-speakers by manipulating speaking rates did not result in a sufficient increase in performance; however, vocal tract length warping was effective.
Abstract: In this paper, we propose a robust speaker-independent acoustic model training method using generative training to generate many pseudo-speakers from a small number of real speakers. We focus on the difference between each speaker's vocal tract length, and manipulate it in order to create many different pseudo-speakers with a range of vocal tract lengths. This method employs frequency warping based on the inverted use of Vocal Tract Length Normalization (VTLN). Another method for creating pseudo-speakers is to vary the speaking rate of the speakers. This can be achieved by a method called PICOLA (Pointer Interval Controlled OverLap and Add). In experiments, we train acoustic models using these generated pseudo-speakers in addition to the original speakers. Evaluation results show that generating pseudo-speakers by manipulating speaking rates did not result in a sufficient increase in performance; however, vocal tract length warping was effective.
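
The frequency-warping step can be sketched with a piecewise-linear warp of the kind commonly used for VTLN. The abstract does not specify the warping function, so the piecewise-linear form, the `cutoff_ratio` parameter, and the range of warping factors below are all assumptions for illustration.

```python
def warp_frequency(freq_hz, alpha, nyquist_hz, cutoff_ratio=0.8):
    """Piecewise-linear warp: scale by alpha below the cutoff, then
    interpolate linearly so that the Nyquist frequency maps to itself.
    Assumes alpha < 1 / cutoff_ratio so the warp stays monotonic."""
    cutoff_hz = cutoff_ratio * nyquist_hz
    if freq_hz <= cutoff_hz:
        return alpha * freq_hz
    warped_cutoff = alpha * cutoff_hz
    slope = (nyquist_hz - warped_cutoff) / (nyquist_hz - cutoff_hz)
    return warped_cutoff + slope * (freq_hz - cutoff_hz)


def pseudo_speaker_factors(low=0.9, high=1.1, count=5):
    """Evenly spaced warping factors, one per pseudo-speaker (count >= 2)."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


# Hypothetical use: warp a 1 kHz component for each pseudo-speaker.
warped = [warp_frequency(1000.0, a, 8000.0) for a in pseudo_speaker_factors()]
```

Applying such a warp to the filter-bank center frequencies of a real speaker's data, once per factor, would yield one set of pseudo-speaker features per factor.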

Journal ArticleDOI
TL;DR: A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition and results show that the acoustic models trained using the proposed method are robust for unknown speakers.
Abstract: A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then we train our models using these features. First we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters used to express the transformation matrices for the existing speakers is estimated. Next, we construct pseudo-speaker transformations by sampling the weight parameters from the distribution, and apply the transformation to the normalized features of the existing speaker to generate the features of the pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that the acoustic models trained using our proposed method are robust for unknown speakers.
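
The sampling and feature-generation steps can be sketched as follows, assuming the PCA bases and the weight statistics have already been estimated. The abstract does not say which distribution the weights follow, so the independent Gaussian per weight is an assumption, as are the flattened row-major `[A | b]` layout of the transform and all names in the code.

```python
import random


def sample_transform(mean_vec, bases, weight_means, weight_stds, rng):
    """Draw one pseudo-speaker transform as mean + sum_k w_k * basis_k,
    with each weight w_k sampled from an assumed Gaussian."""
    weights = [rng.gauss(m, s) for m, s in zip(weight_means, weight_stds)]
    out = list(mean_vec)
    for w, basis in zip(weights, bases):
        for i, b in enumerate(basis):
            out[i] += w * b
    return out


def apply_affine(flat_transform, feature):
    """Apply y = A x + b, where flat_transform stores the d x (d+1)
    matrix [A | b] in row-major order."""
    d = len(feature)
    result = []
    for r in range(d):
        row = flat_transform[r * (d + 1):(r + 1) * (d + 1)]
        result.append(sum(a * x for a, x in zip(row[:d], feature)) + row[d])
    return result


# Hypothetical 2-D example: identity mean transform and one basis that
# only changes the first bias term.
mean_vec = [1.0, 0.0, 0.0,
            0.0, 1.0, 0.0]
bases = [[0.0, 0.0, 1.0,
          0.0, 0.0, 0.0]]
rng = random.Random(0)
transform = sample_transform(mean_vec, bases, [2.0], [0.0], rng)
pseudo_feature = apply_affine(transform, [3.0, 4.0])
```

With a nonzero standard deviation, repeated calls to `sample_transform` would produce a different pseudo-speaker transform each time, and the acoustic model would then be trained on the pooled generated features.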

Journal ArticleDOI
12 Apr 2012
TL;DR: This study collected subjective music similarity evaluation data for individuality analysis using songs in the RWC music database, a widely used database in the field of music information processing.
Abstract: We describe a method of estimating subjective music similarity from acoustic music similarity. Recently, there have been many studies on the topic of music information retrieval, but there continues to be difficulty improving retrieval precision. For this reason, in this study we analyze the individuality of subjective music similarity. We collected subjective music similarity evaluation data for individuality analysis using songs in the RWC music database, a widely used database in the field of music information processing. A total of 27 subjects listened to pairs of music tracks, and evaluated each pair as similar or dissimilar. They also selected the components of the music (melody, tempo/rhythm, vocals, instruments) that were similar. Each subject evaluated the same 200 pairs of songs, so the individuality of the evaluation can be easily analyzed. Using the collected data, we trained individualized distance functions between songs, in order to estimate subjective similarity and analyze individuality.

Journal ArticleDOI
TL;DR: A closely coupled framework between an FDICA-based BSS algorithm and a speech recognition system is proposed that can reduce ASR errors caused by separation errors in BSS and permutation errors in ICA.
Abstract: One of the main applications of Blind Source Separation (BSS) is to improve the performance of Automatic Speech Recognition (ASR) systems. However, conventional BSS algorithms have been applied to speech signals only as a pre-processing approach. In this paper, a closely coupled framework between an FDICA-based BSS algorithm and a speech recognition system is proposed. In the source separation step, a confidence score of the separation accuracy for each frequency bin is first estimated. Subsequently, by employing a multi-band speech recognition system, acoustic likelihood is calculated from the estimated BSS confidence scores and Mel-scale filter bank energy. Therefore, our proposed method can reduce the ASR errors caused by separation errors in BSS and permutation errors in ICA that occur in the conventional approach. Experimental results showed that our proposed method improved word accuracy of ASR by approximately 10%.

Journal ArticleDOI
TL;DR: An innovative experimental platform was constructed to study cross-situational consistency in driving behavior, behavioral experiments were conducted, and the data obtained in the experiment were reported.
Abstract: We constructed an innovative experimental platform to study cross-situational consistency in driving behavior, conducted behavioral experiments, and reported the data obtained in the experiment. To discuss cross-situational consistency, we separated situations in which people use some systems to conduct tasks into three independent conceptual factors: environment, context, and system. We report the experimental results with the following systems: a laboratory system with a gaming controller and steering/pedal controllers, and a real system, COMS, an instrumented vehicle. The results are summarized as follows. 1) The individual behaviors in each system were stable, and consistency was retained. 2) The consistency of the behaviors was also confirmed when the participants drove using different interfaces in identical systems. 3) However, only slight correlation was observed across different systems in a specific situation where a strong high-order cognitive constraint (i.e., rapid driving) and a weak low-order cognitive constraint (driving with easy handling toward a straight-line course) were given.

Proceedings Article
01 May 2012
TL;DR: LREC 2012: The 8th International Conference on Language Resources and Evaluation, 21-27 May, 2012, Istanbul, Turkey.

Proceedings ArticleDOI
25 Mar 2012
TL;DR: A method is proposed for estimating the sound source depth, i.e., the distance between a source and receiver, with a small-size array, using the spatial distribution pattern of quasi-independent signal components obtained by frequency-domain independent component analysis (FDICA) as the cue for depth estimation.
Abstract: A method for estimating the sound source depth, i.e., the distance between a source and receiver, using a small-size array is proposed. The proposed method uses the spatial distribution pattern of quasi-independent signal components obtained by the frequency-domain independent component analysis (FDICA) as the cue for depth estimation. The quasi-independent components are calculated by applying FDICA to array signals with very high redundancy, for example, 60 microphone signals for a pair of sources; therefore, signal components associated with reflection signals are obtained even though they are correlated with the direct signal. Experimental evaluation using a small-size microphone array with a large number of elements confirms that the average (RMS) estimation error of the proposed method is 0.33 m, which is sufficiently accurate for our applications.