
Showing papers by Kazuya Takeda published in 2016


Journal ArticleDOI
TL;DR: This article reviews data-centric approaches for statistical modeling of driver behavior and describes how statistical machine-learning techniques, such as hidden Markov models (HMMs) and deep learning, have been successfully applied to model driver behavior using large amounts of driving data.
Abstract: This article reviews data-centric approaches for statistical modeling of driver behavior. Modeling driver behavior is challenging due to its stochastic nature and the high degree of inter- and intradriver variability. One way to deal with the highly variable nature of driving behavior is to employ a data-centric approach that models driver behavior using large amounts of driving data collected from numerous drivers in a variety of traffic conditions. To obtain large amounts of realistic driving data, several projects have collected real-world driving data. Statistical machine-learning techniques, such as hidden Markov models (HMMs) and deep learning, have been successfully applied to model driver behavior using large amounts of driving data. We have also collected on-road data recording hundreds of drivers over more than 15 years. We have applied statistical signal processing and machine-learning techniques to this data to model various aspects of driver behavior, e.g., driver pedal-operation, car-following, and lane-change behaviors for predicting driver behavior and detecting risky driver behavior and driver frustration. By reviewing related studies and providing concrete examples of our own research, this article is intended to illustrate the usefulness of such data-centric approaches for statistical driver-behavior modeling.
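
As a rough illustration of the HMM-based modeling reviewed above, the sketch below (Python; it assumes the hmmlearn package, and the signals, session lengths, and state count are hypothetical stand-ins, not the authors' setup) fits a Gaussian HMM to pedal-operation-style signals and decodes a latent maneuver-state sequence:

import numpy as np
from hmmlearn import hmm

# Hypothetical driving signals: one row per frame,
# columns = [gas pedal position, brake force, vehicle speed].
rng = np.random.default_rng(0)
X = rng.random((1000, 3))           # stand-in for real recorded data
lengths = [600, 400]                # two separate driving sessions

# Gaussian HMM with a few latent "maneuver" states.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

states = model.predict(X)           # most likely state per frame (Viterbi)
loglik = model.score(X, lengths)    # model fit; e.g., compare across drivers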

56 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: In this article, the authors proposed compressing packet data (the raw form of point cloud data) by converting it losslessly into range images and then using various image/video compression algorithms to reduce the volume of the data.
Abstract: Continuous point cloud data has recently become very important as a key component in the development of autonomous driving technology, and has in fact become indispensable for some autonomous driving applications such as obstacle detection. However, such large amounts of data are very expensive to store and are difficult to share directly due to their volume. Previous studies have explored various methods of compressing point cloud data directly, by converting it into 2D images or by using tree-based approaches. In this study, rather than compressing point cloud data directly, we choose to compress the packet data, i.e., the raw form of point cloud data, by converting it losslessly into range images and then using various image/video compression algorithms to reduce the volume of the data. In this paper, four methods for range-image compression, based on MPEG and JPEG with two kinds of preprocessing approaches for each, are evaluated. PSNR vs. bitrate and RMSE vs. bitrate curves are used to compare and evaluate the performance of these four methods. Since whether a lossy compression method is good always depends on the application, a localization experiment is also conducted to test point cloud reconstruction performance. Comparing these methods yields some important conclusions.
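
The pipeline lends itself to a compact sketch. The following Python (assuming NumPy and OpenCV; the 64 x 2088 image size, 8-bit quantization, and quality setting are illustrative, not the paper's exact configuration) compresses a range image with JPEG and computes the bitrate, RMSE, and PSNR figures of the kind compared in the evaluation:

import numpy as np
import cv2

# Hypothetical range image: rows = laser channels, cols = azimuth steps,
# values = distances quantized to 8 bits (real packet data is finer).
rng = np.random.default_rng(0)
range_img = (rng.random((64, 2088)) * 255).astype(np.uint8)

# Lossy JPEG compression at a chosen quality level.
ok, buf = cv2.imencode(".jpg", range_img, [cv2.IMWRITE_JPEG_QUALITY, 80])
decoded = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)

# Rate/distortion measures: bits per pixel, RMSE, and PSNR.
bitrate = buf.size * 8 / range_img.size
rmse = np.sqrt(np.mean((range_img.astype(float) - decoded.astype(float)) ** 2))
psnr = 20 * np.log10(255.0 / rmse) if rmse > 0 else float("inf")
print(f"{bitrate:.2f} bpp, RMSE {rmse:.2f}, PSNR {psnr:.1f} dB")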

37 citations


Journal ArticleDOI
01 Jun 2016
TL;DR: A simple classifier is proposed to discriminate between cognitive distraction and neutral states by analyzing peripheral vehicle behavior; it can manage various situations and provides high classification accuracy by focusing on gaze transitions from the front view toward other directions.
Abstract: To support safe driving, numerous methods of detecting distraction using measurements of a driver's gaze have been proposed. These methods empirically focused on certain driving contexts and analyzed gaze behavior under particular peripheral vehicle conditions; numerous driving situations were therefore not considered. To address this limitation of hypothesis-testing approaches, we turn the problem around and propose a data-mining approach that analyzes peripheral vehicle behavior during drivers' gaze transitions, in order to compare the neutral driving state with the cognitively distracted state. This change in thinking is the first contribution of this paper. The analysis results show that under the neutral condition, drivers generally turned their gaze toward the peripheral vehicles that required attention; however, they did not do so consistently under the distracted condition. As the second contribution, we propose a simple classifier to discriminate between the cognitive distraction and neutral states by analyzing peripheral vehicle behavior. The proposed classifier can manage various situations and provides high classification accuracy by focusing on gaze transitions from the front view toward other directions.
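
A minimal sketch of such a classifier is shown below (Python with scikit-learn; the two features and the random stand-in data are hypothetical, since the paper's actual features come from its gaze-transition analysis):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-window features: rate of gaze transitions from the
# front view toward other directions, and mean dwell time after each one.
rng = np.random.default_rng(0)
X = rng.random((200, 2))            # stand-in feature matrix
y = rng.integers(0, 2, 200)         # 0 = neutral, 1 = cognitively distracted

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))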

17 citations


Proceedings ArticleDOI
19 Jun 2016
TL;DR: A novel method is presented for integrating driving behavior and traffic context through signal symbolization in order to summarize driving semantics from sensor outputs; the method is applied to risky lane change detection.
Abstract: This paper presents a novel method for integrating driving behavior and traffic context through signal symbolization in order to summarize driving semantics from sensor outputs. The method has been applied to risky lane change detection. Language models (a nested Pitman-Yor language model) and speech recognition algorithms (hidden Markov models) are utilized for converting continuous sensor signals into a sequence of non-uniform segments (chunks). After symbolization, Latent Dirichlet Allocation (LDA) is used to integrate the symbolized driving behavior and the surrounding vehicle information, establishing the semantics of the driving scene. A total of 988 lane changes from real-world highway driving are used for the evaluation. The risk level of each lane change, rated by 10 subjects, is used as ground truth. The best results are obtained when driving behavior and surrounding vehicle information are integrated through co-occurrence chunking after independent symbolization of the behavior and context signals.
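
As a loose illustration of the LDA integration step (Python with scikit-learn; the vocabulary size, topic count, and random counts are placeholders, and the chunking itself is not reproduced), symbolized behavior and context chunks can be pooled into bag-of-symbols counts per scene and factorized into topic mixtures:

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical bag-of-symbols counts: one row per lane-change scene, one
# column per symbol (behavior chunks and surrounding-vehicle chunks pooled
# into a shared vocabulary, mimicking co-occurrence chunking).
rng = np.random.default_rng(0)
counts = rng.integers(0, 5, size=(988, 60))

lda = LatentDirichletAllocation(n_components=10, random_state=0)
theta = lda.fit_transform(counts)   # per-scene topic mixture ("semantics")
# theta could then feed a classifier for the rated risk levels.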

11 citations


Journal ArticleDOI
TL;DR: This paper implements DPM on the GPU by exploiting multiple parallelization schemes, and demonstrates that the best of the GPU implementations, using an NVIDIA GPU, achieves a speedup of 8.6x over a naive CPU-based implementation.
Abstract: Object detection is a fundamental challenge facing intelligent applications. Image processing is a promising approach to this end, but its computational cost is often a significant problem. This paper presents schemes for accelerating deformable part models (DPM) on graphics processing units (GPUs). DPM is a well-known algorithm for image-based object detection, and it achieves high detection rates at the expense of computational cost. GPUs are massively parallel compute devices designed to accelerate data-parallel, compute-intensive workloads. According to an analysis of execution times, approximately 98 percent of the DPM code consists of loop processing, which means that DPM can be highly parallelized on GPUs. In this paper, we implement DPM on the GPU by exploiting multiple parallelization schemes. Results of an experimental evaluation of this GPU-accelerated DPM implementation demonstrate that the best of the GPU implementations, using an NVIDIA GPU, achieves a speedup of 8.6x over a naive CPU-based implementation.
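
The loop-heavy core that makes DPM a good GPU target is the dense filter-response computation over HOG feature pyramids. A naive sketch follows (Python/NumPy; shapes are illustrative, and this is the kind of loop nest the paper's schemes map to GPU kernels, not its actual implementation); swapping the import for CuPy offloads the array arithmetic to a GPU, though a real implementation fuses the loops into kernels:

import numpy as np   # "import cupy as np" offloads the arithmetic to a GPU

def filter_response(feat, filt):
    # Dense cross-correlation of one HOG feature map (H, W, D)
    # with one root/part filter (h, w, D).
    H, W, D = feat.shape
    h, w, _ = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(feat[y:y+h, x:x+w, :] * filt)
    return out

feat = np.random.rand(32, 32, 31)   # one pyramid level, 31-dim HOG cells
filt = np.random.rand(6, 6, 31)
score_map = filter_response(feat, filt)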

10 citations


Journal ArticleDOI
TL;DR: Eye tracking and response time results both showed that participants understood the textually ambiguous sentences faster when listening to voices similar to their own, and suggest that tiny acoustic features, which do not contain verbal meaning, can influence the processing of verbal information.
Abstract: In this study, we investigate the effect of tiny acoustic differences on the efficiency of prosodic information transmission. Study participants listened to textually ambiguous sentences, which could be understood with prosodic cues such as syllable length and pause length. Sentences were uttered in voices similar to the participant's own voice and in voices dissimilar to their own voice. The participants then identified which of four pictures the speaker was referring to. Both the eye movements and response times of the participants were recorded. The eye tracking and response time results both showed that participants understood the textually ambiguous sentences faster when listening to voices similar to their own. The results also suggest that tiny acoustic features, which do not contain verbal meaning, can influence the processing of verbal information.

6 citations


Journal ArticleDOI
TL;DR: It turns out that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual ones with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.

4 citations


Book ChapterDOI
Kazuya Takeda
01 Jan 2016
TL;DR: The project is aimed at developing technologies for preventing excessive trust among users of automated systems, and a method of modeling visual behavior is developed with the aim of understanding environmental awareness while driving.
Abstract: An approach that would allow us to better understand the behavioral states inherent in observed behaviors is proposed, based on the development of a mathematical representation of driving behavior signals using our large driving behavior signal corpus. In particular, the project is aimed at developing technologies for preventing excessive trust in users of automated systems. Misuse/disuse of automation is introduced as a cognitive model of excessive trust, and methods for its quantitative measurement are devised. Piecewise auto-regressive exogenous (PWARX) and Gaussian mixture model (GMM) representations are proposed to capture the discrete and continuous information in the cognition/decision/action process. We also develop a method of modeling visual behavior, aimed at understanding environmental awareness while driving. We show the effectiveness of the model experimentally through risky lane change detection. Finally, we show the effectiveness of the method for quantifying excessive trust based on the developed technology.
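
As a rough sketch of the continuous-information side (Python with scikit-learn; the signals and component count are invented, and the PWARX mode-specific dynamics are only noted in a comment, not implemented), a GMM can assign discrete modes to frames and flag low-likelihood, atypical behavior:

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical observations from the cognition/decision/action loop,
# e.g. [headway distance, relative speed, pedal position] per frame.
rng = np.random.default_rng(0)
X = rng.random((5000, 3))

# GMM for the continuous information; a PWARX model would additionally
# attach its own ARX dynamics to each discrete mode (not shown here).
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(X)
modes = gmm.predict(X)              # discrete mode per frame
surprise = -gmm.score_samples(X)    # low likelihood can flag atypical frames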

4 citations


Proceedings ArticleDOI
08 Sep 2016
TL;DR: Experimental results on the Aurora4 corpus show that the example-based approach using BNFs greatly improves the enhanced speech quality compared with that using MFCCs, and consistently outperforms a conventional DNN-based approach, i.e., a denoising autoencoder.
Abstract: Example-based speech enhancement is a promising approach for coping with highly non-stationary noise. Given a noisy speech input, it first searches noisy speech corpora for the noisy speech examples that best match the input. It then concatenates the clean speech examples that are paired with the matched noisy examples to obtain an estimate of the underlying clean speech component of the input. This framework works well if the noisy speech corpora contain the noise included in the input. However, it is impossible to prepare corpora that cover all types of noisy environments. Moreover, the example search is usually performed using noise-sensitive mel-frequency cepstral coefficients (MFCCs). Consequently, a mismatch between an input and the corpora is inevitable. This paper proposes using bottleneck features (BNFs) extracted from a deep neural network (DNN) acoustic model for the example search. Since BNFs have good noise robustness (invariance), the mismatch is mitigated and a more accurate example search can be performed. Experimental results on the Aurora4 corpus show that the example-based approach using BNFs greatly improves the enhanced speech quality compared with that using MFCCs. It also consistently outperforms a conventional DNN-based approach, i.e., a denoising autoencoder.
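
The example-search step reduces to a nearest-neighbor lookup in BNF space. A minimal sketch (Python/NumPy; all arrays are random stand-ins for the paired noisy/clean corpus and the DNN-extracted BNFs, and frame-wise search simplifies the paper's matching scheme):

import numpy as np

# Stand-ins for a paired corpus: BNFs of noisy example frames and the
# time-aligned clean-speech frames they are paired with.
rng = np.random.default_rng(0)
corpus_bnf = rng.random((10000, 40))     # BNFs of noisy corpus frames
corpus_clean = rng.random((10000, 257))  # paired clean magnitude spectra
input_bnf = rng.random((300, 40))        # BNFs of the noisy input utterance

# Frame-wise nearest-neighbor example search in noise-robust BNF space,
# concatenating the clean frames paired with the matched noisy examples.
est_clean = np.empty((len(input_bnf), corpus_clean.shape[1]))
for t, q in enumerate(input_bnf):
    d = np.sum((corpus_bnf - q) ** 2, axis=1)
    est_clean[t] = corpus_clean[np.argmin(d)]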

2 citations


Journal ArticleDOI
TL;DR: A second hands-on basic research seminar was held at the annual meeting of the Japanese Rhinologic Society; questionnaire results showed that 94% of participants found the demonstration time appropriate and 98% rated the content favorably, although the proportion of basic-research presentations remained low.
Abstract: At the 53rd Annual Meeting of the Japanese Rhinologic Society (September 2014, Osaka), we held a hands-on seminar on basic research aimed at maintaining and improving clinicians' knowledge and skill in basic research, raising their motivation for basic research, and fostering lateral cooperation among universities. A questionnaire survey of the participants showed that continuation of the hands-on basic research seminar was hoped for. We therefore organized a second hands-on basic research seminar at the 54th Annual Meeting of the Japanese Rhinologic Society (October 2015, Hiroshima), with content that reflected the improvements suggested the previous time. After the seminar, we again surveyed the participants by questionnaire and additionally surveyed the number of basic-research presentations at the society's annual meetings. Although the proportion of basic-research presentations remained low, in the questionnaire survey 94% of respondents rated the length of the demonstrations as "appropriate," and 98% rated the content as "very good" or "good." This report describes the outline of the hands-on basic research seminar, the survey results, and future prospects.

1 citation


Journal ArticleDOI
TL;DR: To apply NMF to stereo-channel music signal separation, Nonnegative Tensor Factorization (NTF) is proposed, further introducing a gain matrix to represent mixing information; however, the separation performance of this method is insufficient.
Abstract: Music signals are usually generated by mixing many music source signals, such as various instrumental sounds and vocal sounds, and they are often represented as 2-channel (i.e., stereo) signals. Underdetermined source separation, which separates the music signals into individual music source signals, is a potential technique for developing various applications, such as music transcription, singer discrimination, and vocal extraction. One of the most powerful underdetermined source separation methods is Nonnegative Matrix Factorization (NMF), which models the power spectrogram of an observed signal as the product of two nonnegative matrices: a basis matrix and an activation matrix. To apply NMF to stereo-channel music signal separation, we have proposed Nonnegative Tensor Factorization (NTF), which further introduces a gain matrix to represent mixing (i.e., panning) information. However, the separation performance of this method is insufficient owing to the limited prior information available to model the acoustic characteristic...
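
For reference, the NMF building block the abstract starts from can be sketched in a few lines (Python/NumPy; multiplicative updates for the Euclidean cost with a random spectrogram stand-in; the stereo NTF extension with its gain matrix is only noted in a comment):

import numpy as np

def nmf(V, k=20, n_iter=200, eps=1e-9):
    # V (freq x time, nonnegative) ~= W @ H, with W the basis spectra
    # and H the activations; multiplicative updates, Euclidean cost.
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# In the stereo NTF extension, a nonnegative gain (panning) matrix further
# scales each basis per channel; V here is a single-channel stand-in.
V = np.random.default_rng(1).random((257, 400))
W, H = nmf(V)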

Journal ArticleDOI
TL;DR: The proposed hybrid system is capable of detecting a segment of each sound event without post-processing, such as a smoothing process of detection results over multiple frames, usually required in the frame-wise detection methods.
Abstract: In this study, we propose a polyphonic sound event detection method based on a hybrid system of a Convolutional Bidirectional Long Short-Term Memory Recurrent Neural Network and a Hidden Markov Model (CBLSTM-HMM). Inspired by the state-of-the-art approach of integrating neural networks with HMMs in speech recognition, the proposed method develops a hybrid system that uses the CBLSTM to estimate the HMM state output probability, making it possible to model sequential data while handling duration changes. The proposed hybrid system is capable of detecting a segment of each sound event without post-processing, such as the smoothing of detection results over multiple frames usually required in frame-wise detection methods. Moreover, we can easily apply it to a multi-label classification problem to achieve polyphonic sound event detection. We conduct experimental evaluations using the DCASE2016 task two dataset to compare the performance of the proposed method to that of conventional methods, such as non-...
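
A skeleton of the neural half of the hybrid (Python with PyTorch; the layer sizes, mel/frame dimensions, and the 11-event/3-state output layout are illustrative, not the paper's) shows how a CBLSTM can emit per-frame HMM-state scores, which an HMM decoder would then segment:

import torch
import torch.nn as nn

class CBLSTM(nn.Module):
    # Conv front-end + bidirectional LSTM emitting per-frame scores for
    # each (event, HMM state) pair; sizes are illustrative.
    def __init__(self, n_mels=64, n_events=11, n_states=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool frequency, keep time
        )
        self.blstm = nn.LSTM(32 * (n_mels // 2), 128,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(256, n_events * n_states)

    def forward(self, x):                    # x: (batch, 1, mels, frames)
        z = self.conv(x)
        z = z.permute(0, 3, 1, 2).flatten(2) # -> (batch, frames, features)
        z, _ = self.blstm(z)
        return self.out(z)                   # per-frame state scores

logits = CBLSTM()(torch.randn(8, 1, 64, 500))   # -> (8, 500, 33)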

Journal ArticleDOI
TL;DR: In this article, the authors propose a framework for arranging audio objects in recorded music using artificial intelligence (AI) to anticipate the preferences of individual listeners, such as the tracks of guitars and drums in a piece of music, are re-synthesized in order to provide the preferred spatial arrangements of each listener.
Abstract: We propose a framework for arranging audio objects in recorded music using artificial intelligence (AI) to anticipate the preferences of individual listeners. The signals of audio objects, such as the tracks of guitars and drums in a piece of music, are re-synthesized in order to provide the preferred spatial arrangement for each listener. Deep learning-based noise suppression ratio estimation is utilized as a technique for enhancing audio objects from mixed signals. Neural networks are tuned for each audio object in advance, and noise suppression ratios are estimated for each frequency band and time frame. After enhancing each audio object, the objects are re-synthesized as stereo sound using the positions of the audio objects and the listener as synthesis parameters. Each listener supplies simple feedback regarding his/her preferred audio object arrangement using a graphical user interface (GUI). Using this listener feedback, the synthesis parameters are then stochastically optimized in accordance w...
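
The re-synthesis step can be illustrated with simple constant-power panning (Python/NumPy; the panning law and the stand-in object signals are assumptions, simplifying the paper's position-based synthesis parameters):

import numpy as np

def pan_stereo(objects, angles):
    # Mix enhanced audio objects to stereo with constant-power panning;
    # angles in [-1, 1] run from full left to full right.
    mix = np.zeros((2, objects.shape[1]))
    for sig, a in zip(objects, angles):
        theta = (a + 1) * np.pi / 4          # map to [0, pi/2]
        mix[0] += np.cos(theta) * sig        # left-channel gain
        mix[1] += np.sin(theta) * sig        # right-channel gain
    return mix

rng = np.random.default_rng(0)
objects = rng.standard_normal((3, 44100))    # e.g. guitar, drum, vocal stems
stereo = pan_stereo(objects, angles=[-0.5, 0.0, 0.7])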


Journal Article
TL;DR: In this paper, the authors investigated and compared several DNN-based audio-visual speech recognition (AVSR) methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs.
Abstract: Audio-Visual Speech Recognition (AVSR) is one technique for enhancing the robustness of speech recognizers in noisy or real environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted a great deal of attention from researchers in the speech recognition field, because recognition performance can be drastically improved by using DNNs. There are two ways to employ DNN techniques for speech recognition: a hybrid approach and a tandem approach. In the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into the feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments using the CENSREC-1-AV corpus, and we discuss the results to find the best DNN-based AVSR modeling. It turns out that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual ones with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.
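
A sketch of the bottleneck idea behind the best-performing tandem configuration (Python with PyTorch; the input dimensionality, layer widths, 40-dim bottleneck, and state count are illustrative, and the multi-stream HMM back-end is not shown):

import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    # DNN acoustic model with a narrow bottleneck layer whose activations
    # serve as Deep Bottle-Neck Features (DBNFs) for a tandem system.
    def __init__(self, n_in=440, n_bn=40, n_states=2000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_in, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_bn),           # bottleneck layer
        )
        self.back = nn.Sequential(
            nn.Sigmoid(), nn.Linear(n_bn, n_states),  # trained on HMM states
        )

    def forward(self, x):
        bn = self.front(x)                   # DBNFs for the tandem system
        return self.back(bn), bn

# Audio and visual DBNFs would come from separate such networks and be
# combined with multi-stream HMMs, as in the best-performing setup.
posteriors, dbnf = BottleneckDNN()(torch.randn(16, 440))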