
Showing papers by Kazuya Takeda published in 2016


Journal ArticleDOI
TL;DR: This article reviews data-centric approaches for statistical modeling of driver behavior and describes how statistical machine-learning techniques, such as hidden Markov models (HMMs) and deep learning, have been successfully applied to model driver behavior using large amounts of driving data.
Abstract: This article reviews data-centric approaches for statistical modeling of driver behavior. Modeling driver behavior is challenging due to its stochastic nature and the high degree of inter- and intradriver variability. One way to deal with the highly variable nature of driving behavior is to employ a data-centric approach that models driver behavior using large amounts of driving data collected from numerous drivers in a variety of traffic conditions. To obtain large amounts of realistic driving data, several projects have collected real-world driving data. Statistical machine-learning techniques, such as hidden Markov models (HMMs) and deep learning, have been successfully applied to model driver behavior using large amounts of driving data. We have also collected on-road data recording hundreds of drivers over more than 15 years. We have applied statistical signal processing and machine-learning techniques to this data to model various aspects of driver behavior, e.g., driver pedal-operation, car-following, and lane-change behaviors for predicting driver behavior and detecting risky driver behavior and driver frustration. By reviewing related studies and providing concrete examples of our own research, this article is intended to illustrate the usefulness of such data-centric approaches for statistical driver-behavior modeling.
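
As a rough illustration of the HMM-based modeling reviewed above, the sketch below (Python; it assumes the hmmlearn package, and the signals, session lengths, and state count are hypothetical stand-ins, not the authors' setup) fits a Gaussian HMM to pedal-operation-style signals and decodes a latent maneuver-state sequence:

import numpy as np
from hmmlearn import hmm

# Hypothetical driving signals: one row per frame,
# columns = [gas pedal position, brake force, vehicle speed].
rng = np.random.default_rng(0)
X = rng.random((1000, 3))           # stand-in for real recorded data
lengths = [600, 400]                # two separate driving sessions

# Gaussian HMM with a few latent "maneuver" states.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

states = model.predict(X)           # most likely state per frame (Viterbi)
loglik = model.score(X, lengths)    # model fit; e.g., compare across drivers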

56 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: In this article, the authors proposed compressing packet data (the raw form of point cloud data) by converting it losslessly into range images and then using various image/video compression algorithms to reduce the volume of the data.
Abstract: Continuous point cloud data has recently become very important as a key component in the development of autonomous driving technology, and has in fact become indispensable for some autonomous driving applications such as obstacle detection. However, such large amounts of data are very expensive to store and are difficult to share directly due to their volume. Previous studies have explored various methods of compressing point cloud data directly, by converting it into 2D images or by using tree-based approaches. In this study, rather than compressing point cloud data directly, we choose to compress the packet data, i.e., the raw form of point cloud data, by converting it losslessly into range images and then using various image/video compression algorithms to reduce the volume of the data. In this paper, four methods for range-image compression, based on MPEG and JPEG with two kinds of preprocessing approaches for each, are evaluated. PSNR vs. bitrate and RMSE vs. bitrate curves are used to compare and evaluate the performance of these four methods. Since whether a lossy compression method is good always depends on the application, a localization experiment is also conducted to test point cloud reconstruction performance. Comparing these methods yields some important conclusions.
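
The pipeline lends itself to a compact sketch. The following Python (assuming NumPy and OpenCV; the 64 x 2088 image size, 8-bit quantization, and quality setting are illustrative, not the paper's exact configuration) compresses a range image with JPEG and computes the bitrate, RMSE, and PSNR figures of the kind compared in the evaluation:

import numpy as np
import cv2

# Hypothetical range image: rows = laser channels, cols = azimuth steps,
# values = distances quantized to 8 bits (real packet data is finer).
rng = np.random.default_rng(0)
range_img = (rng.random((64, 2088)) * 255).astype(np.uint8)

# Lossy JPEG compression at a chosen quality level.
ok, buf = cv2.imencode(".jpg", range_img, [cv2.IMWRITE_JPEG_QUALITY, 80])
decoded = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)

# Rate/distortion measures: bits per pixel, RMSE, and PSNR.
bitrate = buf.size * 8 / range_img.size
rmse = np.sqrt(np.mean((range_img.astype(float) - decoded.astype(float)) ** 2))
psnr = 20 * np.log10(255.0 / rmse) if rmse > 0 else float("inf")
print(f"{bitrate:.2f} bpp, RMSE {rmse:.2f}, PSNR {psnr:.1f} dB")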

37 citations


Journal ArticleDOI
01 Jun 2016
TL;DR: A simple classifier is proposed to discriminate between cognitive distraction and neutral states by analyzing peripheral vehicle behavior; it can manage various situations and provides high classification accuracy by focusing on gaze transitions from the front view toward other directions.
Abstract: To support safe driving, numerous methods of detecting distraction using measurements of a driver's gaze have been proposed. These methods empirically focused on certain driving contexts and analyzed gaze behavior under particular peripheral vehicle conditions; numerous driving situations were therefore not considered. To address this limitation of hypothesis-testing approaches, we turn the problem around and propose a data-mining approach that analyzes peripheral vehicle behavior during drivers' gaze transitions, in order to compare the neutral driving state with the cognitively distracted state. This change in thinking is the first contribution of this paper. The analysis results show that under the neutral condition, drivers generally turned their gaze toward the peripheral vehicles that required attention; however, they did not do so consistently under the distracted condition. As the second contribution, we propose a simple classifier to discriminate between the cognitive distraction and neutral states by analyzing peripheral vehicle behavior. The proposed classifier can manage various situations and provides high classification accuracy by focusing on gaze transitions from the front view toward other directions.
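
A minimal sketch of such a classifier is shown below (Python with scikit-learn; the two features and the random stand-in data are hypothetical, since the paper's actual features come from its gaze-transition analysis):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-window features: rate of gaze transitions from the
# front view toward other directions, and mean dwell time after each one.
rng = np.random.default_rng(0)
X = rng.random((200, 2))            # stand-in feature matrix
y = rng.integers(0, 2, 200)         # 0 = neutral, 1 = cognitively distracted

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))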

17 citations


Proceedings ArticleDOI
19 Jun 2016
TL;DR: A novel method is presented for integrating driving behavior and traffic context through signal symbolization in order to summarize driving semantics from sensor outputs; the method is applied to risky lane change detection.
Abstract: This paper presents a novel method for integrating driving behavior and traffic context through signal symbolization in order to summarize driving semantics from sensor outputs. The method has been applied to risky lane change detection. Language models (a nested Pitman-Yor language model) and speech recognition algorithms (hidden Markov models) are utilized for converting continuous sensor signals into a sequence of non-uniform segments (chunks). After symbolization, Latent Dirichlet Allocation (LDA) is used to integrate the symbolized driving behavior and the surrounding vehicle information, establishing the semantics of the driving scene. A total of 988 lane changes from real-world highway driving are used for the evaluation. The risk level of each lane change, rated by 10 subjects, is used as ground truth. The best results are obtained when driving behavior and surrounding vehicle information are integrated through co-occurrence chunking after independent symbolization of the behavior and context signals.
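
As a loose illustration of the LDA integration step (Python with scikit-learn; the vocabulary size, topic count, and random counts are placeholders, and the chunking itself is not reproduced), symbolized behavior and context chunks can be pooled into bag-of-symbols counts per scene and factorized into topic mixtures:

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical bag-of-symbols counts: one row per lane-change scene, one
# column per symbol (behavior chunks and surrounding-vehicle chunks pooled
# into a shared vocabulary, mimicking co-occurrence chunking).
rng = np.random.default_rng(0)
counts = rng.integers(0, 5, size=(988, 60))

lda = LatentDirichletAllocation(n_components=10, random_state=0)
theta = lda.fit_transform(counts)   # per-scene topic mixture ("semantics")
# theta could then feed a classifier for the rated risk levels.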

11 citations


Journal ArticleDOI
TL;DR: This paper implements DPM on the GPU by exploiting multiple parallelization schemes, and demonstrates that the best of the GPU implementations, using an NVIDIA GPU, achieves a speedup of 8.6x over a naive CPU-based implementation.
Abstract: Object detection is a fundamental challenge facing intelligent applications. Image processing is a promising approach to this end, but its computational cost is often a significant problem. This paper presents schemes for accelerating deformable part models (DPM) on graphics processing units (GPUs). DPM is a well-known algorithm for image-based object detection, and it achieves high detection rates at the expense of computational cost. GPUs are massively parallel compute devices designed to accelerate data-parallel, compute-intensive workloads. According to an analysis of execution times, approximately 98 percent of the DPM code consists of loop processing, which means that DPM can be highly parallelized on GPUs. In this paper, we implement DPM on the GPU by exploiting multiple parallelization schemes. Results of an experimental evaluation of this GPU-accelerated DPM implementation demonstrate that the best of the GPU implementations, using an NVIDIA GPU, achieves a speedup of 8.6x over a naive CPU-based implementation.
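
The loop-heavy core that makes DPM a good GPU target is the dense filter-response computation over HOG feature pyramids. A naive sketch follows (Python/NumPy; shapes are illustrative, and this is the kind of loop nest the paper's schemes map to GPU kernels, not its actual implementation); swapping the import for CuPy offloads the array arithmetic to a GPU, though a real implementation fuses the loops into kernels:

import numpy as np   # "import cupy as np" offloads the arithmetic to a GPU

def filter_response(feat, filt):
    # Dense cross-correlation of one HOG feature map (H, W, D)
    # with one root/part filter (h, w, D).
    H, W, D = feat.shape
    h, w, _ = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(feat[y:y+h, x:x+w, :] * filt)
    return out

feat = np.random.rand(32, 32, 31)   # one pyramid level, 31-dim HOG cells
filt = np.random.rand(6, 6, 31)
score_map = filter_response(feat, filt)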

10 citations


Journal ArticleDOI
TL;DR: Eye tracking and response time results both showed that participants understood the textually ambiguous sentences faster when listening to voices similar to their own, and suggest that tiny acoustic features, which do not contain verbal meaning, can influence the processing of verbal information.
Abstract: In this study, we investigate the effect of tiny acoustic differences on the efficiency of prosodic information transmission. Study participants listened to textually ambiguous sentences, which could be understood with prosodic cues such as syllable length and pause length. Sentences were uttered in voices similar to the participant's own voice and in voices dissimilar to their own voice. The participants then identified which of four pictures the speaker was referring to. Both the eye movements and response times of the participants were recorded. The eye tracking and response time results both showed that participants understood the textually ambiguous sentences faster when listening to voices similar to their own. The results also suggest that tiny acoustic features, which do not contain verbal meaning, can influence the processing of verbal information.

6 citations


Journal ArticleDOI
TL;DR: It turns out that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual ones with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.

4 citations


Book ChapterDOI
Kazuya Takeda
01 Jan 2016
TL;DR: The project is aimed at developing technologies for preventing excessive trust among users of automated systems, and a method of modeling visual behavior is developed with the aim of understanding environmental awareness while driving.
Abstract: An approach that would allow us to better understand the behavioral states inherent in observed behaviors is proposed, based on the development of a mathematical representation of driving behavior signals using our large driving behavior signal corpus. In particular, the project is aimed at developing technologies for preventing excessive trust in users of automated systems. Misuse/disuse of automation is introduced as a cognitive model of excessive trust, and methods for its quantitative measurement are devised. Piecewise auto-regressive exogenous (PWARX) and Gaussian mixture model (GMM) representations are proposed to capture the discrete and continuous information in the cognition/decision/action process. We also develop a method of modeling visual behavior, aimed at understanding environmental awareness while driving. We show the effectiveness of the model experimentally through risky lane change detection. Finally, we show the effectiveness of the method for quantifying excessive trust based on the developed technology.
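
As a rough sketch of the continuous-information side (Python with scikit-learn; the signals and component count are invented, and the PWARX mode-specific dynamics are only noted in a comment, not implemented), a GMM can assign discrete modes to frames and flag low-likelihood, atypical behavior:

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical observations from the cognition/decision/action loop,
# e.g. [headway distance, relative speed, pedal position] per frame.
rng = np.random.default_rng(0)
X = rng.random((5000, 3))

# GMM for the continuous information; a PWARX model would additionally
# attach its own ARX dynamics to each discrete mode (not shown here).
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(X)
modes = gmm.predict(X)              # discrete mode per frame
surprise = -gmm.score_samples(X)    # low likelihood can flag atypical frames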

4 citations


Proceedings ArticleDOI
08 Sep 2016
TL;DR: Experimental results on the Aurora4 corpus show that the example-based approach using BNFs greatly improves the enhanced speech quality compared with that using MFCCs, and consistently outperforms a conventional DNN-based approach, i.e., a denoising autoencoder.
Abstract: Example-based speech enhancement is a promising approach for coping with highly non-stationary noise. Given a noisy speech input, it first searches noisy speech corpora for the noisy speech examples that best match the input. It then concatenates the clean speech examples that are paired with the matched noisy examples to obtain an estimate of the underlying clean speech component of the input. This framework works well if the noisy speech corpora contain the noise included in the input. However, it is impossible to prepare corpora that cover all types of noisy environments. Moreover, the example search is usually performed using noise-sensitive mel-frequency cepstral coefficients (MFCCs). Consequently, a mismatch between an input and the corpora is inevitable. This paper proposes using bottleneck features (BNFs) extracted from a deep neural network (DNN) acoustic model for the example search. Since BNFs have good noise robustness (invariance), the mismatch is mitigated and a more accurate example search can be performed. Experimental results on the Aurora4 corpus show that the example-based approach using BNFs greatly improves the enhanced speech quality compared with that using MFCCs. It also consistently outperforms a conventional DNN-based approach, i.e., a denoising autoencoder.
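
The example-search step reduces to a nearest-neighbor lookup in BNF space. A minimal sketch (Python/NumPy; all arrays are random stand-ins for the paired noisy/clean corpus and the DNN-extracted BNFs, and frame-wise search simplifies the paper's matching scheme):

import numpy as np

# Stand-ins for a paired corpus: BNFs of noisy example frames and the
# time-aligned clean-speech frames they are paired with.
rng = np.random.default_rng(0)
corpus_bnf = rng.random((10000, 40))     # BNFs of noisy corpus frames
corpus_clean = rng.random((10000, 257))  # paired clean magnitude spectra
input_bnf = rng.random((300, 40))        # BNFs of the noisy input utterance

# Frame-wise nearest-neighbor example search in noise-robust BNF space,
# concatenating the clean frames paired with the matched noisy examples.
est_clean = np.empty((len(input_bnf), corpus_clean.shape[1]))
for t, q in enumerate(input_bnf):
    d = np.sum((corpus_bnf - q) ** 2, axis=1)
    est_clean[t] = corpus_clean[np.argmin(d)]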

2 citations


Journal ArticleDOI
TL;DR: A second hands-on basic research seminar was held at the annual meeting of the Japanese Rhinologic Society; questionnaire results showed that 94% of participants found the demonstration time appropriate and 98% rated the content favorably, although the proportion of basic-research presentations remained low.
Abstract: At the 53rd Annual Meeting of the Japanese Rhinologic Society (September 2014, Osaka), we held a hands-on seminar on basic research aimed at maintaining and improving clinicians' knowledge and skill in basic research, raising their motivation for basic research, and fostering lateral cooperation among universities. A questionnaire survey of the participants showed that continuation of the hands-on basic research seminar was hoped for. We therefore organized a second hands-on basic research seminar at the 54th Annual Meeting of the Japanese Rhinologic Society (October 2015, Hiroshima), with content that reflected the improvements suggested the previous time. After the seminar, we again surveyed the participants by questionnaire and additionally surveyed the number of basic-research presentations at the society's annual meetings. Although the proportion of basic-research presentations remained low, in the questionnaire survey 94% of respondents rated the length of the demonstrations as "appropriate," and 98% rated the content as "very good" or "good." This report describes the outline of the hands-on basic research seminar, the survey results, and future prospects.

1 citation


Journal ArticleDOI
TL;DR: To apply NMF to stereo-channel music signal separation, Nonnegative Tensor Factorization (NTF) is proposed, further introducing a gain matrix to represent mixing information; however, the separation performance of this method is insufficient.
Abstract: Music signals are usually generated by mixing many music source signals, such as various instrumental sounds and vocal sounds, and they are often represented as 2-channel (i.e., stereo) signals. Underdetermined source separation, which separates the music signals into individual music source signals, is a potential technique for developing various applications, such as music transcription, singer discrimination, and vocal extraction. One of the most powerful underdetermined source separation methods is Nonnegative Matrix Factorization (NMF), which models the power spectrogram of an observed signal as the product of two nonnegative matrices: a basis matrix and an activation matrix. To apply NMF to stereo-channel music signal separation, we have proposed Nonnegative Tensor Factorization (NTF), which further introduces a gain matrix to represent mixing (i.e., panning) information. However, the separation performance of this method is insufficient owing to the limited prior information available to model the acoustic characteristic...
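
For reference, the NMF building block the abstract starts from can be sketched in a few lines (Python/NumPy; multiplicative updates for the Euclidean cost with a random spectrogram stand-in; the stereo NTF extension with its gain matrix is only noted in a comment):

import numpy as np

def nmf(V, k=20, n_iter=200, eps=1e-9):
    # V (freq x time, nonnegative) ~= W @ H, with W the basis spectra
    # and H the activations; multiplicative updates, Euclidean cost.
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# In the stereo NTF extension, a nonnegative gain (panning) matrix further
# scales each basis per channel; V here is a single-channel stand-in.
V = np.random.default_rng(1).random((257, 400))
W, H = nmf(V)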

Journal ArticleDOI
TL;DR: The proposed hybrid system is capable of detecting a segment of each sound event without post-processing, such as a smoothing process of detection results over multiple frames, usually required in the frame-wise detection methods.
Abstract: In this study, we propose a polyphonic sound event detection method based on a hybrid system of a Convolutional Bidirectional Long Short-Term Memory Recurrent Neural Network and a Hidden Markov Model (CBLSTM-HMM). Inspired by the state-of-the-art approach of integrating neural networks with HMMs in speech recognition, the proposed method develops a hybrid system that uses the CBLSTM to estimate the HMM state output probability, making it possible to model sequential data while handling duration changes. The proposed hybrid system is capable of detecting a segment of each sound event without post-processing, such as the smoothing of detection results over multiple frames usually required in frame-wise detection methods. Moreover, we can easily apply it to a multi-label classification problem to achieve polyphonic sound event detection. We conduct experimental evaluations using the DCASE2016 task two dataset to compare the performance of the proposed method to that of conventional methods, such as non-...
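
A skeleton of the neural half of the hybrid (Python with PyTorch; the layer sizes, mel/frame dimensions, and the 11-event/3-state output layout are illustrative, not the paper's) shows how a CBLSTM can emit per-frame HMM-state scores, which an HMM decoder would then segment:

import torch
import torch.nn as nn

class CBLSTM(nn.Module):
    # Conv front-end + bidirectional LSTM emitting per-frame scores for
    # each (event, HMM state) pair; sizes are illustrative.
    def __init__(self, n_mels=64, n_events=11, n_states=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool frequency, keep time
        )
        self.blstm = nn.LSTM(32 * (n_mels // 2), 128,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(256, n_events * n_states)

    def forward(self, x):                    # x: (batch, 1, mels, frames)
        z = self.conv(x)
        z = z.permute(0, 3, 1, 2).flatten(2) # -> (batch, frames, features)
        z, _ = self.blstm(z)
        return self.out(z)                   # per-frame state scores

logits = CBLSTM()(torch.randn(8, 1, 64, 500))   # -> (8, 500, 33)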

Journal ArticleDOI
TL;DR: In this article, the authors propose a framework for arranging audio objects in recorded music using artificial intelligence (AI) to anticipate the preferences of individual listeners, such as the tracks of guitars and drums in a piece of music, are re-synthesized in order to provide the preferred spatial arrangements of each listener.
Abstract: We propose a framework for arranging audio objects in recorded music using artificial intelligence (AI) to anticipate the preferences of individual listeners. The signals of audio objects, such as the tracks of guitars and drums in a piece of music, are re-synthesized in order to provide the preferred spatial arrangement for each listener. Deep learning-based noise suppression ratio estimation is utilized as a technique for enhancing audio objects from mixed signals. Neural networks are tuned for each audio object in advance, and noise suppression ratios are estimated for each frequency band and time frame. After enhancing each audio object, the objects are re-synthesized as stereo sound using the positions of the audio objects and the listener as synthesis parameters. Each listener supplies simple feedback regarding his/her preferred audio object arrangement using a graphical user interface (GUI). Using this listener feedback, the synthesis parameters are then stochastically optimized in accordance w...
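
The re-synthesis step can be illustrated with simple constant-power panning (Python/NumPy; the panning law and the stand-in object signals are assumptions, simplifying the paper's position-based synthesis parameters):

import numpy as np

def pan_stereo(objects, angles):
    # Mix enhanced audio objects to stereo with constant-power panning;
    # angles in [-1, 1] run from full left to full right.
    mix = np.zeros((2, objects.shape[1]))
    for sig, a in zip(objects, angles):
        theta = (a + 1) * np.pi / 4          # map to [0, pi/2]
        mix[0] += np.cos(theta) * sig        # left-channel gain
        mix[1] += np.sin(theta) * sig        # right-channel gain
    return mix

rng = np.random.default_rng(0)
objects = rng.standard_normal((3, 44100))    # e.g. guitar, drum, vocal stems
stereo = pan_stereo(objects, angles=[-0.5, 0.0, 0.7])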


Journal Article
TL;DR: In this paper, the authors investigated and compared several DNN-based audio-visual speech recognition (AVSR) methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs.
Abstract: Audio-Visual Speech Recognition (AVSR) is one technique for enhancing the robustness of speech recognizers in noisy or real environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted a great deal of attention from researchers in the speech recognition field, because recognition performance can be drastically improved by using DNNs. There are two ways to employ DNN techniques for speech recognition: a hybrid approach and a tandem approach. In the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into the feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments using the CENSREC-1-AV corpus, and we discuss the results to find the best DNN-based AVSR modeling. It turns out that a tandem-based method using audio Deep Bottle-Neck Features (DBNFs) and visual ones with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.
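
A sketch of the bottleneck idea behind the best-performing tandem configuration (Python with PyTorch; the input dimensionality, layer widths, 40-dim bottleneck, and state count are illustrative, and the multi-stream HMM back-end is not shown):

import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    # DNN acoustic model with a narrow bottleneck layer whose activations
    # serve as Deep Bottle-Neck Features (DBNFs) for a tandem system.
    def __init__(self, n_in=440, n_bn=40, n_states=2000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_in, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_bn),           # bottleneck layer
        )
        self.back = nn.Sequential(
            nn.Sigmoid(), nn.Linear(n_bn, n_states),  # trained on HMM states
        )

    def forward(self, x):
        bn = self.front(x)                   # DBNFs for the tandem system
        return self.back(bn), bn

# Audio and visual DBNFs would come from separate such networks and be
# combined with multi-stream HMMs, as in the best-performing setup.
posteriors, dbnf = BottleneckDNN()(torch.randn(16, 440))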