
Showing papers by "Kazuya Takeda published in 2015"


Journal ArticleDOI
TL;DR: An open platform using commodity vehicles and sensors is introduced to facilitate the development of autonomous vehicles and presents algorithms, software libraries, and datasets required for scene recognition, path planning, and vehicle control.
Abstract: Autonomous vehicles are an emerging application of automotive technology. They can recognize the scene, plan the path, and control the motion by themselves while interacting with drivers. Although they receive considerable attention, components of autonomous vehicles are not accessible to the public but instead are developed as proprietary assets. To facilitate the development of autonomous vehicles, this article introduces an open platform using commodity vehicles and sensors. Specifically, the authors present algorithms, software libraries, and datasets required for scene recognition, path planning, and vehicle control. This open platform allows researchers and developers to study the basis of autonomous vehicles, design new algorithms, and test their performance using the common interface.
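The three components can be illustrated with a toy perception-planning-control loop; every name and behavior below is a hypothetical stand-in, not the platform's actual API:

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float
    y: float
    heading: float

def recognize_scene(sensor_frame):
    # Placeholder: a real module would fuse LiDAR/camera data into obstacles.
    return {"obstacles": sensor_frame.get("obstacles", [])}

def plan_path(state, scene, goal):
    # Placeholder: step straight toward the goal unless something is detected.
    if scene["obstacles"]:
        return []  # stop when any obstacle is ahead
    return [(state.x + 1.0, state.y)] if state.x < goal[0] else []

def control(state, path):
    # Placeholder: jump to the next waypoint (a real controller would
    # compute steering and throttle commands instead).
    if path:
        state.x, state.y = path[0]
    return state

state = VehicleState(0.0, 0.0, 0.0)
for _ in range(3):
    scene = recognize_scene({"obstacles": []})
    path = plan_path(state, scene, goal=(3.0, 0.0))
    state = control(state, path)
print(state.x)  # 3.0
```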

432 citations


Proceedings ArticleDOI
19 Apr 2015
TL;DR: Experimental results show that certain multi-channel features outperform both a monaural DAE and a conventional time-frequency-mask-based speech enhancement method.
Abstract: This paper investigates a multi-channel denoising autoencoder (DAE)-based speech enhancement approach. In recent years, deep neural network (DNN)-based monaural speech enhancement and robust automatic speech recognition (ASR) approaches have attracted much attention due to their high performance. Although multi-channel speech enhancement usually outperforms single channel approaches, there has been little research on the use of multi-channel processing in the context of DAE. In this paper, we explore the use of several multi-channel features as DAE input to confirm whether multi-channel information can improve performance. Experimental results show that certain multi-channel features outperform both a monaural DAE and a conventional time-frequency-mask-based speech enhancement method.
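The input representation being compared can be sketched as follows: per-channel log-magnitude spectra are concatenated per frame so a DAE can exploit inter-channel cues. The array sizes, the tied-weight layer, and its untrained parameters are illustrative, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-channel noisy signal (e.g. left/right microphones).
n_fft, frames, channels = 64, 10, 2
noisy = rng.standard_normal((channels, frames, n_fft))

# Monaural DAE input: log-magnitude spectrum of a single channel.
mono_feat = np.log1p(np.abs(np.fft.rfft(noisy[0], axis=-1)))

# Multi-channel DAE input: per-channel spectra concatenated per frame,
# so the network can exploit inter-channel (spatial) cues.
multi_feat = np.concatenate(
    [np.log1p(np.abs(np.fft.rfft(noisy[c], axis=-1))) for c in range(channels)],
    axis=-1,
)

# A single tied-weight DAE layer (illustrative, untrained).
def dae_forward(x, w, b_enc, b_dec):
    h = np.tanh(x @ w + b_enc)          # encoder
    return h @ w.T + b_dec              # decoder with tied weights

d_in = multi_feat.shape[-1]
w = rng.standard_normal((d_in, 32)) * 0.1
out = dae_forward(multi_feat, w, np.zeros(32), np.zeros(d_in))
print(mono_feat.shape, multi_feat.shape, out.shape)
```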

101 citations


Proceedings ArticleDOI
06 Sep 2015
TL;DR: This paper proposes a method of integrating DBNFs using multi-stream HMMs in order to improve the performance of AVSR systems under both clean and noisy conditions, and evaluates the method using a continuously spoken Japanese digit recognition task under matched and mismatched conditions.
Abstract: Recent interest in “deep learning”, which can be defined as the use of algorithms to model high-level abstractions in data, using models composed of multiple non-linear transformations, has resulted in an increase in the number of studies investigating the use of deep learning with automatic speech recognition (ASR) systems. Some of these studies have found that bottleneck features extracted from deep neural networks (DNNs), sometimes called “deep bottleneck features” (DBNFs), can reduce the word error rates of ASR systems. However, there has been little research on audio-visual speech recognition (AVSR) systems using DBNFs. In this paper, we propose a method of integrating DBNFs using multi-stream HMMs in order to improve the performance of AVSRs under both clean and noisy conditions. We evaluate our method using a continuously spoken, Japanese digit recognition task under matched and mismatched conditions. Relative word error reduction rates of roughly 68.7%, 47.4%, and 51.9% were achieved, compared with an audio-only ASR system and two feature-fusion models, which employed DBNFs and single-stream HMMs, respectively.
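The multi-stream HMM combination can be sketched as a weighted sum of per-stream log-likelihoods, with the stream weights summing to one; the state scores and weights below are toy values, not figures from the paper:

```python
import numpy as np

# Multi-stream observation score: per-state log-likelihoods from the audio
# and visual streams are combined with exponent weights that sum to one.
# In noisy conditions the visual weight is typically raised.
def multistream_loglik(ll_audio, ll_visual, w_audio):
    return w_audio * ll_audio + (1.0 - w_audio) * ll_visual

# Toy per-state log-likelihoods for one frame (values are illustrative).
ll_a = np.array([-2.0, -5.0, -1.0])   # audio DBNF stream
ll_v = np.array([-3.0, -1.5, -4.0])   # visual DBNF stream

clean = multistream_loglik(ll_a, ll_v, w_audio=0.8)
noisy = multistream_loglik(ll_a, ll_v, w_audio=0.3)
# The best-scoring state changes as the weighting shifts toward the
# visual stream, which is the point of tuning weights per condition.
print(int(np.argmax(clean)), int(np.argmax(noisy)))
```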

49 citations


Proceedings ArticleDOI
01 Dec 2015
TL;DR: It is found that VAD is useful in both the audio and visual modalities, improving both lipreading and AVSR, and the effectiveness of voice activity detection in the visual modality is investigated.
Abstract: This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features using deep learning. Using the proposed features, we achieved 73.66% lipreading accuracy in the speaker-independent open condition, and about 90% AVSR accuracy on average in noisy environments. In addition, extracting speech segments from the visual features yielded 77.80% lipreading accuracy. We find that VAD is useful in both the audio and visual modalities, improving both lipreading and AVSR.
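A visual VAD step of this kind can be sketched as thresholding a frame-level activity measure and keeping sufficiently long runs. The paper derives its segments from visual features; the energy values, threshold, and minimum duration below are invented stand-ins:

```python
import numpy as np

# Hypothetical visual VAD: threshold frame-level mouth-motion energy and
# keep only runs at least min_frames long. Real visual features would be
# deep bottleneck features; plain numbers stand in for them here.
def visual_vad(energy, threshold, min_frames):
    active = energy > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                      # run of speech frames begins
        elif not a and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None                   # run ends (kept only if long enough)
    if start is not None and len(active) - start >= min_frames:
        segments.append((start, len(active)))
    return segments

energy = np.array([0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.95, 0.1, 0.9, 0.9, 0.9])
print(visual_vad(energy, threshold=0.5, min_frames=2))  # [(2, 5), (8, 11)]
```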

44 citations


Proceedings ArticleDOI
28 Dec 2015
TL;DR: The proposed method outperformed the SVM-based method when an additional "Other" activity category was included and it is demonstrated that DNNs are a robust method of daily activity recognition.
Abstract: We propose a new method of recognizing daily human activities based on a Deep Neural Network (DNN), using multimodal signals such as environmental sound and subject acceleration. We conduct recognition experiments to compare the proposed method to other methods such as a Support Vector Machine (SVM), using real-world data recorded continuously over 72 hours. Our proposed method achieved a frame accuracy rate of 85.5% and a sample accuracy rate of 91.7% when identifying nine different types of daily activities. Furthermore, the proposed method outperformed the SVM-based method when an additional "Other" activity category was included. Therefore, we demonstrate that DNNs are a robust method of daily activity recognition.
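The difference between the two reported accuracy rates (per-frame versus per-sample) can be illustrated as follows; the predictions, labels, and sample groupings are synthetic:

```python
import numpy as np

# Frame accuracy scores every frame; sample accuracy first majority-votes
# the frames belonging to each recording, which is why the two rates differ.
def frame_accuracy(pred, truth):
    return float(np.mean(pred == truth))

def sample_accuracy(pred, truth, sample_ids):
    correct = 0
    samples = np.unique(sample_ids)
    for s in samples:
        frames = pred[sample_ids == s]
        vote = np.bincount(frames).argmax()       # majority vote per sample
        correct += int(vote == truth[sample_ids == s][0])
    return correct / len(samples)

pred      = np.array([0, 0, 1, 1, 1, 2, 2, 0])    # toy frame-level outputs
truth     = np.array([0, 0, 0, 1, 1, 2, 2, 2])
sample_id = np.array([0, 0, 0, 1, 1, 2, 2, 2])
print(frame_accuracy(pred, truth), sample_accuracy(pred, truth, sample_id))
# 0.75 1.0  (voting recovers the right label despite frame errors)
```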

41 citations


Proceedings ArticleDOI
27 Aug 2015
TL;DR: This paper proposes a method of automatically extracting lane change situations from large-scale driving corpora by applying an unsupervised symbolization method and topic representation to driving data, and shows the effectiveness of symbols with topic proportions for representing the characteristics of driving situations.
Abstract: This paper proposes a method of automatically extracting lane change situations from large-scale driving corpora. Naturalistic driving data stored in large-scale corpora has the potential to contribute to the development of novel advanced driver-assistance systems based on estimated information about driver intent and/or the potential risk of accidents. However, directly estimating such information from stream data is difficult. To address this issue, we apply an unsupervised symbolization method and topic representation to driving data. Driving stream data is converted into sequences of discrete symbols by a non-parametric symbolization method, and the symbols are then characterized by topics representing the typical distribution of driving behavior observed during each symbol. Because these symbols are segmented at the changing points of driving behavior, similar driving situations can be retrieved effectively from the symbol sequences. To evaluate the effectiveness of the symbolization approach, we extract lane change situations based on the topic proportions and their temporal patterns. Distinctive elements of topic proportions and their temporal patterns for lane change situations are extracted by an AdaBoost classifier. As a result, the proposed approach outperforms baselines that use neither topic proportions nor their temporal patterns in extracting lane change situations. This result shows the effectiveness of symbols with topic proportions for representing the characteristics of driving situations.
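The extraction step can be sketched as boosting weak learners over topic-proportion features. Below is a single decision stump (one AdaBoost round, one threshold polarity) on toy topic proportions; the data and labels are invented:

```python
import numpy as np

# Each driving symbol is summarized by a topic-proportion vector; a weak
# learner (decision stump) separates lane-change symbols from others on a
# single distinctive topic, as one round of AdaBoost would.
def best_stump(X, y):
    best = (None, None, 1.0)                      # (topic, threshold, error)
    for t in range(X.shape[1]):
        for thr in np.unique(X[:, t]):
            pred = (X[:, t] >= thr).astype(int)
            err = np.mean(pred != y)
            if err < best[2]:
                best = (t, thr, err)
    return best

# Toy topic proportions over 3 topics; label 1 = lane change (illustrative).
X = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7],
              [0.2, 0.1, 0.7]])
y = np.array([0, 0, 1, 1])
topic, thr, err = best_stump(X, y)
print(topic, err)  # 2 0.0  (topic 2 alone separates the toy classes)
```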

16 citations


Proceedings ArticleDOI
27 Aug 2015
TL;DR: Experimental results show that drivers who pay less attention to the road ahead during automated driving tend to be less sensitive to risk factors in the surrounding environment and also tend to make inconsistent lane change decisions during automateddriving.
Abstract: We investigate a possible method for detecting a driver's negative adaptation to an automated driving system by analyzing consistency of driver decision making and driver gaze behavior during automated driving. We focus on an automated driving system equivalent to Level 2 automation per the NHTSA's definition. At this level of automation, drivers must be ready to take control of the vehicle in critical situations by monitoring the driving environment and vehicle behavior. Since drivers are not required to operate the pedals or steering wheel during automated driving, a driver's negative adaptation to an automated system needs to be detected from behavior other than vehicle operation. In this study, we focus on driver gaze behavior. We conduct a simulator study to compare the gaze behavior of fifteen drivers during conventional and automated driving. We also analyze the consistency of driver decision making when changing lanes during conventional and automated driving. Experimental results show that drivers who pay less attention to the road ahead during automated driving tend to be less sensitive to risk factors in the surrounding environment and also tend to make inconsistent lane change decisions during automated driving.

15 citations


Journal ArticleDOI
TL;DR: A physical model is proposed to model airflow patterns in the physiological system in order to represent the process of speech production under psychological stress, and physical parameters characterizing airflow variations in the vocal folds, the vocal tract, and laryngeal ventricle are explored.
Abstract: This letter presents a method to perform the classification of speech under stress based on physical characteristics. A physical model is proposed to model airflow patterns in the physiological system in order to represent the process of speech production under psychological stress, and physical parameters characterizing airflow variations in the vocal folds, the vocal tract, and laryngeal ventricle are explored. Experimental evaluations show that the physical parameters are effective for the classification of stressed speech.

10 citations



Proceedings ArticleDOI
01 Jun 2015
TL;DR: Experimental results show that the predictions made with the categorized traffic trajectory history contain fewer errors than the predictions made with road shape curvature.
Abstract: This paper proposes a novel approach for extracting a traffic trajectory history, using GPS data collected over a certain period of time, to be used as an input for driver models. In this approach, driving curvature is distinguished from actual road shape curvature with the use of real driving data. After a sufficient amount of driving data has been collected, high-degree polynomials are fitted to the GPS point cloud. The traffic trajectory history consists of the tangential unit vectors and curvature values calculated from these polynomials. A single driver's driving path is then predicted using both the traffic trajectory history and the road shape curvature, for comparison and validation. Experimental results show that the predictions made with the categorized traffic trajectory history contain fewer errors than the predictions made with road shape curvature.
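The tangent and curvature computation from a fitted polynomial can be sketched as follows, with a synthetic point cloud standing in for real GPS tracks; the polynomial degree, noise level, and curve are illustrative:

```python
import numpy as np

# Fit a polynomial to a GPS point cloud and derive the tangent and
# curvature used as the trajectory representation.
# Curvature of y(x):  kappa = |y''| / (1 + y'^2)**1.5
x = np.linspace(0.0, 10.0, 200)
y_true = 0.05 * x**2                      # stand-in for accumulated GPS tracks
y_noisy = y_true + np.random.default_rng(1).normal(0.0, 0.01, x.size)

coeffs = np.polyfit(x, y_noisy, deg=4)    # high-degree polynomial fit
p = np.poly1d(coeffs)
dp, ddp = p.deriv(1), p.deriv(2)

def curvature(xq):
    return abs(ddp(xq)) / (1.0 + dp(xq) ** 2) ** 1.5

def unit_tangent(xq):
    t = np.array([1.0, dp(xq)])
    return t / np.linalg.norm(t)

# For the toy parabola the true curvature at x = 0 is 0.1; the fitted
# value should land close to it despite the added noise.
print(curvature(0.0))
```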

5 citations


Journal ArticleDOI
TL;DR: In this paper, the authors identify differences in the characteristics of drivers' decision-making when driving a vehicle with manual operation or with an automatic driving assistance system, using a high-fidelity driving simulator.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: Experimental results showed that the proposed method can improve recognition accuracy compared to a conventional method, demonstrating the effectiveness of estimating acceleration signals with a Gaussian process for recognizing daily activities.
Abstract: We have created a corpus of daily activities using wearable sensors. The corpus consists of sound and image data from a camera and motion signals from a smartphone, covering both indoor and outdoor activities over 72 continuous hours. We propose a method that interpolates acceleration signals at arbitrary sample points with a Gaussian process in order to recognize daily activities. We conducted daily activity recognition experiments using our corpus. Experimental results showed that the proposed method improves recognition accuracy compared to a conventional method, demonstrating the effectiveness of estimating acceleration signals with a Gaussian process for recognizing daily activities.
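Gaussian-process interpolation of an irregularly sampled signal can be sketched as follows; the RBF kernel, lengthscale, and toy signal are assumptions, not the paper's settings:

```python
import numpy as np

# Irregularly sampled acceleration values are interpolated to arbitrary
# time points with an RBF-kernel GP, analogous to resampling a smartphone
# signal onto a fixed frame rate.
def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

def gp_interpolate(t_obs, y_obs, t_query, noise=1e-4):
    K = rbf(t_obs, t_obs) + noise * np.eye(t_obs.size)  # jitter for stability
    K_star = rbf(t_query, t_obs)
    return K_star @ np.linalg.solve(K, y_obs)           # GP posterior mean

t_obs = np.array([0.0, 0.3, 0.9, 1.4, 2.0])    # irregular sample times (s)
y_obs = np.sin(t_obs)                          # toy acceleration values
t_query = np.linspace(0.0, 2.0, 9)             # regular query grid
y_hat = gp_interpolate(t_obs, y_obs, t_query)
print(float(np.max(np.abs(y_hat - np.sin(t_query)))))  # small interpolation error
```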

01 Jan 2015
TL;DR: This study proposes a method to extract the unique driving signatures of individual drivers from sensor data and suggests that drivers with similar driving signatures can be categorized into driving style classes such as aggressive or careful driving.
Abstract: This study proposes a method to extract the unique driving signatures of individual drivers. We assume that each driver has a unique driving signature that can be represented in a k-dimensional principal driving component (PDC) space. We propose a method to extract this signature from sensor data. Furthermore, we suggest that drivers with similar driving signatures can be categorized into driving style classes such as aggressive or careful driving. In our experiments, 122 different drivers drove the same path on the Nagoya city expressway in the same instrumented car. GPS, speed, acceleration, steering wheel position, and pedal operations were recorded. Clustering methods were used to identify driving signatures.
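The PDC projection and style grouping can be sketched with a plain SVD-based PCA; the per-driver feature vectors below are synthetic, and k = 2 is an arbitrary choice:

```python
import numpy as np

# Per-driver feature vectors (e.g. statistics of speed, acceleration, and
# pedal/steering signals) are projected onto the top-k principal components;
# drivers with nearby projections fall into the same style group.
rng = np.random.default_rng(0)
aggressive = rng.normal([1.0, 1.0, 1.0], 0.1, (5, 3))    # high speed/accel
careful    = rng.normal([-1.0, -1.0, -1.0], 0.1, (5, 3)) # low speed/accel
X = np.vstack([aggressive, careful])

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pdc = Xc @ Vt[:2].T                 # k = 2 principal driving components

# Drivers on opposite sides of the first PDC split into the two styles.
labels = (pdc[:, 0] > 0).astype(int)
print(labels[:5].sum() in (0, 5) and labels[5:].sum() in (0, 5))  # True
```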

Journal ArticleDOI
01 Sep 2015-Medicine
TL;DR: The rate of adverse events at DxWBS was significantly higher in patients who had adverse events at RRA than in those who did not, and strong neck accumulation of 131I is a significant independent predictor of incomplete low-dose RRA.

01 Jan 2015
TL;DR: The behavioural changes that the use of an automated driving system brings about in drivers are analysed, focusing on lane change behaviour, and a correlation between risk sensitivity and gaze behaviour is shown.
Abstract: This paper analyses the behavioural changes that the use of an automated driving system brings about in drivers, focusing on behaviour when changing lanes. In particular, the relation between drivers' sensitivity to risk factors in the surrounding environment and their gaze behaviour is analysed. In this research, we assume automated driving at Level 2 of the NHTSA's definition. At this level of automation, drivers are required to monitor the driving situation and, when necessary, interrupt the system's automatic control to recover driving safety. We conducted a simulation experiment with fifteen drivers and compared their behaviour under two conditions: conventional manual driving, and driving in which the automated driving system changes lanes automatically. By collectively analysing the risk factors at lane changes, shifts in each driver's sensitivity to risk when changing lanes were estimated. The experimental data show a correlation between risk sensitivity and gaze behaviour.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: The correlation between similarity in speaker characteristics and information transmission quality is investigated using a map task dialogue corpus and a linear regression prediction model.
Abstract: We investigate the correlation between similarity in speaker characteristics and information transmission quality using a map task dialogue corpus. Similarities between the prosodic features and lexical styles of different speakers are analyzed, and most of these similarity measurements are shown to have significant correlations with information transmission quality, as measured by a direction-following task. We also combine these similarity measurements using a linear regression prediction model to assess information transmission quality. The predictions show a significant correlation coefficient of 0.37 between the combined similarity measurement and information transmission quality scores.
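Combining several similarity measurements by linear regression and checking the correlation of the combined predictor can be sketched as follows, on synthetic data (the weights and noise level are invented):

```python
import numpy as np

# Several speaker-similarity measurements are combined by linear regression
# to predict an information-transmission-quality score, then the correlation
# between prediction and score is computed.
rng = np.random.default_rng(0)
n = 40
sims = rng.standard_normal((n, 3))          # toy prosodic/lexical similarities
quality = sims @ np.array([0.5, 0.3, 0.0]) + 0.2 * rng.standard_normal(n)

A = np.hstack([sims, np.ones((n, 1))])      # add an intercept column
w, *_ = np.linalg.lstsq(A, quality, rcond=None)
pred = A @ w

r = np.corrcoef(pred, quality)[0, 1]        # Pearson correlation coefficient
print(r > 0.37)                             # True on this synthetic data
```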

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A method is investigated for identifying objects observed by drivers and tracking the driver's observation of signage while driving; driver and signage location information are used to limit the candidate signboards, reducing the computational cost of image matching.
Abstract: We investigate a method for identifying objects observed by drivers. Here we focus on roadside signage as an example, and track the driver's observation of signage while driving. A gaze tracking system and a forward-directed video camera are used to determine the driver's region of interest (ROI). The driver's observation of signage is detected by tracking the driver's ROI using optical flow, and by matching the driver's ROI against template images of signboards in a signage database using local feature matching. Driver and signage location information are used to limit the candidate signboards, reducing the computational cost of image matching. We conduct an experiment to evaluate our method and achieve a 66.2% detection rate of drivers' signboard observation, with a false positive rate of 6.6%.
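The candidate-limiting step can be sketched as a simple radius filter on signboard positions before any image matching runs; the coordinates and the 50 m radius below are illustrative:

```python
import math

# Discard signboards far from the vehicle's current GPS position so that
# only nearby templates are passed to the image-matching stage.
def haversine_m(lat1, lon1, lat2, lon2):
    r = 6371000.0                                  # mean Earth radius (m)
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Toy signage database: name -> (lat, lon).
signboards = {
    "A": (35.1700, 136.8810),
    "B": (35.1703, 136.8812),
    "C": (35.2000, 136.9000),   # several kilometres away
}
driver = (35.1701, 136.8811)    # current vehicle position

candidates = [
    name for name, (lat, lon) in signboards.items()
    if haversine_m(driver[0], driver[1], lat, lon) < 50.0
]
print(sorted(candidates))  # ['A', 'B']
```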