Author

Gokhan Tur

Other affiliations: SRI International, Bilkent University, Apple Inc.
Bio: Gokhan Tur is an academic researcher from Amazon.com. The author has contributed to research in topics including spoken language and natural language, has an h-index of 51, and has co-authored 247 publications receiving 10,842 citations. Previous affiliations of Gokhan Tur include SRI International and Bilkent University.


Papers
Book
25 Apr 2011
TL;DR: This book surveys spoken language understanding (SLU) for both human/machine and human/human conversations, covering tasks ranging from intent determination, slot filling, and voice search to named entity recognition, topic segmentation, and speech summarization and retrieval.
Abstract: List of Contributors. Foreword. Preface. 1 Introduction (Gokhan Tur and Renato De Mori). 1.1 A Brief History of Spoken Language Understanding. 1.2 Organization of the Book. PART 1 SPOKEN LANGUAGE UNDERSTANDING FOR HUMAN/MACHINE INTERACTIONS. 2 History of Knowledge and Processes for Spoken Language Understanding (Renato De Mori). 2.1 Introduction. 2.2 Meaning Representation and Sentence Interpretation. 2.3 Knowledge Fragments and Semantic Composition. 2.4 Probabilistic Interpretation in SLU Systems. 2.5 Interpretation with Partial Syntactic Analysis. 2.6 Classification Models for Interpretation. 2.7 Advanced Methods and Resources for Semantic Modeling and Interpretation. 2.8 Recent Systems. 2.9 Conclusions. References. 3 Semantic Frame-based Spoken Language Understanding (Ye-Yi Wang, Li Deng and Alex Acero). 3.1 Background. 3.2 Knowledge-based Solutions. 3.3 Data-driven Approaches. 3.4 Summary. References. 4 Intent Determination and Spoken Utterance Classification (Gokhan Tur and Li Deng). 4.1 Background. 4.2 Task Description. 4.3 Technical Challenges. 4.4 Benchmark Data Sets. 4.5 Evaluation Metrics. 4.6 Technical Approaches. 4.7 Discussion and Conclusions. References. 5 Voice Search (Ye-Yi Wang, Dong Yu, Yun-Cheng Ju and Alex Acero). 5.1 Background. 5.2 Technology Review. 5.3 Summary. References. 6 Spoken Question Answering (Sophie Rosset, Olivier Galibert and Lori Lamel). 6.1 Introduction. 6.2 Specific Aspects of Handling Speech in QA Systems. 6.3 QA Evaluation Campaigns. 6.4 Question-answering Systems. 6.5 Projects Integrating Spoken Requests and Question Answering. 6.6 Conclusions. References. 7 SLU in Commercial and Research Spoken Dialogue Systems (David Suendermann and Roberto Pieraccini). 7.1 Why Spoken Dialogue Systems (Do Not) Have to Understand. 7.2 Approaches to SLU for Dialogue Systems. 7.3 From Call Flow to POMDP: How Dialogue Management Integrates with SLU. 7.4 Benchmark Projects and Data Sets.
7.5 Time is Money: The Relationship between SLU and Overall Dialogue System Performance. 7.6 Conclusion. References. 8 Active Learning (Dilek Hakkani-Tur and Giuseppe Riccardi). 8.1 Introduction. 8.2 Motivation. 8.3 Learning Architectures. 8.4 Active Learning Methods. 8.5 Combining Active Learning with Semi-supervised Learning. 8.6 Applications. 8.7 Evaluation of Active Learning Methods. 8.8 Discussion and Conclusions. References. PART 2 SPOKEN LANGUAGE UNDERSTANDING FOR HUMAN/HUMAN CONVERSATIONS. 9 Human/Human Conversation Understanding (Gokhan Tur and Dilek Hakkani-Tur). 9.1 Background. 9.2 Human/Human Conversation Understanding Tasks. 9.3 Dialogue Act Segmentation and Tagging. 9.4 Action Item and Decision Detection. 9.5 Addressee Detection and Co-reference Resolution. 9.6 Hot Spot Detection. 9.7 Subjectivity, Sentiment, and Opinion Detection. 9.8 Speaker Role Detection. 9.9 Modeling Dominance. 9.10 Argument Diagramming. 9.11 Discussion and Conclusions. References. 10 Named Entity Recognition (Frederic Bechet). 10.1 Task Description. 10.2 Challenges Using Speech Input. 10.3 Benchmark Data Sets, Applications. 10.4 Evaluation Metrics. 10.5 Main Approaches for Extracting NEs from Text. 10.6 Comparative Methods for NER from Speech. 10.7 New Trends in NER from Speech. 10.8 Conclusions. References. 11 Topic Segmentation (Matthew Purver). 11.1 Task Description. 11.2 Basic Approaches, and the Challenge of Speech. 11.3 Applications and Benchmark Datasets. 11.4 Evaluation Metrics. 11.5 Technical Approaches. 11.6 New Trends and Future Directions. References. 12 Topic Identification (Timothy J. Hazen). 12.1 Task Description. 12.2 Challenges Using Speech Input. 12.3 Applications and Benchmark Tasks. 12.4 Evaluation Metrics. 12.5 Technical Approaches. 12.6 New Trends and Future Directions. References. 13 Speech Summarization (Yang Liu and Dilek Hakkani-Tur). 13.1 Task Description. 13.2 Challenges when Using Speech Input. 13.3 Data Sets. 13.4 Evaluation Metrics. 
13.5 General Approaches. 13.6 More Discussions on Speech versus Text Summarization. 13.7 Conclusions. References. 14 Speech Analytics (I. Dan Melamed and Mazin Gilbert) 14.1 Introduction. 14.2 System Architecture. 14.3 Speech Transcription. 14.4 Text Feature Extraction. 14.5 Acoustic Feature Extraction. 14.6 Relational Feature Extraction. 14.7 DBMS. 14.8 Media Server and Player. 14.9 Trend Analysis. 14.10 Alerting System. 14.11 Conclusion. References. 15 Speech Retrieval (Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran and Murat Saraclar). 15.1 Task Description. 15.2 Applications. 15.3 Challenges Using Speech Input. 15.4 Evaluation Metrics. 15.5 Benchmark Data Sets. 15.6 Approaches. 15.7 New Trends. 15.8 Discussion and Conclusions. References. Index.

577 citations

Journal ArticleDOI
TL;DR: This paper implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants, and implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark.
Abstract: Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark. In addition, we compared the approaches on two custom SLU data sets from the entertainment and movies domains. Our results show that the RNN-based models outperform the conditional random field (CRF) baseline by 2% in absolute error reduction on the ATIS benchmark. We improve the state-of-the-art by 0.5% in the Entertainment domain, and 6.7% for the movies domain.
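The Elman recurrence the paper compares can be sketched in a few lines: at each time step the hidden state is computed from the current word embedding and the previous hidden state, and a softmax layer emits a slot-label distribution per token (a Jordan network would instead feed the previous output back). This is a minimal illustrative sketch with random weights and made-up dimensions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, E, H, S = 10, 8, 16, 4        # vocab, embedding, hidden, slot-label sizes (illustrative)

emb  = rng.normal(0, 0.1, (V, E))   # word embeddings
W_xh = rng.normal(0, 0.1, (E, H))   # input -> hidden
W_hh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden (the Elman recurrence)
W_hy = rng.normal(0, 0.1, (H, S))   # hidden -> slot-label scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag(word_ids):
    """Return one slot-label distribution per input token."""
    h = np.zeros(H)
    out = []
    for w in word_ids:
        h = np.tanh(emb[w] @ W_xh + h @ W_hh)   # Elman update: depends on past context
        out.append(softmax(h @ W_hy))
    return np.array(out)

probs = tag([3, 1, 7, 2])   # a toy 4-word utterance
print(probs.shape)          # one S-way distribution per token
```

Training such a network (e.g. with backpropagation through time against IOB slot tags) is what the paper evaluates on ATIS against a CRF baseline.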

562 citations

Proceedings ArticleDOI
08 Sep 2016
TL;DR: Experimental results show the power of a holistic multi-domain, multi-task modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system over alternative methods based on single domain/task deep learning.
Abstract: Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most of the previous studies explored this framework for building single-domain models for each task, such as slot filling or domain classification, comparing deep learning based approaches with conventional ones like conditional random fields. This paper proposes a holistic multi-domain, multi-task (i.e., slot filling, domain and intent detection) modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system, demonstrating the distinctive power of deep learning methods, namely a bi-directional recurrent neural network (RNN) with long short-term memory (LSTM) cells (RNN-LSTM), to handle such complexity. The contributions of the presented work are three-fold: (i) we propose an RNN-LSTM architecture for joint modeling of slot filling, intent determination, and domain classification; (ii) we build a joint multi-domain model enabling multi-task deep learning where the data from each domain reinforce the others; (iii) we investigate alternative architectures for modeling lexical context in spoken language understanding. In addition to the simplicity of the single model framework, experimental results show the power of such an approach on Microsoft Cortana real user data over alternative methods based on single domain/task deep learning.
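The core idea of joint modeling can be sketched as one shared recurrent encoder feeding two heads: a per-token slot classifier and an utterance-level intent classifier read off the final hidden state. This toy sketch uses a simple unidirectional recurrence with random weights as a stand-in for the paper's bi-directional RNN-LSTM; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
V, E, H, S, I = 12, 8, 16, 5, 3   # vocab, embedding, hidden, slot labels, intents (illustrative)

emb      = rng.normal(0, 0.1, (V, E))
W_xh     = rng.normal(0, 0.1, (E, H))
W_hh     = rng.normal(0, 0.1, (H, H))
W_slot   = rng.normal(0, 0.1, (H, S))   # per-token slot-filling head
W_intent = rng.normal(0, 0.1, (H, I))   # utterance-level intent head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def joint_forward(word_ids):
    """Shared encoder, two task heads: slot tags per token, one intent per utterance."""
    h = np.zeros(H)
    slot_probs = []
    for w in word_ids:
        h = np.tanh(emb[w] @ W_xh + h @ W_hh)     # shared recurrent encoder
        slot_probs.append(softmax(h @ W_slot))    # slot head at every step
    intent_probs = softmax(h @ W_intent)          # intent head on the final state
    return np.array(slot_probs), intent_probs

slots, intent = joint_forward([4, 9, 1, 6])
print(slots.shape, intent.shape)
```

Because the encoder parameters are shared across tasks (and, in the paper, across domains), training signal from each task regularizes the others, which is the multi-task effect the abstract describes.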

464 citations

Journal ArticleDOI
TL;DR: This work combines prosodic cues with word-based approaches, and evaluates performance on two speech corpora, Broadcast News and Switchboard, finding that the prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events.

464 citations

Journal ArticleDOI
TL;DR: The CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization are presented.
Abstract: The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.

295 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, and sampling methods, concluding with a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

01 Jan 2009
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.
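The simplest query strategy the survey discusses, least-confidence uncertainty sampling, picks the unlabeled instance whose top predicted class probability is lowest and sends it to the oracle. A minimal sketch, with made-up model outputs:

```python
def least_confidence_query(pool):
    """pool: list of (instance_id, class_probability_list).
    Return the id of the instance the model is least confident about."""
    return min(pool, key=lambda item: max(item[1]))[0]

unlabeled = [
    ("u1", [0.95, 0.03, 0.02]),   # model is confident -> low query value
    ("u2", [0.40, 0.35, 0.25]),   # model is unsure    -> most informative
    ("u3", [0.70, 0.20, 0.10]),
]
print(least_confidence_query(unlabeled))  # -> "u2"
```

Other frameworks the survey covers (margin sampling, entropy-based sampling, query-by-committee) swap in a different scoring function over the same pool-based loop.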

5,227 citations

Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
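The kind of N-gram statistics SRILM estimates can be illustrated with a toy maximum-likelihood bigram model (omitting the smoothing, backoff, and ARPA-format handling a real toolkit provides). The two-sentence corpus is invented:

```python
from collections import Counter

corpus = [["<s>", "show", "me", "flights", "</s>"],
          ["<s>", "show", "me", "fares", "</s>"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    unigrams.update(sent)                 # count individual words
    bigrams.update(zip(sent, sent[1:]))   # count adjacent word pairs

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("me", "show"))       # "show" is always followed by "me" in this corpus
print(p("flights", "me"))    # "me" is followed by "flights" half the time
```

A production toolkit like SRILM additionally smooths these estimates so unseen N-grams receive nonzero probability, and evaluates models by perplexity on held-out text.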

4,904 citations

Book
Li Deng, Dong Yu
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations