Author

Gokhan Tur

Other affiliations: SRI International, Bilkent University, Apple Inc.
Bio: Gokhan Tur is an academic researcher from Amazon.com. The author has contributed to research in topics including spoken language and natural language, has an h-index of 51, and has co-authored 247 publications receiving 10,842 citations. Previous affiliations of Gokhan Tur include SRI International and Bilkent University.


Papers
Book
25 Apr 2011
TL;DR: This book surveys spoken language understanding (SLU) for both human/machine and human/human conversations, covering tasks ranging from intent determination, slot filling, and voice search to named entity recognition, topic segmentation, and speech summarization and retrieval.
Abstract: List of Contributors. Foreword. Preface. 1 Introduction (Gokhan Tur and Renato De Mori). 1.1 A Brief History of Spoken Language Understanding. 1.2 Organization of the Book. PART 1 SPOKEN LANGUAGE UNDERSTANDING FOR HUMAN/MACHINE INTERACTIONS. 2 History of Knowledge and Processes for Spoken Language Understanding (Renato De Mori). 2.1 Introduction. 2.2 Meaning Representation and Sentence Interpretation. 2.3 Knowledge Fragments and Semantic Composition. 2.4 Probabilistic Interpretation in SLU Systems. 2.5 Interpretation with Partial Syntactic Analysis. 2.6 Classification Models for Interpretation. 2.7 Advanced Methods and Resources for Semantic Modeling and Interpretation. 2.8 Recent Systems. 2.9 Conclusions. References. 3 Semantic Frame-based Spoken Language Understanding (Ye-Yi Wang, Li Deng and Alex Acero). 3.1 Background. 3.2 Knowledge-based Solutions. 3.3 Data-driven Approaches. 3.4 Summary. References. 4 Intent Determination and Spoken Utterance Classification (Gokhan Tur and Li Deng). 4.1 Background. 4.2 Task Description. 4.3 Technical Challenges. 4.4 Benchmark Data Sets. 4.5 Evaluation Metrics. 4.6 Technical Approaches. 4.7 Discussion and Conclusions. References. 5 Voice Search (Ye-Yi Wang, Dong Yu, Yun-Cheng Ju and Alex Acero). 5.1 Background. 5.2 Technology Review. 5.3 Summary. References. 6 Spoken Question Answering (Sophie Rosset, Olivier Galibert and Lori Lamel). 6.1 Introduction. 6.2 Specific Aspects of Handling Speech in QA Systems. 6.3 QA Evaluation Campaigns. 6.4 Question-answering Systems. 6.5 Projects Integrating Spoken Requests and Question Answering. 6.6 Conclusions. References. 7 SLU in Commercial and Research Spoken Dialogue Systems (David Suendermann and Roberto Pieraccini). 7.1 Why Spoken Dialogue Systems (Do Not) Have to Understand. 7.2 Approaches to SLU for Dialogue Systems. 7.3 From Call Flow to POMDP: How Dialogue Management Integrates with SLU. 7.4 Benchmark Projects and Data Sets.
7.5 Time is Money: The Relationship between SLU and Overall Dialogue System Performance. 7.6 Conclusion. References. 8 Active Learning (Dilek Hakkani-Tur and Giuseppe Riccardi). 8.1 Introduction. 8.2 Motivation. 8.3 Learning Architectures. 8.4 Active Learning Methods. 8.5 Combining Active Learning with Semi-supervised Learning. 8.6 Applications. 8.7 Evaluation of Active Learning Methods. 8.8 Discussion and Conclusions. References. PART 2 SPOKEN LANGUAGE UNDERSTANDING FOR HUMAN/HUMAN CONVERSATIONS. 9 Human/Human Conversation Understanding (Gokhan Tur and Dilek Hakkani-Tur). 9.1 Background. 9.2 Human/Human Conversation Understanding Tasks. 9.3 Dialogue Act Segmentation and Tagging. 9.4 Action Item and Decision Detection. 9.5 Addressee Detection and Co-reference Resolution. 9.6 Hot Spot Detection. 9.7 Subjectivity, Sentiment, and Opinion Detection. 9.8 Speaker Role Detection. 9.9 Modeling Dominance. 9.10 Argument Diagramming. 9.11 Discussion and Conclusions. References. 10 Named Entity Recognition (Frederic Bechet). 10.1 Task Description. 10.2 Challenges Using Speech Input. 10.3 Benchmark Data Sets, Applications. 10.4 Evaluation Metrics. 10.5 Main Approaches for Extracting NEs from Text. 10.6 Comparative Methods for NER from Speech. 10.7 New Trends in NER from Speech. 10.8 Conclusions. References. 11 Topic Segmentation (Matthew Purver). 11.1 Task Description. 11.2 Basic Approaches, and the Challenge of Speech. 11.3 Applications and Benchmark Datasets. 11.4 Evaluation Metrics. 11.5 Technical Approaches. 11.6 New Trends and Future Directions. References. 12 Topic Identification (Timothy J. Hazen). 12.1 Task Description. 12.2 Challenges Using Speech Input. 12.3 Applications and Benchmark Tasks. 12.4 Evaluation Metrics. 12.5 Technical Approaches. 12.6 New Trends and Future Directions. References. 13 Speech Summarization (Yang Liu and Dilek Hakkani-Tur). 13.1 Task Description. 13.2 Challenges when Using Speech Input. 13.3 Data Sets. 13.4 Evaluation Metrics. 
13.5 General Approaches. 13.6 More Discussions on Speech versus Text Summarization. 13.7 Conclusions. References. 14 Speech Analytics (I. Dan Melamed and Mazin Gilbert) 14.1 Introduction. 14.2 System Architecture. 14.3 Speech Transcription. 14.4 Text Feature Extraction. 14.5 Acoustic Feature Extraction. 14.6 Relational Feature Extraction. 14.7 DBMS. 14.8 Media Server and Player. 14.9 Trend Analysis. 14.10 Alerting System. 14.11 Conclusion. References. 15 Speech Retrieval (Ciprian Chelba, Timothy J. Hazen, Bhuvana Ramabhadran and Murat Saraclar). 15.1 Task Description. 15.2 Applications. 15.3 Challenges Using Speech Input. 15.4 Evaluation Metrics. 15.5 Benchmark Data Sets. 15.6 Approaches. 15.7 New Trends. 15.8 Discussion and Conclusions. References. Index.

577 citations

Journal ArticleDOI
TL;DR: This paper implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants, and implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark.
Abstract: Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark. In addition, we compared the approaches on two custom SLU data sets from the entertainment and movies domains. Our results show that the RNN-based models outperform the conditional random field (CRF) baseline by 2% in absolute error reduction on the ATIS benchmark. We improve the state-of-the-art by 0.5% in the Entertainment domain, and 6.7% for the movies domain.
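The Elman recurrence the paper compares can be sketched in a few lines: at each time step the hidden state is computed from the current word embedding and the previous hidden state, and a softmax layer emits a slot-label distribution per token (a Jordan network would instead feed the previous output back). This is a minimal illustrative sketch with random weights and made-up dimensions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, E, H, S = 10, 8, 16, 4        # vocab, embedding, hidden, slot-label sizes (illustrative)

emb  = rng.normal(0, 0.1, (V, E))   # word embeddings
W_xh = rng.normal(0, 0.1, (E, H))   # input -> hidden
W_hh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden (the Elman recurrence)
W_hy = rng.normal(0, 0.1, (H, S))   # hidden -> slot-label scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag(word_ids):
    """Return one slot-label distribution per input token."""
    h = np.zeros(H)
    out = []
    for w in word_ids:
        h = np.tanh(emb[w] @ W_xh + h @ W_hh)   # Elman update: depends on past context
        out.append(softmax(h @ W_hy))
    return np.array(out)

probs = tag([3, 1, 7, 2])   # a toy 4-word utterance
print(probs.shape)          # one S-way distribution per token
```

Training such a network (e.g. with backpropagation through time against IOB slot tags) is what the paper evaluates on ATIS against a CRF baseline.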

562 citations

Proceedings ArticleDOI
08 Sep 2016
TL;DR: Experimental results show the power of a holistic multi-domain, multi-task modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system over alternative methods based on single domain/task deep learning.
Abstract: Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most of the previous studies explored this framework for building single-domain models for each task, such as slot filling or domain classification, comparing deep learning based approaches with conventional ones like conditional random fields. This paper proposes a holistic multi-domain, multi-task (i.e., slot filling, domain and intent detection) modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system, demonstrating the distinctive power of deep learning methods, namely a bi-directional recurrent neural network (RNN) with long short-term memory (LSTM) cells (RNN-LSTM), to handle such complexity. The contributions of the presented work are three-fold: (i) we propose an RNN-LSTM architecture for joint modeling of slot filling, intent determination, and domain classification; (ii) we build a joint multi-domain model enabling multi-task deep learning where the data from each domain reinforce the others; (iii) we investigate alternative architectures for modeling lexical context in spoken language understanding. In addition to the simplicity of the single model framework, experimental results show the power of such an approach on Microsoft Cortana real user data over alternative methods based on single domain/task deep learning.
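The core idea of joint modeling can be sketched as one shared recurrent encoder feeding two heads: a per-token slot classifier and an utterance-level intent classifier read off the final hidden state. This toy sketch uses a simple unidirectional recurrence with random weights as a stand-in for the paper's bi-directional RNN-LSTM; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
V, E, H, S, I = 12, 8, 16, 5, 3   # vocab, embedding, hidden, slot labels, intents (illustrative)

emb      = rng.normal(0, 0.1, (V, E))
W_xh     = rng.normal(0, 0.1, (E, H))
W_hh     = rng.normal(0, 0.1, (H, H))
W_slot   = rng.normal(0, 0.1, (H, S))   # per-token slot-filling head
W_intent = rng.normal(0, 0.1, (H, I))   # utterance-level intent head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def joint_forward(word_ids):
    """Shared encoder, two task heads: slot tags per token, one intent per utterance."""
    h = np.zeros(H)
    slot_probs = []
    for w in word_ids:
        h = np.tanh(emb[w] @ W_xh + h @ W_hh)     # shared recurrent encoder
        slot_probs.append(softmax(h @ W_slot))    # slot head at every step
    intent_probs = softmax(h @ W_intent)          # intent head on the final state
    return np.array(slot_probs), intent_probs

slots, intent = joint_forward([4, 9, 1, 6])
print(slots.shape, intent.shape)
```

Because the encoder parameters are shared across tasks (and, in the paper, across domains), training signal from each task regularizes the others, which is the multi-task effect the abstract describes.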

464 citations

Journal ArticleDOI
TL;DR: This work combines prosodic cues with word-based approaches, and evaluates performance on two speech corpora, Broadcast News and Switchboard, finding that the prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events.

464 citations

Journal ArticleDOI
TL;DR: The CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization are presented.
Abstract: The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.

295 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, and sampling methods, concluding with a discussion of combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

01 Jan 2009
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.
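The simplest query strategy the survey discusses, least-confidence uncertainty sampling, picks the unlabeled instance whose top predicted class probability is lowest and sends it to the oracle. A minimal sketch, with made-up model outputs:

```python
def least_confidence_query(pool):
    """pool: list of (instance_id, class_probability_list).
    Return the id of the instance the model is least confident about."""
    return min(pool, key=lambda item: max(item[1]))[0]

unlabeled = [
    ("u1", [0.95, 0.03, 0.02]),   # model is confident -> low query value
    ("u2", [0.40, 0.35, 0.25]),   # model is unsure    -> most informative
    ("u3", [0.70, 0.20, 0.10]),
]
print(least_confidence_query(unlabeled))  # -> "u2"
```

Other frameworks the survey covers (margin sampling, entropy-based sampling, query-by-committee) swap in a different scoring function over the same pool-based loop.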

5,227 citations

Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
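The kind of N-gram statistics SRILM estimates can be illustrated with a toy maximum-likelihood bigram model (omitting the smoothing, backoff, and ARPA-format handling a real toolkit provides). The two-sentence corpus is invented:

```python
from collections import Counter

corpus = [["<s>", "show", "me", "flights", "</s>"],
          ["<s>", "show", "me", "fares", "</s>"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    unigrams.update(sent)                 # count individual words
    bigrams.update(zip(sent, sent[1:]))   # count adjacent word pairs

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("me", "show"))       # "show" is always followed by "me" in this corpus
print(p("flights", "me"))    # "me" is followed by "flights" half the time
```

A production toolkit like SRILM additionally smooths these estimates so unseen N-grams receive nonzero probability, and evaluates models by perplexity on held-out text.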

4,904 citations

Book
Li Deng, Dong Yu
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations