Showing papers by Patrick Haffner published in 2011

Patent

System and method for dynamic facial features for speaker recognition

Ann K. Syrdal¹, Sumit Chopra², Patrick Haffner¹, Taniya Mishra¹, Ilija Zeljkovic¹, Eric Zavesky¹

AT&T¹, Nuance Communications²

05 May 2011

TL;DR: A system configured to verify a speaker generates a text challenge that is unique to the request, prompts the speaker to utter it, and records a dynamic image feature of the speaker while the challenge is spoken.

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing speaker verification. A system configured to practice the method receives a request to verify a speaker, generates a text challenge that is unique to the request, and, in response to the request, prompts the speaker to utter the text challenge. Then the system records a dynamic image feature of the speaker as the speaker utters the text challenge, and performs speaker verification based on the dynamic image feature and the text challenge. Recording the dynamic image feature of the speaker can include recording video of the speaker while speaking the text challenge. The dynamic feature can include a movement pattern of head, lips, mouth, eyes, and/or eyebrows of the speaker. The dynamic image feature can relate to phonetic content of the speaker speaking the challenge, speech prosody, and the speaker's facial expression responding to content of the challenge.
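
The challenge-issuing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the word list `WORDS`, the four-word challenge length, and the class `ChallengeIssuer` are hypothetical, and only the text-content check is shown. Scoring the dynamic image feature (head, lip, mouth, eye, or eyebrow movement) is outside the sketch. Issuing a challenge that is never repeated is what keeps a recording from an earlier session from being replayed.

```python
import secrets

# Hypothetical vocabulary; the patent does not specify how challenges are built.
WORDS = ["amber", "river", "seven", "window", "lantern", "copper", "meadow", "puzzle"]


class ChallengeIssuer:
    """Issues a never-repeated text challenge per verification request (sketch)."""

    def __init__(self, words, n_words=4):
        self.words = list(dict.fromkeys(words))  # drop duplicate words, keep order
        self.n_words = n_words
        self.issued = {}   # request_id -> challenge text
        self.seen = set()  # every challenge issued so far

    def issue(self, request_id):
        if len(self.seen) >= len(self.words) ** self.n_words:
            raise RuntimeError("challenge space exhausted")
        # Draw random word sequences until one has never been issued before.
        while True:
            challenge = " ".join(secrets.choice(self.words) for _ in range(self.n_words))
            if challenge not in self.seen:
                self.seen.add(challenge)
                self.issued[request_id] = challenge
                return challenge

    def content_matches(self, request_id, transcript):
        # Text-content check only: the recognized transcript must equal the
        # challenge issued for this request (case and outer whitespace ignored).
        expected = self.issued.get(request_id)
        return expected is not None and transcript.strip().lower() == expected.lower()


issuer = ChallengeIssuer(WORDS)
challenge = issuer.issue("req-001")  # four random words from WORDS
print(issuer.content_matches("req-001", challenge))  # True
```

The sketch uses Python's `secrets` module rather than `random` because the challenge acts as a security token; the exhaustion guard only matters for very small vocabularies.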

71 citations

Patent

System and method for rapid customization of speech recognition models

Srinivas Bangalore¹, Robert M. Bell¹, Diamantino Caseiro¹, Mazin E. Gilbert¹, Patrick Haffner¹

Nuance Communications¹

28 Mar 2011

TL;DR: A method for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models, for use when a speech recognizer has no speech recognition model for that domain and the available domain-specific data is below a minimum desired threshold.

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
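
One common way to combine existing models and tune the combination on a small in-domain sample is linear interpolation, with the mixture weights re-estimated by expectation-maximization (EM). The abstract does not say which combination or tuning method the patent uses, so the sketch below swaps in this standard technique. For brevity it uses unigram word distributions stored as dicts; `interpolate`, `tune_weights`, `perplexity`, and the example models are hypothetical.

```python
import math


def interpolate(models, weights, word):
    """P(word) = sum_k lambda_k * P_k(word); each model maps word -> probability."""
    return sum(lam * m.get(word, 0.0) for lam, m in zip(weights, models))


def tune_weights(models, tokens, iters=50):
    """Re-estimate the mixture weights on a small in-domain sample with EM."""
    k = len(models)
    weights = [1.0 / k] * k  # start from a uniform combination
    for _ in range(iters):
        resp_totals = [0.0] * k
        used = 0
        for w in tokens:
            # E-step: share of this token's probability contributed by each model.
            parts = [lam * m.get(w, 0.0) for lam, m in zip(weights, models)]
            total = sum(parts)
            if total == 0.0:
                continue  # no model covers this word; skip it
            used += 1
            for i in range(k):
                resp_totals[i] += parts[i] / total
        if used == 0:
            break
        # M-step: each new weight is that model's average responsibility.
        weights = [r / used for r in resp_totals]
    return weights


def perplexity(models, weights, tokens, floor=1e-12):
    log_sum = sum(math.log(max(interpolate(models, weights, w), floor)) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Two hypothetical existing models and a tiny in-domain sample.
general = {"the": 0.5, "call": 0.2, "bill": 0.1, "router": 0.2}
support = {"the": 0.3, "call": 0.1, "bill": 0.1, "router": 0.5}
in_domain = ["router", "the", "router", "call", "router"]

lams = tune_weights([general, support], in_domain)
print(lams)
print(perplexity([general, support], lams, in_domain))
```

Because only the k mixture weights are estimated rather than a full model, the tuning step needs far less in-domain data than training a new domain-specific model would.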

38 citations

Proceedings Article

Making sense of customer tickets in cellular networks

Yu Jin¹, Nick Duffield², Alexandre Gerber², Patrick Haffner², Wen-Ling Hsu², Guy Jacobson², Subhabrata Sen², Shobha Venkataraman², Zhi-Li Zhang¹

University of Minnesota¹, AT&T Labs²

10 Apr 2011

TL;DR: Most customer calls are due to customer-side factors and are well captured by the model, while location-specific deviations from the model provide a good indicator of potential network-side issues.

Abstract: Effective management of large-scale cellular data networks is critical to meeting customer demands and expectations. Customer calls for technical support provide a direct indication of the problems customers encounter. In this paper, we study the customer tickets (free-text recordings and classifications by customer support agents) collected at a large cellular network provider, with two inter-related goals: i) to characterize and understand the major factors that lead customers to call and seek support; and ii) to utilize such customer tickets to help identify potential network problems. For this purpose, we develop a novel statistical approach to model customer call rates that accounts for customer-side factors (e.g., user tenure and handset types) and geo-locations. We show that most calls are due to customer-side factors and can be well captured by the model. Furthermore, we demonstrate that location-specific deviations from the model provide a good indicator of potential network-side issues.
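
The idea of explaining call rates by customer-side factors and then looking for location-specific excess can be sketched with a deliberately simple baseline. This is not the paper's statistical model, which the abstract does not specify. Here each customer segment (for example, a tenure bucket combined with a handset type) gets one pooled call rate, a location's expected calls are the sum of segment size times segment rate, and a location is flagged when its observed calls exceed the expectation by more than `z_threshold` Poisson standard deviations. The record layout and all function names are hypothetical.

```python
import math
from collections import defaultdict


def segment_rates(records):
    """records: iterable of (location, segment, customers, calls) tuples."""
    customers, calls = defaultdict(float), defaultdict(float)
    for _, seg, n, c in records:
        customers[seg] += n
        calls[seg] += c
    # Call rate per customer for each segment, pooled over all locations.
    return {seg: calls[seg] / customers[seg] for seg in customers if customers[seg] > 0}


def location_deviations(records, rates):
    """Standardized excess of observed over expected calls at each location."""
    expected, observed = defaultdict(float), defaultdict(float)
    for loc, seg, n, c in records:
        expected[loc] += n * rates.get(seg, 0.0)
        observed[loc] += c
    # Poisson approximation: the variance of the count equals its mean.
    return {loc: (observed[loc] - expected[loc]) / math.sqrt(expected[loc])
            for loc in expected if expected[loc] > 0}


def flag_locations(records, z_threshold=3.0):
    z = location_deviations(records, segment_rates(records))
    return sorted(loc for loc, score in z.items() if score > z_threshold)


records = [
    ("A", "short-tenure/model-1", 1000, 10), ("A", "long-tenure/model-2", 1000, 50),
    ("B", "short-tenure/model-1", 1000, 10), ("B", "long-tenure/model-2", 1000, 50),
    ("C", "short-tenure/model-1", 1000, 100), ("C", "long-tenure/model-2", 1000, 50),
]
print(flag_locations(records))  # ['C']
```

Location C has the same customer mix as A and B, so its extra calls cannot be explained by customer-side factors; in this baseline, that residual is what points toward a possible network-side issue.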

15 citations

Patent

System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification

Sumit Chopra¹, Dimitrios Dimitriadis¹, Patrick Haffner¹

AT&T¹

25 Oct 2011

TL;DR: A temporal pooling scheme for combining frame-level and segment-level processing for phonetic classification.

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations. Based on the scores, the plurality of segmental classification units selects a class label and returns a result.
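
The pipeline in this abstract (frame features, several pooling interface units, one segmental classifier per pooling unit, and an ensemble over PIU-SCU combinations) can be sketched in plain Python. This is an illustration only: the three pooling strategies (mean, max, and mean over segment thirds), the linear scorers, and score averaging as the ensemble rule are assumptions chosen for brevity, not the strategies or classifiers the patent claims. All function names and weights are hypothetical.

```python
from statistics import fmean

# A segment is a list of frames; each frame is a list of floats (frame-level features).


def mean_pool(frames):
    return [fmean(col) for col in zip(*frames)]


def max_pool(frames):
    return [max(col) for col in zip(*frames)]


def thirds_pool(frames):
    # Mean-pool the beginning, middle, and end of the segment separately and
    # concatenate, which keeps coarse temporal order.
    n = len(frames)
    if n < 3:
        raise ValueError("thirds_pool needs at least 3 frames")
    bounds = [0, n // 3, 2 * n // 3, n]
    out = []
    for a, b in zip(bounds, bounds[1:]):
        out.extend(mean_pool(frames[a:b]))
    return out


def linear_scores(class_weights, x):
    # Segmental classifier (one per pooling unit): score = w . x + bias per class.
    scores = {}
    for label, (w, bias) in class_weights.items():
        if len(w) != len(x):
            raise ValueError("weight length must match pooled vector length")
        scores[label] = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return scores


def ensemble_classify(frames, combos):
    # combos: list of (pool_fn, class_weights) pairs, one per PIU-SCU combination.
    totals = {}
    for pool, class_weights in combos:
        for label, s in linear_scores(class_weights, pool(frames)).items():
            totals[label] = totals.get(label, 0.0) + s / len(combos)
    return max(totals, key=totals.get)  # label with the highest average score


frames = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]  # 3 frames x 2 features
scorer = {"aa": ([1.0, 0.0], 0.0), "iy": ([0.0, 1.0], 0.0)}  # hypothetical weights
print(ensemble_classify(frames, [(mean_pool, scorer), (max_pool, scorer)]))  # iy
```

Varying the pooling function is what diversifies the ensemble in this sketch: each combination scores a different summary of the same frames.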

9 citations

Proceedings Article

Large-scale app-based reporting of customer problems in cellular networks: potential and limitations

Yu Jin¹, Nick Duffield¹, Alexandre Gerber¹, Patrick Haffner¹, Wen-Ling Hsu¹, Guy Jacobson¹, Subhabrata Sen¹, Shobha Venkataraman¹, Zhi-Li Zhang²

AT&T Labs¹, University of Minnesota²

19 Aug 2011

TL;DR: Analysis of the Location-based Reporting Tool (LRT) shows that its lightweight design encourages customers to report more problems from anywhere and at any time, which makes LRT feedback a valuable information source for early detection of emerging network problems.

Abstract: In this paper, we study the Location-based Reporting Tool (LRT), a smartphone application for collecting large-scale feedback from mobile customers. Using one year of data collected from one of the largest cellular networks in the US, we compare LRT feedback to the traditional customer feedback channel, customer care tickets. Our analysis shows that, due to its lightweight design, LRT encourages customers to report more problems from anywhere and at any time. In addition, we find that LRT users access network services more intensively than other mobile users, and hence are more likely to experience network problems and more sensitive to them. Together, these findings make LRT feedback a valuable information source for early detection of emerging network problems.

6 citations

Proceedings Article

Combining Frame and Segment Level Processing via Temporal Pooling for Phonetic Classification.

Sumit Chopra¹, Patrick Haffner¹, Dimitrios Dimitriadis²

AT&T Labs¹, AT&T²

01 Jan 2011

TL;DR: The model combines a frame-level transformation of the acoustic signal with segment-level phone classification; the key contribution is a study of new temporal pooling strategies that interface these two levels, determining how frame scores are converted into segment scores.

Abstract: We propose a simple, yet novel, multi-layer model for the problem of phonetic classification. Our model combines a frame level transformation of the acoustic signal with a segment level phone classification. Our key contribution is the study of new temporal pooling strategies that interface these two levels, determining how frame scores are converted into segment scores. On the TIMIT benchmark, we match the best performance obtained using a single classifier. Diversity in pooling strategies is further used to generate candidate classifiers with complementary performance characteristics, which perform even better as an ensemble. Without the use of any phonetic knowledge, our ensemble model achieves a 16.96% phone classification error. While our data-driven approach is exhaustive, the combinatorial inflation is limited to the smaller segmental half of the system.
