
Showing papers by Patrick Haffner published in 2011


Patent
05 May 2011
TL;DR: In this patent, a system configured to verify a speaker generates a text challenge that is unique to the request and prompts the speaker to utter it. The system then records a dynamic image feature of the speaker as the speaker utters the challenge.
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing speaker verification. A system configured to practice the method receives a request to verify a speaker, generates a text challenge that is unique to the request, and, in response to the request, prompts the speaker to utter the text challenge. Then the system records a dynamic image feature of the speaker as the speaker utters the text challenge, and performs speaker verification based on the dynamic image feature and the text challenge. Recording the dynamic image feature of the speaker can include recording video of the speaker while speaking the text challenge. The dynamic feature can include a movement pattern of head, lips, mouth, eyes, and/or eyebrows of the speaker. The dynamic image feature can relate to phonetic content of the speaker speaking the challenge, speech prosody, and the speaker's facial expression responding to content of the challenge.
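The challenge-response flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. The word list, the `ChallengeVerifier` class, the exact-match transcript check, and the reduction of the dynamic image feature to one precomputed `dynamic_score` (as if produced by a separate lip- or face-movement model) are all hypothetical.

```python
import secrets
from typing import Dict

# Hypothetical vocabulary for building challenge phrases (not from the patent).
WORDS = ["amber", "river", "seven", "falcon", "granite", "violet", "harbor", "lemon"]


class ChallengeVerifier:
    """Sketch: issue a one-time text challenge per request, then accept the
    speaker only if the challenge text was spoken and an assumed
    dynamic-feature score (computed elsewhere) passes a threshold."""

    def __init__(self, threshold: float = 0.8, n_words: int = 4):
        self.threshold = threshold
        self.n_words = n_words
        self._pending: Dict[str, str] = {}  # request_id -> issued challenge

    def issue(self, request_id: str) -> str:
        # A randomly drawn phrase is hard to predict, so a pre-recorded
        # video of the speaker is unlikely to match it.
        challenge = " ".join(secrets.choice(WORDS) for _ in range(self.n_words))
        self._pending[request_id] = challenge
        return challenge

    def verify(self, request_id: str, transcript: str, dynamic_score: float) -> bool:
        # pop() makes each challenge single-use: a second attempt fails.
        challenge = self._pending.pop(request_id, None)
        if challenge is None:
            return False
        said_challenge = transcript.strip().lower() == challenge
        # Both the spoken content and the (assumed) dynamic-feature score must pass.
        return said_challenge and dynamic_score >= self.threshold
```

Making each challenge single-use is what ties the recorded facial dynamics to this particular request. A real system would replace the exact-match transcript check and the scalar score with actual speech and video models.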

71 citations


Patent
28 Mar 2011
TL;DR: This patent presents a method for generating domain-specific speech recognition models for a domain of interest. It combines and tunes existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when the available domain-specific data is below a minimum desired threshold.
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
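The abstract does not say how the models are combined or tuned. One common way to do both is linear interpolation of language models, with the mixture weights fit by EM on the small in-domain sample. That technique is swapped in here purely as an illustration. The unigram dictionaries and the names `interpolate` and `tune_weights` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Unigram = Dict[str, float]  # word -> probability in one source domain


def interpolate(models: List[Unigram], weights: List[float], word: str) -> float:
    # Probability of `word` under a weighted mixture of the source-domain models.
    return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))


def tune_weights(models: List[Unigram], in_domain_tokens: List[str],
                 iterations: int = 20) -> List[float]:
    """Fit mixture weights by EM. Only len(models) parameters are learned,
    which is why a small in-domain sample can be enough."""
    k = len(models)
    weights = [1.0 / k] * k  # start from a uniform mixture
    counts = Counter(in_domain_tokens)
    for _ in range(iterations):
        expected = [0.0] * k
        for word, c in counts.items():
            p = interpolate(models, weights, word)
            if p == 0.0:
                continue  # no source model covers this word
            for i, m in enumerate(models):
                # E-step: posterior share of model i in generating `word`
                expected[i] += c * weights[i] * m.get(word, 0.0) / p
        norm = sum(expected)
        if norm == 0.0:
            break  # nothing in the sample is covered; keep current weights
        # M-step: renormalize expected counts into new weights
        weights = [e / norm for e in expected]
    return weights
```

For example, suppose one source model covers travel words and another covers banking words, and the in-domain sample is mostly travel. The learned weights then shift toward the travel model.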

38 citations


Proceedings Article
10 Apr 2011
TL;DR: It is shown that most calls are due to customer-side factors and can be well captured by the model, and it is demonstrated that location-specific deviations from the model provide a good indicator of potential network-side issues.
Abstract: Effective management of large-scale cellular data networks is critical to meet customer demands and expectations. Customer calls for technical support provide a direct indication of the problems customers encounter. In this paper, we study the customer tickets (free-text recordings and classifications by customer support agents) collected at a large cellular network provider, with two inter-related goals: i) to characterize and understand the major factors that lead customers to call and seek support; and ii) to use such customer tickets to help identify potential network problems. For this purpose, we develop a novel statistical approach to model customer call rates that accounts for customer-side factors (e.g., user tenure and handset types) and geo-locations. We show that most calls are due to customer-side factors and can be well captured by the model. Furthermore, we demonstrate that location-specific deviations from the model provide a good indicator of potential network-side issues.
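The idea of "expected calls from customer-side factors, then flag location-specific deviations" can be sketched as follows. The paper's actual statistical model is not described in this abstract. This sketch instead assumes a simple per-segment baseline rate pooled over all locations and a Poisson-style z-score. The segment labels, record layout, and threshold are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (location, customer segment, subscribers, calls); a segment stands in for
# customer-side factors such as a (tenure bucket, handset type) pair.
Record = Tuple[str, str, int, int]


def baseline_rates(records: List[Record]) -> Dict[str, float]:
    # Calls per subscriber for each segment, pooled across all locations.
    subs: Dict[str, int] = defaultdict(int)
    calls: Dict[str, int] = defaultdict(int)
    for _, seg, n, c in records:
        subs[seg] += n
        calls[seg] += c
    return {seg: calls[seg] / subs[seg] for seg in subs if subs[seg] > 0}


def flag_locations(records: List[Record], z_threshold: float = 3.0) -> Dict[str, float]:
    """Return {location: z} for locations whose observed calls exceed the
    customer-side expectation by more than z_threshold Poisson std devs."""
    rates = baseline_rates(records)
    observed: Dict[str, int] = defaultdict(int)
    expected: Dict[str, float] = defaultdict(float)
    for loc, seg, n, c in records:
        observed[loc] += c
        expected[loc] += n * rates.get(seg, 0.0)  # what customer mix predicts
    flagged = {}
    for loc, obs in observed.items():
        mu = expected[loc]
        if mu <= 0:
            continue
        z = (obs - mu) / math.sqrt(mu)  # Poisson variance equals its mean
        if z > z_threshold:
            flagged[loc] = round(z, 2)
    return flagged
```

Because the expectation is built from each location's own customer mix, a location with many new customers is not flagged merely for calling more. A more careful version would also exclude the location under test when estimating the baseline.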

15 citations


Patent
25 Oct 2011
TL;DR: This patent presents a temporal pooling scheme that combines frame-level and segment-level processing for phonetic classification.
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations. Based on the scores, the plurality of segmental classification units selects a class label and returns a result.
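The PIU-SCU pipeline in this abstract can be sketched as follows. Pooling interface units turn a variable-length run of frame features into a fixed-length vector, each dedicated segmental classification unit scores that vector, and the ensemble combines the scores to pick a label. Several choices here are illustrative assumptions rather than details from the patent: mean and max pooling as the two strategies, linear SCUs, summing scores across combinations, and all names and weights.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # time-dependent features for one frame
Vector = List[float]


def mean_pool(frames: Sequence[Frame]) -> Vector:
    # PIU strategy 1: average each feature dimension over the segment.
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def max_pool(frames: Sequence[Frame]) -> Vector:
    # PIU strategy 2: keep each dimension's peak value within the segment.
    return [max(f[d] for f in frames) for d in range(len(frames[0]))]


class LinearSCU:
    """Segmental classification unit: one linear score per class label."""

    def __init__(self, weights: Dict[str, Vector], bias: Optional[Dict[str, float]] = None):
        self.weights = weights
        self.bias = bias or {}

    def scores(self, v: Vector) -> Dict[str, float]:
        return {
            label: sum(w * x for w, x in zip(ws, v)) + self.bias.get(label, 0.0)
            for label, ws in self.weights.items()
        }


Combo = Tuple[Callable[[Sequence[Frame]], Vector], LinearSCU]


def classify_segment(frames: Sequence[Frame], combos: List[Combo]) -> str:
    # Each (pooling function, SCU) pair is one PIU-SCU combination; the
    # ensemble sums their per-class scores and returns the best label.
    total: Dict[str, float] = {}
    for pool, scu in combos:
        for label, s in scu.scores(pool(frames)).items():
            total[label] = total.get(label, 0.0) + s
    return max(total, key=total.get)
```

Varying only the pooling function between combinations is one simple way to diversify the ensemble, as the abstract suggests. Two SCUs that see the same frames through different pooling can disagree, and their combined scores can change the final label.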

9 citations


Proceedings Article
19 Aug 2011
TL;DR: Analysis of the Location-based Reporting Tool (LRT) shows that, owing to its lightweight design, LRT encourages customers to report more problems from anywhere and at any time. This makes LRT feedback a valuable information source for early detection of emerging network problems.
Abstract: In this paper, we study the Location-based Reporting Tool (LRT), a smartphone application for collecting large-scale feedback from mobile customers. Using one year of data collected from one of the largest cellular networks in the US, we compare LRT feedback to the traditional customer feedback channel, customer care tickets. Our analysis shows that, due to its lightweight design, LRT encourages customers to report more problems from anywhere and at any time. In addition, we find that LRT users access network services more intensively than other mobile users, and hence are more likely to experience, and more sensitive to, network problems. Together, these findings make LRT feedback a valuable information source for early detection of emerging network problems.

6 citations


Proceedings Article
01 Jan 2011
TL;DR: This model combines a frame-level transformation of the acoustic signal with segment-level phone classification. Its key contribution is the study of new temporal pooling strategies that interface these two levels, determining how frame scores are converted into segment scores.
Abstract: We propose a simple, yet novel, multi-layer model for the problem of phonetic classification. Our model combines a frame level transformation of the acoustic signal with a segment level phone classification. Our key contribution is the study of new temporal pooling strategies that interface these two levels, determining how frame scores are converted into segment scores. On the TIMIT benchmark, we match the best performance obtained using a single classifier. Diversity in pooling strategies is further used to generate candidate classifiers with complementary performance characteristics, which perform even better as an ensemble. Without the use of any phonetic knowledge, our ensemble model achieves a 16.96% phone classification error. While our data-driven approach is exhaustive, the combinatorial inflation is limited to the smaller segmental half of the system.