
Showing papers by "Patrick Haffner" published in 2007


Proceedings Article
06 Aug 2007
TL;DR: It is demonstrated that the history and the structure of the IP addresses can reduce the adverse impact of mail server overload, by increasing the number of legitimate e-mails accepted by a factor of 3.
Abstract: E-mail has become indispensable in today's networked society. However, the huge and ever-growing volume of spam has become a serious threat to this important communication medium. It not only affects e-mail recipients but also causes significant overload on the mail servers that handle e-mail transmission. We perform an extensive analysis of IP addresses and IP aggregates given by network-aware clusters in order to investigate properties that can distinguish the bulk of legitimate mail from spam. Our analysis indicates that the bulk of legitimate mail comes from long-lived IP addresses. We also find that the bulk of spam comes from network clusters that are relatively long-lived. Our analysis suggests that network-aware clusters may provide a good aggregation scheme for exploiting the history and structure of IP addresses. We then consider the implications of this analysis for prioritizing legitimate mail. We focus on the situation in which the mail server is overloaded and the goal is to maximize the amount of legitimate mail it accepts. We demonstrate that the history and the structure of IP addresses can reduce the adverse impact of mail server overload, increasing the number of legitimate e-mails accepted by a factor of 3.

64 citations
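Below is a minimal sketch of history-based prioritization under overload. It assumes that per-sender counts of past legitimate (ham) and spam messages are available.

- Network-aware clusters are usually derived from BGP routing prefixes. Here a static list of CIDR prefixes stands in for them.
- The scoring rule is hypothetical: use the IP's own history, fall back to its cluster's history, then to a neutral prior.
- The names cluster_of, legit_score and accept_under_overload are also hypothetical. None of this is the paper's algorithm.

```python
import ipaddress

def cluster_of(ip, prefixes):
    """Longest-prefix match of ip against CIDR prefixes (stand-in for network-aware clusters)."""
    addr = ipaddress.ip_address(ip)
    matches = [ipaddress.ip_network(p) for p in prefixes if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))

def legit_score(ip, history, prefixes, prior=0.5):
    """Fraction of past mail that was legitimate, from the IP itself, else its cluster, else a prior.

    history maps an IP string or a CIDR string to a (ham_count, spam_count) pair (assumed input).
    """
    for key in (ip, cluster_of(ip, prefixes)):
        if key is not None and key in history:
            ham, spam = history[key]
            if ham + spam > 0:
                return ham / (ham + spam)
    return prior

def accept_under_overload(pending_ips, capacity, history, prefixes):
    """When only `capacity` connections can be served, accept the most likely legitimate senders."""
    ranked = sorted(pending_ips, key=lambda ip: legit_score(ip, history, prefixes), reverse=True)
    return ranked[:capacity]
```

Python's sort is stable, so senders with equal scores are still accepted in arrival order.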


Proceedings Article
01 Jun 2007
TL;DR: This paper presents a novel approach to lexical selection where the target words are associated with the entire source sentence (global) without the need to compute local associations.
Abstract: Machine translation of a source language sentence involves selecting appropriate target language words and ordering the selected words to form a well-formed target language sentence. Most of the previous work on statistical machine translation relies on (local) associations of target words/phrases with source words/phrases for lexical selection. In contrast, in this paper, we present a novel approach to lexical selection where the target words are associated with the entire source sentence (global) without the need to compute local associations. Further, we present a technique for reconstructing the target language sentence from the selected words. We compare the results of this approach against those obtained from a finite-state based statistical machine translation system which relies on local lexical associations.

61 citations
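Global lexical selection associates each target word with the whole source sentence. This can be sketched as one binary classifier per target word, whose features are the source sentence's bag of words.

This is a minimal sketch under stated assumptions:

- A plain perceptron stands in for whatever classifier the paper trains.
- The toy sentence pairs are invented.
- Sentence reconstruction (ordering the selected words) is not shown.

```python
from collections import defaultdict

def bag_of_words(sentence):
    # Global features: the whole source sentence, with no word alignment.
    return set(sentence.lower().split())

def train_global_selectors(pairs, epochs=10):
    """Train one perceptron per target word to predict whether it appears in the translation."""
    target_vocab = sorted({w for _, tgt in pairs for w in bag_of_words(tgt)})
    weights = {t: defaultdict(float) for t in target_vocab}
    bias = {t: 0.0 for t in target_vocab}
    for _ in range(epochs):
        for src, tgt in pairs:
            feats = bag_of_words(src)
            present = bag_of_words(tgt)
            for t in target_vocab:
                score = bias[t] + sum(weights[t][f] for f in feats)
                label = 1 if t in present else -1
                if label * score <= 0:  # mistake or zero margin: perceptron update
                    for f in feats:
                        weights[t][f] += label
                    bias[t] += label
    return weights, bias

def select_target_words(src, weights, bias):
    """Return the (unordered) set of target words whose classifier fires for this sentence."""
    feats = bag_of_words(src)
    return {t for t in weights if bias[t] + sum(weights[t][f] for f in feats) > 0}
```

With the toy pairs ("the house", "la maison"), ("the car", "la voiture"), ("a house", "une maison") and ("a car", "une voiture"), the selectors pick {"la", "maison"} for "the house". A separate step would then order these words into a sentence.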


Zhu Liu, Eric Zavesky, David Crawford Gibbon, Behzad Shahraray, Patrick Haffner
01 Jan 2007
TL;DR: A multimodal rushes summarization method that relies on both face and speech information is proposed to show the main objects and events in the raw material with the least redundancy while maximizing usability.
Abstract: AT&T … The best SBD run used more training data (including the 2004, 2005, and 2006 SBD data), no SVM boundary adjustment, and SVM training with high generalization capability (e.g., a smaller value of C). As a pilot task, rushes summarization aims to show the main objects and events in the raw material with the least redundancy while maximizing usability. We proposed a multimodal rushes summarization method that relies on both face and speech information. Evaluation results show that the new SBD system is highly effective and that the human-centric rushes summarization approach is concise and easy to understand.

43 citations
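One way to picture a human-centric summary is a greedy selection under a time budget. It favors segments with faces and speech and skips segments that repeat earlier content.

The sketch is hypothetical throughout:

- The Segment fields stand in for face-detector and speech-detector outputs.
- The scoring weights, the cosine-similarity redundancy test and the 0.9 threshold are assumptions.
- It is not the method evaluated at TRECVID.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    start: float         # seconds into the raw footage
    duration: float      # seconds
    has_face: bool       # hypothetical face-detector output
    speech_ratio: float  # hypothetical fraction of the segment with speech, 0..1
    signature: tuple     # hypothetical visual feature vector (e.g. a color histogram)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def summarize(segments, budget, face_weight=1.0, speech_weight=1.0, max_similarity=0.9):
    """Greedy selection: favor faces and speech, skip near-duplicates, stay within the time budget."""
    def importance(s):
        return face_weight * s.has_face + speech_weight * s.speech_ratio

    chosen, used = [], 0.0
    for seg in sorted(segments, key=importance, reverse=True):
        if used + seg.duration > budget:
            continue  # would exceed the summary length
        if any(cosine(seg.signature, c.signature) > max_similarity for c in chosen):
            continue  # redundant with a segment already in the summary
        chosen.append(seg)
        used += seg.duration
    return sorted(chosen, key=lambda s: s.start)  # play back in original order
```

Returning the chosen segments in their original order keeps the summary's timeline consistent with the raw material.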


Proceedings Article
02 Jul 2007
TL;DR: The proposed shot boundary determination (SBD) algorithm contains a set of finite state machine (FSM) based detectors for pure cut, fast dissolve, fade in, fade out, dissolve, and wipe.
Abstract: The proposed shot boundary determination (SBD) algorithm contains a set of finite state machine (FSM) based detectors for pure cut, fast dissolve, fade in, fade out, dissolve, and wipe. Support vector machines (SVMs) are applied to the cut and dissolve detectors to further boost performance. Our SBD system was highly effective when evaluated in TRECVID 2006 (TREC Video Retrieval Evaluation), and its performance was ranked highest overall.

40 citations
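The FSM-based detectors are not specified in this summary. A single pure-cut detector can be sketched as a three-state machine over frame-to-frame differences.

The following are assumptions for illustration:

- the thresholds and the state names;
- the rule that a sustained high difference (camera motion or a gradual transition) is not a cut.

The SVM stage the abstract mentions is not shown.

```python
from enum import Enum, auto

class State(Enum):
    STABLE = auto()  # within a shot
    SPIKE = auto()   # one large difference seen; waiting to see if it was isolated
    BUSY = auto()    # sustained change (motion or gradual transition), not a pure cut

def detect_cuts(diffs, high=0.5, low=0.2):
    """Return indices i where diffs[i] (difference between frames i-1 and i) marks a pure cut."""
    state, candidate, cuts = State.STABLE, None, []
    for i, d in enumerate(diffs):
        if state is State.STABLE:
            if d > high:
                state, candidate = State.SPIKE, i
        elif state is State.SPIKE:
            if d < low:          # isolated spike followed by a quiet frame: confirm the cut
                cuts.append(candidate)
                state = State.STABLE
            else:                # change continues: not a pure cut
                state = State.BUSY
        elif state is State.BUSY:
            if d < low:
                state = State.STABLE
    return cuts
```

In this sketch, a candidate that is still pending when the sequence ends is dropped. A real detector would also need to handle back-to-back cuts.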


Patent
13 Apr 2007
TL;DR: A translation module, including at least one language pair module for translating a source language into a target language, may be made available to a subscriber in exchange for displaying commercial messages to the subscriber.
Abstract: A method, a system and a machine-readable medium are provided for an on demand translation service. A translation module including at least one language pair module for translating a source language to a target language may be made available for use by a subscriber. The subscriber may be charged a fee for use of the requested on demand translation service or may be provided use of the on demand translation service for free in exchange for displaying commercial messages to the subscriber. A video signal may be received including information in the source language, which may be obtained as text from the video signal and may be translated from the source language to the target language by use of the translation module. Translated information, based on the translated text, may be added into the received video signal. The video signal including the translated information in the target language may be sent to a display device.

30 citations
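The claimed service flow has three parts:

- the subscriber chooses between paying a fee and ad-supported use;
- text obtained from the video signal is translated;
- the translated text is inserted back into the signal.

It can be sketched as follows. Everything in the sketch is hypothetical: every class, every field, the fee amount, and the word-for-word lexicon standing in for the translation engine. Text extraction from the video (e.g. from captions) is assumed to have already happened.

```python
from dataclasses import dataclass, field

@dataclass
class LanguagePairModule:
    source: str
    target: str
    lexicon: dict  # toy word-for-word dictionary standing in for a real MT engine

    def translate(self, text):
        # Unknown words pass through unchanged.
        return " ".join(self.lexicon.get(w, w) for w in text.split())

@dataclass
class VideoFrame:
    source_text: str                           # text already obtained from the video signal
    overlays: list = field(default_factory=list)

@dataclass
class Subscriber:
    name: str
    ad_supported: bool                         # True: free use in exchange for commercial messages
    charges: float = 0.0

def translate_on_demand(frame, subscriber, module, fee=0.10, ad="[sponsored message]"):
    """Translate the frame's text, add it to the signal, and either charge a fee or show an ad."""
    frame.overlays.append(module.translate(frame.source_text))
    if subscriber.ad_supported:
        frame.overlays.append(ad)              # display a commercial message instead of charging
    else:
        subscriber.charges += fee              # charge a per-use fee
    return frame                               # would then be sent to the display device
```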


Patent
11 Dec 2007
TL;DR: Discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word.
Abstract: Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption: the probability of a correct translation of a source sentence word into a particular target sentence word is assumed to be independent of the translation of the other words in the sentence. Although this assumption does not strictly hold, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. Each weight in the vector is associated with one of the features and measures the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct one.

27 citations
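Under the independence assumption, each source word is translated by its own classification decision. Each target word has a weight vector over features of the source word and its context.

This is a minimal sketch, assuming:

- a one-to-one word alignment;
- features for the word and its immediate left and right neighbors;
- a multiclass perceptron as the discriminative trainer (the abstract does not name a training algorithm).

The toy glosses are invented and are not fluent translations.

```python
from collections import defaultdict

def features(tokens, i):
    """Features of source word i: its identity plus its left and right context."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return ["w=" + tokens[i], "prev=" + prev_tok, "next=" + next_tok]

def score(weights, target, feats):
    # Dot product of the target word's weight vector with the (binary) feature vector.
    return sum(weights[target][f] for f in feats)

def predict(weights, vocab, tokens, i):
    feats = features(tokens, i)
    return max(vocab, key=lambda t: score(weights, t, feats))

def train(aligned_pairs, epochs=10):
    """Multiclass perceptron: one weight vector per target vocabulary word."""
    vocab = sorted({t for _, tgt in aligned_pairs for t in tgt})
    weights = defaultdict(lambda: defaultdict(float))
    for _ in range(epochs):
        for src, tgt in aligned_pairs:
            for i, gold in enumerate(tgt):
                guess = predict(weights, vocab, src, i)
                if guess != gold:  # promote the correct word, demote the wrong guess
                    for f in features(src, i):
                        weights[gold][f] += 1.0
                        weights[guess][f] -= 1.0
    return weights, vocab

def translate(weights, vocab, tokens):
    # Independence assumption: each word is translated without looking at the other translations.
    return [predict(weights, vocab, tokens, i) for i in range(len(tokens))]
```

The toy pairs are ["river", "bank"] → ["rivière", "rive"] and ["bank", "account"] → ["banque", "compte"]. With them, the context features let the same source word "bank" receive different translations.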


Patent
11 Dec 2007
TL;DR: Discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word.
Abstract: Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption: the probability of a correct translation of a source sentence word into a particular target sentence word is assumed to be independent of the translation of the other words in the sentence. Although this assumption does not strictly hold, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. Each weight in the vector is associated with one of the features and measures the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct one.

21 citations


Patent
24 Oct 2007
TL;DR: Email server management methods and systems are disclosed that protect the ability of the email server infrastructure to process legitimate emails in the presence of large spam volumes. However, the summary does not detail how the priority classes of emails are identified.
Abstract: Disclosed are email server management methods and systems that protect the ability of the infrastructure of the email server to process legitimate emails in the presence of large spam volumes. During a period of server overload, priority classes of emails are identified, and emails are processed according to priority. In a typical embodiment, the server sends emails sequentially in a queue, and the queue has a limited capacity. When the server nears or reaches that capacity, the emails in the queue are analyzed to identify priority emails, and the priority emails are moved to the head of the queue.

21 citations
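The queue handling in the typical embodiment can be sketched as a stable partition of the queue once it nears capacity. Two parameters are hypothetical:

- the is_priority predicate (in practice it might use sender history such as IP reputation);
- the 90% "near capacity" threshold.

```python
from collections import deque

def reprioritize(queue, capacity, is_priority, threshold=0.9):
    """When the queue is at or near capacity, move priority emails to the head.

    Relative order is preserved within the priority and non-priority groups.
    """
    if len(queue) < threshold * capacity:
        return queue  # not overloaded: leave the order untouched
    priority = [m for m in queue if is_priority(m)]
    others = [m for m in queue if not is_priority(m)]
    return deque(priority + others)
```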


01 Jan 2007
TL;DR: A multimodal rushes summarization method that relies on both face and speech information is proposed and yields concise, easy-to-understand summaries; the SBD system, enhanced for robustness and efficiency, is shown to be highly effective.
Abstract: AT&T participated in two tasks at TRECVID 2007: shot boundary detection (SBD) and rushes summarization. The SBD system developed for TRECVID 2006 was enhanced for robustness and efficiency. New visual features are extracted for the cut, dissolve, and fast dissolve detectors, and an SVM-based verification method is used to boost accuracy. Speed is improved by more streamlined processing with on-the-fly result fusion. We submitted 10 runs for the SBD evaluation task. The best result (TT05) was achieved with the following configuration: the SVM-based verification method; more training data, including the 2004, 2005, and 2006 SBD data; no SVM boundary adjustment; and SVM training with high generalization capability (e.g., a smaller value of C). As a pilot task, rushes summarization aims to show the main objects and events in the raw material with the least redundancy while maximizing usability. We proposed a multimodal rushes summarization method that relies on both face and speech information. Evaluation results show that the new SBD system is highly effective and that the human-centric rushes summarization approach is concise and easy to understand.

8 citations
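The abstract credits part of the best run to training the SVM with a smaller value of C. The role of C can be sketched with a soft-margin linear SVM trained by full-batch subgradient descent on the primal objective. In this sketch, a candidate boundary from a detector is kept only when the decision value is positive.

The following are assumptions, not AT&T's implementation:

- the solver itself;
- the learning rate and epoch count;
- the toy standardized two-feature candidate data.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Soft-margin linear SVM in primal form, trained by full-batch subgradient descent.

    Minimizes 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)), with labels y_i in {-1, +1}.
    """
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = w[:]  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(dim):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def verify(w, b, x):
    """Keep a candidate boundary only if the SVM decision value is positive."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0
```

A smaller C down-weights the hinge loss relative to the regularizer. This yields a smaller-norm, wider-margin classifier, which is the usual sense in which a smaller C favors generalization.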