
Showing papers by John Platt published in 2011


Proceedings Article
23 Jun 2011
TL;DR: A novel discriminative training method projects raw term vectors into a common, low-dimensional vector space; it not only outperforms existing state-of-the-art approaches but also achieves high accuracy at low dimensions and is thus more efficient.
Abstract: Traditional text similarity measures consider each term similar only to itself and do not model semantic relatedness of terms. We propose a novel discriminative training method that projects the raw term vectors into a common, low-dimensional vector space. Our approach operates by finding the optimal matrix to minimize the loss of the pre-selected similarity function (e.g., cosine) of the projected vectors, and is able to efficiently handle a large number of training examples in the high-dimensional space. Evaluated on two very different tasks, cross-lingual document retrieval and ad relevance measure, our method not only outperforms existing state-of-the-art approaches, but also achieves high accuracy at low dimensions and is thus more efficient.

298 citations
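To make the idea in this abstract concrete, the sketch below learns a projection matrix `A` so that the cosine similarity of projected term vectors is higher for a matching document pair than for a non-matching one. It is a minimal illustration under stated assumptions, not the paper's exact algorithm: the pairwise logistic loss on the cosine margin, the `gamma` scale, full-batch gradient descent, and the toy "English/French" vocabulary are all choices made here for illustration. Because the two toy vocabularies are disjoint, raw cosine between a document and its translation is exactly 0, which is the failure mode of term-only similarity the abstract describes.

```python
import numpy as np

def cosine_and_grads(u, v):
    """Return cos(u, v) and its gradients with respect to u and v."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    cos = float(u @ v / (nu * nv))
    du = v / (nu * nv) - cos * u / nu**2
    dv = u / (nu * nv) - cos * v / nv**2
    return cos, du, dv

def triplet_loss_and_grad(A, x, y_pos, y_neg, gamma=5.0):
    """Assumed loss: log(1 + exp(-gamma * (cos(A^T x, A^T y_pos) - cos(A^T x, A^T y_neg))))."""
    u, vp, vn = A.T @ x, A.T @ y_pos, A.T @ y_neg
    cp, du_p, dvp = cosine_and_grads(u, vp)
    cn, du_n, dvn = cosine_and_grads(u, vn)
    delta = cp - cn
    loss = float(np.logaddexp(0.0, -gamma * delta))
    dloss_ddelta = -gamma / (1.0 + np.exp(gamma * delta))
    # Since u = A^T x, d u_j / d A_ij = x_i, so each chain-rule term is an outer product.
    grad = dloss_ddelta * (np.outer(x, du_p - du_n)
                           + np.outer(y_pos, dvp) - np.outer(y_neg, dvn))
    return loss, grad

def mean_loss(A, triplets):
    return float(np.mean([triplet_loss_and_grad(A, *t)[0] for t in triplets]))

def train_projection(triplets, dim_in, dim_out, lr=0.02, epochs=300, seed=0):
    """Full-batch gradient descent on the projection matrix A (dim_in x dim_out)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.5, size=(dim_in, dim_out))
    for _ in range(epochs):
        grad = sum(triplet_loss_and_grad(A, *t)[1] for t in triplets) / len(triplets)
        A -= lr * grad
    return A

# Hypothetical toy data: terms 0-2 are "English", terms 3-5 are their "French"
# translations, so the raw cosine between any English/French pair is exactly 0.
rng = np.random.default_rng(1)
concepts = rng.random((8, 3))
eng = np.hstack([concepts, np.zeros((8, 3))])
fr = np.hstack([np.zeros((8, 3)), concepts])
# Each English document should score higher with its own translation than with another one.
triplets = [(eng[i], fr[i], fr[(i + 1) % 8]) for i in range(8)]

loss_before = mean_loss(train_projection(triplets, 6, 3, epochs=0), triplets)
A = train_projection(triplets, dim_in=6, dim_out=3)
loss_after = mean_loss(A, triplets)
print(f"mean loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Cosine is invariant to rescaling `A`, so only the direction of the learned projection matters; the low `dim_out` mirrors the abstract's point that accuracy can stay high at low dimensions.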


Patent
17 Jun 2011
TL;DR: A reliable automated malware classification approach with substantially low false positive rates is provided, in which graph-based local and/or global file relationships are used, together with a feature selection algorithm, to improve malware classification.
Abstract: A reliable automated malware classification approach with substantially low false positive rates is provided. Graph-based local and/or global file relationships are used to improve malware classification along with a feature selection algorithm. File relationships such as containing, creating, copying, downloading, modifying, etc. are used to assign malware probabilities and simultaneously reduce the false positive and false negative rates on executable files.

83 citations
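One way to picture how file relationships can "assign malware probabilities" is score propagation over a relationship graph. The sketch below is a generic label-propagation illustration, not the patented method: it assumes each file already has a prior malware probability from some per-file classifier, and the relation weights, the blend factor `alpha`, and the file names are hypothetical. The feature selection step mentioned in the abstract is not modeled.

```python
from collections import defaultdict

# Hypothetical weights for how strongly each relationship type links two files.
RELATION_WEIGHT = {"created": 1.0, "downloaded": 1.0, "contains": 1.0,
                   "copied": 0.8, "modified": 0.5}

def propagate_malware_scores(prior, edges, alpha=0.5, iterations=20):
    """prior: dict file -> prior malware probability (every file in edges must appear here).
    edges: list of (file_a, file_b, relation) tuples, treated as undirected.
    Each round, a file's score blends its prior with the weighted mean of its neighbors' scores."""
    neighbors = defaultdict(list)
    for a, b, rel in edges:
        w = RELATION_WEIGHT.get(rel, 0.5)
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))
    score = dict(prior)
    for _ in range(iterations):
        new = {}
        for f, p in prior.items():
            nbrs = neighbors.get(f)
            if not nbrs:
                new[f] = p  # isolated files keep their classifier score
                continue
            total = sum(w for _, w in nbrs)
            nbr_mean = sum(w * score[g] for g, w in nbrs) / total
            new[f] = (1 - alpha) * p + alpha * nbr_mean
        score = new
    return score

prior = {"dropper.exe": 0.95, "payload.dll": 0.40, "notepad.exe": 0.02, "readme.txt": 0.05}
edges = [("dropper.exe", "payload.dll", "created"), ("notepad.exe", "readme.txt", "created")]
scores = propagate_malware_scores(prior, edges)
flagged = sorted(f for f, s in scores.items() if s > 0.5)
```

Here the ambiguous `payload.dll` is pulled above the 0.5 threshold by its link to a high-scoring dropper, while the benign pair stays low; a real system would tune the threshold against the false positive rate it can tolerate.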


Patent
01 Jul 2011
TL;DR: User actions on an electronic mail message received from a sender by one or more recipients may be monitored, and statistics generated from those actions may be used to provide a quality ranking of the message.
Abstract: Electronic mail messages may be collaboratively ranked and filtered. User actions on an electronic mail message received from a sender by one or more recipients may be monitored. Statistics may be generated based on the user actions. The generated statistics may be used to provide a quality ranking of the electronic mail message.

12 citations
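A minimal sketch of collaborative ranking from recipient actions, assuming a simple log of `(message_id, recipient, action)` tuples. The action names, their split into positive and negative signals, the Laplace-smoothed score, and the filtering threshold are all hypothetical choices for illustration, not details taken from the patent.

```python
from collections import Counter, defaultdict

# Hypothetical mapping of recipient actions to positive / negative quality signals.
POSITIVE = {"read", "replied", "forwarded", "saved"}
NEGATIVE = {"deleted_unread", "marked_junk"}

def rank_messages(action_log):
    """Aggregate actions across all recipients and return (message_id, score) pairs,
    best first. Score is the smoothed fraction of positive actions: (pos + 1) / (pos + neg + 2).
    Messages with only unmapped actions do not appear in the result."""
    stats = defaultdict(Counter)
    for msg, _recipient, action in action_log:
        if action in POSITIVE:
            stats[msg]["pos"] += 1
        elif action in NEGATIVE:
            stats[msg]["neg"] += 1
    scores = {msg: (c["pos"] + 1) / (c["pos"] + c["neg"] + 2) for msg, c in stats.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def filter_messages(ranking, threshold=0.5):
    """Keep only messages whose collaborative quality score meets the threshold."""
    return [msg for msg, score in ranking if score >= threshold]

log = [("m1", "alice", "read"), ("m1", "bob", "replied"), ("m1", "carol", "read"),
       ("m2", "alice", "marked_junk"), ("m2", "bob", "deleted_unread"), ("m2", "carol", "read")]
ranking = rank_messages(log)
kept = filter_messages(ranking)
```

With this log, `m1` scores 4/5 = 0.8 and `m2` scores 2/5 = 0.4, so only `m1` survives the filter. The smoothing keeps a message seen by a single recipient from jumping straight to 0 or 1.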


Patent
17 Jun 2011
TL;DR: A reliable automated malware classification approach with substantially low false positive rates is provided, in which graph-based local and/or global file relationships are used, together with a feature selection algorithm, to improve malware classification.
Abstract: A reliable automated malware classification approach with substantially low false positive rates is provided. Graph-based local and/or global file relationships are used to improve malware classification along with a feature selection algorithm. File relationships such as containing, creating, copying, downloading, modifying, etc. are used to assign malware probabilities and simultaneously reduce the false positive and false negative rates on executable files.

7 citations


Patent
22 Aug 2011
TL;DR: A method and system are presented for identifying the configuration parameter of a "sick" computer system that is at fault for causing an undesired behavior.
Abstract: PROBLEM TO BE SOLVED: To provide a method and system for identifying a configuration parameter of a "sick" computer system that is at fault for causing an undesired behavior. SOLUTION: The troubleshooting system collects values of configuration parameters suspected to be used by a "sick" application when an undesired behavior is exhibited by a sick computer system. The troubleshooting system has in advance collected and stored sample values of suspect configuration parameters from multiple sample computer systems. The troubleshooting system compares the suspect values with the collected sample values to identify one or more configuration parameters that are likely at fault for causing the application to exhibit the undesired behavior.
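The comparison step ("compares the suspect values with the collected sample values") can be sketched as a rarity ranking: a suspect parameter whose value on the sick machine is uncommon among the sample machines is more likely at fault. This is a simplified illustration, not the patented scoring rule; the add-one smoothing, the assumption that sample machines are mostly healthy, and the parameter names are hypothetical.

```python
from collections import Counter

def rank_suspects(sick_values, samples):
    """sick_values: dict param -> value observed on the sick machine.
    samples: list of dicts param -> value collected in advance from sample machines.
    Returns (param, score) pairs, most suspicious first, where score is one minus the
    smoothed fraction of samples sharing the sick machine's value."""
    ranked = []
    for param, value in sick_values.items():
        observed = [s[param] for s in samples if param in s]
        counts = Counter(observed)
        distinct = len(counts) + 1  # +1 reserves probability mass for values never seen
        p_match = (counts[value] + 1) / (len(observed) + distinct)
        ranked.append((param, 1.0 - p_match))
    ranked.sort(key=lambda kv: kv[1], reverse=True)
    return ranked

samples = [
    {"ProxyEnable": "0", "HomePage": "about:blank"},
    {"ProxyEnable": "0", "HomePage": "https://example.com"},
    {"ProxyEnable": "0", "HomePage": "about:blank"},
]
sick = {"ProxyEnable": "1", "HomePage": "about:blank"}
ranked = rank_suspects(sick, samples)
```

`ProxyEnable=1` never appears among the samples, so it scores 1 - 1/5 = 0.8 and ranks first, ahead of `HomePage` at 0.5. A parameter that legitimately varies across machines (like a home page) naturally scores lower because many values are common.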

Patent
09 Nov 2011
TL;DR: A collaborative ranking and filtering method for e-mail messages is proposed, in which the actions of one or more recipients on an e-mail message received from a sender are monitored.
Abstract: The invention relates to a collaborative ranking and filtering method for e-mail messages. The method collaboratively ranks and filters an e-mail message by monitoring user actions on the message received from a sender by one or more recipients. Statistics based on the user actions are generated, and the generated statistics can be used to provide a quality evaluation of the e-mail message.