Top 5 papers published by John Platt from Microsoft in 2009

Patent

Malware detection using multiple classifiers

Jack W. Stokes¹, John Platt¹, Jonathan M. Keller¹, Joseph L. Faulhaber¹, Anil Francis Thomas¹, Adrian M. Marinescu¹, Marius Gheorghe Gheorghescu¹, George C. Chicioreanu¹

Microsoft¹

23 Jan 2009

TL;DR: A method of identifying a malware file using multiple classifiers is disclosed. A file is received at a client computer, and a set of metadata classifier weights is applied to the file's static metadata to generate a first classifier output.

Abstract: A method of identifying a malware file using multiple classifiers is disclosed. The method includes receiving a file at a client computer. The file includes static metadata. A set of metadata classifier weights are applied to the static metadata to generate a first classifier output. A dynamic classifier is initiated to evaluate the file and to generate a second classifier output. The method includes automatically identifying the file as potential malware based on at least the first classifier output and the second classifier output.
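The two-classifier decision described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the logistic form of the metadata classifier, the averaging rule, the threshold, and all function and feature names are assumptions introduced here.

```python
import math


def metadata_score(features, weights, bias=0.0):
    """First classifier output: apply per-feature weights to static metadata.

    Assumption: a logistic (sigmoid) model; the abstract only says that
    "metadata classifier weights are applied to the static metadata".
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_potential_malware(first_output, second_output, threshold=0.5):
    """Flag the file based on both classifier outputs.

    Assumption: the outputs are scores in [0, 1] and are combined by a
    simple average; the abstract does not specify the combination rule.
    """
    return (first_output + second_output) / 2.0 >= threshold


# Hypothetical static metadata features and weights.
weights = {"unsigned": 2.0, "packed": 1.5}
static_metadata = {"unsigned": 1.0, "packed": 1.0}

first = metadata_score(static_metadata, weights, bias=-1.0)
second = 0.8  # stand-in for the dynamic classifier's output
flagged = is_potential_malware(first, second)
```

Here `second` stands in for whatever the dynamic classifier would report after evaluating the file; a real system would compute it rather than hard-code it.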

75 citations

Patent

Trans-lingual representation of text documents

John Platt¹, Ilya Sutskever¹

Microsoft¹

19 Jun 2009

TL;DR: In this article, a method of creating translingual text representations takes in documents in a first language and in a second language and creates a matrix using the words in the documents to represent which words are present in which language.

Abstract: A method of creating translingual text representations takes in documents in a first language and in a second language and creates a matrix using the words in the documents to represent which words are present in which language. An algorithm is applied to each matrix such that like documents are placed close to each other and unlike documents are moved far from each other.

13 citations

Classification of Automated Web Traffic

Greg Buehrer, Jack W. Stokes, Kumar Chellapilla, John Platt

01 Jan 2009

TL;DR: This paper investigates automated traffic in the query stream of a large search engine provider, and develops many different features that distinguish between queries generated by people searching for information, and those generated by automated processes.

Abstract: As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either as an interpretation of the physical model of human interactions or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

11 citations

Book Chapter

Classification of Automated Search Traffic

Gregory Buehrer¹, Jack W. Stokes¹, Kumar Chellapilla¹, John Platt¹

Microsoft¹

01 Jan 2009

TL;DR: This paper investigates automated traffic in the query stream of a large search engine provider, and develops many different features that distinguish between queries generated by people searching for information, and those generated by automated processes.

Abstract: As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either as an interpretation of the physical model of human interactions or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. Performance analyses are then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

6 citations

Patent

Boosting to determine indicative features from a training set

John Platt¹, Harvey Rook¹, Shengquan Yan¹, Rajasi Saha¹

Microsoft¹

26 May 2009

TL;DR: Indicative features are determined in two steps: a first set of features is determined using a document frequency process, and a second set of features is determined using a boosting process.

Abstract: Determining indicative features may be provided. First, a first set of features may be determined using a document frequency process. Then a second set of features may be determined using a boosting process. Using the boosting process may comprise using an approximation for a one-dimensional optimization. The approximation may include an upper bound. Next, the first set of features and the second set of features may be combined into a combined set of features. The combined set of features may comprise a union of the first set of features and the second set of features. At least one document may then be classified based on the combined set of features.
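The two-stage selection and union described in this abstract might look roughly like the sketch below. It swaps in plain discrete AdaBoost over word-presence stumps for the boosting step; it does not implement the upper-bound approximation to the one-dimensional optimization that the patent mentions. All function names and the toy documents are hypothetical.

```python
import math


def top_by_document_frequency(docs, k):
    """Return the k features that occur in the most documents."""
    df = {}
    for doc in docs:
        for feature in doc:
            df[feature] = df.get(feature, 0) + 1
    # Sort by descending document frequency, ties broken alphabetically.
    ranked = sorted(df, key=lambda f: (-df[f], f))
    return set(ranked[:k])


def boosted_features(docs, labels, rounds):
    """Pick one feature per round with discrete AdaBoost (labels are +1/-1).

    Each weak learner is a stump that predicts `sign` when the feature is
    present in the document and `-sign` otherwise.
    """
    n = len(docs)
    w = [1.0 / n] * n
    vocab = sorted(set().union(*docs))
    chosen = set()
    for _ in range(rounds):
        best = None
        for f in vocab:
            for sign in (1, -1):
                err = sum(wi for wi, d, y in zip(w, docs, labels)
                          if (sign if f in d else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, sign)
        err, f, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.add(f)
        # Reweight: misclassified documents gain weight.
        w = [wi * math.exp(-alpha * y * (sign if f in d else -sign))
             for wi, d, y in zip(w, docs, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen


# Toy documents as sets of words, with hypothetical +1/-1 labels.
docs = [{"the", "cheap", "pills"}, {"the", "meeting", "notes"},
        {"the", "cheap", "offer"}, {"the", "project", "notes"}]
labels = [1, -1, 1, -1]

first_set = top_by_document_frequency(docs, k=1)
second_set = boosted_features(docs, labels, rounds=1)
combined = first_set | second_set  # union of the two feature sets
```

In this toy data the frequency step favours a word that appears everywhere, while the boosting step favours a word that separates the labels, which is one plausible reason to combine the two sets before classification.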

2 citations
