Showing papers by "John Platt" published in 2010

Open Access

Proceedings Article

Translingual Document Representations from Discriminative Projections

John Platt¹, Kristina Toutanova¹, Wen-tau Yih¹

Microsoft¹

09 Oct 2010

TL;DR: This work uses discriminative training to project documents from multiple languages into a single translingual vector space, and evaluates the resulting models on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters.

Abstract: Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both of these variants start with a basic model of documents (PCA and PLSA). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two discriminative variants, OPCA and CPLSA, significantly outperform their corresponding baselines. The largest differences in performance are observed on the task of retrieval when the documents are only comparable and not parallel. The OPCA method is shown to perform best.
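
One way to read "encouraging comparable document pairs to have similar vector representations" for the OPCA variant is as a signal-to-noise criterion, which is how oriented PCA is usually stated in general. The sketch below is illustrative only: the abstract does not give the formulation, and the symbols (x_i and y_i for the two documents of comparable pair i, the regularizer \gamma, and the specific choice of noise covariance) are assumptions rather than the authors' exact definitions.

```latex
% Assumed notation (not from the abstract): x_i, y_i are the vectors of the two
% documents in comparable pair i; \gamma > 0 is a small regularization constant.
C_S = \operatorname{Cov}\bigl(\{x_i\} \cup \{y_i\}\bigr), \qquad
C_N = \frac{1}{n}\sum_{i=1}^{n} (x_i - y_i)(x_i - y_i)^{\top}

% Each projection direction v maximizes signal variance relative to pair "noise":
v^{*} = \arg\max_{v} \frac{v^{\top} C_S\, v}{v^{\top} (C_N + \gamma I)\, v}
\quad\Longrightarrow\quad
C_S\, v = \lambda\, (C_N + \gamma I)\, v
```

Under these assumptions, the projection keeps the generalized eigenvectors with the largest \lambda, so directions along which paired documents differ are penalized while directions that separate documents overall are kept.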

126 citations

Patent

Performing query expansion based upon statistical analysis of structured data

Charles E. Jacobs¹, John Platt¹, Johnson T. Apacible¹

Microsoft¹

18 Jun 2010

TL;DR: This patent describes a method that receives a user query over a plurality of documents belonging to a particular domain and provides data to the user based at least in part upon a statistical analysis of structured data pertaining to that domain.

Abstract: A method described herein includes an act of receiving a query from a user, wherein the query is configured to search over a plurality of documents belonging to a particular domain. The method also includes an act of providing data to the user for display on a display screen of a computing apparatus, wherein the data is provided based at least in part upon a statistical analysis undertaken with respect to structured data pertaining to the particular domain, wherein the structured data is based at least in part upon data included in the plurality of documents.

7 citations

Patent

Pasting Various Data into a Programming Environment

Charles E. Jacobs¹, Sumit Basu¹, John Platt¹

Microsoft¹

12 May 2010

TL;DR: This patent describes a technology by which a user pastes selected data, including non-textual data, into a command line of a program; a variable name is automatically generated and inserted at the current point in the command line, where it acts as a proxy for the pasted data itself.

Abstract: Described is a technology by which a user pastes selected data into a command line of a program, including when the selected data is non-textual. Upon detecting the paste (or drop) action, a variable name is automatically generated and inserted at the current point in a command line, where it acts as a proxy for the pasted data itself. A data structure comprising the selected data or transformed data corresponding to that selected data is maintained in program storage, e.g., RAM allocated to the program. In one aspect, a handler may be used to transform the data from one format into another that may be used by a particular program. For example, text may be reformatted into an array on which the program operates. The handler may be selected from a plurality of possible handlers, including customized handlers.
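
The paste flow in this abstract can be pictured with a small Python sketch. It is a toy model under stated assumptions, not the patented implementation: the class `CommandLine`, the helpers `looks_like_numeric_table` and `text_to_array`, and the `pastedN` naming scheme are all hypothetical names introduced here for illustration.

```python
import itertools


def looks_like_numeric_table(data):
    """Hypothetical handler test: True for text made only of numeric tokens."""
    if not isinstance(data, str) or not data.strip():
        return False
    try:
        for token in data.split():
            float(token)
    except ValueError:
        return False
    return True


def text_to_array(data):
    """Hypothetical handler: reformat whitespace-separated text into rows of floats."""
    return [[float(t) for t in line.split()]
            for line in data.splitlines() if line.strip()]


class CommandLine:
    """Toy command line that keeps pasted data behind generated variable names."""

    def __init__(self, handlers=None):
        self.buffer = ""               # text currently on the command line
        self.cursor = 0                # insertion point within the buffer
        self.storage = {}              # stands in for program storage (e.g. RAM)
        self.handlers = handlers or [] # list of (can_handle, transform) pairs
        self._counter = itertools.count(1)

    def type_text(self, text):
        """Insert typed text at the current point and advance the cursor."""
        self.buffer = self.buffer[:self.cursor] + text + self.buffer[self.cursor:]
        self.cursor += len(text)

    def paste(self, data):
        """Store the (possibly transformed) data and insert a proxy variable name."""
        for can_handle, transform in self.handlers:
            if can_handle(data):       # first matching handler wins
                data = transform(data)
                break
        name = f"pasted{next(self._counter)}"  # assumed naming scheme
        self.storage[name] = data
        self.type_text(name)           # the name, not the data, lands on the line
        return name


if __name__ == "__main__":
    cl = CommandLine(handlers=[(looks_like_numeric_table, text_to_array)])
    cl.type_text("mean(")
    cl.paste("1 2 3\n4 5 6")
    cl.type_text(")")
    print(cl.buffer, cl.storage)
```

Keeping handlers as an ordered list of (test, transform) pairs is one simple way to allow the "plurality of possible handlers, including customized handlers" the abstract mentions; data with no matching handler, such as raw bytes, is stored unchanged.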

3 citations