Showing papers by "John Platt" published in 2013

Proceedings Article

Stat!: an interactive analytics environment for big data

Mike Barnett¹, Badrish Chandramouli¹, Robert DeLine¹, Steven M. Drucker¹, Danyel Fisher¹, Jonathan Goldstein¹, Patrick Morrison², John Platt¹

Microsoft¹, North Carolina State University²

22 Jun 2013

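TL;DR: This work demonstrates Stat!, a visualization and analytics environment that lets users rapidly experiment with exploratory queries over big data and get immediate feedback after only a fraction of the data has been processed.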

Abstract: Exploratory analysis on big data requires us to rethink data management across the entire stack -- from the underlying data processing techniques to the user experience. We demonstrate Stat! -- a visualization and analytics environment that allows users to rapidly experiment with exploratory queries over big data. Data scientists can use Stat! to quickly refine to the correct query, while getting immediate feedback after processing a fraction of the data. Stat! can work with multiple processing engines in the backend; in this demo, we use Stat! with the Microsoft StreamInsight streaming engine. StreamInsight is used to generate incremental early results to queries and refine these results as more data is processed. Stat! allows data scientists to explore data, dynamically compose multiple queries to generate streams of partial results, and display partial results in both textual and visual form.

44 citations

Patent

Analytical Data Processing Engine

Badrish Chandramouli¹, John Wernsing¹, Jonathan Goldstein¹, Mike Barnett¹, John Platt¹

Microsoft¹

17 Dec 2013

TL;DR: The stream of incoming data events is organized into a sequence of data batches, each containing multiple data events, and the batches are processed in non-decreasing "sync-time" order.

Abstract: Some examples include high-performance query processing of real-time and offline temporal-relational data. Further, some implementations include processing streaming data events by annotating individual events with a first timestamp (e.g., a “sync-time”) and a second timestamp that may identify additional event information. The stream of incoming data events may be organized into a sequence of data batches that each include multiple data events. The individual data batches in the sequence may be processed in a non-decreasing “sync-time” order.
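
The batching idea in this abstract can be illustrated with a small sketch. It is not the patented engine: the `Event` fields, the fixed batch size, and the up-front sort of a finite event list are all assumptions made for the example. It only shows events carrying two timestamps being grouped into batches that are then handled in non-decreasing sync-time order.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    sync_time: int   # first timestamp: governs processing order
    other_time: int  # second timestamp: additional event information (assumed meaning)
    payload: str


def batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group events into fixed-size batches ordered by sync_time.

    Assumption: the input is a finite collection, so it can be sorted up front.
    """
    ordered = sorted(events, key=lambda e: e.sync_time)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


def process(events: Iterable[Event], batch_size: int = 2) -> List[int]:
    """Handle each batch and return the first sync_time of every batch."""
    handled: List[int] = []
    last = None
    for batch in batches(events, batch_size):
        first = batch[0].sync_time
        # Invariant from the abstract: batches are processed in
        # non-decreasing sync-time order.
        assert last is None or first >= last
        last = batch[-1].sync_time
        handled.append(first)
    return handled
```

Calling `process` on five events with sync-times 5, 1, 3, 5, 8 and a batch size of 2 handles three batches, starting at sync-times 1, 5, and 8.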

37 citations

Patent

Progressive query computation using streaming architectures

Danyel Fisher¹, Steven M. Drucker¹, Jonathan Goldstein¹, Badrish Chandramouli¹, Robert DeLine¹, John Platt¹, Mike Barnett¹

Microsoft¹

10 Jul 2013

TL;DR: A relational query that references one or more data items is converted into a corresponding streaming query, and progress intervals associated with the data items define the event lifetimes of the streaming events fed to a stream engine that produces incremental results.

Abstract: The described implementations relate to processing of electronic data. One implementation is manifest as a technique that can include obtaining a relational query that references one or more data items and associating progress intervals with the data items. The technique can also include converting the relational query into a corresponding streaming query, and providing the streaming query and the data items with the progress intervals to a stream engine that produces incremental results of the query. For example, the progress intervals can be based on row numbers of a relational database table. The progress intervals can be used to define event lifetimes of streaming events that are provided as inputs to the stream engine.
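
A minimal sketch of progress intervals, under stated assumptions: the abstract says only that intervals can be based on row numbers and can define event lifetimes, so the specific interval form `[row_number, inf)`, the `StreamEvent` type, and the choice of an average as the query are hypothetical. The loop stands in for a stream engine and recomputes the aggregate at every progress point, which is quadratic and meant only to show how partial results refine as more rows become visible.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class StreamEvent:
    start: int    # progress point at which the row becomes visible
    end: float    # progress point at which it expires (inf = never)
    value: float


def to_events(rows: List[float]) -> List[StreamEvent]:
    """Attach the assumed progress interval [row_number, inf) to every row."""
    return [StreamEvent(start=i, end=math.inf, value=v) for i, v in enumerate(rows)]


def progressive_average(events: List[StreamEvent]) -> Iterator[Tuple[int, float]]:
    """Yield (progress_point, average of the events alive at that point)."""
    for point in sorted({e.start for e in events}):
        alive = [e.value for e in events if e.start <= point < e.end]
        yield point, sum(alive) / len(alive)
```

For the rows `[10.0, 20.0, 60.0]`, the sketch yields an early estimate after the first row and the exact average once the last row is visible.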

34 citations

Proceedings Article

Robust scareware image detection

Christian Seifert¹, Jack W. Stokes¹, Christina Colcernian¹, John Platt¹, Long Lu²

Microsoft¹, Georgia Institute of Technology²

26 May 2013

TL;DR: Proposes an image-based detection method for web-based scareware attacks that is robust to evasion techniques, together with a novel visualization technique that shows the acquired classifier knowledge on a classified screenshot.

Abstract: In this paper, we propose an image-based detection method to identify web-based scareware attacks that is robust to evasion techniques. We evaluate the method on a large-scale data set that resulted in an equal error rate of 0.018%. Conceptually, false positives may occur when a visual element, such as a red shield, is embedded in a benign page. We suggest including additional orthogonal features or employing graders to mitigate this risk. A novel visualization technique is presented demonstrating the acquired classifier knowledge on a classified screenshot.

8 citations

Patent

Learning with noisy labels from multiple judges

Dengyong Zhou¹, Sumit Basu¹, Yi Mao¹, John Platt¹

Microsoft¹

03 Feb 2013

TL;DR: A system and method infer true labels for multiple items from judgments that multiple judges select from a specified choice of labels for each item; improved labels are then selected from that choice such that the entropy over the probability distribution is reduced.

Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise and the labeling difficulties.
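
The iterative flavor of this abstract can be sketched with a much simpler stand-in technique. The code below is not the maximum-entropy method described above: it is an iterative weighted majority vote, in which each judge's weight is the fraction of that judge's labels agreeing with the current consensus. Item difficulty is not modeled at all, and the `Judgments` type, the function name, and the round limit are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

Judgments = Dict[Tuple[str, str], str]  # (judge, item) -> chosen label


def infer_labels(judgments: Judgments, rounds: int = 10) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Alternate between voting on labels and re-estimating judge weights."""
    judges = sorted({judge for judge, _ in judgments})
    items = sorted({item for _, item in judgments})
    weights = {judge: 1.0 for judge in judges}  # start by trusting everyone equally
    labels: Dict[str, str] = {}
    for _ in range(rounds):
        # Step 1: weighted vote per item using the current judge weights.
        new_labels = {}
        for item in items:
            votes = defaultdict(float)
            for (judge, it), label in judgments.items():
                if it == item:
                    votes[label] += weights[judge]
            # Ties go to the alphabetically first label.
            new_labels[item] = max(sorted(votes), key=lambda lab: votes[lab])
        # Step 2: judge expertise = agreement rate with the consensus labels.
        for judge in judges:
            own = [(it, lab) for (j, it), lab in judgments.items() if j == judge]
            agree = sum(1 for it, lab in own if new_labels[it] == lab)
            weights[judge] = agree / len(own)
        if new_labels == labels:
            break  # consensus stopped changing
        labels = new_labels
    return labels, weights
```

With two judges who agree on every item and a third who dissents on most, the consensus follows the agreeing pair and the dissenting judge ends up with a lower weight.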

3 citations

Patent

Developing implicit metadata for data stores

John Platt¹, Surajit Chaudhuri¹, Henricus Johannes Maria Meijer¹, Lev Novik¹

Microsoft¹

09 Apr 2013

TL;DR: A system gathers metadata about a data store from its creation through its subsequent use by monitoring usage of and communication about the data store, extracting keywords, and making those keywords available for matching search queries.

Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.
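
The monitor, extract, and match loop described in this abstract might look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the `MetadataCatalog` class, the tiny stopword list, the regex tokenizer, and the count-based ranking do not come from the patent.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "is", "on", "from"}


def extract_keywords(text: str) -> List[str]:
    """Lower-case alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class MetadataCatalog:
    """Accumulates keywords per data store without owner intervention."""

    def __init__(self) -> None:
        self._keywords: Dict[str, Counter] = {}

    def observe(self, store: str, usage_text: str) -> None:
        """Record keywords from a query against, or a message about, a store."""
        self._keywords.setdefault(store, Counter()).update(extract_keywords(usage_text))

    def search(self, query: str) -> List[str]:
        """Return stores whose keywords match any query term, best match first."""
        terms = set(extract_keywords(query))
        scores = {s: sum(c[t] for t in terms) for s, c in self._keywords.items()}
        return sorted((s for s, n in scores.items() if n > 0),
                      key=lambda s: (-scores[s], s))
```

A catalog that has observed a SQL query and an email about a sales database can then answer a search for "revenue by region" even though nobody tagged the database by hand.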

Patent

Adaptive junk message filtering system and method

Robert L. Rounthwaite, Joshua T. Goodman, David Heckerman, John Platt, Carl M. Kadie

11 Apr 2013

Patent

Pseudo-documents to facilitate data discovery

Surajit Chaudhuri¹, Lev Novik¹, John Platt¹

Microsoft¹

09 Apr 2013

TL;DR: Data organized in a database structure with a schema is rendered into one or more pseudo-documents, each a sub-structure that can be indexed and searched, which promotes the discoverability of the data.

Abstract: Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects each of which relationally points back to its associated structure within the database. Searches can now be performed against the pseudo-documents which, in turn, return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.
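
As a rough illustration of the pseudo-document idea, the sketch below treats each table as a "structure" and each row as one pseudo-document, builds an inverted index over the rendered text, and groups search hits by the table they point back to. The `Database` layout, the one-row-per-document choice, and the whitespace tokenizer are assumptions; the abstract does not prescribe any of them.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical layout: table name -> (column names, rows).
Database = Dict[str, Tuple[List[str], List[Tuple]]]
PseudoDoc = Tuple[str, int, str]  # (table name, row index, rendered text)


def render_pseudo_documents(db: Database) -> List[PseudoDoc]:
    """Render every row as text that keeps a pointer back to its table and row."""
    docs = []
    for table, (columns, rows) in db.items():
        for idx, row in enumerate(rows):
            text = " ".join(f"{c} {v}" for c, v in zip(columns, row))
            docs.append((table, idx, text.lower()))
    return docs


def build_index(docs: List[PseudoDoc]) -> Dict[str, Set[Tuple[str, int]]]:
    """Inverted index: token -> set of (table, row index) pointers."""
    index = defaultdict(set)
    for table, idx, text in docs:
        for token in text.split():
            index[token].add((table, idx))
    return index


def search(index: Dict[str, Set[Tuple[str, int]]], query: str) -> Dict[str, List[int]]:
    """Return rows matching all query tokens, grouped by originating table."""
    hits = None
    for token in query.lower().split():
        matches = index.get(token, set())
        hits = matches if hits is None else hits & matches
    grouped = defaultdict(list)
    for table, idx in sorted(hits or set()):
        grouped[table].append(idx)
    return dict(grouped)
```

Searching a two-table database for a city name returns one sub-set of hits per table, which mirrors the abstract's point that results can span several structures.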