
Showing papers by "John Platt" published in 2013


Proceedings ArticleDOI
22 Jun 2013
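TL;DR: This work demonstrates Stat!, a visualization and analytics environment that lets users rapidly experiment with exploratory queries over big data and get immediate feedback after only a fraction of the data has been processed.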
Abstract: Exploratory analysis on big data requires us to rethink data management across the entire stack -- from the underlying data processing techniques to the user experience. We demonstrate Stat! -- a visualization and analytics environment that allows users to rapidly experiment with exploratory queries over big data. Data scientists can use Stat! to quickly refine to the correct query, while getting immediate feedback after processing a fraction of the data. Stat! can work with multiple processing engines in the backend; in this demo, we use Stat! with the Microsoft StreamInsight streaming engine. StreamInsight is used to generate incremental early results to queries and refine these results as more data is processed. Stat! allows data scientists to explore data, dynamically compose multiple queries to generate streams of partial results, and display partial results in both textual and visual form.

44 citations


Patent
Badrish Chandramouli1, John Wernsing1, Jonathan Goldstein1, Mike Barnett1, John Platt1 
17 Dec 2013
TL;DR: The stream of incoming data events may be organized into a sequence of data batches that each include multiple data events, and the individual data batches in the sequence may be processed in a non-decreasing "sync-time" order.
Abstract: Some examples include high-performance query processing of real-time and offline temporal-relational data. Further, some implementations include processing streaming data events by annotating individual events with a first timestamp (e.g., a “sync-time”) and a second timestamp that may identify additional event information. The stream of incoming data events may be organized into a sequence of data batches that each include multiple data events. The individual data batches in the sequence may be processed in a non-decreasing “sync-time” order.
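
The batching idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patented design: the `Event` fields, the fixed `batch_size`, and the use of a sort over a finite list as a stand-in for whatever ordering mechanism a real engine would use.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    sync_time: int   # first timestamp: position in the processing order
    other_time: int  # second timestamp (hypothetical meaning: extra event info)
    payload: str


def batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group events into batches whose sync-times never decrease."""
    # Assumption: the input is finite, so a stable sort can establish the order.
    ordered = sorted(events, key=lambda e: e.sync_time)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


def process(events: Iterable[Event], batch_size: int) -> List[List[str]]:
    """Process batches one at a time, checking the sync-time order holds."""
    last_sync = None
    output = []
    for batch in batches(events, batch_size):
        if last_sync is not None and batch[0].sync_time < last_sync:
            raise ValueError("batch arrived out of sync-time order")
        last_sync = batch[-1].sync_time
        output.append([e.payload for e in batch])
    return output


events = [Event(3, 9, "c"), Event(1, 5, "a"), Event(2, 7, "b"), Event(3, 4, "d")]
result = process(events, batch_size=2)
```

Handling a whole batch per step, rather than one event at a time, is what lets per-event overhead be amortized; the order check only needs the first and last sync-time of each batch.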

37 citations


Patent
10 Jul 2013
TL;DR: A technique obtains a relational query that references one or more data items, associates progress intervals with the data items, and converts the query into a streaming query for a stream engine that produces incremental results; the progress intervals can be used to define event lifetimes of the streaming events provided as inputs to the stream engine.
Abstract: The described implementations relate to processing of electronic data. One implementation is manifest as a technique that can include obtaining a relational query that references one or more data items and associating progress intervals with the data items. The technique can also include converting the relational query into a corresponding streaming query, and providing the streaming query and the data items with the progress intervals to a stream engine that produces incremental results of the query. For example, the progress intervals can be based on row numbers of a relational database table. The progress intervals can be used to define event lifetimes of streaming events that are provided as inputs to the stream engine.
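
As a rough sketch of the row-number case mentioned in this abstract, the example below attaches a progress interval to each row and emits a partial result at every progress point. The interval shape `[row_number, inf)`, the function names, and the choice of a running average as the query are all assumptions for illustration; the source does not specify them.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Interval = Tuple[float, float]


def with_progress_intervals(rows: Sequence[float]) -> List[Tuple[Interval, float]]:
    """Attach a progress interval to each row, based on its 1-based row number.

    Assumption: a row contributes to the result from its own row number onward.
    """
    return [((i, math.inf), row) for i, row in enumerate(rows, start=1)]


def incremental_average(annotated) -> Iterator[Tuple[float, float]]:
    """Streaming form of an average query: one partial result per progress point."""
    total = 0.0
    count = 0
    for (start, _end), value in annotated:
        total += value
        count += 1
        yield start, total / count


rows = [10.0, 20.0, 60.0]
partials = list(incremental_average(with_progress_intervals(rows)))
```

Each emitted pair reads as "after this many rows, the answer so far is this value", so the final pair matches what the original relational query would return over the full table.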

34 citations


Proceedings ArticleDOI
26 May 2013
TL;DR: Proposes an image-based detection method to identify web-based scareware attacks that is robust to evasion techniques, and presents a novel visualization technique that demonstrates the acquired classifier knowledge on a classified screenshot.
Abstract: In this paper, we propose an image-based detection method to identify web-based scareware attacks that is robust to evasion techniques. We evaluate the method on a large-scale data set, obtaining an equal error rate of 0.018%. Conceptually, false positives may occur when a visual element, such as a red shield, is embedded in a benign page. We suggest including additional orthogonal features or employing graders to mitigate this risk. A novel visualization technique is presented that demonstrates the acquired classifier knowledge on a classified screenshot.

8 citations


Patent
03 Feb 2013
TL;DR: A system and method infer true labels for multiple items from judgments that multiple judges select from a specified choice of labels for each item; improved labels are then selected from the specified choice such that the entropy over the probability distribution is reduced.
Abstract: A system and method infer true labels for multiple items. The inferred labels are generated from judgments. Multiple judges select the judgments from a specified choice of labels for each item. The method includes determining a characterization of judge expertise and item difficulties based on the judgments. The method also includes determining, using maximum entropy, a probability distribution over the specified choice of labels for each judge and item, based on the judgments. The method further includes selecting improved labels for the items from the specified choice such that the entropy over the probability distribution is reduced. The improved labels represent an improvement from the judgments toward the true labels. Additionally, the method includes performing an iterative procedure to determine the true labels, the characterizations of judge expertise and the labeling difficulties.
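
The iterative flavor of this procedure can be hinted at with a much simpler stand-in. The sketch below is not the maximum-entropy method the abstract describes: it swaps in plain iterative weighted voting, models judge expertise as a single agreement rate, and ignores item difficulty entirely. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Judgment = Tuple[str, str, str]  # (judge, item, label)


def infer_labels(judgments: List[Judgment], rounds: int = 10):
    """Alternate between voting on labels and re-estimating judge weights."""
    weights: Dict[str, float] = {judge: 1.0 for judge, _, _ in judgments}
    labels: Dict[str, str] = {}
    for _ in range(rounds):
        # Step 1: each item takes the label with the largest weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for judge, item, label in judgments:
            votes[item][label] += weights[judge]
        # sorted() makes tie-breaking deterministic (alphabetical).
        labels = {
            item: max(sorted(tally), key=tally.get)
            for item, tally in votes.items()
        }
        # Step 2: a judge's weight is the fraction of their judgments
        # that agree with the current labels.
        agree = defaultdict(int)
        total = defaultdict(int)
        for judge, item, label in judgments:
            total[judge] += 1
            agree[judge] += int(label == labels[item])
        weights = {judge: agree[judge] / total[judge] for judge in total}
    return labels, weights


judgments = [
    ("a", "x", "cat"), ("b", "x", "cat"), ("c", "x", "dog"),
    ("a", "y", "dog"), ("b", "y", "dog"), ("c", "y", "cat"),
    ("a", "z", "cat"), ("c", "z", "dog"),
]
labels, weights = infer_labels(judgments)
```

In this toy data, judge `c` disagrees with the consensus on every item, so its weight drops to zero and the tie on item `z` is resolved in favor of judge `a`.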

3 citations


Patent
09 Apr 2013
TL;DR: A system enables metadata to be gathered about a data store from its creation and generation through its subsequent use, including keywords related to the data store and data appearing within it.
Abstract: A system enables metadata to be gathered about a data store beginning from the creation and generation of the data store, through subsequent use of the data store. This metadata can include keywords related to the data store and data appearing within the data store. Thus, keywords and other metadata can be generated without owner/creator intervention, with enough semantic meaning to make a discovery process associated with the data store much easier and more efficient. Usage of or communication regarding a data store is monitored and keywords are extracted from the usage or communication. The keywords are then written to or otherwise associated with metadata of the data store. During searching, keywords in the metadata are made available to be used to attempt to match query terms entered by a searcher.


Patent
09 Apr 2013
TL;DR: The discoverability of data contained within a database is promoted by organizing the data in a structure having a schema and processing the structure and data to render one or more pseudo-documents, each of which constitutes a sub-structure that can be indexed.
Abstract: Various embodiments promote the discoverability of data that can be contained within a database. In one or more embodiments, data within a database is organized in a structure having a schema. The structure and data can be processed in a manner that renders one or more pseudo-documents each of which constitutes a sub-structure that can be indexed. Once produced and indexed, the pseudo-documents constitute a set of searchable objects each of which relationally points back to its associated structure within the database. Searches can now be performed against the pseudo-documents which, in turn, return a set of search results. The set of search results can include multiple sub-sets of pseudo-documents, each sub-set of which is associated with a different structure.
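
A minimal sketch of the pseudo-document idea follows. It assumes one pseudo-document per table row, whitespace tokenization, and a plain inverted index; the abstract does not fix any of these choices, and the function and field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def make_pseudo_documents(database: Dict[str, List[dict]]) -> List[dict]:
    """Render each row as a text pseudo-document that points back to its source."""
    docs = []
    for table, rows in database.items():
        for row_id, row in enumerate(rows):
            text = " ".join(f"{col} {val}" for col, val in row.items())
            docs.append({"source": (table, row_id), "text": text.lower()})
    return docs


def build_index(docs: List[dict]) -> Dict[str, set]:
    """Inverted index: token -> ids of the pseudo-documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in doc["text"].split():
            index[token].add(doc_id)
    return index


def search(term: str, docs: List[dict], index: Dict[str, set]) -> Dict[str, List[int]]:
    """Return matching row ids grouped by the structure (table) they came from."""
    grouped = defaultdict(list)
    for doc_id in sorted(index.get(term.lower(), ())):
        table, row_id = docs[doc_id]["source"]
        grouped[table].append(row_id)
    return dict(grouped)


database = {
    "customers": [
        {"name": "Ada", "city": "Seattle"},
        {"name": "Bob", "city": "Boston"},
    ],
    "orders": [{"item": "book", "ship_to": "Seattle"}],
}
docs = make_pseudo_documents(database)
index = build_index(docs)
hits = search("Seattle", docs, index)
```

Grouping the hits by table mirrors the abstract's point that a result set can contain several sub-sets of pseudo-documents, each tied to a different structure in the database.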