
Showing papers by John Platt published in 2005


Proceedings Article
05 Dec 2005
TL;DR: The authors train a Viola-Jones detector cascade with MILBoost, a boosting variant that combines Multiple Instance Learning cost functions with the AnyBoost framework, and adapt its feature selection criterion to the cascade; the improved detection rate shows the advantage of simultaneously learning the locations and scales of the objects in the training set along with the parameters of the classifier.
Abstract: A good image object detection algorithm is accurate, fast, and does not require exact locations of objects in a training set. We can create such an object detector by taking the architecture of the Viola-Jones detector cascade and training it with a new variant of boosting that we call MILBoost. MILBoost uses cost functions from the Multiple Instance Learning literature combined with the AnyBoost framework. We adapt the feature selection criterion of MILBoost to optimize the performance of the Viola-Jones cascade. Experiments show that the detection rate is up to 1.6 times better using MILBoost. This increased detection rate shows the advantage of simultaneously learning the locations and scales of the objects in the training set along with the parameters of the classifier.

808 citations
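
The abstract does not name the Multiple Instance Learning cost function. As a minimal sketch, assume a noisy-OR bag model: each instance (candidate location/scale) gets p_ij = sigmoid(y_ij), a bag is positive if any instance is, and AnyBoost weights each instance by the gradient of the bag log-likelihood. The function names are hypothetical, and the cascade-specific feature selection criterion is not shown.

```python
import numpy as np

def noisy_or_probs(scores, bag_ids):
    """Instance probabilities p_ij = sigmoid(y_ij) and bag probabilities
    p_i = 1 - prod_j (1 - p_ij) under an assumed noisy-OR model."""
    p_inst = 1.0 / (1.0 + np.exp(-scores))
    log_none = np.zeros(bag_ids.max() + 1)
    # accumulate log(1 - p_ij) per bag: log P(no instance in bag i is positive)
    np.add.at(log_none, bag_ids, np.log1p(-p_inst))
    p_bag = 1.0 - np.exp(log_none)
    return p_inst, p_bag

def milboost_weights(scores, bag_ids, bag_labels):
    """AnyBoost example weights: the gradient of the bag log-likelihood
    with respect to each instance score, w_ij = (t_i - p_i) / p_i * p_ij."""
    p_inst, p_bag = noisy_or_probs(scores, bag_ids)
    t = bag_labels[bag_ids]
    p = p_bag[bag_ids]
    return (t - p) / p * p_inst
```

In a positive bag the weight grows with p_ij, so boosting concentrates on the instances the current classifier already believes contain the object, which is how locations and scales can be learned jointly with the classifier.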


Proceedings Article
04 Sep 2005
TL;DR: This paper presents results on the TIMIT phone classification task showing that HCRFs outperform comparable ML- and CML/MMI-trained HMMs and can handle complex features without any change in the training procedure.
Abstract: In this paper, we show the novel application of hidden conditional random fields (HCRFs), i.e. conditional random fields with hidden state sequences, for modeling speech. Hidden state sequences are critical for modeling the non-stationarity of speech signals. We show that HCRFs can easily be trained using the simple direct optimization technique of stochastic gradient descent. We present results on the TIMIT phone classification task and show that HCRFs outperform comparable ML- and CML/MMI-trained HMMs. In fact, HCRF results on this task are the best single classifier results known to us. We note that the HCRF framework is easily extensible to recognition since it is a state and label sequence modeling technique. We also note that HCRFs have the ability to handle complex features without any change in training procedure.

352 citations


Patent
11 Mar 2005
TL;DR: A system and method for generating a list is described. The system includes a seed item input subsystem, an item identifying subsystem, a descriptive metadata similarity determining subsystem and a list generating subsystem that builds a list based, at least in part, on similarity processing performed on seed item descriptive metadata and user item descriptive metadata.
Abstract: A system and method for generating a list is provided. The system includes a seed item input subsystem, an item identifying subsystem, a descriptive metadata similarity determining subsystem and a list generating subsystem that builds a list based, at least in part, on similarity processing performed on seed item descriptive metadata and user item descriptive metadata and user selected thresholds applied to such similarity processing. The method includes inexact matching between identifying metadata associated with new user items and identifying metadata stored in a reference metadata database. The method further includes subjecting candidate user items to similarity processing, where the degree to which the candidate user items are similar to the seed item is determined, and placing user items in a list of items based on user selected preferences for (dis)similarity between items in the list and the seed item.

280 citations
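
The abstract does not say how metadata similarity is computed. As a minimal sketch, assume each item's descriptive metadata is a set of tags and use Jaccard similarity; `jaccard`, `generate_list`, `min_sim` and `max_sim` are hypothetical names, with the two thresholds standing in for the user-selected preferences for (dis)similarity to the seed item. The inexact metadata matching step is not shown.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two tag sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def generate_list(seed_meta, items, min_sim=0.0, max_sim=1.0, limit=None):
    """Keep items whose similarity to the seed lies in [min_sim, max_sim],
    most similar first (ties broken by name)."""
    scored = []
    for name, meta in items.items():
        s = jaccard(seed_meta, meta)
        if min_sim <= s <= max_sim:  # user-selected thresholds
            scored.append((s, name))
    scored.sort(key=lambda x: (-x[0], x[1]))
    names = [n for _, n in scored]
    return names if limit is None else names[:limit]
```

Raising `min_sim` yields a list close to the seed; lowering `max_sim` excludes near-duplicates of it.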


Proceedings Article
John Platt
01 Jan 2005
TL;DR: Empirical experiments on the Reuters and Corel Image Features data sets show that LMDS is more accurate than FastMap and MetricMap with roughly the same computation and can become even more accurate if allowed to be slower.
Abstract: This paper unifies the mathematical foundation of three multidimensional scaling algorithms: FastMap, MetricMap, and Landmark MDS (LMDS). All three algorithms are based on the Nyström approximation of the eigenvectors and eigenvalues of a matrix. LMDS applies the basic Nyström approximation, while FastMap and MetricMap use generalizations of Nyström, including deflation and using more points to establish an embedding. Empirical experiments on the Reuters and Corel Image Features data sets show that the basic Nyström approximation outperforms these generalizations: LMDS is more accurate than FastMap and MetricMap with roughly the same computation and can become even more accurate if allowed to be slower.

192 citations
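
A minimal sketch of Landmark MDS as a Nyström approximation, assuming Euclidean input points and caller-chosen landmark indices: classical MDS is run on the landmark-to-landmark squared distances, and every point is then placed from its squared distances to the landmarks using the same eigenvectors. `landmark_mds` is a hypothetical name, and `dim` must not exceed the number of positive eigenvalues.

```python
import numpy as np

def landmark_mds(X, landmark_idx, dim):
    """Embed all rows of X in `dim` dimensions using only distances to landmarks."""
    L = X[landmark_idx]
    # squared Euclidean distances: landmark-landmark (k x k), point-landmark (n x k)
    D_ll = ((L[:, None, :] - L[None, :, :]) ** 2).sum(-1)
    D_xl = ((X[:, None, :] - L[None, :, :]) ** 2).sum(-1)
    k = len(landmark_idx)
    H = np.eye(k) - np.ones((k, k)) / k        # centering matrix
    B = -0.5 * H @ D_ll @ H                     # classical MDS Gram matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:dim]       # top `dim` eigenpairs
    lam, V = evals[order], evecs[:, order]
    pinv = V / np.sqrt(lam)                     # columns v_i / sqrt(lambda_i)
    mean_col = D_ll.mean(axis=0)                # mean squared distance per landmark
    # Nystrom extension: x_a = -1/2 * L^# (delta_a - delta_mu)
    return -0.5 * (D_xl - mean_col) @ pinv
```

For points lying in the span of the landmarks the embedding is exact up to rotation; the cost is O(nk) distance evaluations instead of O(n^2).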


Patent
29 Dec 2005
TL;DR: A system that facilitates organization of emails comprises a clustering component that clusters a plurality of emails and creates topics for them by assigning key phrases extracted from emails within one or more clusters.
Abstract: A system that facilitates organization of emails comprises a clustering component that clusters a plurality of emails and creates topics for emails by assigning key phrases extracted from emails within one or more clusters. An organization component then utilizes the key phrases to organize documents. Furthermore, the organization component can comprise a probability component that determines a probability that a document belongs to a certain topic.

135 citations


Patent
Erin L. Renshaw, John Platt
27 Jan 2005
TL;DR: The "Music Mapper" automatically constructs a set of coordinate vectors for use in inferring similarity between pieces of music. Given a music similarity graph expressed as links between artists, albums, songs, etc., it applies a recursive embedding process to embed each of the graph's music entries into a multi-dimensional space.
Abstract: A “Music Mapper” automatically constructs a set of coordinate vectors for use in inferring similarity between various pieces of music. In particular, given a music similarity graph expressed as links between various artists, albums, songs, etc., the Music Mapper applies a recursive embedding process to embed each of the graph's music entries into a multi-dimensional space. This recursive embedding process also embeds new music items added to the music similarity graph without re-embedding existing entries, so long as a convergent embedding solution is achieved. Given this embedding, coordinate vectors are then computed for each of the embedded musical items. The similarity between any two musical items is then determined as a function of the distance between the two corresponding vectors. In various embodiments, this similarity is then used in constructing music playlists given one or more random or user-selected seed songs, or in a statistical music clustering process.

103 citations
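
The abstract leaves both the recursive embedding and the distance-to-similarity function unspecified. As a minimal sketch under stated assumptions, a new item is placed at the centroid of the items it links to (existing coordinates stay fixed), similarity is a Gaussian of Euclidean distance, and a playlist is the seed followed by the most similar items. All three functions are hypothetical.

```python
import math

def embed_new_item(coords, neighbors):
    """Assumed placement of a new item at the centroid of its linked items,
    without re-embedding any existing entry."""
    pts = [coords[n] for n in neighbors]
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def similarity(u, v, scale=1.0):
    """Similarity as a decreasing function of Euclidean distance (assumed form)."""
    return math.exp(-math.dist(u, v) ** 2 / scale ** 2)

def playlist(coords, seed, length):
    """Seed song first, then the remaining items by decreasing similarity."""
    others = sorted((k for k in coords if k != seed),
                    key=lambda k: -similarity(coords[seed], coords[k]))
    return [seed] + others[:length - 1]
```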


Patent
Chris J.C. Burges, John Platt
15 Sep 2005
TL;DR: A searchable dynamic trace cache limits the database queries needed to identify media objects, such as songs, commercials, jingles and station identifiers, embedded in streaming media.
Abstract: A "Media Identifier" operates on concurrent media streams to provide large numbers of clients with real-time server-side identification of media objects embedded in streaming media, such as radio, television, or Internet broadcasts. Such media objects may include songs, commercials, jingles, station identifiers, etc. Identification of the media objects is provided to clients by comparing client-generated traces computed from media stream samples to a large database of stored, pre-computed traces (i.e., "fingerprints") of known identification. Further, given a finite number of media streams and a much larger number of clients, many of the traces sent to the server are likely to be almost identical. Therefore, a searchable dynamic trace cache is used to limit the database queries necessary to identify particular traces. This trace cache caches only one copy of recent traces along with the database search results, either positive or negative. Cache entries are then removed as they age.

75 citations
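
The caching behavior described above can be sketched as follows, assuming traces are numeric vectors, "almost identical" means Euclidean distance within a threshold, and entries expire after a fixed age. `TraceCache`, `max_age`, `threshold` and the `query_db` callback are hypothetical; a linear scan stands in for whatever searchable structure the patent uses.

```python
import math
from dataclasses import dataclass, field

@dataclass
class TraceCache:
    max_age: float       # seconds an entry may live (assumed aging policy)
    threshold: float     # max distance for two traces to count as "almost identical"
    entries: list = field(default_factory=list)  # (trace, result, timestamp)

    def _evict(self, now):
        # remove entries as they age
        self.entries = [e for e in self.entries if now - e[2] <= self.max_age]

    def lookup(self, trace, now, query_db):
        """Return (result, cache_hit); result may be None (a negative answer)."""
        self._evict(now)
        for cached, result, _ in self.entries:
            if math.dist(cached, trace) <= self.threshold:
                return result, True       # reuse cached positive or negative result
        result = query_db(trace)          # miss: exactly one database search
        self.entries.append((trace, result, now))  # one copy per near-identical trace
        return result, False
```

Because many clients listen to the same few streams, most lookups after the first are hits, and caching negative answers avoids repeated searches for traces that match nothing in the database.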


Patent
10 Feb 2005
TL;DR: A system and methodology to facilitate automatic generation of mnemonic audio portions or segments, referred to as audio thumbnails, is presented; the thumbnails can then be employed to facilitate browsing or searching audio files in order to mitigate listening to longer portions or segments of such files.
Abstract: The present invention relates to a system and methodology to facilitate automatic generation of mnemonic audio portions or segments referred to as audio thumbnails. A system is provided for summarizing audio information. The system includes an analysis component to determine common features in an audio file and a mnemonic detector to extract fingerprint portions of the audio file based in part on the common features in order to generate a thumbnail of the audio file. The generated thumbnails can then be employed to facilitate browsing or searching audio files in order to mitigate listening to longer portions or segments of such files.

74 citations


Proceedings Article
18 Mar 2005
TL;DR: Two new applications of audio fingerprinting are presented: duplicate detection, whose goal is to identify duplicate audio clips in a set, even if they differ in compression quality or duration, and thumbnail generation, which aims to provide a representative short clip of a music track.
Abstract: Audio fingerprinting is a powerful tool for identifying file-based or streaming audio, using a database of fingerprints. The paper presents two new applications of audio fingerprinting: duplicate detection, whose goal is to identify duplicate audio clips in a set, even if they differ in compression quality or duration, and thumbnail generation, which aims to provide a representative short clip of a music track. Neither application requires an external database of fingerprints. Thanks to the robustness of the fingerprinting engine, both applications perform well; the duplicate detector has a false positive rate that is conservatively bounded above by 1% on a very large data set, and the thumbnail generator significantly outperforms using a fixed window.

62 citations


Patent
30 Jun 2005
TL;DR: A general probabilistic formulation referred to as "Conditional Harmonic Mixing" is provided, in which links between classification nodes are directed, a conditional probability matrix is associated with each link, and the number of classes can vary from node to node.
Abstract: A general probabilistic formulation referred to as ‘Conditional Harmonic Mixing’ is provided, in which links between classification nodes are directed, a conditional probability matrix is associated with each link, and the number of classes can vary from node to node. A posterior class probability at each node is updated by minimizing a divergence between its distribution and that predicted by its neighbors. For arbitrary graphs, as long as each unlabeled point is reachable from at least one training point, a solution exists, is unique, and can be found by solving a sparse linear system iteratively. In one aspect, an automated data classification system is provided. The system includes a data set having at least one labeled category node in the data set. A semi-supervised learning component employs directed arcs to determine the label of at least one other unlabeled category node in the data set.

51 citations
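
The update described above can be sketched as a Jacobi-style iteration, assuming the divergence is KL(q_j || p_i) between each neighbor's prediction q_j and node i's distribution; the minimizer of a sum of such divergences over p_i is the arithmetic mean of the q_j. `chm` and its argument layout are hypothetical: `links[(j, i)]` is a matrix M with M[a, b] = P(class a at node i | class b at node j), so nodes may have different numbers of classes.

```python
import numpy as np

def chm(n_classes, links, labels, n_iter=500, tol=1e-10):
    """Propagate class distributions over a directed graph; labeled nodes stay fixed."""
    p = [np.full(c, 1.0 / c) for c in n_classes]   # uniform start for unlabeled nodes
    for node, dist in labels.items():
        p[node] = np.asarray(dist, dtype=float)
    incoming = {i: [] for i in range(len(n_classes))}
    for (j, i), M in links.items():
        incoming[i].append((j, np.asarray(M, dtype=float)))
    for _ in range(n_iter):
        new_p = [q.copy() for q in p]
        for i, inc in incoming.items():
            if i in labels or not inc:
                continue
            # argmin_p sum_j KL(M_ij p_j || p) is the mean of the neighbor predictions
            new_p[i] = np.mean([M @ p[j] for j, M in inc], axis=0)
        delta = max(np.abs(a - b).max() for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p
```

Each update is linear in the neighbors' distributions, which is why the fixed point is the solution of a sparse linear system.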


Patent
Jack W. Stokes, John Platt
27 Jun 2005
TL;DR: A method and system for multi-channel echo cancellation using round-robin regularization is described; the method applies a plurality of adaptive filters, each having an inverse correlation matrix, to the multi-channel playback signal.
Abstract: A method and system of multi-channel echo cancellation using round robin regularization. The multi-channel round robin regularization echo cancellation method includes applying a plurality of adaptive filters, each having an inverse correlation matrix, to the multi-channel playback signal. Each of the plurality of adaptive filters is selected in a round robin sequence, so that every round each of the filters is selected. The inverse correlation matrix associated with each selected adaptive filter then is regularized as needed. The regularized adaptive filter then is used to remove the echo of the multi-channel playback signal from a captured signal. Regularization is implemented in a round robin manner to ensure that each subband is selected so that the adaptive filter for that subband can be examined. Other features of the multi-channel echo cancellation system and method include dynamic switching between monaural and multi-channel echo cancellation and mixed processing for lower and upper subbands.

Patent
John Platt, Erin L. Renshaw
27 Jan 2005
TL;DR: The "Music Mapper" automatically constructs a set of coordinate vectors for use in inferring similarity between pieces of music; this similarity is then used to construct music playlists given one or more random or user-selected seed songs, or in a statistical music clustering process.
Abstract: A “Music Mapper” automatically constructs a set of coordinate vectors for use in inferring similarity between various pieces of music. In particular, given a music similarity graph expressed as links between various artists, albums, songs, etc., the Music Mapper applies a recursive embedding process to embed each of the graph's music entries into a multi-dimensional space. This recursive embedding process also embeds new music items added to the music similarity graph without re-embedding existing entries, so long as a convergent embedding solution is achieved. Given this embedding, coordinate vectors are then computed for each of the embedded musical items. The similarity between any two musical items is then determined as a function of the distance between the two corresponding vectors. In various embodiments, this similarity is then used in constructing music playlists given one or more random or user-selected seed songs, or in a statistical music clustering process.

Proceedings Article
01 Jul 2005
TL;DR: This paper presents a procedure to automatically discover a user's personal topics by clustering their emails and labeling the topics with appropriate keywords, and demonstrates these keywords in an email/document browser that uses them as standing queries to create virtual folders that help organize, index and retrieve email efficiently.
Abstract: We present in this paper a procedure to automatically discover a user's personal topics by clustering their emails. Unlike previous work, we automatically label topics using appropriate keywords. We show that, in order to get appropriate keywords, we must apply strong filters that use domain knowledge about e-mail and the workplace of the user. We demonstrate these keywords by creating an email/document browser which makes use of these keywords as standing queries to create virtual folders that help organize, index and retrieve email efficiently. We present subjective user studies to show the usefulness of the strong filtering.

Patent
31 Mar 2005
TL;DR: A regression-based residual echo suppression (RES) system and process is described for suppressing the portion of the microphone signal that corresponds to playback of a speaker audio signal and that was not suppressed by an acoustic echo canceller (AEC).
Abstract: A regression-based residual echo suppression (RES) system and process for suppressing the portion of the microphone signal corresponding to a playback of a speaker audio signal that was not suppressed by an acoustic echo canceller (AEC). In general, a prescribed regression technique is used between a prescribed spectral attribute of multiple past and present, fixed-length, periods (e.g., frames) of the speaker signal and the same spectral attribute of a current period (e.g., frame) of the echo residual in the output of the AEC. This automatically takes into consideration the correlation between the time periods of the speaker signal. The parameters of the regression can be easily tracked using adaptive methods. Multiple applications of RES can be used to produce better results and this system and process can be applied to stereo-RES as well.
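
A minimal sketch of the idea, under stated assumptions: the spectral attribute is the per-bin magnitude, the regression is linear over the present and past few speaker frames, the weights are tracked with normalized LMS (one possible adaptive method), and suppression is a floored spectral-subtraction gain. `RegressionRES` and all parameters are hypothetical; a real system would also freeze adaptation during near-end speech.

```python
import numpy as np

class RegressionRES:
    def __init__(self, n_bins, n_taps=4, mu=0.1, eps=1e-8, floor=0.05):
        self.w = np.zeros((n_bins, n_taps))     # per-bin regression weights
        self.hist = np.zeros((n_bins, n_taps))  # present + past speaker magnitudes
        self.mu, self.eps, self.floor = mu, eps, floor

    def process(self, speaker_mag, aec_out):
        """speaker_mag: |X_t(k)| per bin; aec_out: AEC output frame E_t(k)."""
        self.hist = np.roll(self.hist, 1, axis=1)
        self.hist[:, 0] = speaker_mag
        est = (self.w * self.hist).sum(axis=1)   # predicted residual echo magnitude
        err = np.abs(aec_out) - est
        norm = (self.hist ** 2).sum(axis=1) + self.eps
        self.w += self.mu * (err / norm)[:, None] * self.hist  # NLMS tracking
        gain = np.maximum(1.0 - est / (np.abs(aec_out) + self.eps), self.floor)
        return gain * aec_out
```

Regressing on several past frames is what lets the model account for the correlation between time periods of the speaker signal.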

Journal Article
TL;DR: The informative vector machine (IVM) is extended with a novel noise model that allows it to be applied to a mixture of labeled and unlabeled data, with a block-diagonal covariance matrix for learning to learn from related tasks, and with prior knowledge from known invariances.
Abstract: The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we use IVM on a block-diagonal covariance matrix, for learning to learn from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.
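
The base IVM selection loop can be sketched for regression with Gaussian noise, where assumed density filtering is exact: at each step, add the point whose inclusion most reduces posterior entropy, 0.5 * log(1 + var_j / noise), then apply a rank-one update. The squared-exponential kernel and the names `rbf` and `ivm_regression` are assumptions; the classification case and the three extensions are not shown.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel (an assumed choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def ivm_regression(X, y, d, noise=0.1, ell=1.0):
    """Greedily select d active points; return them with the posterior at X."""
    n = len(X)
    Sigma = rbf(X, X, ell)        # start from the GP prior covariance
    mu = np.zeros(n)
    active = []
    for _ in range(d):
        # entropy reduction from including point j: 0.5 * log(1 + var_j / noise)
        gain = 0.5 * np.log1p(np.diag(Sigma) / noise)
        gain[active] = -np.inf
        j = int(np.argmax(gain))
        s = Sigma[:, j].copy()
        denom = Sigma[j, j] + noise
        # rank-one Gaussian update (ADF is exact for a Gaussian likelihood)
        mu = mu + s * (y[j] - mu[j]) / denom
        Sigma = Sigma - np.outer(s, s) / denom
        active.append(j)
    return active, mu, Sigma
```

With d much smaller than n, only the d selected points are ever conditioned on, which gives the sparse approximation.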

Journal Article
TL;DR: Redundant Bit Vectors (RBVs) are proposed for quickly solving high-dimensional similarity search problems such as audio fingerprinting; they approximate the high-dimensional regions/distributions as tightened hyperrectangles, partition the query space to store each item redundantly in an index, and use bit vectors to store and search the index efficiently.
Abstract: Applications such as audio fingerprinting require search in high dimensions: find an item in a database that is similar to a query. An important property of this search task is that negative answers are very frequent: much of the time, a query does not correspond to any database item. We propose Redundant Bit Vectors (RBVs): a novel method for quickly solving this search problem. RBVs rely on three key ideas: 1) approximate the high-dimensional regions/distributions as tightened hyperrectangles, 2) partition the query space to store each item redundantly in an index and 3) use bit vectors to store and search the index efficiently. We show that our method is the preferred method for very large databases or when the queries are often not in the database. Our method is 109 times faster than linear scan, and 48 times faster than locality-sensitive hashing on a data set of 239369 audio fingerprints.
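
A minimal sketch of the three ideas, assuming the hyperrectangles are given directly (the tightening step is not shown) and each dimension's query space is split at caller-chosen bin edges. Each item is recorded in every bin its interval overlaps, Python integers serve as bit vectors, and a query ANDs one bit vector per dimension, so a frequent negative answer usually exits early. `RedundantBitVectors` is a hypothetical name.

```python
import numpy as np

class RedundantBitVectors:
    def __init__(self, lows, highs, edges):
        self.lows = np.asarray(lows, dtype=float)    # (n_items, n_dims) box corners
        self.highs = np.asarray(highs, dtype=float)
        self.edges = [np.asarray(e, dtype=float) for e in edges]  # bin edges per dim
        n, dims = self.lows.shape
        self.bits = []  # bits[d][b]: bitmask of items overlapping bin b of dim d
        for d in range(dims):
            bounds = np.concatenate(([-np.inf], self.edges[d], [np.inf]))
            vecs = []
            for b in range(len(bounds) - 1):
                mask = 0
                for i in range(n):
                    # item i is stored (redundantly) in every bin its interval overlaps
                    if self.lows[i, d] <= bounds[b + 1] and self.highs[i, d] >= bounds[b]:
                        mask |= 1 << i
                vecs.append(mask)
            self.bits.append(vecs)

    def query(self, q):
        q = np.asarray(q, dtype=float)
        cand = (1 << len(self.lows)) - 1
        for d, x in enumerate(q):
            b = int(np.searchsorted(self.edges[d], x, side="right"))
            cand &= self.bits[d][b]
            if not cand:
                return []            # fast negative answer
        hits, i = [], 0
        while cand:                  # verify surviving candidates exactly
            if cand & 1 and np.all(self.lows[i] <= q) and np.all(q <= self.highs[i]):
                hits.append(i)
            cand >>= 1
            i += 1
        return hits
```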

Patent
24 Jan 2005
TL;DR: A system and method are presented that facilitate an interactive game-powered search engine serving both users who may be looking for information and game participants who may desire to earn some reward or level of enjoyment by playing the game.
Abstract: The subject invention provides a unique system and method that facilitates an interactive game-powered search engine that serves the purposes of both users who may be looking for information as well as game participants who may desire to earn some reward or level of enjoyment by playing the game. More specifically, the system and method provides feedback to a user based on the user's input string or a string derived therefrom. The feedback can be a response or answer to the user's input in the form of text, an image, audio or sound, video, and/or a URL that is provided by one or more game participants when there is some degree of consistency or agreement between the responses or when individual players have demonstrated good reliability in their responses.
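
The two release conditions in the abstract (agreement between responses, or a reliable individual player) can be sketched as a simple policy. The thresholds `min_agree` and `min_reliability`, the per-player reliability scores, and the name `select_answer` are all hypothetical.

```python
from collections import Counter

def select_answer(responses, reliability, min_agree=2, min_reliability=0.9):
    """responses: list of (player, answer); reliability: player -> score in [0, 1]."""
    counts = Counter(ans for _, ans in responses)
    if counts:
        ans, n = counts.most_common(1)[0]
        if n >= min_agree:           # enough participants agree
            return ans
    for player, ans in responses:
        if reliability.get(player, 0.0) >= min_reliability:
            return ans               # a player with a good track record
    return None                      # no answer returned to the user yet
```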

Patent
John Platt, M. Robinson
28 Nov 2005
TL;DR: Text relevant to rich media content is extracted from a document or workflow, filtered into keyphrases, and added to a metadata file associated with the content.
Abstract: Metadata is generated for rich media content from a document or workflow that is associated with the rich media content. When rich media content is included in a document or workflow, text is extracted from the document or workflow that is relevant to the rich media content. The text is filtered into keyphrases and added to a metadata file associated with the rich media content.

Patent
22 Sep 2005
TL;DR: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification; the parameters are updated after processing of individual training samples.
Abstract: A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features that are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after processing of individual training samples.

Patent
Jack W. Stokes, John Platt
10 Jun 2005
TL;DR: An echo cancellation technique is presented that can process multi-input microphone signals with only a small increase in overall CPU consumption compared to implementing the algorithm for a single-channel microphone signal.
Abstract: An echo cancellation technique that can process multi-input microphone signals with only a small increase in the overall CPU consumption compared to implementing the algorithm for a single channel microphone signal. Furthermore, the invention provides an architecture that provides for echo cancellation for multiple applications in parallel with only a small increase in CPU consumption compared to a single instance of echo cancellation with a single microphone input and multi-output channel playback.

Proceedings Article
14 Nov 2005
TL;DR: A novel algorithm for super-resolution of text magnifies images in real-time by interpolation with a variable linear filter determined nonlinearly from the neighborhood to which it is applied.
Abstract: Images magnified by standard methods display a degradation of detail that is particularly noticeable in the blurry edges of text. Current super-resolution algorithms address the lack of sharpness by filling in the image with probable details. These algorithms break the outlines of text. Our novel algorithm for super-resolution of text magnifies images in real-time by interpolation with a variable linear filter. The coefficients of the filter are determined nonlinearly from the neighborhood to which it is applied. We train the mapping that defines the coefficients to specifically enhance edges of text, producing a conservative algorithm that infers the detail of magnified text. Possible applications include resizing web page layouts or other interfaces, and enhancing low resolution camera captures of text. In general, learning spatially-variable filters is applicable to other image filtering tasks.

Patent
09 Mar 2005
TL;DR: The blending coefficients (alpha values) of font glyphs undergo alpha correction to compensate for a lack of gamma correction in text rendering processes, so that the correction can be performed by a GPU that is not configured to perform gamma correction.
Abstract: The blending coefficients (alpha values) of font glyphs undergo alpha correction to compensate for a lack of gamma correction in text rendering processes. The alpha correction includes selecting a set of correction coefficients that correspond to the predetermined gamma value of the display device and computing corrected alpha values from the known alpha values, the foreground colors, and set of correction coefficients. The corrected alpha values can then be used to blend the foreground and background colors of the corresponding display pixels without requiring gamma correction. Accordingly, the alpha correction can be performed by a GPU, which is not configured to perform gamma correction, thereby increasing the speed at which text rendering can occur.

Patent
28 Jul 2005
TL;DR: A saltating sample image enhancement system and method provides an image processing operation in which a filter considers one or more exact source image pixels, one or more bilinearly interpolated source image samples whose bilinear weights are coupled to the position of the target pixel relative to the source pixels, and (optionally) one or more linearly interpolated source image samples whose linear weights are coupled to the position of the target pixel relative to the source pixels.
Abstract: A saltating sample image enhancement system and method that provides an image processing operation in which a filter considers one or more exact source image pixels; one or more bilinearly interpolated source image samples, where the bilinear weights are coupled to the position of the target pixel relative to the source pixels; and (optionally) one or more linearly interpolated source image samples, where the linear weights are coupled to the position of the target pixel relative to the source pixels. The filter can construct a spatially continuous image statistic.