
Showing papers by "Trevor Hastie" published in 1998


Journal ArticleDOI
TL;DR: In this article, the authors discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together, similar to the Bradley-Terry method for paired comparisons.
Abstract: We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated data sets. Classifiers used include linear discriminants, nearest neighbors, adaptive nonlinear methods and the support vector machine.
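The coupling step described in this abstract can be illustrated with a small sketch. It assumes a Bradley-Terry-type model in which the pairwise probability is p[i] / (p[i] + p[j]), gives every pair equal weight, and uses a simple multiplicative update; the function names `couple_pairwise` and `pairwise_from` are hypothetical and are not taken from the paper.

```python
def couple_pairwise(r, n_iter=1000, tol=1e-12):
    """Couple pairwise estimates r[i][j] ~ P(class i | class i or j) into a
    single probability vector p, under the Bradley-Terry-type model
    mu[i][j] = p[i] / (p[i] + p[j]).

    Assumptions of this sketch: every pair carries equal weight and
    0 < r[i][j] < 1 for all i != j.
    """
    k = len(r)
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(n_iter):
        previous = list(p)
        for i in range(k):
            others = [j for j in range(k) if j != i]
            observed = sum(r[i][j] for j in others)
            fitted = sum(p[i] / (p[i] + p[j]) for j in others)
            # Scale p[i] so the fitted pairwise sums move toward the observed ones.
            p[i] *= observed / fitted
            total = sum(p)
            p = [v / total for v in p]  # renormalize to sum to one
        if max(abs(a - b) for a, b in zip(p, previous)) < tol:
            break
    return p


def pairwise_from(p):
    """Build exactly consistent pairwise probabilities from a known p (for checking)."""
    k = len(p)
    return [[0.5 if i == j else p[i] / (p[i] + p[j]) for j in range(k)]
            for i in range(k)]


if __name__ == "__main__":
    print(couple_pairwise(pairwise_from([0.5, 0.3, 0.2])))
```

When the pairwise estimates are exactly consistent with some probability vector, that vector is a fixed point of the update, which gives a convenient sanity check.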

1,569 citations


Book ChapterDOI
TL;DR: The 256 greyscale pixel values of handwritten digit images taken from US envelopes are used as a feature vector input to a classifier, which automatically assigns a digit class.
Abstract: Figure 1 shows some handwritten digits taken from US envelopes. Each image consists of 16 × 16 pixels of greyscale values ranging from 0 to 255. These 256 pixel values are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.
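To make the feature-vector idea concrete, the sketch below flattens a 16 × 16 image into 256 values and classifies it with a one-nearest-neighbor rule. The nearest-neighbor rule is only an illustrative choice of classifier, and the helper names `flatten` and `nearest_neighbor_label` are hypothetical.

```python
def flatten(image):
    """Turn a 16 x 16 grid of greyscale values into a 256-element feature vector."""
    return [pixel for row in image for pixel in row]


def nearest_neighbor_label(x, train_vectors, train_labels):
    """Return the label of the training vector closest to x in squared Euclidean distance."""
    def squared_distance(v):
        return sum((a - b) ** 2 for a, b in zip(x, v))

    best = min(range(len(train_vectors)),
               key=lambda i: squared_distance(train_vectors[i]))
    return train_labels[best]


if __name__ == "__main__":
    # Two synthetic "images" stand in for real digit data (an assumption of this sketch).
    dark = [[0] * 16 for _ in range(16)]
    bright = [[255] * 16 for _ in range(16)]
    train = [flatten(dark), flatten(bright)]
    labels = ["dark", "bright"]
    query = flatten([[200] * 16 for _ in range(16)])
    print(nearest_neighbor_label(query, train, labels))
```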

132 citations


Journal ArticleDOI
TL;DR: This article investigates one member of a new family of plug-in classification techniques (PICTs) developed in the statistics and machine learning literature, and offers some motivation for its success.
Abstract: A new family of plug-in classification techniques has recently been developed in the statistics and machine learning literature. A plug-in classification technique (PICT) is a method that takes a standard classifier (such as LDA or TREES) and plugs it into an algorithm to produce a new classifier. The standard classifier is known as the base classifier. These methods often produce large improvements over using a single classifier. In this article we investigate one of these methods and give some motivation for its success.
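The plug-in pattern can be sketched as a wrapper that accepts any base classifier and returns a new one. The example below uses bootstrap resampling with majority voting purely for illustration; the abstract does not say which method the article studies, and the names `plug_in_classifier` and `fit_nearest_mean` are hypothetical.

```python
import random
from collections import Counter


def plug_in_classifier(base_fit, X, y, n_rounds=25, seed=0):
    """Wrap a base classifier: fit it on bootstrap resamples and predict by majority vote.

    base_fit(X, y) must return a function mapping one input to a class label.
    """
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(base_fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


def fit_nearest_mean(X, y):
    """A toy base classifier for one-dimensional inputs: assign the nearest class mean."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    means = {label: sum(values) / len(values) for label, values in groups.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


if __name__ == "__main__":
    X = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
    y = ["a", "a", "a", "b", "b", "b"]
    predict = plug_in_classifier(fit_nearest_mean, X, y)
    print(predict(0.05), predict(1.05))
```

Because the wrapper only calls `base_fit`, any other base classifier with the same interface could be plugged in without changing the wrapper.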

74 citations


Proceedings ArticleDOI
01 Mar 1998
TL;DR: A general framework is presented for analyzing multiple protein structures using statistical regression methods, and it is revealed that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
Abstract: A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances ...

40 citations


Journal ArticleDOI
TL;DR: A general framework is presented for analyzing multiple protein structures using statistical regression methods, and it is revealed that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
Abstract: A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ at various positions in a structure. In addition, a method is introduced for finding an initial correspondence, by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
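The rigid superposition step can be illustrated in a simplified setting. The sketch below works in two dimensions, where the least-squares rotation has a closed form, and considers proper rotations only; protein coordinates are three-dimensional and the paper's regression formulation is more general. The function names are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y) landmarks."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def rigid_superpose_2d(source, target):
    """Least-squares rotation plus translation mapping source landmarks onto target.

    Returns the rotation angle, the transformed source landmarks, and the
    per-landmark residual distances.
    """
    cs, ct = centroid(source), centroid(target)
    dot = cross = 0.0
    for (sx, sy), (tx, ty) in zip(source, target):
        sx, sy, tx, ty = sx - cs[0], sy - cs[1], tx - ct[0], ty - ct[1]
        dot += sx * tx + sy * ty
        cross += sx * ty - sy * tx
    theta = math.atan2(cross, dot)  # angle maximizing the alignment of centered landmarks
    c, s = math.cos(theta), math.sin(theta)
    fitted = [(c * (x - cs[0]) - s * (y - cs[1]) + ct[0],
               s * (x - cs[0]) + c * (y - cs[1]) + ct[1]) for x, y in source]
    residuals = [math.hypot(fx - tx, fy - ty)
                 for (fx, fy), (tx, ty) in zip(fitted, target)]
    return theta, fitted, residuals


if __name__ == "__main__":
    source = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
    angle = 0.7
    target = [(math.cos(angle) * x - math.sin(angle) * y + 5.0,
               math.sin(angle) * x + math.cos(angle) * y - 2.0) for x, y in source]
    theta, _, residuals = rigid_superpose_2d(source, target)
    print(theta, max(residuals))
```

The per-landmark residuals play the role of the structural variability at each landmark mentioned in the abstract: landmarks that superimpose poorly have large residuals.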

23 citations


Proceedings Article
01 Jan 1998
TL;DR: Analysis of the superposition reveals that globins are most strongly conserved structurally in the mid-regions of the E and G helices.
Abstract: A novel approach for analyzing multiple protein structures is presented. A family of related protein structures may be characterized by an affine model, obtained by applying transformation matrices that permit both rotation and shear. The affine model and transformation matrices can be computed efficiently using a single eigen-decomposition. A novel method for finding correspondences is also introduced. This method matches curvatures along the protein backbone. The algorithm is applied to analyze a set of seven globin structures. Our method identifies 100 corresponding landmarks across all seven structures. Results show that most helices in globins can be identified by high curvature, with the exception of the C and D helices. Analysis of the superposition reveals that globins are most strongly conserved structurally in the mid-regions of the E and G helices.

10 citations