Author

Stephen F. Altschul

Bio: Stephen F. Altschul is an academic researcher at the National Institutes of Health. The author has contributed to research on multiple sequence alignment and substitution matrices. The author has an h-index of 46 and has co-authored 78 publications receiving 171,211 citations. Previous affiliations of Stephen F. Altschul include Vanderbilt University and the Center for Information Technology.


Papers
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
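
The maximal segment pair (MSP) score named in the summary above is the highest score of any pair of equal-length, ungapped segments taken from the two sequences. The following is a minimal brute-force sketch of that quantity, not the BLAST heuristic itself, which only approximates it. The function name `msp_score` and the match/mismatch values are illustrative assumptions; protein comparisons would use a substitution matrix instead.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Return the best ungapped segment-pair score between a and b.

    The scoring values are assumed for illustration only.
    """
    best = 0
    # An ungapped segment pair lies on one diagonal, fixed by offset = i - j.
    for offset in range(-(len(b) - 1), len(a)):
        i = max(offset, 0)
        j = i - offset
        running = 0
        while i < len(a) and j < len(b):
            step = match if a[i] == b[j] else mismatch
            # Restart the segment whenever the running score drops below zero.
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
    return best


if __name__ == "__main__":
    print(msp_score("ACGTACGT", "TTACGTTT"))
```

This exhaustive scan takes time proportional to the product of the two sequence lengths, which is the cost that a word-hit heuristic is meant to avoid.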

Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
09 Aug 1991-Science
TL;DR: The APC gene was identified in a contig initiated from the MCC gene and was found to encode an unusually large protein; these two closely spaced genes encode proteins predicted to contain coiled-coil regions, and both genes were expressed in a wide variety of tissues.
Abstract: Recent studies suggest that one or more genes on chromosome 5q21 are important for the development of colorectal cancers, particularly those associated with familial adenomatous polyposis (FAP). To facilitate the identification of genes from this locus, a portion of the region that is tightly linked to FAP was cloned. Six contiguous stretches of sequence (contigs) containing approximately 5.5 Mb of DNA were isolated. Subclones from these contigs were used to identify and position six genes, all of which were expressed in normal colonic mucosa. Two of these genes (APC and MCC) are likely to contribute to colorectal tumorigenesis. The MCC gene had previously been identified by virtue of its mutation in human colorectal tumors. The APC gene was identified in a contig initiated from the MCC gene and was found to encode an unusually large protein. These two closely spaced genes encode proteins predicted to contain coiled-coil regions. Both genes were also expressed in a wide variety of tissues. Further studies of MCC and APC and their potential interaction should prove useful for understanding colorectal neoplasia.

2,364 citations

Journal Article
Robert L. Strausberg, Elise A. Feingold, Lynette H. Grouse, Jeffery G. Derge, Richard D. Klausner, Francis S. Collins, Lukas Wagner, Carolyn M. Shenmen, Gregory D. Schuler, Stephen F. Altschul, Barry R. Zeeberg, Kenneth H. Buetow, Carl F. Schaefer, Narayan K. Bhat, Ralph F. Hopkins, Heather Jordan, Troy Moore, Steve I Max, Jun Wang, Florence Hsieh, Luda Diatchenko, Kate Marusina, Andrew A Farmer, Gerald M. Rubin, Ling Hong, Mark Stapleton, M. Bento Soares, Maria de Fatima Bonaldo, Thomas L. Casavant, Todd E. Scheetz, Michael J. Brownstein, Ted B. Usdin, Shiraki Toshiyuki, Piero Carninci, Christa Prange, Sam S Raha, Naomi A Loquellano, Garrick J Peters, Rick D Abramson, Sara J Mullahy, Stephanie Bosak, Paul J. McEwan, Kevin McKernan, Joel A. Malek, Preethi H. Gunaratne, Stephen Richards, Kim C. Worley, Sarah Hale, Angela M. Garcia, Stephen W. Hulyk, Debbie K Villalon, Donna M. Muzny, Erica Sodergren, Xiuhua Lu, Richard A. Gibbs, Jessica Fahey, Erin Helton, Mark Ketteman, Anuradha Madan, Stephanie Rodrigues, Amy Sanchez, Michelle Whiting, Anup Madan, Alice C. Young, Yuriy O. Shevchenko, Gerard G. Bouffard, Robert W. Blakesley, Jeffrey W. Touchman, Eric D. Green, Mark Dickson, Alex Rodriguez, Jane Grimwood, Jeremy Schmutz, Richard M. Myers, Yaron S.N. Butterfield, Martin Krzywinski, Ursula Skalska, Duane E. Smailus, Angelique Schnerch, Jacqueline E. Schein, Steven J.M. Jones, Marco A. Marra
TL;DR: The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene.
Abstract: The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).

2,184 citations

Journal Article
08 Oct 1993-Science
TL;DR: A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling, that finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations.
Abstract: A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling. This algorithm finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The method is illustrated as applied to helix-turn-helix proteins, lipocalins, and prenyltransferases.

1,991 citations


Cited by
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

63,427 citations

Journal Article
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package, based on backward search with the Burrows–Wheeler Transform (BWT), that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations
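
Backward search with the Burrows–Wheeler Transform, mentioned in the abstract above, can be sketched for the exact-match case as follows. This is a toy illustration under stated assumptions, not BWA's implementation: it builds the suffix array by naive sorting, stores a full occurrence table, and does not handle mismatches or gaps. The names `build_index` and `backward_search` are hypothetical.

```python
def build_index(text):
    """Build a suffix array, the C table, and an occurrence table for text."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # c_table[ch]: number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_index("GATTACAGATTACA")
    print(backward_search("ATTA", *index))
```

Each pattern character narrows the suffix-array interval with two table lookups, so the search cost depends on the pattern length rather than the reference length once the index exists.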

Journal Article
TL;DR: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis that facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.
Abstract: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.

43,540 citations