
Showing papers by Jorja G. Henikoff published in 2000



Journal Article
TL;DR: Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS.
Abstract: The Blocks Database WWW (http://blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments, which represent conserved protein regions. Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS. Other new features include improved Block Searcher statistics, searching with NCBI’s IMPALA program and 3D display of blocks on PDB structures.

332 citations


Journal Article
01 Sep 2000
TL;DR: It is concluded that a better matrix can be constructed by using background frequencies characteristic of the twilight zone, where low-scoring true positives have scores indistinguishable from high-scoring false positives, rather than the amino acid frequencies of the database.
Abstract: Motivation: Database searching algorithms for proteins use scoring matrices based on average protein properties, and thus are dominated by globular proteins. However, since transmembrane regions of a protein are in a distinctly different environment from that of globular proteins, one would expect generalized substitution matrices to be inappropriate for transmembrane regions. Results: We present the PHAT (predicted hydrophobic and transmembrane) matrix, which significantly outperforms generalized matrices and a previously published transmembrane matrix in searches with transmembrane queries. We conclude that a better matrix can be constructed by using background frequencies characteristic of the twilight zone, where low-scoring true positives have scores indistinguishable from high-scoring false positives, rather than the amino acid frequencies of the database. The PHAT matrix may help improve the accuracy of sequence alignments and evolutionary trees of membrane proteins.
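The abstract's central point is that the background frequencies used in a log-odds matrix change the resulting scores. The sketch below illustrates only that effect, assuming the standard log-odds form s(a,b) = scale * log2(q_ab / (p_a * p_b)). The target frequency for the Leu/Ile pair and both background compositions are hypothetical numbers chosen for illustration; they are not taken from the PHAT paper. The target frequency is held fixed so that only the background varies, whereas the actual PHAT construction also derives its target frequencies from transmembrane data.

```python
import math

def log_odds_score(q_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> float:
    """Log-odds substitution score in (1/scale)-bit units.

    q_ab: target frequency of the aligned pair (a, b) among related sequences.
    p_a, p_b: background frequencies of residues a and b.
    """
    return scale * math.log2(q_ab / (p_a * p_b))

# Hypothetical numbers for the pair (Leu, Ile); NOT taken from the paper.
Q_LI = 0.02
database_bg = {"L": 0.10, "I": 0.06}   # assumed overall database composition
twilight_bg = {"L": 0.16, "I": 0.10}   # assumed hydrophobic-rich background

score_database = log_odds_score(Q_LI, database_bg["L"], database_bg["I"])
score_twilight = log_odds_score(Q_LI, twilight_bg["L"], twilight_bg["I"])

print(f"database background: {score_database:.2f}")
print(f"twilight background: {score_twilight:.2f}")
```

With these assumed numbers, the Leu/Ile score drops from roughly 3.5 to roughly 0.6 half-bits. This is the intuition behind the paper: when hydrophobic residues are common in the background, a hydrophobic-for-hydrophobic substitution carries less evidence of homology.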

156 citations


Journal Article
TL;DR: The most highly conserved regions of proteins can be represented as blocks of aligned sequence segments, typically with multiple blocks for a given protein family, and the Blocks Database World Wide Web (http://blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments.
Abstract: The most highly conserved regions of proteins can be represented as blocks of aligned sequence segments, typically with multiple blocks for a given protein family. The Blocks Database World Wide Web (http://blocks.fhcrc.org) and e-mail (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments. We describe features for detection of distant relationships using blocks. Blocks+ includes protein families from the PROSITE, PRINTS, Pfam-A, ProDom and Domo databases. Other features include searching Blocks+ with the BLIMPS and NCBI's IMPALA programs, sequence logos, phylogenetic trees, three-dimensional display of blocks on PDB structures, and a polymerase chain reaction (PCR) primer design strategy based on blocks.
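A block is an ungapped set of aligned segments, so a query can be scored against it by converting the block into a position-specific scoring matrix and sliding that matrix along the query. The sketch below is a generic version of this idea under stated assumptions: it uses add-one pseudocounts and a uniform amino acid background. It is not the BLIMPS or IMPALA scoring scheme, which use their own sequence weighting and statistics. The toy block and query are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def block_to_pssm(block, pseudocount=1.0):
    """Turn an ungapped block (equal-length aligned segments) into per-column
    log2-odds scores against an assumed uniform background."""
    width = len(block[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in block)
        total = len(block) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_block_hit(pssm, query):
    """Slide the block along the query without gaps; return (best score, offset).

    Residues outside AMINO_ACIDS (e.g. 'X') get the column's lowest score.
    Returns (-inf, -1) if the query is shorter than the block.
    """
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(pssm[i].get(aa, min(pssm[i].values()))
                    for i, aa in enumerate(window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

# Hypothetical toy block of four aligned segments.
toy_block = ["GKSTL", "GKTTL", "GKSSL", "GRSTL"]
pssm = block_to_pssm(toy_block)
print(best_block_hit(pssm, "MAAGKSTLPE"))
```

For the toy query, the best-scoring window starts at offset 3 (the segment GKSTL). A real search would also compare that best score against the distribution of scores for unrelated sequences before reporting a hit.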

65 citations


Journal Article
TL;DR: The results confirm that protein family databases can be used effectively in automated sequence annotation efforts; the search results were also used to improve BLOCKS+ by identifying compositionally biased blocks.
Abstract: A simple and general homology-based method for gene finding was applied to the 2.9-Mb Drosophila melanogaster Adh region, the target sequence of the Genome Annotation Assessment Project (GASP). Each strand of the entire sequence was used as a query against the BLOCKS+ database of conserved regions of proteins. This led to functional assignments for more than one-third of the genes and two-thirds of the transposons. Considering the enormous size of the query, the fact that only two false-positive matches were reported emphasizes the high selectivity of protein family-based methods for gene finding. We used the search results to improve BLOCKS+ by identifying compositionally biased blocks. Our results confirm that protein family databases can be used effectively in automated sequence annotation efforts.
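The abstract does not say how each DNA strand was turned into protein sequence for searching against BLOCKS+. A common approach, assumed here, is conceptual translation in every reading frame: three frames on the given strand and three on its reverse complement. The sketch below uses the standard genetic code and marks codons containing non-ACGT characters as X. The frame labels ("+1" to "-3") are an illustrative convention, not the Blocks server's output format.

```python
# Standard genetic code, codons ordered by bases T, C, A, G ('*' = stop).
STANDARD_CODE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                 "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
BASE_INDEX = {"T": 0, "C": 1, "A": 2, "G": 3}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; ambiguous codons -> 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in BASE_INDEX for base in codon):
            index = (16 * BASE_INDEX[codon[0]]
                     + 4 * BASE_INDEX[codon[1]]
                     + BASE_INDEX[codon[2]])
            protein.append(STANDARD_CODE[index])
        else:
            protein.append("X")
    return "".join(protein)

def six_frames(dna: str) -> dict:
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Each translated frame could then be scored against every block, for example with a position-specific scoring matrix. Stop codons ('*') simply become residues with no counterpart in the blocks.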

5 citations