Showing papers by "Alejandro A. Schäffer" published in 2006

Open Access

Journal Article

A Fast and Symmetric DUST Implementation to Mask Low-Complexity DNA Sequences

Aleksandr Morgulis¹, E. Michael Gertz, Alejandro A. Schäffer, Richa Agarwala

23 Jun 2006-Journal of Computational Biology

TL;DR: A new implementation of the DUST module that uses the same function to assign a complexity score to a sequence, but a different rule for deciding which high-scoring sequences are masked; it is at least four times faster than the old implementation on the human genome.

Abstract: The DUST module has been used within BLAST for many years to mask low-complexity sequences. In this paper, we present a new implementation of the DUST module that uses the same function to assign a complexity score to a sequence, but uses a different rule by which high-scoring sequences are masked. The new rule masks every nucleotide masked by the old rule and occasionally masks more. The new masking rule corrects two related deficiencies with the old rule. First, the new rule is symmetric with respect to reversing the sequence. Second, the new rule is not context sensitive; the decision to mask a subsequence does not depend on what sequences flank it. The new implementation is at least four times faster than the old on the human genome. We show that both the percentage of additional bases masked and the effect on MegaBLAST outputs are very small.
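
The abstract does not spell out the scoring function or the new masking rule. As a rough illustration only, the Python sketch below assumes the triplet-based DUST score (for a window with l overlapping triplets and triplet counts c_t, the score is the sum of c_t(c_t - 1)/2 divided by l - 1) and a symmetric rule that masks the union of "perfect" intervals: intervals no longer than a window W whose score exceeds a threshold T and is not exceeded by any of their sub-intervals. The defaults W = 64 and T = 20 and the function names are assumptions, and the brute-force search is far slower than the paper's implementation; it only shows why such a rule is symmetric and does not depend on flanking sequence.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Assumed DUST-style score: sum over triplets t of c_t*(c_t-1)/2,
    divided by (l - 1), where l is the number of overlapping triplets."""
    l = len(seq) - 2
    if l < 2:  # too short to score; treat as complex
        return 0.0
    counts = Counter(seq[i:i + 3] for i in range(l))
    r = sum(c * (c - 1) // 2 for c in counts.values())
    return r / (l - 1)


def perfect_interval_mask(seq: str, window: int = 64, threshold: float = 20.0):
    """Brute-force sketch (hypothetical helper): mask every position covered by
    a 'perfect' interval, i.e. length <= window, score > threshold, and no
    sub-interval scoring higher. Returns merged half-open (start, end) ranges."""
    n = len(seq)
    # Score every interval of length 3..window (length-3 intervals score 0).
    score = {
        (i, j): dust_score(seq[i:j])
        for i in range(n)
        for j in range(i + 3, min(n, i + window) + 1)
    }
    masked = [False] * n
    for (i, j), s in score.items():
        if s <= threshold:
            continue
        # Perfect only if no sub-interval (p, q) of (i, j) scores higher.
        # The decision depends only on seq[i:j], never on its flanks.
        if all(score[(p, q)] <= s
               for p in range(i, j) for q in range(p + 3, j + 1)):
            for k in range(i, j):
                masked[k] = True
    # Collapse masked positions into half-open ranges.
    ranges, start = [], None
    for k, m in enumerate(masked + [False]):
        if m and start is None:
            start = k
        elif not m and start is not None:
            ranges.append((start, k))
            start = None
    return ranges
```

Because reversing a sequence only relabels its triplets, `dust_score` returns the same value for a sequence and its reverse, so under this rule the masked set of a reversed input is simply the mirror image of the original one.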

431 citations

Journal Article

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

E. Michael Gertz¹, Yi-Kuo Yu¹, Richa Agarwala¹, Alejandro A. Schäffer¹, Stephen F. Altschul¹

National Institutes of Health¹

07 Dec 2006-BMC Biology

TL;DR: It is shown that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to retrieval accuracy; the test data are available for download and may be useful in other studies of translated search algorithms.

Abstract: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms.
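
TBLASTN compares a protein query against a nucleotide database translated in all six reading frames. The sketch below shows only that translation step, assuming the standard genetic code (NCBI translation table 1); the function names are hypothetical, codons containing ambiguous bases are rendered as "X", and nothing here reflects how TBLASTN itself stores or searches translated frames or applies composition-based statistics.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> list:
    """Frames 1-3 on the given strand, then frames 1-3 on the reverse complement."""
    rc = reverse_complement(seq)
    return ([translate(seq[f:]) for f in range(3)]
            + [translate(rc[f:]) for f in range(3)])
```

For example, `six_frame_translations("ATGAAATAG")` returns `['MK*', '*N', 'EI', 'LFH', 'YF', 'IS']`.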

409 citations

Journal Article

WindowMasker: window-based masker for sequenced genomes

Aleksandr Morgulis¹, E. Michael Gertz¹, Alejandro A. Schäffer¹, Richa Agarwala¹

National Institutes of Health¹

15 Jan 2006-Bioinformatics

TL;DR: WindowMasker (WM), a software tool that identifies and masks highly repetitive DNA sequences in a genome using only the sequence of the genome itself, was developed; it is orders of magnitude faster than RepeatMasker/Maskeraid.

Abstract: Motivation: Matches to repetitive sequences are usually undesirable in the output of DNA database searches. Repetitive sequences need not be matched to a query if they can be masked in the database. RepeatMasker/Maskeraid (RM), currently the most widely used software for DNA sequence masking, is slow and requires a library of repetitive template sequences, such as a manually curated RepBase library, that may not exist for newly sequenced genomes. Results: We have developed a software tool called WindowMasker (WM) that identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. WM is orders of magnitude faster than RM because WM uses a few linear-time scans of the genome sequence, rather than local alignment methods that compare each library sequence with each piece of the genome. We validate WM by comparing BLAST outputs from large sets of queries applied to two versions of the same genome, one masked by WM, and the other masked by RM. Even for genomes such as the human genome, where a good RepBase library is available, searching the database as masked with WM yields more matches that are apparently non-repetitive and fewer matches to repetitive sequences. We show that these results hold for transcribed regions as well. WM also performs well on genomes for which much of the sequence was in draft form at the time of the analysis. Availability: WM is included in the NCBI C++ toolkit. The source code for the entire toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. Once the toolkit source is unpacked, the instructions for building the WindowMasker application in the UNIX environment can be found in the file src/app/winmasker/README.build. Contact: richa@helix.nih.gov Supplementary information: Supplementary data are available at ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/windowmasker/windowmasker_suppl.pdf.
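
The abstract attributes WindowMasker's speed to a few linear-time scans of the genome instead of alignment against a repeat library. The sketch below illustrates that general idea with a deliberately simplified two-pass scheme: count every k-mer (merged with its reverse complement), then slide a window of consecutive k-mers and mask windows whose average count reaches a threshold. The function names, k, window size and threshold are hypothetical; WindowMasker's actual unit length, scoring and threshold selection are not described here and are not reproduced.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def simple_window_mask(seq: str, k: int = 11, window: int = 5,
                       threshold: float = 3.0) -> list:
    """Hypothetical two-pass masker; returns one boolean per base."""
    seq = seq.upper()
    n_units = len(seq) - k + 1
    masked = [False] * len(seq)
    if n_units < window:
        return masked
    # Pass 1: count canonical k-mers over the whole sequence.
    counts = Counter(canonical(seq[i:i + k]) for i in range(n_units))
    unit_scores = [counts[canonical(seq[i:i + k])] for i in range(n_units)]
    # Pass 2: slide a window of `window` consecutive k-mers with a running sum.
    total = sum(unit_scores[:window])
    for start in range(n_units - window + 1):
        if start > 0:
            total += unit_scores[start + window - 1] - unit_scores[start - 1]
        if total / window >= threshold:
            # Mask every base touched by the k-mers in this window.
            for pos in range(start, start + window + k - 1):
                masked[pos] = True
    return masked
```

Both passes touch each position a bounded number of times, so the cost grows linearly with sequence length apart from the k-mer hashing.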

253 citations

Journal Article

Identification of a homozygous deletion in the AP3B1 gene causing Hermansky-Pudlak syndrome, type 2

Johannes Jung, Georg Bohn, Anna Allroth, Kaan Boztug, Gudrun Brandes, Inga Sandrock, Alejandro A. Schäffer, Chozhavendan Rathinam, Inga Köllner, Carmela Beger, Reinhard Schilke, Karl Welte, Bodo Grimbacher, Christoph Klein

01 Jul 2006-Blood

TL;DR: The clinical and molecular phenotype of human AP-3 deficiency is extended, and further insights into the role of the AP-3 complex in the innate immune system are provided.

119 citations

Journal Article

A homozygous single-base deletion in MLPH causes the dilute coat color phenotype in the domestic cat.

Yasuko Ishida, Victor A. David, Eduardo Eizirik¹, Alejandro A. Schäffer², Beena Neelam³, Melody E. Roelke³, Steven S. Hannah⁴, Stephen J. O'Brien, Marilyn Menotti-Raymond

Pontifícia Universidade Católica do Rio Grande do Sul¹, National Institutes of Health², Science Applications International Corporation³, Nestlé Purina PetCare Company⁴

01 Dec 2006-Genomics

TL;DR: In this article, the authors performed linkage mapping with microsatellites in a large multigeneration pedigree of domestic cats and detected tight linkage for dilute on cat chromosome C1 (θ=0.08, LOD=10.81).
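
Several linkage results in this list are reported as LOD scores at a recombination fraction θ. For intuition only, the sketch below computes the textbook two-point LOD for phase-known, fully informative meioses, LOD(θ) = log10[θ^R (1 - θ)^N / 0.5^(R+N)] with R recombinant and N non-recombinant meioses, plus a grid search for the maximizing θ. The published analyses used full pedigree likelihoods, which this sketch does not reproduce; the function names and the counts in the usage note are hypothetical.

```python
import math


def two_point_lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """LOD(theta) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    def term(count: int, p: float) -> float:
        # count * log10(p), using the convention 0 * log10(0) = 0
        if count == 0:
            return 0.0
        return -math.inf if p == 0.0 else count * math.log10(p)

    n = recombinants + nonrecombinants
    return (term(recombinants, theta) + term(nonrecombinants, 1.0 - theta)
            - n * math.log10(0.5))


def best_theta(recombinants: int, nonrecombinants: int, steps: int = 50) -> float:
    """Grid search over theta in {0, 0.5/steps, ..., 0.5}."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: two_point_lod(t, recombinants, nonrecombinants))
```

With a hypothetical 1 recombinant among 20 meioses, `best_theta(1, 19)` returns 0.05 (the estimate R/(R+N)), where the LOD is about 4.3; at θ = 0.5 the LOD is always 0.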

95 citations

Journal Article

Does Having Children Extend Life Span? A Genealogical Study of Parity and Longevity in the Amish

Patrick F. McArdle¹, Toni I. Pollin, Jeffrey R. O'Connell, John D. Sorkin, Richa Agarwala, Alejandro A. Schäffer, Elizabeth A. Streeten, Terri M. King, Alan R. Shuldiner, Braxton D. Mitchell

University of Maryland, Baltimore¹

01 Feb 2006-The Journals of Gerontology, Series A: Biological Sciences and Medical Sciences

TL;DR: It is concluded that high parity among men and later menopause among women may be markers for increased life span, and understanding the biological and/or social factors mediating these relationships may provide insights into mechanisms underlying successful aging.

Abstract: Background. The relationship between parity and life span is uncertain, with evidence of both positive and negative relationships being reported previously. We evaluated this issue by using genealogical data from an Old Order Amish community in Lancaster, Pennsylvania, a population characterized by large nuclear families, homogeneous lifestyle, and extensive genealogical records. Methods. The analysis was restricted to the set of 2015 individuals who had children, were born between 1749 and 1912, and survived until at least age 50 years. Pedigree structures and birth and death dates were extracted from Amish genealogies, and the relationship between parity and longevity was examined using a variance component framework. Results. Life span of fathers increased in linear fashion with increasing number of children (0.23 years per additional child; p = .01), while life span of mothers increased linearly up to 14 children (0.32 years per additional child; p = .004) but decreased with each additional child beyond 14 (p = .0004). Among women, but not men, a later age at last birth was associated with longer life span (p = .001). Adjusting for age at last birth obliterated the correlation between maternal life span and number of children, except among mothers with ultrahigh (>14 children) parity. Conclusions. We conclude that high parity among men and later menopause among women may be markers for increased life span. Understanding the biological and/or social factors mediating these relationships may provide insights into mechanisms underlying successful aging.

90 citations

Journal Article

Genetic alterations in caspase-10 may be causative or protective in autoimmune lymphoproliferative syndrome.

Shigui Zhu¹, Amy P. Hsu¹, Marla M. Vacek¹, Lixin Zheng¹, Alejandro A. Schäffer¹, Janet K. Dale¹, Joie Davis¹, Roxanne Fischer¹, Sharon E. Straus¹, Donna Boruchov², Frank T. Saulsbury³, Michael J. Lenardo¹, Jennifer M. Puck¹

National Institutes of Health¹, Brookdale University Hospital and Medical Center², University of Virginia Health System³

31 Jan 2006-Human Genetics

TL;DR: An association analysis suggested protection from severe disease by caspase-10 V410I in 63 families with ALPS Ia due to dominant Fas mutations (P<0.05), and healthy individuals homozygous for V410I were found, challenging the earlier suggestion that homozygosity for V410I alone causes ALPS.

Abstract: Autoimmune lymphoproliferative syndrome (ALPS) is characterized by lymphadenopathy, elevated numbers of T cells with αβ-T cell receptors but neither CD4 nor CD8 co-receptors, and impaired lymphocyte apoptosis in vitro. Defects in the Fas receptor are the most common cause of ALPS (ALPS Ia), but in rare cases other apoptosis proteins have been implicated, including caspase-10 (ALPS II). We investigated the role of variants of caspase-10 in ALPS. Of 32 unrelated probands with ALPS who did not have Fas defects, two were heterozygous for the caspase-10 missense mutation I406L. Like the previously reported ALPS II-associated mutation L285F, I406L impaired apoptosis when transfected alone and dominantly inhibited apoptosis mediated by wild type caspase-10 in a co-transfection assay. Other variants in caspase-10, V410I and Y446C, were found in 3.4 and 1.6% of chromosomes in Caucasians, and in 0.5 and <0.5% of African Americans, respectively. In contrast to L285F and I406L, these variants had no dominant negative effect in co-transfection assays into the H9 lymphocytic cell line. We found healthy individuals homozygous for V410I, challenging the earlier suggestion that homozygosity for V410I alone causes ALPS. Moreover, an association analysis suggested protection from severe disease by caspase-10 V410I in 63 families with ALPS Ia due to dominant Fas mutations (P<0.05). Thus, different genetic variations in caspase-10 can produce contrasting phenotypic effects.

61 citations

Journal Article

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.

Yi-Kuo Yu¹, E. Michael Gertz, Richa Agarwala, Alejandro A. Schäffer, Stephen F. Altschul

National Institutes of Health¹

01 Nov 2006-Nucleic Acids Research

TL;DR: A version of the BLAST protein database search program, modified to employ this new measure of sequence similarity, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.

Abstract: Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.

61 citations

Journal Article

Linkage of autosomal-dominant common variable immunodeficiency to chromosome 4q

Anemone Finck¹, Jos W. M. van der Meer, Alejandro A. Schäffer², Jessica Pfannstiel¹, Claire Fieschi, Alessandro Plebani³, A. David B. Webster⁴, Lennart Hammarström⁵, Bodo Grimbacher¹

University Hospital Regensburg¹, National Institutes of Health², University of Brescia³, Royal Free Hospital⁴, Karolinska Institutet⁵

26 Apr 2006-European Journal of Human Genetics

TL;DR: A collection of 32 families with at least one CVID case and a second case of either CVID or IgAD has a peak multipoint LOD score under heterogeneity of 0.96, with an estimated proportion of linked families (alpha) of 0.32, supporting the existence of a disease-causing gene for autosomal-dominant CVID/IgAD on chromosome 4q.

Abstract: The phenotype of common variable immunodeficiency (CVID) is characterized by recurrent infections owing to hypogammaglobulinemia, with deficiency in immunoglobulin (Ig)G and at least one of IgA or IgM. Family studies have shown a genetic association between CVID and selective IgA deficiency (IgAD), the latter being a milder disorder compatible with normal health. Approximately 20-25% of CVID cases are familial, if one includes families with at least one case of CVID and one of IgAD. Nijenhuis et al. described a five-generation family with six cases of CVID, five cases of IgAD, and three cases of dysgammaglobulinemia. We conducted a genome-wide scan on this family seeking genetic linkage. One interval on chromosome 4q gives a peak multipoint LOD score of 2.70 using a strict model that treats only the CVID patients and one obligate carrier with dysgammaglobulinemia as affected. Extending the definition of likely affected to include IgAD boosts the peak multipoint LOD score to 3.38. The linkage interval spans at least from D4S2361 to D4S1572. We extended our study to a collection of 32 families with at least one CVID case and a second case of either CVID or IgAD. We used the same dominant penetrance model and genotyped and analyzed nine markers on 4q. The 32 families have a peak multipoint LOD score under heterogeneity of 0.96 between markers D4S423 and D4S1572 within the suggested linkage interval of the first family, and an estimated proportion of linked families (alpha) of 0.32, supporting the existence of a disease-causing gene for autosomal-dominant CVID/IgAD on chromosome 4q.
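
The 4q and 16q results are reported as LOD scores under heterogeneity together with an estimated proportion of linked families, alpha. Under the standard admixture model, HLOD = max over α of the sum over families i of log10(α · 10^LOD_i + 1 - α), where LOD_i is family i's LOD score at the tested position. The sketch below maximizes this over a grid of α values for per-family LOD scores supplied by the caller; the function name and grid resolution are assumptions, and it does not recompute the published multipoint scores.

```python
import math


def hlod(family_lods, steps: int = 100):
    """Return (max HLOD, alpha at the maximum) for per-family LOD scores,
    searching alpha over {0, 1/steps, ..., 1}."""
    ratios = [10.0 ** lod for lod in family_lods]  # per-family likelihood ratios
    best_score, best_alpha = -math.inf, 0.0
    for i in range(steps + 1):
        alpha = i / steps
        # alpha * ratio + (1 - alpha) > 0 for any ratio > 0, so log10 is defined.
        score = sum(math.log10(alpha * r + 1.0 - alpha) for r in ratios)
        if score > best_score:  # keep the smallest alpha among ties
            best_score, best_alpha = score, alpha
    return best_score, best_alpha
```

A single family with LOD 2 gives (2.0, 1.0); a single family with a negative LOD gives (0.0, 0.0), since α = 0 (no linked families) always yields an HLOD of 0. Mixing one strongly positive family with several negative ones pushes the estimate to an intermediate α.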

52 citations

Journal Article

An ~140-kb deletion associated with feline spinal muscular atrophy implies an essential LIX1 function for motor neuron survival

John C. Fyfe¹, Marilyn Menotti-Raymond², Victor A. David², Lars Brichta³, Alejandro A. Schäffer², Richa Agarwala², William J. Murphy⁴, William J. Wedemeyer¹, Brittany L. Gregory¹, Bethany G. Buzzell², Meghan C. Drummond¹, Brunhilde Wirth³, Stephen J. O'Brien²

Michigan State University¹, National Institutes of Health², University of Cologne³, Texas A&M University⁴

01 Sep 2006-Genome Research

TL;DR: A novel SMA gene candidate, LIX1, is identified in an approximately 140-kb deletion on feline chromosome A1q, in a region of conserved synteny to human chromosome 5q15; the predicted secondary structure of LIX1 is compatible with a role in RNA metabolism.

Abstract: The leading genetic cause of infant mortality is spinal muscular atrophy (SMA), a clinically and genetically heterogeneous group of disorders. Previously we described a domestic cat model of autosomal recessive, juvenile-onset SMA similar to human SMA type III. Here we report results of a whole-genome scan for linkage in the feline SMA pedigree using recently developed species-specific and comparative mapping resources. We identified a novel SMA gene candidate, LIX1, in an approximately 140-kb deletion on feline chromosome A1q in a region of conserved synteny to human chromosome 5q15. Though LIX1 function is unknown, the predicted secondary structure is compatible with a role in RNA metabolism. LIX1 expression is largely restricted to the central nervous system, primarily in spinal motor neurons, thus offering an explanation for the tissue restriction of pathology in feline SMA. An exon sequence screen of 25 human SMA cases, not otherwise explicable by mutations at the SMN1 locus, failed to identify comparable LIX1 mutations. Nonetheless, a LIX1-associated etiology in feline SMA implicates a previously undetected mechanism of motor neuron maintenance and mandates consideration of LIX1 as a candidate gene in human SMA when SMN1 mutations are not found.

48 citations

Journal Article

Analysis of families with common variable immunodeficiency (CVID) and IgA deficiency suggests linkage of CVID to chromosome 16q

Alejandro A. Schäffer¹, Jessica Pfannstiel², A. David B. Webster³, Alessandro Plebani⁴, Lennart Hammarström⁵, Bodo Grimbacher²

National Institutes of Health¹, University of Freiburg², Royal Free Hospital³, University of Brescia⁴, Karolinska Institutet⁵

01 Feb 2006-Human Genetics

TL;DR: Evidence of a CVID locus on chromosome 16q with autosomal dominant inheritance is presented: the peak (model-based) LOD score for the best marker, D16S518, is 2.83, and the NPL score using the same markers peaks at the same location with a value of 3.38.

Abstract: Common variable immunodeficiency (CVID) is an antibody deficiency syndrome that often co-occurs in families with selective IgA deficiency (IgAD). Vorechovský et al. (Am J Hum Genet 64:1096-1109, 1999; J Immunol 164:4408-4416, 2000) ascertained and genotyped 101 multiplex IgAD families and used them to identify and fine map the IGAD1 locus on chromosome 6p. We analyzed the original genotype data in a subset of families with at least one case of CVID and present evidence of a CVID locus on chromosome 16q with autosomal dominant inheritance. The peak (model-based) LOD score for the best marker D16S518 is 2.83 at theta=0.07, and a 4-marker LOD score under heterogeneity peaks at 3.00 with alpha=0.68. The (model-free) NPL score using the same markers peaks at the same location with a value of 3.38 (P=0.0001).

Journal Article

A 1.3-Mb interval map of equine homologs of HSA2.

M. L. Wagner¹, Terje Raudsepp², Glenda Goh², Richa Agarwala³, Alejandro A. Schäffer³, Patricia K. Dranchak¹, Candice Brinkmeyer-Langford², Loren C. Skow², Bhanu P. Chowdhary², James R. Mickelson

University of Minnesota¹, Texas A&M University², National Institutes of Health³

01 Feb 2006-Cytogenetic and Genome Research

TL;DR: The assignment of 140 new markers to the equine radiation hybrid (RH) map and the anchoring of 24 of these markers to horse chromosomes by FISH are described; the updated maps have a three-fold increase in the number of mapped markers compared to previous maps of these chromosomes.

Abstract: A comparative approach that utilizes information from more densely mapped or sequenced genomes is a proven and efficient means to increase our knowledge of the structure of the horse genome. Human chromosome 2 (HSA2), the second largest human chromosome, comprising 243 Mb, and containing 1246 known genes, corresponds to all or parts of three equine chromosomes. This report describes the assignment of 140 new markers (78 genes and 62 microsatellites) to the equine radiation hybrid (RH) map, and the anchoring of 24 of these markers to horse chromosomes by FISH. The updated equine RH maps for ECA6p, ECA15, and ECA18 resulting from this work have one, two, and three RH linkage groups, respectively, per chromosome/chromosome-arm. These maps have a three-fold increase in the number of mapped markers compared to previous maps of these chromosomes, and an increase in the average marker density to one marker per 1.3 Mb. Comparative maps of ECA6p, ECA15, and ECA18 with human, chimpanzee, dog, mouse, rat, and chicken genomes reveal blocks of conserved synteny across mammals and vertebrates.

Journal Article

HLA B44 is associated with decreased severity of autoimmune lymphoproliferative syndrome in patients with CD95 defects (ALPS type Ia).

Marla M. Vacek¹, Alejandro A. Schäffer¹, Joie Davis¹, Roxanne Fischer¹, Janet K. Dale¹, Sharon Adams¹, Sharon E. Straus¹, Jennifer M. Puck¹

National Institutes of Health¹

01 Jan 2006-Clinical Immunology

TL;DR: The B44 allele may exert a protective role in ALPS, and among the healthier, mutation-bearing individuals, transmission of HLA B44 was significantly overrepresented (nominal P<0.0074) as compared to transmission in patients with severe clinical features of ALPS.
