Author
Edward J. Kulbokas
Other affiliations: Harvard University, Broad Institute
Bio: Edward J. Kulbokas is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Genome & Single-nucleotide polymorphism. The author has an hindex of 10, co-authored 10 publications receiving 14109 citations. Previous affiliations of Edward J. Kulbokas include Harvard University & Broad Institute.
Papers
More filters
••
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the Mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
6,643 citations
••
Kerstin Lindblad-Toh1, Claire M. Wade2, Claire M. Wade1, Tarjei S. Mikkelsen1 +238 more•Institutions (11)
TL;DR: A high-quality draft genome sequence of the domestic dog is reported, together with a dense map of single nucleotide polymorphisms (SNPs) across breeds, to shed light on the structure and evolution of genomes and genes.
Abstract: Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
2,431 citations
••
TL;DR: In this article, a comparative analysis of the human, mouse, rat and dog genomes is presented to create a systematic catalogue of common regulatory motifs in promoters and 3' untranslated regions (3' UTRs).
Abstract: Comprehensive identification of all functional elements encoded in the human genome is a fundamental need in biomedical research. Here, we present a comparative analysis of the human, mouse, rat and dog genomes to create a systematic catalogue of common regulatory motifs in promoters and 3' untranslated regions (3' UTRs). The promoter analysis yields 174 candidate motifs, including most previously known transcription-factor binding sites and 105 new motifs. The 3'-UTR analysis yields 106 motifs likely to be involved in post-transcriptional regulation. Nearly one-half are associated with microRNAs (miRNAs), leading to the discovery of many new miRNA genes and their likely target genes. Our results suggest that previous estimates of the number of human miRNA genes were low, and that miRNAs regulate at least 20% of human genes. The overall results provide a systematic view of gene regulation in the human, which will be refined as additional mammalian genomes become available.
1,954 citations
••
TL;DR: Methylation patterns at orthologous loci are strongly conserved between human and mouse even though many methylated sites do not show sequence conservation notably higher than background, which suggests that the DNA elements that direct the methylation represent only a small fraction of the region or lie at some distance from the site.
1,536 citations
••
TL;DR: This work seeks to design a systematic approach for LD mapping and applies it to the localization of a gene (IBD5) conferring susceptibility to Crohn disease, and binds the region to a common haplotype that shows strong association with the disease and contains the cytokine gene cluster.
Abstract: Linkage disequilibrium (LD) mapping provides a powerful method for fine-structure localization of rare disease genes, but has not yet been widely applied to common disease1. We sought to design a systematic approach for LD mapping and apply it to the localization of a gene (IBD5) conferring susceptibility to Crohn disease. The key issues are: (i) to detect a significant LD signal (ii) to rigorously bound the critical region and (iii) to identify the causal genetic variant within this region. We previously mapped the IBD5 locus to a large region spanning 18 cM of chromosome 5q31 (P<10−4). Using dense genetic maps of microsatellite markers and single-nucleotide polymorphisms (SNPs) across the entire region, we found strong evidence of LD. We bound the region to a common haplotype spanning 250 kb that shows strong association with the disease (P<2×10−7) and contains the cytokine gene cluster. This finding provides overwhelming evidence that a specific common haplotype of the cytokine region in 5q31 confers susceptibility to Crohn disease. However, genetic evidence alone is not sufficient to identify the causal mutation within this region, as strong LD across the region results in multiple SNPs having equivalent genetic evidence—each consistent with the expected properties of the IBD5 locus. These results have important implications for Crohn disease in particular and LD mapping in general.
800 citations
Cited by
More filters
••
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
34,830 citations
••
TL;DR: The current understanding of miRNA target recognition in animals is outlined and the widespread impact of miRNAs on both the expression and evolution of protein-coding genes is discussed.
18,036 citations
••
TL;DR: Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface.
Abstract: Summary: Research over the last few years has revealed significant haplotype structure in the human genome. The characterization of these patterns, particularly in the context of medical genetic association studies, is becoming a routine research activity. Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface.
Availability: http://www.broad.mit.edu/mpg/haploview/
Contact: jcbarret@broad.mit.edu
13,862 citations
••
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
13,223 citations
••
TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.
11,624 citations