scispace - formally typeset
Search or ask a question
Author

Michael Kamal

Bio: Michael Kamal is an academic researcher from Broad Institute. The author has contributed to research in topics: Genome & Human genome. The author has an hindex of 17, co-authored 19 publications receiving 20921 citations. Previous affiliations of Michael Kamal include Harvard University & Massachusetts Institute of Technology.

Papers
More filters
Journal ArticleDOI
Robert H. Waterston1, Kerstin Lindblad-Toh2, Ewan Birney, Jane Rogers3  +219 moreInstitutions (26)
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the Mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal ArticleDOI
21 Apr 2006-Cell
TL;DR: It is proposed that bivalent domains silence developmental genes in ES cells while keeping them poised for activation, highlighting the importance of DNA sequence in defining the initial epigenetic landscape and suggesting a novel chromatin-based mechanism for maintaining pluripotency.

5,131 citations

Journal ArticleDOI
08 Dec 2005-Nature
TL;DR: A high-quality draft genome sequence of the domestic dog is reported, together with a dense map of single nucleotide polymorphisms (SNPs) across breeds, to shed light on the structure and evolution of genomes and genes.
Abstract: Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

2,431 citations

Journal ArticleDOI
Elise A. Feingold1, Peter J. Good1, Mark S. Guyer1, S. Kamholz1  +193 moreInstitutions (19)
22 Oct 2004-Science
TL;DR: The ENCyclopedia Of DNA Elements (ENCODE) Project is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function.
Abstract: The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (∼1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.

2,248 citations

Journal ArticleDOI
24 Apr 2003-Nature
TL;DR: A high-quality draft sequence of the N. crassa genome is reported, suggesting that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.
Abstract: Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes—more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis of the gene set yields insights into unexpected aspects of Neurospora biology including the identification of genes potentially associated with red light photobiology, genes implicated in secondary metabolism, and important differences in Ca21 signalling as compared with plants and animals. Neurospora possesses the widest array of genome defence mechanisms known for any eukaryotic organism, including a process unique to fungi called repeat-induced point mutation (RIP). Genome analysis suggests that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.

1,659 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations

Journal ArticleDOI
14 Jan 2005-Cell
TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

Journal ArticleDOI
TL;DR: In this article, the authors present an approach for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

10,798 citations

Journal ArticleDOI
23 Feb 2007-Cell
TL;DR: The surface of nucleosomes is studded with a multiplicity of modifications that can dictate the higher-order chromatin structure in which DNA is packaged and can orchestrate the ordered recruitment of enzyme complexes to manipulate DNA.

10,046 citations

Journal ArticleDOI
TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies and is in close agreement with simulated results without read-pair information.
Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.

9,389 citations