Author

Iain MacCallum

Bio: Iain MacCallum is an academic researcher at the Broad Institute who has contributed to research on genome and sequence assembly. The author has an h-index of 15 and has co-authored 16 publications that have received 9,423 citations.

Topics: Genome, Sequence assembly, Hybrid genome assembly, Genomics, Genome project ...

Papers

Journal Article

Evolution of genes and genomes on the Drosophila phylogeny.

Andrew G. Clark¹, Michael B. Eisen²,³, Douglas Smith +426 more•Institutions (70)

08 Nov 2007-Nature

TL;DR: These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.

Abstract: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

2,057 citations

Journal Article

High-quality draft assemblies of mammalian genomes from massively parallel sequence data

Sante Gnerre¹, Iain MacCallum, Dariusz Przybylski, Filipe J. Ribeiro, Joshua N. Burton, Bruce J. Walker, Ted Sharpe, Giles Hall, Terrance Shea, Sean M. Sykes, Aaron M. Berlin, Daniel Aird, Maura Costello, Riza M. Daza, Louise Williams, Robert Nicol, Andreas Gnirke, Chad Nusbaum, Eric S. Lander, David B. Jaffe•Institutions (1)

Broad Institute¹

25 Jan 2011-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: An algorithm for genome assembly, ALLPATHS-LG, is developed and applied to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform; the resulting draft assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome.

Abstract: Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
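
Note: the abstract above reports scaffold N50 sizes without defining the statistic. The sketch below assumes the commonly used definition (the largest length L such that sequences of length at least L account for at least half of the total assembled bases). The function name `n50` and the example lengths are hypothetical and are not taken from ALLPATHS-LG, which may follow a slightly different convention.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: walk the lengths from longest to shortest and
    return the first length at which the running sum reaches at least
    half of the total number of bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths, in bases.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

Because the statistic depends only on the length distribution, it says nothing about correctness; that is presumably why the abstract reports base accuracy separately.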

1,616 citations

Journal Article

ALLPATHS: de novo assembly of whole-genome shotgun microreads.

Jonathan Butler¹, Iain MacCallum, Michael Kleber, Ilya Shlyakhter, Matthew K. Belmonte, Eric S. Lander, Chad Nusbaum, David B. Jaffe•Institutions (1)

Broad Institute¹

01 May 2008-Genome Research

TL;DR: A general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads is described.

Abstract: New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun “microreads.” For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80× coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.

880 citations

Journal Article

The genomic substrate for adaptive radiation in African cichlid fish

David Brawand¹,², Catherine E. Wagner³,⁴, Yang I. Li¹, Milan Malinsky⁵,⁶, Irene Keller⁴, Shaohua Fan⁷, Oleg Simakov⁷, Alvin Yu Jin Ng⁸, Zhi Wei Lim⁸, Etienne Bezault⁹, Jason Turner-Maier², Jeremy A. Johnson², Rosa Alcazar¹⁰, Hyun Ji Noh², Pamela Russell¹¹, Bronwen Aken⁶, Jessica Alföldi², Chris T. Amemiya¹², Naoual Azzouzi¹³, Jean-François Baroiller, Frédérique Barloy-Hubler¹³, Aaron M. Berlin², Ryan F. Bloomquist¹⁴, Karen L. Carleton¹⁵, Matthew A. Conte¹⁵, Helena D'Cotta, Orly Eshel, Leslie Gaffney², Francis Galibert¹³, Hugo F. Gante¹⁶, Sante Gnerre², Lucie Greuter³,⁴, Richard Guyon¹³, Natalie S. Haddad¹⁴, Wilfried Haerty¹, Robert M Harris¹⁷, Hans A. Hofmann¹⁷, Thibaut Hourlier⁶, Gideon Hulata, David B. Jaffe², Marcia Lara², Alison P. Lee⁸, Iain MacCallum², Salome Mwaiko³, Masato Nikaido¹⁸, Hidenori Nishihara¹⁸, Catherine Ozouf-Costaz¹⁹, David J. Penman²⁰, Dariusz Przybylski², Michaelle Rakotomanga¹³, Suzy C. P. Renn⁹, Filipe J. Ribeiro², Micha Ron, Walter Salzburger¹⁶, Luis Sanchez-Pulido¹, M. Emília Santos¹⁶, Steve Searle⁶, Ted Sharpe², Ross Swofford², Frederick J. Tan²¹, Louise Williams², Sarah Young², Shuangye Yin², Norihiro Okada¹⁸,²², Thomas D. Kocher¹⁵, Eric A. Miska⁵, Eric S. Lander², Byrappa Venkatesh⁸, Russell D. Fernald¹⁰, Axel Meyer⁷, Chris P. Ponting¹, J. Todd Streelman¹⁴, Kerstin Lindblad-Toh²,²³, Ole Seehausen³,⁴, Federica Di Palma²,²⁴•Institutions (24)

University of Oxford¹, Broad Institute², Swiss Federal Institute of Aquatic Science and Technology³, University of Bern⁴, Wellcome Trust/Cancer Research UK Gurdon Institute⁵, Wellcome Trust Sanger Institute⁶, University of Konstanz⁷, Agency for Science, Technology and Research⁸, Reed College⁹, Stanford University¹⁰, California Institute of Technology¹¹, Benaroya Research Institute¹², University of Rennes¹³, Georgia Institute of Technology¹⁴, University of Maryland, College Park¹⁵, University of Basel¹⁶, University of Texas at Austin¹⁷, Tokyo Institute of Technology¹⁸, National Museum of Natural History¹⁹, University of Stirling²⁰, Carnegie Institution for Science²¹, National Cheng Kung University²², Science for Life Laboratory²³, Norwich University²⁴

18 Sep 2014-Nature

TL;DR: This article found an excess of gene duplications in the East African lineage compared to Nile tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs.

Abstract: Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.

832 citations

Journal Article

Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences

Tarjei S. Mikkelsen¹,², Matthew Wakefield³, Bronwen Aken⁴ +235 more•Institutions (21)

10 May 2007-Nature

TL;DR: A high-quality draft of the genome sequence of the grey, short-tailed opossum is reported, indicating a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation.

Abstract: We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.

724 citations

Cited by

Journal Article

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

Anton Bankevich¹, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner•Institutions (1)

Saint Petersburg Academic University¹

07 May 2012-Journal of Computational Biology

TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

16,859 citations

Journal Article

Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Manfred Grabherr¹, Brian J. Haas¹, Moran Yassour¹,², Joshua Z. Levin¹, Dawn Thompson¹, Ido Amit¹, Xian Adiconis¹, Lin Fan¹, Raktima Raychowdhury¹, Qiandong Zeng¹, Zehua Chen¹, Evan Mauceli¹, Nir Hacohen¹, Andreas Gnirke¹, Nicholas Rhind³, Federica Di Palma¹, Bruce W. Birren¹, Chad Nusbaum¹, Kerstin Lindblad-Toh¹,⁴, Nir Friedman², Aviv Regev¹•Institutions (4)

Massachusetts Institute of Technology¹, Hebrew University of Jerusalem², University of Massachusetts Medical School³, Science for Life Laboratory⁴

01 Jul 2011-Nature Biotechnology

TL;DR: The Trinity method for de novo assembly of full-length transcripts is presented and evaluated on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.

Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

15,665 citations

Journal Article

featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features

Yang Liao¹, Gordon K. Smyth¹, Wei Shi¹•Institutions (1)

Walter and Eliza Hall Institute of Medical Research¹

01 Apr 2014-Bioinformatics

TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.

Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

14,103 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that is shown to improve on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article

FLASH: Fast Length Adjustment of Short Reads to Improve Genome Assemblies

Tanja Magoc¹, Steven L. Salzberg¹•Institutions (1)

Johns Hopkins University School of Medicine¹

01 Nov 2011-Bioinformatics

TL;DR: FLASH is a fast computational tool that extends short reads by overlapping paired-end reads from sufficiently short fragment libraries; when FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.

Abstract: Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. Availability and Implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.
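
Note: the idea of merging a read pair whose fragment is shorter than twice the read length can be illustrated with a small sketch. This is not FLASH's implementation (FLASH is written in C and its scoring is not described in the abstract). The function names, the `min_overlap` and `max_mismatch_ratio` parameters, and their default values are all assumptions made for illustration. The sketch reverse-complements the second read, tries every suffix/prefix overlap of at least `min_overlap` bases, and keeps the overlap with the lowest mismatch ratio, preferring longer overlaps on ties.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.1):
    """Merge a paired-end read pair into one longer read, or return None.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before being compared with the end of read1.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length) of the best overlap so far
    longest = min(len(read1), len(mate))
    # Try the longest overlaps first so that ties favour longer overlaps.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        ratio = mismatches / overlap
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, overlap)
    if best is None:
        return None
    # Keep read1 whole and append the part of the mate beyond the overlap.
    return read1 + mate[best[1]:]


# Hypothetical 40-base fragment sequenced as two 25-base reads.
fragment = "ACGTACGGTTCAGGCTAAGCTTGACCATGCAATCGGATCC"
read1 = fragment[:25]
read2 = reverse_complement(fragment[15:])
merged = merge_pair(read1, read2)
```

In this example the two reads share a 10-base overlap, so the merged read spans the whole fragment. A real tool would also use base quality scores when resolving mismatches inside the overlap, which this sketch ignores.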

9,827 citations
