Author

Marian Thomson

Bio: Marian Thomson is an academic researcher at the University of Edinburgh. The author has contributed to research on the topics Genome and Nanopore sequencing. The author has an h-index of 14 and has co-authored 23 publications receiving 2,735 citations.

Papers
Journal Article
Kanchon K. Dasmahapatra, James R. Walters, Adriana D. Briscoe, John W. Davey, Annabel Whibley, Nicola J. Nadeau, Aleksey V. Zimin, Daniel S.T. Hughes, Laura Ferguson, Simon H. Martin, Camilo Salazar, James J. Lewis, Sebastian Adler, Seung-Joon Ahn, Dean A. Baker, Simon W. Baxter, Nicola Chamberlain, Ritika Chauhan, Brian A. Counterman, Tamas Dalmay, Lawrence E. Gilbert, Karl H.J. Gordon, David G. Heckel, Heather M. Hines, Katharina J. Hoff, Peter W. H. Holland, Emmanuelle Jacquin-Joly, Francis M. Jiggins, Robert T. Jones, Durrell D. Kapan, Paul J. Kersey, Gerardo Lamas, Daniel Lawson, Daniel Mapleson, Luana S. Maroja, Arnaud Martin, Simon Moxon, William J. Palmer, Riccardo Papa, Alexie Papanicolaou, Yannick Pauchet, David A. Ray, Neil Rosser, Steven L. Salzberg, Megan A. Supple, Alison K. Surridge, Ayşe Tenger-Trolander, Heiko Vogel, Paul A. Wilkinson, Derek Wilson, James A. Yorke, Furong Yuan, Alexi Balmuth, Cathlene Eland, Karim Gharbi, Marian Thomson, Richard A. Gibbs, Yi Han, Joy Jayaseelan, Christie Kovar, Tittu Mathew, Donna M. Muzny, Fiona Ongeri, Ling-Ling Pu, Jiaxin Qu, Rebecca Thornton, Kim C. Worley, Yuanqing Wu, Mauricio Linares, Mark Blaxter, Richard H. ffrench-Constant, Mathieu Joron, Marcus R. Kronforst, Sean P. Mullen, Robert D. Reed, Steven E. Scherer, Stephen Richards, James Mallet, W. Owen McMillan, Chris D. Jiggins
05 Jul 2012 - Nature
TL;DR: It is inferred that closely related Heliconius species exchange protective colour-pattern genes promiscuously, implying that hybridization has an important role in adaptive radiation.
Abstract: Sequencing of the genome of the butterfly Heliconius melpomene shows that closely related Heliconius species exchange protective colour-pattern genes promiscuously.

1,103 citations

Journal Article
TL;DR: The rate and properties of new spontaneous mutations in Drosophila melanogaster are inferred by whole-genome shotgun sequencing-by-synthesis of three mutation accumulation lines that had been maintained by close inbreeding for an average of 262 generations; mutation rates did not differ significantly between transcribed and untranscribed regions, implying that any transcription-coupled repair process is weak.
Abstract: We inferred the rate and properties of new spontaneous mutations in Drosophila melanogaster by carrying out whole-genome shotgun sequencing-by-synthesis of three mutation accumulation (MA) lines that had been maintained by close inbreeding for an average of 262 generations. We tested for the presence of new mutations by generating alignments of each MA line to the D. melanogaster reference genome sequence and then compared these alignments base by base. We determined empirically that at least five reads at a site within each line are required for accurate single nucleotide mutation calling. We mapped a total of 174 single-nucleotide mutations, giving a single nucleotide mutation rate of 3.5 × 10⁻⁹ per site per generation. There were no false positives in a random sample of 40 of these mutations checked by Sanger sequencing. Variation in the numbers of mutations among the MA lines was small and nonsignificant. Numbers of transition and transversion mutations were 86 and 88, respectively, implying that transition mutation rate is close to 2× the transversion rate. We observed 1.5× as many G or C → A or T as A or T → G or C mutations, implying that the G or C → A or T mutation rate is close to 2× the A or T → G or C mutation rate. The base composition of the genome is therefore not at an equilibrium determined solely by mutation. The predicted G + C content at mutational equilibrium (33%) is similar to that observed in transposable element remnants. Nearest-neighbor mutational context dependencies are nonsignificant, suggesting that this is a weak phenomenon in Drosophila. We also saw nonsignificant differences in the mutation rate between transcribed and untranscribed regions, implying that any transcription-coupled repair process is weak. Of seven short indel mutations confirmed, six were deletions, consistent with the deletion bias that is thought to exist in Drosophila.
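The arithmetic behind two of the abstract's inferences can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis: it assumes that each site can undergo one transition but two transversions, and it uses a simple two-state model in which G/C sites mutate to A/T at rate u and A/T sites mutate to G/C at rate v. The function names are hypothetical.

```python
# Counts taken from the abstract; everything else is an illustrative assumption.
TRANSITIONS = 86
TRANSVERSIONS = 88


def per_path_rate_ratio(n_transitions, n_transversions):
    """Ratio of transition to transversion rate per mutational path.

    Assumption: every site has 1 possible transition and 2 possible
    transversions, so the transversion count is spread over two paths.
    """
    return (n_transitions / 1) / (n_transversions / 2)


def equilibrium_gc(rate_gc_to_at, rate_at_to_gc):
    """G+C fraction at mutational equilibrium in a two-state model.

    At equilibrium GC * u = (1 - GC) * v, hence GC = v / (u + v).
    """
    return rate_at_to_gc / (rate_gc_to_at + rate_at_to_gc)


ts_tv = per_path_rate_ratio(TRANSITIONS, TRANSVERSIONS)
# The abstract reports that G/C -> A/T is close to 2x the reverse rate.
gc_eq = equilibrium_gc(rate_gc_to_at=2.0, rate_at_to_gc=1.0)

print(f"transition / transversion rate per path: {ts_tv:.2f}")
print(f"predicted equilibrium G+C content: {gc_eq:.0%}")
```

Under these assumptions the near-equal counts of 86 and 88 correspond to a per-path ratio close to 2, and a twofold G/C → A/T bias gives an equilibrium G + C content of one third, in line with the 33% quoted in the abstract.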

384 citations

Journal Article
TL;DR: An analysis of 265,494 expressed-sequence tags from 30 nematode species, a phylum whose ecological niches range from free-living microbivores to human parasites, identified more than 2,600 different known protein domains, some of which had differential abundances between major taxonomic groups of nematodes.
Abstract: The phylum Nematoda occupies a huge range of ecological niches, from free-living microbivores to human parasites. We analyzed the genomic biology of the phylum using 265,494 expressed-sequence tag sequences, corresponding to 93,645 putative genes, from 30 species, including 28 parasites. From 35% to 70% of each species' genes had significant similarity to proteins from the model nematode Caenorhabditis elegans. More than half of the putative genes were unique to the phylum, and 23% were unique to the species from which they were derived. We have not yet come close to exhausting the genomic diversity of the phylum. We identified more than 2,600 different known protein domains, some of which had differential abundances between major taxonomic groups of nematodes. We also defined 4,228 nematode-specific protein families from nematode-restricted genes: this class of genes probably underpins species- and higher-level taxonomic disparity. Nematode-specific families are particularly interesting as drug and vaccine targets.

266 citations

Journal Article
TL;DR: Results are presented showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows allele frequencies at single nucleotide polymorphisms (SNPs) to be estimated with at least the same accuracy as individual‐based analyses, for considerably lower library construction and sequencing efforts.
Abstract: Molecular markers produced by next-generation sequencing (NGS) technologies are revolutionizing genetic research. However, the costs of analysing large numbers of individual genomes remain prohibitive for most population genetics studies. Here, we present results based on mathematical derivations showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows allele frequencies at single nucleotide polymorphisms (SNPs) to be estimated with at least the same accuracy as individual-based analyses, for considerably lower library construction and sequencing efforts. These findings remain true when taking into account the possibility of substantially unequal contributions of each individual to the final pool of sequence reads. We propose the intuitive notion of effective pool size to account for unequal pooling and derive a Bayesian hierarchical model to estimate this parameter directly from the data. We provide a user-friendly application assessing the accuracy of allele frequency estimation from both pool- and individual-based NGS population data under various sampling, sequencing depth and experimental error designs. We illustrate our findings with theoretical examples and real data sets corresponding to SNP loci obtained using restriction site-associated DNA (RAD) sequencing in pool- and individual-based experiments carried out on the same population of the pine processionary moth (Thaumetopoea pityocampa). NGS of DNA pools might not be optimal for all types of studies but provides a cost-effective approach for estimating allele frequencies for very large numbers of SNPs. It thus allows comparison of genome-wide patterns of genetic variation for large numbers of individuals in multiple populations.
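The trade-off described in this abstract can be illustrated with a minimal sketch. It is not the paper's derivation or its Bayesian model: it assumes a textbook two-stage binomial scheme in which 2n chromosomes are sampled from the population, every individual contributes equally to the pool, reads are drawn from the pool with replacement, and there is no sequencing error. The individual-based comparison assumes perfectly called genotypes. All function names are hypothetical.

```python
def pool_variance(p, n_individuals, depth):
    """Sampling variance of a pooled allele-frequency estimate.

    Assumptions (not taken from the paper): diploid individuals, equal
    contributions to the pool, reads sampled with replacement, no errors.
    Var = p(1-p) * [1/(2n) + 1/c - 1/(2n*c)] for 2n chromosomes and c reads.
    """
    chromosomes = 2 * n_individuals
    return p * (1 - p) * (
        1 / chromosomes + 1 / depth - 1 / (chromosomes * depth)
    )


def individual_variance(p, n_individuals):
    """Sampling variance when n diploid individuals are genotyped exactly."""
    return p * (1 - p) / (2 * n_individuals)


# Example design: one pool of 50 individuals at 100 reads per site,
# compared with 20 individuals genotyped one by one.
pooled = pool_variance(p=0.5, n_individuals=50, depth=100)
individual = individual_variance(p=0.5, n_individuals=20)

print(f"pooled design variance:     {pooled:.6f}")
print(f"individual design variance: {individual:.6f}")
```

In this example the pooled design has the smaller variance, since one library can cover many more individuals. Unequal contributions, which the abstract handles through an effective pool size, would increase the pooled variance and are not modelled here.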

197 citations

Journal Article
TL;DR: The accuracy, efficiency and robustness of three methods of genotyping single nucleotide polymorphisms on pooled DNAs are compared and an extension to the ACeDB database is described that facilitates management and analysis of the data generated by association studies.
Abstract: We have compared the accuracy, efficiency and robustness of three methods of genotyping single nucleotide polymorphisms on pooled DNAs. We conclude that (i) the frequencies of the two alleles in pools should be corrected with a factor for unequal allelic amplification, which should be estimated from the mean ratio of a set of heterozygotes (k); (ii) the repeatability of an assay is more important than pinpoint accuracy when estimating allele frequencies, and assays should therefore be optimised to increase the repeatability; and (iii) the size of a pool has a relatively small effect on the accuracy of allele frequency estimation. We therefore recommend that large pools are genotyped and replicated a minimum of four times. In addition, we describe statistical approaches to allow rigorous comparison of DNA pool results. Finally, we describe an extension to our ACeDB database that facilitates management and analysis of the data generated by association studies.
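Conclusion (i) can be sketched in a few lines. The abstract does not give the formula, so the form used here, in which k is the mean ratio of the two allele signals in heterozygotes and the pooled frequency is A / (A + k·B), is an assumption based on a commonly used correction. The signal values and function names are hypothetical.

```python
def estimate_k(heterozygote_signals):
    """Mean ratio of allele-A to allele-B signal across heterozygotes.

    A heterozygote carries both alleles in equal amounts, so any departure
    of this ratio from 1 reflects unequal allelic amplification.
    """
    ratios = [a / b for a, b in heterozygote_signals]
    return sum(ratios) / len(ratios)


def corrected_frequency(signal_a, signal_b, k):
    """Frequency of allele A in a pool, corrected for unequal amplification.

    Assumed form: f_A = A / (A + k * B). With k = 1 this reduces to the
    uncorrected estimate A / (A + B).
    """
    return signal_a / (signal_a + k * signal_b)


# Hypothetical peak heights (allele A, allele B) from known heterozygotes.
heterozygotes = [(1.2, 1.0), (1.8, 1.5), (2.4, 2.0)]
k = estimate_k(heterozygotes)

# Hypothetical pool in which allele A gives the stronger raw signal.
raw = 1.2 / (1.2 + 1.0)
corrected = corrected_frequency(1.2, 1.0, k)

print(f"k = {k:.2f}, raw = {raw:.3f}, corrected = {corrected:.3f}")
```

With these made-up signals, allele A amplifies about 1.2 times more strongly than allele B, so a pool whose raw estimate is above 0.5 is corrected back to an even split.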

144 citations


Cited by
01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

01 Aug 2000
TL;DR: A Bioentrepreneur course on the assessment of medical technology in the context of commercialization, covering many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Journal Article
TL;DR: The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
Abstract: Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.

2,958 citations

Journal Article
TL;DR: FastTree stores sequence profiles of internal nodes in the tree instead of a distance matrix, uses these profiles to implement neighbor-joining with heuristics that quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
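To make the distance-matrix baseline in this abstract concrete, the sketch below computes a pairwise Jukes-Cantor distance and estimates the storage a full set of pairwise distances would need. It is not FastTree's code. The gap handling, the 4-byte entry size, and the choice to store only the upper triangle are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) for aligned sequences.

    p is the fraction of differing sites. Columns with a gap ('-') in
    either sequence are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix_bytes(n_sequences, bytes_per_entry=4):
    """Storage for the upper triangle of an N x N distance matrix."""
    return n_sequences * (n_sequences - 1) // 2 * bytes_per_entry


d = jukes_cantor_distance("ACGTACGT", "ACGTACGA")
size_gb = distance_matrix_bytes(158_022) / 1e9

print(f"Jukes-Cantor distance: {d:.4f}")
print(f"pairwise distances for 158,022 sequences: about {size_gb:.0f} GB")
```

Under these storage assumptions the estimate for 158,022 sequences comes out near the 50 gigabytes quoted in the abstract, which illustrates why the O(N^2) matrix, rather than the tree search itself, becomes the bottleneck at this scale.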

2,436 citations