SciSpace (formerly Typeset)
Author

Sarah Siu Tze Mak Mak

Bio: Sarah Siu Tze Mak Mak is an academic researcher from the University of Copenhagen. The author has contributed to research in the topics Illumina dye sequencing and Nuclear DNA. The author has an h-index of 1, having co-authored 1 publication receiving 258 citations.
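The h-index in the bio is the largest h such that h of an author's papers have at least h citations each, so a single cited publication can yield an h-index of at most 1. A minimal Python sketch of that definition follows; the function name h_index is illustrative only and is not part of any SciSpace interface.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # citation counts only decrease from here on
    return h


# One publication with 258 citations, as stated in the bio.
print(h_index([258]))  # -> 1
```

However many citations a lone paper receives, the rank can never exceed 1, which is why the count does not raise the index.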

Papers
Journal Article
TL;DR: The observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.
Abstract: Ancient DNA research has been revolutionized following the development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of their high production capacities and short read production. Recently, a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable with that of the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (P = 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). These may result from differences in the number of amplification cycles used to PCR-amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondrial DNA that were sequenced from 3 of the samples with overall very low levels of endogenous DNA.
Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.

282 citations
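Sequence GC content, one of the parameters compared between the BGISEQ-500 and HiSeq2500 above, is the fraction of called bases that are G or C. The abstract does not describe how it was computed, so the Python sketch below is only a generic illustration; it assumes reads are plain nucleotide strings and ignores ambiguous calls such as N.

```python
def gc_content(reads):
    """Fraction of G/C bases among all A/C/G/T bases across a collection of reads.

    Assumption for illustration: ambiguous calls (e.g. 'N') are excluded from
    both the numerator and the denominator.
    """
    gc = 0
    acgt = 0
    for read in reads:
        for base in read.upper():
            if base in "GC":
                gc += 1
                acgt += 1
            elif base in "AT":
                acgt += 1
    return gc / acgt if acgt else 0.0


# Two toy reads: 'ACGT' contributes 2 of 4, 'GGCN' contributes 3 of 3 (N ignored).
print(gc_content(["ACGT", "GGCN"]))
```

Comparing such a per-library value between platforms is what a test like the reported P = 0.311 would then operate on.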


Cited by
Journal Article
TL;DR: The high accuracy and technical reproducibility confirm the applicability of the new high-throughput sequencing platform BGISEQ-500 for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
Abstract: Background: More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms.
Findings: Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. Using a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance when comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms.
Conclusions: Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.

353 citations
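The "90.56% of bases scoring Q30 and above" figure refers to Phred quality scores, where Q = -10·log10(p) and p is the probability that a base call is wrong; Q30 therefore means an expected error rate of 1 in 1,000. A minimal Python sketch of the percentage follows. It assumes FASTQ quality strings in Phred+33 encoding, which the abstract does not state, and the function names are hypothetical.

```python
def phred_error_probability(q):
    """Probability that a base call with Phred quality q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above_q30(quality_strings, offset=33):
    """Fraction of bases whose Phred quality is >= 30.

    Assumes each quality character encodes chr(q + offset); offset=33
    (Phred+33) is an assumption about the input format.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                passing += 1
    return passing / total if total else 0.0


# Under Phred+33, 'I' encodes Q40, '?' encodes Q30 and '5' encodes Q20.
print(fraction_at_or_above_q30(["II?5"]))  # -> 0.75
print(phred_error_probability(30))
```

Multiplying the returned fraction by 100 gives a percentage directly comparable to the one reported.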

Journal Article
TL;DR: A comparative assessment of DL tools against other existing techniques, with respect to decision accuracy, data size requirement, and applicability in various scenarios is provided.

350 citations

Journal Article
TL;DR: Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests, and the work suggests that robust results from WGS studies will require large cohorts and strategies that consider the substantial multiple-testing burden.
Abstract: Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertions/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.

219 citations
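The abstract reduces 51,801 annotation categories to 4,123 effective tests but does not give the resulting significance threshold. Assuming a Bonferroni-style correction at a family-wise level of alpha = 0.05 (both the method and the level are assumptions here), the per-test threshold is 0.05 / 4,123. The hypothetical helpers below show how a nominally significant p-value can fail after correction, which is the burden the authors highlight.

```python
def corrected_threshold(alpha, n_effective_tests):
    """Bonferroni-style per-test threshold: alpha divided by the effective test count."""
    if n_effective_tests <= 0:
        raise ValueError("n_effective_tests must be positive")
    return alpha / n_effective_tests


def passing_tests(p_values, alpha=0.05, n_effective_tests=4123):
    """Return the p-values that remain significant after the assumed correction."""
    threshold = corrected_threshold(alpha, n_effective_tests)
    return [p for p in p_values if p < threshold]


threshold = corrected_threshold(0.05, 4123)
print(f"{threshold:.3g}")  # -> 1.21e-05
# 1e-3 and 0.04 would pass at an uncorrected 0.05 but fail after correction.
print(passing_tests([1e-6, 1e-3, 0.04]))  # -> [1e-06]
```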

Journal Article
07 Dec 2018 · Science
TL;DR: Analysis of the oldest genomes suggests that there was an early split within Beringian populations, giving rise to the Northern and Southern lineages. The wide and rapid spread of the early population suggests that their access to large portions of the hemisphere was essentially unrestricted, yet there are genomic and archaeological hints of an earlier human presence.
Abstract: Studies of the peopling of the Americas have focused on the timing and number of initial migrations. Less attention has been paid to the subsequent spread of people within the Americas. We sequenced 15 ancient human genomes spanning from Alaska to Patagonia; six are ≥10,000 years old (up to ~18× coverage). All are most closely related to Native Americans, including those from an Ancient Beringian individual and two morphologically distinct "Paleoamericans." We found evidence of rapid dispersal and early diversification that included previously unknown groups as people moved south. This resulted in multiple independent, geographically uneven migrations, including one that provides clues of a Late Pleistocene Australasian genetic signal, as well as a later Mesoamerican-related expansion. These led to complex and dynamic population histories from North to South America.

211 citations

Journal Article
TL;DR: A fully convolutional neural network is used to create time-resolved three-dimensional dense segmentations of heart images that can efficiently predict human survival.
Abstract: Motion analysis is used in computer vision to understand the behaviour of moving objects in sequences of images. Optimising the interpretation of dynamic biological systems requires accurate and precise motion tracking as well as efficient representations of high-dimensional motion trajectories so that these can be used for prediction tasks. Here we use image sequences of the heart, acquired using cardiac magnetic resonance imaging, to create time-resolved three-dimensional segmentations using a fully convolutional network trained on anatomical shape priors. This dense motion model formed the input to a supervised denoising autoencoder (4Dsurvival), which is a hybrid network consisting of an autoencoder that learns a task-specific latent code representation trained on observed outcome data, yielding a latent representation optimised for survival prediction. To handle right-censored survival outcomes, our network used a Cox partial likelihood loss function. In a study of 302 patients, the predictive accuracy (quantified by Harrell's C-index) was significantly higher (p = .0012) for our model (C = 0.75, 95% CI: 0.70-0.79) than for the human benchmark (C = 0.59, 95% CI: 0.53-0.65). This work demonstrates how a complex computer vision task using high-dimensional medical image data can efficiently predict human survival.

189 citations
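Harrell's C-index, used above to compare 4Dsurvival (C = 0.75) with the human benchmark (C = 0.59), is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival times: 0.5 corresponds to random ordering and 1.0 to perfect ordering. The O(n²) Python sketch below illustrates the metric under the usual handling of right-censoring. It is not the 4Dsurvival code, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly earlier time than subject j. The pair is
    concordant when the model assigns i the higher risk score; tied risk
    scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: subject 2 is censored; 3 of the 5 comparable pairs are concordant.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.2, 0.7, 0.5]
print(harrell_c_index(times, events, risks))  # -> 0.6
```

Pairs anchored on a censored subject are skipped because the true event time is unknown, which is the same right-censoring issue the Cox partial likelihood loss addresses during training.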