Author

Lei Zhang

Other affiliations: Life Technologies
Bio: Lei Zhang is an academic researcher from University of Oviedo. The author has contributed to research in topics: Carbyne & Ligand. The author has an h-index of 10 and has co-authored 13 publications receiving 763 citations. Previous affiliations of Lei Zhang include Life Technologies.
Topics: Carbyne, Ligand, Human genome, Cannabis, Acetonitrile

Papers
Journal ArticleDOI
TL;DR: Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual.
Abstract: We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.

595 citations

Journal ArticleDOI
TL;DR: The first next generation sequencing survey of the microbial communities found in dispensary based Cannabis flowers is described and the limitations in the culture-based regulations that are being superimposed from the food industry are demonstrated.
Abstract: The Centers for Disease Control and Prevention estimates 128,000 people in the U.S. are hospitalized annually due to foodborne illnesses. This has created a demand for food safety testing targeting the detection of pathogenic mold and bacteria on agricultural products. This risk extends to medical Cannabis and is of particular concern with inhaled, vaporized and even concentrated Cannabis products. As a result, third-party microbial testing has become a regulatory requirement in the medical and recreational Cannabis markets, yet knowledge of the Cannabis microbiome is limited. Here we describe the first next-generation sequencing survey of the fungal communities found in dispensary-based Cannabis flowers by ITS2 sequencing, and demonstrate the sensitive detection of several toxigenic Penicillium and Aspergillus species, including P. citrinum and P. paxilli, that were not detected by one or more culture-based methods currently in use for safety testing.

41 citations

Journal ArticleDOI
TL;DR: This study evaluated two widely used culture-based platforms for total yeast and mold testing marketed by 3M Corporation and Biomérieux in comparison with a quantitative PCR approach marketed by Medicinal Genomics Corporation, finding substantial shifts in the number and diversity of species present.
Abstract: Background: The presence of bacteria and fungi in medicinal or recreational Cannabis poses a potential threat to consumers if those microbes include pathogenic or toxigenic species. This study evaluated two widely used culture-based platforms for total yeast and mold (TYM) testing marketed by 3M Corporation and Biomérieux, in comparison with a quantitative PCR (qPCR) approach marketed by Medicinal Genomics Corporation. Methods: A set of 15 medicinal Cannabis samples was analyzed using 3M and Biomérieux culture-based platforms and by qPCR to quantify microbial DNA. All samples were then subjected to next-generation sequencing and metagenomics analysis to enumerate the bacteria and fungi present before and after growth on culture-based media. Results: Several pathogenic or toxigenic bacterial and fungal species were identified in proportions of >5% of classified reads on the samples, including Acinetobacter baumannii, Escherichia coli, Pseudomonas aeruginosa, Ralstonia pickettii, Salmonella enterica, Stenotrophomonas maltophilia, Aspergillus ostianus, Aspergillus sydowii, Penicillium citrinum and Penicillium steckii. Samples subjected to culture showed substantial shifts in the number and diversity of species present, including the failure of Aspergillus species to grow well on either platform. Substantial growth of Clostridium botulinum and other bacteria was frequently observed on one or both of the culture-based TYM platforms. The presence of plant growth promoting (beneficial) fungal species further influenced the differential growth of species in the microbiome of each sample. Conclusions: These findings have important implications for the Cannabis and food safety testing industries.

31 citations
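
The ">5% of classified reads" criterion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes per-species read counts are already available from a metagenomic classifier; the function name `abundant_species` and the counts below are hypothetical, made up for illustration, and are not data from the study.

```python
def abundant_species(read_counts, threshold=0.05):
    """Return species whose share of classified reads exceeds `threshold`.

    read_counts: dict mapping species name -> number of classified reads.
    The 0.05 default mirrors the >5% cut-off quoted in the abstract.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        species: count / total
        for species, count in read_counts.items()
        if count / total > threshold
    }


# Hypothetical counts for a single sample (illustrative only).
example_counts = {
    "Penicillium citrinum": 120,
    "Escherichia coli": 60,
    "Aspergillus sydowii": 30,
    "other": 790,
}
flagged = abundant_species(example_counts)
```

With these made-up counts, only entries above 5% of the 1,000 classified reads are kept, so a species at 3% is excluded.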

Journal ArticleDOI
TL;DR: In this article, neutral trans-cyanide alkenylcarbyne complexes 2a and 2b have been prepared by reaction of the complex 1a and 1b with NaCN or [Bu4N]CN.
Abstract: Neutral trans-cyanide alkenylcarbyne complexes 2a and 2b have been prepared by reaction of the complex 1a and 1b with NaCN or [Bu4N]CN. The reaction of complexes 2a and 2b with an equimolar amount of the acetonitrile complexes 1a and 1b in CH2Cl2 leads to the cationic cyanide-bridged bis(alkenylcarbyne) di-tungsten complexes 3a–d. Diisocyanide-bridged bis(alkenylcarbyne) di-tungsten complexes 4a and 4b have been synthesized by the reaction of complexes 1a and 1b with 0.5 equivalents of the diisocyanide 1,4-(CN)2C6H4. IR as well as 1H-, 31P{1H}-, 13C{1H}-, and 183W-NMR data are reported. The spectroscopic data show that in the dinuclear complexes 3a–d, the bridging CN group and the alkenylcarbyne units are located in trans positions, while in the dinuclear complexes 4a and 4b, the isocyanide groups of the bridging ligand 1,4-(CN)2C6H4 and the two alkenylcarbyne moieties are cis. The 183W chemical shifts of complexes 2a, 2b, 3a–d, 4a, and 4b were obtained through two-dimensional indirect 31P, 183W NMR recording techniques. A downfield shifting of 183W resonances of the cyanide-bridged dinuclear complexes 3a–d with respect to the mononuclear ones, 2a and 2b, was observed. The δ183W of isocyanide bridging dinuclear complexes 4a and 4b appear at higher field than those of the corresponding mononuclear cyanide 2a and 2b in accordance with the higher π-acceptor electron properties of the isocyanide ligand. The electrochemical behaviour of all the complexes has been investigated by cyclic voltammetry and controlled potential electrolysis in aprotic media and at a Pt (or vitreous C) electrode. Complexes 1, 2, or 3 undergo multi-electron irreversible oxidation processes involving anodically induced proton dissociation from the alkenylcarbyne ligands, and irreversible cathodic processes are also observed for all the complexes. The order of the redox potentials reflects that of the net electron π-acceptor/σ-donor character of the ligands and the ligating alkenylcarbynes are shown to behave as remarkably strong π-electron acceptors (even stronger than CO).

18 citations


Cited by
Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations
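
The GATK abstract above mentions a MapReduce-style design and coverage calculators as an example tool. The following is a minimal sketch of that general idea in plain Python, not the GATK's actual API: the read representation (a `(start, length)` tuple on a single contig) and the names `map_read`, `reduce_counts`, and `coverage_depth` are assumptions introduced here for illustration.

```python
from collections import Counter
from functools import reduce


def map_read(read):
    """Map step: emit a depth of 1 for every position one read covers.

    `read` is a hypothetical (start, length) tuple, 0-based, on one contig.
    """
    start, length = read
    return Counter(range(start, start + length))


def reduce_counts(accumulated, partial):
    """Reduce step: merge one read's per-position counts into the total."""
    accumulated.update(partial)  # Counter.update adds counts together
    return accumulated


def coverage_depth(reads):
    """Per-position depth of coverage computed as a map followed by a reduce."""
    return reduce(reduce_counts, map(map_read, reads), Counter())


# Three made-up reads overlapping around position 4.
example_reads = [(0, 5), (2, 5), (4, 3)]
depth = coverage_depth(example_reads)
mean_depth = sum(depth.values()) / len(depth)
```

Keeping the per-read calculation (`map_read`) separate from the merging step (`reduce_counts`) is what lets a framework parallelize the traversal without changing the analysis code, which is the separation the abstract describes.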

Journal ArticleDOI
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Abstract: Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.

10,056 citations

Journal ArticleDOI
TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.
Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations

Journal ArticleDOI
21 Jul 2011 - Nature
TL;DR: A DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes, showing its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
Abstract: The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

2,246 citations

Journal ArticleDOI
TL;DR: It is argued that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.
Abstract: Comparisons of human genomes show that more base pairs are altered as a result of structural variation — including copy number variation — than as a result of point mutations. Here we review advances and challenges in the discovery and genotyping of structural variation. The recent application of massively parallel sequencing methods has complemented microarray-based methods and has led to an exponential increase in the discovery of smaller structural-variation events. Some global discovery biases remain, but the integration of experimental and computational approaches is proving fruitful for accurate characterization of the copy, content and structure of variable regions. We argue that the long-term goal should be routine, cost-effective and high quality de novo assembly of human genomes to comprehensively assess all classes of structural variation.

1,384 citations