Author

Bernard J. Pope

Bio: Bernard J. Pope is an academic researcher from the University of Melbourne. The author has contributed to research in topics: Massive parallel sequencing & Lynch syndrome. The author has an h-index of 20 and has co-authored 65 publications receiving 1,997 citations. Previous affiliations of Bernard J. Pope include Monash University (Clayton campus).


Papers
Journal Article
TL;DR: This work presents SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data, which is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment.
Abstract: Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/.

820 citations

Journal Article
TL;DR: Oligodendrocyte development and myelination rely on an unusual membrane-associated transcription factor that shares functional domains with bacteriophage proteins.
Abstract: The myelination of axons is a crucial step during vertebrate central nervous system (CNS) development, allowing for rapid and energy efficient saltatory conduction of nerve impulses. Accordingly, the differentiation of oligodendrocytes, the myelinating cells of the CNS, and their expression of myelin genes are under tight transcriptional control. We previously identified a putative transcription factor, Myelin Regulatory Factor (Myrf), as being vital for CNS myelination. Myrf is required for the generation of CNS myelination during development and also for its maintenance in the adult. It has been controversial, however, whether Myrf directly regulates transcription, with reports of a transmembrane domain and lack of nuclear localization. Here we show that Myrf is a membrane-associated transcription factor that undergoes an activating proteolytic cleavage to separate its transmembrane domain-containing C-terminal region from a nuclear-targeted N-terminal region. Unexpectedly, this cleavage event occurs via a protein domain related to the autoproteolytic intramolecular chaperone domain of the bacteriophage tail spike proteins, the first time this domain has been found to play a role in eukaryotic proteins. Using ChIP-Seq we show that the N-terminal cleavage product directly binds the enhancer regions of oligodendrocyte-specific and myelin genes. This binding occurs via a defined DNA-binding consensus sequence and strongly promotes the expression of target genes. These findings identify Myrf as a novel example of a membrane-associated transcription factor and provide a direct molecular mechanism for its regulation of oligodendrocyte differentiation and CNS myelination.

202 citations

Journal Article
TL;DR: The identification of XRCC2 as a breast cancer susceptibility gene thus increases the proportion of breast cancers that are associated with homologous recombination-DNA-repair dysfunction and Fanconi anemia and could therefore benefit from specific targeted treatments such as PARP (poly ADP ribose polymerase) inhibitors.
Abstract: An exome-sequencing study of families with multiple breast-cancer-affected individuals identified two families with XRCC2 mutations, one with a protein-truncating mutation and one with a probably deleterious missense mutation. We performed a population-based case-control mutation-screening study that identified six probably pathogenic coding variants in 1,308 cases with early-onset breast cancer and no variants in 1,120 controls (the severity grading was p < 0.02). We also performed additional mutation screening in 689 multiple-case families. We identified ten breast-cancer-affected families with protein-truncating or probably deleterious rare missense variants in XRCC2. Our identification of XRCC2 as a breast cancer susceptibility gene thus increases the proportion of breast cancers that are associated with homologous recombination-DNA-repair dysfunction and Fanconi anemia and could therefore benefit from specific targeted treatments such as PARP (poly ADP ribose polymerase) inhibitors. This study demonstrates the power of massively parallel sequencing for discovering susceptibility genes for common, complex diseases.

178 citations

Posted Content
28 Jul 2014 - bioRxiv
TL;DR: Using >900 genomes from common pathogens, SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment and represents a powerful tool for rapidly extracting clinically useful information from raw WGS data.
Abstract: Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. Here we have demonstrated the use of SRST2 for microbial genome surveillance in a variety of public health and hospital settings. In the face of rising threats of antimicrobial resistance and emerging virulence amongst bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/

163 citations

Journal Article
TL;DR: Bpipe is a simple, dedicated programming language for defining and executing bioinformatics pipelines that is fully self-contained and cross-platform, making it very easy to adopt and deploy into existing environments.
Abstract: Summary: Bpipe is a simple, dedicated programming language for defining and executing bioinformatics pipelines. It specializes in enabling users to turn existing pipelines based on shell scripts or command line tools into highly flexible, adaptable and maintainable workflows with a minimum of effort. Bpipe ensures that pipelines execute in a controlled and repeatable fashion and keeps audit trails and logs to ensure that experimental results are reproducible. Requiring only Java as a dependency, Bpipe is fully self-contained and cross-platform, making it very easy to adopt and deploy into existing environments. Availability and implementation: Bpipe is freely available from http://bpipe.org under a BSD License.

157 citations


Cited by
01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal Article
Seth Carbon, Eric Douglass, Nathan Dunn, Benjamin M. Good, and 189 more authors (19 institutions)
TL;DR: GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models.
Abstract: The Gene Ontology resource (GO; http://geneontology.org) provides structured, computable knowledge regarding the functions of genes and gene products. Founded in 1998, GO has become widely adopted in the life sciences, and its contents are under continual improvement, both in quantity and in quality. Here, we report the major developments of the GO resource during the past two years. Each monthly release of the GO resource is now packaged and given a unique identifier (DOI), enabling GO-based analyses on a specific release to be reproduced in the future. The molecular function ontology has been refactored to better represent the overall activities of gene products, with a focus on transcription regulator activities. Quality assurance efforts have been ramped up to address potentially out-of-date or inaccurate annotations. New evidence codes for high-throughput experiments now enable users to filter out annotations obtained from these sources. GO-CAM, a new framework for representing gene function that is more expressive than standard GO annotations, has been released, and users can now explore the growing repository of these models. We also provide the ‘GO ribbon’ widget for visualizing GO annotations to a gene; the widget can be easily embedded in any web page.

2,138 citations

Journal Article
TL;DR: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow.
Abstract: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames.

1,932 citations
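
The Snakemake abstract above mentions named wildcards that are inferred automatically from input and output filenames. The idea can be illustrated with a minimal Python sketch: a requested target file is matched against an output pattern, and the captured values are substituted into the input patterns. This is not Snakemake's implementation; the function names (`pattern_to_regex`, `infer_wildcards`, `resolve_inputs`) and the file patterns are hypothetical and chosen only for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern such as 'mapped/{sample}.{ref}.bam' into a regex
    with one named group per wildcard (hypothetical helper)."""
    # Splitting on a single capture group alternates literal text (even
    # indices) with wildcard names (odd indices).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal filename text
        elif part in seen:
            regex += f"(?P={part})"        # repeated wildcard must match the same value
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+)"     # first use of a wildcard
    return re.compile(regex)


def infer_wildcards(output_pattern, target):
    """Return the wildcard values that make output_pattern equal target,
    or None if the target cannot be produced by this pattern."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    return match.groupdict() if match else None


def resolve_inputs(input_patterns, wildcards):
    """Fill the inferred wildcard values into each input pattern."""
    return [p.format(**wildcards) for p in input_patterns]


# Requesting one concrete output file determines the concrete inputs.
wildcards = infer_wildcards("mapped/{sample}.{ref}.bam", "mapped/A.hg19.bam")
inputs = resolve_inputs(["reads/{sample}.fastq", "refs/{ref}.fa"], wildcards)
```

Here the single requested target fixes both `sample` and `ref`, which is what lets one rule definition serve many files without editing the workflow. With greedy matching, values that themselves contain the separator character can be ambiguous, so this sketch assumes wildcard values without dots.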

Journal Article
24 Sep 2018
TL;DR: Developments in the BIGSdb software made from publication to June 2018 are described and it is shown how the platform realises microbial population genomics for a wide range of applications.
Abstract: The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data permits highly scalable (whole genome sequence data for tens of thousands of isolates) means of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question.
In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.

1,349 citations

Journal Article
TL;DR: The utility of Toil is demonstrated by creating one of the single largest, consistently analyzed, public human RNA-seq expression repositories.
Abstract: [...] comprehensive analyses. Further, it means that results can be reproduced using the original computation’s set of tools and parameters. If we had run the original TCGA best-practices RNA-seq pipeline with one sample per node, it would have cost ~$800,000. Through the use of efficient algorithms (STAR and Kallisto) and Toil, we were able to reduce the final cost to $26,071 (Supplementary Note 9). We have demonstrated the utility of Toil by creating one of the single largest, consistently analyzed, public human RNA-seq expression repositories, which we hope the community will find useful.

1,309 citations