Author
Johannes Andries Roubos
Other affiliations: Delft University of Technology, Netherlands Bioinformatics Centre, Wageningen University and Research Centre
Bio: Johannes Andries Roubos is an academic researcher from DSM. The author has contributed to research in topics including fuzzy logic and CRISPR. The author has an h-index of 27 and has co-authored 82 publications receiving 4,111 citations. Previous affiliations of Johannes Andries Roubos include Delft University of Technology and the Netherlands Bioinformatics Centre.
Topics: Fuzzy logic, CRISPR, Aspergillus niger, Genome, Peptide sequence
Papers
••
DSM1, Delft University of Technology2, University of Nottingham3, Technical University of Denmark4, Wageningen University and Research Centre5, University of Sheffield6, Utrecht University7, Biomax Informatics AG8, CLC bio9, University of Liverpool10, Ghent University11, University of Manchester12, University of Provence13, University of Groningen14, Pasteur Institute15, University of Amsterdam16, University of Angers17, Leiden University18, Radboud University Nijmegen19, University of Szeged20
TL;DR: The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid, and the sequenced genome revealed a large number of major facilitator superfamily transporters and fungal zinc binuclear cluster transcription factors.
Abstract: The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level of synteny was observed with other aspergilli sequenced. Strong function predictions were made for 6,506 of the 14,165 open reading frames identified. A detailed description of the components of the protein secretion pathway was made and striking differences in the hydrolytic enzyme spectra of aspergilli were observed. A reconstructed metabolic network comprising 1,069 unique reactions illustrates the versatile metabolism of A. niger. Noteworthy is the large number of major facilitator superfamily transporters and fungal zinc binuclear cluster transcription factors, and the presence of putative gene clusters for fumonisin and ochratoxin A synthesis.
1,161 citations
••
TL;DR: Genes predicted to encode transporters were strongly overrepresented among the genes transcriptionally upregulated under conditions that stimulate penicillin G production, illustrating potential for future genomics-driven metabolic engineering.
Abstract: Industrial penicillin production with the filamentous fungus Penicillium chrysogenum is based on an unprecedented effort in microbial strain improvement. To gain more insight into penicillin synthesis, we sequenced the 32.19 Mb genome of P. chrysogenum Wisconsin 54-1255 and identified numerous genes responsible for key steps in penicillin production. DNA microarrays were used to compare the transcriptomes of the sequenced strain and a penicillin G high-producing strain, grown in the presence and absence of the side-chain precursor phenylacetic acid. Transcription of genes involved in biosynthesis of valine, cysteine and alpha-aminoadipic acid (precursors for penicillin biosynthesis), as well as of genes encoding microbody proteins, was increased in the high-producing strain. Some gene products were shown to be directly controlling beta-lactam output. Many key cellular transport processes involving penicillins and intermediates remain to be characterized at the molecular level. Genes predicted to encode transporters were strongly overrepresented among the genes transcriptionally upregulated under conditions that stimulate penicillin G production, illustrating potential for future genomics-driven metabolic engineering.
457 citations
••
Technical University of Denmark1, DSM2, Pacific Northwest National Laboratory3, Biomax Informatics AG4, Novozymes5, University of Göttingen6, University of Seville7, Concordia University8, Chr. Hansen9, United States Department of Energy10, Stanford University11, Vienna University of Technology12, Los Alamos National Laboratory13
TL;DR: In this article, the authors performed whole-genome sequencing of the Aspergillus niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality.
Abstract: The filamentous fungus Aspergillus niger exhibits great diversity in its phenotype. It is found globally, both as marine and terrestrial strains, produces both organic acids and hydrolytic enzymes in high amounts, and some isolates exhibit pathogenicity. Although the genome of an industrial enzyme-producing A. niger strain (CBS 513.88) has already been sequenced, the versatility and diversity of this species compel additional exploration. We therefore undertook whole-genome sequencing of the acidogenic A. niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality. Only 15 gaps are present in the sequence, and half the telomeric regions have been elucidated. Moreover, sequence information from ATCC 1015 was used to improve the genome sequence of CBS 513.88. Chromosome-level comparisons uncovered several genome rearrangements, deletions, a clear case of strain-specific horizontal gene transfer, and identification of 0.8 Mb of novel sequence. Single nucleotide polymorphisms per kilobase (SNPs/kb) between the two strains were found to be exceptionally high (average: 7.8, maximum: 160 SNPs/kb). High variation within the species was confirmed with exo-metabolite profiling and phylogenetics. Detailed lists of alleles were generated, and genotypic differences were observed to accumulate in metabolic pathways essential to acid production and protein synthesis. A transcriptome analysis supported up-regulation of genes associated with biosynthesis of amino acids that are abundant in glucoamylase A, tRNA-synthases, and protein transporters in the protein producing CBS 513.88 strain. Our results and data sets from this integrative systems biology analysis resulted in a snapshot of fungal evolution and will support further optimization of cell factories based on filamentous fungi.
308 citations
29 Apr 2011
TL;DR: In this paper, the authors performed whole-genome sequencing of the Aspergillus niger wild-type strain (ATCC 1015) and produced a genome sequence of very high quality.
306 citations
••
01 Mar 2003
TL;DR: An iterative approach for developing fuzzy classifiers is proposed: the initial model is derived from the data, after which feature selection and rule-base simplification are applied to reduce the model, while a genetic algorithm is used for parameter optimization.
Abstract: The automatic design of fuzzy rule-based classification systems based on labeled data is considered. It is recognized that both classification performance and interpretability are of major importance and effort is made to keep the resulting rule bases small and comprehensible. For this purpose, an iterative approach for developing fuzzy classifiers is proposed. The initial model is derived from the data and subsequently, feature selection and rule-base simplification are applied to reduce the model, while a genetic algorithm is used for parameter optimization. An application to the Wine data classification problem is shown.
193 citations
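The fuzzy-classifier abstract above describes deriving an initial rule base from labeled data. The following is a minimal sketch of that first step only, under assumptions not stated in the abstract: one rule per class, Gaussian membership functions fitted from per-class feature means and standard deviations, a product conjunction, and a winner-takes-all decision. The function names (`fit_rules`, `activation`, `classify`) and the toy data are hypothetical; feature selection, rule-base simplification, and genetic-algorithm tuning are not shown.

```python
import math
from statistics import mean, pstdev


def fit_rules(X, y):
    """Derive one fuzzy rule per class: a (center, width) pair per feature."""
    rules = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        # Guard against zero width when a feature is constant within a class.
        rules[label] = [(mean(c), max(pstdev(c), 1e-6)) for c in columns]
    return rules


def activation(rule, x):
    """Degree of fulfilment: product of Gaussian memberships over features."""
    return math.prod(
        math.exp(-0.5 * ((value - center) / width) ** 2)
        for value, (center, width) in zip(x, rule)
    )


def classify(rules, x):
    """Winner-takes-all: return the class whose rule fires most strongly."""
    return max(rules, key=lambda label: activation(rules[label], x))


# Hypothetical toy data: two well-separated classes with two features each.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["a", "a", "a", "b", "b", "b"]
rules = fit_rules(X, y)
```

With these toy values, `classify(rules, [1.1, 2.0])` selects class `"a"` and `classify(rules, [5.1, 6.1])` selects class `"b"`. Keeping one small rule per class reflects the abstract's emphasis on compact, interpretable rule bases, although the published method may construct its initial model differently.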
Cited by
01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly; the authors demonstrate that it improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
10,124 citations
01 Jan 2011
TL;DR: The flood of diverse genome-wide data calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data; the sheer volume and scope of the data pose a significant challenge to the development of such tools.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.
2,187 citations
••
University of New Mexico1, Los Alamos National Laboratory2, Novozymes3, University of Provence4, VTT Technical Research Centre of Finland5, Pacific Northwest National Laboratory6, Joint Genome Institute7, United States Department of Agriculture8, Vienna University of Technology9, Pontifical Catholic University of Chile10, Oregon State University11, Genencor12
TL;DR: This work assembled 89 scaffolds to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models, providing a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
Abstract: Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
1,085 citations
••
TL;DR: The role of big data in supporting smart manufacturing is discussed, a historical perspective on the data lifecycle in manufacturing is given, and a conceptual framework is proposed.
937 citations
••
TL;DR: Principles from electronic design automation (EDA) are applied to enable increased circuit complexity and to simplify the incorporation of synthetic gene regulation into genetic engineering projects, and it is demonstrated that engineering principles can be applied to identify and suppress errors that complicate the compositions of larger systems.
Abstract: INTRODUCTION Cells respond to their environment, make decisions, build structures, and coordinate tasks. Underlying these processes are computational operations performed by networks of regulatory proteins that integrate signals and control the timing of gene expression. Harnessing this capability is critical for biotechnology projects that require decision-making, control, sensing, or spatial organization. It has been shown that cells can be programmed using synthetic genetic circuits composed of regulators organized to generate a desired operation. However, the construction of even simple circuits is time-intensive and unreliable. RATIONALE Electronic design automation (EDA) was developed to aid engineers in the design of semiconductor-based electronics. In an effort to accelerate genetic circuit design, we applied principles from EDA to enable increased circuit complexity and to simplify the incorporation of synthetic gene regulation into genetic engineering projects. We used the hardware description language Verilog to enable a user to describe a circuit function. The user also specifies the sensors, actuators, and “user constraints file” (UCF), which defines the organism, gate technology, and valid operating conditions. Cello (www.cellocad.org) uses this information to automatically design a DNA sequence encoding the desired circuit. This is done via a set of algorithms that parse the Verilog text, create the circuit diagram, assign gates, balance constraints to build the DNA, and simulate performance. RESULTS Cello designs circuits by drawing upon a library of Boolean logic gates. Here, the gate technology consists of NOT/NOR logic based on repressors. Gate connection is simplified by defining the input and output signals as RNA polymerase (RNAP) fluxes. We found that the gates need to be insulated from their genetic context to function reliably in the context of different circuits. 
Each gate is isolated using strong terminators to block RNAP leakage, and input interchangeability is improved using ribozymes and promoter spacers. These parts are varied for each gate to avoid breakage due to recombination. Measuring the load of each gate and incorporating this into the optimization algorithms further reduces evolutionary pressure. Cello was applied to the design of 60 circuits for Escherichia coli, where the circuit function was specified using Verilog code and transformed to a DNA sequence. The DNA sequences were built as specified with no additional tuning, requiring 880,000 base pairs of DNA assembly. Of these, 45 circuits performed correctly in every output state (up to 10 regulators and 55 parts). Across all circuits, 92% of the 412 output states functioned as predicted. CONCLUSION Our work constitutes a hardware description language for programming living cells. This required the co-development of design algorithms with gates that are sufficiently simple and robust to be connected by automated algorithms. We demonstrate that engineering principles can be applied to identify and suppress errors that complicate the compositions of larger systems. This approach leads to highly repetitive and modular genetics, in stark contrast to the encoding of natural regulatory networks. The use of a hardware-independent language and the creation of additional UCFs will allow a single design to be transformed into DNA for different organisms, genetic endpoints, operating conditions, and gate technologies.
813 citations
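The Cello abstract above notes that its gate library consists of NOT/NOR logic. As a small illustration of why such a library can suffice, the sketch below builds OR, AND, and XOR from a single NOR primitive and enumerates their truth tables. It models only Boolean logic; it is not Cello's software, Verilog input, or gate-assignment algorithm, and all function names are hypothetical.

```python
from itertools import product


def nor(a: bool, b: bool) -> bool:
    """The only primitive: output is high only when both inputs are low."""
    return not (a or b)


def not_(a: bool) -> bool:
    # NOT is a NOR gate with both inputs tied together.
    return nor(a, a)


def or_(a: bool, b: bool) -> bool:
    # OR is an inverted NOR.
    return not_(nor(a, b))


def and_(a: bool, b: bool) -> bool:
    # De Morgan: a AND b == NOR(NOT a, NOT b).
    return nor(not_(a), not_(b))


def xor_(a: bool, b: bool) -> bool:
    # True unless both inputs are low (NOR) or both are high (AND).
    return nor(nor(a, b), and_(a, b))


def truth_table(gate):
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return [gate(a, b) for a, b in product([False, True], repeat=2)]
```

For example, `truth_table(xor_)` evaluates to `[False, True, True, False]`. Each composite function here corresponds to a small network of NOR gates, which is the sense in which a NOR-based library can express arbitrary Boolean functions.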