Author

Katharina J. Hoff

Other affiliations: University of Göttingen
Bio: Katharina J. Hoff is an academic researcher at the University of Greifswald. The author has contributed to research on the topics of gene prediction and genomes. The author has an h-index of 16 and has co-authored 35 publications receiving 3,321 citations. Previous affiliations of Katharina J. Hoff include the University of Göttingen.

Papers
Journal Article
Kanchon K. Dasmahapatra, James R. Walters, Adriana D. Briscoe, John W. Davey, Annabel Whibley, Nicola J. Nadeau, Aleksey V. Zimin, Daniel S.T. Hughes, Laura Ferguson, Simon H. Martin, Camilo Salazar, James J. Lewis, Sebastian Adler, Seung-Joon Ahn, Dean A. Baker, Simon W. Baxter, Nicola Chamberlain, Ritika Chauhan, Brian A. Counterman, Tamas Dalmay, Lawrence E. Gilbert, Karl H.J. Gordon, David G. Heckel, Heather M. Hines, Katharina J. Hoff, Peter W. H. Holland, Emmanuelle Jacquin-Joly, Francis M. Jiggins, Robert T. Jones, Durrell D. Kapan, Paul J. Kersey, Gerardo Lamas, Daniel Lawson, Daniel Mapleson, Luana S. Maroja, Arnaud Martin, Simon Moxon, William J. Palmer, Riccardo Papa, Alexie Papanicolaou, Yannick Pauchet, David A. Ray, Neil Rosser, Steven L. Salzberg, Megan A. Supple, Alison K. Surridge, Ayşe Tenger-Trolander, Heiko Vogel, Paul A. Wilkinson, Derek Wilson, James A. Yorke, Furong Yuan, Alexi Balmuth, Cathlene Eland, Karim Gharbi, Marian Thomson, Richard A. Gibbs, Yi Han, Joy Jayaseelan, Christie Kovar, Tittu Mathew, Donna M. Muzny, Fiona Ongeri, Ling-Ling Pu, Jiaxin Qu, Rebecca Thornton, Kim C. Worley, Yuanqing Wu, Mauricio Linares, Mark Blaxter, Richard H. ffrench-Constant, Mathieu Joron, Marcus R. Kronforst, Sean P. Mullen, Robert D. Reed, Steven E. Scherer, Stephen Richards, James Mallet, W. Owen McMillan, Chris D. Jiggins
05 Jul 2012-Nature
TL;DR: It is inferred that closely related Heliconius species exchange protective colour-pattern genes promiscuously, implying that hybridization has an important role in adaptive radiation.
Abstract: Sequencing of the genome of the butterfly Heliconius melpomene shows that closely related Heliconius species exchange protective colour-pattern genes promiscuously.

1,103 citations

Journal Article
TL;DR: BRAKER1 is presented, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS; BRAKER1 was observed to be more accurate than MAKER2 when RNA-Seq is the sole source for training and prediction.
Abstract: MOTIVATION: Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. RESULTS: We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when it is using RNA-Seq as sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. AVAILABILITY AND IMPLEMENTATION: BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. CONTACT: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

809 citations

Journal Article
06 Jan 2021
TL;DR: The BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS; it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance.
Abstract: The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1 where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was a generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. It is favorably compared under equal conditions with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate solving the task of harmonization of annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies as well as in algorithmic development to reach the goal of highly accurate annotation of eukaryotic genomes.

455 citations

Book Chapter
TL;DR: This book chapter describes how to apply BRAKER in environments characterized by various combinations of external evidence, both RNA-Seq and protein alignments.
Abstract: BRAKER is a pipeline for highly accurate and fully automated gene prediction in novel eukaryotic genomes. It combines two major tools: GeneMark-ES/ET and AUGUSTUS. GeneMark-ES/ET learns its parameters from a novel genomic sequence in a fully automated fashion; if available, it uses extrinsic evidence for model refinement. From the protein-coding genes predicted by GeneMark-ES/ET, we select a set for training AUGUSTUS, one of the most accurate gene finding tools that, in contrast to GeneMark-ES/ET, integrates extrinsic evidence already into the gene prediction step. The first published version, BRAKER1, integrated genomic footprints of unassembled RNA-Seq reads into the training as well as into the prediction steps. The pipeline has since been extended to the integration of data on mapped cross-species proteins, and to the usage of heterogeneous extrinsic evidence, both RNA-Seq and protein alignments. In this book chapter, we briefly summarize the pipeline methodology and describe how to apply BRAKER in environments characterized by various combinations of external evidence.

382 citations

Journal Article
TL;DR: An improved honey bee genome assembly with a new gene annotation set is reported, showing that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0.
Abstract: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

370 citations


Cited by
Journal Article
TL;DR: Both BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem.

2,247 citations

Journal Article
01 Nov 2012-Genetics
TL;DR: A suite of methods for learning about population mixtures are presented, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture.
Abstract: Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean "Iceman."

1,877 citations

Journal Article
TL;DR: A perspective on the context and evolutionary significance of hybridization during speciation is offered, highlighting issues of current interest and debate and suggesting that the Dobzhansky–Muller model of hybrid incompatibilities requires a broader interpretation.
Abstract: Hybridization has many and varied impacts on the process of speciation. Hybridization may slow or reverse differentiation by allowing gene flow and recombination. It may accelerate speciation via adaptive introgression or cause near-instantaneous speciation by allopolyploidization. It may have multiple effects at different stages and in different spatial contexts within a single speciation event. We offer a perspective on the context and evolutionary significance of hybridization during speciation, highlighting issues of current interest and debate. In secondary contact zones, it is uncertain if barriers to gene flow will be strengthened or broken down due to recombination and gene flow. Theory and empirical evidence suggest the latter is more likely, except within and around strongly selected genomic regions. Hybridization may contribute to speciation through the formation of new hybrid taxa, whereas introgression of a few loci may promote adaptive divergence and so facilitate speciation. Gene regulatory networks, epigenetic effects and the evolution of selfish genetic material in the genome suggest that the Dobzhansky-Muller model of hybrid incompatibilities requires a broader interpretation. Finally, although the incidence of reinforcement remains uncertain, this and other interactions in areas of sympatry may have knock-on effects on speciation both within and outside regions of hybridization.

1,715 citations

Journal Article
TL;DR: This work presents BUSCO v3 with example analyses that highlight the wide‐ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
Abstract: Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.

1,575 citations

10 Dec 2007
TL;DR: The experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
Abstract: EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

1,528 citations