Journal Article

The First Highly Contiguous Genome Assembly of Pikeperch (Sander lucioperca), an Emerging Aquaculture Species in Europe

13 Sep 2019, Genes (Multidisciplinary Digital Publishing Institute), Vol. 10, Iss. 9, pp. 708
TL;DR: This draft genome sequence is the first genomic resource for this promising aquaculture species and will provide an impetus for genomic-based breeding studies targeting phenotypic and performance traits of captive pikeperch.
Abstract: The pikeperch (Sander lucioperca) is a freshwater and brackish-water percid fish native to the northern hemisphere. This species is emerging as a promising candidate for intensive aquaculture production in Europe. Specific traits such as cannibalism, growth rate and meat quality require a genomics-based understanding for optimal husbandry and domestication. Still, the aquaculture community lacks an annotated genome sequence to facilitate genome-wide studies on pikeperch. Here, we report the first highly contiguous draft genome assembly of Sander lucioperca. In total, 413 and 66 gigabase pairs (Gb) of raw DNA sequencing data were generated with the Illumina platform and the PacBio Sequel System, respectively. The PacBio data were assembled into a final assembly of ~900 Mb, covering 89% of the estimated genome size of 1,014 Mb. The draft genome consists of 1,966 contigs ordered into 1,313 scaffolds. The contig and scaffold N50 lengths are 3.0 Mb and 4.9 Mb, respectively. Identified repetitive structures account for 39% of the genome. Using homology to other ray-finned fishes and ab initio gene prediction methods, we predicted 21,249 protein-coding genes in the Sander lucioperca genome, 88% of which were functionally annotated by sequence homology or by protein domain and signature searches. The assembled genome contains 97.6% and 96.3% of the vertebrate and Actinopterygii single-copy orthologs, respectively. The high mapping rate (99.9%) of genomic paired-end (PE) reads to the assembly suggests an accurate and nearly complete genome reconstruction. This draft genome sequence is the first genomic resource for this promising aquaculture species. It will provide an impetus for genomic-based breeding studies targeting phenotypic and performance traits of captive pikeperch.
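The contig and scaffold N50 values above follow the standard assembly statistic: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The minimal Python sketch below assumes lengths are supplied as plain integers; the function names `n50` and `percent_of` are illustrative and not taken from the paper. The last line checks that ~900 Mb out of an estimated 1,014 Mb rounds to the 89% quoted in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the cumulative sum to >= 50%.
        if 2 * running >= total:
            return length


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(n50([5, 4, 3, 2, 1]))          # toy lengths, not real contigs; prints 4
    print(round(percent_of(900, 1014)))  # assembly vs. estimated genome size (Mb); prints 89
```

Real assembly tools compute the same quantity directly from a FASTA file; the sketch only makes the definition concrete.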
Citations
Journal Article
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining, uses heuristics to quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a distinct characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O(NLa + N sqrt(N)) memory and O(N sqrt(N) log(N) L a) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

2,436 citations
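The FastTree abstract contrasts its profile-based approach with computing and storing all pairwise Jukes-Cantor distances, which it puts at 50 gigabytes for 158,022 sequences. The sketch below shows the standard JC69 distance formula, d = -(3/4) ln(1 - (4/3) p), and a back-of-envelope memory estimate. It is not FastTree's code: the gap handling and the assumption of one 4-byte float per unordered pair are illustrative choices, and the helper names are hypothetical.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """JC69 distance between two aligned nucleotide sequences.

    d = -(3/4) * ln(1 - (4/3) * p), where p is the fraction of compared
    sites that differ. Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) columns")
    p = mismatches / compared
    if p >= 0.75:
        return math.inf  # saturated: the log argument would be <= 0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pairwise_matrix_bytes(n, bytes_per_entry=4):
    """Memory for storing each of the n*(n-1)/2 pairwise distances once."""
    return n * (n - 1) // 2 * bytes_per_entry


if __name__ == "__main__":
    print(round(jukes_cantor_distance("ACGT", "ACGA"), 4))        # prints 0.3041
    print(round(pairwise_matrix_bytes(158_022) / 1e9, 1))         # prints 49.9 (GB)
```

Under the 4-byte assumption the estimate lands at roughly 50 GB, consistent in order of magnitude with the figure in the abstract.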

10 Dec 2007
TL;DR: The experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
Abstract: EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

1,528 citations

Journal Article
TL;DR: This high-quality genome of the leopard coral grouper is the first genomic resource for Plectropomus and should provide a pivotal genetic foundation for further research.
Abstract: Leopard coral groupers belong to the Plectropomus genus of the Epinephelidae family and are important fish for coral reef ecosystems and the marine aquaculture industry. To promote future research of this species, a high-quality chromosome-level genome was assembled using PacBio sequencing and Hi-C technology. A 787.06 Mb genome was assembled, with 99.7% (784.57 Mb) of bases anchored to 24 chromosomes. The leopard coral grouper genome size was smaller than that of other groupers, which may be related to its ancient status among grouper species. A total of 22,317 protein-coding genes were predicted. This high-quality genome of the leopard coral grouper is the first genomic resource for Plectropomus and should provide a pivotal genetic foundation for further research. Phylogenetic analysis of the leopard coral grouper and 12 other fish species showed that this fish is closely related to the brown-marbled grouper. Expanded genes in the leopard coral grouper genome were mainly associated with immune response and movement ability, which may be related to the adaptive evolution of this species to its habitat. In addition, we also identified differentially expressed genes (DEGs) associated with carotenoid metabolism between red and brown-colored leopard coral groupers. These genes may play roles in determining skin color by regulating carotenoid content in these groupers.

30 citations


Additional excerpts

  • ... Strong correlations between repeat content and the genome have been observed in Perciformes (Nguinkal et al., 2019)....

  • ...(Nguinkal et al., 2019), Triplophysa tibetana (0.1...

  • ... The number of protein-coding genes was similar to other Perciformes fish such as pike-perch (21,249) (Nguinkal et al., 2019), Chinese sillago (Sillago sinica, 22,122) (Zhou et al., 2018), spotted sea bass (Lateolabrax maculatus, 22,015), northern snakehead (19,877) (Liu et al.,…

Journal Article
TL;DR: The estimate of the genome-wide average heterozygosity in the Atlantic silverside is among the highest reported for a fish, or any vertebrate, to date, and extreme levels of structural variation are found, affecting ~23% of the total genome sequence.
Abstract: The levels and distribution of standing genetic variation in a genome can provide a wealth of insights about the adaptive potential, demographic history, and genome structure of a population or species. As structural variants are increasingly associated with traits important for adaptation and speciation, investigating both sequence and structural variation is essential to fully tap this potential. Using a combination of shotgun sequencing, 10X Genomics linked reads and proximity-ligation data (Chicago and Hi-C), we produced and annotated a chromosome-level genome assembly for the Atlantic silverside (Menidia menidia), an established ecological model for studying the phenotypic effects of natural and artificial selection, and examined patterns of genomic variation across two individuals sampled from different populations with divergent local adaptations. Levels of diversity varied substantially across each chromosome, consistently being highly elevated near the ends (presumably near telomeric regions) and dipping to near zero around putative centromeres. Overall, our estimate of the genome-wide average heterozygosity in the Atlantic silverside is among the highest reported for a fish, or any vertebrate (1.32-1.76% depending on inference method and sample). Furthermore, we also found extreme levels of structural variation, affecting ∼23% of the total genome sequence, including multiple large inversions (> 1 Mb and up to 12.6 Mb) associated with previously identified haploblocks showing strong differentiation between locally adapted populations. These extreme levels of standing genetic variation are likely associated with large effective population sizes and may help explain the remarkable adaptive divergence among populations of the Atlantic silverside.

17 citations

Journal Article
TL;DR: Phylogenetic analysis showed that brown-marbled grouper and humpback grouper clustered into one clade that separated approximately 11–23 million years ago, and collinearity analyses showed that there was no obvious duplication of large fragments between chromosomes in the brown-marbled grouper.
Abstract: The brown-marbled grouper (Epinephelus fuscoguttatus) is an important species of fish in the coral reef ecosystem and marine aquaculture industry. In this study, a high-quality chromosome-level genome of brown-marbled grouper was assembled using Oxford Nanopore technology and Hi-C technology. The GC content and heterozygosity were approximately 42% and 0.35%, respectively. A total of 230 contigs with a total length of 1047 Mb and contig N50 of 13.8 Mb were assembled, and 228 contigs (99.13%) were anchored into 24 chromosomes. A total of 24,005 protein-coding genes were predicted, among which 23,862 (99.4%) predicted genes were annotated. Phylogenetic analysis showed that brown-marbled grouper and humpback grouper were clustered into one clade that separated approximately 11-23 million years ago. Collinearity analyses showed that there was no obvious duplication of large fragments between chromosomes in the brown-marbled grouper. Genomes of the humpback grouper and giant grouper showed high collinearity with that of the brown-marbled grouper. A total of 305 expanded gene families, mainly involved in disease resistance, were detected in the brown-marbled grouper genome. In addition, a genetic linkage map spanning 3,061.88 cM was constructed. Based on the physical and genetic maps, one growth-related quantitative trait locus (QTL) was detected at 32,332,447 bp on chromosome 20, and meox1 and etv4 were considered candidate growth-related genes. This study provides pivotal genetic resources for further evolutionary analyses and artificial breeding of groupers.

14 citations


Additional excerpts

  • ...(Nguinkal et al., 2019), Murray cod (Maccullochella peelii, 0.113...

References
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations


"The First Highly Contiguous Genome ..." refers to methods in this paper

  • ...0 [36], with an e-value cutoff of 1e-6 to align these homologous protein sequences to the Sander lucioperca genome....
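The excerpt above describes aligning homologous protein sequences to the genome with an e-value cutoff of 1e-6. In BLAST+ that threshold is normally passed at search time with `-evalue`; the minimal sketch below instead shows the equivalent post-filter on tabular output, assuming the default 12-column `-outfmt 6` layout. The function name `filter_by_evalue` and the sample rows are hypothetical, and treating the cutoff as inclusive (<=) is an illustrative choice.

```python
import csv
import io

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_by_evalue(tabular_text, max_evalue=1e-6):
    """Keep hits whose e-value is at or below max_evalue."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(STD_COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            kept.append(hit)
    return kept


if __name__ == "__main__":
    # Hypothetical hits: only the first passes the 1e-6 cutoff.
    sample = (
        "protA\tscaffold_1\t88.5\t310\t30\t2\t1\t310\t1001\t1930\t2e-80\t350\n"
        "protB\tscaffold_7\t31.0\t95\t60\t4\t5\t99\t5000\t5285\t0.003\t41.2\n"
    )
    for hit in filter_by_evalue(sample):
        print(hit["qseqid"], hit["sseqid"], hit["evalue"])  # prints: protA scaffold_1 2e-80
```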

Journal Article
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations
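The BWA abstract rests on backward search over the Burrows–Wheeler Transform. The toy Python sketch below builds a BWT from sorted rotations and counts exact occurrences of a pattern by narrowing a suffix-array interval one character at a time, right to left. It assumes a `$` sentinel that sorts before every text character (true for uppercase letters), recomputes occurrence counts naively, and handles exact matches only; BWA itself adds mismatch and gap handling and compact sampled tables, none of which is shown. The function names are illustrative.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern using FM-index backward search."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # C[c] = number of characters in the text that sort before c.
    C = {}
    for i, c in enumerate(sorted(bwt_str)):
        C.setdefault(c, i)

    def occ(c, i):
        # Occ(c, i): occurrences of c in bwt_str[:i]. Rescanning is O(n);
        # practical FM-indexes store sampled checkpoint counts instead.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("BANANA")
    print(transformed)                            # prints ANNB$AA
    print(count_occurrences(transformed, "ANA"))  # prints 2
```

HISAT, listed further down, builds its global and local FM indexes on the same transform.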

Journal Article
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

27,771 citations

Journal Article
TL;DR: Trinity, a method for de novo assembly of full-length transcripts, is presented and evaluated on samples from fission yeast, mouse and whitefly (the last of which has no reference genome yet), providing a unified solution for transcriptome reconstruction in any sample.
Abstract: Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, the last of which does not yet have a reference genome. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

15,665 citations
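Trinity's abstract centers on constructing and analyzing de Bruijn graphs. As a rough illustration of the underlying data structure only, the sketch below links each (k-1)-mer to the (k-1)-mers that follow it within some k-mer of a read, then extends a contig greedily while the path is unbranched. It is not Trinity's algorithm: error correction, reverse complements, graph partitioning and isoform resolution are all omitted, and the function names and reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def extend_unbranched(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ACGTC", "GTCAG"]  # hypothetical overlapping reads
    graph = de_bruijn_graph(reads, k=3)
    print(extend_unbranched(graph, "AC"))  # prints ACGTCAG
```

A branch point (a node with two successors) stops the walk; in a transcriptome such branches often correspond to alternative splicing or sequencing errors, which is where real assemblers do most of their work.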

Journal Article
TL;DR: Tests showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method, and requires only 4.3 gigabytes of memory.
Abstract: HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.

13,192 citations


"The First Highly Contiguous Genome ..." refers to background in this paper

  • ...1 [40], a splice-aware aligner, to detect splice junctions....
