scispace - formally typeset

Showing papers on "GenBank published in 2017"


Journal ArticleDOI
TL;DR: The web application GeSeq combines batch processing with a fully customizable reference sequence selection of organellar genome records from NCBI and/or references uploaded by the user to support high-quality annotations of chloroplast genomes.
Abstract: We have developed the web application GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) for the rapid and accurate annotation of organellar genome sequences, in particular chloroplast genomes. In contrast to existing tools, GeSeq combines batch processing with a fully customizable reference sequence selection of organellar genome records from NCBI and/or references uploaded by the user. For the annotation of chloroplast genomes, the application additionally provides an integrated database of manually curated reference sequences. GeSeq identifies genes or other feature-encoding regions by BLAT-based homology searches and additionally, by profile HMM searches for protein and rRNA coding genes and two de novo predictors for tRNA genes. These unique features enable the user to conveniently compare the annotations of different state-of-the-art methods, thus supporting high-quality annotations. The main output of GeSeq is a GenBank file that usually requires only little curation and is instantly visualized by OGDRAW. GeSeq also offers a variety of optional additional outputs that facilitate downstream analyses, for example comparative genomic or phylogenetic studies.

1,663 citations


Journal ArticleDOI
TL;DR: This manuscript describes a series of features and functionalities recently added to the Virus Variation Resource, a value-added viral sequence data resource hosted by the National Center for Biotechnology Information.
Abstract: The Virus Variation Resource is a value-added viral sequence data resource hosted by the National Center for Biotechnology Information. The resource is located at http://www.ncbi.nlm.nih.gov/genome/viruses/variation/ and includes modules for seven viral groups: influenza virus, Dengue virus, West Nile virus, Ebolavirus, MERS coronavirus, Rotavirus A and Zika virus. Each module is supported by pipelines that scan newly released GenBank records, annotate genes and proteins and parse sample descriptors and then map them to controlled vocabulary. These processes in turn support a purpose-built search interface where users can select sequences based on standardized gene, protein and metadata terms. Once sequences are selected, a suite of tools for downloading data, multi-sequence alignment and tree building supports a variety of user directed activities. This manuscript describes a series of features and functionalities recently added to the Virus Variation Resource.

311 citations


Journal ArticleDOI
TL;DR: All metazoan mitochondrial gene sequences were retrieved from GenBank, then quality filtered and formatted for use with taxonomic assignment tools; the mitochondrial Cytochrome oxidase subunit I gene was the most sequence-rich gene.
Abstract: Mitochondrial-encoded genes are increasingly targeted in studies using high-throughput sequencing approaches for characterizing metazoan communities from environmental samples (e.g., plankton, meiofauna, filtered water). Yet, unlike nuclear ribosomal RNA markers, there is to date no high-quality reference dataset available for taxonomic assignments. Here, we retrieved all metazoan mitochondrial gene sequences from GenBank, and then quality filtered and formatted the datasets for taxonomic assignments using taxonomic assignment tools. The reference datasets-'Midori references'-are available for download at www.reference-midori.info. Two versions are provided: (I) Midori-UNIQUE that contains all unique haplotypes associated with each species and (II) Midori-LONGEST that contains a single sequence, the longest, for each species. Overall, the mitochondrial Cytochrome oxidase subunit I gene was the most sequence-rich gene. However, sequences of the mitochondrial large ribosomal subunit RNA and Cytochrome b apoenzyme genes were observed for a large number of species in some phyla. The Midori reference is compatible with some taxonomic assignment software. Therefore, automated high-throughput sequence taxonomic assignments can be particularly effective using these datasets.
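The difference between the two dataset versions can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `build_reference_sets`, the `(species, sequence)` input shape, and the toy records are assumptions made for illustration, and no quality filtering is attempted.

```python
from collections import defaultdict


def build_reference_sets(records):
    """Derive UNIQUE-style and LONGEST-style sets from (species, sequence) pairs.

    Returns (unique, longest):
      unique  - species -> sorted list of distinct sequences (haplotypes)
      longest - species -> the single longest sequence for that species
    """
    haplotypes = defaultdict(set)
    for species, seq in records:
        # Upper-case so the same haplotype in different case is counted once.
        haplotypes[species].add(seq.upper())

    unique = {sp: sorted(seqs) for sp, seqs in haplotypes.items()}
    # Ties on length are broken lexicographically so the result is deterministic.
    longest = {sp: max(seqs, key=lambda s: (len(s), s)) for sp, seqs in unique.items()}
    return unique, longest


# Hypothetical toy input, not real GenBank data.
records = [
    ("Species A", "ACGTACGT"),
    ("Species A", "acgtacgt"),  # same haplotype, different case
    ("Species A", "ACGTAC"),
    ("Species B", "TTGACA"),
]
unique, longest = build_reference_sets(records)
```

Here `unique` keeps two haplotypes for "Species A", while `longest` keeps only the 8-base sequence.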

136 citations


Journal ArticleDOI
TL;DR: The results of the analysis supported the distinctiveness of Kitasatospora and Streptacidiphilus as validly named genera since they cluster outside of the phylogenetic radiation of the genus Streptomyces.
Abstract: The family Streptomycetaceae, notably species in the genus Streptomyces, have long been the subject of investigation due to their well-known ability to produce secondary metabolites. The emergence of drug resistant pathogens and the relative ease of producing genome sequences has renewed the importance of Streptomyces as producers of new natural products and resulted in revived efforts in isolating and describing strains from novel environments. A previous large study of the phylogeny in the Streptomycetaceae based on 16S rRNA gene sequences provided a useful framework for the relationships among species, but did not always have sufficient resolution to provide definitive identification. Multi-locus sequence analysis of 5 house-keeping genes has been shown to provide improved taxonomic resolution of Streptomyces species in a number of previous reports so a comprehensive study was undertaken to evaluate evolutionary relationships among species within the family Streptomycetaceae where type strains are available in the ARS Culture Collection or genome sequences are available in GenBank. The results of the analysis supported the distinctiveness of Kitasatospora and Streptacidiphilus as validly named genera since they cluster outside of the phylogenetic radiation of the genus Streptomyces. There is also support for the transfer of a number of Streptomyces species to the genus Kitasatospora as well for reducing at least 31 species clusters to a single taxon. The multi-locus sequence database resulting from the study is a useful tool for identification of new isolates and the phylogenetic analysis presented also provides a road map for planning future genome sequencing efforts in the Streptomycetaceae.

90 citations


Journal ArticleDOI
TL;DR: The STR Sequencing Project (STRSeq) was initiated to facilitate the description of sequence-based alleles at the Short Tandem Repeat (STR) loci targeted in human identification assays and provides a framework for communication among laboratories.
Abstract: The STR Sequencing Project (STRSeq) was initiated to facilitate the description of sequence-based alleles at the Short Tandem Repeat (STR) loci targeted in human identification assays. This international collaborative effort, which has been endorsed by the ISFG DNA Commission, provides a framework for communication among laboratories. The initial data used to populate the project are the aggregate alleles observed in targeted sequencing studies across four laboratories: National Institute of Standards and Technology (N=1786), King's College London (N=1043), University of North Texas Health Sciences Center (N=839), and University of Santiago de Compostela (N=944), for a total of 4612 individuals. STRSeq data are maintained as GenBank records at the U.S. National Center for Biotechnology Information (NCBI), which participates in a daily data exchange with the DNA DataBank of Japan (DDBJ) and the European Nucleotide Archive (ENA). Each GenBank record contains the observed sequence of a STR region, annotation ("bracketing") of the repeat region and flanking region polymorphisms, information regarding the sequencing assay and data quality, and backward compatible length-based allele designation. STRSeq GenBank records are organized within a BioProject at NCBI (https://www.ncbi.nlm.nih.gov/bioproject/380127), which is sub-divided into: commonly used autosomal STRs, alternate autosomal STRs, Y-chromosomal STRs, and X-chromosomal STRs. Each of these categories is further divided into locus-specific BioProjects. The BioProject hierarchy facilitates access to the GenBank records by browsing, BLAST searching, or ftp download. Future plans include user interface tools at strseq.nist.gov, a pathway for submission of additional allele records by laboratories performing population sample sequencing and interaction with the STRidER web portal for quality control (http://strider.online).
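The idea behind "bracketing" a repeat region can be sketched in a few lines of Python. This is a naive illustration only: it assumes a fixed motif length and a fixed reading frame starting at the first base, which is not how STRSeq records are actually annotated, and the function name `bracket_repeats` and the example sequence are hypothetical.

```python
def bracket_repeats(seq, motif_len=4):
    """Collapse consecutive identical motifs into "[MOTIF]n" notation.

    Naive sketch: reads the sequence in a fixed frame of motif_len bases
    and merges adjacent identical motifs. Leftover bases at the end that
    do not fill a whole motif are appended unchanged.
    """
    seq = seq.upper()
    parts = []
    i = 0
    while i + motif_len <= len(seq):
        motif = seq[i:i + motif_len]
        count = 1
        j = i + motif_len
        # Extend the run while the next motif-sized window is identical.
        while seq[j:j + motif_len] == motif:
            count += 1
            j += motif_len
        parts.append(f"[{motif}]{count}")
        i = j
    if i < len(seq):
        parts.append(seq[i:])
    return " ".join(parts)


# Hypothetical repeat region, not taken from an STRSeq record.
bracketed = bracket_repeats("TCTATCTATCTATCTGTCTGTCTA")
```

For the toy input, `bracketed` is `[TCTA]3 [TCTG]2 [TCTA]1`; a length-based designation would only report six repeat units, which is why the sequence-level notation carries extra information.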

67 citations


Journal ArticleDOI
TL;DR: The characterization of the faecal virome of healthy chickens described here not only provides a description of the viruses encountered in such a niche but should also represent a baseline for future studies comparing viral populations in healthy and diseased chicken flocks.
Abstract: This study is focused on the identification of the faecal virome of healthy chickens raised in high-density, export-driven poultry farms in Brazil. Following high-throughput sequencing, a total of 7743 de novo-assembled contigs were constructed and compared with known nucleotide/amino acid sequences from the GenBank database. Analyses with blastx revealed that 279 contigs (4%) were related to sequences of eukaryotic viruses. Viral genome sequences (total or partial) indicative of members of recognized viral families, including Adenoviridae, Caliciviridae, Circoviridae, Parvoviridae, Picobirnaviridae, Picornaviridae and Reoviridae, were identified, some of those representing novel genotypes. In addition, a range of circular replication-associated protein encoding DNA viruses were also identified. The characterization of the faecal virome of healthy chickens described here not only provides a description of the viruses encountered in such a niche but should also represent a baseline for future studies comparing viral populations in healthy and diseased chicken flocks. Moreover, it may also be relevant for human health, since chickens represent a significant proportion of the animal protein consumed worldwide.

50 citations


Journal ArticleDOI
TL;DR: Findings from an exome study conducted in five affected individuals of a multiplex family with cleft palate only reveal ARHGAP29 to be a regulatory protein essential for proper development of the face, identify an amino acid that is key for this, and provide a potential new diagnostic tool.
Abstract: Background Recent advances in genomics methodologies, in particular the availability of next-generation sequencing approaches have made it possible to identify risk loci throughout the genome, in particular the exome. In the current study, we present findings from an exome study conducted in five affected individuals of a multiplex family with cleft palate only. Methods The GEnome MINIng (GEMINI) pipeline was used to functionally annotate the single nucleotide polymorphisms, insertions and deletions. Filtering methods were applied to identify variants that are clinically relevant and present in affected individuals at minor allele frequencies (≤1%) in the 1000 Genomes Project single nucleotide polymorphism database, Exome Aggregation Consortium, and Exome Variant Server databases. The bioinformatics tool Systems Tool for Craniofacial Expression-Based Gene Discovery was used to prioritize cleft candidates in our list of variants, and Sanger sequencing was used to validate the presence of identified variants in affected and unaffected relatives. Results Our analysis approach narrowed the candidates down to the novel missense variant in ARHGAP29 (GenBank: NM_004815.3, NP_004806.3; c.1654T>C [p.Ser552Pro]). A functional assay in zebrafish embryos showed that the encoded protein lacks the activity possessed by its wild-type counterpart, and migration assays revealed that keratinocytes transfected with wild-type ARHGAP29 migrated faster than counterparts transfected with the p.Ser552Pro ARHGAP29 variant or empty vector (control). Conclusion These findings reveal ARHGAP29 to be a regulatory protein essential for proper development of the face, identify an amino acid that is key for this, and provide a potential new diagnostic tool. Birth Defects Research 109:27-37, 2017. © 2016 Wiley Periodicals, Inc.
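The relationship between the coding-sequence position in c.1654T>C and residue 552 in p.Ser552Pro is simple arithmetic, sketched below in Python. The helper name `codon_from_cds_position` is an assumption for illustration, and the parser handles only plain single-base substitutions, not the full HGVS syntax.

```python
import re


def codon_from_cds_position(hgvs_c):
    """Map a simple substitution such as 'c.1654T>C' to its codon.

    Returns (codon_number, position_in_codon, ref_base, alt_base), where
    codon_number and position_in_codon are both 1-based.
    """
    m = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if m is None:
        raise ValueError(f"unsupported variant description: {hgvs_c!r}")
    pos = int(m.group(1))
    codon_number = (pos - 1) // 3 + 1       # three coding bases per codon
    position_in_codon = (pos - 1) % 3 + 1   # 1, 2 or 3
    return codon_number, position_in_codon, m.group(2), m.group(3)


result = codon_from_cds_position("c.1654T>C")
```

The result is codon 552, first base, T to C. In the standard genetic code, serine is encoded by TCN (among others) and proline by CCN, so a T>C change at the first codon position is consistent with the reported Ser-to-Pro substitution.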

46 citations


Journal ArticleDOI
TL;DR: Prediction of antigenic epitope indicated that SVA VP1 protein contained both potential B-cell and potential T-cell epitopes, and variation analysis in SVA in southern China was provided.
Abstract: Senecavirus A (SVA) is the only member of genus Senecavirus that causes vesicular lesions in pigs. We have characterized seven SVA isolates from different swine farms in Guangdong, China. The most variable isolate, CH-DL-01-2016, contained a single amino acid insertion at position 219-220 and a 16 amino acid insertion at position 250-251. The VP1 protein also had four nucleotide changes when compared to 31 other SVA VP1 sequences obtained from GenBank. These mutations have not been identified before. Phylogenetic trees demonstrated that the new SVA isolates were clustered into two different clades and shared 96.3%-97.1% similarity with US strains and 97.9%-98.3% similarity with Brazilian strains at the nucleotide level, respectively. Prediction of antigenic epitope indicated that SVA VP1 protein contained both potential B-cell and potential T-cell epitopes. This report provides information about variation analysis in SVA in southern China.
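Nucleotide-level similarity figures such as those quoted above are typically derived from aligned sequences. The abstract does not state which definition was used, so the Python sketch below is only one common convention (matches divided by gap-free alignment columns); the function name `percent_identity` and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped; identity is the
    share of the remaining columns in which both sequences agree.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: one mismatch in ten columns.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

Other conventions (for example, counting gap columns as mismatches) give different values for the same alignment, so published percentages are comparable only when the definition matches.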

41 citations


Journal ArticleDOI
TL;DR: This study is the first report of an Enterobacteriaceae strain harboring a chromosomally integrated blaNDM-1, which directly reveals the vertical spreading pattern of the gene.
Abstract: New Delhi metallo-β-lactamase-1 (NDM-1)-producing Enterobacteriaceae has disseminated rapidly throughout the world and poses an urgent threat to public health. Previous studies confirmed that the blaNDM-1 gene is typically carried in plasmids but rarely in chromosome. We discovered a multidrug-resistant Escherichia coli strain Y5, originating from a urine sample and containing the blaNDM-1 gene, which did not transfer by either conjugation or electrotransformation. We confirmed the possibility of its chromosome location by S1-pulsed-field gel electrophoresis (PFGE) and XbaI-PFGE, followed by Southern blotting. To determine the genomic background of blaNDM-1, the genome of Y5 was completely sequenced and compared to other reference genomes. The results of our study revealed that this isolate consists of a 4.8-Mbp chromosome and three plasmids, it is an epidemic clone of sequence type (ST) 167, and it shows 99% identity with Escherichia coli 6409 (GenBank accession no. CP010371), which lacks the same blaNDM-1 gene-surrounding structure as Y5. The blaNDM-1 gene is embedded in the chromosome along with two tandem copies of an insertion sequence common region 1 (ISCR1) element (sul1-ARR-3-cat-blaNDM-1-bleo-ISCR1), which appears intact in the plasmid from Proteus mirabilis (GenBank accession no. KP662515). The genomic context indicates that the ISCR1 element mediated the blaNDM-1 transposition from a single source plasmid to the chromosome. Our study is the first report of an Enterobacteriaceae strain harboring a chromosomally integrated blaNDM-1, which directly reveals the vertical spreading pattern of the gene. Close surveillance is urgently needed to monitor the emergence and potential spread of ST167 strains that harbor blaNDM-1.

41 citations


Journal ArticleDOI
TL;DR: Enterovirus D68 was rarely observed prior to a widespread outbreak in 2014; its reemergence in St. Louis in 2016 was observed, and the EV-D68 genomes from two patient samples were sequenced.
Abstract: To the Editor: During the current (2014) enterovirus/rhinovirus season in the United States, enterovirus D68 (EV-D68) is circulating at an unprecedented level. As of October 6, 2014, the Centers for Disease Control and Prevention (CDC) had confirmed 594 cases of EV-D68 infection in 43 states and the District of Columbia (http://www.cdc.gov/non-polio-enterovirus/outbreaks/EV-D68-outbreaks.html); the actual number of cases was undoubtedly much higher. In mid-August, hospitals in Missouri and Illinois noticed an increased number of patients with severe respiratory illness (1). We observed this pattern at St. Louis Children’s Hospital in St. Louis, Missouri. Resources for studying this virus are limited. Before the current season, only 7 whole-genome sequences and 5 additional complete coding sequences of the virus were available. Therefore, determining whether there are genomic elements associated with rapid spread or severe and unusual disease was not possible. To address these limitations, we determined the complete coding sequence of 1 strain from St. Louis by using high-throughput sequencing of nucleic acid from a clinical sample. To evaluate the sequence diversity in EV-D68 strains circulating in the St. Louis metropolitan area, we also generated partial-genome sequences from 8 more EV-D68–positive clinical samples from St. Louis. During the preparation of this article, CDC generated and submitted to GenBank 7 complete or nearly complete genome sequences from viruses obtained from the Midwest. We documented the diversity of the sequences of strains from St. Louis and compared them to publicly available sequences. The methods are described in brief here and in more detail in the Technical Appendix. This study was conducted under a protocol approved by the Human Research Protection Office of Washington University in St. Louis. 
Patients were categorized retrospectively as having mild, moderate, or severe disease if they had been discharged home from the emergency unit, admitted to general wards, or admitted to the pediatric intensive care unit, respectively. Residual material from a subset of nasopharyngeal specimens positive for rhinovirus/enterovirus (tested by the BioFire FilmArray Respiratory Panel [BioFire Diagnostics, Salt Lake City, UT, USA] at the Clinical Virology Laboratory, St. Louis Children’s Hospital) was selected for high-throughput sequencing. Total nucleic acid was extracted from clinical samples by using NucliSENS easyMAG (bioMerieux, Marcy l'Etoile, France) and used to make dual-indexed sequencing libraries. Enterovirus/rhinovirus sequences were enriched by using a NimbleGen custom sequence capture reagent (Roche/NimbleGen, Madison, WI, USA), which as of February 2014 was selective for all complete enterovirus and rhinovirus genomes in GenBank. Sequence data were generated on an Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA). Sequences were assembled with IDBA-UD (2) and manually improved. The most contiguous genome was annotated by using VIGOR (3). Publicly available sequences were downloaded and compared by using the National Institute of Allergy and Infectious Diseases Virus Pathogen Resource (http://www.viprbrc.org) (4). Variants were identified by using VarScan (5). The sequence was deposited in GenBank under accession no. KM881710, BioProject PRJNA263037. For 14 of the 17 samples, high-throughput sequencing data were interpretable (Technical Appendix Table); for the other 3 samples, the number of virus sequence reads was too low to distinguish them from sample cross-talk, which occurs during high-throughput sequencing analysis (6).
Of the 14 typed samples, EV-D68 sequences were detected in 7 of 10 samples from patients with severe disease, 2 of 2 with moderate disease, and 0 of 2 with mild disease. The complete coding sequence was assembled from sample EV-D68_STL_2014_12. The most closely related genomes from previous seasons were Thailand, CU134, and CU171 (7) (Figure, panel A). Several of the genome sequences obtained from Missouri strains from this season, which had been sequenced by CDC, were very similar to this genome sequence. Comparison of the virus protein 1 sequence with that of publicly available sequences indicated that the strain from St. Louis and the strain from Missouri (CDC) cluster with virus strains identified in Europe and Asia within the past several years (Figure, panel B). The St. Louis virus shared 97%–99% aa sequence identity with all other sequenced strains. We observed little variation in the strains from St. Louis because they shared 98%–99% nt sequence identity (Technical Appendix Figure). [Figure: Phylogenetic comparison of enterovirus D68 (EV-D68) obtained from St. Louis, Missouri, USA, in 2014, with other sequenced strains. The phylogenetic relationships of genome sequences (nucleotides) were estimated by using the maximum-likelihood method with ...] We provide a genome sequence from the 2014 outbreak of EV-D68 infection in St. Louis, Missouri. This sequence seems to be highly representative of the strains circulating in St. Louis during this time because the other genomes we partially sequenced are very similar. To our knowledge, no amino acids have been associated with virulence or increased infectivity of EV-D68; therefore, we cannot associate the changes we observed in these genomes to phenotypic traits. Because changes in the 5′ untranslated region have the potential to affect the rate of replication (8–10), it is possible that minor genome changes are responsible for the rapid spread and high severity of disease in 2014.
Correlation between clinical features of patients in conjunction with additional genomic analysis might provide further insight into the pathogenetic determinants of this strain. Therefore, the genome sequence of EV-D68 determined from the 2014 outbreak in St. Louis, Missouri, provides a resource for tracking and genomic comparison of this rapidly spreading virus. Technical Appendix: Supplemental methods.

34 citations


Journal ArticleDOI
27 Jul 2017-PeerJ
TL;DR: Correct annotations for GenBank sequences are suggested based on the phylogenetic validation of four Ganoderma species based on morphological features and multigene analysis, which supported the four species distinctions with high bootstrap support.
Abstract: Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species.

Journal ArticleDOI
TL;DR: Comparison of sequences with cognate sequences in GenBank from isolates with the same species names did not always give coherent data, reinforcing earlier studies that have shown large intraspecific variability in many Penicillium species.
Abstract: Penicillium is a large genus of common molds with over 400 described species; however, identification of individual species is difficult, including for those species that cause postharvest rots. In this study, blue rot fungi from stored apples and pears were isolated from a variety of hosts, locations, and years. Based on morphological and cultural characteristics and partial amplification of the β-tubulin locus, the isolates were provisionally identified as several different species of Penicillium. These isolates were investigated further using a suite of molecular DNA markers and compared to sequences of the ex-type for cognate species in GenBank, and were identified as P. expansum (3 isolates), P. solitum (3 isolates), P. carneum (1 isolate), and P. paneum (1 isolate). Three of the markers we used (ITS, internal transcribed spacer rDNA sequence; benA, β-tubulin; CaM, calmodulin) were suitable for distinguishing most of our isolates from one another at the species level. In contrast, we were unable to amplify RPB2 sequences from four of the isolates. Comparison of our sequences with cognate sequences in GenBank from isolates with the same species names did not always give coherent data, reinforcing earlier studies that have shown large intraspecific variability in many Penicillium species, as well as possible errors in some sequence data deposited in GenBank.

Journal ArticleDOI
TL;DR: The study demonstrates the existence of high genetic variability among PCV2 strains and the ability of this virus to rapidly evolve in Chinese porcine populations.
Abstract: Porcine circovirus type 2 (PCV2) is the cause of postweaning multisystemic wasting syndrome (PMWS), which encompasses several distinct symptoms in pigs. PCV2 infection and clinical incidence of PMWS have increased in recent years, possibly due to shifts in viral populations and mutations. In this study, we identified PCV2 strains currently afflicting pig populations in mainland China, because this is a prerequisite for developing a specific vaccine to control the spread of PMWS. We collected 235 tissue samples from 16 provinces between 2014 and 2016. Of these, 152 samples were positive for PCV2. We compared the sequences we obtained for the PCV2 capsid gene, ORF2, to those of the Chinese PCV2 sequences deposited in GenBank between 2002 and 2016 (n = 648). Phylogenetic analyses demonstrated that the PCV2d genotype was the most prevalent strain in the sample population included in GenBank and among the positive samples from this study. We also found one PCV2c strain among the GenBank sequences. Furthermore, PCV2a-2F was the predominant genotype in the PCV2a cluster. Amino acid sequence comparisons demonstrated 70.8–100% identity within PCV ORF2 and several consistent mutations in ORF2. More interestingly, six isolates were classified as recombinant strains. Cumulatively, this study represents the first comprehensive description of PCV2 strain distribution, including recent samples, in Chinese porcine populations. We demonstrate the existence of high genetic variability among PCV2 strains and the ability of this virus to rapidly evolve.

Journal ArticleDOI
TL;DR: The draft genome of Corchorus olitorius cv. JRO-524 (Navin), a leading dark jute variety, is presented.
Abstract: Here, we present the draft genome (377.3 Mbp) of Corchorus olitorius cv. JRO-524 (Navin), which is a leading dark jute variety developed from a cross between African (cv. Sudan Green) and indigenous (cv. JRO-632) types. We predicted from the draft genome a total of 57,087 protein-coding genes with annotated functions. We identified 1765 disease resistance-like and defense response genes in the jute genome. The annotated genes showed the highest sequence similarities with those of Theobroma cacao followed by Gossypium raimondii. Seven chromosome-scale genetically anchored pseudomolecules were constructed with a total size of 8.53 Mbp and used for synteny analyses with the cocoa and cotton genomes. Like other plant species, gypsy and copia retrotransposons were the most abundant classes of repeat elements in jute. The raw data of our study are available in the SRA database of NCBI with accession number SRX1506532. The genome sequence has been deposited at DDBJ/EMBL/GenBank under the accession LLWS00000000, and the version described in this paper will be the first version (LLWS01000000).
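The two accessions quoted above differ only in the two digits after the letter prefix, which is how whole-genome shotgun (WGS) accessions encode the assembly version. The Python sketch below splits an accession into prefix, version and serial number; the function name `parse_wgs_accession` is an assumption, and the pattern covers only the common layout of a 4- or 6-letter prefix, a 2-digit version and at least six further digits.

```python
import re

# Assumed layout: letter prefix, 2-digit assembly version, then a serial number.
WGS_PATTERN = re.compile(r"([A-Z]{4}|[A-Z]{6})(\d{2})(\d{6,})")


def parse_wgs_accession(accession):
    """Split a WGS-style accession into prefix, version and serial number."""
    m = WGS_PATTERN.fullmatch(accession)
    if m is None:
        raise ValueError(f"not a WGS-style accession: {accession!r}")
    prefix, version, serial = m.groups()
    return {"prefix": prefix, "version": int(version), "serial": int(serial)}


master = parse_wgs_accession("LLWS00000000")
first_version = parse_wgs_accession("LLWS01000000")
```

Both accessions share the prefix `LLWS` and an all-zero serial number; `master` has version 0 and `first_version` has version 1, matching the abstract's description of LLWS01000000 as the first version.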

Journal ArticleDOI
01 Jan 2017-Database
TL;DR: Recent taxonomic information was applied to perform a complete taxonomic audit of the genus Trichoderma in the NCBI Taxonomy database, and a list of quality records of the RPB2 gene obtained from type material in GenBank was collected to help validate future submissions.
Abstract: The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material were recently proposed and published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353

Journal ArticleDOI
TL;DR: This work proposes the establishment of ForCyt as a fully-regulated database of species that are commonly encountered in forensic investigations to allow confidence in future species identification and to ensure high quality forensic science.

Journal ArticleDOI
TL;DR: A novel virulent bacteriophage named vB_EfaP_IME199 that specifically infects Enterococcus faecium was isolated and characterized and can be taxonomically classified as a new member of the genus Ahjdlikevirus of family Podoviridae.
Abstract: A novel virulent bacteriophage named vB_EfaP_IME199 that specifically infects Enterococcus faecium was isolated and characterized. Its optimal multiplicity of infection was 0.01, and it had a 30 minute outbreak period. High-throughput sequencing revealed that the phage has a dsDNA genome of 18,838 bp with 22 open reading frames. The genome has very low homology to all other bacteriophage sequences in the GenBank database. Run-off sequencing experiments confirmed that vB_EfaP_IME199 has short inverted terminal repeats. Phylogenetic analysis indicated that vB_EfaP_IME199 can be taxonomically classified as a new member of the genus Ahjdlikevirus of family Podoviridae.

Journal ArticleDOI
TL;DR: The phylogenetic analysis of the 16S rRNA gene suggested that these strains may represent novel species of the Psychrobacter genus, and among the identified coding sequences of the genomes, mercury detoxification and biogeochemistry genes were found, as well as genes related to heavy metals and antibiotic resistance.
Abstract: To date, the genus Psychrobacter consists of 37 recognized species isolated from different sources; however, they are more frequently found in cold and other non-polar environments of low water activity. Some strains belonging to the genus have shown different enzymatic activities with potential applications in bioremediation or food industry. In the present study, the whole genome sequences of three Psychrobacter-like strains (C 20.9, Cmf 22.2 and Rd 27.2) isolated from reared clams in Galicia (Spain) are described. The sequenced genomes resulted in an assembly size of 3,143,782 bp for C 20.9 isolate, 3,168,467 bp for Cmf 22.2 isolate and 3,028,386 bp for Rd 27.2 isolate. Among the identified coding sequences of the genomes, mercury detoxification and biogeochemistry genes were found, as well as genes related to heavy metals and antibiotic resistance. Virulence-related features were also identified, such as the siderophore vibrioferrin or an aerobactin-like siderophore. The phylogenetic analysis of the 16S rRNA gene suggested that these strains may represent novel species of the Psychrobacter genus. The genome sequences of the Psychrobacter sp. strains have been deposited at DDBJ/EMBL/GenBank under the accession numbers MRYA00000000 (Cmf 22.2), MRYB00000000 (Rd 27.2) and MRYC00000000 (C 20.9), and the sequences can be found at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA353858.

Journal ArticleDOI
TL;DR: The results indicated that the MicroSEQ® D2 LSU rDNA Fungal Identification Kit was equivalent to the in-house developed ITS regions assay to identify fungi at the genus level.

Journal ArticleDOI
TL;DR: Genetic variability analysis of viral sequences obtained by HTS and sequences available in GenBank indicated that the coding regions in the different viral species are under purifying selection, and that recombination events occurred in the majority of the viral species analyzed.
Abstract: The application of high-throughput sequencing technologies (HTS) enables the recovery of many nucleotide sequence fragments from diseased plants and may help in pathogen identification. This study was designed to identify viruses infecting 15 grapevine (Vitis spp.) samples collected from experimental fields and vine collections and assess the genetic variability of the identified viruses. The virus-enriched dsRNAs were extracted from bark scrapings and sequenced using an Illumina platform. The paired-end reads were analyzed, assembled contigs were generated and identified as related to viruses. Contigs of 14 viruses have been identified, some of them covering large extensions of viral genomes or resulting in assembly of near-complete or complete genomes. Grapevine virus infections are usually mixed and the HTS assays were suitable to identify ten viruses already reported that traditionally infect grapevines in Brazil, one that has been recently identified (Grapevine Syrah virus 1) and others (Grapevine Cabernet Sauvignon reovirus, Grapevine Red Globe virus and Grapevine vein clearing virus) not previously reported in this country. Nucleotide identities among Brazilian isolates identified by HTS and homologous grapevine virus sequences in GenBank were high, ranging from 77% to 99%. Genetic variability analysis of viral sequences obtained by HTS and sequences available in GenBank indicated that the coding regions in the different viral species are under purifying selection, and that recombination events occurred in the majority of the viral species analyzed. The coat protein genes, generally, had lower genetic variability than the replicase and movement protein genes.

Journal ArticleDOI
TL;DR: DNA sequencing successfully identified all the 5 cestodes and 7 nematodes with cox1 gene sequences available in GenBank, with all these names appearing as the best match of the cox1 gene sequences of the corresponding clinical samples.

Journal ArticleDOI
31 Jan 2017-PLOS ONE
TL;DR: The selective pressure analysis showed that all HPV-33 and 4/6 HPV-58 E6/E7 major non-synonymous mutations were sites of positive selection and all variations were observed in sites belonging to major histocompatibility complex and/or B-cell predicted epitopes.
Abstract: Cancer of the cervix is associated with infection by certain types of human papillomavirus (HPV). The gene variants differ in immune responses and oncogenic potential. The E6 and E7 proteins encoded by high-risk HPV play a key role in cellular transformation. HPV-33 and HPV-58 types are highly prevalent among Chinese women. To study the gene intratypic variations, polymorphisms and positive selections of HPV-33 and HPV-58 E6/E7 in southwest China, HPV-33 (E6, E7: n = 216) and HPV-58 (E6, E7: n = 405) E6 and E7 genes were sequenced and compared to others submitted to GenBank. Phylogenetic trees were constructed by the maximum-likelihood and Kimura 2-parameter methods using MEGA 6 (Molecular Evolutionary Genetics Analysis version 6.0). The diversity of secondary structure was analyzed by PSIPred software. The selection pressures acting on the E6/E7 genes were estimated by PAML 4.8 (Phylogenetic Analysis by Maximum Likelihood version 4.8) software. The positive sites of HPV-33 and HPV-58 E6/E7 were contrasted by ClustalX 2.1. Among 216 HPV-33 E6 sequences, 8 single nucleotide mutations were observed with 6/8 non-synonymous and 2/8 synonymous mutations. The 216 HPV-33 E7 sequences showed 3 single nucleotide mutations that were non-synonymous. The 405 HPV-58 E6 sequences revealed 8 single nucleotide mutations with 4/8 non-synonymous and 4/8 synonymous mutations. Among 405 HPV-58 E7 sequences, 13 single nucleotide mutations were observed with 10/13 non-synonymous mutations and 3/13 synonymous mutations. The selective pressure analysis showed that all HPV-33 and 4/6 HPV-58 E6/E7 major non-synonymous mutations were sites of positive selection. All variations were observed in sites belonging to major histocompatibility complex and/or B-cell predicted epitopes. K93N and R145 (I/N) were observed in both HPV-33 and HPV-58 E6.

Journal ArticleDOI
TL;DR: The monophyly of each superfamily considered in this study was confirmed by the clades in the phylogenetic tree, and Cicadellidae was resolved as monophyletic by the phylogenetic analysis.

Journal ArticleDOI
TL;DR: Four markers showed useful differences in high-resolution melting analysis to identify nucleotide polymorphisms including single-nucleotide polymorphisms (SNPs), oligonucleotide polymorphisms, and insertions/deletions (InDels), and a combination of three markers was able to distinguish the geographical isolates into two groups.
Abstract: Clubroot is a soil-borne disease caused by the protist Plasmodiophora brassicae (P. brassicae). It is one of the most economically important diseases of Brassica rapa and other cruciferous crops as it can cause remarkable yield reductions. Understanding P. brassicae genetics, and developing efficient molecular markers, is essential for effective detection of harmful races of this pathogen. Samples from 11 Korean field populations of P. brassicae (geographic isolates), collected from nine different locations in South Korea, were used in this study. Genomic DNA was extracted from the clubroot-infected samples to sequence the ribosomal DNA. Primers and probes for P. brassicae were designed using a ribosomal DNA gene sequence from a Japanese strain available in GenBank (accession number AB526843; isolate NGY). The nuclear ribosomal DNA (rDNA) sequence of P. brassicae, comprising 6932 base pairs (bp), was cloned and sequenced and found to include the small subunits (SSUs) and a large subunit (LSU), internal transcribed spacers (ITS1 and ITS2), and a 5.8S region. Sequence variation was observed in both the SSU and LSU. Four markers showed useful differences in high-resolution melting analysis to identify nucleotide polymorphisms including single-nucleotide polymorphisms (SNPs), oligonucleotide polymorphisms, and insertions/deletions (InDels). A combination of three markers was able to distinguish the geographical isolates into two groups.

Journal ArticleDOI
TL;DR: The analyzed genomic data of this potentially virulent strain of C. coli will facilitate further understanding of this important foodborne pathogen most likely leading to better control strategies.
Abstract: Campylobacter is a major cause of foodborne illnesses worldwide. Campylobacter infections, commonly caused by ingestion of undercooked poultry and meat products, can lead to gastroenteritis and chronic reactive arthritis in humans. Whole genome sequencing (WGS) is a powerful technology that provides comprehensive genetic information about bacteria and is increasingly being applied to study foodborne pathogens: e.g., evolution, epidemiology/outbreak investigation, and detection. Herein we report the complete genome sequence of Campylobacter coli strain YH502 isolated from retail chicken in the United States. WGS, de novo assembly, and annotation of the genome revealed a chromosome of 1,718,974 bp and a mega-plasmid (pCOS502) of 125,964 bp. GC content of the genome was 31.2% with 1931 coding sequences and 53 non-coding RNAs. Multiple virulence factors including a plasmid-borne type VI secretion system and antimicrobial resistance genes (beta-lactams, fluoroquinolones, and aminoglycoside) were found. The presence of T6SS in a mobile genetic element (plasmid) suggests plausible horizontal transfer of these virulence genes to other organisms. The C. coli YH502 genome also harbors CRISPR sequences and associated proteins. Phylogenetic analysis based on average nucleotide identity and single nucleotide polymorphisms identified closely related C. coli genomes available in the NCBI database. Taken together, the analyzed genomic data of this potentially virulent strain of C. coli will facilitate further understanding of this important foodborne pathogen most likely leading to better control strategies. The chromosome and plasmid sequences of C. coli YH502 have been deposited in GenBank under the accession numbers CP018900.1 and CP018901.1, respectively.

Journal ArticleDOI
TL;DR: Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5–80.8% nucleotide identity and 95.4–97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity.
Abstract: Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5-80.8% nucleotide identity and 95.4-97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China.

Journal ArticleDOI
TL;DR: The complete genomic characterization of two E-18 strains isolated in Yunnan, China is described and it is indicated that frequent intertypic recombination might have occurred in the two Yunnan strains.
Abstract: Human echovirus 18 (E-18) is a member of the enterovirus B species. To date, sixteen full-length genome sequences of E-18 are available in the GenBank database. In this study, we describe the complete genomic characterization of two E-18 strains isolated in Yunnan, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the two Yunnan E-18 strains had 87.5% nucleotide identity and 96.3-96.5% amino acid identity with the Chinese strain. Phylogenetic and bootscanning analyses revealed that, in the P3 coding region, the two E-18 strains had higher identity with several other EV-B serotypes than with the other E-18 strains, especially in the 3B region with the swine vesicular disease virus (SVDV) strain HK70, indicating that frequent intertypic recombination might have occurred in the two Yunnan strains. This study contributes the complete genome sequences of E-18 to the GenBank database and provides valuable information on the molecular epidemiology of E-18 in China.

Journal ArticleDOI
Xianzhong Huang1, Lifei Yang1, Yuhuan Jin1, Jun Lin1, Fang Liu1 
TL;DR: The large-scale EST library obtained in this study provides first-hand information on gene sequences expressed in young leaves of A. pumila exposed to salt shock, and will facilitate the understanding of complex adaptive mechanisms for ephemerals.
Abstract: Arabidopsis pumila is an ephemeral plant and a close relative of the model plant Arabidopsis thaliana, but it possesses higher photosynthetic efficiency, a higher propagation rate, and higher salinity tolerance than A. thaliana, thus providing a candidate plant system for gene mining for environmental adaptation and salt tolerance. However, A. pumila is an under-explored resource for understanding the genetic mechanisms underlying abiotic stress adaptation. To improve our understanding of the molecular and genetic mechanisms of salt stress adaptation, more than 19,900 clones randomly selected from a cDNA library constructed previously from leaf tissue exposed to high-salinity shock were sequenced. A total of 16,014 high-quality expressed sequence tags (ESTs) were generated, which have been deposited in the dbEST division of GenBank under accession numbers JZ932319 to JZ948332. Clustering and assembly of these ESTs resulted in the identification of 8,835 unique sequences, consisting of 2,469 contigs and 6,366 singletons. The blastx results revealed 8,011 unigenes with significant similarity to known genes, while only 425 unigenes remained uncharacterized. Functional classification demonstrated an abundance of unigenes involved in binding, catalytic, structural or transporter activities, and in pathways of energy, carbohydrate, amino acid, or lipid metabolism. At least seven main classes of genes were related to salt tolerance among the 8,835 unigenes. Many previously reported salt tolerance genes were also present in this library, for example VP1, H+-ATPase, NHX1, SOS2, SOS3, NAC, MYB, ERF, LEA, and P5CS1. In addition, 251 transcription factors were identified from the library, classified into 42 families. Lastly, changes in expression of the 12 most abundant unigenes, 12 transcription factor genes, and 19 stress-related genes in the first 24 h of exposure to high-salinity stress conditions were monitored by qRT-PCR.
The large-scale EST library obtained in this study provides first-hand information on gene sequences expressed in young leaves of A. pumila exposed to salt shock. The rapid discovery of known or unknown genes related to salinity stress response in A. pumila will facilitate the understanding of complex adaptive mechanisms for ephemerals.
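The counts quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration and not part of the study: the helper name `accession_range_size` is hypothetical, and it assumes that the quoted accession range is contiguous and that both accessions share the same letter prefix.

```python
def accession_range_size(first: str, last: str) -> int:
    """Count accessions in an inclusive range (hypothetical helper).

    Assumes both accessions share a letter prefix and that the numeric
    parts form a contiguous block.
    """
    prefix_first = first.rstrip("0123456789")
    prefix_last = last.rstrip("0123456789")
    if prefix_first != prefix_last:
        raise ValueError("accessions must share the same letter prefix")
    start = int(first[len(prefix_first):])
    end = int(last[len(prefix_last):])
    if end < start:
        raise ValueError("last accession must not precede the first")
    return end - start + 1


# Accession range and cluster counts as quoted in the abstract.
n_ests = accession_range_size("JZ932319", "JZ948332")
n_unigenes = 2469 + 6366  # contigs + singletons

print(n_ests)      # 16014, matching the 16,014 ESTs reported
print(n_unigenes)  # 8835, matching the 8,835 unique sequences reported
```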

Journal ArticleDOI
TL;DR: It is argued that the observed discrepancies are due to incorrect taxonomic identification so that the GenBank accession number KJ956027 represents actually the mt genome of C szanaga erroneously identified as C czerskii, highlighting the potential negative consequences of entry errors, which once they are introduced tend to be propagated among databases and subsequent publications.
Abstract: The complete mitochondrial (mt) genome is sequenced in 2 individuals of the Cherskii's sculpin Cottus czerskii. A surprisingly high level of sequence divergence (10.3%) has been detected between the 2 genomes of C czerskii studied here and the GenBank mt genome of C czerskii (KJ956027). At the same time, a surprisingly low level of divergence (1.4%) has been detected between the GenBank C czerskii (KJ956027) and the Amur sculpin Cottus szanaga (KX762049, KX762050). We argue that the observed discrepancies are due to incorrect taxonomic identification so that the GenBank accession number KJ956027 represents actually the mt genome of C szanaga erroneously identified as C czerskii. Our results are of consequence concerning the GenBank database quality, highlighting the potential negative consequences of entry errors, which once they are introduced tend to be propagated among databases and subsequent publications. We illustrate the premise with the data on recombinant mt genome of the Siberian taimen Hucho taimen (NCBI Reference Sequence Database NC_016426.1; GenBank accession number HQ897271.1), bearing 2 introgressed fragments (≈0.9 kb [kilobase]) from 2 lenok subspecies, Brachymystax lenok and Brachymystax lenok tsinlingensis, submitted to GenBank on June 12, 2011. Since the time of submission, the H taimen recombinant mt genome leading to incorrect phylogenetic inferences was propagated in multiple subsequent publications despite the fact that nonrecombinant H taimen genomes were also available (submitted to GenBank on August 2, 2014; KJ711549, KJ711550). Other examples of recombinant sequences persisting in GenBank are also considered. A GenBank Entry Error Depositary is urgently needed to monitor and avoid a progressive accumulation of wrong biological information.
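Divergence percentages like those quoted in this abstract are commonly derived from aligned sequences. The abstract does not state which distance measure was used, so the sketch below assumes the simplest one, the uncorrected p-distance (proportion of differing sites). The function name and the toy sequences are hypothetical and are not data from the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two pre-aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. This is an illustrative choice, not the authors' method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments (hypothetical): one mismatch in ten sites.
print(f"{p_distance('ACGTACGTAC', 'ACGTTCGTAC'):.1%}")  # 10.0%
```

Under this measure, a 10.3% divergence corresponds to roughly one differing site in every ten compared, while 1.4% corresponds to roughly one in seventy.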

Journal ArticleDOI
TL;DR: From the genome sequences, the genes encoding plant growth promoting properties such as 1-aminocyclopropane-1-carboxylate deaminase (AcdS), phosphate solubilisation, siderophore, and IAA (indole acetic acid) production are identified.
Abstract: Here, we report the draft genome sequence and annotation of the plant growth promoting rhizobacterium Enterobacter cloacae SBP-8 isolated from the rhizosphere of Sorghum bicolor L. growing in the desert region of Rajasthan, India. From the genome sequences, we identified the genes encoding plant growth promoting properties such as 1-aminocyclopropane-1-carboxylate deaminase (AcdS), phosphate solubilisation, siderophore, and IAA (indole acetic acid) production. The genes encoding different functions required for colonization including motility, chemotaxis, adherence, and secretion systems (I, II, IV, VI) were also identified. The complete genome sequence of this bacterium consists of one chromosome (4,854,065 bp) and one plasmid (85,398 bp). The genome sequence of Enterobacter cloacae SBP-8 was deposited in GenBank under the accession numbers CP016906 (chromosome) and CP017413 (plasmid).