Journal • ISSN: 1471-2164

BMC Genomics

Springer Science+Business Media

About: BMC Genomics is an academic journal published by Springer Science+Business Media. The journal publishes mainly in the areas of Gene and Genome. Its ISSN is 1471-2164, and it is open access. Over its lifetime, the journal has published 16053 publications, which have received 658393 citations. The journal is also known as BioMed Central genomics and Genomics.


Topics: Gene, Genome, Biology, Transcriptome, Population


Papers


Journal Article

The RAST Server: Rapid Annotations using Subsystems Technology


Ramy K. Aziz¹,², Daniela Bartels³, Aaron A. Best⁴, Matthew DeJongh⁴, Terrence Disz³,⁵, Robert Edwards⁵, Kevin Formsma⁴, Svetlana Gerdes, Elizabeth M. Glass⁵, Michael Kubal³, Folker Meyer³,⁵, Gary J. Olsen⁵,⁶, Robert Olson³,⁵, Andrei L. Osterman⁷, Ross Overbeek, Leslie Klis McNeil⁶, Daniel Paarmann³, Tobias Paczian³, Bruce Parrello, Gordon D. Pusch³, Claudia I. Reich⁶, Rick Stevens³,⁵, Olga Vassieva, Veronika Vonstein, Andreas Wilke³, Olga Zagnitko

Cairo University¹, University of Tennessee Health Science Center², University of Chicago³, Hope College⁴, Argonne National Laboratory⁵, University of Illinois at Urbana–Champaign⁶, Sanford-Burnham Institute for Medical Research⁷

08 Feb 2008-BMC Genomics

TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.


Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.


9,397 citations

Journal Article

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation


Davide Chicco, Giuseppe Jurman¹

Fondazione Bruno Kessler¹

02 Jan 2020-BMC Genomics

TL;DR: This article shows how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario.


Abstract: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, accordingly to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets. The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
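The contrast described in this abstract can be illustrated with a minimal sketch, using the standard textbook definitions of accuracy, F1 score and MCC on a 2x2 confusion matrix. The function name `confusion_metrics`, the example counts, and the convention of returning 0 when a denominator is zero are assumptions made for illustration; they are not taken from the article.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, F1, MCC) for one binary confusion matrix.

    Assumption: a score whose denominator is zero is reported as 0.0.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four cells, so it stays low unless both classes are predicted well.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return accuracy, f1, mcc


# Hypothetical imbalanced dataset: 91 positives and 9 negatives.
# The classifier labels almost everything as positive.
acc, f1, mcc = confusion_metrics(tp=90, fn=1, tn=1, fp=8)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

With these made-up counts, accuracy is 0.91 and F1 is about 0.95, while MCC is only about 0.20, because eight of the nine negatives are misclassified.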


2,358 citations

Journal Article

BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons


Nabil-Fareed Alikhan¹, Nicola K. Petty¹, Nouri L. Ben Zakour¹, Scott A. Beatson¹

University of Queensland¹

08 Aug 2011-BMC Genomics

TL;DR: BRIG is a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface and will perform all required file parsing and BLAST comparisons automatically.


Abstract: Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image.


2,254 citations

Journal Article

Centering, scaling, and transformations: improving the biological information content of metabolomics data


Robert A. van den Berg, Huub C. J. Hoefsloot¹, Johan A. Westerhuis¹, Age K. Smilde¹, Mariët J. van der Werf

University of Amsterdam¹

08 Jun 2006-BMC Genomics

TL;DR: Range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis).


Abstract: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability. Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the, from a biological point of view, most important metabolites. Furthermore, the stability of the rank, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis. Different pretreatment methods emphasize different aspects of the data and each pretreatment method has its own merits and drawbacks. The choice for a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. 
That is, range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects the metabolites that are identified to be the most important.
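Several of the pretreatment methods named in this abstract can be sketched per metabolite as follows. The formulas are the commonly used definitions (divide the centered values by the standard deviation, its square root, or the range) and are an assumption here, since the abstract does not spell them out; the function names, the use of the sample standard deviation, and the example concentrations are likewise hypothetical.

```python
import math
import statistics


def center(values):
    """Centering: subtract the mean so values fluctuate around zero."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def autoscale(values):
    """Autoscaling: centered values divided by the standard deviation."""
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [c / sd for c in center(values)]


def pareto_scale(values):
    """Pareto scaling: centered values divided by the square root of the standard deviation."""
    sd = statistics.stdev(values)
    return [c / math.sqrt(sd) for c in center(values)]


def range_scale(values):
    """Range scaling: centered values divided by (max - min)."""
    span = max(values) - min(values)
    return [c / span for c in center(values)]


# Hypothetical concentrations of two metabolites across four samples.
# Both vary by the same relative amount, but differ 5000-fold in abundance.
abundant = [5000.0, 5200.0, 4800.0, 5000.0]
scarce = [1.0, 1.2, 0.8, 1.0]

autoscaled_abundant = autoscale(abundant)
autoscaled_scarce = autoscale(scarce)
print(autoscaled_abundant)
print(autoscaled_scarce)
```

In this toy example, autoscaling and range scaling put both metabolites on the same footing, whereas pareto scaling still leaves the abundant metabolite with the larger values, which matches the abstract's point that the choice of pretreatment changes which metabolites appear most important.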


1,987 citations

Journal Article

A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers


Michael A. Quail¹, Miriam Smith¹, Paul Coupland¹, Thomas D. Otto¹, Simon R. Harris¹, Thomas R. Connor¹, Anna Bertoni¹, Harold Swerdlow¹, Yong Gu¹

Wellcome Trust Sanger Institute¹

24 Jul 2012-BMC Genomics

TL;DR: All three fast turnaround sequencers evaluated here were able to generate usable sequence; however, there are key differences between the quality of that data and the applications it will support.


Abstract: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.


1,967 citations


Performance

Metrics

Papers: 16,099

Citations: 658,455

Number of papers from the journal in previous years
Year	Papers
2023	393
2022	1,342
2021	853
2020	892
2019	1,030
2018	972