Home
/
Authors
/
Carsten Friis

Author

Carsten Friis

Bio: Carsten Friis is an academic researcher from Technical University of Denmark. The author has contributed to research in topics: Genome & Virulence. The author has an hindex of 19, co-authored 26 publications receiving 2559 citations.

Topics: Genome, Virulence, Multilocus sequence typing, Gene, Comparative genomics ...read more

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

[...]

Mette Voldby Larsen¹, Salvatore Cosentino¹, Simon Rasmussen¹, Carsten Friis¹, Henrik Hasman¹, Rasmus L. Marvig¹, Lars Jelsbak¹, Thomas Sicheritz-Pontén¹, David W. Ussery¹, Frank Møller Aarestrup¹, Ole Lund¹ - Show less +7 more•Institutions (1)

Technical University of Denmark¹

01 Apr 2012-Journal of Clinical Microbiology

TL;DR: A Web-based method for MLST of 66 bacterial species based on whole-genome sequencing data that enables investigators to determine the sequence types of their isolates on the basis of WGS data.

...read moreread less

Abstract: Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.

...read moreread less

1,620 citations

Journal Article•DOI•

Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

[...]

Rolf Sommer Kaas¹, Carsten Friis¹, David W. Ussery¹, Frank Møller Aarestrup¹•Institutions (1)

Technical University of Denmark¹

31 Oct 2012-BMC Genomics

TL;DR: The results of comparing a large and diverse E. coli dataset support the theory that reliable and good resolution phylogenies can be inferred from the core-genome and suggest that the resolution at the isolate level may, subsequently, be improved by targeting more variable genes.

...read moreread less

Abstract: Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence and divides the E. coli strains into the observed MLST type clades and also separates defined phylotypes. The results of comparing a large and diverse E. coli dataset support the theory that reliable and good resolution phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may, subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.

...read moreread less

202 citations

Journal Article•DOI•

The Salmonella enterica Pan-genome

[...]

Annika Jacobsen¹, Rene S. Hendriksen¹, Frank M. Aaresturp¹, David W. Ussery¹, David W. Ussery², Carsten Friis¹ - Show less +2 more•Institutions (2)

Technical University of Denmark¹, University of Oslo²

04 Jun 2011-Microbial Ecology

TL;DR: It is found that variation in gene content is localized in a selection of variable genomic regions or islands, including the Salmonella pathogenicity islands (SPIs), transposable elements, phages, and plasmid DNA.

...read moreread less

Abstract: Salmonella enterica is divided into four subspe- cies containing a large number of different serovars, several of which are important zoonotic pathogens and some show a high degree of host specificity or host preference. We compare 45 sequenced S. enterica genomes that are publicly available (22 complete and 23 draft genome sequences). Of these, 35 were found to be of sufficiently good quality to allow a detailed analysis, along with two Escherichia coli strains (K-12 substr. DH10B and the avian pathogenic E. coli (APEC O1) strain). All genomes were subjected to standardized gene finding, and the core and pan-genome of Salmonella were estimated to be around 2,800 and 10,000 gene families, respectively. The constructed pan-genomic dendrograms suggest that gene content is often, but not uniformly correlated to serotype. Any given Salmonella strain has a large stable core, whilst there is an abundance of accessory genes, including the Salmonella pathogenicity islands (SPIs), transposable elements, phages, and plasmid DNA. We visualize conservation in the genomes in relation to chromosomal location and DNA structural features and find that variation in gene content is localized in a selection of variable genomic regions or islands. These include the SPIs but also encompass phage insertion sites and transposable elements. The islands were typically well conserved in several, but not all, isolates— ad ifference which may have implications in, e.g., host specificity.

...read moreread less

169 citations

Journal Article•DOI•

snpTree - a web-server to identify and construct SNP trees from whole genome sequence data

[...]

Pimlapas Leekitcharoenphon¹, Rolf Sommer Kaas¹, Martin Christen Frølund Thomsen¹, Carsten Friis¹, Simon Rasmussen¹, Frank Møller Aarestrup¹ - Show less +2 more•Institutions (1)

Technical University of Denmark¹

13 Dec 2012-BMC Genomics

TL;DR: The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies also for users with limited bioinformatic experience.

...read moreread less

Abstract: The advances and decreasing economical cost of whole genome sequencing (WGS), will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucletide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic tree from WGS data. Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed from concatenated SNPs using FastTree and a perl script. The online server was implemented by HTML, Java and python script. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evalution results for the first three cases was consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.

...read moreread less

107 citations

Journal Article•DOI•

Genomic Characterization of Campylobacter jejuni Strain M1

[...]

Carsten Friis¹, Trudy M. Wassenaar¹, Muhammad Afzal Javed², Lars Snipen¹, Lars Snipen³, Karin Lagesen⁴, Karin Lagesen⁵, Karin Lagesen¹, Peter Fischer Hallin¹, Diane G. Newell, Monique Toszeghy, Anne J. Ridley, Georgina Manning², David W. Ussery¹ - Show less +10 more•Institutions (5)

Technical University of Denmark¹, Nottingham Trent University², Norwegian University of Life Sciences³, Oslo University Hospital⁴, University of Oslo⁵

26 Aug 2010-PLOS ONE

TL;DR: The C. jejuni pan-genome is identified, as well as the core genome, the auxiliary genes, and genes unique between strains M1 and 81116, and the findings are discussed in the background of the proven virulence potential of M1.

...read moreread less

Abstract: Campylobacter jejuni strain M1 (laboratory designation 99/308) is a rarely documented case of direct transmission of C. jejuni from chicken to a person, resulting in enteritis. We have sequenced the genome of C. jejuni strain M1, and compared this to 12 other C. jejuni sequenced genomes currently publicly available. Compared to these, M1 is closest to strain 81116. Based on the 13 genome sequences, we have identified the C. jejuni pan-genome, as well as the core genome, the auxiliary genes, and genes unique between strains M1 and 81116. The pan-genome contains 2,427 gene families, whilst the core genome comprised 1,295 gene families, or about two-thirds of the gene content of the average of the sequenced C. jejuni genomes. Various comparison and visualization tools were applied to the 13 C. jejuni genome sequences, including a species pan- and core genome plot, a BLAST Matrix and a BLAST Atlas. Trees based on 16S rRNA sequences and on the total gene families in each genome are presented. The findings are discussed in the background of the proven virulence potential of M1.

...read moreread less

102 citations

1
2
3
4
…
5
6

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Identification of acquired antimicrobial resistance genes

[...]

Ea Zankari¹, Henrik Hasman¹, Salvatore Cosentino¹, Martin Vestergaard¹, Simon Rasmussen¹, Ole Lund¹, Frank Møller Aarestrup¹, Mette Voldby Larsen¹ - Show less +4 more•Institutions (1)

Technical University of Denmark¹

01 Nov 2012-Journal of Antimicrobial Chemotherapy

TL;DR: A web server providing a convenient way of identifying acquired antimicrobial resistance genes in completely sequenced isolates was created, and the method was evaluated on WGS chromosomes and plasmids of 30 isolates.

...read moreread less

Abstract: Objectives Identification of antimicrobial resistance genes is important for understanding the underlying mechanisms and the epidemiology of antimicrobial resistance. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available in routine diagnostic laboratories and is anticipated to substitute traditional methods for resistance gene identification. Thus, the current challenge is to extract the relevant information from the large amount of generated data.

...read moreread less

3,956 citations

Journal Article•DOI•

In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing

[...]

Alessandra Carattoli¹, Ea Zankari², Aurora García-Fernández¹, Mette Voldby Larsen², Ole Lund², Laura Villa¹, Frank Møller Aarestrup², Henrik Hasman² - Show less +4 more•Institutions (2)

Istituto Superiore di Sanità¹, Technical University of Denmark²

01 Jul 2014-Antimicrobial Agents and Chemotherapy

TL;DR: Two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae are designed and developed.

...read moreread less

Abstract: In the work presented here, we designed and developed two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae. These tools will facilitate bacterial typing based on draft genomes of multidrug-resistant Enterobacteriaceae species by the rapid detection of known plasmid types. Replicon sequences from 559 fully sequenced plasmids associated with the family Enterobacteriaceae in the NCBI nucleotide database were collected to build a consensus database for integration into a Web tool called PlasmidFinder that can be used for replicon sequence analysis of raw, contig group, or completely assembled and closed plasmid sequencing data. The PlasmidFinder database currently consists of 116 replicon sequences that match with at least at 80% nucleotide identity all replicon sequences identified in the 559 fully sequenced plasmids. For plasmid multilocus sequence typing (pMLST) analysis, a database that is updated weekly was generated from www.pubmlst.org and integrated into a Web tool called pMLST. Both databases were evaluated using draft genomes from a collection of Salmonella enterica serovar Typhimurium isolates. PlasmidFinder identified a total of 103 replicons and between zero and five different plasmid replicons within each of 49 S . Typhimurium draft genomes tested. The pMLST Web tool was able to subtype genomic sequencing data of plasmids, revealing both known plasmid sequence types (STs) and new alleles and ST variants. In conclusion, testing of the two Web tools using both fully assembled plasmid sequences and WGS-generated draft genomes showed them to be able to detect a broad variety of plasmids that are often associated with antimicrobial resistance in clinically relevant bacterial pathogens.

...read moreread less

2,834 citations

Journal Article•

Fast Tree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

[...]

Morgan N. Price, Paramvir S. Dehal, Adam P. Arkin

18 Jun 2009-Lawrence Berkeley National Laboratory

TL;DR: FastTree as mentioned in this paper uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.

...read moreread less

Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

...read moreread less

2,436 citations

Journal Article•DOI•

Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications.

[...]

Keith A. Jolley¹, James E. Bray¹, Martin C. J. Maiden¹•Institutions (1)

University of Oxford¹

24 Sep 2018

TL;DR: Developments in the BIGSdb software made from publication to June 2018 are described and it is shown how the platform realises microbial population genomics for a wide range of applications.

...read moreread less

Abstract: The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998 the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data permits highly scalable (whole genome sequence data for tens of thousands of isolates) means of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question. In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.

...read moreread less

1,349 citations

Journal Article•DOI•

PATRIC, the bacterial bioinformatics database and analysis resource

[...]

Alice R. Wattam¹, David Abraham¹, Oral Dalay¹, Terry Disz¹, Timothy P. Driscoll¹, Joseph L. Gabbard¹, Joseph J. Gillespie¹, Roger Gough¹, Deborah Hix¹, Ronald W. Kenyon¹, Dustin Machi¹, Chunhong Mao¹, Eric K. Nordberg¹, Robert Olson¹, Ross Overbeek¹, Gordon D. Pusch¹, Maulik Shukla¹, Julie R. Schulman¹, Rick Stevens¹, Daniel E. Sullivan¹, Veronika Vonstein¹, Andrew S. Warren¹, Rebecca Will¹, Meredith J. C. Wilson¹, Hyunseung Yoo¹, Chengdong Zhang¹, Yan Zhang¹, Bruno W. S. Sobral¹ - Show less +24 more•Institutions (1)

University of Maryland, Baltimore¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) and describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.

...read moreread less

Abstract: The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.

...read moreread less

1,136 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse