Christine Nguyen

Bio: Christine Nguyen is an academic researcher from Washington University in St. Louis. The author has contributed to research on topics including Salmonella enterica and euchromatin. The author has an h-index of 5 and has co-authored 5 publications receiving 4,397 citations.

Papers

Journal Article

The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes.

Helen Skaletsky¹, Tomoko Kuroda-Kawaguchi¹, Patrick Minx², Holland S. Cordum², LaDeana W. Hillier², Laura G. Brown¹, Sjoerd Repping, Tatyana Pyntikova¹, Johar Ali², Tamberlyn Bieri², Asif T. Chinwalla², Andrew Delehaunty², Kim D. Delehaunty², Hui Du², Ginger A. Fewell², Lucinda Fulton², Robert S. Fulton², Tina Graves², Shunfang Hou², Philip Latrielle², Shawn Leonard², Elaine R. Mardis², Rachel Maupin², John Douglas Mcpherson², Tracie L. Miner², William E. Nash², Christine Nguyen², Philip Ozersky², Kymberlie H. Pepin², Susan M. Rock², Tracy Rohlfing², Kelsi Scott², Brian Schultz², Cindy Strong², Aye Mon Tin-Wollam², Shiaw-Pyng Yang², Robert H. Waterston², Richard K. Wilson², Steve Rozen¹, David C. Page¹

Massachusetts Institute of Technology¹, Washington University in St. Louis²

19 Jun 2003-Nature

TL;DR: The male-specific region of the Y chromosome (MSY), which differentiates the sexes and comprises 95% of the chromosome's length, is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic.

Abstract: The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The X-transposed sequences exhibit 99% identity to the X chromosome. The X-degenerate sequences are remnants of ancient autosomes from which the modern X and Y chromosomes evolved. The ampliconic class includes large regions (about 30% of the MSY euchromatin) where sequence pairs show greater than 99.9% identity, which is maintained by frequent gene conversion (non-reciprocal transfer). The most prominent features here are eight massive palindromes, at least six of which contain testis genes.

2,022 citations

Journal Article

Complete genome sequence of Salmonella enterica serovar Typhimurium LT2

Michael McClelland, Kenneth E. Sanderson¹, John Spieth², Sandra W. Clifton², Phil Latreille², Laura Courtney², Steffen Porwollik, Johar Ali², Mike Dante², Feiyu Du², Shunfang Hou², Dan Layman², Shawn Leonard², Christine Nguyen², Kelsi Scott², Andrea Holmes², Neenu Grewal², Elizabeth Mulvaney², Ellen E. Ryan², Hui Sun², Liliana Florea³,⁴, Webb Miller³, Tamberlyn Stoneking², Michael Nhan², Robert H. Waterston², Richard K. Wilson²

University of Calgary¹, Washington University in St. Louis², Pennsylvania State University³, Celera Corporation⁴

25 Oct 2001-Nature

TL;DR: The distribution of close homologues of S. typhimurium LT2 genes in eight related enterobacteria was determined using previously completed genomes of three related bacteria, sample sequencing of both S. enterica serovar Paratyphi A and Klebsiella pneumoniae, and hybridization of three unsequenced genomes to a microarray of S. typhimurium LT2 genes.

Abstract: Salmonella enterica subspecies I, serovar Typhimurium (S. typhimurium), is a leading cause of human gastroenteritis, and is used as a mouse model of human typhoid fever. The incidence of non-typhoid salmonellosis is increasing worldwide, causing millions of infections and many deaths in the human population each year. Here we sequenced the 4,857-kilobase (kb) chromosome and 94-kb virulence plasmid of S. typhimurium strain LT2. The distribution of close homologues of S. typhimurium LT2 genes in eight related enterobacteria was determined using previously completed genomes of three related bacteria, sample sequencing of both S. enterica serovar Paratyphi A (S. paratyphi A) and Klebsiella pneumoniae, and hybridization of three unsequenced genomes to a microarray of S. typhimurium LT2 genes. Lateral transfer of genes is frequent, with 11% of the S. typhimurium LT2 genes missing from S. enterica serovar Typhi (S. typhi), and 29% missing from Escherichia coli K12. The 352 gene homologues of S. typhimurium LT2 confined to subspecies I of S. enterica (which contains most mammalian and bird pathogens) are useful for studies of epidemiology, host specificity and pathogenesis. Most of these homologues were previously unknown, and 50 may be exported to the periplasm or outer membrane, rendering them accessible as therapeutic or vaccine targets.

1,850 citations

Journal Article

Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid.

Michael McClelland, Kenneth E. Sanderson¹, Sandra W. Clifton², Phil Latreille², Steffen Porwollik, Aniko Sabo², Rekha Meyer², Tamberlyn Bieri², Phil Ozersky², Michael D. McLellan², C Richard Harkins², Chunyan Wang², Christine Nguyen², Amy Berghoff², Glendoria Elliott², Sara Kohlberg², Cindy Strong², Feiyu Du², Jason Carter², Colin Kremizki², Dan Layman², Shawn Leonard², Hui Sun², Lucinda Fulton², William E. Nash², Tracie L. Miner², Patrick Minx², Kim D. Delehaunty², Catrina Fronick², Vincent Magrini², Michael Nhan², Wesley C. Warren², Liliana Florea³, John Spieth², Richard K. Wilson²

University of Calgary¹, Washington University in St. Louis², Applied Biosystems³

07 Nov 2004-Nature Genetics

TL;DR: The sequence and microarray analysis of the Paratyphi A genome indicates that it is similar to the Typhi genome but suggests that it has a more recent evolutionary origin.

Abstract: Salmonella enterica serovars often have a broad host range, and some cause both gastrointestinal and systemic disease. But the serovars Paratyphi A and Typhi are restricted to humans and cause only systemic disease. It has been estimated that Typhi arose in the last few thousand years. The sequence and microarray analysis of the Paratyphi A genome indicates that it is similar to the Typhi genome but suggests that it has a more recent evolutionary origin. Both genomes have independently accumulated many pseudogenes among their approximately 4,400 protein-coding sequences: 173 in Paratyphi A and approximately 210 in Typhi. The recent convergence of these two similar genomes on a similar phenotype is subtly reflected in their genotypes: only 30 genes are degraded in both serovars. Nevertheless, these 30 genes include three known to be important in gastroenteritis, which does not occur in these serovars, and four for Salmonella-translocated effectors, which are normally secreted into host cells to subvert host functions. Loss of function also occurs by mutation in different genes in the same pathway (e.g., in chemotaxis and in the production of fimbriae).

392 citations

Journal Article

The DNA sequence of human chromosome 7

LaDeana W. Hillier¹, Robert S. Fulton¹, Lucinda Fulton¹, Tina Graves¹, Kymberlie H. Pepin¹, Caryn Wagner-McPherson¹, Dan Layman¹, Jason Maas¹, Sara Jaeger¹, Rebecca S. Walker¹, Kristine M. Wylie¹, Mandeep Sekhon¹, Michael C. Becker¹, Michelle O'Laughlin¹, Mark E. Schaller¹, Ginger A. Fewell¹, Kimberly D. Delehaunty¹, Tracie L. Miner¹, William E. Nash¹, Matt Cordes¹, Hui Du¹, Hui Sun¹, Jennifer Edwards¹, Holland Bradshaw-Cordum¹, Johar Ali¹, Stephanie Andrews¹, Amber Isak¹, Andrew Vanbrunt¹, Christine Nguyen¹, Feiyu Du¹, Betty Lamar¹, Laura Courtney¹, Joelle Kalicki¹, Philip Ozersky¹, Lauren Bielicki¹, Kelsi Scott¹, Andrea Holmes¹, Richard Harkins¹, Anthony R. Harris¹, Cindy Strong¹, Shunfang Hou¹, Chad Tomlinson¹, Sara Dauphin-Kohlberg¹, Amy Kozlowicz-Reilly¹, Shawn Leonard¹, Theresa Rohlfing¹, Susan M. Rock¹, Aye-Mon Tin-Wollam¹, Amanda Abbott¹, Patrick Minx¹, Rachel Maupin¹, Catrina Strowmatt¹, Phil Latreille¹, Nancy Miller¹, Doug Johnson¹, Jennifer Murray¹, Jeffrey Woessner¹, Michael C. Wendl¹, Shiaw-Pyng Yang¹, Brian Schultz¹, John W. Wallis¹, John Spieth¹, Tamberlyn Bieri¹, Joanne O. Nelson¹, Nicolas Berkowicz¹, Patricia Wohldmann¹, Lisa Cook¹, Matthew T. Hickenbotham¹, James M. Eldred¹, Donald Williams¹, Joseph A. Bedell¹, Elaine R. Mardis¹, Sandra W. Clifton¹, Stephanie L. Chissoe¹, Marco A. Marra¹,², Christopher K. Raymond³, Eric Haugen³, Will Gillett³, Yang Zhou³, R. James³, Karen A. Phelps³, Shawn Iadanoto³, Kerry L. Bubb³, Elizabeth Simms³, Ruth Levy³, James B. Clendenning³, Rajinder Kaul³, W. James Kent⁴, Terrence S. Furey⁴, Robert Baertsch⁴, Michael R. Brent¹, Evan Keibler¹, Paul Flicek¹, Peer Bork⁵, Mikita Suyama⁵, Jeffrey A. Bailey⁶, Matthew E. Portnoy⁷, David Torrents⁵, Asif T. Chinwalla¹, Warren Gish¹, Sean R. Eddy¹, John Douglas Mcpherson¹,⁸, Maynard V. Olson³, Evan E. Eichler⁶, Eric D. Green⁷, Robert H. Waterston¹,³, Richard K. Wilson¹

Washington University in St. Louis¹, BC Cancer Agency², University of Washington³, University of California, Santa Cruz⁴, European Bioinformatics Institute⁵, Case Western Reserve University⁶, National Institutes of Health⁷, Human Genome Sequencing Center⁸

10 Jul 2003-Nature

TL;DR: The euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far, has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence.

Abstract: Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame.

244 citations

Journal Article

Generation and annotation of the DNA sequences of human chromosomes 2 and 4

LaDeana W. Hillier¹, Tina Graves¹, Robert S. Fulton¹, Lucinda Fulton¹, Kymberlie H. Pepin¹, Patrick Minx¹, Caryn Wagner-McPherson¹, Dan Layman¹, Kristine M. Wylie¹, Mandeep Sekhon¹, Michael C. Becker¹, Ginger A. Fewell¹, Kimberly D. Delehaunty¹, Tracie L. Miner¹, William E. Nash¹, Colin Kremitzki¹, Lachlan G. Oddy¹, Hui Du¹, Hui Sun¹, Holland Bradshaw-Cordum¹, Johar Ali¹, Jason Carter¹, Matt Cordes¹, Anthony R. Harris¹, Amber Isak¹, Andrew Van Brunt¹, Christine Nguyen¹, Feiyu Du¹, Laura Courtney¹, Joelle Kalicki¹, Philip Ozersky¹, Scott Abbott¹, Jon R. Armstrong¹, Edward A. Belter¹, Lauren Caruso¹, Maria Cedroni¹, Marc Cotton¹, Teresa Davidson¹, Anu Desai¹, Glendoria Elliott¹, Thomas Erb¹, Catrina Fronick¹, Tony Gaige¹, William Haakenson¹, Krista Haglund¹, Andrea Holmes¹, Richard Harkins¹, Kyung Kim¹, Scott Kruchowski¹, Cindy Strong¹, Neenu Grewal¹, Ernest Goyea¹, Shunfang Hou¹, Andrew Levy¹, Scott Martinka¹, Kelly Mead¹, Michael D. McLellan¹, Rick Meyer¹, Jennifer Randall-Maher¹, Chad Tomlinson¹, Sara Dauphin-Kohlberg¹, Amy Kozlowicz-Reilly¹, Neha Shah¹, Sharhonda Swearengen-Shahid¹, Jacqueline E. Snider¹, Joseph T. Strong¹, Johanna Thompson¹, Martin Yoakum¹, Shawn Leonard¹, Charlene Pearman¹, Lee Trani¹, Maxim Radionenko¹, Jason Waligorski¹, Chunyan Wang¹, Susan M. Rock¹, Aye Mon Tin-Wollam¹, Rachel Maupin¹, Phil Latreille¹, Michael C. Wendl¹, Shiaw Pyng Yang¹, Craig Pohl¹, John W. Wallis¹, John Spieth¹, Tamberlyn Bieri¹, Nicolas Berkowicz¹, Joanne O. Nelson¹, John R. Osborne¹, Li Ding¹, Rekha Meyer¹, Aniko Sabo¹, Yoram Shotland¹, Prashant R. Sinha¹, Patricia Wohldmann¹, Lisa Cook¹, Matthew T. Hickenbotham¹, James M. Eldred¹, Donald Williams¹, Thomas A. Jones¹, Xinwei She², Francesca D. Ciccarelli, Elisa Izaurralde, James Taylor³, Jeremy Schmutz⁴, Richard M. Myers⁴, David R. Cox⁴, Xiaoqiu Huang⁵, John Douglas Mcpherson¹,⁶, Elaine R. Mardis¹, Sandra W. Clifton¹, Wesley C. Warren¹, Asif T. Chinwalla¹, Sean R. Eddy¹, Marco A. Marra¹,⁷, Ivan Ovcharenko⁸, Terrence S. Furey⁹, Webb Miller³, Evan E. Eichler², Peer Bork, Mikita Suyama, David Torrents, Robert H. Waterston¹,², Richard K. Wilson¹

Washington University in St. Louis¹, University of Washington², Pennsylvania State University³, Stanford University⁴, Iowa State University⁵, Baylor College of Medicine⁶, University of British Columbia⁷, Lawrence Livermore National Laboratory⁸, University of California, Santa Cruz⁹

07 Apr 2005-Nature

TL;DR: Extensive analyses confirm the underlying construction of the sequence, and expand the understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.

Abstract: Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.

107 citations

Cited by

Journal Article

GSVA: gene set variation analysis for microarray and RNA-seq data.

Sonja Hänzelmann, Robert Castelo¹, Justin Guinney²

Pompeu Fabra University¹, Sage Bionetworks²

16 Jan 2013-BMC Bioinformatics

TL;DR: This work introduces Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner and constitutes a starting point to build pathway-centric models of biology.

Abstract: Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state-of-the-art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA addresses the current need for GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.

6,125 citations

Journal Article

voom: precision weights unlock linear model analysis tools for RNA-seq read counts

Charity W. Law¹,², Yunshun Chen¹,², Wei Shi¹,², Gordon K. Smyth¹,²

University of Melbourne¹, Walter and Eliza Hall Institute of Medical Research²

03 Feb 2014-Genome Biology

TL;DR: New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments, and the voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline.

Abstract: New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.

4,475 citations

Journal Article

The COG database: an updated version includes eukaryotes

Roman L. Tatusov¹, Natalie D. Fedorova¹, John D. Jackson¹, Aviva R. Jacobs¹, Boris Kiryutin¹, Eugene V. Koonin¹, Dmitri M. Krylov¹, Raja Mazumder², Sergei L. Mekhedov¹, Anastasia N. Nikolskaya², B Sridhar Rao¹, Sergei Smirnov¹, Alexander V. Sverdlov¹, Sona Vasudevan¹, Yuri I. Wolf¹, Jodie J. Yin¹, Darren A. Natale²

National Institutes of Health¹, Georgetown University Medical Center²

11 Sep 2003-BMC Bioinformatics

TL;DR: A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described; the updated collection is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.

Abstract: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4,873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4,852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

4,167 citations

Journal Article

Finishing the euchromatic sequence of the human genome

Chris P. Ponting, Daniel Barker

21 Oct 2004-Nature

TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.

Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations

Journal Article

Mauve: multiple alignment of conserved genomic sequence with rearrangements.

Aaron E. Darling¹, Bob Mau, Frederick R. Blattner, Nicole T. Perna

University of Wisconsin-Madison¹

01 Jul 2004-Genome Research

TL;DR: This work presents methods for identification and alignment of conserved genomic DNA in the presence of rearrangements and horizontal transfer, and evaluates the quality of Mauve alignments against other methods through extensive simulations of genome evolution.

Abstract: As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacterial chromosomes, and deletions remove segments of the genome. Consequently, each genome is a mosaic of unique lineage-specific segments, regions shared with a subset of other genomes and segments conserved among all the genomes under consideration. Furthermore, the linear order of these segments may be shuffled among genomes. We present methods for identification and alignment of conserved genomic DNA in the presence of rearrangements and horizontal transfer. Our methods have been implemented in a software package called Mauve. Mauve has been applied to align nine enterobacterial genomes and to determine global rearrangement structure in three mammalian genomes. We have evaluated the quality of Mauve alignments and drawn comparison to other methods through extensive simulations of genome evolution.

3,741 citations
