Showing papers in "Genome Research in 2008"

Mapping short DNA sequencing reads and calling variants using mapping quality scores

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; on real Solexa data sets without read pairs, its contigs were in close agreement with simulated results.

Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
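As a rough illustration of the two ideas this abstract relies on, the sketch below builds a textbook de Bruijn graph from k-mers and computes an N50 value. It is not Velvet's implementation: the function names `build_de_bruijn` and `n50`, the toy reads, and the choice of (k-1)-mers as nodes are all assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the list of (k-1)-mer nodes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k; every k-mer contributes one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGA"]  # hypothetical reads, far shorter than real data
    print(build_de_bruijn(toy_reads, k=4))
    print(n50([8000, 5000, 3000, 2000, 1000]))
```

A real assembler would also handle reverse complements, sequencing errors, and repeats, none of which this sketch attempts.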

9,389 citations

Journal Article•DOI•

Heng Li¹, Jue Ruan, Richard Durbin•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
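The mapping-quality idea can be sketched numerically. The example below assumes the conventional Phred scaling, Q = -10 log10(p), and a deliberately simplified model in which each candidate position is scored only by the sum of base qualities at its mismatches. The function names, the cap of 99, and the uniform prior over candidates are assumptions for illustration, not MAQ's actual code.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def mapping_quality(mismatch_quality_sums, cap=99):
    """Phred-scaled probability that the best candidate position is wrong.

    mismatch_quality_sums holds one number per candidate position: the sum of
    base qualities at mismatched bases (lower means a better alignment).
    """
    # Likelihood of each candidate, up to a shared constant factor.
    likelihoods = [phred_to_prob(s) for s in mismatch_quality_sums]
    posterior_best = max(likelihoods) / sum(likelihoods)
    p_wrong = 1.0 - posterior_best
    if p_wrong <= 0.0:
        return cap  # a unique hit would otherwise give infinite quality
    return min(cap, round(-10 * math.log10(p_wrong)))
```

Under this sketch, a read with two equally good candidate positions gets a quality near 3, while a read whose second-best hit carries a mismatch of quality 30 gets a quality near 30.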

2,927 citations

Journal Article•DOI•

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John C. Marioni¹, Christopher E. Mason, Shrikant Mane, Matthew Stephens, Yoav Gilad•Institutions (1)

University of Chicago¹

MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes

TL;DR: It is found that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane).

Abstract: Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

2,834 citations

Journal Article•DOI•

Brandi L. Cantarel¹, Ian F Korf², Sofia M. C. Robb¹, Genís Parra², Eric D. Ross¹, Barry Moore¹, Carson Holt¹, Alejandro Sánchez Alvarado¹, Mark Yandell¹•Institutions (2)

University of Utah¹, University of California, Davis²

01 Jan 2008-Genome Research

TL;DR: The results demonstrate that MAKER provides a simple and effective means to convert a genome sequence into a community-accessible genome database, and should prove especially useful for emerging model organism genome projects for which extensive bioinformatics resources may not be readily available.

Abstract: We have developed a portable and easily configurable genome annotation pipeline called MAKER. Its purpose is to allow investigators to independently annotate eukaryotic genomes and create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into gene annotations having evidence-based quality indices. MAKER is also easily trainable: Outputs of preliminary runs are used to automatically retrain its gene-prediction algorithm, producing higher-quality gene-models on subsequent runs. MAKER’s inputs are minimal, and its outputs can be used to create a GMOD database. Its outputs can also be viewed in the Apollo Genome browser; this feature of MAKER provides an easy means to annotate, view, and edit individual contigs and BACs without the overhead of a database. As proof of principle, we have used MAKER to annotate the genome of the planarian Schmidtea mediterranea and to create a new genome database, SmedGD. We have also compared MAKER’s performance to other published annotation pipelines. Our results demonstrate that MAKER provides a simple and effective means to convert a genome sequence into a community-accessible genome database. MAKER should prove especially useful for emerging model organism genome projects for which extensive bioinformatics resources may not be readily available.

1,503 citations

Journal Article•DOI•

Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells.

Ryan D. Morin¹, Michael D O'Connor, Malachi Griffith, Florian Kuchenbauer, Allen Delaney, Anna-Liisa Prabhu, Yongjun Zhao, Helen McDonald, Thomas Zeng, Martin Hirst, Connie J. Eaves, Marco A. Marra•Institutions (1)

BC Cancer Agency¹

01 Apr 2008-Genome Research

TL;DR: Application of this approach to RNA from human embryonic stem cells obtained before and after their differentiation into embryoid bodies revealed the sequences and expression levels of 334 known plus 104 novel miRNA genes, representing the deepest miRNA sampling to date.

Abstract: MicroRNAs (miRNAs) are emerging as important, albeit poorly characterized, regulators of biological processes. Key to further elucidation of their roles is the generation of more complete lists of their numbers and expression changes in different cell states. Here, we report a new method for surveying the expression of small RNAs, including microRNAs, using Illumina sequencing technology. We also present a set of methods for annotating sequences deriving from known miRNAs, identifying variability in mature miRNA sequences, and identifying sequences belonging to previously unidentified miRNA genes. Application of this approach to RNA from human embryonic stem cells obtained before and after their differentiation into embryoid bodies revealed the sequences and expression levels of 334 known plus 104 novel miRNA genes. One hundred seventy-one known and 23 novel microRNA sequences exhibited significant expression differences between these two developmental states. Owing to the increased number of sequence reads, these libraries represent the deepest miRNA sampling to date, spanning nearly six orders of magnitude of expression. The predicted targets of those miRNAs enriched in either sample shared common features. Included among the high-ranked predicted gene targets are those implicated in differentiation, cell cycle control, programmed cell death, and transcriptional regulation.

1,102 citations

Journal Article•DOI•

ALLPATHS: de novo assembly of whole-genome shotgun microreads.

Jonathan Butler¹, Iain MacCallum, Michael Kleber, Ilya Shlyakhter, Matthew K. Belmonte, Eric S. Lander, Chad Nusbaum, David B. Jaffe•Institutions (1)

Broad Institute¹

A diversity profile of the human skin microbiota

TL;DR: A general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads is described.

Abstract: New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun “microreads.” For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80× coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.

880 citations

Journal Article•DOI•

Elizabeth A. Grice¹, Heidi H. Kong, Gabriel Renaud, Alice C. Young, Gerard G. Bouffard, Robert W. Blakesley, Tyra G. Wolfsberg, Maria L. Turner, Julia A. Segre•Institutions (1)

National Institutes of Health¹

New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree

TL;DR: This study of healthy human skin microbiota will serve to direct future research addressing the role of skin microbiota in health and disease, and metagenomic projects addressing the complex physiological interactions between the skin and the microbes that inhabit this environment.

Abstract: The many layers and structures of the skin serve as elaborate hosts to microbes, including a diversity of commensal and pathogenic bacteria that contribute to both human health and disease. To determine the complexity and identity of the microbes inhabiting the skin, we sequenced bacterial 16S small-subunit ribosomal RNA genes isolated from the inner elbow of five healthy human subjects. This analysis revealed 113 operational taxonomic units (OTUs; "phylotypes") at the level of 97% similarity that belong to six bacterial divisions. To survey all depths of the skin, we sampled using three methods: swab, scrape, and punch biopsy. Proteobacteria dominated the skin microbiota at all depths of sampling. Interpersonal variation is approximately equal to intrapersonal variation when considering bacterial community membership and structure. Finally, we report strong similarities in the complexity and identity of mouse and human skin microbiota. This study of healthy human skin microbiota will serve to direct future research addressing the role of skin microbiota in health and disease, and metagenomic projects addressing the complex physiological interactions between the skin and the microbes that inhabit this environment.

853 citations

Journal Article•DOI•

Tatiana M. Karafet¹, Fernando L. Mendez¹, Monica B. Meilerman¹, Peter A. Underhill², Stephen L. Zegura¹, Michael F. Hammer•Institutions (2)

University of Arizona¹, Stanford University²

Protein networks in disease.

TL;DR: Major changes in the topology of the parsimony tree are described and names for new and rearranged lineages within the tree following the rules presented by the Y Chromosome Consortium in 2002 are provided.

Abstract: Markers on the non-recombining portion of the human Y chromosome continue to have applications in many fields including evolutionary biology, forensics, medical genetics, and genealogical reconstruction. In 2002, the Y Chromosome Consortium published a single parsimony tree showing the relationships among 153 haplogroups based on 243 binary markers and devised a standardized nomenclature system to name lineages nested within this tree. Here we present an extensively revised Y chromosome tree containing 311 distinct haplogroups, including two new major haplogroups (S and T), and incorporating approximately 600 binary markers. We describe major changes in the topology of the parsimony tree and provide names for new and rearranged lineages within the tree following the rules presented by the Y Chromosome Consortium in 2002. Several changes in the tree topology have important implications for studies of human ancestry. We also present demography-independent age estimates for 11 of the major clades in the new Y chromosome tree.

831 citations

Journal Article•DOI•

Trey Ideker¹, Roded Sharan•Institutions (1)

University of California, San Diego¹

01 Apr 2008-Genome Research

TL;DR: Promising applications of protein networks to disease in four major areas are reviewed: identifying new disease genes; the study of their network properties; identifying disease-related subnetworks; and network-based disease classification.

Abstract: During a decade of proof-of-principle analysis in model organisms, protein networks have been used to further the study of molecular evolution, to gain insight into the robustness of cells to perturbation, and for assignment of new protein functions. Following these analyses, and with the recent rise of protein interaction measurements in mammals, protein networks are increasingly serving as tools to unravel the molecular basis of disease. We review promising applications of protein networks to disease in four major areas: identifying new disease genes; the study of their network properties; identifying disease-related subnetworks; and network-based disease classification. Applications in infectious disease, personalized medicine, and pharmacology are also forthcoming as the available protein network information improves in quality and coverage.

800 citations

Journal Article•DOI•

Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training.

Vardges Ter-Hovhannisyan¹, Alexandre Lomsadze, Yury O. Chernoff, Mark Borodovsky¹•Institutions (1)

Georgia Institute of Technology¹

01 Dec 2008-Genome Research

TL;DR: A new ab initio algorithm, GeneMark-ES version 2, identifies protein-coding genes in fungal genomes and does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM).

Abstract: We describe a new ab initio algorithm, GeneMark-ES version 2, that identifies protein-coding genes in fungal genomes. The algorithm does not require a predetermined training set to estimate parameters of the underlying hidden Markov model (HMM). Instead, the anonymous genomic sequence in question is used as an input for iterative unsupervised training. The algorithm extends our previously developed method tested on genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. To better reflect features of fungal gene organization, we enhanced the intron submodel to accommodate sequences with and without branch point sites. This design enables the algorithm to work equally well for species with the kinds of variations in splicing mechanisms seen in the fungal phyla Ascomycota, Basidiomycota, and Zygomycota. Upon self-training, the intron submodel switches on in several steps to reach its full complexity. We demonstrate that the algorithm accuracy, both at the exon and the whole gene level, is favorably compared to the accuracy of gene finders that employ supervised training. Application of the new method to known fungal genomes indicates substantial improvement over existing annotations. By eliminating the effort necessary to build comprehensive training sets, the new algorithm can streamline and accelerate the process of annotation in a large number of fungal genome sequencing projects.

737 citations

Journal Article•DOI•

Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation

Marcel E. Dinger¹, Paulo P. Amaral¹, Tim R. Mercer¹, Ken C Pang², Ken C Pang¹, Stephen J. Bruce¹, Brooke Gardiner³, Brooke Gardiner¹, Marjan E. Askarian-Amiri¹, Kelin Ru¹, Giulia Soldà⁴, Giulia Soldà¹, Cas Simons¹, Susan M. Sunkin⁵, Mark L. Crowe¹, Sean M. Grimmond, Andrew C. Perkins¹, John S. Mattick¹•Institutions (5)

University of Queensland¹, Ludwig Institute for Cancer Research², Monash University³, University of Milan⁴, Allen Institute for Brain Science⁵

A barrier nucleosome model for statistical positioning of nucleosomes throughout the yeast genome

TL;DR: The data indicate that long ncRNAs are likely to be important in processes directing pluripotency and alternative differentiation programs, in some cases through engagement of the epigenetic machinery.

Abstract: The transcriptional networks that regulate embryonic stem (ES) cell pluripotency and lineage specification are the subject of considerable attention. To date such studies have focused almost exclusively on protein-coding transcripts. However, recent transcriptome analyses show that the mammalian genome contains thousands of long noncoding RNAs (ncRNAs), many of which appear to be expressed in a developmentally regulated manner. The functions of these remain untested. To identify ncRNAs involved in ES cell biology, we used a custom-designed microarray to examine the expression profiles of mouse ES cells differentiating as embryoid bodies (EBs) over a 16-d time course. We identified 945 ncRNAs expressed during EB differentiation, of which 174 were differentially expressed, many correlating with pluripotency or specific differentiation events. Candidate ncRNAs were identified for further characterization by an integrated examination of expression profiles, genomic context, chromatin state, and promoter analysis. Many ncRNAs showed coordinated expression with genomically associated developmental genes, such as Dlx1, Dlx4, Gata6, and Ecsit. We examined two novel developmentally regulated ncRNAs, Evx1as and Hoxb5/6as, which are derived from homeotic loci and share similar expression patterns and localization in mouse embryos with their associated protein-coding genes. Using chromatin immunoprecipitation, we provide evidence that both ncRNAs are associated with trimethylated H3K4 histones and histone methyltransferase MLL1, suggesting a role in epigenetic regulation of homeotic loci during ES cell differentiation. Taken together, our data indicate that long ncRNAs are likely to be important in processes directing pluripotency and alternative differentiation programs, in some cases through engagement of the epigenetic machinery.

Journal Article•DOI•

Travis N. Mavrich¹, Ilya Ioshikhes², Bryan J. Venters¹, Cizhong Jiang¹, Lynn P. Tomsho¹, Ji Qi¹, Stephan C. Schuster¹, Istvan Albert, B. Franklin Pugh•Institutions (2)

Pennsylvania State University¹, Ohio State University²

De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer

TL;DR: Evidence is presented that the organization of nucleosomes throughout genes is largely a consequence of statistical packing principles, and a high-resolution genome-wide map of TFIIB locations that implicates 3' NFRs in gene looping is presented.

Abstract: Most nucleosomes are well-organized at the 5' ends of S. cerevisiae genes where "-1" and "+1" nucleosomes bracket a nucleosome-free promoter region (NFR). How nucleosomal organization is specified by the genome is less clear. Here we establish and inter-relate rules governing genomic nucleosome organization by sequencing DNA from more than one million immunopurified S. cerevisiae nucleosomes (displayed at http://atlas.bx.psu.edu/). Evidence is presented that the organization of nucleosomes throughout genes is largely a consequence of statistical packing principles. The genomic sequence specifies the location of the -1 and +1 nucleosomes. The +1 nucleosome forms a barrier against which nucleosomes are packed, resulting in uniform positioning, which decays at farther distances from the barrier. We present evidence for a novel 3' NFR that is present at >95% of all genes. 3' NFRs may be important for transcription termination and anti-sense initiation. We present a high-resolution genome-wide map of TFIIB locations that implicates 3' NFRs in gene looping.

Journal Article•DOI•

David Hernandez¹, Patrice Francois, Laurent Farinelli, Magne Osteras, Jacques Schrenzel•Institutions (1)

University of Geneva¹

A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning

TL;DR: This study proposes de novo assembler software for the Illumina sequencing platform, which produces millions of very short sequences 35 bases in length; the software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome.

Abstract: Novel high-throughput DNA sequencing technologies allow researchers to characterize a bacterial genome during a single experiment and at a moderate cost. However, the increase in sequencing throughput that is allowed by using such platforms is obtained at the expense of individual sequence read length, which must be assembled into longer contigs to be exploitable. This study focuses on the Illumina sequencing platform that produces millions of very short sequences that are 35 bases in length. We propose a de novo assembler software that is dedicated to process such data. Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome. The assembly results were validated by comparing data sets that were obtained experimentally for Staphylococcus aureus strain MW2 and Helicobacter acinonychis strain Sheeba with that of their published genomes acquired by conventional sequencing of 1.5- to 3.0-kb fragments. We also provide indications that the broad coverage achieved by high-throughput sequencing might allow for the detection of clonal polymorphisms in the set of DNA molecules being sequenced.
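The "classical overlap graph representation" mentioned here can be illustrated with a naive sketch: every read is a node, and a directed edge links read i to read j when a suffix of i matches a prefix of j. The function names, the `min_overlap` parameter, and the quadratic all-pairs comparison are assumptions chosen for clarity; they do not describe the authors' software, which would need indexing to handle millions of reads.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_overlap bases exists.
    """
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads, min_overlap):
    """Return {(i, j): overlap} for every ordered pair of distinct reads that overlap."""
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = overlap_length(a, b, min_overlap)
                if n:
                    graph[(i, j)] = n
    return graph
```

For instance, with the hypothetical reads "ACGTAC" and "TACGGA" and a minimum overlap of 3, the only edge runs from the first read to the second, with an overlap of 3 bases ("TAC").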

Journal Article•DOI•

Anton Valouev¹, Jeffrey Ichikawa, Thaisan Tonthat, Jeremy R. Stuart, Swati Ranade, Heather E. Peckham, Kathy Zeng, Joel A. Malek, Gina Costa, Kevin McKernan, Arend Sidow, Andrew Fire, Steven M. Johnson•Institutions (1)

Stanford University¹

Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps

TL;DR: These analyses provide a global view of the chromatin architecture of a multicellular animal at extremely high density and resolution and release this data set, via the UCSC Genome Browser, as a resource for the high-resolution analysis of chromatin conformation and DNA accessibility at individual loci within the C. elegans genome.

Abstract: Using the massively parallel technique of sequencing by oligonucleotide ligation and detection (SOLiD; Applied Biosystems), we have assessed the in vivo positions of more than 44 million putative nucleosome cores in the multicellular genetic model organism Caenorhabditis elegans. These analyses provide a global view of the chromatin architecture of a multicellular animal at extremely high density and resolution. While we observe some degree of reproducible positioning throughout the genome in our mixed stage population of animals, we note that the major chromatin feature in the worm is a diversity of allowed nucleosome positions at the vast majority of individual loci. While absolute positioning of nucleosomes can vary substantially, relative positioning of nucleosomes (in a repeated array structure likely to be maintained at least in part by steric constraints) appears to be a significant property of chromatin structure. The high density of nucleosomal reads enabled a substantial extension of previous analysis describing the usage of individual oligonucleotide sequences along the span of the nucleosome core and linker. We release this data set, via the UCSC Genome Browser, as a resource for the high-resolution analysis of chromatin conformation and DNA accessibility at individual loci within the C. elegans genome.

Journal Article•DOI•

Haibao Tang¹, Xiyin Wang², Xiyin Wang³, John E. Bowers³, Ray Ming⁴, Maqsudul Alam, Andrew H. Paterson³•Institutions (4)

Plant Genome Mapping Laboratory¹, North China University of Science and Technology², University of Georgia³, University of Illinois at Urbana–Champaign⁴

01 Dec 2008-Genome Research

TL;DR: It is shown that a shared ancient hexaploidy event (or perhaps two roughly concurrent genome fusions) can be inferred based on the sequences from several divergent plant genomes, laying the foundation for approximating the number and arrangement of genes in the last universal common ancestor of angiosperms.

Abstract: Large-scale (segmental or whole) genome duplication has been recurring in angiosperm evolution. Subsequent gene loss and rearrangements further affect gene copy numbers and fractionate ancestral gene linkages across multiple chromosomes. The fragmented "multiple-to-multiple" correspondences resulting from this distinguishing feature of angiosperm evolution complicate comparative genomic studies. Using a robust computational framework that combines information from multiple orthologous and duplicated regions to construct local syntenic networks, we show that a shared ancient hexaploidy event (or perhaps two roughly concurrent genome fusions) can be inferred based on the sequences from several divergent plant genomes. This "paleo-hexaploidy" clearly preceded the rosid-asterid split, but it remains equivocal whether it also affected monocots. The model resulting from our multi-alignments lays the foundation for approximating the number and arrangement of genes in the last universal common ancestor of angiosperms. Comparative analysis of inferred homologous genes derived from this model shows patterns of preferential gene retention or loss after polyploidy and reveals large variability of nucleotide substitution rates among plant nuclear genomes.

Journal Article•DOI•

Evolution of the mammalian transcription factor binding repertoire via transposable elements

Guillaume Bourque¹, Bernard Leong, Vinsensius B. Vega, Xi Chen, Yen Ling Lee, Kandhadayar G. Srinivasan, Joon-Lin Chew, Yijun Ruan, Chia-Lin Wei, Huck-Hui Ng, Edison T. Liu•Institutions (1)

Genome Institute of Singapore¹

01 Nov 2008-Genome Research

TL;DR: It is established that these repeat-associated binding sites (RABS) have been associated with significant regulatory expansions throughout the mammalian phylogeny and that transposable elements play an important role in expanding the repertoire of binding sites.

Abstract: Identification of lineage-specific innovations in genomic control elements is critical for understanding transcriptional regulatory networks and phenotypic heterogeneity. We analyzed, from an evolutionary perspective, the binding regions of seven mammalian transcription factors (ESR1, TP53, MYC, RELA, POU5F1, SOX2, and CTCF) identified on a genome-wide scale by different chromatin immunoprecipitation approaches and found that only a minority of sites appear to be conserved at the sequence level. Instead, we uncovered a pervasive association with genomic repeats by showing that a large fraction of the bona fide binding sites for five of the seven transcription factors (ESR1, TP53, POU5F1, SOX2, and CTCF) are embedded in distinctive families of transposable elements. Using the age of the repeats, we established that these repeat-associated binding sites (RABS) have been associated with significant regulatory expansions throughout the mammalian phylogeny. We validated the functional significance of these RABS by showing that they are over-represented in proximity of regulated genes and that the binding motifs within these repeats have undergone evolutionary selection. Our results demonstrate that transcriptional regulatory networks are highly dynamic in eukaryotic genomes and that transposable elements play an important role in expanding the repertoire of binding sites.

Journal Article•DOI•

Sequencing of natural strains of Arabidopsis thaliana with short reads

Stephan Ossowski¹, Korbinian Schneeberger¹, Richard M. Clark, Christa Lanz¹, Norman Warthmann¹, Detlef Weigel¹•Institutions (1)

Max Planck Society¹

01 Dec 2008-Genome Research

TL;DR: The Velvet assembler was incorporated into a targeted de novo assembly method and yielded 10,921 high-confidence contigs that were anchored to flanking sequences and harbored indels as large as 641 bp, and the methods are broadly applicable for polymorphism discovery in moderate to large genomes even at highly diverged loci.

Abstract: Whole-genome hybridization studies have suggested that the nuclear genomes of accessions (natural strains) of Arabidopsis thaliana can differ by several percent of their sequence. To examine this variation, and as a first step in the 1001 Genomes Project for this species, we produced 15- to 25-fold coverage in Illumina sequencing-by-synthesis (SBS) reads for the reference accession, Col-0, and two divergent strains, Bur-0 and Tsu-1. We aligned reads to the reference genome sequence to assess data quality metrics and to detect polymorphisms. Alignments revealed 823,325 unique single nucleotide polymorphisms (SNPs) and 79,961 unique 1- to 3-bp indels in the divergent accessions at a specificity of >99%, and over 2000 potential errors in the reference genome sequence. We also identified >3.4 Mb of the Bur-0 and Tsu-1 genomes as being either extremely dissimilar, deleted, or duplicated relative to the reference genome. To obtain sequences for these regions, we incorporated the Velvet assembler into a targeted de novo assembly method. This approach yielded 10,921 high-confidence contigs that were anchored to flanking sequences and harbored indels as large as 641 bp. Our methods are broadly applicable for polymorphism discovery in moderate to large genomes even at highly diverged loci, and we established by subsampling the Illumina SBS coverage depth required to inform a broad range of functional and evolutionary studies. Our pipeline for aligning reads and predicting SNPs and indels, SHORE, is available for download at http://1001genomes.org.

Journal Article•DOI•

Insights from the complete genome sequence of Mycobacterium marinum on the evolution of Mycobacterium tuberculosis

Timothy P. Stinear¹, Torsten Seemann², Paul Harrison², Grant A. Jenkin², John K. Davies², Paul D R Johnson³, Zahra Abdellah⁴, Claire Arrowsmith⁴, Tracey Chillingworth⁴, Carol Churcher⁴, Kay Clarke⁴, Ann Cronin⁴, Paul Davis⁴, Ian Goodhead⁴, Nancy Holroyd⁴, Kay Jagels⁴, Angela Lord⁴, Sharon Moule⁴, Karen Mungall⁴, Halina Norbertczak⁴, Michael A. Quail⁴, Ester Rabbinowitsch⁴, Danielle Walker⁴, Brian White⁴, Sally Whitehead⁴, Pamela L. C. Small⁵, Roland Brosch⁶, Lalita Ramakrishnan⁷, Michael A. Fischbach⁸, Julian Parkhill⁴, Stewart T. Cole⁹•Institutions (9)

Monash University, Clayton campus¹, Monash University², University of Melbourne³, Wellcome Trust Sanger Institute⁴, University of Tennessee⁵, Pasteur Institute⁶, University of Washington⁷, Broad Institute⁸, École Polytechnique Fédérale de Lausanne⁹

TL;DR: The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid.

Abstract: Mycobacterium marinum, a ubiquitous pathogen of fish and amphibia, is a near relative of Mycobacterium tuberculosis, the etiologic agent of tuberculosis in humans. The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid. Prominent features are the very large number of genes (57) encoding polyketide synthases (PKSs) and nonribosomal peptide synthases (NRPSs) and the most extensive repertoire yet reported of the mycobacteria-restricted PE and PPE proteins, and related-ESX secretion systems. Some of the NRPS genes comprise a novel family and seem to have been acquired horizontally. M. marinum is used widely as a model organism to study M. tuberculosis pathogenesis, and genome comparisons confirmed the close genetic relationship between these two species, as they share 3000 orthologs with an average amino acid identity of 85%. Comparisons with the more distantly related Mycobacterium avium subspecies paratuberculosis and Mycobacterium smegmatis reveal how an ancestral generalist mycobacterium evolved into M. tuberculosis and M. marinum. M. tuberculosis has undergone genome downsizing and extensive lateral gene transfer to become a specialized pathogen of humans and other primates without retaining an environmental niche. M. marinum has maintained a large genome so as to retain the capacity for environmental survival while becoming a broad host range pathogen that produces disease strikingly similar to M. tuberculosis. The work described herein provides a foundation for using M. marinum to better understand the determinants of pathogenesis of tuberculosis.

Journal Article•DOI•

The amphioxus genome illuminates vertebrate origins and cephalochordate biology

Linda Z. Holland¹, Ricard Albalat², Kaoru Azumi³, Èlia Benito-Gutiérrez², Matthew J. Blow⁴, Marianne Bronner-Fraser⁵, Frédéric Brunet⁶, Thomas Butts⁷, Simona Candiani⁸, Larry J. Dishaw⁹, Larry J. Dishaw¹⁰, David E. K. Ferrier⁷, David E. K. Ferrier¹¹, Jordi Garcia-Fernàndez², Jeremy J. Gibson-Brown¹², Carmela Gissi¹³, Adam Godzik¹⁴, Finn Hallböök¹⁵, Dan Hirose¹⁶, Kazuyoshi Hosomichi¹⁷, Tetsuro Ikuta¹⁶, Hidetoshi Inoko¹⁷, Masanori Kasahara³, Jun Kasamatsu³, Takeshi Kawashima¹⁸, Takeshi Kawashima¹⁹, Ayuko Kimura²⁰, Masaaki Kobayashi¹⁶, Zbynek Kozmik²¹, Kaoru Kubokawa²⁰, Vincent Laudet⁶, Gary W. Litman⁹, Gary W. Litman¹⁰, Alice C. McHardy²², Alice C. McHardy²³, Daniel Meulemans⁵, Masaru Nonaka²⁰, Robert Piotr Olinski¹⁵, Zeev Pancer²⁴, Len A. Pennacchio⁴, Mario Pestarino⁸, Jonathan P. Rast²⁵, Isidore Rigoutsos²², Marc Robinson-Rechavi²⁶, Graeme J. Roch²⁷, Hidetoshi Saiga¹⁶, Yasunori Sasakura²⁸, Masanobu Satake, Yutaka Satou²⁹, Michael Schubert⁶, Nancy M. Sherwood²⁷, Takashi Shiina¹⁷, Naohito Takatori¹⁶, Naohito Takatori³⁰, Javier Tello²⁷, Pavel Vopalensky²¹, Shuichi Wada³¹, Anlong Xu³², Yuzhen Ye¹⁴, Keita Yoshida¹⁶, Fumiko Yoshizaki³³, Jr-Kai Yu⁵, Qing Zhang¹⁴, Christian M. Zmasek¹⁴, Pieter J. de Jong³⁴, Kazutoyo Osoegawa³⁴, Nicholas H. Putnam¹⁸, Daniel S. Rokhsar¹⁸, Daniel S. Rokhsar⁴, Noriyuki Satoh¹⁹, Noriyuki Satoh²⁹, Peter W. H. Holland⁷•Institutions (34)

Scripps Institution of Oceanography¹, University of Barcelona², Hokkaido University³, United States Department of Energy⁴, California Institute of Technology⁵, École normale supérieure de Lyon⁶, University of Oxford⁷, University of Genoa⁸, University of South Florida⁹, Johns Hopkins University¹⁰, University of St Andrews¹¹, Washington University in St. Louis¹², University of Milan¹³, Discovery Institute¹⁴, Uppsala University¹⁵, Tokyo Metropolitan University¹⁶, Tokai University¹⁷, University of California, Berkeley¹⁸, Okinawa Institute of Science and Technology¹⁹, University of Tokyo²⁰, Academy of Sciences of the Czech Republic²¹, IBM²², Max Planck Society²³, University of Maryland, College Park²⁴, University of Toronto²⁵, University of Lausanne²⁶, University of Victoria²⁷, University of Tsukuba²⁸, Kyoto University²⁹, Osaka University³⁰, Nagahama Institute of Bio-Science and Technology³¹, Sun Yat-sen University³², Juntendo University³³, Children's Hospital Oakland Research Institute³⁴

TL;DR: The results indicate that the amphioxus genome is elemental to an understanding of the biology and evolution of nonchordate deuterostomes, invertebrate chordates, and vertebrates.

Abstract: Cephalochordates, urochordates, and vertebrates evolved from a common ancestor over 520 million years ago. To improve our understanding of chordate evolution and the origin of vertebrates, we intensively searched for particular genes, gene families, and conserved noncoding elements in the sequenced genome of the cephalochordate Branchiostoma floridae, commonly called amphioxus or lancelets. Special attention was given to homeobox genes, opsin genes, genes involved in neural crest development, nuclear receptor genes, genes encoding components of the endocrine and immune systems, and conserved cis-regulatory enhancers. The amphioxus genome contains a basic set of chordate genes involved in development and cell signaling, including a fifteenth Hox gene. This set includes many genes that were co-opted in vertebrates for new roles in neural crest development and adaptive immunity. However, where amphioxus has a single gene, vertebrates often have two, three, or four paralogs derived from two whole-genome duplication events. In addition, several transcriptional enhancers are conserved between amphioxus and vertebrates, a very wide phylogenetic distance. In contrast, urochordate genomes have lost many genes, including a diversity of homeobox families and genes involved in steroid hormone function. The amphioxus genome also exhibits derived features, including duplications of opsins and genes proposed to function in innate immunity and endocrine systems. Our results indicate that the amphioxus genome is elemental to an understanding of the biology and evolution of nonchordate deuterostomes, invertebrate chordates, and vertebrates.

Journal Article•DOI•

Short read fragment assembly of bacterial genomes

Mark Chaisson¹, Pavel A. Pevzner•Institutions (1)

University of California, San Diego¹

01 Feb 2008-Genome Research

TL;DR: A new Eulerian assembler is presented that generates nearly optimal short read assemblies of bacterial genomes and an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.

Abstract: In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short reads produced by short read technologies. We present a new Eulerian assembler that generates nearly optimal short read assemblies of bacterial genomes and describe an approach to assemble reads in the case of the popular hybrid protocol when short and long Sanger-based reads are combined.

Journal Article•DOI•

Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening.

Simon Moxon¹, Runchun Jing¹, György Szittya, Frank Schwach¹, Rachel L. Rusholme Pilcher¹, Vincent Moulton¹, Tamas Dalmay¹•Institutions (1)

University of East Anglia¹

01 Oct 2008-Genome Research

TL;DR: This study uses high-throughput pyrosequencing to identify conserved and nonconserved miRNAs and other short RNAs in tomato fruit and leaf and raises the possibility that fruit development and ripening may be under miRNA regulation.

Abstract: In plants there are several classes of 21–24-nt short RNAs that regulate gene expression. The most conserved class is the microRNAs (miRNAs), although some miRNAs are found only in specific species. We used high-throughput pyrosequencing to identify conserved and nonconserved miRNAs and other short RNAs in tomato fruit and leaf. Several conserved miRNAs showed tissue-specific expression, which, combined with target gene validation results, suggests that miRNAs may play a role in fleshy fruit development. We also identified four new nonconserved miRNAs. One of the validated targets of a novel miRNA is a member of the CTR family involved in fruit ripening. However, 62 predicted targets showing near perfect complementarity to potential new miRNAs did not validate experimentally. This suggests that target prediction of plant short RNAs could have a high false-positive rate and must therefore be validated experimentally. We also found short RNAs from a Solanaceae-specific foldback transposon, which showed a miRNA/miRNA*-like distribution, suggesting that this element may function as a miRNA gene progenitor. The other Solanaceae-specific class of short RNA was derived from an endogenous pararetrovirus sequence inserted into the tomato chromosomes. This study opens a new avenue in the field of fleshy fruit biology by raising the possibility that fruit development and ripening may be under miRNA regulation.

Journal Article•DOI•

Comprehensive high-throughput arrays for relative methylation (CHARM).

Rafael A. Irizarry¹, Christine Ladd-Acosta¹, Benilton S. Carvalho¹, Hao Wu¹, Sheri A. Brandenburg¹, Jeffrey A. Jeddeloh, Bo Wen¹, Andrew P. Feinberg•Institutions (1)

Johns Hopkins University¹

TL;DR: This study found that with an original array design strategy using tiling arrays and statistical procedures that average information from neighboring genomic locations, much improved specificity and sensitivity could be achieved, e.g., approximately 100% sensitivity at 90% specificity with McrBC.

Abstract: This study was originally conceived to test in a rigorous way the specificity of three major approaches to high-throughput array-based DNA methylation analysis: (1) MeDIP, or methylated DNA immunoprecipitation, an example of antibody-mediated methyl-specific fractionation; (2) HELP, or HpaII tiny fragment enrichment by ligation-mediated PCR, an example of differential amplification of methylated DNA; and (3) fractionation by McrBC, an enzyme that cuts most methylated DNA. These results were validated using 1466 Illumina methylation probes on the GoldenGate methylation assay and further resolved discrepancies among the methods through quantitative methylation pyrosequencing analysis. While all three methods provide useful information, there were significant limitations to each, specifically bias toward CpG islands in MeDIP, relatively incomplete coverage in HELP, and location imprecision in McrBC. However, we found that with an original array design strategy using tiling arrays and statistical procedures that average information from neighboring genomic locations, much improved specificity and sensitivity could be achieved, e.g., approximately 100% sensitivity at 90% specificity with McrBC. We term this approach "comprehensive high-throughput arrays for relative methylation" (CHARM). While this approach was applied to McrBC analysis, the array design and computational algorithms are fractionation method-independent and make this a simple, general, relatively inexpensive tool suitable for genome-wide analysis, and in which individual samples can be assayed reliably at very high density, allowing locus-level genome-wide epigenetic discrimination of individuals, not just groups of samples. Furthermore, unlike the other approaches, CHARM is highly quantitative, a substantial advantage in application to the study of human disease.

Journal Article•DOI•

Two strategies for gene regulation by promoter nucleosomes

Itay Tirosh¹, Naama Barkai•Institutions (1)

Weizmann Institute of Science¹

TL;DR: The connection between patterns of nucleosome occupancy and the capacity to modulate gene expression upon changing conditions, i.e., transcriptional plasticity, is examined and two distinct strategies for gene regulation by chromatin are suggested, which are selectively employed by different genes.

Abstract: Chromatin structure is central for the regulation of gene expression, but its genome-wide organization is only beginning to be understood. Here, we examine the connection between patterns of nucleosome occupancy and the capacity to modulate gene expression upon changing conditions, i.e., transcriptional plasticity. By analyzing genome-wide data of nucleosome positioning in yeast, we find that the presence of nucleosomes close to the transcription start site is associated with high transcriptional plasticity, while nucleosomes at more distant upstream positions are negatively correlated with transcriptional plasticity. Based on this, we identify two typical promoter structures associated with low or high plasticity, respectively. The first class is characterized by a relatively large nucleosome-free region close to the start site coupled with well-positioned nucleosomes further upstream, whereas the second class displays a more evenly distributed and dynamic nucleosome positioning, with high occupancy close to the start site. The two classes are further distinguished by multiple promoter features, including histone turnover, binding site locations, H2A.Z occupancy, expression noise, and expression diversity. Analysis of nucleosome positioning in human promoters reproduces the main observations. Our results suggest two distinct strategies for gene regulation by chromatin, which are selectively employed by different genes.

Journal Article•DOI•

Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways.

Nicholas R. Thomson¹, Debra J. Clayton, Daniel Windhorst, Georgios S. Vernikos¹, Susanne Davidson, Carol Churcher¹, Michael A. Quail¹, Mark P. Stevens², Michael Jones³, Michael Watson², Andy Barron¹, Abigail N. Layton, Derek Pickard¹, Robert A. Kingsley¹, Alex Bignell¹, Louise Clark¹, Barbara Harris¹, Doug Ormond¹, Zahra Abdellah¹, Karen Brooks¹, Inna Cherevach¹, Tracey Chillingworth¹, John Woodward¹, Halina Norberczak¹, Angela Lord¹, Claire Arrowsmith¹, Kay Jagels¹, Sharon Moule¹, Karen Mungall¹, Mandy Sanders¹, Sally Whitehead¹, José A. Chabalgoity⁴, Duncan J. Maskell⁵, Tom J. Humphrey, Mark Roberts⁶, Paul A. Barrow, Gordon Dougan¹, Julian Parkhill¹•Institutions (6)

Wellcome Trust Sanger Institute¹, University of Edinburgh², University of Nottingham³, University of the Republic⁴, University of Cambridge⁵, University of Glasgow⁶

01 Oct 2008-Genome Research

TL;DR: Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendent of S. Enteritidis, and it is proposed that experimental analysis in chickens and mice could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.

Abstract: We have determined the complete genome sequences of a host-promiscuous Salmonella enterica serovar Enteritidis PT4 isolate P125109 and a chicken-restricted Salmonella enterica serovar Gallinarum isolate 287/91. Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendent of S. Enteritidis. Significantly, the genome of S. Gallinarum has undergone extensive degradation through deletion and pseudogene formation. Comparison of the pseudogenes in S. Gallinarum with those identified previously in other host-adapted bacteria reveals the loss of many common functional traits and provides insights into possible mechanisms of host and tissue adaptation. We propose that experimental analysis in chickens and mice of S. Enteritidis-harboring mutations in functional homologs of the pseudogenes present in S. Gallinarum could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.

Journal Article•DOI•

An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs)

Vardhman K. Rakyan¹, Thomas A. Down, Natalie P. Thorne, Paul Flicek, Eugene Kulesha, Stefan Gräf, Eleni M. Tomazou, Liselotte Bäckdahl, Nathan R. Johnson, Marlis Herberth, Kevin L. Howe, David K. Jackson, Marcos Mateo Miretti, Heike Fiegler, John C. Marioni, Ewan Birney, Tim Hubbard, Nigel P. Carter, Simon Tavaré, Stephan Beck•Institutions (1)

Queen Mary University of London¹

TL;DR: The utility and implications of the findings with respect to the regulatory potential of regions with varied CpG density, gene expression, transcription factor motifs, gene ontology, and correlation with other epigenetic marks such as histone modifications are discussed.

Abstract: We report a novel resource (methylation profiles of DNA, or mPod) for human genome-wide tissue-specific DNA methylation profiles. mPod consists of three fully integrated parts: genome-wide DNA methylation reference profiles of 13 normal somatic tissues, placenta, sperm, and an immortalized cell line; a visualization tool that has been integrated with the Ensembl genome browser; and a new algorithm for the analysis of immunoprecipitation-based DNA methylation profiles. We demonstrate the utility of our resource by identifying the first comprehensive genome-wide set of tissue-specific differentially methylated regions (tDMRs) that may play a role in cellular identity and the regulation of tissue-specific genome function. We also discuss the implications of our findings with respect to the regulatory potential of regions with varied CpG density, gene expression, transcription factor motifs, gene ontology, and correlation with other epigenetic marks such as histone modifications.

Journal Article•DOI•

A diverse set of microRNAs and microRNA-like small RNAs in developing rice grains

Qian-Hao Zhu¹, Andrew Spriggs, Louisa Matthew, Longjiang Fan, Gavin Kennedy, Frank Gubler, Chris A. Helliwell•Institutions (1)

Commonwealth Scientific and Industrial Research Organisation¹

TL;DR: A putative mirtron is identified, indicating that plants may also use spliced introns as a source of miRNAs, along with a miRNA-like long hairpin that generates phased 21 nt small RNAs, strongly expressed in developing grains, which act in trans to cleave target mRNAs.

Abstract: Endogenous small RNAs, including microRNAs (miRNAs) and short-interfering RNAs (siRNAs), function as post-transcriptional or transcriptional regulators in plants. miRNA function is essential for normal plant development and therefore is likely to be important in the growth of the rice grain. To investigate the roles of miRNAs in rice grain development, we carried out deep sequencing of the small RNA populations of rice grains at two developmental stages. In a data set of ∼5.5 million sequences, we found representatives of all 20 conserved plant miRNA families. We used an approach based on the presence of miRNA and miRNA* sequences to identify 39 novel, nonconserved rice miRNA families expressed in grains. Cleavage of predicted target mRNAs was confirmed for a number of the new miRNAs. We identified a putative mirtron, indicating that plants may also use spliced introns as a source of miRNAs. We also identified a miRNA-like long hairpin that generates phased 21 nt small RNAs, strongly expressed in developing grains, and show that these small RNAs act in trans to cleave target mRNAs. Comparison of the population of miRNAs and miRNA-like siRNAs in grains to those in other parts of the rice plant reveals that many are expressed in an organ-specific manner.

Journal Article•DOI•

A gene expression network model of type 2 diabetes links cell cycle regulation in islets with diabetes susceptibility

Mark P. Keller¹, YounJeong Choi, Ping Wang, Dawn Belt Davis, Mary E. Rabaglia, Angie T. Oler, Donald S. Stapleton, Carmen Argmann, Kathryn L. Schueler, Seve Edwards, H Adam Steinberg, Elias Chaibub Neto, Robert R. Kleinhanz, Scott Turner, Marc K. Hellerstein, Eric E. Schadt, Brian S. Yandell, Christina Kendziorski, Alan D. Attie¹•Institutions (1)

University of Wisconsin-Madison¹

TL;DR: A strong correlation is found between ²H₂O incorporation into islet DNA in vivo and the expression pattern of the cell cycle module, and the pattern is highly correlated with that of several individual genes in insulin target tissues, including Igf2, which has been shown to promote beta-cell proliferation.

Abstract: Insulin resistance is necessary but not sufficient for the development of type 2 diabetes. Diabetes results when pancreatic beta-cells fail to compensate for insulin resistance by increasing insulin production through an expansion of beta-cell mass or increased insulin secretion. Communication between insulin target tissues and beta-cells may initiate this compensatory response. Correlated changes in gene expression between tissues can provide evidence for such intercellular communication. We profiled gene expression in six tissues of mice from an obesity-induced diabetes-resistant and a diabetes-susceptible strain before and after the onset of diabetes. We studied the correlation structure of mRNA abundance and identified 105 co-expression gene modules. We provide an interactive gene network model showing the correlation structure between the expression modules within and among the six tissues. This resource also provides a searchable database of gene expression profiles for all genes in six tissues in lean and obese diabetes-resistant and diabetes-susceptible mice, at 4 and 10 wk of age. A cell cycle regulatory module in islets predicts diabetes susceptibility. The module predicts islet replication; we found a strong correlation between ²H₂O incorporation into islet DNA in vivo and the expression pattern of the cell cycle module. This pattern is highly correlated with that of several individual genes in insulin target tissues, including Igf2, which has been shown to promote beta-cell proliferation, suggesting that these genes may provide a link between insulin resistance and beta-cell proliferation.

Journal Article•DOI•

Genomic analysis of the immune gene repertoire of amphioxus reveals extraordinary innate complexity and diversity

Shengfeng Huang¹, Shaochun Yuan, Lei Guo, Yanhong Yu, Jun Li, Tao Wu, Tong Liu, Manyi Yang, Kui Wu, Huiling Liu, Jin Ge, Yingcai Yu, Huiqing Huang, Meiling Dong, Cuiling Yu, Shangwu Chen, Anlong Xu•Institutions (1)

Sun Yat-sen University¹

TL;DR: The first comprehensive genomic survey of the immune gene repertoire of the amphioxus Branchiostoma floridae suggests that the amphioxus, a species without vertebrate-type adaptive immunity, holds extraordinary innate complexity and diversity.

Abstract: It has been speculated that before vertebrates evolved somatic diversity-based adaptive immunity, the germline-encoded diversity of innate immunity may have been more developed. Amphioxus occupies the basal position of the chordate phylum and hence is an important reference to the evolution of vertebrate immunity. Here we report the first comprehensive genomic survey of the immune gene repertoire of the amphioxus Branchiostoma floridae. It has been reported that the purple sea urchin has a vastly expanded innate receptor repertoire not previously seen in other species, which includes 222 toll-like receptors (TLRs), 203 NOD/NALP-like receptors (NLRs), and 218 scavenger receptors (SRs). We discovered that the amphioxus genome contains comparable expansion with 71 TLR gene models, 118 NLR models, and 270 SR models. Amphioxus also expands other receptor-like families, including 1215 C-type lectin models, 240 LRR and IGcam-containing models, 1363 other LRR-containing models, 75 C1q-like models, 98 ficolin-like models, and hundreds of models containing complement-related domains. The expansion is not restricted to receptors but is likely to extend to intermediate signal transducers because there are 58 TIR adapter-like models, 36 TRAF models, 44 initiator caspase models, and 541 death-fold domain-containing models in the genome. Amphioxus also has a sophisticated TNF system and a complicated complement system not previously seen in other invertebrates. Besides the increase of gene number, domain combinations of immune proteins are also increased. Altogether, this survey suggests that the amphioxus, a species without vertebrate-type adaptive immunity, holds extraordinary innate complexity and diversity.

Journal Article•DOI•

Genomics, transcriptomics, and peptidomics of neuropeptides and protein hormones in the red flour beetle Tribolium castaneum

Bin Li¹, Reinhard Predel, Susanne Neupert, Frank Hauser, Yoshiaki Tanaka, Giuseppe Cazzamali, Michael Williamson, Yasuyuki Arakane, Peter Verleyen, Liliane Schoofs, Joachim Schachtner, Cornelis J. P. Grimmelikhuijzen, Yoonseong Park•Institutions (1)

Kansas State University¹

01 Jan 2008-Genome Research

TL;DR: The authors' analysis of Tribolium indicates that, during insect evolution, genes for neuropeptides and protein hormones are often duplicated or lost.

Abstract: Neuropeptides and protein hormones are ancient molecules that mediate cell-to-cell communication. The whole genome sequence from the red flour beetle Tribolium castaneum, along with those from other insect species, provides an opportunity to study the evolution of the genes encoding neuropeptide and protein hormones. We identified 41 of these genes in the Tribolium genome by using a combination of bioinformatic and peptidomic approaches. These genes encode >80 mature neuropeptides and protein hormones, 49 peptides of which were experimentally identified by peptidomics of the central nervous system and other neuroendocrine organs. Twenty-three genes have orthologs in Drosophila melanogaster: Sixteen genes in five different groups are likely the result of recent gene expansions during beetle evolution. These five groups contain peptides related to antidiuretic factor-b (ADF-b), CRF-like diuretic hormone (DH37 and DH47 of Tribolium), adipokinetic hormone (AKH), eclosion hormone, and insulin-like peptide. In addition, we found a gene encoding an arginine-vasopressin-like (AVPL) peptide and one for its receptor. Both genes occur only in Tribolium and not in other holometabolous insects with a sequenced genome. The presence of many additional osmoregulatory peptides in Tribolium agrees well with its ability to live in very dry surroundings. In contrast to these extra genes, there are at least nine neuropeptide genes missing in Tribolium, including the genes encoding the prepropeptides for corazonin, kinin, and allatostatin-A. The cognate receptor genes for these three peptides also appear to be absent in the Tribolium genome. Our analysis of Tribolium indicates that, during insect evolution, genes for neuropeptides and protein hormones are often duplicated or lost.

Journal Article•DOI•

Quality scores and SNP detection in sequencing-by-synthesis systems

William Brockman¹, Pablo Alvarez, Sarah Young, Manuel Garber, Georgia Giannoukos, William Lee, Carsten Russ, Eric S. Lander, Chad Nusbaum, David B. Jaffe•Institutions (1)

Broad Institute¹