Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network-based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA
 

/pdf/wgcna-an-r-package-for-weighted-correlation-network-analysis-4vjh9rn783.pdf

WGCNA: an R package for weighted correlation network analysis.

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.

/pdf/differential-expression-analysis-for-sequence-count-data-1txizymire.pdf

Differential expression analysis for sequence count data.
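
The negative binomial error model named in the abstract above can be illustrated with a small sketch. It assumes the common mean-dispersion parameterization, in which the variance is the mean plus a dispersion term that grows with the squared mean. The per-gene moment estimate shown here is a simplification for illustration only: DESeq links variance and mean by local regression across genes, which this sketch does not attempt. The function names and the replicate counts are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Variance of a negative binomial count with mean `mu`.

    Assumes the mean-dispersion form: var = mu + dispersion * mu^2.
    With dispersion = 0 this reduces to the Poisson variance (var = mu).
    """
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate for one gene (illustrative only).

    Solves var = mu + dispersion * mu^2 for the dispersion, using the sample
    mean and sample variance, and clips negative estimates to zero.
    """
    m = mean(counts)
    v = variance(counts)
    if m == 0:
        return 0.0
    return max((v - m) / m ** 2, 0.0)


# Hypothetical read counts for one gene across four replicates.
replicate_counts = [90, 110, 130, 70]
alpha = moment_dispersion(replicate_counts)
model_variance = nb_variance(mean(replicate_counts), alpha)
```

For these hypothetical counts the sample variance is well above the sample mean, so a Poisson model would understate the variability and the estimated dispersion is positive.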

Diabetes is a chronic illness that requires continuing medical care and patient self-management education to prevent acute complications and to reduce the risk of long-term complications. Diabetes care is complex and requires that many issues, beyond glycemic control, be addressed. A large body of evidence exists that supports a range of interventions to improve diabetes outcomes. These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care. While individual preferences, comorbidities, and other patient factors may require modification of goals, targets that are desirable for most patients with diabetes are provided. These standards are not intended to preclude more extensive evaluation and management of the patient by other specialists as needed. For more detailed information, refer to Bode (Ed.): Medical Management of Type 1 Diabetes (1), Burant (Ed.): Medical Management of Type 2 Diabetes (2), and Klingensmith (Ed.): Intensive Diabetes Management (3). The recommendations included are diagnostic and therapeutic actions that are known or believed to favorably affect health outcomes of patients with diabetes. A grading system (Table 1), developed by the American Diabetes Association (ADA) and modeled after existing methods, was utilized to clarify and codify the evidence that forms the basis for the recommendations. The level of evidence that supports each recommendation is listed after each recommendation using the letters A, B, C, or E.

Standards of Medical Care in Diabetes

New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.

/pdf/voom-precision-weights-unlock-linear-model-analysis-tools-5bgw8lj2b8.pdf

voom: precision weights unlock linear model analysis tools for RNA-seq read counts
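
Two ingredients of the voom abstract above, log-counts and precision weights, can be sketched as follows. The log-counts-per-million transform uses the offsets given in the voom paper (0.5 added to the count, 1 added to the library size). The weight is shown simply as the inverse of a predicted variance; fitting the mean-variance trend that supplies that prediction is omitted, so `predicted_sd` is a hypothetical input, and both function names are illustrative rather than part of limma.

```python
import math


def log_cpm(count, library_size):
    """Log2 counts per million with the offsets used by voom.

    The 0.5 added to the count avoids taking the log of zero, and the 1 added
    to the library size keeps the ratio strictly below one.
    """
    return math.log2((count + 0.5) / (library_size + 1.0) * 1e6)


def precision_weight(predicted_sd):
    """Inverse-variance weight for one observation.

    `predicted_sd` is assumed to come from a fitted mean-variance trend,
    which this sketch does not compute.
    """
    return 1.0 / predicted_sd ** 2


# Hypothetical observation: 25 reads in a library of 2,000,000 reads.
y = log_cpm(25, 2_000_000)
w = precision_weight(0.5)
```

Observations with a larger predicted standard deviation receive a smaller weight, which is how low-count, high-variance log-counts are down-weighted in the subsequent linear model fit.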

The genomes of many animals, plants and fungi are tagged by methylation of DNA cytosine. To understand the biological significance of this epigenetic mark it is essential to know where in the genome it is located. New techniques are making it easier to map DNA methylation patterns on a large scale and the results have already provided surprises. In particular, the conventional view that DNA methylation functions predominantly to irreversibly silence transcription is being challenged. Not only is promoter methylation often highly dynamic during development, but many organisms also seem to target DNA methylation specifically to the bodies of active genes.

https://birdlab.bio.ed.ac.uk/bird/sites/sbsweb2.bio.ed.ac.uk.bird/files/20.pdf

DNA methylation landscapes: provocative insights from epigenomics

(2007). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Journal of the American Statistical Association: Vol. 102, No. 477, pp. 388-389.

Bioinformatics and Computational Biology Solutions Using R and Bioconductor

Here are two books on a topic new to Technometrics: statistical and mathematical demography. The first author of Applied Mathematical Demography wrote the first two editions of this book alone. The second edition was published in 1985. Professor Keyfitz noted in the Preface (p. vii) that at age 90 he had no interest in doing another edition; however, the publisher encouraged him to find a coauthor. The result is an additional focus for the book in the world of biology that makes it much more relevant for the sciences. The book is now part of the publisher’s series on Statistics for Biology and Health. Much of it, of course, focuses on the many aspects of human populations. The new material focuses on mature population models, the particular focus of the new author (see, e.g., Caswell 2000). As one might expect from a book that was originally written in the 1970s, it does not include a lot of information on statistical computing. The new book by Alho and Spencer is focused on putting a better emphasis on statistics in the discipline of demography (Preface, p. vii). It is part of the publisher’s Series in Statistics. The authors are both statisticians, so the focus is on statistics as used for demographic problems. The authors are targeting human applications, so their perspective on science does not extend any further than epidemiology. The book actually strikes a good balance between statistical tools and demographic applications. The authors use the first two chapters to teach statisticians about the concepts of demography. The next four chapters are very similar to the statistics content found in introductory books on survival analysis, such as the recent book by Kleinbaum and Klein (2005), reported by Ziegel (2006). The next three chapters are focused on various aspects of forecasting demographic rates. The book concludes with chapters focusing on three areas of applications: errors in census numbers, financial applications, and small-area estimates.

Univariate Discrete Distributions

We developed a novel approach for conducting multisample, multigene, ultradeep bisulfite sequencing analysis of DNA methylation patterns in clinical samples. A massively parallel sequencing-by-synthesis method (454 sequencing) was used to directly sequence >100 bisulfite PCR products in a single sequencing run without subcloning. We showed the utility, robustness, and superiority of this approach by analyzing methylation in 25 gene-related CpG rich regions from >40 cases of primary cells, including normal peripheral blood lymphocytes, acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL). A total of 294,631 sequences was generated with an average read length of 131 bp. On average, >1,600 individual sequences were generated for each PCR amplicon far beyond the few clones (<20) typically analyzed by traditional bisulfite sequencing. Comprehensive analysis of CpG methylation patterns at a single DNA molecule level using clustering algorithms revealed differential methylation patterns between diseases. A significant increase in methylation was detected in ALL and FL samples compared with CLL and MCL. Furthermore, a progressive spreading of methylation was detected from the periphery toward the center of select CpG islands in the ALL and FL samples. The ultradeep sequencing also allowed simultaneous analysis of genetic and epigenetic data and revealed an association between a single nucleotide polymorphism and the methylation present in the LRP1B promoter. This new generation of methylome sequencing will provide digital profiles of aberrant DNA methylation for individual human cancers and offers a robust method for the epigenetic classification of tumor subtypes.

https://cancerres.aacrjournals.org/content/canres/67/18/8511.full.pdf

Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing.

The commensal gut microbiota has been implicated as a determinant in several human diseases and conditions. There is mounting evidence that the gut microbiota of laboratory mice (Mus musculus) similarly modulates the phenotype of mouse models used to study human disease and development. While differing model phenotypes have been reported using mice purchased from different vendors, the composition and uniformity of the fecal microbiota in mice of various genetic backgrounds from different vendors is unclear. Using culture-independent methods and robust statistical analysis, we demonstrate significant differences in the richness and diversity of fecal microbial populations in mice purchased from two large commercial vendors. Moreover, the abundance of many operational taxonomic units, often identified to the species level, as well as several higher taxa, differed in vendor- and strain-dependent manners. Such differences were evident in the fecal microbiota of weanling mice and persisted throughout the study, to twenty-four weeks of age. These data provide the first in-depth analysis of the developmental trajectory of the fecal microbiota in mice from different vendors, and a starting point from which researchers may be able to refine animal models affected by differences in the gut microbiota and thus possibly reduce the number of animals required to perform studies with sufficient statistical power.

/pdf/effects-of-vendor-and-genetic-background-on-the-composition-2cyjvbbcx4.pdf

Effects of vendor and genetic background on the composition of the fecal microbiota of inbred mice.

The rapid rise in natural gas extraction using hydraulic fracturing increases the potential for contamination of surface and ground water from chemicals used throughout the process. Hundreds of products containing more than 750 chemicals and components are potentially used throughout the extraction process, including more than 100 known or suspected endocrine-disrupting chemicals. We hypothesized that a selected subset of chemicals used in natural gas drilling operations and also surface and ground water samples collected in a drilling-dense region of Garfield County, Colorado, would exhibit estrogen and androgen receptor activities. Water samples were collected, solid-phase extracted, and measured for estrogen and androgen receptor activities using reporter gene assays in human cell lines. Of the 39 unique water samples, 89%, 41%, 12%, and 46% exhibited estrogenic, antiestrogenic, androgenic, and antiandrogenic activities, respectively. Testing of a subset of natural gas drilling chemicals revealed novel antiestrogenic, novel antiandrogenic, and limited estrogenic activities. The Colorado River, the drainage basin for this region, exhibited moderate levels of estrogenic, antiestrogenic, and antiandrogenic activities, suggesting that higher localized activity at sites with known natural gas–related spills surrounding the river might be contributing to the multiple receptor activities observed in this water source. The majority of water samples collected from sites in a drilling-dense region of Colorado exhibited more estrogenic, antiestrogenic, or antiandrogenic activities than reference sites with limited nearby drilling operations. Our data suggest that natural gas drilling operations may result in elevated endocrine-disrupting chemical activity in surface and ground water. (Endocrinology 155: 897–907, 2014)

/pdf/estrogen-and-androgen-receptor-activities-of-hydraulic-17dovyq1s6.pdf

J. Wade Davis

Papers

Bioinformatics and Computational Biology Solutions Using R and Bioconductor

Univariate Discrete Distributions

Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing.

Effects of vendor and genetic background on the composition of the fecal microbiota of inbred mice.

Estrogen and Androgen Receptor Activities of Hydraulic Fracturing Chemicals and Surface and Ground Water in a Drilling-Dense Region