Showing papers in "GigaScience in 2016"

Open Access

Journal Article•DOI•

Draft genome of the living fossil Ginkgo biloba

Rui Guan¹, Yunpeng Zhao², He Zhang³, Guangyi Fan, Xin Liu, Wenbin Zhou², Chengcheng Shi, Jiahao Wang, Weiqing Liu, Xinming Liang, Yuanyuan Fu¹, Kailong Ma, Lijun Zhao², Fu-Min Zhang⁴, Zuhong Lu¹, Simon Ming-Yuen Lee, Xun Xu, Jian Wang, Huanming Yang, Cheng-Xin Fu², Song Ge⁴, Wenbin Chen•Institutions (4)

Southeast University¹, Zhejiang University², The Chinese University of Hong Kong³, Chinese Academy of Sciences⁴

21 Nov 2016-GigaScience

TL;DR: The ginkgo genome consists mainly of LTR-RTs resulting from ancient gradual accumulation and two WGD events, which sheds light on sequencing large genomes, and opens an avenue for further genetic and evolutionary research.

Abstract: Ginkgo biloba L. (Ginkgoaceae) is one of the most distinctive plants. It possesses a suite of fascinating characteristics including a large genome, outstanding resistance/tolerance to abiotic and biotic stresses, and dioecious reproduction, making it an ideal model species for biological studies. However, the lack of a high-quality genome sequence has been an impediment to our understanding of its biology and evolution. The 10.61 Gb genome sequence containing 41,840 annotated genes was assembled in the present study. Repetitive sequences account for 76.58% of the assembled sequence, and long terminal repeat retrotransposons (LTR-RTs) are particularly prevalent. The diversity and abundance of LTR-RTs are due to their gradual accumulation and a remarkable amplification between 16 and 24 million years ago, and they contribute to the long introns and large genome. Whole genome duplication (WGD) may have occurred twice, with an ancient WGD consistent with that shown to occur in other seed plants, and a more recent event specific to ginkgo. Abundant gene clusters from tandem duplication were also evident, and enrichment of expanded gene families indicates a remarkable array of chemical and antibacterial defense pathways. The ginkgo genome consists mainly of LTR-RTs resulting from ancient gradual accumulation and two WGD events. The multiple defense mechanisms underlying the characteristic resilience of ginkgo are fostered by a remarkable enrichment in ancient duplicated and ginkgo-specific gene clusters. The present study sheds light on sequencing large genomes, and opens an avenue for further genetic and evolutionary research.

216 citations

Journal Article•DOI•

Introducing BASE: the Biomes of Australian Soil Environments soil microbial diversity database

Andrew Bissett¹, Anna Fitzgerald, Thys Meintjes², Pauline M. Mele³, Frank Reith⁴,⁵, Paul G. Dennis⁶, Martin F. Breed⁴, Belinda Brown, Mark V. Brown⁷, Joël Brugger⁸, Margaret Byrne, Stefan Caddy-Retalic⁴, Bernie Carmody, David J. Coates, Carolina Correa⁷, Belinda C. Ferrari⁷, Vadakattu V. S. R. Gupta⁵, Kelly Hamonts⁵,⁹, Asha Haslem¹⁰, Philip Hugenholtz⁶, Mirko Karan¹¹, Jason Koval⁷, Andrew J. Lowe⁴, Stuart Macdonald¹², Leanne McGrath, David Martin⁵, Matthew J. Morgan⁵, Kristin I. North⁷, Chanyarat Paungfoo-Lonhienne⁶, Elise Pendall⁹, Lori A. Phillips¹³, Rebecca Pirzl⁵, Jeff R. Powell⁹, Mark A. Ragan⁶, Susanne Schmidt⁶, Nicole P. Seymour, Ian Snape¹⁴, John R. Stephen, Matthew Stevens¹⁰, Matt Tinning¹⁰, Kristen J. Williams⁵, Yun Kit Yeoh⁶, Carla M. Zammit⁶, Andrew G. Young⁵•Institutions (14)

Hobart Corporation¹, Murdoch University², La Trobe University³, University of Adelaide⁴, Commonwealth Scientific and Industrial Research Organisation⁵, University of Queensland⁶, University of New South Wales⁷, Monash University, Clayton campus⁸, University of Sydney⁹, Walter and Eliza Hall Institute of Medical Research¹⁰, James Cook University¹¹, University of Tasmania¹², Agriculture and Agri-Food Canada¹³, Australian Antarctic Division¹⁴

18 May 2016-GigaScience

TL;DR: The ‘Biomes of Australian Soil Environments’ (BASE) project has generated a database of microbial diversity with associated metadata across extensive environmental gradients at continental scale, becoming the first Australian soil microbial diversity database.

Abstract: Microbial inhabitants of soils are important to ecosystem and planetary functions, yet there are large gaps in our knowledge of their diversity and ecology. The ‘Biomes of Australian Soil Environments’ (BASE) project has generated a database of microbial diversity with associated metadata across extensive environmental gradients at continental scale. As the characterisation of microbes rapidly expands, the BASE database provides an evolving platform for interrogating and integrating microbial diversity and function. BASE currently provides amplicon sequences and associated contextual data for over 900 sites encompassing all Australian states and territories, a wide variety of bioregions, vegetation and land-use types. Amplicons target bacteria, archaea and general and fungal-specific eukaryotes. The growing database will soon include metagenomics data. Data are provided in both raw sequence (FASTQ) and analysed OTU table formats and are accessed via the project’s data portal, which provides a user-friendly search tool to quickly identify samples of interest. Processed data can be visually interrogated and intersected with other Australian diversity and environmental data using tools developed by the ‘Atlas of Living Australia’. Developed within an open data framework, the BASE project is the first Australian soil microbial diversity database. The database will grow and link to other global efforts to explore microbial, plant, animal, and marine biodiversity. Its design and open-access nature ensure that BASE will evolve as a valuable tool for documenting an often overlooked component of biodiversity and the many microbe-driven processes that are essential to sustain soil function and ecosystem services.

178 citations

Journal Article•DOI•

Species-level resolution of 16S rRNA gene amplicons sequenced through the MinION™ portable nanopore sequencer

Alfonso Benítez-Páez¹, Kevin J. Portune¹, Yolanda Sanz¹•Institutions (1)

Spanish National Research Council¹

28 Jan 2016-GigaScience

TL;DR: Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, the MinION™ DNA sequencer is valuable for both high taxonomic resolution and microbial diversity analysis.

Abstract: The miniaturised and portable DNA sequencer MinION™ has been released to the scientific community within the framework of an early access programme to evaluate its application for a wide variety of genetic approaches. This technology has demonstrated great potential, especially in genome-wide analyses. In this study, we tested the ability of the MinION™ system to perform amplicon sequencing in order to design new approaches to study microbial diversity using nearly full-length 16S rDNA sequences. Using R7.3 chemistry, we generated more than 3.8 million events (nt) during a single sequencing run. These data were sufficient to reconstruct more than 90 % of the 16S rRNA gene sequences for 20 different species present in a mock reference community. After read mapping and 16S rRNA gene assembly, consensus sequences and 2d reads were recovered to assign taxonomic classification down to the species level. Additionally, we were able to measure the relative abundance of all the species present in a mock community and detected a biased species distribution originating from the PCR reaction using ‘universal’ primers. Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, the MinION™ DNA sequencer is valuable for both high taxonomic resolution and microbial diversity analysis. Improvements in nanopore chemistry, such as minimising base-calling errors and the nucleotide bias reported here for 16S amplicon sequencing, will further deliver more reliable information that is useful for the specific detection of microbial species and strains in complex ecosystems.

168 citations

Journal Article•DOI•

Genome sequence of the olive tree, Olea europaea

Fernando Cruz¹, Irene Julca¹,², Jèssica Gómez-Garrido¹, Damian Loska¹, Marina Marcet-Houben¹, Emilio Cano³, Beatriz Galán³, Leonor Frias¹, Paolo Ribeca¹, Sophia Derdak¹, Marta Gut¹, Manuel Sánchez-Fernández, José Luis García³, Ivo Gut¹, Pablo Vargas³, Tyler Alioto¹, Toni Gabaldón•Institutions (3)

Pompeu Fabra University¹, Autonomous University of Barcelona², Spanish National Research Council³

27 Jun 2016-GigaScience

TL;DR: The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits.

Abstract: The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein-coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
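
The scaffold N50 and the "95 % of the estimated genome length" quoted in this abstract are simple summary statistics. As a rough illustration only, the sketch below computes N50 using the common definition (the length of the sequence at which the cumulative sum of lengths, sorted longest first, reaches half of the total). The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the olive assembly; only the 1.31 Gb and 1.38 Gb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 here is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths in kb (hypothetical values for illustration).
toy_scaffolds_kb = [900, 600, 443, 300, 200, 100]
print("toy N50 (kb):", n50(toy_scaffolds_kb))

# Assembled fraction using the two figures quoted in the abstract.
assembled_gb, estimated_gb = 1.31, 1.38
print("assembled fraction (%):", round(assembled_gb / estimated_gb * 100, 1))
```

With the abstract's rounded inputs the assembled fraction comes out just under 95 %, which matches the reported value to the nearest percent.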

155 citations

Journal Article•DOI•

Tools and techniques for computational reproducibility

Stephen R. Piccolo¹, Michael B. Frampton¹•Institutions (1)

Brigham Young University¹

11 Jul 2016-GigaScience

TL;DR: No single strategy is sufficient for every scenario; thus it is often useful to combine approaches, and seven such strategies are described.

Abstract: When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed—and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

135 citations

Journal Article•DOI•

INC-Seq: accurate single molecule reads using nanopore sequencing

Chenhao Li¹,², Kern Rei Chng², Esther J. H. Boey², Amanda Hui Qi Ng², Andreas Wilm², Niranjan Nagarajan¹,²•Institutions (2)

National University of Singapore¹, Genome Institute of Singapore²

02 Aug 2016-GigaScience

TL;DR: INC-Seq reads enabled accurate species-level classification, identification of species at 0.1 % abundance and robust quantification of relative abundances, providing a cheap and effective approach for pathogen detection and microbiome profiling on the MinION system.

Abstract: Nanopore sequencing provides a rapid, cheap and portable real-time sequencing platform with the potential to revolutionize genomics. However, several applications are limited by relatively high single-read error rates (>10 %), including RNA-seq, haplotype sequencing and 16S sequencing. We developed the Intramolecular-ligated Nanopore Consensus Sequencing (INC-Seq) as a strategy for obtaining long and accurate nanopore reads, starting with low input DNA. Applying INC-Seq for 16S rRNA-based bacterial profiling generated full-length amplicon sequences with a median accuracy >97 %. INC-Seq reads enabled accurate species-level classification, identification of species at 0.1 % abundance and robust quantification of relative abundances, providing a cheap and effective approach for pathogen detection and microbiome profiling on the MinION system.

128 citations

Journal Article•DOI•

Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

Ivo D. Dinov¹•Institutions (1)

University of Michigan¹

25 Feb 2016-GigaScience

TL;DR: Using imaging, genetic and healthcare data, examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols are provided.

Abstract: Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be ‘team science’.

126 citations

Journal Article•DOI•

Mitochondrial metagenomics: letting the genes out of the bottle

Alex Crampton-Platt¹,², Douglas W. Yu³,⁴, Xin Zhou, Alfried P. Vogler¹,⁵•Institutions (5)

Natural History Museum¹, University College London², University of East Anglia³, Kunming Institute of Zoology⁴, Imperial College London⁵

22 Mar 2016-GigaScience

TL;DR: Mitochondrial metagenomics offers a promising avenue for unifying the ecological and evolutionary understanding of species diversity and makes it possible to obtain data on spatial and temporal turnover in whole-community phylogenetic and species composition, even in complex ecosystems where species-level taxonomy and biodiversity patterns are poorly known.

Abstract: ‘Mitochondrial metagenomics’ (MMG) is a methodology for shotgun sequencing of total DNA from specimen mixtures and subsequent bioinformatic extraction of mitochondrial sequences. The approach can be applied to phylogenetic analysis of taxonomically selected taxa, as an economical alternative to mitogenome sequencing from individual species, or to environmental samples of mixed specimens, such as from mass trapping of invertebrates. The routine generation of mitochondrial genome sequences has great potential both for systematics and community phylogenetics. Mapping of reads from low-coverage shotgun sequencing of environmental samples also makes it possible to obtain data on spatial and temporal turnover in whole-community phylogenetic and species composition, even in complex ecosystems where species-level taxonomy and biodiversity patterns are poorly known. In addition, read mapping can produce information on species biomass, and potentially allows quantification of within-species genetic variation. The success of MMG relies on the formation of numerous mitochondrial genome contigs, achievable with standard genome assemblers, but various challenges for the efficiency of assembly remain, particularly in the face of variable relative species abundance and intra-specific genetic variation. Nevertheless, several studies have demonstrated the power of mitogenomes from MMG for accurate phylogenetic placement, evolutionary analysis of species traits, biodiversity discovery and the establishment of species distribution patterns; it offers a promising avenue for unifying the ecological and evolutionary understanding of species diversity.

106 citations

Journal Article•DOI•

Draft genome of the Chinese mitten crab, Eriocheir sinensis

Linsheng Song¹, Chao Bian, Yongju Luo, Lingling Wang², Xinxin You, Jia Li, Ying Qiu, Ma Xingyu, Zhu Zhifei, Ma Liang, Wang Zhaogen, Lei Ying, Jun Qiang¹, Hongxia Li¹, Juhua Yu¹, Alex Wong, Junmin Xu, Qiong Shi, Pao Xu¹•Institutions (2)

Chinese Academy of Fishery Sciences¹, Chinese Academy of Sciences²

28 Jan 2016-GigaScience

TL;DR: The assembled draft genome will provide a valuable resource for the study of essential developmental processes and genetic determination of important traits of the Chinese mitten crab, and also for investigating crustacean evolution.

Abstract: The Chinese mitten crab, Eriocheir sinensis, is one of the most studied and economically important crustaceans in China. Its transition from a swimming to a crawling method of movement during early development, anadromous migration during growth, and catadromous migration during breeding have been attractive features for research. However, knowledge of the underlying molecular mechanisms that regulate these processes is still very limited. A total of 258.8 gigabases (Gb) of raw reads from whole-genome sequencing of the crab were generated by the Illumina HiSeq2000 platform. The final genome assembly (1.12 Gb), about 67.5 % of the estimated genome size (1.66 Gb), is composed of 17,553 scaffolds (>2 kb) with an N50 of 224 kb. We identified 14,436 genes using AUGUSTUS, of which 7,549 were shown to have significant supporting evidence using the GLEAN pipeline. This gene number is much greater than that of the horseshoe crab, and the annotation completeness, as evaluated by CEGMA, reached 66.9 %. We report the first genome sequencing, assembly, and annotation of the Chinese mitten crab. The assembled draft genome will provide a valuable resource for the study of essential developmental processes and genetic determination of important traits of the Chinese mitten crab, and also for investigating crustacean evolution.

100 citations

Journal Article•DOI•

Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION™ sequencing

Minh Duc Cao¹, Devika Ganesamoorthy¹, Alysha G. Elliott¹, Huihui Zhang¹, Mark E. Cooper¹, Lachlan J. M. Coin¹,²•Institutions (2)

University of Queensland¹, Imperial College London²

26 Jul 2016-GigaScience

TL;DR: This work presents a framework for streaming analysis of MinION real-time sequence data, together with probabilistic streaming algorithms for species typing, strain typing and antibiotic resistance profile identification, and shows that the pipeline can process over 100 times more data than the current throughput of the MinION on a desktop computer.

Abstract: The recently introduced Oxford Nanopore MinION platform generates DNA sequence data in real-time. This has great potential to shorten the sample-to-results time and is likely to have benefits such as rapid diagnosis of bacterial infection and identification of drug resistance. However, there are few tools available for streaming analysis of real-time sequencing data. Here, we present a framework for streaming analysis of MinION real-time sequence data, together with probabilistic streaming algorithms for species typing, strain typing and antibiotic resistance profile identification. Using four culture isolate samples, as well as a mixed-species sample, we demonstrate that bacterial species and strain information can be obtained within 30 min of sequencing and using about 500 reads, initial drug-resistance profiles within two hours, and complete resistance profiles within 10 h. While strain identification with multi-locus sequence typing required more than 15x coverage to generate confident assignments, our novel gene-presence typing could detect the presence of a known strain with 0.5x coverage. We also show that our pipeline can process over 100 times more data than the current throughput of the MinION on a desktop computer.

86 citations

Journal Article•DOI•

Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data

Robert L. Davidson¹, Ralf J. M. Weber¹, Haoyu Liu¹, Archana Sharma-Oates¹, Mark R. Viant¹•Institutions (1)

University of Birmingham¹

23 Feb 2016-GigaScience

TL;DR: This work presents an end-to-end mass spectrometry metabolomics workflow in the widely used platform, Galaxy, and recommends that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.

Abstract: Metabolomics is increasingly recognized as an invaluable tool in the biological, medical and environmental sciences yet lags behind the methodological maturity of other omics fields. To achieve its full potential, including the integration of multiple omics modalities, the accessibility, standardization and reproducibility of computational metabolomics tools must be improved significantly. Here we present our end-to-end mass spectrometry metabolomics workflow in the widely used platform, Galaxy. Named Galaxy-M, our workflow has been developed for both direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The range of tools presented spans from processing of raw data, e.g. peak picking and alignment, through data cleansing, e.g. missing value imputation, to preparation for statistical analysis, e.g. normalization and scaling, and principal components analysis (PCA) with associated statistical evaluation. We demonstrate the ease of using these Galaxy workflows via the analysis of DIMS and LC-MS datasets, and provide PCA scores and associated statistics to help other users to ensure that they can accurately repeat the processing and analysis of these two datasets. Galaxy and data are all provided pre-installed in a virtual machine (VM) that can be downloaded from the GigaDB repository. Additionally, source code, executables and installation instructions are available from GitHub. The Galaxy platform has enabled us to produce an easily accessible and reproducible computational metabolomics workflow. More tools could be added by the community to expand its functionality. We recommend that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.

Journal Article•DOI•

Genomic analyses reveal FAM84B and the NOTCH pathway are associated with the progression of esophageal squamous cell carcinoma

Caixia Cheng¹, Heyang Cui¹, Ling Zhang¹, Zhiwu Jia¹, Bin Song¹, Fang Wang¹, Yaoping Li¹, Jing Liu¹, Pengzhou Kong¹, Ruyi Shi¹, Yanghui Bi¹, Bin Yang¹, Juan Wang¹, Zhenxiang Zhao¹, Yanyan Zhang¹, Xiaoling Hu¹, Jie Yang¹, Chanting He¹, Zhiping Zhao¹, Jinfen Wang, Yanfeng Xi, Enwei Xu, Guodong Li, Shiping Guo, Yunqing Chen, Xiaofeng Yang¹, Xing Chen, Jianfang Liang¹, Jiansheng Guo¹, Xiaolong Cheng¹, Chuangui Wang², Qimin Zhan³, Yongping Cui¹•Institutions (3)

Shanxi Medical University¹, Shanghai Jiao Tong University², Peking Union Medical College³

11 Jan 2016-GigaScience

TL;DR: The results suggest that FAM84B and the NOTCH pathway are involved in the progression of ESCC and may be potential diagnostic targets for ESCC susceptibility.

Abstract: Esophageal squamous cell carcinoma (ESCC) is the sixth most lethal cancer worldwide and the fourth most lethal cancer in China. Genomic characterization of tumors, particularly those of different stages, is likely to reveal additional oncogenic mechanisms. Although copy number alterations and somatic point mutations associated with the development of ESCC have been identified by array-based technologies and genome-wide studies, the genomic characterization of ESCCs from different stages of the disease has not been explored. Here, we have performed either whole-genome sequencing or whole-exome sequencing on 51 stage I and 53 stage III ESCC patients to characterize the genomic alterations that occur during the various clinical stages of ESCC, and further validated these changes in 36 atypical hyperplasia samples. Recurrent somatic amplifications at 8q were found to be enriched in stage I tumors and the deletions of 4p-q and 5q were particularly identified in stage III tumors. In particular, the FAM84B gene was amplified and overexpressed in preclinical and ESCC tumors. Knockdown of FAM84B in ESCC cell lines significantly reduced in vitro cell growth, migration and invasion. Although the cancer-associated genes TP53, PIK3CA, CDKN2A and their pathways showed no significant difference between stage I and stage III tumors, we identified and validated a prevalence of mutations in NOTCH1 and in the NOTCH pathway that indicate that they are involved in the preclinical and early stages of ESCC. Our results suggest that FAM84B and the NOTCH pathway are involved in the progression of ESCC and may be potential diagnostic targets for ESCC susceptibility.

Journal Article•DOI•

Chromosomer: a reference-based genome arrangement tool for producing draft chromosome sequences

Gaik Tamazian¹, Pavel Dobrynin¹, Ksenia Krasheninnikova¹, Aleksey Komissarov¹, Klaus-Peter Koepfli¹,², Stephen J. O'Brien¹,³•Institutions (3)

Saint Petersburg State University¹, Smithsonian Conservation Biology Institute², Nova Southeastern University³

22 Aug 2016-GigaScience

TL;DR: Chromosomer is a reference-based genome arrangement tool, which rapidly builds chromosomes from genome contigs or scaffolds using their alignments to a reference genome of a closely related species, and is a useful tool for genomic analysis of species without chromosome maps.

Abstract: As the number of sequenced genomes rapidly increases, chromosome assembly is becoming an even more crucial step of any genome study. Since de novo chromosome assemblies are confounded by repeat-mediated artifacts, reference-assisted assemblies that use comparative inference have become widely used, prompting the development of several reference-assisted assembly programs for prokaryotic and eukaryotic genomes. We developed Chromosomer – a reference-based genome arrangement tool, which rapidly builds chromosomes from genome contigs or scaffolds using their alignments to a reference genome of a closely related species. Chromosomer does not require mate-pair libraries and it offers a number of auxiliary tools that implement common operations accompanying the genome assembly process. Despite implementing a straightforward alignment-based approach, Chromosomer is a useful tool for genomic analysis of species without chromosome maps. Putative chromosome assemblies by Chromosomer can be used in comparative genomic analysis, genomic variation assessment, potential linkage group inference and other kinds of analysis involving contig or scaffold mapping to a high-quality assembly.

Journal Article•DOI•

Transcriptome sequences spanning key developmental states as a resource for the study of the cestode Schistocephalus solidus, a threespine stickleback parasite

Francois Olivier Hebert¹, Stephan Grambauer², Iain Barber², Christian R. Landry¹, Nadia Aubin-Horth¹•Institutions (2)

Laval University¹, University of Leicester²

02 Jun 2016-GigaScience

TL;DR: This large-scale transcriptomic dataset provides a foundation for studies on how parasitic species with complex life cycles modulate their response to changes in biotic and abiotic conditions experienced inside their various hosts, which is a fundamental objective of parasitology.

Abstract: Schistocephalus solidus is a well-established model organism for studying the complex life cycle of cestodes and the mechanisms underlying host-parasite interactions. However, very few large-scale genetic resources for this species are available. We have sequenced and de novo-assembled the transcriptome of S. solidus using tissues from whole worms at three key developmental states - non-infective plerocercoid, infective plerocercoid and adult plerocercoid - to provide a resource for studying the evolution of complex life cycles and, more specifically, how parasites modulate their interactions with their hosts during development. The de novo transcriptome assembly reconstructed the coding sequence of 10,285 high-confidence unigenes from which 24,765 non-redundant transcripts were derived. 7,920 (77 %) of these unigenes were annotated with a protein name and 7,323 (71 %) were assigned at least one Gene Ontology term. Our raw transcriptome assembly (unfiltered transcripts) covers 92 % of the predicted transcriptome derived from the S. solidus draft genome assembly currently available on WormBase. It also provides new ecological information and orthology relationships to further annotate the current WormBase transcriptome and genome. This large-scale transcriptomic dataset provides a foundation for studies on how parasitic species with complex life cycles modulate their response to changes in biotic and abiotic conditions experienced inside their various hosts, which is a fundamental objective of parasitology. Furthermore, this resource will help in the validation of the S. solidus gene features that have been predicted based on genomic sequence.

Journal Article•DOI•

Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica)

Xuewei Li¹, Ling Kui², Jing Zhang, Yinpeng Xie¹, Liping Wang¹, Yan Yan¹, Na Wang¹, Jidi Xu¹, Cuiying Li¹, Wen Wang², Steve van Nocker³, Yang Dong⁴, Yang Dong⁵, Fengwang Ma¹, Qingmei Guan¹•Institutions (5)

Northwest A&F University¹, Kunming Institute of Zoology², Michigan State University³, Kunming University of Science and Technology⁴, Yunnan Agricultural University⁵

08 Aug 2016-GigaScience

TL;DR: The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level, not only suitable for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.

Abstract: Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from 16.9 × genome coverage short reads via Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89 % of the non-repetitive portion of the genome, and has a relatively short (16.7 kb) contig N50 length. These downsides make it difficult to apply this reference in transcriptomic or whole-genome re-sequencing analyses. Here we present an improved hybrid de novo genomic assembly of apple (Golden Delicious), which was obtained from 76 Gb (~102 × genome coverage) Illumina HiSeq data and 21.7 Gb (~29 × genome coverage) PacBio data. The final draft genome is approximately 632.4 Mb, representing ~90 % of the estimated genome. The contig N50 size is 111,619 bp, representing a 7-fold improvement. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes. The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is not only suitable for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.
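
For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of the usual definition: the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions and do not come from the paper; only the two N50 values (16.7 kb and 111,619 bp) are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy, made-up contig lengths (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Fold improvement implied by the two N50 values quoted in the abstract.
previous_n50_bp = 16_700   # 16.7 kb, earlier reference assembly
new_n50_bp = 111_619       # improved hybrid assembly
fold_improvement = new_n50_bp / previous_n50_bp
```

Under these numbers the ratio is about 6.7, which is consistent with the roughly 7-fold improvement stated in the abstract.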

Journal Article•DOI•

Integrated metabolomics and phytochemical genomics approaches for studies on rice

Yozo Okazaki¹, Kazuki Saito²•Institutions (2)

Kihara Institute for Biological Research¹, Chiba University²

02 Mar 2016-GigaScience

TL;DR: Metabolomics research on rice is discussed in order to elucidate the overall regulation of metabolism as it relates to growth and to the mechanisms of adaptation to genetic modifications and environmental stresses such as fungal infections, submergence, and oxidative stress.

Abstract: Metabolomics is widely employed to monitor the cellular metabolic state and assess the quality of plant-derived foodstuffs because it can be used to manage datasets that include a wide range of metabolites in their analytical samples. In this review, we discuss metabolomics research on rice in order to elucidate the overall regulation of the metabolism as it is related to the growth and mechanisms of adaptation to genetic modifications and environmental stresses such as fungal infections, submergence, and oxidative stress. We also focus on phytochemical genomics studies based on a combination of metabolomics and quantitative trait locus (QTL) mapping techniques. In addition to starch, rice produces many metabolites that also serve as nutrients for human consumers. The outcomes of recent phytochemical genomics studies of diverse natural rice resources suggest there is potential for using further effective breeding strategies to improve the quality of ingredients in rice grains.

Journal Article•DOI•

Draft genome of the leopard gecko, Eublepharis macularius

Zijun Xiong¹, Zijun Xiong², Fang Li, Qiye Li¹, Qiye Li³, Long Zhou, Tony Gamble⁴, Jiao Zheng, Ling Kui¹, Cai Li, Shengbin Li², Huanming Yang, Guojie Zhang³, Guojie Zhang¹•Institutions (4)

Kunming Institute of Zoology¹, Xi'an Jiaotong University², University of Copenhagen³, Marquette University⁴

26 Oct 2016-GigaScience

TL;DR: High-depth genome sequencing, assembly, and annotation of a male leopard gecko, Eublepharis macularius, yielded a 2.02 Gb assembly, close to the 2.23 Gb genome size estimated by k-mer analysis.

Abstract: Geckos are among the most species-rich reptile groups and the sister clade to all other lizards and snakes. Geckos possess a suite of distinctive characteristics, including adhesive digits, nocturnal activity, hard, calcareous eggshells, and a lack of eyelids. However, one gecko clade, the Eublepharidae, appears to be the exception to most of these ‘rules’ and lacks adhesive toe pads, has eyelids, and lays eggs with soft, leathery eggshells. These differences make eublepharids an important component of any investigation into the underlying genomic innovations contributing to the distinctive phenotypes in ‘typical’ geckos. We report high-depth genome sequencing, assembly, and annotation for a male leopard gecko, Eublepharis macularius (Eublepharidae). Illumina sequence data were generated from seven insert libraries (ranging from 170 bp to 20 kb), representing a raw sequencing depth of 136X from 303 Gb of data, reduced to 84X and 187 Gb after filtering. The assembled genome of 2.02 Gb was close to the 2.23 Gb estimated by k-mer analysis. Scaffold and contig N50 sizes of 664 and 20 kb, respectively, were comparable to the previously published Gekko japonicus genome. Repetitive elements accounted for 42 % of the genome. Gene annotation yielded 24,755 protein-coding genes, of which 93 % were functionally annotated. CEGMA and BUSCO assessment showed that our assembly captured 91 % (225 of 248) of the core eukaryotic genes, and 76 % of vertebrate universal single-copy orthologs. Assembly of the leopard gecko genome provides a valuable resource for future comparative genomic studies of geckos and other squamate reptiles.
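
The depth and completeness figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that sequencing depth is total sequenced bases divided by genome size; the abstract does not state which genome size the authors used, so the calculation works backwards from the quoted depths. All variable names are illustrative.

```python
# Figures quoted in the abstract.
raw_bases_gb = 303        # raw Illumina data, in Gb
raw_depth = 136           # reported raw depth (X)
filtered_bases_gb = 187   # data remaining after filtering, in Gb
filtered_depth = 84       # reported depth after filtering (X)

# Assumption: depth = total bases / genome size, so the genome size
# implied by each pair of figures is bases / depth.
implied_size_raw_gb = raw_bases_gb / raw_depth
implied_size_filtered_gb = filtered_bases_gb / filtered_depth

# CEGMA completeness: 225 of 248 core eukaryotic genes.
cegma_percent = 100 * 225 / 248
```

Both implied genome sizes round to 2.23 Gb, which suggests the depths were computed against the k-mer estimate rather than the 2.02 Gb assembly, and 225 of 248 rounds to the reported 91 %.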

Journal Article•DOI•

2015 Brainhack Proceedings

R. Cameron Craddock¹, R. Cameron Craddock², Pierre Bellec³, Daniel S. Margules⁴, B. Nolan Nichols⁵, B. Nolan Nichols⁶, Jörg P. Pfannmöller⁷, AmanPreet Badhwar³, David N. Kennedy⁸, Jean-Baptiste Poline⁹, Roberto Toro¹⁰, Ben Cipollini¹¹, Ariel Rokem¹², Daniel Clark¹, Krzysztof J. Gorgolewski⁵, Daniel J. Clark¹, Samir Das¹³, Cécile Madjar¹⁴, Ayan Sengupta¹⁵, Zia Mohades¹³, Sebastien Dery¹³, Weiran Deng¹⁶, Eric Earl¹⁷, Damion V. Demeter¹⁷, Kate Mills¹⁷, Glad Mihai¹⁸, Luka Ruzic¹⁹, Nicholas A. Ketz²⁰, Andrew E. Reineberg²¹, Marianne C. Reddan²⁰, Anne-Lise Goddings²¹, Javier Gonzalez-Castillo²², Caroline Froehlich², Gil Dekel²³, Daniel S. Margulies⁴, Ben D. Fulcher²⁴, Tristan Glatard¹³, Tristan Glatard²⁵, Reza Adalat¹³, Natacha Beck¹³, Rémi Bernard¹³, Najmeh Khalili-Mahani¹³, Pierre Rioux¹³, M. Rousseau¹³, Alan C. Evans¹³, Yaroslav O. Halchenko²⁶, Matteo Visconti di Oleggio Castello²⁶, Raúl Hernández-Pérez, Edgar A. Morales, Laura V. Cuaya, Kaori L. Ito²⁷, Sook-Lei Liew²⁷, Hans J. Johnson²⁸, Erik Kan²⁹, Erik Kan²⁷, Julia Anglin, Michael R. Borich³⁰, Neda Jahanshad²⁷, Paul M. Thompson²⁷, Marcel Falkiewicz⁴, Julia M. Huntenburg⁴, David H. O’Connor², David H. O’Connor¹, Michael P. Milham¹, Michael P. Milham², Ramon Fraga Pereira³¹, Anibal Sólon Heinsfeld³¹, Alexandre Rosa Franco³¹, Augusto Buchweitz³¹, Felipe Meneguzzi³¹, Rickson C. Mesquita³², Luis C. T. Herrera³², Daniela Dentico³³, Vanessa Sochat⁵, Julio E. Villalon-Reina²⁷, Eleftherios Garyfallidis³⁴•Institutions (34)

MIND Institute¹, Nathan Kline Institute for Psychiatric Research², Université de Montréal³, Max Planck Society⁴, Stanford University⁵, SRI International⁶, Greifswald University Hospital⁷, University of Massachusetts Medical School⁸, University of California, Berkeley⁹, Pasteur Institute¹⁰, University of California, San Diego¹¹, University of Washington¹², Montreal Neurological Institute and Hospital¹³, Douglas Mental Health University Institute¹⁴, Otto-von-Guericke University Magdeburg¹⁵, University of Hawaii at Manoa¹⁶, Oregon Health & Science University¹⁷, University of Greifswald¹⁸, Durham University¹⁹, University of Colorado Boulder²⁰, University College London²¹, National Institutes of Health²², City University of New York²³, Monash University²⁴, University of Lyon²⁵, Dartmouth College²⁶, University of Southern California²⁷, Roy J. and Lucille A. Carver College of Medicine²⁸, Children's Hospital Los Angeles²⁹, Emory University³⁰, Pontifícia Universidade Católica do Rio Grande do Sul³¹, State University of Campinas³², University of Wisconsin-Madison³³, Université de Sherbrooke³⁴

01 Nov 2016-GigaScience

TL;DR: The 2015 Brainhack Proceedings cover topics including distributed collaboration, big data meta-analyses for clinical neuroimaging through ENIGMA wrapper scripts, and self-organization and brain function.

Abstract:
I1 Introduction to the 2015 Brainhack Proceedings R. Cameron Craddock, Pierre Bellec, Daniel S. Margulies, B. Nolan Nichols, Jorg P. Pfannmoller
A1 Distributed collaboration: the case for the enhancement of Brainspell’s interface AmanPreet Badhwar, David Kennedy, Jean-Baptiste Poline, Roberto Toro
A2 Advancing open science through NiData Ben Cipollini, Ariel Rokem
A3 Integrating the Brain Imaging Data Structure (BIDS) standard into C-PAC Daniel Clark, Krzysztof J. Gorgolewski, R. Cameron Craddock
A4 Optimized implementations of voxel-wise degree centrality and local functional connectivity density mapping in AFNI R. Cameron Craddock, Daniel J. Clark
A5 LORIS: DICOM anonymizer Samir Das, Cecile Madjar, Ayan Sengupta, Zia Mohades
A6 Automatic extraction of academic collaborations in neuroimaging Sebastien Dery
A7 NiftyView: a zero-footprint web application for viewing DICOM and NIfTI files Weiran Deng
A8 Human Connectome Project Minimal Preprocessing Pipelines to Nipype Eric Earl, Damion V. Demeter, Kate Mills, Glad Mihai, Luka Ruzic, Nick Ketz, Andrew Reineberg, Marianne C. Reddan, Anne-Lise Goddings, Javier Gonzalez-Castillo, Krzysztof J. Gorgolewski
A9 Generating music with resting-state fMRI data Caroline Froehlich, Gil Dekel, Daniel S. Margulies, R. Cameron Craddock
A10 Highly comparable time-series analysis in Nitime Ben D. Fulcher
A11 Nipype interfaces in CBRAIN Tristan Glatard, Samir Das, Reza Adalat, Natacha Beck, Remi Bernard, Najmeh Khalili-Mahani, Pierre Rioux, Marc-Etienne Rousseau, Alan C. Evans
A12 DueCredit: automated collection of citations for software, methods, and data Yaroslav O. Halchenko, Matteo Visconti di Oleggio Castello
A13 Open source low-cost device to register dog’s heart rate and tail movement Raul Hernandez-Perez, Edgar A. Morales, Laura V. Cuaya
A14 Calculating the Laterality Index Using FSL for Stroke Neuroimaging Data Kaori L. Ito, Sook-Lei Liew
A15 Wrapping FreeSurfer 6 for use in high-performance computing environments Hans J. Johnson
A16 Facilitating big data meta-analyses for clinical neuroimaging through ENIGMA wrapper scripts Erik Kan, Julia Anglin, Michael Borich, Neda Jahanshad, Paul Thompson, Sook-Lei Liew
A17 A cortical surface-based geodesic distance package for Python Daniel S Margulies, Marcel Falkiewicz, Julia M Huntenburg
A18 Sharing data in the cloud David O’Connor, Daniel J. Clark, Michael P. Milham, R. Cameron Craddock
A19 Detecting task-based fMRI compliance using plan abandonment techniques Ramon Fraga Pereira, Anibal Solon Heinsfeld, Alexandre Rosa Franco, Augusto Buchweitz, Felipe Meneguzzi
A20 Self-organization and brain function Jorg P. Pfannmoller, Rickson Mesquita, Luis C.T. Herrera, Daniela Dentico
A21 The Neuroimaging Data Model (NIDM) API Vanessa Sochat, B Nolan Nichols
A22 NeuroView: a customizable browser-base utility Anibal Solon Heinsfeld, Alexandre Rosa Franco, Augusto Buchweitz, Felipe Meneguzzi
A23 DIPY: Brain tissue classification Julio E. Villalon-Reina, Eleftherios Garyfallidis

Journal Article•DOI•

AGOUTI: improving genome assembly and annotation using transcriptome data.

Simo V. Zhang¹, Luting Zhuo¹, Matthew W. Hahn¹•Institutions (1)

Indiana University¹

19 Jul 2016-GigaScience

TL;DR: This work presents AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA sequencing data to simultaneously combine contigs into scaffolds and fragmented gene models into single models and shows that it is highly accurate and achieves greater accuracy and contiguity when compared with other existing methods.

Abstract: Genomes sequenced using short-read, next-generation sequencing technologies can have many errors and may be fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number and identity of genes within species, as well as gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA sequencing data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on both simulated and real datasets, we show that it is highly accurate and that it achieves greater accuracy and contiguity when compared with other existing methods. AGOUTI is a powerful and effective scaffolder and, unlike most scaffolders, is expected to be more effective in larger genomes because of the commensurate increase in intron length. AGOUTI is able to scaffold thousands of contigs while simultaneously reducing the number of gene models by hundreds or thousands. The software is available free of charge under the MIT license.

Journal Article•DOI•

RES-Scanner: a software package for genome-wide identification of RNA-editing sites

Zongji Wang¹, Jinmin Lian, Qiye Li², Pei Zhang, Yang Zhou, Xiaoyu Zhan³, Guojie Zhang²•Institutions (3)

South China University of Technology¹, University of Copenhagen², Jinan University³

18 Aug 2016-GigaScience

TL;DR: RES-Scanner, as a software package written in the Perl programming language, provides a comprehensive solution that addresses read mapping, homozygous genotype calling, de novo RNA-editing site identification and annotation for any species with matching RNA-seq and DNA-seq data.

Abstract: High-throughput sequencing (HTS) provides a powerful solution for the genome-wide identification of RNA-editing sites. However, it remains a great challenge to distinguish RNA-editing sites from genetic variants and technical artifacts caused by sequencing or read-mapping errors. Here we present RES-Scanner, a flexible and efficient software package that detects and annotates RNA-editing sites using matching RNA-seq and DNA-seq data from the same individuals or samples. RES-Scanner allows the use of both raw HTS reads and pre-aligned reads in BAM format as inputs. When inputs are HTS reads, RES-Scanner can invoke the BWA mapper to align reads to the reference genome automatically. To rigorously identify potential false positives resulting from genetic variants, we have equipped RES-Scanner with sophisticated statistical models to infer the reliability of homozygous genotypes called from DNA-seq data. These models are applicable to samples from either single individuals or a pool of multiple individuals if the ploidy information is known. In addition, RES-Scanner implements statistical tests to distinguish genuine RNA-editing sites from sequencing errors, and provides a series of sophisticated filtering options to remove false positives resulting from mapping errors. Finally, RES-Scanner can improve the completeness and accuracy of editing site identification when the data of multiple samples are available. RES-Scanner, as a software package written in the Perl programming language, provides a comprehensive solution that addresses read mapping, homozygous genotype calling, de novo RNA-editing site identification and annotation for any species with matching RNA-seq and DNA-seq data. The package is freely available.

Journal Article•DOI•

High-throughput identification of novel conotoxins from the Chinese tubular cone snail (Conus betulinus) by multi-transcriptome sequencing

Chao Peng, Ge Yao, Bingmiao Gao¹, Chong-Xu Fan, Chao Bian, Jintu Wang, Ying Cao, Bo Wen, Yabing Zhu, Zhiqiang Ruan, Xiaofei Zhao, Xinxin You, Jie Bai, Jia Li, Zhilong Lin, Shijie Zou, Xinhui Zhang, Ying Qiu, Jieming Chen, Steven L. Coon², Jiaan Yang, Ji-Sheng Chen, Qiong Shi•Institutions (2)

Hainan Medical University¹, National Institutes of Health²

14 Apr 2016-GigaScience

TL;DR: Variation in conopeptides among specimens of C. betulinus was observed, suggesting intraspecific variability in toxin production at the genetic level; the novel conopeptides provide a potentially fertile resource for the development of new pharmaceuticals and a pathway for the discovery of new conotoxins.

Abstract: The venom of predatory marine cone snails mainly contains a diverse array of unique bioactive peptides commonly referred to as conopeptides or conotoxins. These peptides have proven to be valuable pharmacological probes and potential drugs because of their high specificity and affinity to important ion channels, receptors and transporters of the nervous system. Most previous studies have focused specifically on the conopeptides from piscivorous and molluscivorous cone snails, but little attention has been devoted to the dominant vermivorous species. The vermivorous Chinese tubular cone snail, Conus betulinus, is the dominant Conus species inhabiting the South China Sea. The transcriptomes of venom ducts and venom bulbs from a variety of specimens of this species were sequenced using both next-generation sequencing and traditional Sanger sequencing technologies, resulting in the identification of a total of 215 distinct conopeptides. Among these, 183 were novel conopeptides, including nine new superfamilies. It appeared that most of the identified conopeptides were synthesized in the venom duct, while a handful of conopeptides were identified only in the venom bulb and at very low levels. We identified 215 unique putative conopeptide transcripts from the combination of five transcriptomes and one EST sequencing dataset. Variation in conopeptides from different specimens of C. betulinus was observed, which suggested the presence of intraspecific variability in toxin production at the genetic level. These novel conopeptides provide a potentially fertile resource for the development of new pharmaceuticals, and a pathway for the discovery of new conotoxins.

Journal Article•DOI•

Fish-T1K (Transcriptomes of 1,000 Fishes) Project: large-scale transcriptome data for fish evolution studies

Ying Sun¹, Yu Huang, Xiaofeng Li, Carole C. Baldwin², Zhuocheng Zhou, Zhixiang Yan, Keith A. Crandall³, Yong Zhang, Xiaomeng Zhao, Min Wang, Alex Wong, Chao Fang, Xinhui Zhang, Hai Huang, Jose V. Lopez⁴, Kirk Kilfoyle⁴, Guillermo Ortí³, Byrappa Venkatesh⁵, Qiong Shi⁶•Institutions (6)

Sun Yat-sen University¹, National Museum of Natural History², George Washington University³, Nova Southeastern University⁴, Agency for Science, Technology and Research⁵, Shenzhen University⁶

03 May 2016-GigaScience

TL;DR: An international project known as the “Transcriptomes of 1,000 Fishes” (Fish-T1K) project has been established to generate RNA-seq transcriptome sequences for 1,000 diverse species of ray-finned fishes.

Abstract: Ray-finned fishes (Actinopterygii) represent more than 50 % of extant vertebrates and are of great evolutionary, ecologic and economic significance, but they are relatively underrepresented in ‘omics studies. Increased availability of transcriptome data for these species will allow researchers to better understand changes in gene expression, and to carry out functional analyses. An international project known as the “Transcriptomes of 1,000 Fishes” (Fish-T1K) project has been established to generate RNA-seq transcriptome sequences for 1,000 diverse species of ray-finned fishes. The first phase of this project has produced transcriptomes from more than 180 ray-finned fishes, representing 142 species and covering 51 orders and 109 families. Here we provide an overview of the goals of this project and the work done so far.

Journal Article•DOI•

The preprocessed connectomes project repository of manually corrected skull-stripped T1-weighted anatomical MRI data

Benjamin Puccio¹, James P Pooley², John Pellman², Elise C. Taverna¹, R. Cameron Craddock², R. Cameron Craddock¹•Institutions (2)

Nathan Kline Institute for Psychiatric Research¹, MIND Institute²

25 Oct 2016-GigaScience

TL;DR: The utility of skull-stripped anatomical images from the Neurofeedback sample is illustrated as a reference for comparing various automatic methods and the performance of the newly created library on independent data is evaluated.

Abstract: Skull-stripping is the procedure of removing non-brain tissue from anatomical MRI data. This procedure can be useful for calculating brain volume and for improving the quality of other image processing steps. Developing new skull-stripping algorithms and evaluating their performance requires gold standard data from a variety of different scanners and acquisition methods. We complement existing repositories with manually corrected brain masks for 125 T1-weighted anatomical scans from the Nathan Kline Institute Enhanced Rockland Sample Neurofeedback Study. Skull-stripped images were obtained using a semi-automated procedure that involved skull-stripping the data using the brain extraction based on nonlocal segmentation technique (BEaST) software, and manually correcting the worst results. Corrected brain masks were added into the BEaST library and the procedure was repeated until acceptable brain masks were available for all images. In total, 85 of the skull-stripped images were hand-edited and 40 were deemed to not need editing. The results are brain masks for the 125 images along with a BEaST library for automatically skull-stripping other data. Skull-stripped anatomical images from the Neurofeedback sample are available for download from the Preprocessed Connectomes Project. The resulting brain masks can be used by researchers to improve preprocessing of the Neurofeedback data, as training and testing data for developing new skull-stripping algorithms, and for evaluating the impact on other aspects of MRI preprocessing. We have illustrated the utility of these data as a reference for comparing various automatic methods and evaluated the performance of the newly created library on independent data.

Journal Article•DOI•

Brainhack: a collaborative workshop for the open neuroscience community.

R. Cameron Craddock¹, Daniel S. Margulies², Pierre Bellec³, B. Nolan Nichols⁴, B. Nolan Nichols⁵, Sarael Alcauter⁶, Fernando A. Barrios⁶, Yves Burnod⁷, Christopher J. Cannistraci⁸, Julien Cohen-Adad⁹, Benjamin De Leener⁹, Sebastien Dery¹⁰, Jonathan Downar¹¹, Jonathan Downar¹², Katharine Dunlop¹², Katharine Dunlop¹¹, Alexandre Rosa Franco, Caroline Froehlich, Andrew J. Gerber¹³, Andrew J. Gerber¹⁴, Satrajit S. Ghosh¹⁵, Satrajit S. Ghosh¹⁶, Thomas J. Grabowski¹⁷, Sean Hill¹⁸, Anibal Sólon Heinsfeld¹⁹, R. Matthew Hutchison¹⁵, Prantik Kundu⁸, Angela R. Laird²⁰, Sook-Lei Liew²¹, Daniel J. Lurie²², Donald G. McLaren¹⁵, Felipe Meneguzzi¹⁹, Maarten Mennes²³, Salma Mesmoudi⁷, David H. O’Connor¹, Erick H. Pasaye⁶, Scott Peltier²⁴, Jean-Baptiste Poline²², Jean-Baptiste Poline²⁵, Gautam Prasad²¹, Ramon Fraga Pereira¹⁹, Pierre-Olivier Quirion, Ariel Rokem¹⁷, Ziad S. Saad²⁶, Yonggang Shi²¹, Stephen C. Strother²⁷, Stephen C. Strother¹², Roberto Toro²⁸, Roberto Toro²⁹, Lucina Q. Uddin³⁰, John D. Van Horn²¹, John W. Van Meter³¹, Robert C. Welsh²⁴, Ting Xu¹•Institutions (31)

MIND Institute¹, Max Planck Society², Université de Montréal³, Oklahoma State University Center for Health Sciences⁴, Stanford University⁵, National Autonomous University of Mexico⁶, University of Paris⁷, Icahn School of Medicine at Mount Sinai⁸, École Polytechnique de Montréal⁹, Montreal Neurological Institute and Hospital¹⁰, University Health Network¹¹, University of Toronto¹², Columbia University¹³, University of York¹⁴, Harvard University¹⁵, McGovern Institute for Brain Research¹⁶, University of Washington¹⁷, Karolinska Institutet¹⁸, Pontifícia Universidade Católica do Rio Grande do Sul¹⁹, Florida International University²⁰, University of Southern California²¹, University of California, Berkeley²², Radboud University Nijmegen²³, University of Michigan²⁴, Helen Wills Neuroscience Institute²⁵, National Institutes of Health²⁶, Baycrest Hospital²⁷, Pasteur Institute²⁸, Centre national de la recherche scientifique²⁹, University of Miami³⁰, Georgetown University Medical Center³¹

31 Mar 2016-GigaScience

TL;DR: Brainhack events offer a novel workshop format with participant-generated content that caters to the rapidly growing open neuroscience community, combining components of hackathons and unconferences with parallel educational sessions.

Abstract: Brainhack events offer a novel workshop format with participant-generated content that caters to the rapidly growing open neuroscience community. Including components from hackathons and unconferences, as well as parallel educational sessions, Brainhack fosters novel collaborations around the interests of its attendees. Here we provide an overview of its structure, past events, and example projects. Additionally, we outline current innovations such as regional events and post-conference publications. Through introducing Brainhack to the wider neuroscience community, we hope to provide a unique conference format that promotes the features of collaborative, open science.

Journal Article•DOI•

Genomes and virulence difference between two physiological races of Phytophthora nicotianae

Hui Liu¹, Hui Liu², Xiao Ma³, Haiqin Yu, Fang Dunhuang, Yongping Li, Xiao Wang¹, Xiao Wang², Wen-Wen Wang², Yang Dong⁴, Yang Dong³, Bingguang Xiao•Institutions (4)

Chinese Academy of Sciences¹, Kunming Institute of Zoology², Yunnan Agricultural University³, Kunming University of Science and Technology⁴

28 Jan 2016-GigaScience

TL;DR: The genomes of P. nicotianae races 0 and 1 are assembled and annotated, providing not only high-quality reference genomes of this soil-borne pathogen but also insights into its infection mechanisms and its co-evolution with the host plant.

Abstract: Black shank is a severe plant disease caused by the soil-borne pathogen Phytophthora nicotianae. Two physiological races of P. nicotianae, races 0 and 1, are predominantly observed in cultivated tobacco fields around the world. Race 0 has been reported to be more aggressive, having a shorter incubation period, and causing worse root rot symptoms, while race 1 causes more severe necrosis. The molecular mechanisms underlying the difference in virulence between race 0 and 1 remain elusive. We assembled and annotated the genomes of P. nicotianae races 0 and 1, which were obtained by a combination of PacBio single-molecule real-time sequencing and second-generation sequencing (both HiSeq and MiSeq platforms). Gene family analysis revealed a highly expanded ATP-binding cassette transporter gene family in P. nicotianae. Specifically, more RxLR effector genes were found in the genome of race 0 than in that of race 1. In addition, RxLR effector genes were found to be mainly distributed in gene-sparse, repeat-rich regions of the P. nicotianae genome. These results provide not only high-quality reference genomes of P. nicotianae, but also insights into the infection mechanisms of P. nicotianae and its co-evolution with the host plant. They also reveal insights into the difference in virulence between the two physiological races.

Journal Article•DOI•

High-quality genome assembly of channel catfish, Ictalurus punctatus

Xiaohui Chen, Liqiang Zhong, Chao Bian, Pao Xu¹, Ying Qiu, Xinxin You, Zhang Shiyong, Yu Huang, Jia Li, Minghua Wang, Qin Qin, Xiaohua Zhu, Chao Peng, Alex Wong, Zhu Zhifei, Min Wang, Ruobo Gu¹, Junmin Xu, Qiong Shi, Wenji Bian•Institutions (1)

Chinese Academy of Fishery Sciences¹

22 Aug 2016-GigaScience

TL;DR: A high-quality genome assembly is reported for a channel catfish from a breeding stock that was originally imported from North America and inbred in China for more than three generations; the assembly is comparable to a recent report of the “Coco” channel catfish.

Abstract: The channel catfish (Ictalurus punctatus), a species native to North America, is one of the most important commercial freshwater fish in the world, especially in the United States’ aquaculture industry. Since its introduction into China in 1984, both the cultivation area and the yield of this species have increased dramatically, such that China is now the leading producer of channel catfish. To aid genomic research in this species, data sets such as genetic linkage groups, long-insert libraries, physical maps, bacterial artificial chromosome (BAC) end sequences (BES), transcriptome assemblies, and reference genome sequences have been generated. Here, using diverse assembly methods, we provide a comparable high-quality genome assembly for a channel catfish from a breeding stock inbred in China for more than three generations, which was originally imported to China from North America. Approximately 201.6 gigabases (Gb) of genome reads were sequenced by the Illumina HiSeq 2000 platform. Subsequently, we generated high quality, cost-effective and easily assembled sequences of the channel catfish genome with a scaffold N50 of 7.2 Mb and 95.6 % completeness. We also predicted that the channel catfish genome contains 21,556 protein-coding genes and 275.3 Mb (megabase pairs) of repetitive sequences. We report a high-quality genome assembly of the channel catfish, which is comparable to a recent report of the “Coco” channel catfish. These generated genome data could be used as an initial platform for molecular breeding to obtain novel catfish varieties using genomic approaches.

Journal Article•DOI•

Low coverage sequencing of three echinoderm genomes: the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis

Kyle A. Long¹, Carlos W. Nossa², Mary A. Sewell³, Nicholas H. Putnam², Joseph F. Ryan¹•Institutions (3)

Whitney Laboratory for Marine Bioscience¹, Rice University², University of Auckland³

10 May 2016-GigaScience

TL;DR: Draft genomes of the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis are assembled from Illumina sequence data, adding comparative resources beyond Strongylocentrotus purpuratus (Echinoidea), which to the authors' knowledge was the only echinoderm species with an available genome sequence.

Abstract: There are five major extant groups of Echinodermata: Crinoidea (feather stars and sea lilies), Ophiuroidea (brittle stars and basket stars), Asteroidea (sea stars), Echinoidea (sea urchins, sea biscuits, and sand dollars), and Holothuroidea (sea cucumbers). These animals are known for their pentaradial symmetry as adults, unique water vascular system, mutable collagenous tissues, and endoskeletons of high magnesium calcite. To our knowledge, the only echinoderm species with a genome sequence available to date is Strongylocentrotus purpuratus (Echinoidea). The availability of additional echinoderm genome sequences is crucial for understanding the biology of these animals. Here we present assembled draft genomes of the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis from Illumina sequence data with coverages of 125x, 225x, and 214x, respectively. These data provide a resource for mining gene superfamilies, identifying non-coding RNAs, confirming gene losses, and designing experimental constructs. They will be important comparative resources for future genomic studies in echinoderms.


Journal Article•DOI•

Circadian rhythms have significant effects on leaf-to-canopy scale gas exchange under field conditions


Víctor Resco de Dios, Arthur Gessler¹, Juan Pedro Ferrio², Josu G. Alday³, Michael Bahn⁴, Jorge del Castillo, Sébastien Devidal⁵, Sonia García-Muñoz, Zachary Kayler⁶, Damien Landais⁵, Paula Martín-Gómez, Alexandru Milcu⁵, Clément Piel⁵, Karin Pirhofer-Walzl⁷, Olivier Ravel⁵, Serajis Salekin⁸, David T. Tissue⁹, Mark G. Tjoelker⁹, Jordi Voltas, Jacques Roy⁵

Swiss Federal Institute for Forest, Snow and Landscape Research¹, University of Concepción², University of Liverpool³, University of Innsbruck⁴, Centre national de la recherche scientifique⁵, Lawrence Livermore National Laboratory⁶, Free University of Berlin⁷, University of Canterbury⁸, University of Sydney⁹

20 Oct 2016-GigaScience

TL;DR: Results show that circadian controls affect diurnal CO2 and H2O flux patterns in entire canopies in field-like conditions, and its consideration significantly improves model performance.


Abstract: Molecular clocks drive oscillations in leaf photosynthesis, stomatal conductance, and other cell and leaf-level processes over ~24 h under controlled laboratory conditions. The influence of such circadian regulation over whole-canopy fluxes remains uncertain; diurnal CO2 and H2O vapor flux dynamics in the field are currently interpreted as resulting almost exclusively from direct physiological responses to variations in light, temperature and other environmental factors. We tested whether circadian regulation would affect plant and canopy gas exchange at the Montpellier European Ecotron. Canopy and leaf-level fluxes were constantly monitored under field-like environmental conditions, and under constant environmental conditions (no variation in temperature, radiation, or other environmental cues). We show direct experimental evidence at canopy scales of the circadian regulation of daytime gas exchange: 20–79 % of the daily variation range in CO2 and H2O fluxes occurred under circadian entrainment in canopies of an annual herb (bean) and of a perennial shrub (cotton). We also observed that considering circadian regulation improved performance by 8–17 % in commonly used stomatal conductance models. Our results show that circadian controls affect diurnal CO2 and H2O flux patterns in entire canopies in field-like conditions, and its consideration significantly improves model performance. Circadian controls act as a ‘memory’ of the past conditions experienced by the plant, which synchronizes metabolism across entire plant canopies.


Journal Article•DOI•

Analyzing climate variations at multiple timescales can guide Zika virus response measures


Ángel G. Muñoz¹,²,³, Madeleine C. Thomson²,⁴, Lisa Goddard², Sylvain Aldighieri⁵

University of Zulia¹, Columbia University², Geophysical Fluid Dynamics Laboratory³, World Health Organization⁴, Pan American Health Organization⁵

06 Oct 2016-GigaScience

TL;DR: It is demonstrated that the extreme climate anomalies observed in most parts of South America during the current epidemic are not caused exclusively by El Niño or climate change, but by a combination of climate signals acting at multiple timescales.


Abstract: The emergence of Zika virus (ZIKV) in Latin America and the Caribbean in 2014–2016 occurred during a period of severe drought and unusually high temperatures, conditions that have been associated with the 2015–2016 El Niño event, and/or climate change; however, no quantitative assessment has been made to date. Analysis of related flaviviruses transmitted by the same vectors suggests that ZIKV dynamics are sensitive to climate seasonality and longer-term variability and trends. A better understanding of the climate conditions conducive to the 2014–2016 epidemic may permit the development of climate-informed short and long-term strategies for ZIKV prevention and control. Using a novel timescale-decomposition methodology, we demonstrate that the extreme climate anomalies observed in most parts of South America during the current epidemic are not caused exclusively by El Niño or climate change, but by a combination of climate signals acting at multiple timescales. In Brazil, the dry conditions present in 2013–2015 are primarily explained by year-to-year variability superimposed on decadal variability, but with little contribution of long-term trends. In contrast, the warm temperatures of 2014–2015 resulted from the compound effect of climate change, decadal and year-to-year climate variability. ZIKV response strategies made in Brazil during the drought concurrent with the 2015–2016 El Niño event may require revision in light of the likely return of rainfall associated with the borderline La Niña event expected in 2016–2017. Temperatures are likely to remain warm given the importance of long term and decadal scale climate signals.


Journal Article•DOI•

Genomic resources and draft assemblies of the human and porcine varieties of scabies mites, Sarcoptes scabiei var. hominis and var. suis


Ehtesham Mofiz¹,², Deborah C. Holt³, Torsten Seemann⁴, Bart J. Currie³, Katja Fischer⁵, Anthony T. Papenfuss

Walter and Eliza Hall Institute of Medical Research¹, University of Melbourne², Charles Darwin University³, Victorian Life Sciences Computation Initiative⁴, QIMR Berghofer Medical Research Institute⁵

02 Jun 2016-GigaScience

TL;DR: Extensive genomic resources for the scabies mite are developed, including reference genomes and a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins.


Abstract: The scabies mite, Sarcoptes scabiei, is a parasitic arachnid and cause of the infectious skin disease scabies in humans and mange in other animal species. Scabies infections are a major health problem, particularly in remote Indigenous communities in Australia, where secondary group A streptococcal and Staphylococcus aureus infections of scabies sores are thought to drive the high rate of rheumatic heart disease and chronic kidney disease. We sequenced the genome of two samples of Sarcoptes scabiei var. hominis obtained from unrelated patients with crusted scabies located in different parts of northern Australia using the Illumina HiSeq. We also sequenced samples of Sarcoptes scabiei var. suis from a pig model. Because of the small size of the scabies mite, these data are derived from pools of thousands of mites and are metagenomic, including host and microbiome DNA. We performed cleaning and de novo assembly and present Sarcoptes scabiei var. hominis and var. suis draft reference genomes. We have constructed a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins. We have developed extensive genomic resources for the scabies mite, including reference genomes and a preliminary annotation.
