
Posted Content
31 May 2017 - bioRxiv
TL;DR: SCENIC (Single Cell rEgulatory Network Inference and Clustering) is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric approach and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs.
Abstract: Single-cell RNA-seq allows building cell atlases of any given tissue and inferring the dynamics of cellular state transitions during developmental or disease trajectories. Both the maintenance and transitions of cell states are encoded by regulatory programs in the genome sequence. However, this regulatory code has not yet been exploited to guide the identification of cellular states from single-cell RNA-seq data. Here we describe a computational resource, called SCENIC (Single Cell rEgulatory Network Inference and Clustering), for the simultaneous reconstruction of gene regulatory networks (GRNs) and the identification of stable cell states, using single-cell RNA-seq data. SCENIC outperforms existing approaches at the level of cell clustering and transcription factor identification. Importantly, we show that cell state identification based on GRNs is robust to batch effects and technical biases. We applied SCENIC to a compendium of single-cell data from the mouse and human brain and demonstrate that the proper combinations of transcription factors, target genes, enhancers, and cell types can be identified. Moreover, we used SCENIC to map the cell state landscape in melanoma and identified a gene regulatory network underlying a proliferative melanoma state driven by MITF and STAT and a contrasting network controlling an invasive state governed by NFATC2 and NFIB. We further validated these predictions by showing that two transcription factors are predominantly expressed in early metastatic sentinel lymph nodes. In summary, SCENIC is the first method to analyze scRNA-seq data using a network-centric, rather than cell-centric, approach. SCENIC is generic, easy to use, and flexible, and allows for the simultaneous tracing of genomic regulatory programs and the mapping of cellular identities emerging from these programs. Availability: SCENIC is available as an R workflow based on three new R/Bioconductor packages: GENIE3, RcisTarget and AUCell. As a scalable alternative to GENIE3, we also provide GRNboost, paving the way towards network analysis across millions of single cells.
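To make the scoring step concrete: each regulon (a transcription factor plus its inferred targets) is scored per cell by how strongly the targets concentrate at the top of that cell's expression ranking. The sketch below is a rough NumPy approximation of that idea, not the released AUCell R package; the function and argument names are invented for illustration and the data are random placeholders.

```python
import numpy as np

def regulon_auc(expr, regulon_genes, gene_names, top_frac=0.05):
    """AUCell-like score: area under the recovery curve of a regulon's target
    genes within the top-ranked fraction of each cell's genes."""
    n_cells, n_genes = expr.shape
    top_n = max(1, int(top_frac * n_genes))
    is_target = np.isin(gene_names, list(regulon_genes))
    aucs = np.empty(n_cells)
    for c in range(n_cells):
        order = np.argsort(-expr[c])            # rank genes high -> low in this cell
        hits = is_target[order][:top_n]         # which top-ranked genes are regulon targets
        recovery = np.cumsum(hits) / max(1, is_target.sum())
        aucs[c] = recovery.mean()               # normalized area under the recovery curve
    return aucs

rng = np.random.default_rng(0)
genes = np.array([f"g{i}" for i in range(1000)])
expr = rng.random((50, 1000))                   # 50 cells x 1000 genes (toy values)
print(regulon_auc(expr, {"g1", "g2", "g3"}, genes)[:5])
```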

1,101 citations



Journal Article
TL;DR: This work examines an equiatomic medium-entropy alloy containing only three elements, CrCoNi, as a single-phase face-centred cubic solid solution, which displays strength-toughness properties that exceed those of all high-entropy alloys and most multi-phase alloys.
Abstract: High-entropy alloys are an intriguing new class of metallic materials that derive their properties from being multi-element systems that can crystallize as a single phase, despite containing high concentrations of five or more elements with different crystal structures. Here we examine an equiatomic medium-entropy alloy containing only three elements, CrCoNi, as a single-phase face-centred cubic solid solution, which displays strength-toughness properties that exceed those of all high-entropy alloys and most multi-phase alloys. At room temperature, the alloy shows tensile strengths of almost 1 GPa, failure strains of ∼70% and K_JIc fracture-toughness values above 200 MPa·m^(1/2); at cryogenic temperatures, the strength, ductility and toughness of the CrCoNi alloy improve to strength levels above 1.3 GPa, failure strains up to 90% and K_JIc values of 275 MPa·m^(1/2). Such properties appear to result from continuous steady strain hardening, which acts to suppress plastic instability, resulting from pronounced dislocation activity and deformation-induced nano-twinning.

1,101 citations


Journal Article
Ganna Chornokur, Hui-Yi Lin, Jonathan Tyrer, Kate Lawrenson, +155 more (51 institutions)
19 Jun 2015 - PLOS ONE
TL;DR: Associations between inherited cellular transport gene variants and risk of EOC histologic subtypes are revealed in a large cohort of women.
Abstract: BACKGROUND: Defective cellular transport processes can lead to aberrant accumulation of trace elements, iron, small molecules and hormones in the cell, which in turn may promote the formation of reactive oxygen species, promoting DNA damage and aberrant expression of key regulatory cancer genes. As DNA damage and uncontrolled proliferation are hallmarks of cancer, including epithelial ovarian cancer (EOC), we hypothesized that inherited variation in the cellular transport genes contributes to EOC risk. METHODS: In total, DNA samples were obtained from 14,525 case subjects with invasive EOC and from 23,447 controls from 43 sites in the Ovarian Cancer Association Consortium (OCAC). Two hundred seventy-nine SNPs, representing 131 genes, were genotyped using an Illumina Infinium iSelect BeadChip as part of the Collaborative Oncological Gene-environment Study (COGS). SNP analyses were conducted using unconditional logistic regression under a log-additive model, and an FDR threshold of q<0.2 was applied to adjust for multiple comparisons. RESULTS: The most significant evidence of an association for all invasive cancers combined and for the serous subtype was observed for SNP rs17216603 in the iron transporter gene HEPH (invasive: OR = 0.85, P = 0.00026; serous: OR = 0.81, P = 0.00020); this SNP was also associated with the borderline/low malignant potential (LMP) tumors (P = 0.021). Other genes significantly associated with EOC histological subtypes (p<0.05) included UGT1A (endometrioid), SLC25A45 (mucinous), SLC39A11 (low malignant potential), and SERPINA7 (clear cell carcinoma). In addition, 1785 SNPs in six genes (HEPH, MGST1, SERPINA, SLC25A45, SLC39A11 and UGT1A) were imputed from the 1000 Genomes Project and examined for association with invasive EOC in white-European subjects. The most significant imputed SNP was rs117729793 in SLC39A11 (per allele, OR = 2.55, 95% CI = 1.5-4.35, p = 5.66×10^-4). CONCLUSION: These results, generated in a large cohort of women, revealed associations between inherited cellular transport gene variants and risk of EOC histologic subtypes.
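The per-SNP test described above (unconditional logistic regression under a log-additive model, with FDR correction) can be sketched as follows. This is a hedged illustration with simulated genotypes and case-control labels, not the study's data or code; variable names are invented.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_subjects, n_snps = 2000, 50
genotypes = rng.integers(0, 3, size=(n_subjects, n_snps))  # 0/1/2 minor-allele counts
status = rng.integers(0, 2, size=n_subjects)               # 1 = case, 0 = control (toy labels)

pvals = []
for j in range(n_snps):
    X = sm.add_constant(genotypes[:, j].astype(float))      # log-additive: one per-allele term
    fit = sm.Logit(status, X).fit(disp=0)
    pvals.append(fit.pvalues[1])                            # p-value of the genotype term

reject, qvals, _, _ = multipletests(pvals, method="fdr_bh") # Benjamini-Hochberg adjustment
print("SNPs with q < 0.2:", np.flatnonzero(qvals < 0.2))    # the study used q < 0.2
```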

1,100 citations


Posted Content
TL;DR: In this article, the authors make use of complex valued embeddings to handle a large variety of binary relations, among them symmetric and antisymmetric relations, and their approach is scalable to large datasets as it remains linear in both space and time.
Abstract: In statistical relational learning, the link prediction problem is key to automatically understand the structure of large knowledge bases. As in previous studies, we propose to solve this problem through latent factorization. However, here we make use of complex valued embeddings. The composition of complex embeddings can handle a large variety of binary relations, among them symmetric and antisymmetric relations. Compared to state-of-the-art models such as Neural Tensor Network and Holographic Embeddings, our approach based on complex embeddings is arguably simpler, as it only uses the Hermitian dot product, the complex counterpart of the standard dot product between real vectors. Our approach is scalable to large datasets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.
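The scoring function named in the abstract (the real part of a Hermitian, trilinear product of complex embeddings) is compact enough to show directly. The sketch below uses random placeholder embeddings rather than trained parameters and is only an illustration of the formula, not the authors' implementation.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """Re(<e_s, w_r, conj(e_o)>): real part of the trilinear Hermitian product."""
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

k = 8
rng = np.random.default_rng(0)
e_s = rng.normal(size=k) + 1j * rng.normal(size=k)   # subject entity embedding
w_r = rng.normal(size=k) + 1j * rng.normal(size=k)   # relation embedding
e_o = rng.normal(size=k) + 1j * rng.normal(size=k)   # object entity embedding
print(complex_score(e_s, w_r, e_o))                  # higher score = more plausible triple
```

Because conjugation breaks the symmetry between subject and object, the same relation embedding can model symmetric and antisymmetric relations, while the cost stays linear in the embedding dimension.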

1,100 citations


Journal Article
TL;DR: The size of a planet is an observable property directly connected to the physics of its formation and evolution as discussed by the authors. The occurrence-rate deficit at 1.5-2.0 R⊕ splits close-in (P < 100 days) small planets into two size regimes, R_p < 1.5 R⊕ and R_p = 2.0-3.0 R⊕, consistent with rocky cores of 1.5 R⊕ or smaller carrying varying amounts of low-density gas that determine their total sizes.
Abstract: The size of a planet is an observable property directly connected to the physics of its formation and evolution. We used precise radius measurements from the California-Kepler Survey to study the size distribution of 2025 Kepler planets in fine detail. We detect a factor of ≥2 deficit in the occurrence rate distribution at 1.5–2.0 R⊕. This gap splits the population of close-in (P < 100 days) small planets into two size regimes: R_p < 1.5 R⊕ and R_p = 2.0-3.0 R⊕, with few planets in between. Planets in these two regimes have nearly the same intrinsic frequency based on occurrence measurements that account for planet detection efficiencies. The paucity of planets between 1.5 and 2.0 R⊕ supports the emerging picture that close-in planets smaller than Neptune are composed of rocky cores measuring 1.5 R⊕ or smaller with varying amounts of low-density gas that determine their total sizes.
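As a toy illustration of the two-regime split quoted above (not the survey's occurrence-rate analysis, which also corrects for detection efficiency), the sketch below bins synthetic radii into the named regimes; the radius values are placeholders, not California-Kepler Survey measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
radii = np.concatenate([rng.normal(1.3, 0.15, 500),    # synthetic "super-Earth" peak (R_Earth)
                        rng.normal(2.4, 0.35, 500)])   # synthetic "sub-Neptune" peak

super_earths = radii[radii < 1.5]
sub_neptunes = radii[(radii >= 2.0) & (radii <= 3.0)]
in_gap       = radii[(radii >= 1.5) & (radii < 2.0)]
print(len(super_earths), len(sub_neptunes), len(in_gap))   # the 1.5-2.0 R_Earth bin is sparse
```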

1,100 citations


Posted Content
TL;DR: In this paper, the authors combine a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answer spans in Wikipedia paragraphs.
Abstract: This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
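The retrieval half (the reader is a separate trained network) boils down to hashed unigram/bigram TF-IDF matching. Below is a hedged scikit-learn sketch of that idea, not the authors' released system; the toy articles and question are placeholders.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import linear_kernel

articles = ["Paris is the capital of France.",
            "The mitochondrion is the powerhouse of the cell."]
question = "What is the capital of France?"

hasher = HashingVectorizer(ngram_range=(1, 2), n_features=2**20,
                           alternate_sign=False, norm=None)   # hashed unigrams + bigrams
tfidf = TfidfTransformer()
doc_vecs = tfidf.fit_transform(hasher.transform(articles))
q_vec = tfidf.transform(hasher.transform([question]))

scores = linear_kernel(q_vec, doc_vecs).ravel()
best = scores.argmax()      # the top-scoring article would be handed to the reading model
print(articles[best])
```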

1,100 citations


Journal Article
TL;DR: The addition of daratumumab to lenalidomide and dexamethasone significantly lengthened progression-free survival among patients with relapsed or refractory multiple myeloma.
Abstract: Background Daratumumab showed promising efficacy alone and with lenalidomide and dexamethasone in a phase 1–2 study involving patients with relapsed or refractory multiple myeloma. Methods In this phase 3 trial, we randomly assigned 569 patients with multiple myeloma who had received one or more previous lines of therapy to receive lenalidomide and dexamethasone either alone (control group) or in combination with daratumumab (daratumumab group). The primary end point was progression-free survival. Results At a median follow-up of 13.5 months in a protocol-specified interim analysis, 169 events of disease progression or death were observed (in 53 of 286 patients [18.5%] in the daratumumab group vs. 116 of 283 [41.0%] in the control group; hazard ratio, 0.37; 95% confidence interval [CI], 0.27 to 0.52; P<0.001 by stratified log-rank test). The Kaplan–Meier rate of progression-free survival at 12 months was 83.2% (95% CI, 78.3 to 87.2) in the daratumumab group, as compared with 60.1% (95% CI, 54.0 to 65.7) in the control group.
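For readers unfamiliar with how a "Kaplan–Meier rate at 12 months" like the ones quoted above is obtained, here is a minimal product-limit sketch; the follow-up times below are made up, not trial data.

```python
import numpy as np

def km_estimate(times, events, t_query):
    """times: follow-up in months; events: 1 = progression/death, 0 = censored."""
    surv = 1.0
    for t in np.unique(times[events == 1]):        # event times in ascending order
        if t > t_query:
            break
        at_risk = np.sum(times >= t)
        d = np.sum((times == t) & (events == 1))
        surv *= 1.0 - d / at_risk                  # conditional survival at each event time
    return surv

times  = np.array([3, 6, 6, 9, 12, 14, 15, 18, 20, 24], dtype=float)
events = np.array([1, 1, 0, 1,  0,  1,  0,  0,  1,  0])
print(km_estimate(times, events, t_query=12.0))    # estimated PFS at 12 months
```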

1,100 citations


Journal Article
TL;DR: Modelling of potential scattering sources and quantum lifetime analysis indicate that a combination of short-range and long-range interfacial scattering limits the low-temperature mobility of MoS2.
Abstract: High charge-carrier mobility that enables the observation of quantum oscillation is reported in mono- and few-layer MoS2 encapsulated and contacted by other two-dimensional materials.

1,100 citations


Posted Content
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
TL;DR: This work explores recent advances in Recurrent Neural Networks for large scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language.
Abstract: In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long Short-Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
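The perplexity numbers quoted above are simply the exponential of the average per-token negative log-likelihood; the tiny sketch below shows that relationship with placeholder probabilities.

```python
import math

token_probs = (0.05, 0.20, 0.01, 0.10)                 # a model's probabilities for each token
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)   # mean negative log-likelihood
perplexity = math.exp(nll)                             # e.g. perplexity 30.0 <-> nll ~= 3.40
print(perplexity)
```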

1,100 citations


Journal Article
TL;DR: In this article, the authors provide an overview of the progress in probing dynamical equilibration and thermalization of closed quantum many-body systems driven out of equilibrium by quenches, ramps and periodic driving.
Abstract: How do closed quantum many-body systems driven out of equilibrium eventually achieve equilibration? And how do these systems thermalize, given that they comprise so many degrees of freedom? Progress in answering these—and related—questions has accelerated in recent years—a trend that can be partially attributed to success with experiments performing quantum simulations using ultracold atoms and trapped ions. Here we provide an overview of this progress, specifically in studies probing dynamical equilibration and thermalization of systems driven out of equilibrium by quenches, ramps and periodic driving. In doing so, we also address topics such as the eigenstate thermalization hypothesis, typicality, transport, many-body localization and universality near phase transitions, as well as future prospects for quantum simulation. Statistical mechanics is adept at describing the equilibria of quantum many-body systems. But drive these systems out of equilibrium, and the physics is far from clear. Recent advances have broken new ground in probing these equilibration processes.

Journal Article
TL;DR: The HER2 testing algorithm for breast cancer is updated and requires concomitant IHC review for dual-probe ISH groups 2 to 4 to arrive at the most accurate HER2 status designation (positive or negative) based on combined interpretation of the ISH and IHC assays.
Abstract: Purpose To update key recommendations of the American Society of Clinical Oncology/College of American Pathologists human epidermal growth factor receptor 2 (HER2) testing in breast cancer guideline. Methods Based on the signals approach, an Expert Panel reviewed published literature and research survey results on the observed frequency of less common in situ hybridization (ISH) patterns to update the recommendations. Recommendations Two recommendations addressed via correspondence in 2015 are included. First, immunohistochemistry (IHC) 2+ is defined as invasive breast cancer with weak to moderate complete membrane staining observed in > 10% of tumor cells. Second, if the initial HER2 test result in a core needle biopsy specimen of a primary breast cancer is negative, a new HER2 test may (not "must") be ordered on the excision specimen based on specific clinical criteria. The HER2 testing algorithm for breast cancer is updated to address the recommended work-up for less common clinical scenarios (approximately 5% of cases) observed when using a dual-probe ISH assay. These scenarios are described as ISH group 2 (HER2/chromosome enumeration probe 17 [CEP17] ratio ≥ 2.0; average HER2 copy number < 4.0 signals per cell), ISH group 3 (HER2/CEP17 ratio < 2.0; average HER2 copy number ≥ 6.0 signals per cell), and ISH group 4 (HER2/CEP17 ratio < 2.0; average HER2 copy number ≥ 4.0 and < 6.0 signals per cell). The diagnostic approach includes more rigorous interpretation criteria for ISH and requires concomitant IHC review for dual-probe ISH groups 2 to 4 to arrive at the most accurate HER2 status designation (positive or negative) based on combined interpretation of the ISH and IHC assays. The Expert Panel recommends that laboratories using single-probe ISH assays include concomitant IHC review as part of the interpretation of all single-probe ISH assay results. Find additional information at www.asco.org/breast-cancer-guidelines.
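The dual-probe grouping above is a pure threshold rule, so it can be written down directly. The sketch below encodes the thresholds for groups 2-4 exactly as stated; the "group 1" and "group 5" labels for the remaining combinations are inferred from context rather than defined in this abstract, and the function is an illustration, not a clinical decision tool.

```python
def dual_probe_ish_group(her2_cep17_ratio, her2_copies_per_cell):
    """Classify a dual-probe ISH result using the ratio/copy-number thresholds above."""
    if her2_cep17_ratio >= 2.0:
        return 1 if her2_copies_per_cell >= 4.0 else 2    # group 2: ratio >= 2.0, copies < 4.0
    if her2_copies_per_cell >= 6.0:
        return 3                                          # group 3: ratio < 2.0, copies >= 6.0
    if her2_copies_per_cell >= 4.0:
        return 4                                          # group 4: ratio < 2.0, 4.0 <= copies < 6.0
    return 5                                              # ratio < 2.0, copies < 4.0

print(dual_probe_ish_group(2.3, 3.5))   # -> 2; groups 2-4 require concomitant IHC review
```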

Journal Article
14 Jan 2015 - mAbs
TL;DR: Since the commercialization of the first therapeutic monoclonal antibody product in 1986, this class of biopharmaceutical products has grown significantly so that, as of November 10, 2014, forty-seven monoclonal antibody products have been approved in the US or Europe for the treatment of a variety of diseases.
Abstract: Since the commercialization of the first therapeutic monoclonal antibody product in 1986, this class of biopharmaceutical products has grown significantly so that, as of November 10, 2014, forty-seven monoclonal antibody products have been approved in the US or Europe for the treatment of a variety of diseases, and many of these products have also been approved for other global markets. At the current approval rate of ∼ four new products per year, ∼ 70 monoclonal antibody products will be on the market by 2020, and combined world-wide sales will be nearly $125 billion.
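A quick arithmetic check of the projection quoted above, using only the numbers given in the abstract:

```python
approved_nov_2014 = 47
approval_rate_per_year = 4        # "~ four new products per year"
years_to_2020 = 6                 # end of 2014 through 2020
projected_2020 = approved_nov_2014 + approval_rate_per_year * years_to_2020
print(projected_2020)             # ~71, consistent with the quoted "~ 70 products"
```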

Journal Article
TL;DR: It is reported for the first time that a SARS-CoV-specific human monoclonal antibody,CR3022, could bind potently with 2019-nCoV RBD (KD of 6.3 nM), suggesting that CR3022 may have the potential to be developed as candidate therapeutics, alone or in combination with other neutralizing antibodies, for the prevention and treatment of 2019- nCoV infections.
Abstract: The newly identified 2019 novel coronavirus (2019-nCoV) has caused more than 11,900 laboratory-confirmed human infections, including 259 deaths, posing a serious threat to human health. Currently, however, there is no specific antiviral treatment or vaccine. Considering the relatively high identity of receptor-binding domain (RBD) in 2019-nCoV and SARS-CoV, it is urgent to assess the cross-reactivity of anti-SARS CoV antibodies with 2019-nCoV spike protein, which could have important implications for rapid development of vaccines and therapeutic antibodies against 2019-nCoV. Here, we report for the first time that a SARS-CoV-specific human monoclonal antibody, CR3022, could bind potently with 2019-nCoV RBD (KD of 6.3 nM). The epitope of CR3022 does not overlap with the ACE2 binding site within 2019-nCoV RBD. These results suggest that CR3022 may have the potential to be developed as candidate therapeutics, alone or in combination with other neutralizing antibodies, for the prevention and treatment of 2019-nCoV infections. Interestingly, some of the most potent SARS-CoV-specific neutralizing antibodies (e.g. m396, CR3014) that target the ACE2 binding site of SARS-CoV failed to bind 2019-nCoV spike protein, implying that the difference in the RBD of SARS-CoV and 2019-nCoV has a critical impact for the cross-reactivity of neutralizing antibodies, and that it is still necessary to develop novel monoclonal antibodies that could bind specifically to 2019-nCoV RBD.
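To give the quoted KD of 6.3 nM some intuition, the sketch below computes idealized 1:1 equilibrium occupancy of the RBD at a few antibody concentrations; this is a textbook binding relation, not an analysis from the paper.

```python
KD_nM = 6.3
for conc_nM in (1, 6.3, 63, 630):
    occupancy = conc_nM / (conc_nM + KD_nM)   # theta = [Ab] / ([Ab] + KD), antibody in excess
    print(f"{conc_nM:7.1f} nM antibody -> {occupancy:.0%} of RBD bound")
```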

Journal Article
TL;DR: Improved care at birth is essential to prevent 1.3 million intrapartum stillbirths, end preventable maternal and neonatal deaths, and improve child development; it also provides a way to target interventions to reach the more than 7000 women worldwide who, every day, experience the reality of stillbirth.

Journal Article
TL;DR: The improvement in overall survival establishes the combination of dabrafenib and trametinib as the standard targeted treatment for BRAF Val600 mutation-positive melanoma.

Journal Article
TL;DR: The 2016 revision of the ARIA guidelines provides both updated and new recommendations about the pharmacologic treatment of AR, addressing the relative merits of using oral H1-antihistamines, intranasal H1-antihistamines, intranasal corticosteroids, and leukotriene receptor antagonists either alone or in combination.
Abstract: Background Allergic rhinitis (AR) affects 10% to 40% of the population. It reduces quality of life and school and work performance and is a frequent reason for office visits in general practice. Medical costs are large, but avoidable costs associated with lost work productivity are even larger than those incurred by asthma. New evidence has accumulated since the last revision of the Allergic Rhinitis and its Impact on Asthma (ARIA) guidelines in 2010, prompting its update. Objective We sought to provide a targeted update of the ARIA guidelines. Methods The ARIA guideline panel identified new clinical questions and selected questions requiring an update. We performed systematic reviews of health effects and the evidence about patients' values and preferences and resource requirements (up to June 2016). We followed the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) evidence-to-decision frameworks to develop recommendations. Results The 2016 revision of the ARIA guidelines provides both updated and new recommendations about the pharmacologic treatment of AR. Specifically, it addresses the relative merits of using oral H1-antihistamines, intranasal H1-antihistamines, intranasal corticosteroids, and leukotriene receptor antagonists either alone or in combination. The ARIA guideline panel provides specific recommendations for the choice of treatment and the rationale for the choice and discusses specific considerations that clinicians and patients might want to review to choose the management most appropriate for an individual patient. Conclusions Appropriate treatment of AR might improve patients' quality of life and school and work productivity. ARIA recommendations support patients, their caregivers, and health care providers in choosing the optimal treatment.

Journal Article
TL;DR: A SNN for digit recognition which is based on mechanisms with increased biological plausibility, i.e., conductance-based instead of current-based synapses, spike-timing-dependent plasticity with time-dependent weight change, lateral inhibition, and an adaptive spiking threshold is presented.
Abstract: In order to understand how the mammalian neocortex is performing computations, two things are necessary; we need to have a good understanding of the available neuronal processing units and mechanisms, and we need to gain a better understanding of how those mechanisms are combined to build functioning systems. Therefore, in recent years there is an increasing interest in how spiking neural networks (SNN) can be used to perform complex computations or solve pattern recognition tasks. However, it remains a challenging task to design SNNs which use biologically plausible mechanisms (especially for learning new patterns), since most of such SNN architectures rely on training in a rate-based network and subsequent conversion to a SNN. We present a SNN for digit recognition which is based on mechanisms with increased biological plausibility, i.e. conductance-based instead of current-based synapses, spike-timing-dependent plasticity with time-dependent weight change, lateral inhibition, and an adaptive spiking threshold. Unlike most other systems, we do not use a teaching signal and do not present any class labels to the network. Using this unsupervised learning scheme, our architecture achieves 95% accuracy on the MNIST benchmark, which is better than previous SNN implementations without supervision. The fact that we used no domain-specific knowledge points toward the general applicability of our network design. Also, the performance of our network scales well with the number of neurons used and shows similar performance for four different learning rules, indicating robustness of the full combination of mechanisms, which suggests applicability in heterogeneous biological neural networks.
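As a concrete reference point for the plasticity mechanism mentioned above, here is a generic pair-based STDP weight update. It is a hedged sketch of the general rule family, not the paper's exact (trace-based) formulation, and the time constants and learning rates are illustrative.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_max=1.0):
    """Potentiate when the presynaptic spike precedes the postsynaptic one, depress otherwise."""
    dt = t_post - t_pre                        # spike-time difference in ms
    if dt > 0:
        w += a_plus * np.exp(-dt / tau_plus)   # pre-before-post: long-term potentiation
    else:
        w -= a_minus * np.exp(dt / tau_minus)  # post-before-pre: long-term depression
    return float(np.clip(w, 0.0, w_max))       # keep the weight in its allowed range

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))   # weight increases
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))   # weight decreases
```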

Journal Article
TL;DR: An empirical drought reconstruction and three soil moisture metrics from 17 state-of-the-art general circulation models are used to show that these models project significantly drier conditions in the latter half of the 21st century compared to the 20th century and earlier paleoclimatic intervals.
Abstract: In the Southwest and Central Plains of Western North America, climate change is expected to increase drought severity in the coming decades. These regions nevertheless experienced extended Medieval-era droughts that were more persistent than any historical event, providing crucial targets in the paleoclimate record for benchmarking the severity of future drought risks. We use an empirical drought reconstruction and three soil moisture metrics from 17 state-of-the-art general circulation models to show that these models project significantly drier conditions in the latter half of the 21st century compared to the 20th century and earlier paleoclimatic intervals. This desiccation is consistent across most of the models and moisture balance variables, indicating a coherent and robust drying response to warming despite the diversity of models and metrics analyzed. Notably, future drought risk will likely exceed even the driest centuries of the Medieval Climate Anomaly (1100–1300 CE) in both moderate (RCP 4.5) and high (RCP 8.5) future emissions scenarios, leading to unprecedented drought conditions during the last millennium.

Journal Article
TL;DR: In this article, the authors provide an updated recommendation for the usage of sets of parton distribution functions (PDFs) and the assessment of PDF and PDF+$\alpha_s$ uncertainties suitable for applications at the LHC Run II.
Abstract: We provide an updated recommendation for the usage of sets of parton distribution functions (PDFs) and the assessment of PDF and PDF+$\alpha_s$ uncertainties suitable for applications at the LHC Run II. We review developments since the previous PDF4LHC recommendation, and discuss and compare the new generation of PDFs, which include substantial information from experimental data from the Run I of the LHC. We then propose a new prescription for the combination of a suitable subset of the available PDF sets, which is presented in terms of a single combined PDF set. We finally discuss tools which allow for the delivery of this combined set in terms of optimized sets of Hessian eigenvectors or Monte Carlo replicas, and their usage, and provide some examples of their application to LHC phenomenology.
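One common way a combined set delivered as Monte Carlo replicas is used is to take the replica mean as the central value and the replica standard deviation as the PDF uncertainty. The sketch below illustrates only that prescription; the replica values are random placeholders, not a real PDF set.

```python
import numpy as np

# 100 placeholder "replica" values of some PDF-dependent quantity, e.g. g(x, Q) at fixed x, Q
replicas = np.random.default_rng(0).normal(loc=0.8, scale=0.03, size=100)

central = replicas.mean()                 # central value: mean over replicas
pdf_uncertainty = replicas.std(ddof=1)    # PDF uncertainty: standard deviation over replicas
print(f"{central:.3f} +/- {pdf_uncertainty:.3f}")
```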

Journal Article
TL;DR: GMT 6 defaults to classic mode and thus is a recommended upgrade for all GMT 5 users, and new users should take advantage of modern mode to make shorter scripts, quickly access commonly used global data sets, and take full advantage of the new tools to draw subplots, place insets, and create animations.

Journal Article
TL;DR: A keyword analysis identifies the most popular subjects covered by bibliometric analysis, and multidisciplinary articles are shown to have the highest impact.
Abstract: Bibliometric methods or "analysis" are now firmly established as scientific specialties and are an integral part of research evaluation methodology especially within the scientific and applied fields. The methods are used increasingly when studying various aspects of science and also in the way institutions and universities are ranked worldwide. A sufficient number of studies have been completed, and with the resulting literature, it is now possible to analyse the bibliometric method by using its own methodology. The bibliometric literature in this study, which was extracted from Web of Science, is divided into two parts using a method comparable to the method of Jonkers et al. (Characteristics of bibliometrics articles in library and information sciences (LIS) and other journals, pp. 449–551, 2012): The publications either lie within the Information and Library Science (ILS) category or within the non-ILS category which includes more applied, "subject" based studies. The impact in the different groupings is judged by means of citation analysis using normalized data and an almost linear increase can be observed from 1994 onwards in the non-ILS category. The implication for the dissemination and use of the bibliometric methods in the different contexts is discussed. A keyword analysis identifies the most popular subjects covered by bibliometric analysis, and multidisciplinary articles are shown to have the highest impact. A noticeable shift is observed in those countries which contribute to the pool of bibliometric analysis, as well as a self-perpetuating effect in giving and taking references.

Journal Article
TL;DR: It is shown that the lacquer used to cover warriors and certain parts of weapons is rich in chromium, and it is demonstrated that chromium on the metals is contamination from nearby lacquer after burial, and the chromium anti-rust treatment theory should be abandoned.
Abstract: For forty years, there has been a widely held belief that over 2,000 years ago the Chinese Qin developed an advanced chromate conversion coating technology (CCC) to prevent metal corrosion. This belief was based on the detection of chromium traces on the surface of bronze weapons buried with the Chinese Terracotta Army, and the same weapons’ very good preservation. We analysed weapons, lacquer and soils from the site, and conducted experimental replications of CCC and accelerated ageing. Our results show that surface chromium presence is correlated with artefact typology and uncorrelated with bronze preservation. Furthermore we show that the lacquer used to cover warriors and certain parts of weapons is rich in chromium, and we demonstrate that chromium on the metals is contamination from nearby lacquer after burial. The chromium anti-rust treatment theory should therefore be abandoned. The good metal preservation probably results from the moderately alkaline pH and very small particle size of the burial soil, in addition to bronze composition.

Posted Content
TL;DR: The OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs, indicating fruitful opportunities for future research.
Abstract: We present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. In addition to building the datasets, we also perform extensive benchmark experiments for each dataset. Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. Finally, OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation. OGB will be regularly updated and welcomes inputs from the community. OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at this https URL .
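The standardized load-split-evaluate flow described above looks roughly like the sketch below. The class and method names (PygGraphPropPredDataset, Evaluator, get_idx_split) follow the OGB documentation as I recall it and assume the ogb and PyTorch Geometric packages are installed; treat the exact names as assumptions and check the current docs. The predictions here are random placeholders standing in for a model.

```python
import numpy as np
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator

dataset = PygGraphPropPredDataset(name="ogbg-molhiv")   # downloads and caches the dataset
split_idx = dataset.get_idx_split()                     # the standardized train/valid/test split
evaluator = Evaluator(name="ogbg-molhiv")               # dataset-specific metric (ROC-AUC here)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(len(split_idx["test"]), 1))   # placeholder labels
y_pred = rng.random(y_true.shape)                               # placeholder model scores
print(evaluator.eval({"y_true": y_true, "y_pred": y_pred}))     # e.g. {"rocauc": ...}
```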

Proceedings Article
01 Oct 2017
TL;DR: Bulat et al. propose LS3D-W, the largest and most challenging 3D facial landmark dataset to date (~230,000 images), and train a neural network for 3D face alignment that they evaluate on it.
Abstract: This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset and finally evaluate it on all other 2D facial landmark datasets. (b)We create a guided by 2D landmarks network which converts 2D landmark annotations to 3D and unifies all existing datasets, leading to the creation of LS3D-W, the largest and most challenging 3D facial landmark dataset to date (~230,000 images). (c) Following that, we train a neural network for 3D face alignment and evaluate it on the newly introduced LS3D-W. (d) We further look into the effect of all “traditional” factors affecting face alignment performance like large pose, initialization and resolution, and introduce a “new” one, namely the size of the network. (e) We show that both 2D and 3D face alignment networks achieve performance of remarkable accuracy which is probably close to saturating the datasets used. Training and testing code as well as the dataset can be downloaded from https://www.adrianbulat.com/face-alignment/
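Face-alignment results like these are typically reported as a normalized mean error over the predicted landmarks. The sketch below shows that metric in general form; it is not code from the paper, the normalization choice varies between benchmarks, and the landmark arrays are placeholders.

```python
import numpy as np

def normalized_mean_error(pred, gt, norm):
    """pred, gt: (68, 2) or (68, 3) landmark arrays; norm: e.g. the bounding-box diagonal."""
    return np.linalg.norm(pred - gt, axis=1).mean() / norm

rng = np.random.default_rng(0)
gt = rng.random((68, 3)) * 200                      # toy ground-truth 3D landmarks (pixels)
pred = gt + rng.normal(scale=2.0, size=gt.shape)    # a predictor's slightly noisy output
bbox_diag = np.linalg.norm([200.0, 200.0])          # normalization constant
print(normalized_mean_error(pred, gt, bbox_diag))
```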

Journal Article
TL;DR: The improved MitoCarta 2.0 inventory provides a molecular framework for system-level analysis of mammalian mitochondria and helps to understand mitochondrial pathways in health and disease.
Abstract: Mitochondria are complex organelles that house essential pathways involved in energy metabolism, ion homeostasis, signalling and apoptosis. To understand mitochondrial pathways in health and disease, it is crucial to have an accurate inventory of the organelle's protein components. In 2008, we made substantial progress toward this goal by performing in-depth mass spectrometry of mitochondria from 14 organs, epitope tagging/microscopy and Bayesian integration to assemble MitoCarta (www.broadinstitute.org/pubs/MitoCarta): an inventory of genes encoding mitochondrial-localized proteins and their expression across 14 mouse tissues. Using the same strategy we have now reconstructed this inventory separately for human and for mouse based on (i) improved gene transcript models, (ii) updated literature curation, including results from proteomic analyses of mitochondrial sub-compartments, (iii) improved homology mapping and (iv) updated versions of all seven original data sets. The updated human MitoCarta2.0 consists of 1158 human genes, including 918 genes in the original inventory as well as 240 additional genes. The updated mouse MitoCarta2.0 consists of 1158 genes, including 967 genes in the original inventory plus 191 additional genes. The improved MitoCarta 2.0 inventory provides a molecular framework for system-level analysis of mammalian mitochondria.

Journal Article
TL;DR: A first systematic analysis of microbiota changes in the ileum and colon using multiple diets and investigating both fecal and mucosal samples demonstrates correlations between the microbiota and dysfunctions of gut, adipose tissue, and liver, independent of a specific disease-inducing diet.
Abstract: Development of non-alcoholic fatty liver disease (NAFLD) is linked to obesity, adipose tissue inflammation, and gut dysfunction, all of which depend on diet. So far, studies have mainly focused on diet-related fecal microbiota changes, but other compartments may be more informative on host health. We present a first systematic analysis of microbiota changes in the ileum and colon using multiple diets and investigating both fecal and mucosal samples. Ldlr−/−.Leiden mice received one of three different energy-dense (ED)-diets (n = 15/group) for 15 weeks. All of the ED diets induced obesity and metabolic risk factors, altered short-chain fatty acids (SCFA), and increased gut permeability and NAFLD to various extents. ED diets reduced the diversity of high-abundant bacteria and increased the diversity of low-abundant bacteria in all of the gut compartments. The ED groups showed highly variable, partially overlapping microbiota compositions that differed significantly from chow. Correlation analyses demonstrated that (1) specific groups of bacteria correlate with metabolic risk factors, organ dysfunction, and NAFLD endpoints, (2) colon mucosa had greater predictive value than other compartments, (3) correlating bacteria differed per compartment, and (4) some bacteria correlated with plasma SCFA levels. In conclusion, this comprehensive microbiota analysis demonstrates correlations between the microbiota and dysfunctions of gut, adipose tissue, and liver, independent of a specific disease-inducing diet.
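The taxa-versus-endpoint correlation screen mentioned above can be sketched as a Spearman correlation per taxon with multiple-testing correction. This is a hedged illustration with simulated abundances and endpoint values, not the study's data or exact pipeline.

```python
import numpy as np
from scipy.stats import spearmanr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
abundances = rng.random((45, 120))     # 45 mice x 120 bacterial taxa (toy relative abundances)
liver_score = rng.random(45)           # a per-mouse NAFLD endpoint (placeholder values)

pvals = []
for j in range(abundances.shape[1]):
    rho, p = spearmanr(abundances[:, j], liver_score)   # rank correlation per taxon
    pvals.append(p)

reject, qvals, _, _ = multipletests(pvals, method="fdr_bh")   # correct for 120 tests
print("taxa passing FDR:", np.flatnonzero(reject))
```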

Journal Article
TL;DR: The data indicate that the FTO allele associated with obesity represses mitochondrial thermogenesis in adipocyte precursor cells in a tissue-autonomous manner, and point to a pathway for adipocyte thermogenesis regulation involving ARID5B, rs1421085, IRX3, and IRX5, which, when manipulated, had pronounced pro-obesity and anti-obesity effects.
Abstract: Background Genomewide association studies can be used to identify disease-relevant genomic regions, but interpretation of the data is challenging. The FTO region harbors the strongest genetic association with obesity, yet the mechanistic basis of this association remains elusive. Methods We examined epigenomic data, allelic activity, motif conservation, regulator expression, and gene coexpression patterns, with the aim of dissecting the regulatory circuitry and mechanistic basis of the association between the FTO region and obesity. We validated our predictions with the use of directed perturbations in samples from patients and from mice and with endogenous CRISPR–Cas9 genome editing in samples from patients. Results Our data indicate that the FTO allele associated with obesity represses mitochondrial thermogenesis in adipocyte precursor cells in a tissue-autonomous manner. The rs1421085 T-to-C single-nucleotide variant disrupts a conserved motif for the ARID5B repressor, which leads to derepression of a pot...

Journal Article
TL;DR: In previously untreated patients with confirmed AML who were ineligible for intensive chemotherapy, overall survival was longer and the incidence of remission was higher among patients who received azacitidine plus venetoclax than among those who received azacitidine alone.
Abstract: Background Older patients with acute myeloid leukemia (AML) have a dismal prognosis, even after treatment with a hypomethylating agent. Azacitidine added to venetoclax had promising efficacy...

Journal Article
TL;DR: The JASPAR CORE collection was expanded with 494 new TF binding profiles, and 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites were introduced.
Abstract: JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.
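For readers unfamiliar with position frequency matrices, one common way such a profile is used is to convert it to a log-odds position weight matrix and score candidate sites. The sketch below is a generic illustration with a made-up 4-position PFM, not the JASPAR tooling or a real motif.

```python
import numpy as np

pfm = np.array([[ 8,  2, 10,  0],     # A counts at motif positions 1-4
                [ 1,  6,  0,  1],     # C
                [ 1,  1,  0,  9],     # G
                [ 0,  1,  0,  0]],    # T
               dtype=float)
background = 0.25                                     # uniform background frequency
probs = (pfm + 0.8) / (pfm + 0.8).sum(axis=0)         # column-normalize with a pseudocount
pwm = np.log2(probs / background)                     # log-odds position weight matrix

def score(seq):
    """Sum the log-odds weight of each observed base along the motif."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    return sum(pwm[idx[base], i] for i, base in enumerate(seq))

print(score("ACAG"))    # higher scores indicate sites closer to the binding preference
```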