
Proceedings ArticleDOI
31 Oct 2019
TL;DR: This work proposes an efficient algorithm to embed a given image into the latent space of StyleGAN, which enables semantic image editing operations that can be applied to existing photographs.
Abstract: We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. This embedding enables semantic image editing operations that can be applied to existing photographs. Taking the StyleGAN trained on the FFHQ dataset as an example, we show results for image morphing, style transfer, and expression transfer. Studying the results of the embedding algorithm provides valuable insights into the structure of the StyleGAN latent space. We propose a set of experiments to test what class of images can be embedded, how they are embedded, what latent space is suitable for embedding, and if the embedding is semantically meaningful.
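
As a rough illustration of the optimization-based embedding idea summarized above, the sketch below projects a target image into an extended latent space by gradient descent on a pixel plus perceptual loss. The generator, feature extractor, latent shape and loss weighting are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of optimization-based latent embedding (illustrative, not the authors' code).
# Assumes `generator` maps a latent code w to an image and `perceptual` extracts VGG-like features.
import torch

def embed_image(target, generator, perceptual, steps=1000, lr=0.01):
    # Start from a random latent code and optimize it directly.
    w = torch.randn(1, 18, 512, requires_grad=True)    # extended W+ code; shape is illustrative
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        synth = generator(w)                            # render image from the current latent
        pixel_loss = torch.nn.functional.mse_loss(synth, target)
        percep_loss = torch.nn.functional.mse_loss(perceptual(synth), perceptual(target))
        loss = pixel_loss + percep_loss                 # combined objective; weights omitted
        loss.backward()
        opt.step()
    return w.detach()                                   # latent code that reproduces `target`
```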

851 citations


Journal ArticleDOI
TL;DR: Front line medical staff with close contact with infected patients, including working in the departments of respiratory, emergency, infectious disease, and ICU, showed higher scores on fear scale, HAMA and HAMD, and they were 1.4 times more likely to feel fear, twice more likely to suffer anxiety and depression.
Abstract: The pandemic of 2019 coronavirus disease (COVID-19) has placed unprecedented psychological stress on people around the world, especially the medical workforce. This study assesses their psychological status. The authors conducted a single-center, cross-sectional survey via online questionnaires. Occurrence of fear, anxiety and depression was measured by the numeric rating scale (NRS) on fear, Hamilton Anxiety Scale (HAMA), and Hamilton Depression Scale (HAMD), respectively. A total of 2299 eligible participants were enrolled from the authors' institution, including 2042 medical staff and 257 administrative staff. The severity of fear, anxiety and depression differed significantly between the two groups. Furthermore, compared with the non-clinical staff, front-line medical staff in close contact with infected patients, including those working in the respiratory, emergency, infectious disease, and ICU departments, showed higher scores on the fear scale, HAMA and HAMD; they were 1.4 times more likely to feel fear and twice as likely to suffer anxiety and depression. Working in the above-mentioned departments made medical staff more susceptible to psychological disorders. Effective strategies for improving mental health should be provided to these individuals.

851 citations


Journal ArticleDOI
TL;DR: An attempt has been made in this paper to review arsenic (As) contamination, its effects on human health, and the various conventional and advanced technologies being used for the removal of As from soil and water.

851 citations


Journal ArticleDOI
TL;DR: The augmentation of fully convolutional networks with long short-term memory recurrent neural network (LSTM RNN) sub-modules for time series classification is proposed, along with an attention mechanism and refinement as methods to enhance the performance of trained models.
Abstract: Fully convolutional neural networks (FCNs) have been shown to achieve the state-of-the-art performance on the task of classifying time series sequences. We propose the augmentation of fully convolutional networks with long short term memory recurrent neural network (LSTM RNN) sub-modules for time series classification. Our proposed models significantly enhance the performance of fully convolutional networks with a nominal increase in model size and require minimal preprocessing of the data set. The proposed long short term memory fully convolutional network (LSTM-FCN) achieves the state-of-the-art performance compared with others. We also explore the usage of attention mechanism to improve time series classification with the attention long short term memory fully convolutional network (ALSTM-FCN). The attention mechanism allows one to visualize the decision process of the LSTM cell. Furthermore, we propose refinement as a method to enhance the performance of trained models. An overall analysis of the performance of our model is provided and compared with other techniques.
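
The architecture lends itself to a compact sketch: an LSTM branch and a fully convolutional branch run in parallel and are concatenated before the classifier. Layer sizes below follow common LSTM-FCN configurations but are illustrative, not necessarily the authors' exact implementation (which also applies a dimension shuffle to the LSTM input).

```python
# Minimal sketch of an LSTM-FCN-style classifier (illustrative dimensions and layer sizes).
import torch
import torch.nn as nn

class LSTMFCN(nn.Module):
    def __init__(self, n_classes, n_features=1, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)   # recurrent branch
        self.fcn = nn.Sequential(                                    # convolutional branch
            nn.Conv1d(n_features, 128, 8, padding='same'), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 5, padding='same'), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 3, padding='same'), nn.BatchNorm1d(128), nn.ReLU(),
        )
        self.head = nn.Linear(hidden + 128, n_classes)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        _, (h, _) = self.lstm(x)           # last hidden state of the LSTM branch
        conv = self.fcn(x.transpose(1, 2)).mean(dim=2)   # global average pooling over time
        return self.head(torch.cat([h[-1], conv], dim=1))
```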

851 citations


Proceedings ArticleDOI
17 Aug 2015
TL;DR: A principled control-theoretic model of bitrate adaptation in client-side players is developed, together with a novel model predictive control algorithm that optimally combines throughput and buffer occupancy information to outperform traditional approaches.
Abstract: User-perceived quality-of-experience (QoE) is critical in Internet video applications as it impacts revenues for content providers and delivery systems. Given that there is little support in the network for optimizing such measures, bottlenecks could occur anywhere in the delivery system. Consequently, a robust bitrate adaptation algorithm in client-side players is critical to ensure good user experience. Previous studies have shown key limitations of state-of-art commercial solutions and proposed a range of heuristic fixes. Despite the emergence of several proposals, there is still a distinct lack of consensus on: (1) How best to design this client-side bitrate adaptation logic (e.g., use rate estimates vs. buffer occupancy); (2) How well specific classes of approaches will perform under diverse operating regimes (e.g., high throughput variability); or (3) How do they actually balance different QoE objectives (e.g., startup delay vs. rebuffering). To this end, this paper makes three key technical contributions. First, to bring some rigor to this space, we develop a principled control-theoretic model to reason about a broad spectrum of strategies. Second, we propose a novel model predictive control algorithm that can optimally combine throughput and buffer occupancy information to outperform traditional approaches. Third, we present a practical implementation in a reference video player to validate our approach using realistic trace-driven emulations.
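
A minimal sketch of the lookahead idea behind such model predictive control: enumerate bitrate plans over a short horizon, score each plan with a simple QoE model built from predicted throughput and buffer dynamics, and apply only the first decision. The QoE weights, chunk duration and inputs below are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of lookahead (MPC-style) bitrate selection, not the paper's exact algorithm.
# Assumes a throughput prediction and a simple QoE = quality - rebuffer_penalty - switch_penalty.
from itertools import product

def choose_bitrate(bitrates, chunk_sizes, buffer_s, predicted_bw, last_quality,
                   horizon=3, rebuf_w=4.0, switch_w=1.0):
    best, best_qoe = bitrates[0], float('-inf')
    for plan in product(range(len(bitrates)), repeat=horizon):   # enumerate bitrate plans
        buf, prev, qoe = buffer_s, last_quality, 0.0
        for step, level in enumerate(plan):
            download = chunk_sizes[step][level] / predicted_bw   # seconds to fetch this chunk
            rebuffer = max(0.0, download - buf)
            buf = max(0.0, buf - download) + 4.0                 # each chunk adds 4 s of video
            qoe += bitrates[level] - rebuf_w * rebuffer - switch_w * abs(bitrates[level] - prev)
            prev = bitrates[level]
        if qoe > best_qoe:
            best_qoe, best = qoe, bitrates[plan[0]]              # apply only the first decision
    return best
```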

851 citations


Journal ArticleDOI
02 Dec 2016-Science
TL;DR: The data indicate that epigenetic fate inflexibility may limit current immunotherapies, and PD-1 pathway blockade resulted in transcriptional rewiring and reengagement of effector circuitry in the TEX epigenetic landscape.
Abstract: Blocking Programmed Death–1 (PD-1) can reinvigorate exhausted CD8 T cells (TEX) and improve control of chronic infections and cancer. However, whether blocking PD-1 can reprogram TEX into durable memory T cells (TMEM) is unclear. We found that reinvigoration of TEX in mice by PD-L1 blockade caused minimal memory development. After blockade, reinvigorated TEX became reexhausted if antigen concentration remained high and failed to become TMEM upon antigen clearance. TEX acquired an epigenetic profile distinct from that of effector T cells (TEFF) and TMEM cells that was minimally remodeled after PD-L1 blockade. This finding suggests that TEX are a distinct lineage of CD8 T cells. Nevertheless, PD-1 pathway blockade resulted in transcriptional rewiring and reengagement of effector circuitry in the TEX epigenetic landscape. These data indicate that epigenetic fate inflexibility may limit current immunotherapies.

851 citations


Journal ArticleDOI
TL;DR: This review presents the recent research, trends and prospects in chitosan, and some special pharmaceutical and biomedical applications are also highlighted.
Abstract: Chitosan is a natural polycationic linear polysaccharide derived from chitin. The low solubility of chitosan in neutral and alkaline solution limits its application. Nevertheless, chemical modification into composites or hydrogels brings to it new functional properties for different applications. Chitosans are recognized as versatile biomaterials because of their non-toxicity, low allergenicity, biocompatibility and biodegradability. This review presents the recent research, trends and prospects in chitosan. Some special pharmaceutical and biomedical applications are also highlighted.

851 citations


Journal ArticleDOI
TL;DR: The technical aspect of automated driving is surveyed, with an overview of available datasets and tools for ADS development and many state-of-the-art algorithms implemented and compared on their own platform in a real-world driving setting.
Abstract: Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of state-of-the-art is improved further. This paper discusses unsolved problems and surveys the technical aspect of automated driving. Studies regarding present challenges, high-level system architectures, emerging methodologies and core functions including localization, mapping, perception, planning, and human machine interfaces, were thoroughly reviewed. Furthermore, many state-of-the-art algorithms were implemented and compared on our own platform in a real-world driving setting. The paper concludes with an overview of available datasets and tools for ADS development.

851 citations



Journal ArticleDOI
TL;DR: Decagon is presented, an approach for modeling polypharmacy side effects that develops a new graph convolutional neural network for multirelational link prediction in multimodal networks and can predict the exact side effect, if any, through which a given drug combination manifests clinically.
Abstract: Motivation: The use of drug combinations, termed polypharmacy, is common to treat patients with complex diseases or co-existing conditions. However, a major consequence of polypharmacy is a much higher risk of adverse side effects for the patient. Polypharmacy side effects emerge because of drug-drug interactions, in which activity of one drug may change, favorably or unfavorably, if taken with another drug. The knowledge of drug interactions is often limited because these complex relationships are rare, and are usually not observed in relatively small clinical testing. Discovering polypharmacy side effects thus remains an important challenge with significant implications for patient mortality and morbidity. Results: Here, we present Decagon, an approach for modeling polypharmacy side effects. The approach constructs a multimodal graph of protein-protein interactions, drug-protein target interactions and the polypharmacy side effects, which are represented as drug-drug interactions, where each side effect is an edge of a different type. Decagon is developed specifically to handle such multimodal graphs with a large number of edge types. Our approach develops a new graph convolutional neural network for multirelational link prediction in multimodal networks. Unlike approaches limited to predicting simple drug-drug interaction values, Decagon can predict the exact side effect, if any, through which a given drug combination manifests clinically. Decagon accurately predicts polypharmacy side effects, outperforming baselines by up to 69%. We find that it automatically learns representations of side effects indicative of co-occurrence of polypharmacy in patients. Furthermore, Decagon models particularly well polypharmacy side effects that have a strong molecular basis, while on predominantly non-molecular side effects, it achieves good performance because of effective sharing of model parameters across edge types. Decagon opens up opportunities to use large pharmacogenomic and patient population data to flag and prioritize polypharmacy side effects for follow-up analysis via formal pharmacological studies. Availability and implementation: Source code and preprocessed datasets are at: http://snap.stanford.edu/decagon
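
The decoder side of such a model can be sketched compactly: given drug embeddings produced by a graph convolutional encoder, each side-effect type gets its own scoring function over drug pairs. The bilinear form below is a simplified stand-in for the paper's decoder, not the released implementation.

```python
# Simplified sketch of a relation-specific edge decoder in the spirit of Decagon
# (one bilinear score per side-effect type on learned drug embeddings); not the released code.
import torch
import torch.nn as nn

class PolypharmacyDecoder(nn.Module):
    def __init__(self, n_relations, dim):
        super().__init__()
        # one learnable relation matrix per side-effect type
        self.rel = nn.Parameter(torch.randn(n_relations, dim, dim) * 0.01)

    def forward(self, z_i, z_j, relation):
        # z_i, z_j: (batch, dim) drug embeddings from a graph encoder; relation: (batch,) indices
        R = self.rel[relation]                                   # (batch, dim, dim)
        score = (z_i.unsqueeze(1) @ R @ z_j.unsqueeze(2)).squeeze()
        return torch.sigmoid(score)                              # probability of this side effect
```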

850 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a cosmological analysis based on full-mission Planck observations of temperature and polarization anisotropies of the cosmic microwave background (CMB) radiation.
Abstract: This paper presents cosmological results based on full-mission Planck observations of temperature and polarization anisotropies of the cosmic microwave background (CMB) radiation. Our results are in very good agreement with the 2013 analysis of the Planck nominal-mission temperature data, but with increased precision. The temperature and polarization power spectra are consistent with the standard spatially-flat 6-parameter ΛCDM cosmology with a power-law spectrum of adiabatic scalar perturbations (denoted “base ΛCDM” in this paper). From the Planck temperature data combined with Planck lensing, for this cosmology we find a Hubble constant, H0 = (67.8 ± 0.9) km s^-1 Mpc^-1, a matter density parameter Ωm = 0.308 ± 0.012, and a tilted scalar spectral index with ns = 0.968 ± 0.006, consistent with the 2013 analysis. Note that in this abstract we quote 68% confidence limits on measured parameters and 95% upper limits on other parameters. We present the first results of polarization measurements with the Low Frequency Instrument at large angular scales. Combined with the Planck temperature and lensing data, these measurements give a reionization optical depth of τ = 0.066 ± 0.016, corresponding to a reionization redshift of . These results are consistent with those from WMAP polarization measurements cleaned for dust emission using 353-GHz polarization maps from the High Frequency Instrument. We find no evidence for any departure from base ΛCDM in the neutrino sector of the theory; for example, combining Planck observations with other astrophysical data we find Neff = 3.15 ± 0.23 for the effective number of relativistic degrees of freedom, consistent with the value Neff = 3.046 of the Standard Model of particle physics. The sum of neutrino masses is constrained to ∑ mν < 0.23 eV. The spatial curvature of our Universe is found to be very close to zero, with | ΩK | < 0.005. Adding a tensor component as a single-parameter extension to base ΛCDM we find an upper limit on the tensor-to-scalar ratio of r0.002 < 0.11, consistent with the Planck 2013 results and consistent with the B-mode polarization constraints from a joint analysis of BICEP2, Keck Array, and Planck (BKP) data. Adding the BKP B-mode data to our analysis leads to a tighter constraint of r0.002 < 0.09 and disfavours inflationary models with a V(φ) ∝ φ^2 potential. The addition of Planck polarization data leads to strong constraints on deviations from a purely adiabatic spectrum of fluctuations. We find no evidence for any contribution from isocurvature perturbations or from cosmic defects. Combining Planck data with other astrophysical data, including Type Ia supernovae, the equation of state of dark energy is constrained to w = −1.006 ± 0.045, consistent with the expected value for a cosmological constant. The standard big bang nucleosynthesis predictions for the helium and deuterium abundances for the best-fit Planck base ΛCDM cosmology are in excellent agreement with observations. We also present constraints on annihilating dark matter and on possible deviations from the standard recombination history. In neither case do we find evidence for new physics. The Planck results for base ΛCDM are in good agreement with baryon acoustic oscillation data and with the JLA sample of Type Ia supernovae. However, as in the 2013 analysis, the amplitude of the fluctuation spectrum is found to be higher than inferred from some analyses of rich cluster counts and weak gravitational lensing.
We show that these tensions cannot easily be resolved with simple modifications of the base ΛCDM cosmology. Apart from these tensions, the base ΛCDM cosmology provides an excellent description of the Planck CMB observations and many other astrophysical data sets.

Journal ArticleDOI
TL;DR: An approach to designing tight-binding ligands with a substantial reduction in false positives relative to compounds synthesized on the basis of other computational or medicinal chemistry approaches is reported, demonstrating the robustness and broad range of applicability of this approach, which can be used to drive decisions in lead optimization.
Abstract: Designing tight-binding ligands is a primary objective of small-molecule drug discovery. Over the past few decades, free-energy calculations have benefited from improved force fields and sampling algorithms, as well as the advent of low-cost parallel computing. However, it has proven to be challenging to reliably achieve the level of accuracy that would be needed to guide lead optimization (∼5× in binding affinity) for a wide range of ligands and protein targets. Not surprisingly, widespread commercial application of free-energy simulations has been limited due to the lack of large-scale validation coupled with the technical challenges traditionally associated with running these types of calculations. Here, we report an approach that achieves an unprecedented level of accuracy across a broad range of target classes and ligands, with retrospective results encompassing 200 ligands and a wide variety of chemical perturbations, many of which involve significant changes in ligand chemical structures. In addition, we have applied the method in prospective drug discovery projects and found a significant improvement in the quality of the compounds synthesized that have been predicted to be potent. Compounds predicted to be potent by this approach have a substantial reduction in false positives relative to compounds synthesized on the basis of other computational or medicinal chemistry approaches. Furthermore, the results are consistent with those obtained from our retrospective studies, demonstrating the robustness and broad range of applicability of this approach, which can be used to drive decisions in lead optimization.
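
The thermodynamics underlying such relative binding free-energy calculations can be written compactly; the cycle relation and the Zwanzig free-energy perturbation identity below are standard results rather than equations quoted from this paper.

```latex
% Relative binding free energy of ligands A and B from a thermodynamic cycle,
% and the Zwanzig (free-energy perturbation) identity used to evaluate each leg.
\Delta\Delta G_{\mathrm{bind}}(A \to B)
  = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
\qquad
\Delta G = -k_{\mathrm{B}} T \,
  \ln \left\langle e^{-\left(U_B - U_A\right)/k_{\mathrm{B}} T} \right\rangle_{A}
```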

Journal ArticleDOI
Neal Cardwell1, Yuchung Cheng1, C. Stephen Gunn1, Soheil Hassas Yeganeh1, Van Jacobson1 
TL;DR: When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat; when they are small, loss is misinterpreted as congestion, leading to low throughput. Both failures call for an alternative to loss-based congestion control.
Abstract: When bottleneck buffers are large, loss-based congestion control keeps them full, causing bufferbloat. When bottleneck buffers are small, loss-based congestion control misinterprets loss as a signal ...
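
A toy sketch of the alternative implied here, i.e. driving congestion control from a model of the path (maximum delivery rate and minimum round-trip time) rather than from loss, is shown below. It illustrates the bandwidth-delay-product idea only; it is not the BBR algorithm or its state machine.

```python
# Toy sketch of congestion control driven by a path model rather than by loss:
# estimate bottleneck bandwidth and min RTT, and cap in-flight data near their product (the BDP).
# Illustration of the concept only; not the BBR algorithm.

class PathModel:
    def __init__(self):
        self.max_bw = 0.0             # bytes per second, max-filtered delivery rate
        self.min_rtt = float('inf')   # seconds, min-filtered round-trip time

    def on_ack(self, delivery_rate, rtt):
        self.max_bw = max(self.max_bw, delivery_rate)
        self.min_rtt = min(self.min_rtt, rtt)

    def inflight_cap(self, gain=1.0):
        if self.max_bw == 0.0 or self.min_rtt == float('inf'):
            return None               # no estimate yet; fall back to a conventional window
        return gain * self.max_bw * self.min_rtt   # bandwidth-delay product in bytes
```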

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently.
Abstract: This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.8% mAP, underscoring the need for developing new approaches for video understanding.

Journal ArticleDOI
TL;DR: A quality function is introduced that assesses the agreement of a pseudopotential calculation with all-electron FLAPW results and the necessary plane-wave energy cutoff, and allows a Nelder–Mead optimization algorithm on a training set of materials to optimize the input parameters of the pseudopotential construction.
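
The outer optimization loop is straightforward to sketch with SciPy's Nelder–Mead implementation; the quality function below is a hypothetical stand-in for the paper's agreement measure against all-electron FLAPW references and the plane-wave cutoff penalty.

```python
# Sketch of tuning pseudopotential construction parameters with Nelder-Mead (SciPy).
# `quality` is a hypothetical stand-in for the paper's agreement measure vs. FLAPW references.
import numpy as np
from scipy.optimize import minimize

def quality(params, training_set):
    # Hypothetical: build a pseudopotential from `params`, run the training-set calculations,
    # and return a scalar penalty combining disagreement with all-electron results and the
    # required plane-wave cutoff (lower is better).
    return sum(material.penalty(params) for material in training_set)

def optimize_pseudopotential(x0, training_set):
    result = minimize(quality, np.asarray(x0), args=(training_set,), method='Nelder-Mead',
                      options={'xatol': 1e-3, 'fatol': 1e-3, 'maxiter': 500})
    return result.x, result.fun    # best parameters and their quality score
```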

Journal ArticleDOI
02 Jun 2017-PLOS ONE
TL;DR: An optimal Bayes classifier for the MCC metric is derived using an approach based on the Frechet derivative; the proposed MCC-classifier has performance close to SVM-imba while being simpler and more efficient.
Abstract: Data imbalance is frequently encountered in biomedical applications. Resampling techniques can be used in binary classification to tackle this issue. However, such solutions are not desired when the number of samples in the small class is limited. Moreover, the use of inadequate performance metrics, such as accuracy, leads to poor generalization results because the classifiers tend to predict the largest size class. One of the good approaches to deal with this issue is to optimize performance metrics that are designed to handle data imbalance. Matthews Correlation Coefficient (MCC) is widely used in Bioinformatics as a performance metric. We are interested in developing a new classifier based on the MCC metric to handle imbalanced data. We derive an optimal Bayes classifier for the MCC metric using an approach based on the Frechet derivative. We show that the proposed algorithm has the nice theoretical property of consistency. Using simulated data, we verify the correctness of our optimality result by searching in the space of all possible binary classifiers. The proposed classifier is evaluated on 64 datasets covering a wide range of data imbalance. We compare both classification performance and CPU efficiency for three classifiers: 1) the proposed algorithm (MCC-classifier), 2) the Bayes classifier with a default threshold (MCC-base), and 3) imbalanced SVM (SVM-imba). The experimental evaluation shows that MCC-classifier has performance close to SVM-imba while being simpler and more efficient.
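
A simple empirical analogue of optimizing the MCC directly is to tune the decision threshold of a probabilistic classifier against the metric, as sketched below; this is an illustration, not the paper's Bayes-optimal derivation.

```python
# Sketch of threshold tuning for the MCC metric on predicted probabilities
# (an empirical analogue of optimizing the metric; not the paper's derivation).
import numpy as np
from sklearn.metrics import matthews_corrcoef

def best_mcc_threshold(y_true, y_prob, grid=None):
    # y_true: binary labels; y_prob: predicted probabilities of the positive class (numpy arrays)
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    scores = [(t, matthews_corrcoef(y_true, (y_prob >= t).astype(int))) for t in grid]
    return max(scores, key=lambda ts: ts[1])   # (threshold, MCC) with the highest MCC
```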

Journal ArticleDOI
TL;DR: Evidence regarding human exposure to microplastics via seafood is described, potential health effects are discussed, and mitigation and adaptation strategies targeting the life cycle of microplastics are recommended.
Abstract: We describe evidence regarding human exposure to microplastics via seafood and discuss potential health effects. Shellfish and other animals consumed whole pose particular concern for human exposure. If there is toxicity, it is likely dependent on dose, polymer type, size, surface chemistry, and hydrophobicity. Human activity has led to microplastic contamination throughout the marine environment. As a result of widespread contamination, microplastics are ingested by many species of wildlife including fish and shellfish. Because microplastics are associated with chemicals from manufacturing and that sorb from the surrounding environment, there is concern regarding physical and chemical toxicity. Evidence regarding microplastic toxicity and epidemiology is emerging. We characterize current knowledge and highlight gaps. We also recommend mitigation and adaptation strategies targeting the life cycle of microplastics and recommend future research to assess impacts of microplastics on humans. Addressing these research gaps is a critical priority due to the nutritional importance of seafood consumption.

Journal ArticleDOI
01 Jan 2015-Database
TL;DR: One of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, are offered.
Abstract: DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380,000 associations between >16,000 genes and 13,000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/

Journal ArticleDOI
TL;DR: Hypofractionated radiotherapy using 60 Gy in 20 fractions is non-inferior to conventional fractionation using 74 Gy in 37 fractions and is recommended as a new standard of care for external-beam radiotherapy of localised prostate cancer after 5 years follow-up.
Abstract: Summary Background Prostate cancer might have high radiation-fraction sensitivity that would give a therapeutic advantage to hypofractionated treatment. We present a pre-planned analysis of the efficacy and side-effects of a randomised trial comparing conventional and hypofractionated radiotherapy after 5 years follow-up. Methods CHHiP is a randomised, phase 3, non-inferiority trial that recruited men with localised prostate cancer (pT1b–T3aN0M0). Patients were randomly assigned (1:1:1) to conventional (74 Gy delivered in 37 fractions over 7·4 weeks) or one of two hypofractionated schedules (60 Gy in 20 fractions over 4 weeks or 57 Gy in 19 fractions over 3·8 weeks) all delivered with intensity-modulated techniques. Most patients were given radiotherapy with 3–6 months of neoadjuvant and concurrent androgen suppression. Randomisation was by computer-generated random permuted blocks, stratified by National Comprehensive Cancer Network (NCCN) risk group and radiotherapy treatment centre, and treatment allocation was not masked. The primary endpoint was time to biochemical or clinical failure; the critical hazard ratio (HR) for non-inferiority was 1·208. Analysis was by intention to treat. Long-term follow-up continues. The CHHiP trial is registered as an International Standard Randomised Controlled Trial, number ISRCTN97182923. Findings Between Oct 18, 2002, and June 17, 2011, 3216 men were enrolled from 71 centres and randomly assigned (74 Gy group, 1065 patients; 60 Gy group, 1074 patients; 57 Gy group, 1077 patients). Median follow-up was 62·4 months (IQR 53·9–77·0). The proportion of patients who were biochemical or clinical failure free at 5 years was 88·3% (95% CI 86·0–90·2) in the 74 Gy group, 90·6% (88·5–92·3) in the 60 Gy group, and 85·9% (83·4–88·0) in the 57 Gy group. 60 Gy was non-inferior to 74 Gy (HR 0·84 [90% CI 0·68–1·03], p NI =0·0018) but non-inferiority could not be claimed for 57 Gy compared with 74 Gy (HR 1·20 [0·99–1·46], p NI =0·48). Long-term side-effects were similar in the hypofractionated groups compared with the conventional group. There were no significant differences in either the proportion or cumulative incidence of side-effects 5 years after treatment using three clinician-reported as well as patient-reported outcome measures. The estimated cumulative 5 year incidence of Radiation Therapy Oncology Group (RTOG) grade 2 or worse bowel and bladder adverse events was 13·7% (111 events) and 9·1% (66 events) in the 74 Gy group, 11·9% (105 events) and 11·7% (88 events) in the 60 Gy group, 11·3% (95 events) and 6·6% (57 events) in the 57 Gy group, respectively. No treatment-related deaths were reported. Interpretation Hypofractionated radiotherapy using 60 Gy in 20 fractions is non-inferior to conventional fractionation using 74 Gy in 37 fractions and is recommended as a new standard of care for external-beam radiotherapy of localised prostate cancer. Funding Cancer Research UK, Department of Health, and the National Institute for Health Research Cancer Research Network.

Journal ArticleDOI
TL;DR: This work retrospectively analysed data from a Zika virus outbreak in French Polynesia to provide a quantitative estimate of the risk of microcephaly in fetuses and neonates whose mothers are infected with Zika virus.

Journal ArticleDOI
TL;DR: This review succinctly presents the impact of pyrolysis temperature and the type of biomass on the physicochemical characteristics of biochar and its effect on soil fertility.
Abstract: Biochar is a pyrogenous, organic material synthesized through pyrolysis of different biomass (plant or animal waste). The potential biochar applications include: (1) pollution remediation due to high CEC and specific surface area; (2) soil fertility improvement by way of a liming effect, enrichment in volatile matter and increase of pore volume; (3) carbon sequestration due to carbon and ash content, etc. Biochar properties are affected by several technological parameters, mainly pyrolysis temperature and feedstock kind, whose variation can lead to products with a wide range of values of pH, specific surface area, pore volume, CEC, volatile matter, ash and carbon content. High pyrolysis temperature promotes the production of biochar with a strongly developed specific surface area, high porosity, pH as well as content of ash and carbon, but with low values of CEC and content of volatile matter. This is most likely due to a significant degree of organic matter decomposition. Biochars produced from animal litter and solid waste feedstocks exhibit lower surface areas, carbon content and volatile matter and higher CEC compared to biochars produced from crop residue and wood biomass, even at higher pyrolysis temperatures. The reason for this difference is considerable variation in lignin and cellulose content as well as in moisture content of biomass. The physicochemical properties of biochar determine the application of this biomaterial as an additive to improve soil quality. This review succinctly presents the impact of pyrolysis temperature and the type of biomass on the physicochemical characteristics of biochar and its impact on soil fertility.

Journal ArticleDOI
02 Jan 2019
TL;DR: A neural model is presented for representing snippets of code as continuous distributed vectors, encoding each snippet as a single fixed-length code vector that can be used to predict semantic properties of the snippet, making it the first to successfully predict method names based on a large, cross-project corpus.
Abstract: We present a neural model for representing snippets of code as continuous distributed vectors (``code embeddings''). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict semantic properties of the snippet. To this end, code is first decomposed to a collection of paths in its abstract syntax tree. Then, the network learns the atomic representation of each path while simultaneously learning how to aggregate a set of them. We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 12M methods. We show that code vectors trained on this dataset can predict method names from files that were unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies. A comparison of our approach to previous techniques over the same dataset shows an improvement of more than 75%, making it the first to successfully predict method names based on a large, cross-project corpus. Our trained model, visualizations and vector similarities are available as an interactive online demo at http://code2vec.org. The code, data and trained models are available at https://github.com/tech-srl/code2vec.
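
The aggregation step described above can be sketched as attention over path-context vectors; dimensions and layer names below are illustrative rather than taken from the released code2vec code.

```python
# Sketch of attention-weighted aggregation of path-context vectors into one code vector,
# in the spirit of the model described above (dimensions and names are illustrative).
import torch
import torch.nn as nn

class PathAttentionEncoder(nn.Module):
    def __init__(self, context_dim=384, code_dim=384):
        super().__init__()
        self.combine = nn.Linear(context_dim, code_dim)    # fuse each (token, path, token) context
        self.attention = nn.Linear(code_dim, 1, bias=False)

    def forward(self, contexts, mask):
        # contexts: (batch, n_contexts, context_dim); mask: (batch, n_contexts), 1 for real paths
        c = torch.tanh(self.combine(contexts))
        logits = self.attention(c).squeeze(-1).masked_fill(mask == 0, float('-inf'))
        weights = torch.softmax(logits, dim=1)             # attention over path-contexts
        code_vector = (weights.unsqueeze(-1) * c).sum(dim=1)
        return code_vector                                  # fixed-length embedding of the snippet
```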

Proceedings Article
02 Jul 2018
TL;DR: In this article, a new model-poisoning methodology based on model replacement is proposed to poison a global model in federated learning, which can reach 100% accuracy on the backdoor task.
Abstract: Federated learning enables thousands of participants to construct a deep learning model without sharing their private training data with each other. For example, multiple smartphones can jointly train a next-word predictor for keyboards without revealing what individual users type. We demonstrate that any participant in federated learning can introduce hidden backdoor functionality into the joint global model, e.g., to ensure that an image classifier assigns an attacker-chosen label to images with certain features, or that a word predictor completes certain sentences with an attacker-chosen word. We design and evaluate a new model-poisoning methodology based on model replacement. An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task. We evaluate the attack under different assumptions for the standard federated-learning tasks and show that it greatly outperforms data poisoning. Our generic constrain-and-scale technique also evades anomaly detection-based defenses by incorporating the evasion into the attacker's loss function during training.
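
The core of model replacement can be sketched in a few lines: the attacker scales its contribution so that, after averaging, the global model is approximately replaced by the backdoored local model. The single-round, equal-weight view below is a simplification, not the paper's full constrain-and-scale attack.

```python
# Sketch of the model-replacement idea: the attacker submits a model scaled so that, after
# federated averaging, the new global model is (approximately) the attacker's backdoored model.
# Simplified single-round view with equal client weights and benign updates that roughly cancel.

def malicious_submission(global_weights, backdoored_weights, n_clients):
    # FedAvg roughly computes: new_global = global + (1/n) * sum_k (local_k - global).
    # Scaling the attacker's delta by n makes its term dominate:
    # new_global ≈ global + (1/n) * n * (backdoored - global) = backdoored.
    return {name: global_weights[name] + n_clients * (backdoored_weights[name] - global_weights[name])
            for name in global_weights}
```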

Proceedings Article
Mark Chen1, Alec Radford1, Rewon Child1, Jeffrey Wu1, Heewoo Jun1, David Luan1, Ilya Sutskever1 
12 Jul 2020
TL;DR: This work trains a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure, and finds that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification.
Abstract: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full finetuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
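
Linear probing, the main evaluation mentioned above, reduces to fitting a linear classifier on frozen features; the sketch below assumes a generic feature_fn and uses scikit-learn for the probe, which is an illustrative choice rather than the authors' pipeline.

```python
# Sketch of linear-probe evaluation: freeze the pretrained model, extract a feature vector per
# image, and fit a linear classifier on top (feature_fn and data handling are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(feature_fn, train_images, train_labels, test_images, test_labels):
    X_train = np.stack([feature_fn(img) for img in train_images])   # frozen features
    X_test = np.stack([feature_fn(img) for img in test_images])
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return clf.score(X_test, test_labels)   # probe accuracy, a measure of representation quality
```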

Proceedings Article
09 Dec 2017
TL;DR: Masked Autoregressive Flow, as described in this paper, increases the flexibility of an autoregressive density estimator by modelling the random numbers that the model uses internally when generating data; it is closely related to Inverse Autoregressive Flow and is a generalization of Real NVP.
Abstract: Autoregressive models are among the best performing neural density estimators. We describe an approach for increasing the flexibility of an autoregressive model, based on modelling the random numbers that the model uses internally when generating data. By constructing a stack of autoregressive models, each modelling the random numbers of the next model in the stack, we obtain a type of normalizing flow suitable for density estimation, which we call Masked Autoregressive Flow. This type of flow is closely related to Inverse Autoregressive Flow and is a generalization of Real NVP. Masked Autoregressive Flow achieves state-of-the-art performance in a range of general-purpose density estimation tasks.
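
The density computation behind a masked autoregressive flow follows directly from the change of variables; the sketch below spells it out with toy conditioner functions standing in for the masked network (MADE) used in practice.

```python
# Sketch of the change-of-variables computation behind a masked autoregressive flow:
# each x_i is mapped to u_i = (x_i - mu_i(x_<i)) * exp(-alpha_i(x_<i)), and the log-density
# is the base log-density of u plus the log-determinant. A real MAF computes mu and alpha
# with a masked neural network; here they are passed in as toy functions.
import numpy as np

def log_standard_normal(u):
    return -0.5 * (u ** 2 + np.log(2 * np.pi))

def maf_log_density(x, mu_fn, alpha_fn):
    # mu_fn(x, i) and alpha_fn(x, i) may only depend on x[:i] (the autoregressive constraint).
    D = len(x)
    u = np.empty(D)
    log_det = 0.0
    for i in range(D):
        mu, alpha = mu_fn(x, i), alpha_fn(x, i)
        u[i] = (x[i] - mu) * np.exp(-alpha)
        log_det -= alpha                        # d u_i / d x_i = exp(-alpha_i)
    return log_standard_normal(u).sum() + log_det
```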

Journal ArticleDOI
TL;DR: Metal losses affect the performance of every plasmonic or metamaterial structure; dealing with them will determine the degree to which these structures will find practical applications.
Abstract: Metal losses affect the performance of every plasmonic or metamaterial structure; dealing with them will determine the degree to which these structures will find practical applications.

Journal ArticleDOI
01 Jan 2016-Database
TL;DR: The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects and generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets.
Abstract: The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets.

Journal ArticleDOI
TL;DR: A major efficiency limit for solution-processed perovskite optoelectronic devices, for example light-emitting diodes, is trap-mediated non-radiative losses as mentioned in this paper.
Abstract: A major efficiency limit for solution-processed perovskite optoelectronic devices, for example light-emitting diodes, is trap-mediated non-radiative losses. Defect passivation using organic molecules ...

Journal ArticleDOI
23 May 2017-BMJ
TL;DR: In this paper, the authors developed and validated updated QRISK3 prediction algorithms to estimate the 10-year risk of cardiovascular disease in women and men accounting for potential new risk factors, including chronic kidney disease (stage 3, 4 or 5), a measure of systolic blood pressure variability (standard deviation of repeated measures), migraine, corticosteroids, systemic lupus erythematosus (SLE), atypical antipsychotics, severe mental illness, and HIV/AIDS.
Abstract: Objectives: To develop and validate updated QRISK3 prediction algorithms to estimate the 10 year risk of cardiovascular disease in women and men accounting for potential new risk factors. Design: Prospective open cohort study. Setting: General practices in England providing data for the QResearch database. Participants: 1309 QResearch general practices in England: 981 practices were used to develop the scores and a separate set of 328 practices were used to validate the scores. 7.89 million patients aged 25-84 years were in the derivation cohort and 2.67 million patients in the validation cohort. Patients were free of cardiovascular disease and not prescribed statins at baseline. Methods: Cox proportional hazards models in the derivation cohort to derive separate risk equations in men and women for evaluation at 10 years. Risk factors considered included those already in QRISK2 (age, ethnicity, deprivation, systolic blood pressure, body mass index, total cholesterol: high density lipoprotein cholesterol ratio, smoking, family history of coronary heart disease in a first degree relative aged less than 60 years, type 1 diabetes, type 2 diabetes, treated hypertension, rheumatoid arthritis, atrial fibrillation, chronic kidney disease (stage 4 or 5)) and new risk factors (chronic kidney disease (stage 3, 4, or 5), a measure of systolic blood pressure variability (standard deviation of repeated measures), migraine, corticosteroids, systemic lupus erythematosus (SLE), atypical antipsychotics, severe mental illness, and HIV/AIDs). We also considered erectile dysfunction diagnosis or treatment in men. Measures of calibration and discrimination were determined in the validation cohort for men and women separately and for individual subgroups by age group, ethnicity, and baseline disease status. Main outcome measures: Incident cardiovascular disease recorded on any of the following three linked data sources: general practice, mortality, or hospital admission records. Results: 363 565 incident cases of cardiovascular disease were identified in the derivation cohort during follow-up arising from 50.8 million person years of observation. All new risk factors considered met the model inclusion criteria except for HIV/AIDS, which was not statistically significant. The models had good calibration and high levels of explained variation and discrimination. In women, the algorithm explained 59.6% of the variation in time to diagnosis of cardiovascular disease (R2, with higher values indicating more variation), and the D statistic was 2.48 and Harrell’s C statistic was 0.88 (both measures of discrimination, with higher values indicating better discrimination). The corresponding values for men were 54.8%, 2.26, and 0.86. Overall performance of the updated QRISK3 algorithms was similar to the QRISK2 algorithms. Conclusion: Updated QRISK3 risk prediction models were developed and validated. The inclusion of additional clinical variables in QRISK3 (chronic kidney disease, a measure of systolic blood pressure variability (standard deviation of repeated measures), migraine, corticosteroids, SLE, atypical antipsychotics, severe mental illness, and erectile dysfunction) can help enable doctors to identify those at most risk of heart disease and stroke.
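
Scores of this kind convert a Cox proportional-hazards linear predictor into a 10-year risk via a baseline survival term; the sketch below shows that generic conversion with placeholder coefficients and baseline survival, not the published QRISK3 values.

```python
# Generic sketch of converting a Cox proportional-hazards linear predictor into a 10-year risk,
# the form used by QRISK-style scores. Coefficients, centering and baseline survival below are
# placeholders for illustration, not the published QRISK3 values.
import math

def ten_year_risk(covariates, coefficients, baseline_survival_10y, mean_linear_predictor=0.0):
    # Linear predictor: sum of coefficient * covariate, centred on the cohort mean.
    lp = sum(coefficients[name] * value for name, value in covariates.items())
    return 1.0 - baseline_survival_10y ** math.exp(lp - mean_linear_predictor)

# Example with made-up numbers:
risk = ten_year_risk({'age_term': 1.2, 'sbp_term': 0.4},
                     {'age_term': 0.5, 'sbp_term': 0.3},
                     baseline_survival_10y=0.98)
```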

Posted Content
TL;DR: The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers.
Abstract: We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, by applying spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps user quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: this https URL
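
The gating described above amounts to learning a soft, per-channel and per-pixel validity mask alongside the features; the layer sketch below is a minimal illustrative version, not the authors' released implementation.

```python
# Sketch of a gated convolution layer: a feature branch and a soft gating branch computed at
# every channel and spatial location (kernel size and activation choice are illustrative).
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        # Learnable, per-pixel and per-channel soft mask instead of treating all pixels as valid.
        return torch.relu(self.feature(x)) * torch.sigmoid(self.gate(x))
```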