
Showing papers by "Edoardo M. Airoldi published in 2016"


Journal ArticleDOI
Daniel J. Klionsky1, Kotb Abdelmohsen2, Akihisa Abe3, Joynal Abedin4  +2519 moreInstitutions (695)
TL;DR: In this paper, the authors present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes.
Abstract: In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy. Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation, it is imperative to target by gene knockout or RNA interference more than one autophagy-related protein. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways implying that not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular assays, we hope to encourage technical innovation in the field.

5,187 citations


Journal ArticleDOI
TL;DR: A hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates, is posited, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework.
Abstract: Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this article, we develop a model of text data that supports this type of substantive research. Our approach is to posit a hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates. In this model, topical prevalence and topical content are specified as a simple generalized linear model on an arbitrary number of document-level covariates, such as news source and time of release, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework. We demonstrate the proposed methodology by analyzi...

429 citations
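
The covariate-driven topic prevalence described in this entry can be illustrated with a small generative sketch. The snippet below assumes a logistic-normal link from document-level covariates to topic proportions; the variable names (X, Gamma, topics), the toy dimensions, and the single binary covariate are illustrative choices, not the authors' implementation.

```python
# Minimal generative sketch of covariate-driven topic prevalence, assuming a
# logistic-normal link from document-level covariates to topic proportions.
# Illustrative only; not the authors' model code.
import numpy as np

rng = np.random.default_rng(0)

def simulate_docs(X, Gamma, topics, doc_len=100, sigma=0.5):
    """X: (D, P) document covariates; Gamma: (P, K-1) prevalence coefficients;
    topics: (K, V) topic-word distributions. Returns a (D, V) word-count matrix."""
    D, _ = X.shape
    K, V = topics.shape
    counts = np.zeros((D, V), dtype=int)
    for d in range(D):
        # covariates set the mean of a logistic-normal over K topics
        eta = np.append(X[d] @ Gamma + sigma * rng.standard_normal(K - 1), 0.0)
        theta = np.exp(eta - eta.max())
        theta /= theta.sum()                                # topic proportions
        z = rng.choice(K, size=doc_len, p=theta)            # topic of each token
        n_per_topic = np.bincount(z, minlength=K)
        for k in range(K):
            if n_per_topic[k]:
                counts[d] += rng.multinomial(n_per_topic[k], topics[k])
    return counts

# toy example: an intercept plus one binary covariate (e.g., news source)
D, K, V = 20, 3, 50
X = np.column_stack([np.ones(D), rng.integers(0, 2, D)])
Gamma = np.array([[0.0, 0.0], [1.5, -1.5]])                 # source shifts topics 1 vs 2
topics = rng.dirichlet(np.full(V, 0.1), size=K)
Y = simulate_docs(X, Gamma, topics)
print(Y.shape)
```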


Journal ArticleDOI
TL;DR: It is shown that engineered nuclear export of Hsf1 results in cytotoxicity associated with massive protein aggregation and reveals that yeast chaperone gene expression is an essential housekeeping mechanism and provides a roadmap for defining the function of HSF1 as a driver of oncogenesis.

142 citations


Posted Content
TL;DR: An extended unconfoundedness assumption that accounts for interference is proposed, and new covariate-adjustment methods are developed that lead to valid estimates of treatment and interference effects in observational studies on networks.
Abstract: Causal inference on a population of units connected through a network often presents technical challenges, including how to account for interference. In the presence of local interference, for instance, potential outcomes of a unit depend on its treatment as well as on the treatments of other local units, such as its neighbors according to the network. In observational studies, a further complication is that the typical unconfoundedness assumption must be extended - say, to include the treatment of neighbors, and individual and neighborhood covariates - to guarantee identification and valid inference. Here, we propose new estimands that define treatment and interference effects. We then derive analytical expressions for the bias of a naive estimator that wrongly assumes away interference. The bias depends on the level of interference but also on the degree of association between individual and neighborhood treatments. We propose an extended unconfoundedness assumption that accounts for interference, and we develop new covariate-adjustment methods that lead to valid estimates of treatment and interference effects in observational studies on networks. Estimation is based on a generalized propensity score that balances individual and neighborhood covariates across units under different levels of individual treatment and of exposure to neighbors' treatment. We carry out simulations, calibrated using friendship networks and covariates in a nationally representative longitudinal study of adolescents in grades 7-12, in the United States, to explore finite-sample performance in different realistic settings.

102 citations
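
To make the adjustment idea concrete, here is a simplified sketch of weighting by a generalized propensity score defined over joint individual-treatment and neighborhood-exposure classes. The synthetic network, the binary high/low exposure coding, and the use of scikit-learn's multinomial logistic regression are assumptions for illustration; this is not the paper's estimator or simulation design.

```python
# Simplified sketch of covariate adjustment under neighborhood interference:
# model the joint "individual treatment x neighborhood exposure" assignment
# given own and neighborhood-average covariates, then weight outcomes by the
# inverse of that generalized propensity. Illustrative assumptions throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

n = 500
A = (rng.random((n, n)) < 0.02).astype(float)
A = np.triu(A, 1); A = A + A.T                               # symmetric adjacency
deg = A.sum(1).clip(min=1)
X = rng.standard_normal((n, 2))                              # individual covariates
Xn = (A @ X) / deg[:, None]                                  # neighborhood means
Z = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + Xn[:, 0])))) # own treatment
G = (A @ Z) / deg                                            # fraction of treated neighbors
Ghi = (G > np.median(G)).astype(int)                         # high vs low exposure
Y = 1.0 * Z + 0.5 * Ghi + X @ [1.0, -1.0] + Xn @ [0.5, 0.5] + rng.standard_normal(n)

# generalized propensity: joint probability of (Z, Ghi) given covariates
joint = Z * 2 + Ghi                                          # 4 treatment-exposure classes
W = np.column_stack([X, Xn])
ps = LogisticRegression(max_iter=1000).fit(W, joint).predict_proba(W)
w = 1.0 / ps[np.arange(n), joint]                            # inverse-probability weights

def wmean(mask):
    return np.average(Y[mask], weights=w[mask])

# treatment effect at low exposure, and interference effect among the untreated
tau_treat = wmean((Z == 1) & (Ghi == 0)) - wmean((Z == 0) & (Ghi == 0))
tau_interf = wmean((Z == 0) & (Ghi == 1)) - wmean((Z == 0) & (Ghi == 0))
print(round(tau_treat, 2), round(tau_interf, 2))
```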


Journal ArticleDOI
TL;DR: It is shown that words that are both frequent and exclusive to a theme are more effective at characterizing topical content, and a regularization scheme is proposed that leads to better estimates of these quantities.
Abstract: An ongoing challenge in the analysis of document collections is how to summarize content in terms of a set of inferred themes that can be interpreted substantively in terms of topics. The current practice of parameterizing the themes in terms of most frequent words limits interpretability by ignoring the differential use of words across topics. Here, we show that words that are both frequent and exclusive to a theme are more effective at characterizing topical content, and we propose a regularization scheme that leads to better estimates of these quantities. We consider a supervised setting where professional editors have annotated documents to topic categories, organized into a tree, in which leaf-nodes correspond to more specific topics. Each document is annotated to multiple categories, at different levels of the tree. We introduce a hierarchical Poisson convolution model to analyze these annotated documents. A parallelized Hamiltonian Monte Carlo sampler allows the inference to scale to millio...

88 citations
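
One concrete way to operationalize "frequent and exclusive" words, in the spirit of this line of work, is a weighted harmonic mean of the empirical-CDF ranks of within-topic frequency and of exclusivity. The sketch below uses an equal 0.5 weight and a raw topic-word matrix without the paper's regularization; both choices are illustrative, not the authors' estimator.

```python
# Hedged sketch: score words by combining within-topic frequency and
# exclusivity via a weighted harmonic mean of their empirical-CDF ranks.
import numpy as np

def frex(beta, weight=0.5):
    """beta: (K, V) topic-word probabilities (rows sum to 1).
    Returns a (K, V) matrix of frequency-exclusivity scores."""
    excl = beta / beta.sum(axis=0, keepdims=True)            # exclusivity of word to topic
    def ecdf(row):
        return (np.argsort(np.argsort(row)) + 1) / len(row)  # rank in (0, 1]
    freq_rank = np.apply_along_axis(ecdf, 1, beta)
    excl_rank = np.apply_along_axis(ecdf, 1, excl)
    return 1.0 / (weight / excl_rank + (1.0 - weight) / freq_rank)

# toy check: word 0 is frequent in both topics (shared), word 2 is frequent
# and comparatively exclusive to topic 1, so it scores higher there
beta = np.array([[0.4, 0.3, 0.3],
                 [0.4, 0.1, 0.5]])
print(np.round(frex(beta), 2))
```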



Journal ArticleDOI
TL;DR: Results show that Facebook users, on average, increase use of wall posts and decrease use of private messages after the introduction of granular privacy controls, and that user-specific factors play crucial roles in shaping users’ varying reactions to the policy change.
Abstract: We examine the role of granular privacy controls on dynamic content-sharing activities and disclosure patterns of Facebook users based on the exogenous policy change in December 2009. Using a unique panel data set, we first conduct regression discontinuity analyses to verify a discontinuous jump in content generation activities and disclosure patterns around the time of the policy change. We next estimate unobserved effects models to assess the short-run and long-run effects of the change. Results show that Facebook users, on average, increase use of wall posts and decrease use of private messages after the introduction of granular privacy controls. Also, users’ disclosure patterns change to reflect the increased openness in content sharing. These effects are realized immediately and over time. More importantly, we show that user-specific factors play crucial roles in shaping users’ varying reactions to the policy change. While more privacy sensitive users (those who do not reveal their gender and/or thos...

63 citations
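
A minimal sketch of the regression-discontinuity-in-time logic on synthetic daily activity: fit separate linear trends on each side of the policy-change date within a bandwidth and read off the jump at the cutoff. The simulated series, bandwidth, and specification below are illustrative and not the paper's models.

```python
# Minimal regression-discontinuity-in-time sketch on synthetic data:
# the coefficient on the post indicator is the discontinuity at the cutoff.
import numpy as np

rng = np.random.default_rng(2)

days = np.arange(-60, 61)                    # days relative to the policy change
activity = 5 + 0.02 * days + 1.5 * (days >= 0) + rng.normal(0, 0.8, days.size)

def rd_jump(t, y, bandwidth=30):
    m = np.abs(t) <= bandwidth
    t, y = t[m], y[m]
    post = (t >= 0).astype(float)
    # y = a + b*t + c*post + d*t*post ; c is the jump at t = 0
    Xmat = np.column_stack([np.ones_like(t), t, post, t * post])
    coef, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return coef[2]

print(round(rd_jump(days, activity), 2))     # should be near the true jump of 1.5
```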


Journal ArticleDOI
Daniel J. Klionsky1, Kotb Abdelmohsen2, Akihisa Abe3, Joynal Abedin4  +2519 moreInstitutions (697)
TL;DR: Author(s): Klionsky, DJ; Abdelmohsen, K; Abe, A; Abedin, MJ; Abeliovich, H; Arozena, AA; Adachi, H; Adeli, K; Adhihetty, PJ; Adler, SG; Agam, G; Agarwal, R; Aghi, MK; Agnello, M; Agostinis, P; Aguilar, PV; Aguirre-Ghis
Abstract: Author(s): Klionsky, DJ; Abdelmohsen, K; Abe, A; Abedin, MJ; Abeliovich, H; Arozena, AA; Adachi, H; Adams, CM; Adams, PD; Adeli, K; Adhihetty, PJ; Adler, SG; Agam, G; Agarwal, R; Aghi, MK; Agnello, M; Agostinis, P; Aguilar, PV; Aguirre-Ghiso, J; Airoldi, EM; Ait-Si-Ali, S; Akematsu, T; Akporiaye, ET; Al-Rubeai, M; Albaiceta, GM; Albanese, C; Albani, D; Albert, ML; Aldudo, J; Algul, H; Alirezaei, M; Alloza, I; Almasan, A; Almonte-Beceril, M; Alnemri, ES; Alonso, C; Altan-Bonnet, N; Altieri, DC; Alvarez, S; Alvarez-Erviti, L; Alves, S; Amadoro, G; Amano, A; Amantini, C; Ambrosio, S; Amelio, I; Amer, AO; Amessou, M; Amon, A; An, Z; Anania, FA; Andersen, SU; Andley, UP; Andreadi, CK; Andrieu-Abadie, N; Anel, A; Ann, DK; Anoopkumar-Dukie, S; Antonioli, M; Aoki, H; Apostolova, N; Aquila, S; Aquilano, K; Araki, K; Arama, E; Aranda, A; Araya, J; Arcaro, A; Arias, E; Arimoto, H; Ariosa, AR; Armstrong, JL; Arnould, T; Arsov, I; Asanuma, K; Askanas, V; Asselin, E; Atarashi, R; Atherton, SS; Atkin, JD; Attardi, LD; Auberger, P; Auburger, G; Aurelian, L; Autelli, R

54 citations


Proceedings Article
02 May 2016
TL;DR: In this article, an iterative estimation procedure termed averaged implicit stochastic gradient descent (ai-sgd) is proposed, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value.
Abstract: Iterative procedures for parameter estimation based on stochastic gradient descent (sgd) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on sgd are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit sgd (ai-sgd). For statistical efficiency ai-sgd employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability ai-sgd employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, ai-sgd achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.

44 citations
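
For least squares, the implicit update in ai-sgd has a simple closed form, which makes the procedure easy to sketch. The snippet below shows the implicit step plus iterate averaging on a synthetic regression stream; the learning-rate schedule is an illustrative choice, and the paper treats the general case and its theory.

```python
# Averaged implicit SGD sketch for least squares. The implicit update
#   theta_new = theta + gamma * (y_i - x_i @ theta_new) * x_i
# solves in closed form via a scalar shrinkage factor, and the running
# average of the iterates is the reported estimator.
import numpy as np

rng = np.random.default_rng(3)

n, p = 5000, 10
theta_star = rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ theta_star + rng.normal(0, 0.5, n)

theta = np.zeros(p)
theta_bar = np.zeros(p)
for i in range(n):
    x_i, y_i = X[i], y[i]
    gamma = 1.0 / (1.0 + 0.1 * i) ** 0.6                    # decaying step size
    resid = y_i - x_i @ theta
    theta = theta + gamma * resid / (1.0 + gamma * x_i @ x_i) * x_i   # implicit step
    theta_bar += (theta - theta_bar) / (i + 1)              # running average of iterates

print(round(float(np.linalg.norm(theta_bar - theta_star)), 3))
```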


Posted ContentDOI
06 Oct 2016-bioRxiv
TL;DR: Estimates of the contribution of transcript levels to orthogonal sources of variability show that scaled mRNA levels can account for most of the mean-level-variability but not necessarily for across-tissue variability, suggesting extensive post-transcriptional regulation.
Abstract: Transcriptional and post-transcriptional regulation shape tissue-type-specific proteomes, but their relative contributions remain contested. Estimates of the factors determining protein levels in human tissues do not distinguish between (i) the factors determining the variability between the abundances of different proteins, i.e., mean-level-variability and, (ii) the factors determining the physiological variability of the same protein across different tissue types, i.e., across-tissue variability. We sought to estimate the contribution of transcript levels to these two orthogonal sources of variability, and found that mRNA levels can account for most of the mean-level-variability but not for across-tissue variability. The precise quantification of the latter estimate is limited by substantial measurement noise. However, protein-to-mRNA ratios exhibit substantial across-tissue variability that is functionally concerted and reproducible across different datasets, suggesting extensive post-transcriptional regulation. These results caution against estimating protein fold-changes from mRNA fold-changes between different cell-types, and highlight the contribution of post-transcriptional regulation to shaping tissue-type-specific proteomes.

35 citations
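
The distinction between the two sources of variability can be illustrated numerically: compare tissue-averaged protein and mRNA levels across proteins (mean-level variability) versus mean-centered profiles within each protein across tissues (across-tissue variability). The data below are simulated with made-up noise levels purely to show the two computations, not to reproduce the paper's estimates.

```python
# Sketch of the two orthogonal comparisons on synthetic log-scale data:
# (i) mean-level variability  -- correlate tissue-averaged protein and mRNA
#     levels across proteins;
# (ii) across-tissue variability -- correlate, per protein, the mean-centered
#     protein and mRNA profiles across tissues.
import numpy as np

rng = np.random.default_rng(4)
n_prot, n_tissue = 2000, 12

mrna_mean = rng.normal(0, 2, n_prot)                         # mean mRNA level per gene
mrna = mrna_mean[:, None] + rng.normal(0, 0.3, (n_prot, n_tissue))
ptr = rng.normal(0, 1, n_prot)[:, None]                      # protein-to-mRNA ratio
protein = mrna + ptr + rng.normal(0, 0.5, (n_prot, n_tissue))

# (i) mean-level variability: averages across tissues, compared across proteins
r_mean = np.corrcoef(mrna.mean(1), protein.mean(1))[0, 1]

# (ii) across-tissue variability: center each gene, correlate within gene
mrna_c = mrna - mrna.mean(1, keepdims=True)
prot_c = protein - protein.mean(1, keepdims=True)
r_tissue = np.array([np.corrcoef(mrna_c[i], prot_c[i])[0, 1] for i in range(n_prot)])

print(round(r_mean, 2), round(np.median(r_tissue), 2))
```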


Journal ArticleDOI
13 Sep 2016-eLife
TL;DR: A computational approach leverages digestion variability to determine nucleosome positions at base-pair resolution from MNase-seq data and generates a variability template as a simple error model for how MNase digestion affects the mapping of individual nucleosomes.
Abstract: Plants, animals and other eukaryotes wrap their DNA around complexes of proteins called histones to form repeating units known as nucleosomes. The interaction between histones and DNA is strong, and so the DNA region inside a nucleosome has limited access to other proteins, including those that drive the expression of genes. Moving a nucleosome slightly can change the access to its DNA and significantly impact how the genes in the region are regulated. Nevertheless, determining the position of nucleosomes accurately or testing how nucleosomes are different between individual cells are challenging tasks. Most methods for identifying nucleosomes use an enzyme called micrococcal nuclease (or MNase for short) to break down the DNA that isn’t protected in nucleosomes, followed by high-throughput DNA sequencing to identify the DNA fragments that remain. However, this technique, known as MNase-seq, is limited because it only measures an average location of the nucleosomes across millions of cells. Now, Zhou, Blocker et al. have developed a new computational approach to identify nucleosome positions more accurately using MNase-seq data obtained from both yeast and human cells. This approach revealed that in more than half of the yeast genome, a given nucleosome is found at slightly different positions in different cells. Nucleosomes positioned near the beginning of a gene mark it open or closed for binding by the cell’s gene expression machinery. Zhou, Blocker et al. suggest that the nucleosomes’ positions influence how gene expression starts via a multi-step process. Following on from this work, the next step is to use the newly developed method to study how nucleosome positions change when other regulators of gene activity bind and when genes are activated or repressed.

01 Jun 2016
TL;DR: This article showed that yeast heat shock factor 1 (Hsf1) is essential even at low temperatures and showed that engineered nuclear export of Hsf1 results in cytotoxicity associated with massive protein aggregation.
Abstract: Despite its eponymous association with the heat shock response, yeast heat shock factor 1 (Hsf1) is essential even at low temperatures. Here we show that engineered nuclear export of Hsf1 results in cytotoxicity associated with massive protein aggregation. Genome-wide analysis revealed that Hsf1 nuclear export immediately decreased basal transcription and mRNA expression of 18 genes, which predominately encode chaperones. Strikingly, rescuing basal expression of Hsp70 and Hsp90 chaperones enabled robust cell growth in the complete absence of Hsf1. With the exception of chaperone gene induction, the vast majority of the heat shock response was Hsf1 independent. By comparative analysis of mammalian cell lines, we found that only heat shock-induced but not basal expression of chaperones is dependent on the mammalian Hsf1 homolog (HSF1). Our work reveals that yeast chaperone gene expression is an essential housekeeping mechanism and provides a roadmap for defining the function of HSF1 as a driver of oncogenesis.

Posted Content
TL;DR: This work explores an approach, where the joint distribution of observed data and missing data is specified through non-standard conditional distributions, and applies Tukey's conditional representation to exponential family models, and proposes a computationally tractable inferential strategy for this class of models.
Abstract: Data analyses typically rely upon assumptions about missingness mechanisms that lead to observed versus missing data. When the data are missing not at random, direct assumptions about the missingness mechanism, and indirect assumptions about the distributions of observed and missing data, are typically untestable. We explore an approach where the joint distribution of observed data and missing data is specified through non-standard conditional distributions. In this formulation, which traces back to a factorization of the joint distribution apparently proposed by J.W. Tukey, the modeling assumptions about the conditional factors are either testable or are designed to allow the incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both missing and observed. We apply Tukey's conditional representation to exponential family models, and we propose a computationally tractable inferential strategy for this class of models. We illustrate the utility of this approach using high-throughput biological data with missing data that are not missing at random.
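
Tukey's factorization implies p(y | r=0) ∝ p(y | r=1) · p(r=0 | y) / p(r=1 | y), so quantities under the missing-data law can be computed by reweighting observed values with the selection odds. The sketch below checks this identity numerically under an assumed logistic missingness mechanism; the mechanism and all numbers are illustrative, not part of the paper's application.

```python
# Numerical sketch of Tukey's factorization: reweight observed values by the
# selection odds p(r=0|y)/p(r=1|y) to recover the missing-data distribution.
# The logistic selection model is an assumed, illustrative choice.
import numpy as np

rng = np.random.default_rng(5)

# ground truth for the illustration: Y ~ N(0, 1), missingness depends on y
y_all = rng.standard_normal(200_000)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * y_all)))           # p(r=1 | y)
observed = rng.random(y_all.size) < p_obs
y_obs = y_all[observed]

# reweight observed data by the odds of being missing given y
odds_missing = np.exp(-(0.5 + 1.0 * y_obs))                  # (1 - p) / p for the logistic
w = odds_missing / odds_missing.sum()
mean_missing_tukey = np.sum(w * y_obs)

# compare with the directly computed mean of the actually-missing values
print(round(mean_missing_tukey, 3), round(y_all[~observed].mean(), 3))
```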

Book ChapterDOI
18 Feb 2016

Journal ArticleDOI
TL;DR: An actor-oriented continuous-time model is adopted and enhanced to jointly estimate the co-evolution of the users' social network structure and their content production behavior using a Markov Chain Monte Carlo (MCMC)-based simulation approach and provides researchers and practitioners a statistically rigorous approach to analyze network effects in observational data.
Abstract: With the rapid growth of online social network sites (SNS), it has become imperative for platform owners and online marketers to investigate what drives content production on these platforms. However, previous research has found it difficult to statistically model these factors from observational data due to the inability to explicitly specify and discern the effects of network formation from network influence. In this paper, we adopt and enhance an actor-oriented continuous-time model to jointly estimate the co-evolution of the users' social network structure and their content production behavior using a Markov Chain Monte Carlo (MCMC)-based simulation approach. Specifically, we offer a method to analyze non-stationary and continuous behavior with network effects, similar to what is observed in social media ecosystems. Leveraging a unique dataset contributed by Facebook, we apply our model to data on university students across six months and uncover novel insights: 1) users tend to connect with others that have similar posting behavior, 2) however, after doing so, users tend to diverge in posting behavior, and 3) friend selections are sensitive to the strength of the posting behavior. Our method provides researchers and practitioners a statistically rigorous approach to analyze network effects in observational data. By suitably applying this method, researchers can generate insights and recommendations for SNS platforms to sustain an active and viable online community.
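
The actor-oriented model class can be sketched as a sequence of micro-steps in which a randomly chosen user revises either one tie or its posting level, with choice probabilities proportional to the exponential of an objective function. The simulation below uses only density, reciprocity, and behavior-similarity effects with made-up weights; the paper's contribution is estimating such effects from data with an MCMC-based approach, which this sketch does not attempt.

```python
# Stripped-down simulation of actor-oriented network/behavior co-evolution:
# at each micro-step an actor revises one tie or its behavior, choosing among
# options with probabilities proportional to exp(objective). Weights and the
# objective are illustrative only.
import numpy as np

rng = np.random.default_rng(6)
n = 30
net = (rng.random((n, n)) < 0.1).astype(int); np.fill_diagonal(net, 0)
beh = rng.integers(0, 4, n)                                  # posting-intensity levels

W_DENS, W_RECIP, W_SIM = -1.5, 1.0, 1.0                      # made-up effect weights

def objective(net, beh, i):
    sim = 1.0 - np.abs(beh[i] - beh) / 3.0                   # behavior similarity
    return (W_DENS * net[i].sum()
            + W_RECIP * (net[i] * net[:, i]).sum()
            + W_SIM * (net[i] * sim).sum())

def micro_step(net, beh):
    i = rng.integers(n)
    if rng.random() < 0.5:                                   # network mini-step
        scores = []
        for j in range(n):                                   # option: toggle tie i -> j
            if j == i:                                       # option: keep things as they are
                scores.append(objective(net, beh, i)); continue
            net[i, j] ^= 1
            scores.append(objective(net, beh, i))
            net[i, j] ^= 1
        scores = np.array(scores)
        p = np.exp(scores - scores.max()); p /= p.sum()
        j = rng.choice(n, p=p)
        if j != i:
            net[i, j] ^= 1
    else:                                                    # behavior mini-step
        options = [b for b in (beh[i] - 1, beh[i], beh[i] + 1) if 0 <= b <= 3]
        scores = []
        for b in options:
            old = beh[i]; beh[i] = b
            scores.append(objective(net, beh, i))
            beh[i] = old
        scores = np.array(scores)
        p = np.exp(scores - scores.max()); p /= p.sum()
        beh[i] = options[rng.choice(len(options), p=p)]

for _ in range(2000):
    micro_step(net, beh)
print(net.sum(), np.bincount(beh, minlength=4))
```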

Journal ArticleDOI
TL;DR: A modeling approach based on nonparametric templates to control for the variability along the sequence of read counts associated with nucleosomal DNA due to enzymatic digestion and other sample preparation steps is developed, and a calibrated Bayesian method to detect local concentrations of nucleosome positions is developed.
Abstract: We consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates to control for the variability along the sequence of read counts associated with nucleosomal DNA due to enzymatic digestion and other sample preparation steps, and we develop a calibrated Bayesian method to detect local concentrations of nucleosome positions. We also introduce a set of estimands that provides rich, interpretable summaries of nucleosome positioning. Inference is carried out via a distributed Hamiltonian Monte Carlo algorithm that can scale linearly with the length of the genome being analyzed. We provide MPI-based Python implementations of the proposed methods, stand-alone and on Amazon EC2, which can provide inferences on an entire Saccharomyces cerevisiae genome in less than 1 hr on EC2. We evaluate the accuracy and reproducibility of the inferences leveraging a factorially designed simulation s...
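
The template idea can be conveyed with a toy example: summarize how digestion spreads fragment centers around a nucleosome with an empirical profile, then score candidate positions by cross-correlating observed center counts with that profile. This is only the intuition; the spacing, read depth, and peak-calling rule below are invented, and the paper's actual method is a calibrated Bayesian model fit with a distributed Hamiltonian Monte Carlo sampler.

```python
# Toy illustration of the template intuition: an empirical profile of
# fragment-center offsets is used to smooth and score candidate positions.
import numpy as np

rng = np.random.default_rng(7)

L = 5000
true_pos = np.arange(200, L - 200, 165)                      # ~165 bp spacing
centers = np.concatenate([p + rng.normal(0, 15, 400).astype(int) for p in true_pos])
counts = np.bincount(np.clip(centers, 0, L - 1), minlength=L).astype(float)

# empirical template: average profile of counts around a few known positions
half = 73                                                    # ~ half a nucleosome footprint
template = np.mean([counts[p - half:p + half + 1] for p in true_pos[:5]], axis=0)
template /= template.sum()

# score each candidate position by cross-correlation with the template,
# then call peaks as local maxima above a threshold
score = np.correlate(counts, template, mode="same")
thresh = 0.5 * score.max()
peaks = [i for i in range(1, L - 1)
         if score[i] > thresh and score[i] >= score[i - 1] and score[i] > score[i + 1]]

print(peaks[:5], list(true_pos[:5]))                         # called vs true positions
```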

Journal ArticleDOI
TL;DR: In this article, the relationship between finitely exchangeable arrays and sequences was demonstrated, and sharp bounds on the total variation distance between distributions of finitely and infinitely exchangeable arrays were derived.

Posted Content
TL;DR: Formal assumptions and inference methodology are introduced that allow us to explicitly bound the last plausible time of treatment in observational studies with unknown times of treatment, and ultimately yield valid causal estimates in such situations.
Abstract: Time plays a fundamental role in causal analyses, where the goal is to quantify the effect of a specific treatment on future outcomes. In a randomized experiment, times of treatment, and when outcomes are observed, are typically well defined. In an observational study, treatment time marks the point from which pre-treatment variables must be regarded as outcomes, and it is often straightforward to establish. Motivated by a natural experiment in online marketing, we consider a situation where useful conceptualizations of the experiment behind an observational study of interest lead to uncertainty in the determination of times at which individual treatments take place. Of interest is the causal effect of heavy snowfall in several parts of the country on daily measures of online searches for batteries, and then purchases. The data available give information on actual snowfall, whereas the natural treatment is the anticipation of heavy snowfall, which is not observed. In this article, we introduce formal assumptions and inference methodology centered around a novel notion of plausible time of treatment. These methods allow us to explicitly bound the last plausible time of treatment in observational studies with unknown times of treatment, and ultimately yield valid causal estimates in such situations.
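
The bounding idea can be illustrated numerically: when the treatment time is only known to lie in a window, compute the effect estimate under each candidate time and report the range. The synthetic daily series and the simple pre/post-mean estimator below are stand-ins for the paper's formal assumptions and estimands.

```python
# Simplified illustration of bounding an effect over plausible treatment times.
import numpy as np

rng = np.random.default_rng(8)

days = np.arange(60)
true_t = 30                                                  # unobserved anticipation day
searches = 100 + 20 * (days >= true_t) + rng.normal(0, 5, days.size)

def effect_at(t, window=7):
    pre = searches[(days >= t - window) & (days < t)]
    post = searches[(days >= t) & (days < t + window)]
    return post.mean() - pre.mean()

candidate_times = np.arange(27, 34)                          # plausible-time window
estimates = [effect_at(t) for t in candidate_times]
print(round(min(estimates), 1), round(max(estimates), 1))    # bounds on the effect
```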