
Showing papers in "PLOS ONE in 2015"


Journal Article
10 Jul 2015-PLOS ONE
TL;DR: This work proposes a general solution to the problem of understanding classification decisions: a pixel-wise decomposition of nonlinear classifiers that makes it possible to visualize the contributions of single pixels to predictions, both for kernel-based classifiers over Bag of Words features and for multilayered neural networks.
Abstract: Understanding and interpreting classification decisions of automated image classification systems is of high value in many applications, as it allows the reasoning of the system to be verified and provides additional information to the human expert. Although machine learning methods solve a plethora of tasks very successfully, they have in most cases the disadvantage of acting as a black box, not providing any information about what made them arrive at a particular decision. This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers. We introduce a methodology that makes it possible to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks. These pixel contributions can be visualized as heatmaps and are provided to a human expert, who can intuitively not only verify the validity of the classification decision, but also focus further analysis on regions of potential interest. We evaluate our method for classifiers trained on PASCAL VOC 2009 images, synthetic image data containing geometric shapes, the MNIST handwritten digits data set and for the pre-trained ImageNet model available as part of the Caffe open source package.

3,330 citations
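
The pixel-wise decomposition idea can be illustrated on a toy network. The sketch below is a minimal example under stated assumptions, not the authors' implementation: it uses a tiny bias-free ReLU network with made-up weights, and a proportional redistribution rule in which each unit hands its relevance back to its inputs in proportion to their contribution to its pre-activation. The names `forward` and `relevance` and the stabiliser `eps` are hypothetical.

```python
def forward(x, W1, w2):
    """Tiny bias-free network: h = relu(W1 x), y = w2 . h."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    h = [max(0.0, zj) for zj in z]
    y = sum(wj * hj for wj, hj in zip(w2, h))
    return z, h, y


def relevance(x, W1, w2, eps=1e-9):
    """Redistribute the output score y onto the inputs ("pixels").

    Each hidden unit j receives R_j = w2[j] * h[j], its share of y.
    Each input i then receives sum_j (W1[j][i] * x[i] / z[j]) * R_j,
    so the input relevances add up to y (up to the stabiliser eps).
    """
    z, h, y = forward(x, W1, w2)
    r_hidden = [wj * hj for wj, hj in zip(w2, h)]
    r_input = [0.0] * len(x)
    for j, row in enumerate(W1):
        if h[j] == 0.0:
            continue  # inactive unit carries no relevance
        denom = z[j] + eps  # z[j] > 0 here because the unit is active
        for i, (w, xi) in enumerate(zip(row, x)):
            r_input[i] += (w * xi / denom) * r_hidden[j]
    return y, r_input


# Made-up input and weights, for illustration only.
x = [1.0, 2.0, 0.5]
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-1.0, 0.1, 0.2]]
w2 = [1.0, -0.5, 2.0]
score, heat = relevance(x, W1, w2)
```

Plotting `heat` over the input grid would give a heatmap of the kind the abstract describes; the defining property is that the per-input relevances sum to the classifier score.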


Journal Article
Daniel D Murray1, Kazuo Suzuki1, Matthew Law1, Jonel Trebicka2, +1486 more; Institutions (9)
14 Oct 2015-PLOS ONE
TL;DR: No associations with mortality were found with any circulating miRNAs studied and these results cast doubt onto the effectiveness of circulating miRNA as early predictors of mortality or the major underlying diseases that contribute to mortality in participants treated for HIV-1 infection.
Abstract: Introduction: The use of anti-retroviral therapy (ART) has dramatically reduced HIV-1 associated morbidity and mortality. However, HIV-1 infected individuals have increased rates of morbidity and mortality compared to the non-HIV-1 infected population and this appears to be related to end-organ diseases collectively referred to as Serious Non-AIDS Events (SNAEs). Circulating miRNAs are reported as promising biomarkers for a number of human disease conditions including those that constitute SNAEs. Our study sought to investigate the potential of selected miRNAs in predicting mortality in HIV-1 infected ART treated individuals. Materials and Methods: A set of miRNAs was chosen based on published associations with human disease conditions that constitute SNAEs. This case-control study compared 126 cases (individuals who died whilst on therapy) and 247 matched controls (individuals who remained alive). Cases and controls were ART treated participants of two pivotal HIV-1 trials. The relative abundance of each miRNA in serum was measured by RT-qPCR. Associations with mortality (all-cause, cardiovascular and malignancy) were assessed by logistic regression analysis. Correlations between miRNAs and CD4+ T cell count, hs-CRP, IL-6 and D-dimer were also assessed. Results: None of the selected miRNAs was associated with all-cause, cardiovascular or malignancy mortality. The levels of three miRNAs (miRs -21, -122 and -200a) correlated with IL-6 while miR-21 also correlated with D-dimer. Additionally, the abundance of miRs -31, -150 and -223 correlated with baseline CD4+ T cell count while the same three miRNAs plus miR-145 correlated with nadir CD4+ T cell count. Discussion: No associations with mortality were found with any circulating miRNA studied. These results cast doubt onto the effectiveness of circulating miRNA as early predictors of mortality or the major underlying diseases that contribute to mortality in participants treated for HIV-1 infection.

3,094 citations


Journal Article
04 Mar 2015-PLOS ONE
TL;DR: It is shown that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity.
Abstract: Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.

2,451 citations
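
The point about specificity can be made concrete with a little arithmetic. The following is a minimal sketch with made-up class sizes and a hypothetical helper `confusion_rates`: the same sensitivity and specificity give an identical ROC point on a balanced and on an imbalanced dataset, while the precision (PPV) differs sharply.

```python
def confusion_rates(n_pos, n_neg, sensitivity, specificity):
    """Derive the ROC point and the precision from class sizes and rates."""
    tp = sensitivity * n_pos      # true positives
    fp = n_neg - specificity * n_neg  # false positives
    fpr = fp / n_neg              # x-axis of the ROC plot
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return {"tpr": sensitivity, "fpr": fpr, "precision": precision}


# Hypothetical datasets: same classifier quality, different class balance.
balanced = confusion_rates(n_pos=1000, n_neg=1000, sensitivity=0.9, specificity=0.9)
imbalanced = confusion_rates(n_pos=1000, n_neg=100000, sensitivity=0.9, specificity=0.9)

print(balanced)
print(imbalanced)
```

In the imbalanced case the 10% false positive rate applies to 100,000 negatives, so about 10,000 false positives swamp the 900 true positives and the precision falls to roughly 8%, even though the ROC point has not moved.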


Journal Article
09 Jul 2015-PLOS ONE
TL;DR: The substrate and nutritional heterogeneity introduced by authigenic seep carbonates act to promote diverse, uniquely adapted assemblages, even after seepage ceases, demonstrating the significant role of carbonate rocks in promoting diversity.
Abstract: Carbonate communities: The activity of anaerobic methane oxidizing microbes facilitates precipitation of vast quantities of authigenic carbonate at methane seeps. Here we demonstrate the significant role of carbonate rocks in promoting diversity by providing unique habitat and food resources for macrofaunal assemblages at seeps on the Costa Rica margin (400–1850 m). The attendant fauna is surprisingly similar to that in rocky intertidal shores, with numerous grazing gastropods (limpets and snails) as dominant taxa. However, the community feeds upon seep-associated microbes. Macrofaunal density, composition, and diversity on carbonates vary as a function of seepage activity, biogenic habitat and location. The macrofaunal community of carbonates at non-seeping (inactive) sites is strongly related to the hydrography (depth, temperature, O2) of overlying water, whereas the fauna at sites of active seepage is not. Densities are highest on active rocks from tubeworm bushes and mussel beds, particularly at the Mound 12 location (1000 m). Species diversity is higher on rocks exposed to active seepage, with multiple species of gastropods and polychaetes dominant, while crustaceans, cnidarians, and ophiuroids were better represented on rocks at inactive sites. Macro-infauna (larger than 0.3 mm) from tube cores taken in nearby seep sediments at comparable depths exhibited densities similar to those on carbonate rocks, but had lower diversity and different taxonomic composition. Seep sediments had higher densities of ampharetid, dorvilleid, hesionid, cirratulid and lacydoniid polychaetes, whereas carbonates had more gastropods, as well as syllid, chrysopetalid and polynoid polychaetes. Stable isotope signatures and metrics: The stable isotope signatures of carbonates were heterogeneous, as were the food sources and nutrition used by the animals. 
Carbonate δ13Cinorg values (mean = -26.98‰) ranged from -53.3‰ to +10.0‰, and were significantly heavier than carbonate δ13Corg (mean = -33.83‰), which ranged from -74.4‰ to -20.6‰. Invertebrates on carbonates had average δ13C (per rock) = -31.0‰ (range -18.5‰ to -46.5‰) and δ15N = 5.7‰ (range -4.5‰ to +13.4‰). Average δ13C values did not differ between active and inactive sites; carbonate fauna from both settings depend on chemosynthesis-based nutrition. Community metrics reflecting trophic diversity (SEAc, total Hull Area, ranges of δ13C and δ15N) and species packing (mean distance to centroid, nearest neighbor distance) also did not vary as a function of seepage activity or site. However, distinct isotopic signatures were observed among related, co-occurring species of gastropods and polychaetes, reflecting intense microbial resource partitioning. Overall, the substrate and nutritional heterogeneity introduced by authigenic seep carbonates act to promote diverse, uniquely adapted assemblages, even after seepage ceases. The macrofauna in these ecosystems remain largely overlooked in most surveys, but are major contributors to biodiversity of chemosynthetic ecosystems and the deep sea in general.

1,685 citations


Journal Article
11 Mar 2015-PLOS ONE
TL;DR: This work combines spatially explicit estimates of the baseline population with demographic data to derive scenario-driven projections of coastal population development. The results highlight countries and regions with a high degree of exposure to coastal flooding and help identify regions where policies and adaptive planning for building resilient coastal communities are not only desirable but essential.
Abstract: Coastal zones are exposed to a range of coastal hazards including sea-level rise with its related effects. At the same time, they are more densely populated than the hinterland and exhibit higher rates of population growth and urbanisation. As this trend is expected to continue into the future, we investigate how coastal populations will be affected by such impacts at global and regional scales by the years 2030 and 2060. Starting from baseline population estimates for the year 2000, we assess future population change in the low-elevation coastal zone and trends in exposure to 100-year coastal floods based on four different sea-level and socio-economic scenarios. Our method accounts for differential growth of coastal areas against the land-locked hinterland and for trends of urbanisation and expansive urban growth, as currently observed, but does not explicitly consider possible displacement or out-migration due to factors such as sea-level rise. We combine spatially explicit estimates of the baseline population with demographic data in order to derive scenario-driven projections of coastal population development. Our scenarios show that the number of people living in the low-elevation coastal zone, as well as the number of people exposed to flooding from 1-in-100 year storm surge events, is highest in Asia. China, India, Bangladesh, Indonesia and Viet Nam are estimated to have the highest total coastal population exposure in the baseline year and this ranking is expected to remain largely unchanged in the future. However, Africa is expected to experience the highest rates of population growth and urbanisation in the coastal zone, particularly in Egypt and sub-Saharan countries in Western and Eastern Africa. The results highlight countries and regions with a high degree of exposure to coastal flooding and help identifying regions where policies and adaptive planning for building resilient coastal communities are not only desirable but essential. 
Furthermore, we identify needs for further research and scope for improvement in this kind of scenario-based exposure analysis.

1,604 citations


Journal Article
23 Sep 2015-PLOS ONE
TL;DR: Racism was associated with poorer mental health, including depression, anxiety, psychological stress and various other outcomes. The association between racism and negative mental health was significantly stronger for Asian American and Latino(a) American participants than for African American participants, and the association between racism and physical health was significantly stronger for Latino(a) American participants than for African American participants.
Abstract: Despite a growing body of epidemiological evidence in recent years documenting the health impacts of racism, the cumulative evidence base has yet to be synthesized in a comprehensive meta-analysis focused specifically on racism as a determinant of health. This meta-analysis reviewed the literature focusing on the relationship between reported racism and mental and physical health outcomes. Data from 293 studies reported in 333 articles published between 1983 and 2013, and conducted predominately in the U.S., were analysed using random effects models and mean weighted effect sizes. Racism was associated with poorer mental health (negative mental health: r = -.23, 95% CI [-.24,-.21], k = 227; positive mental health: r = -.13, 95% CI [-.16,-.10], k = 113), including depression, anxiety, psychological stress and various other outcomes. Racism was also associated with poorer general health (r = -.13, 95% CI [-.18,-.09], k = 30), and poorer physical health (r = -.09, 95% CI [-.12,-.06], k = 50). Moderation effects were found for some outcomes with regard to study and exposure characteristics. Effect sizes of racism on mental health were stronger in cross-sectional compared with longitudinal data and in non-representative samples compared with representative samples. Age, sex, birthplace and education level did not moderate the effects of racism on health. Ethnicity significantly moderated the effect of racism on negative mental health and physical health: the association between racism and negative mental health was significantly stronger for Asian American and Latino(a) American participants compared with African American participants, and the association between racism and physical health was significantly stronger for Latino(a) American participants compared with African American participants. Protocol PROSPERO registration number: CRD42013005464.

1,412 citations


Journal Article
Jennifer E. Huffman1, Eva Albrecht, Alexander Teumer2, Massimo Mangino3, Karen Kapur, Toby Johnson4, Z. Kutalik, Nicola Pirastu5, Giorgio Pistis6, Lorna M. Lopez1, Toomas Haller7, Perttu Salo8, Anuj Goel9, Man Li10, Toshiko Tanaka8, Abbas Dehghan11, Daniela Ruggiero, Giovanni Malerba12, Albert V. Smith13, Ilja M. Nolte, Laura Portas, Amanda Phipps-Green14, Lora Boteva1, Pau Navarro1, Åsa Johansson15, Andrew A. Hicks16, Ozren Polasek17, Tõnu Esko18, John F. Peden9, Sarah E. Harris1, Federico Murgia, Sarah H. Wild1, Albert Tenesa1, Adrienne Tin10, Evelin Mihailov7, Anne Grotevendt2, Gauti Kjartan Gislason, Josef Coresh10, Pio D'Adamo5, Sheila Ulivi, Peter Vollenweider19, Gérard Waeber19, Susan Campbell1, Ivana Kolcic17, Krista Fisher7, Margus Viigimaa, Jeffrey Metter8, Corrado Masciullo6, Elisabetta Trabetti12, Cristina Bombieri12, Rossella Sorice, Angela Doering, Eva Reischl, Konstantin Strauch20, Albert Hofman11, André G. Uitterlinden11, Melanie Waldenberger, H-Erich Wichmann20, Gail Davies1, Alan J. Gow1, Nicola Dalbeth21, Lisa K. Stamp14, Johannes H. Smit22, Mirna Kirin1, Ramaiah Nagaraja8, Matthias Nauck2, Claudia Schurmann2, Kathrin Budde2, Susan M. Farrington1, Evropi Theodoratou1, Antti Jula8, Veikko Salomaa8, Cinzia Sala6, Christian Hengstenberg23, Michel Burnier19, R Maegi7, Norman Klopp20, Stefan Kloiber24, Sabine Schipf25, Samuli Ripatti26, Stefano Cabras27, Nicole Soranzo28, Georg Homuth2, Teresa Nutile, Patricia B. Munroe4, Nicholas D. Hastie1, Harry Campbell1, Igor Rudan1, Claudia P. Cabrera29, Chris Haley1, Oscar H. Franco11, Tony R. Merriman14, Vilmundur Gudnason13, Mario Pirastu, Brenda W.J.H. Penninx30, Brenda W.J.H. Penninx11, Harold Snieder, Andres Metspalu7, Marina Ciullo, Peter P. Pramstaller16, Cornelia M. van Duijn11, Luigi Ferrucci8, Giovanni Gambaro31, Ian J. Deary1, Malcolm G. Dunlop1, James F. Wilson1, Paolo Gasparini5, Ulf Gyllensten15, Tim D. Spector3, Alan F. Wright1, Caroline Hayward1, Hugh Watkins9, Markus Perola8, Murielle Bochud32, W. H. Linda Kao10, Mark J. Caulfield4, Daniela Toniolo6, Henry Voelzke25, Christian Gieger, Anna Koettgen33, Veronique Vitart1
26 Mar 2015-PLOS ONE
TL;DR: Genome-wide tests for interactions between body mass index (BMI) and common genetic variants affecting serum urate levels highlighted a SNP at RBFOX3 in regression-type analyses of the non-BMI-stratified overall sample, and pathway analysis suggested a role for N-glycan biosynthesis as a prominent urate-associated pathway in the lean stratum.
Abstract: We tested for interactions between body mass index (BMI) and common genetic variants affecting serum urate levels, genome-wide, in up to 42569 participants. Both stratified genome-wide association (GWAS) analyses, in lean, overweight and obese individuals, and regression-type analyses in a non-BMI-stratified overall sample were performed. The former did not uncover any novel locus with a major main effect, but supported modulation of effects for some known and potentially new urate loci. The latter highlighted a SNP at RBFOX3 reaching genome-wide significant level (effect size 0.014, 95% CI 0.008-0.02, P_inter = 2.6 × 10^-8). Two top loci in interaction term analyses, RBFOX3 and ERO1LB-EDARADD, also displayed suggestive differences in main effect size between the lean and obese strata. All top ranking loci for urate effect differences between BMI categories were novel and most had small magnitude but opposite direction effects between strata. They include the locus RBMS1-TANK (men, P_diff(lean-overweight) = 4.7 × 10^-8), a region that has been associated with several obesity related traits, and TSPYL5 (men, P_diff(lean-overweight) = 9.1 × 10^-8), regulating adipocyte-produced estradiol. The top-ranking known urate locus was ABCG2, the strongest known gout risk locus, with an effect halved in obese compared to lean men (P_diff(lean-obese) = 2 × 10^-4). Finally, pathway analysis suggested a role for N-glycan biosynthesis as a prominent urate-associated pathway in the lean stratum. These results illustrate a potentially powerful way to monitor changes occurring in obesogenic environment.

1,293 citations


Journal Article
02 Apr 2015-PLOS ONE
TL;DR: The cocor package covers a broad range of tests including the comparisons of independent and dependent correlations with either overlapping or nonoverlapping variables, and includes an implementation of Zou’s confidence interval for all of these comparisons.
Abstract: A valid comparison of the magnitude of two correlations requires researchers to directly contrast the correlations using an appropriate statistical test. In many popular statistics packages, however, tests for the significance of the difference between correlations are missing. To close this gap, we introduce cocor, a free software package for the R programming language. The cocor package covers a broad range of tests including the comparisons of independent and dependent correlations with either overlapping or nonoverlapping variables. The package also includes an implementation of Zou’s confidence interval for all of these comparisons. The platform independent cocor package enhances the R statistical computing environment and is available for scripting. Two different graphical user interfaces—a plugin for RKWard and a web interface—make cocor a convenient and user-friendly tool.

1,292 citations
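
As an illustration of what a test for the difference between two correlations looks like, here is a minimal Python sketch of the classic Fisher z test for two correlations from independent samples. It is not the cocor API (cocor is an R package); the function name `fisher_z_independent` and the example values are hypothetical, and dependent or overlapping correlations need other tests that this sketch does not cover.

```python
import math


def fisher_z_independent(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p-value.
    """
    # Fisher's r-to-z transformation (inverse hyperbolic tangent).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference of the transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Made-up example: r = .50 in one group of 100, r = .20 in another group of 100.
z_stat, p_value = fisher_z_independent(0.50, 100, 0.20, 100)
print(round(z_stat, 3), round(p_value, 4))
```

Comparing the two correlations directly in this way, rather than checking whether each one is individually significant, is the kind of contrast the abstract calls for.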


Journal Article
08 Dec 2015-PLOS ONE
TL;DR: Estimates of the global prevalence and incidence of chlamydia, gonorrhoea, trichomoniasis, and syphilis in adult women and men remain high, with nearly one million new infections with curable STI each day.
Abstract: Background: Quantifying sexually transmitted infection (STI) prevalence and incidence is important for planning interventions and advocating for resources. The World Health Organization (WHO) periodically estimates global and regional prevalence and incidence of four curable STIs: chlamydia, gonorrhoea, trichomoniasis and syphilis. Methods and Findings: WHO's 2012 estimates were based upon literature reviews of prevalence data from 2005 through 2012 among general populations for genitourinary infection with chlamydia, gonorrhoea, and trichomoniasis, and nationally reported data on syphilis seroprevalence among antenatal care attendees. Data were standardized for laboratory test type, geography, age, and high risk subpopulations, and combined using a Bayesian meta-analytic approach. Regional incidence estimates were generated from prevalence estimates by adjusting for average duration of infection. In 2012, among women aged 15-49 years, the estimated global prevalence of chlamydia was 4.2% (95% uncertainty interval (UI): 3.7-4.7%), gonorrhoea 0.8% (0.6-1.0%), trichomoniasis 5.0% (4.0-6.4%), and syphilis 0.5% (0.4-0.6%); among men, estimated chlamydia prevalence was 2.7% (2.0-3.6%), gonorrhoea 0.6% (0.4-0.9%), trichomoniasis 0.6% (0.4-0.8%), and syphilis 0.48% (0.3-0.7%). These figures correspond to an estimated 131 million new cases of chlamydia (100-166 million), 78 million of gonorrhoea (53-110 million), 143 million of trichomoniasis (98-202 million), and 6 million of syphilis (4-8 million). Prevalence and incidence estimates varied by region and sex. Conclusions: Estimates of the global prevalence and incidence of chlamydia, gonorrhoea, trichomoniasis, and syphilis in adult women and men remain high, with nearly one million new infections with curable STI each day. The estimates highlight the urgent need for the public health community to ensure that well-recognized effective interventions for STI prevention, screening, diagnosis, and treatment are made more widely available. Improved estimation methods are needed to allow use of more varied data and generation of estimates at the national level.

1,235 citations
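
The step from prevalence to incidence via duration of infection can be sketched with the textbook steady-state relation, in which the number of new cases per year is roughly the number of prevalent cases divided by the average duration of infection in years. This is an assumption made for illustration, not WHO's actual procedure, and the function name and all input values below are hypothetical.

```python
def annual_new_cases(prevalence, population, duration_years):
    """Steady-state approximation: new cases/year = prevalent cases / duration."""
    if duration_years <= 0:
        raise ValueError("duration_years must be positive")
    prevalent_cases = prevalence * population
    return prevalent_cases / duration_years


# Hypothetical region: 4% prevalence, 1 million adults, infections lasting
# half a year on average.
cases = annual_new_cases(prevalence=0.04, population=1_000_000, duration_years=0.5)
print(cases)
```

The shorter the assumed duration, the more new infections are needed to sustain a given prevalence, which is why the duration assumption matters so much for incidence estimates of this kind.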


Journal Article
Ganna Chornokur, Hui-Yi Lin, Jonathan Tyrer1, Kate Lawrenson2, +155 more; Institutions (51)
19 Jun 2015-PLOS ONE
TL;DR: In a large cohort of women, associations were found between inherited cellular transport gene variants and risk of EOC histologic subtypes.
Abstract: BACKGROUND: Defective cellular transport processes can lead to aberrant accumulation of trace elements, iron, small molecules and hormones in the cell, which in turn may promote the formation of reactive oxygen species, promoting DNA damage and aberrant expression of key regulatory cancer genes. As DNA damage and uncontrolled proliferation are hallmarks of cancer, including epithelial ovarian cancer (EOC), we hypothesized that inherited variation in the cellular transport genes contributes to EOC risk. METHODS: In total, DNA samples were obtained from 14,525 case subjects with invasive EOC and from 23,447 controls from 43 sites in the Ovarian Cancer Association Consortium (OCAC). Two hundred seventy nine SNPs, representing 131 genes, were genotyped using an Illumina Infinium iSelect BeadChip as part of the Collaborative Oncological Gene-environment Study (COGS). SNP analyses were conducted using unconditional logistic regression under a log-additive model, and the FDR q<0.2 was applied to adjust for multiple comparisons. RESULTS: The most significant evidence of an association for all invasive cancers combined and for the serous subtype was observed for SNP rs17216603 in the iron transporter gene HEPH (invasive: OR = 0.85, P = 0.00026; serous: OR = 0.81, P = 0.00020); this SNP was also associated with the borderline/low malignant potential (LMP) tumors (P = 0.021). Other genes significantly associated with EOC histological subtypes (p<0.05) included the UGT1A (endometrioid), SLC25A45 (mucinous), SLC39A11 (low malignant potential), and SERPINA7 (clear cell carcinoma). In addition, 1785 SNPs in six genes (HEPH, MGST1, SERPINA, SLC25A45, SLC39A11 and UGT1A) were imputed from the 1000 Genomes Project and examined for association with INV EOC in white-European subjects. The most significant imputed SNP was rs117729793 in SLC39A11 (per allele, OR = 2.55, 95% CI = 1.5-4.35, p = 5.66x10-4). 
CONCLUSION: These results, generated on a large cohort of women, revealed associations between inherited cellular transport gene variants and risk of EOC histologic subtypes.

1,100 citations


Journal Article
06 Feb 2015-PLOS ONE
TL;DR: Mental, neurological and substance use disorders contribute to a significant proportion of disease burden and health systems can respond by implementing established, cost effective interventions, or by supporting the research necessary to develop better prevention and treatment options.
Abstract: Background The Global Burden of Disease Study 2010 (GBD 2010), estimated that a substantial proportion of the world’s disease burden came from mental, neurological and substance use disorders. In this paper, we used GBD 2010 data to investigate time, year, region and age specific trends in burden due to mental, neurological and substance use disorders.

Journal Article
17 Sep 2015-PLOS ONE
TL;DR: It is concluded that whilst Google Scholar can find much grey literature and specific, known studies, it should not be used alone for systematic review searches; rather, it forms a powerful addition to other traditional search methods.
Abstract: Google Scholar (GS), a commonly used web-based academic search engine, catalogues between 2 and 100 million records of both academic and grey literature (articles not formally published by commercial academic publishers). Google Scholar collates results from across the internet and is free to use. As a result it has received considerable attention as a method for searching for literature, particularly in searches for grey literature, as required by systematic reviews. The reliance on GS as a standalone resource has been greatly debated, however, and its efficacy in grey literature searching has not yet been investigated. Using systematic review case studies from environmental science, we investigated the utility of GS in systematic reviews and in searches for grey literature. Our findings show that GS results contain moderate amounts of grey literature, with the majority found on average at page 80. We also found that, when searched for specifically, the majority of literature identified using Web of Science was also found using GS. However, our findings showed moderate/poor overlap in results when similar search strings were used in Web of Science and GS (10–67%), and that GS missed some important literature in five of six case studies. Furthermore, a general GS search failed to find any grey literature from a case study that involved manual searching of organisations’ websites. If used in systematic reviews for grey literature, we recommend that searches of article titles focus on the first 200 to 300 results. We conclude that whilst Google Scholar can find much grey literature and specific, known studies, it should not be used alone for systematic review searches. Rather, it forms a powerful addition to other traditional search methods. In addition, we advocate the use of tools to transparently document and catalogue GS search results to maintain high levels of transparency and the ability to be updated, critical to systematic reviews.

Journal Article
24 Apr 2015-PLOS ONE
TL;DR: CCTop provides the bench biologist with a tool for the rapid and efficient identification of high quality target sites and was experimentally validated for gene inactivation, non-homologous end-joining as well as homology directed repair.
Abstract: Engineering of the CRISPR/Cas9 system has opened a plethora of new opportunities for site-directed mutagenesis and targeted genome modification. Fundamental to this is a stretch of twenty nucleotides at the 5’ end of a guide RNA that provides specificity to the bound Cas9 endonuclease. Since a sequence of twenty nucleotides can occur multiple times in a given genome and some mismatches seem to be accepted by the CRISPR/Cas9 complex, an efficient and reliable in silico selection and evaluation of the targeting site is a key prerequisite for experimental success. Here we present the CRISPR/Cas9 target online predictor (CCTop, http://crispr.cos.uni-heidelberg.de) to overcome limitations of already available tools. CCTop provides an intuitive user interface with reasonable default parameters that can easily be tuned by the user. From a given query sequence, CCTop identifies and ranks all candidate sgRNA target sites according to their off-target quality and displays full documentation. CCTop was experimentally validated for gene inactivation, non-homologous end-joining as well as homology directed repair. Thus, CCTop provides the bench biologist with a tool for the rapid and efficient identification of high quality target sites.
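
The first step of such a tool, listing candidate target sites in a query sequence, can be sketched as follows. This is not CCTop's code: it assumes the common NGG PAM of Streptococcus pyogenes Cas9 directly 3' of a 20-nucleotide protospacer, scans the forward strand only, and does no off-target scoring or ranking. The function name `find_target_sites` is hypothetical.

```python
import re


def find_target_sites(sequence, protospacer_len=20):
    """List (start, protospacer, PAM) for every N{20}-NGG site on the forward strand."""
    seq = sequence.upper()
    # A lookahead keeps the match zero-width, so overlapping sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Made-up query sequence for illustration.
query = "ttACGTACGTACGTACGTACGTAGGcatgcatgcatgcatgcatgcaTGGaa"
for start, protospacer, pam in find_target_sites(query):
    print(start, protospacer, pam)
```

A real predictor would also scan the reverse complement and then search the genome for near-matches of each protospacer, since the abstract notes that some mismatches are tolerated by the CRISPR/Cas9 complex.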

Journal Article
23 Dec 2015-PLOS ONE
TL;DR: Overall, this work defines exclusive and common M1 and M2 signatures and provides novel and improved tools to distinguish M1 or M2 murine macrophages.
Abstract: Classically (M1) and alternatively activated (M2) macrophages exhibit distinct phenotypes and functions. It has been difficult to dissect macrophage phenotypes in vivo, where a spectrum of macrophage phenotypes exists, and also in vitro, where low or non-selective M2 marker protein expression is observed. To provide a foundation for the complexity of in vivo macrophage phenotypes, we performed a comprehensive analysis of the transcriptional signature of murine M0, M1 and M2 macrophages and identified genes common or exclusive to either subset. We validated by real-time PCR an M1-exclusive pattern of expression for CD38, G-protein coupled receptor 18 (Gpr18) and Formyl peptide receptor 2 (Fpr2) whereas Early growth response protein 2 (Egr2) and c-Myc were M2-exclusive. We further confirmed these data by flow cytometry and show that M1 and M2 macrophages can be distinguished by their relative expression of CD38 and Egr2. Egr2 labeled more M2 macrophages (~70%) than the canonical M2 macrophage marker Arginase-1, which labels 24% of M2 macrophages. Conversely, CD38 labeled most (71%) in vitro M1 macrophages. In vivo, a similar CD38+ population greatly increased after LPS exposure. Overall, this work defines exclusive and common M1 and M2 signatures and provides novel and improved tools to distinguish M1 and M2 murine macrophages.

Journal Article
17 Feb 2015-PLOS ONE
TL;DR: This work presents a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, “Random Forest” estimation technique, and outlines how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America.
Abstract: High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, “Random Forest” estimation technique. We outline the combination of widely available, remotely-sensed and geospatial data that contribute to the modeled dasymetric weights and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and increases over the accuracy and flexibility of those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America.
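
The dasymetric redistribution step described above can be sketched in a few lines: a census count for one administrative unit is spread over its grid cells in proportion to a weighting surface. In the paper the weights come from a Random Forest prediction of population density; here they are simply made-up numbers, and the function name `dasymetric_redistribute` is hypothetical.

```python
def dasymetric_redistribute(census_count, weights):
    """Split one unit's census count across its grid cells in proportion to weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [census_count * w / total for w in weights]


# Hypothetical administrative unit with four grid cells and 1,000 residents.
predicted_density = [0.5, 2.0, 6.5, 1.0]  # stand-in for model output
cell_population = dasymetric_redistribute(1000, predicted_density)
print(cell_population)
```

Because the weights are normalised within each unit, the gridded values always add back up to the census count, so the model only decides where people are placed inside the unit, not how many there are.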

Journal ArticleDOI
10 Jun 2015-PLOS ONE
TL;DR: Analysis of 45 million documents indexed in the Web of Science over the period 1973-2013 shows that in both natural and medical sciences (NMS) and social sciences and humanities, Reed-Elsevier, Wiley-Blackwell, Springer, and Taylor & Francis increased their share of the published output, especially since the advent of the digital era (mid-1990s).
Abstract: The consolidation of the scientific publishing industry has been the topic of much debate within and outside the scientific community, especially in relation to major publishers’ high profit margins. However, the share of scientific output published in the journals of these major publishers, as well as its evolution over time and across various disciplines, has not yet been analyzed. This paper provides such analysis, based on 45 million documents indexed in the Web of Science over the period 1973-2013. It shows that in both natural and medical sciences (NMS) and social sciences and humanities (SSH), Reed-Elsevier, Wiley-Blackwell, Springer, and Taylor & Francis increased their share of the published output, especially since the advent of the digital era (mid-1990s). Combined, the top five most prolific publishers account for more than 50% of all papers published in 2013. Disciplines of the social sciences have the highest level of concentration (70% of papers from the top five publishers), while the humanities have remained relatively independent (20% from top five publishers). NMS disciplines are in between, mainly because of the strength of their scientific societies, such as the ACS in chemistry or APS in physics. The paper also examines the migration of journals between small and big publishing houses and explores the effect of publisher change on citation impact. It concludes with a discussion on the economics of scholarly publishing.

Journal ArticleDOI
07 Dec 2015-PLOS ONE
TL;DR: The first emoji sentiment lexicon is provided, called the Emoji Sentiment Ranking, and a sentiment map of the 751 most frequently used emojis is drawn, which indicates that most of the emoji are positive, especially the most popular ones.
Abstract: There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by the sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.
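As a rough illustration of computing an emoji's sentiment from the tweets in which it occurs, the sketch below averages the tweet labels (-1 negative, 0 neutral, +1 positive) per emoji. The plain mean, the function name and the toy data are assumptions; the abstract does not state the exact estimator used for the Emoji Sentiment Ranking.

```python
from collections import defaultdict


def emoji_sentiment(tweets):
    """Average the tweet labels (-1, 0, +1) over every occurrence of each emoji.

    `tweets` is an iterable of (emojis_in_tweet, label) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # emoji -> [sum of labels, occurrences]
    for emojis, label in tweets:
        for emoji in emojis:
            totals[emoji][0] += label
            totals[emoji][1] += 1
    return {emoji: s / n for emoji, (s, n) in totals.items()}


HEART = "\u2764"          # heavy black heart
CRYING = "\U0001F62D"     # loudly crying face

# Hypothetical annotated tweets: (emojis found in the tweet, polarity label).
toy_tweets = [
    ([HEART], 1),
    ([HEART, CRYING], -1),
    ([CRYING], -1),
    ([HEART], 1),
]
scores = emoji_sentiment(toy_tweets)
```

Ranking the emojis by this score would give a simple lexicon in the spirit of the one described, though a real analysis would also need to handle rare emojis with few occurrences.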

Journal ArticleDOI
13 Oct 2015-PLOS ONE
TL;DR: Because time in a day is finite, time spent in sleep, SB, LIPA and MVPA is codependent; time spent in MVPA is an important target for intervention, and preventing transfer of time from LIPA to SB might lessen the negative effects of physical inactivity.
Abstract: The associations between time spent in sleep, sedentary behaviors (SB) and physical activity with health are usually studied without taking into account that time is finite during the day, so time spent in each of these behaviors is codependent. Therefore, little is known about the combined effect of time spent in sleep, SB and physical activity, that together constitute a composite whole, on obesity and cardio-metabolic health markers. Cross-sectional analysis of NHANES 2005–6 cycle on N = 1937 adults was undertaken using a compositional analysis paradigm, which accounts for this intrinsic codependence. Time spent in SB, light intensity (LIPA) and moderate to vigorous activity (MVPA) was determined from accelerometry and combined with self-reported sleep time to obtain the 24 hour time budget composition. The distribution of time spent in sleep, SB, LIPA and MVPA is significantly associated with BMI, waist circumference, triglycerides, plasma glucose, plasma insulin (all p<0.001), and systolic (p<0.001) and diastolic blood pressure (p<0.003), but not HDL or LDL. Within the composition, the strongest positive effect is found for the proportion of time spent in MVPA. Strikingly, the effects of MVPA replacing another behavior and of MVPA being displaced by another behavior are asymmetric. For example, re-allocating 10 minutes of SB to MVPA was associated with a lower waist circumference by 0.001% but if 10 minutes of MVPA is displaced by SB this was associated with a 0.84% higher waist circumference. The proportions of time spent in LIPA and SB were detrimentally associated with obesity and cardiovascular disease markers, but the association with SB was stronger. For diabetes risk markers, replacing SB with LIPA was associated with more favorable outcomes. Time spent in MVPA is an important target for intervention and preventing transfer of time from LIPA to SB might lessen the negative effects of physical inactivity.
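To make the idea of a 24 hour time budget composition concrete, here is a minimal sketch that closes a day to proportions, moves 10 minutes between two behaviors, and applies a centered log-ratio transform. The centered log-ratio is one standard compositional transform and is an assumption here; the abstract does not say which transform the study used, and the minutes below are hypothetical.

```python
import math


def closure(parts):
    """Rescale the parts so that they sum to 1 (a composition)."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


def reallocate(parts, src, dst, minutes):
    """Move `minutes` from one behavior to another; the daily total is unchanged."""
    if parts[src] < minutes:
        raise ValueError("cannot move more time than the source behavior has")
    moved = dict(parts)
    moved[src] -= minutes
    moved[dst] += minutes
    return moved


def clr(composition):
    """Centered log-ratio: log of each part minus the mean log (all parts must be > 0)."""
    logs = {name: math.log(value) for name, value in composition.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {name: value - mean_log for name, value in logs.items()}


# Hypothetical day in minutes (sums to 1440).
day = {"sleep": 480, "SB": 540, "LIPA": 390, "MVPA": 30}
more_mvpa = reallocate(day, "SB", "MVPA", 10)   # SB -> MVPA
less_mvpa = reallocate(day, "MVPA", "SB", 10)   # MVPA -> SB

# Relative change in the MVPA share for the two directions.
gain = more_mvpa["MVPA"] / day["MVPA"]   # 40 / 30
loss = less_mvpa["MVPA"] / day["MVPA"]   # 20 / 30
clr_day = clr(closure(day))
```

With these toy numbers the same 10 minutes is a large relative change for MVPA and a small one for SB, and on a log-ratio scale losing 10 minutes of MVPA is a bigger step than gaining 10. This only illustrates why the two directions need not be symmetric; it does not reproduce the study's regression estimates.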

Journal ArticleDOI
15 Apr 2015-PLOS ONE
TL;DR: It is found that the Bitcoin forms a unique asset possessing properties of both a standard financial asset and a speculative one.
Abstract: The Bitcoin has emerged as a fascinating phenomenon in the financial markets. Without any central authority issuing the currency, the Bitcoin has been associated with controversy ever since its popularity, accompanied by increased public interest, reached high levels. Here, we contribute to the discussion by examining the potential drivers of Bitcoin prices, ranging from fundamental sources to speculative and technical ones, and we further study the potential influence of the Chinese market. The evolution of relationships is examined in both time and frequency domains utilizing the continuous wavelets framework, so that we not only comment on the development of the interconnections in time but also distinguish between short-term and long-term connections. We find that the Bitcoin forms a unique asset possessing properties of both a standard financial asset and a speculative one.

Journal ArticleDOI
22 May 2015-PLOS ONE
TL;DR: A protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing by carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps is presented.
Abstract: Whole-genome sequencing has become an indispensable tool of modern biology. However, the cost of sample preparation relative to the cost of sequencing remains high, especially for small genomes where the former is dominant. Here we present a protocol for rapid and inexpensive preparation of hundreds of multiplexed genomic libraries for Illumina sequencing. By carrying out the Nextera tagmentation reaction in small volumes, replacing costly reagents with cheaper equivalents, and omitting unnecessary steps, we achieve a cost of library preparation of $8 per sample, approximately 6 times cheaper than the standard Nextera XT protocol. Furthermore, our procedure takes less than 5 hours for 96 samples. Several hundred samples can then be pooled on the same HiSeq lane via custom barcodes. Our method will be useful for re-sequencing of microbial or viral genomes, including those from evolution experiments, genetic screens, and environmental samples, as well as for other sequencing applications including large amplicon, open chromosome, artificial chromosomes, and RNA sequencing.

Journal ArticleDOI
20 Aug 2015-PLOS ONE
TL;DR: Air pollution data from over 1500 sites, including airborne particulate matter (PM), SO2, NO2, and O3, is made available, and Kriging interpolation is applied to four months of data to derive pollution maps for eastern China.
Abstract: China has recently made available hourly air pollution data from over 1500 sites, including airborne particulate matter (PM), SO2, NO2, and O3. We apply Kriging interpolation to four months of data to derive pollution maps for eastern China. Consistent with prior findings, the greatest pollution occurs in the east, but significant levels are widespread across northern and central China and are not limited to major cities or geologic basins. Sources of pollution are widespread, but are particularly intense in a northeast corridor that extends from near Shanghai to north of Beijing. During our analysis period, 92% of the population of China experienced >120 hours of unhealthy air (US EPA standard), and 38% experienced average concentrations that were unhealthy. China’s population-weighted average exposure to PM2.5 was 52 μg/m3. The observed air pollution is calculated to contribute to 1.6 million deaths/year in China [0.7–2.2 million deaths/year at 95% confidence], roughly 17% of all deaths in China.
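The population-weighted average exposure mentioned in this abstract is, in its simplest form, a weighted mean of concentrations with population as the weight. The sketch below shows that calculation only; the region values are hypothetical and the paper's gridding and interpolation steps are not reproduced.

```python
def population_weighted_mean(concentrations, populations):
    """Mean concentration weighted by the number of people exposed to each value."""
    if len(concentrations) != len(populations):
        raise ValueError("need one population per concentration")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    exposure = sum(c * p for c, p in zip(concentrations, populations))
    return exposure / total_population


# Hypothetical regions: PM2.5 in micrograms per cubic metre, and residents.
pm25 = [80, 40, 20]
people = [5_000_000, 3_000_000, 2_000_000]
average_exposure = population_weighted_mean(pm25, people)
```

Because more people live in the most polluted hypothetical region, the weighted mean (56.0) is higher than the plain average of the three concentrations (about 46.7).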

Journal ArticleDOI
10 Nov 2015-PLOS ONE
TL;DR: A new representation and feature extraction method for biological sequences that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction is introduced.
Abstract: We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. 
Moreover, this representation can be considered as pre-training for various applications of deep learning in bioinformatics. The related data is available at Life Language Processing Website: http://llp.berkeley.edu and Harvard Dataverse: http://dx.doi.org/10.7910/DVN/JMFHTN.
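One way to picture representing a protein sequence with a single dense vector is to split the sequence into short words, look each word up in an embedding table, and sum the vectors. The sketch below assumes 3-residue words read at three shifted offsets and a summed representation; the abstract does not specify these details, and the tiny embedding table is invented purely for illustration.

```python
def shifted_kmers(sequence, k=3):
    """Return k lists of non-overlapping k-mers, one list per starting offset."""
    return [
        [sequence[i:i + k] for i in range(offset, len(sequence) - k + 1, k)]
        for offset in range(k)
    ]


def embed_sequence(sequence, table, k=3):
    """Sum the vectors of all k-mers found in the table (unknown k-mers add nothing)."""
    dim = len(next(iter(table.values())))
    vector = [0.0] * dim
    for kmers in shifted_kmers(sequence, k):
        for kmer in kmers:
            for j, x in enumerate(table.get(kmer, [0.0] * dim)):
                vector[j] += x
    return vector


# Hypothetical 2-dimensional embeddings for a few 3-mers.
toy_table = {"MKV": [1.0, 0.0], "KVL": [0.0, 1.0], "VLA": [0.5, 0.5]}
words = shifted_kmers("MKVLAAG")
protein_vector = embed_sequence("MKVLAAG", toy_table)
```

In practice the table would come from a trained embedding model, and the resulting fixed-length vector could be passed to a classifier such as a support vector machine.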

Journal ArticleDOI
25 Jun 2015-PLOS ONE
TL;DR: Results indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors.
Abstract: 80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. 
This indicates a promising potential for transferring pedological knowledge from data rich countries to countries with limited soil data.
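The comparison between the two predictive approaches rests on Root Mean Squared Error and its relative decrease. A minimal sketch of both quantities follows; the observations and predictions are hypothetical, and the cross-validation loop itself is omitted.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_decrease(baseline_error, new_error):
    """Percentage by which the new error is lower than the baseline error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical held-out soil measurements and two sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
linear_predictions = [2.0, 3.0, 4.0, 5.0]
forest_predictions = [1.5, 2.5, 3.5, 4.5]

rmse_linear = rmse(observed, linear_predictions)
rmse_forest = rmse(observed, forest_predictions)
decrease = relative_decrease(rmse_linear, rmse_forest)
```

In a 5-fold cross-validation, these errors would be computed on each held-out fold and then averaged per soil property and depth.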

Journal ArticleDOI
14 Sep 2015-PLOS ONE
TL;DR: Analysis of the bacterial 16S ribosomal RNA (rRNA) gene by using a high-throughput culture-independent pyrosequencing method provided evidence of a moderate dysbiosis in the structure of gut microbiota in patients with MS, and phylogenetic tree analysis revealed that many of the clostridial species associated with MS might be distinct from those broadly associated with autoimmune conditions.
Abstract: The pathogenesis of multiple sclerosis (MS), an autoimmune disease affecting the brain and spinal cord, remains poorly understood. Patients with MS typically present with recurrent episodes of neurological dysfunctions such as blindness, paresis, and sensory disturbances. Studies on experimental autoimmune encephalomyelitis (EAE) animal models have led to a number of testable hypotheses including a hypothetical role of altered gut microbiota in the development of MS. To investigate whether gut microbiota in patients with MS is altered, we compared the gut microbiota of 20 Japanese patients with relapsing-remitting (RR) MS (MS20) with that of 40 healthy Japanese subjects (HC40) and an additional 18 healthy subjects (HC18). All the HC18 subjects repeatedly provided fecal samples over the course of months (158 samples in total). Analysis of the bacterial 16S ribosomal RNA (rRNA) gene by using a high-throughput culture-independent pyrosequencing method provided evidence of a moderate dysbiosis in the structure of gut microbiota in patients with MS. Furthermore, we found 21 species that showed significant differences in relative abundance between the MS20 and HC40 samples. On comparing MS samples to the 158 longitudinal HC18 samples, the differences were found to be reproducibly significant for most of the species. These taxa consisted primarily of clostridial species belonging to Clostridia clusters XIVa and IV and Bacteroidetes. The phylogenetic tree analysis revealed that none of the clostridial species that were significantly reduced in the gut microbiota of patients with MS overlapped with other spore-forming clostridial species capable of inducing colonic regulatory T cells (Treg), which prevent autoimmunity and allergies; this suggests that many of the clostridial species associated with MS might be distinct from those broadly associated with autoimmune conditions.
Correcting the dysbiosis and altered gut microbiota might deserve consideration as a potential strategy for the prevention and treatment of MS.

Journal ArticleDOI
16 Oct 2015-PLOS ONE
TL;DR: The analytic and clinical validation of the gene panel are reported and near-perfect analytic specificity enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%.
Abstract: Next-generation sequencing of cell-free circulating solid tumor DNA addresses two challenges in contemporary cancer care. First, this method of massively parallel and deep sequencing enables assessment of a comprehensive panel of genomic targets from a single sample, and second, it obviates the need for repeat invasive tissue biopsies. Digital Sequencing™ is a novel method for high-quality sequencing of circulating tumor DNA simultaneously across a comprehensive panel of over 50 cancer-related genes with a simple blood test. Here we report the analytic and clinical validation of the gene panel. Analytic sensitivity down to 0.1% mutant allele fraction is demonstrated via serial dilution studies of known samples. Near-perfect analytic specificity (> 99.9999%) enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%. We compared digital sequencing of plasma-derived cell-free DNA to tissue-based sequencing on 165 consecutive matched samples from five outside centers in patients with stage III-IV solid tumor cancers. Clinical sensitivity of plasma-derived NGS was 85.0%, comparable to 80.7% sensitivity for tissue. The assay success rate on 1,000 consecutive samples in clinical practice was 99.8%. Digital sequencing of plasma-derived DNA is indicated in advanced cancer patients to prevent repeated invasive biopsies when the initial biopsy is inadequate, unobtainable for genomic testing, or uninformative, or when the patient's cancer has progressed despite treatment. Its clinical utility is derived from reduction in the costs, complications and delays associated with invasive tissue biopsies for genomic testing.
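Sensitivity and specificity, the two validation measures quoted in this abstract, follow the standard confusion-matrix definitions. The sketch below uses hypothetical counts chosen only to show the arithmetic; they are not the study's data.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly positive cases that the assay detects."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Fraction of truly negative cases that the assay correctly reports as negative."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no negative cases to evaluate")
    return true_negatives / total


# Hypothetical counts: 85 of 100 known variants detected, and
# 1 false call among 1,000,000 positions known to be unmutated.
example_sensitivity = sensitivity(85, 15)
example_specificity = specificity(999_999, 1)
```

When an assay covers very many positions per sample, even a tiny false positive rate adds up, which is why specificity figures for sequencing panels are quoted to so many decimal places.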

Journal ArticleDOI
01 Apr 2015-PLOS ONE
TL;DR: Plastic debris in the Mediterranean surface waters was dominated by millimeter-sized fragments, but showed a higher proportion of large plastic objects than that present in oceanic gyres, reflecting the closer connection with pollution sources.
Abstract: Concentrations of floating plastic were measured throughout the Mediterranean Sea to assess whether this basin can be regarded as a great accumulation region of plastic debris. We found that the average density of plastic (1 item per 4 m2), as well as its frequency of occurrence (100% of the sites sampled), are comparable to the accumulation zones described for the five subtropical ocean gyres. Plastic debris in the Mediterranean surface waters was dominated by millimeter-sized fragments, but showed a higher proportion of large plastic objects than that present in oceanic gyres, reflecting the closer connection with pollution sources. The accumulation of floating plastic in the Mediterranean Sea (between 1,000 and 3,000 tons) is likely related to the high human pressure together with the hydrodynamics of this semi-enclosed basin, with outflow mainly occurring through a deep water layer. Given the biological richness and concentration of economic activities in the Mediterranean Sea, the effects of plastic pollution on marine and human life are expected to be particularly frequent in this plastic accumulation region.

Journal ArticleDOI
08 Jul 2015-PLOS ONE
TL;DR: A DNA metabarcoding protocol that utilises the standard cytochrome c oxidase subunit I (COI) barcoding fragment to detect freshwater macroinvertebrate taxa was developed and tested; the results indicated that primer efficiency is highly species-specific, which would prevent straightforward assessments of species abundance and biomass in a sample.
Abstract: Metabarcoding is an emerging genetic tool to rapidly assess biodiversity in ecosystems. It involves high-throughput sequencing of a standard gene from an environmental sample and comparison to a reference database. However, no consensus has emerged regarding laboratory pipelines to screen species diversity and infer species abundances from environmental samples. In particular, the effect of primer bias and the detection limit for specimens with a low biomass has not been systematically examined, when processing samples in bulk. We developed and tested a DNA metabarcoding protocol that utilises the standard cytochrome c oxidase subunit I (COI) barcoding fragment to detect freshwater macroinvertebrate taxa. DNA was extracted in bulk, amplified in a single PCR step, and purified, and the libraries were directly sequenced in two independent MiSeq runs (300-bp paired-end reads). Specifically, we assessed the influence of specimen biomass on sequence read abundance by sequencing 31 specimens of a stonefly species with known haplotypes spanning three orders of magnitude in biomass (experiment I). Then, we tested the recovery of 52 different freshwater invertebrate taxa of similar biomass using the same standard barcoding primers (experiment II). Each experiment was replicated ten times to maximise statistical power. The results of both experiments were consistent across replicates. We found a distinct positive correlation between species biomass and resulting numbers of MiSeq reads. Furthermore, we reliably recovered 83% of the 52 taxa used to test primer bias. However, sequence abundance varied by four orders of magnitude between taxa despite the use of similar amounts of biomass. Our metabarcoding approach yielded reliable results for high-throughput assessments. However, the results indicated that primer efficiency is highly species-specific, which would prevent straightforward assessments of species abundance and biomass in a sample.
Thus, PCR-based metabarcoding assessments of biodiversity should rely on presence-absence metrics.
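The recommendation to rely on presence-absence metrics can be sketched as follows: read counts per taxon are reduced to detected or not detected, and recovery is the share of expected taxa that were detected. The detection threshold, taxon names and read counts are hypothetical.

```python
def presence_absence(read_counts, min_reads=1):
    """Reduce per-taxon read counts to True (detected) or False (not detected)."""
    return {taxon: reads >= min_reads for taxon, reads in read_counts.items()}


def recovery_rate(read_counts, expected_taxa, min_reads=1):
    """Fraction of the expected taxa that were detected in the sample."""
    if not expected_taxa:
        raise ValueError("need at least one expected taxon")
    detected = presence_absence(read_counts, min_reads)
    found = sum(1 for taxon in expected_taxa if detected.get(taxon, False))
    return found / len(expected_taxa)


# Hypothetical mock community: read counts differ by orders of magnitude
# even though every taxon was added with similar biomass.
reads = {"taxon_a": 120_000, "taxon_b": 15, "taxon_c": 0}
expected = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
rate = recovery_rate(reads, expected)
```

Here taxon_a and taxon_b count equally as detected despite an 8000-fold difference in reads, which is the point of a presence-absence metric when primer efficiency varies between species.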

Journal ArticleDOI
29 Oct 2015-PLOS ONE
TL;DR: In this article, the authors conducted a systematic review and meta-analysis of all studies reporting a prevalence of NAFLD based on any diagnostic method in participants 1-19 years old, regardless of the main aim of the study.
Abstract: Background & Aims Narrative reviews of paediatric NAFLD quote prevalences in the general population that range from 9% to 37%; however, no systematic review of the prevalence of NAFLD in children/adolescents has been conducted. We aimed to estimate prevalence of non-alcoholic fatty liver disease (NAFLD) in young people and to determine whether this varies by BMI category, gender, age, diagnostic method, geographical region and study sample size. Methods We conducted a systematic review and meta-analysis of all studies reporting a prevalence of NAFLD based on any diagnostic method in participants 1–19 years old, regardless of whether assessing NAFLD prevalence was the main aim of the study. Results The pooled mean prevalence of NAFLD in children from general population studies was 7.6% (95%CI: 5.5% to 10.3%) and 34.2% (95% CI: 27.8% to 41.2%) in studies based on child obesity clinics. In both populations there was marked heterogeneity between studies (I2 = 98%). There was evidence that prevalence was generally higher in males compared with females and increased incrementally with greater BMI. There was evidence for differences between regions in clinical population studies, with estimated prevalence being highest in Asia. There was no evidence that prevalence changed over time. Prevalence estimates in studies of children/adolescents attending obesity clinics and in obese children/adolescents from the general population were substantially lower when elevated alanine aminotransferase (ALT) was used to assess NAFLD compared with biopsies, ultrasound scan (USS) or magnetic resonance imaging (MRI). Conclusions Our review suggests the prevalence of NAFLD in young people is high, particularly in those who are obese and in males.
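The heterogeneity statistic reported in this abstract (I2) is conventionally derived from Cochran's Q. The sketch below uses the standard inverse-variance formulas; how the review transformed and pooled the prevalences is not stated in the abstract, so the inputs here are hypothetical effect estimates with known variances.

```python
def cochran_q(effects, variances):
    """Return (Q, pooled estimate) using inverse-variance (fixed-effect) weights."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect estimate")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, pooled


def i_squared(q, n_studies):
    """I2 in percent: share of total variation attributed to between-study heterogeneity."""
    df = n_studies - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical study-level estimates and variances.
q_value, pooled_estimate = cochran_q([0.05, 0.10, 0.30], [0.0004, 0.0004, 0.0004])
heterogeneity = i_squared(q_value, 3)
```

For instance, a Q of 100 across 3 studies gives I2 = 98%, meaning almost all of the observed variation reflects real differences between studies rather than sampling error.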

Journal ArticleDOI
20 Oct 2015-PLOS ONE
TL;DR: Sequenced RNA in human peripheral whole blood reveals and quantifies the activity of hundreds of coding genes not accessible by classical mRNA specific assays, suggesting that circRNAs could be used as biomarker molecules in standard clinical blood samples.
Abstract: Covalently closed circular RNA molecules (circRNAs) have recently emerged as a class of RNA isoforms with widespread and tissue specific expression across animals, oftentimes independent of the corresponding linear mRNAs. circRNAs are remarkably stable and sometimes highly expressed molecules. Here, we sequenced RNA in human peripheral whole blood to determine the potential of circRNAs as biomarkers in an easily accessible body fluid. We report the reproducible detection of thousands of circRNAs. Importantly, we observed that hundreds of circRNAs are expressed at much higher levels than the corresponding linear mRNAs. Thus, circRNA expression in human blood reveals and quantifies the activity of hundreds of coding genes not accessible by classical mRNA specific assays. Our findings suggest that circRNAs could be used as biomarker molecules in standard clinical blood samples.

Journal ArticleDOI
08 Jul 2015-PLOS ONE
TL;DR: This study proposes Twitter as a proxy for human mobility, as it relies on publicly available data and provides high resolution positioning when users opt to geotag their tweets with their current location, and demonstrates that Twitter can be a reliable source for studying human mobility patterns.
Abstract: Understanding human mobility is crucial for a broad range of applications from disease prediction to communication networks. Most efforts on studying human mobility have so far used private and low resolution data, such as call data records. Here, we propose Twitter as a proxy for human mobility, as it relies on publicly available data and provides high resolution positioning when users opt to geotag their tweets with their current location. We analyse a Twitter dataset with more than six million geotagged tweets posted in Australia, and we demonstrate that Twitter can be a reliable source for studying human mobility patterns. Our analysis shows that geotagged tweets can capture rich features of human mobility, such as the diversity of movement orbits among individuals and of movements within and between cities. We also find that short- and long-distance movers both spend most of their time in large metropolitan areas, in contrast with intermediate-distance movers’ movements, reflecting the impact of different modes of travel. Our study provides solid evidence that Twitter can indeed be a useful proxy for tracking and predicting human movement.
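A basic building block for studying mobility from geotagged tweets is the distance between a user's consecutive locations. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the function names and the example coordinates are illustrative assumptions, not the study's pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def displacements(points):
    """Distances between consecutive (lat, lon) points of one user, in time order."""
    return [haversine_km(*a, *b) for a, b in zip(points, points[1:])]


# Hypothetical user: two tweets in Sydney, then one in Melbourne.
user_track = [(-33.8688, 151.2093), (-33.8688, 151.2093), (-37.8136, 144.9631)]
jumps = displacements(user_track)
```

The distribution of such displacements across many users is what separates short-, intermediate- and long-distance movers in analyses of this kind.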