
Showing papers by "Neil R. Smalheiser" published in 2020


Journal ArticleDOI
01 Jan 2020-Database
TL;DR: A machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding in case reports, a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings.
Abstract: Clinical case reports are the 'eyewitness reports' of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. However, no one has previously created an automatic way of identifying main finding sentences in case reports. We previously created a manual corpus of main finding sentences extracted from the abstracts and full text of clinical case reports. Here, we have utilized the corpus to create a machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding. The model has been evaluated on a separate manual corpus of clinical case reports and found to have good performance. This is a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings. The code and necessary files to run the main finding model can be downloaded from https://github.com/qi29/main_finding_recognition, released under the Apache License, Version 2.0.
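As a rough illustration of the kind of pipeline such a model involves, here is a minimal sketch that scores abstract sentences with TF-IDF features and logistic regression; the sentences, labels, and feature choices are invented for illustration, and the released model may differ in features and architecture.

```python
# Minimal sketch of a main-finding sentence classifier; feature choices
# and training data are hypothetical, not those of the released model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: abstract sentences labeled 1 if they state the
# case report's main finding, else 0.
sentences = [
    "We report a rare case of pancreatitis induced by azathioprine.",
    "The patient was a 54-year-old woman with hypertension.",
]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

# Rank the sentences of a new abstract by predicted probability of
# stating the main finding.
abstract = [
    "She presented with acute abdominal pain.",
    "We describe an unusual complication of azathioprine therapy.",
]
scores = clf.predict_proba(abstract)[:, 1]
print(max(zip(scores, abstract)))
```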

7 citations


Posted ContentDOI
26 Nov 2020-bioRxiv
TL;DR: Changing the real-life behavior of scientists in planning their experiments may require developing educational tools that allow them to actively visualize the inter-relationships among effect size, sample size, statistical power, and replicability in a direct and intuitive manner.
Abstract: A recent flood of publications has documented serious problems in scientific reproducibility, power, and reporting of biomedical articles, yet scientists persist in their usual practices. Why? We examined a popular and important preclinical assay, the Forced Swim Test (FST) in mice used to test putative antidepressants. Regardless of whether the mice were assayed in a naive state or in a model of depression or stress, and whether they were given test agents or known antidepressants regarded as positive controls, the mean effect sizes seen in the experiments were indeed extremely large (1.5-2.5 in Cohen's d units); most of the experiments utilized 7-10 animals per group, which did provide adequate power to reliably detect effects of this magnitude. We propose that this may at least partially explain why investigators using the FST do not perceive intuitively that their experimental designs fall short, even though proper prospective design would require ~21-26 animals per group to detect, at a minimum, large effects (0.8 in Cohen's d units) when the true effect of a test agent is unknown. Our data provide explicit parameters and guidance for investigators seeking to carry out prospective power estimation for the FST. More generally, altering the real-life behavior of scientists in planning their experiments may require developing educational tools that allow them to actively visualize the inter-relationships among effect size, sample size, statistical power, and replicability in a direct and intuitive manner.
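The sample-size arithmetic behind this argument can be reproduced with a standard power calculator; the sketch below uses statsmodels with a two-sided test at alpha = 0.05, which is an assumption here since the abstract does not state the test settings.

```python
# Power arithmetic for two-group comparisons, as in the FST discussion
# above; alpha and sidedness are assumptions, not taken from the paper.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# With the very large effects typically reported in the FST
# (d ~ 1.5-2.5), 7-10 animals per group already yields high power.
for d in (1.5, 2.0, 2.5):
    print(f"d={d}: power with n=8/group =",
          round(power.power(effect_size=d, nobs1=8, alpha=0.05), 2))

# Planning prospectively for a merely "large" effect (d = 0.8) at 80%
# power requires roughly 26 animals per group.
n = power.solve_power(effect_size=0.8, alpha=0.05, power=0.8)
print("n per group for d=0.8:", round(n))
```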

6 citations


Journal ArticleDOI
28 Oct 2020
TL;DR: Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial, allowing such linked reports to be grouped rather than over-counted as independent pieces of evidence.
Abstract: Objectives: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. Materials and Methods: We updated our previous model by creating larger, more recent, and more diverse positive and negative training sets consisting of article pairs that were (or were not) linked to the same ClinicalTrials.gov trial registry number. Features were extracted from PubMed metadata; pairwise similarity scores were modeled using logistic regression and used to form clusters of articles that are likely to arise from the same registered clinical trial. Results: Articles from the same trial were identified with high accuracy (F1 = 0.859), nominally better than the previous model (F1 = 0.843). Predicted clusters showed a low splitting error rate of 8-11% (i.e., cases in which 2 articles belonged to the same trial but were assigned to different clusters). Performance was similar whether only randomized controlled trial articles or a more diverse set of clinical trial articles were processed. Discussion: Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial. Conclusion: We have continued confidence in the Aggregator tool, which can be accessed publicly at http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
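A rough sketch of the pairwise-linkage design described in the methods follows; the feature names, threshold, and clustering rule (connected components) are illustrative assumptions, not the Aggregator's actual implementation.

```python
# Sketch of pairwise similarity scoring plus clustering; features,
# threshold, and clustering rule are hypothetical simplifications.
import networkx as nx
from sklearn.linear_model import LogisticRegression

# Each row holds metadata-similarity features for one article pair,
# e.g. [author_overlap, title_similarity, same_journal, year_gap].
X_train = [[0.9, 0.8, 1, 0], [0.1, 0.2, 0, 5]]
y_train = [1, 0]  # 1 = pair linked to the same registered trial

model = LogisticRegression().fit(X_train, y_train)

# Score candidate pairs, then merge predicted links into clusters via
# connected components of the resulting graph.
pairs = {("pmidA", "pmidB"): [0.8, 0.7, 1, 1],
         ("pmidB", "pmidC"): [0.0, 0.1, 0, 8]}
G = nx.Graph()
G.add_nodes_from({p for pair in pairs for p in pair})
for (a, b), feats in pairs.items():
    if model.predict_proba([feats])[0, 1] > 0.5:
        G.add_edge(a, b)
print(list(nx.connected_components(G)))
```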

3 citations


Posted ContentDOI
23 Jun 2020-medRxiv
TL;DR: The Citation Cloud, an extension to PubMed, allows any user to visualize and analyze the citation cloud around any target article A: the set of articles cited by A, those which cite A, those which are co-cited with A, and those which are bibliographically coupled to A.
Abstract: Using open citations provided by iCite and other sources, we have built an extension to PubMed that allows any user to visualize and analyze the citation cloud around any target article A: the set of articles cited by A; those which cite A; those which are co-cited with A; and those which are bibliographically coupled to A. This greatly enables the study of citations by the scientific community. The Citation Cloud can be accessed by running any query on the Anne O'Tate value-added PubMed search interface http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/AnneOTate.cgi and clicking on the Citations button next to any retrieved article.
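The four relations that make up the citation cloud are simple to state over a citation graph; this toy sketch computes each set for a target article A (the graph data are invented for illustration, not drawn from the iCite feed).

```python
# Toy citation graph: refs maps each article to the set of articles it
# cites. Data are hypothetical, for illustration only.
refs = {
    "A": {"B", "C"},
    "D": {"A", "C"},   # D cites A and co-cites C alongside A
    "E": {"B", "F"},   # E shares reference B with A (coupling)
}

def citation_cloud(a, refs):
    cited = refs.get(a, set())                              # cited by A
    citing = {p for p, r in refs.items() if a in r}         # cite A
    cocited = {p for r in refs.values() if a in r for p in r} - {a}
    coupled = {p for p, r in refs.items() if p != a and r & cited}
    return cited, citing, cocited, coupled

print(citation_cloud("A", refs))
```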

1 citation