
Showing papers by "J. Stephen Downie" published in 2021


Journal Article
TL;DR: This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets and presents one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion-work HathiTrust Digital Library.
Abstract: The emergence of large multi‐institutional digital libraries has opened the door to aggregate‐level examinations of the published word. Such large‐scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.

10 citations


Journal Article
TL;DR: The Synthetic Biology Knowledge System (SBKS) is an instance of the SynBioHub repository that includes text and data information mined from papers published in ACS Synthetic Biology.
Abstract: The Synthetic Biology Knowledge System (SBKS) is an instance of the SynBioHub repository that includes text and data information that has been mined from papers published in ACS Synthetic Biology. This paper describes the SBKS curation framework that is being developed to construct the knowledge stored in this repository. The text mining pipeline performs automatic annotation of the articles using natural language processing techniques to identify salient content such as key terms, relationships between terms, and main topics. The data mining pipeline performs automatic annotation of the sequences extracted from the supplemental documents with the genetic parts used in them. Together these two pipelines link genetic parts to papers describing the context in which they are used. Ultimately, SBKS will reduce the time necessary for synthetic biologists to find the information necessary to complete their designs.

6 citations