scispace - formally typeset

Showing papers by "J. Stephen Downie" published in 2020


Book Chapter · DOI
30 Nov 2020
TL;DR: This study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization by focusing on two key factors: 1) Bert model variants, and 2) classification strategies.
Abstract: With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight Bert-based classification models by focusing on two key factors: 1) Bert model variants, and 2) classification strategies. Experiments on three corpora show that pre-training on a domain-specific corpus helps the Bert-based classification model identify the type of scientific relations. Although predicting a single relation at a time generally achieves higher classification accuracy than identifying multiple relation types simultaneously, the latter strategy performs more consistently across corpora with either a large or a small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.

7 citations
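The abstract contrasts two classification strategies without spelling out how they differ in practice. The sketch below shows one common way of preparing inputs for each, under our own assumption (not stated in the abstract) that "predicting a single relation each time" means one encoder pass per marked concept pair, while "identifying multiple relation types simultaneously" means encoding a sentence once and scoring all of its candidate pairs from that shared pass. The marker tokens `[E1]`/`[E2]` and all function names are hypothetical, and no Bert model is loaded.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A span is a (start, end) token range with an exclusive end.
Span = Tuple[int, int]


@dataclass
class SentenceExample:
    tokens: List[str]
    pairs: List[Tuple[Span, Span]]  # candidate (head, tail) concept pairs


def mark_pair(tokens: List[str], head: Span, tail: Span) -> List[str]:
    """Wrap one head/tail pair in hypothetical marker tokens (spans must not overlap)."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i == head[0]:
            out.append("[E1]")
        if i == tail[0]:
            out.append("[E2]")
        out.append(tok)
        if i == head[1] - 1:
            out.append("[/E1]")
        if i == tail[1] - 1:
            out.append("[/E2]")
    return out


def single_relation_inputs(ex: SentenceExample) -> List[List[str]]:
    """Assumed strategy 1: one marked encoder input (and one prediction) per pair."""
    return [mark_pair(ex.tokens, head, tail) for head, tail in ex.pairs]


def multi_relation_input(ex: SentenceExample) -> Tuple[List[str], List[Tuple[Span, Span]]]:
    """Assumed strategy 2: encode the sentence once; a classifier head would then
    score every listed pair from that shared encoding."""
    return list(ex.tokens), list(ex.pairs)


ex = SentenceExample(
    tokens="we apply parsing to machine translation".split(),
    pairs=[((2, 3), (4, 6)), ((0, 1), (2, 3))],
)
print(len(single_relation_inputs(ex)))  # 2 encoder passes, one per pair
print(single_relation_inputs(ex)[0])
```

Under this reading, the first strategy costs one forward pass per pair but gives each prediction its own explicitly marked context, while the second amortizes one pass over all pairs in a sentence.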


Journal Article · DOI
TL;DR: The HTDL’s potential for musicology research is discussed by providing a bibliometric analysis of the collection as a whole, and of the music materials in particular, and several opportunities for improvement are highlighted.
Abstract: The HathiTrust Digital Library (HTDL) is one of the largest digital libraries in the world, containing seventeen million volumes from the collections of major academic and research libraries. In this paper, we discuss the HTDL’s potential for musicology research by providing a bibliometric analysis of the collection as a whole, and of the music materials in particular. A series of case studies illustrates the kinds of musicological research that may be conducted using the HTDL. We highlight several opportunities for improvement and discuss promising future directions for new knowledge creation through the processing and analysis of large amounts of retrospective data. The HTDL presents significant new opportunities for the study of music that will continue to expand as data, metadata and collection enhancements are introduced.

6 citations


16 Oct 2020
TL;DR: In this article, the authors propose a vision of digitally enabled collaboration that may help amateur music societies rebuild their sense of community and purpose, by working together with academics, archives, and a major US arts centre to reconnect with their past and enrich understanding of their own histories and traditions within a broader national context.
Abstract: In post-COVID times we are focusing quite rightly on the plight of our major cultural institutions; but just as important are the local societies that enrich our community life, including amateur music societies, devastated by stringent social-distancing requirements and the health and safety implications of live performance in small spaces. We propose a vision of digitally enabled collaboration that may help these societies rebuild their sense of community and purpose, by working together with academics, archives, and a major US arts centre to reconnect with their past and enrich understanding of their own histories and traditions within a broader national context.

5 citations


Journal Article · DOI
01 Oct 2020
TL;DR: This work proposes a hybrid approach to extract scientific concept relations from scholarly publications by utilizing syntactic rules as a form of distant supervision to link related scientific term pairs and training a classifier to further identify the relation type per pair.
Abstract: Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets high precision rather than high recall, aiming to reduce noisy results at the cost of extracting fewer relations when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state of the art with an overall F1 score of 60%, driven by a precision boost of nearly 15% in identifying related scientific concepts. On relation classification, we further achieved an overall F1 score between 34.1% and 51.2%, depending on the experimental dataset.

5 citations
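Stage (a) of the hybrid approach, together with the precision-oriented evaluation, can be illustrated with a toy rule-based linker. This is a minimal sketch, assuming scientific terms are already identified in the sentence; the connector list in `CONNECTOR` is hypothetical and is not the paper's rule set, and the stage (b) relation-type classifier is omitted. Restrictive rules like these trade recall for precision, which mirrors the design objective stated in the abstract.

```python
import re
from typing import List, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical connector rules standing in for the paper's syntactic rules.
CONNECTOR = re.compile(r"^\s*(for|of|using|such as|based on)\s*$", re.IGNORECASE)


def link_terms(sentence: str, terms: List[str]) -> Set[Pair]:
    """Stage (a) sketch: link textually adjacent terms joined by a connector."""
    # First occurrence of each term, ordered by position in the sentence.
    found = sorted((sentence.find(t), t) for t in terms if t in sentence)
    pairs: Set[Pair] = set()
    for (start1, term1), (start2, term2) in zip(found, found[1:]):
        between = sentence[start1 + len(term1):start2]
        if CONNECTOR.match(between):
            pairs.add((term1, term2))
    return pairs


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 over sets of linked pairs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


sentence = "We propose a parser for machine translation using attention models."
terms = ["parser", "machine translation", "attention models"]
predicted = link_terms(sentence, terms)
print(sorted(predicted))
print(precision_recall_f1(predicted, {("parser", "machine translation")}))
```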


Posted Content
TL;DR: The authors present a thorough empirical evaluation of eight Bert-based classification models by focusing on two key factors: (1) Bert model variants and (2) classification strategies, and show that pre-training on a domain-specific corpus helps the Bert-based classification model identify the type of scientific relations.
Abstract: With the rapid growth of research publications, there is a vast amount of scholarly knowledge that needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To this end, we present a thorough empirical evaluation of eight Bert-based classification models by focusing on two key factors: 1) Bert model variants, and 2) classification strategies. Experiments on three corpora show that pre-training on a domain-specific corpus helps the Bert-based classification model identify the type of scientific relations. Although predicting a single relation at a time generally achieves higher classification accuracy than identifying multiple relation types simultaneously, the latter strategy performs more consistently across corpora with either a large or a small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.

2 citations



Journal Article · DOI
01 Oct 2020
TL;DR: This poster describes the processes used to convert from MARC to BIBFRAME, to enrich descriptions with links, and to visualize worksets using the resulting RDF.
Abstract: In 2018, the HathiTrust Research Center built an experimental triplestore of BIBFRAME RDF from the HathiTrust Digital Library's MARC metadata catalog. This poster describes the processes used to convert from MARC to BIBFRAME, to enrich descriptions with links, and to visualize worksets using the resulting RDF.

1 citation
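As a rough illustration of the MARC-to-BIBFRAME step, the following sketch maps a toy MARC record (only fields 245 $a and 100 $a) to BIBFRAME Work and Instance triples serialized as N-Triples. It is not the HTRC pipeline: the base URI `http://example.org/`, the dictionary record layout, and the function names are hypothetical. Production conversions typically rely on dedicated tooling such as the Library of Congress marc2bibframe2 XSLT converter, which covers far more of MARC.

```python
from typing import Dict, List

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical base URI for minted resources


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Escape a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def marc_to_bibframe(record: Dict[str, Dict[str, str]], control_number: str) -> List[str]:
    """Map a toy MARC record (tag -> {subfield: value}) to BIBFRAME N-Triples lines."""
    work = iri(f"{BASE}work/{control_number}")
    instance = iri(f"{BASE}instance/{control_number}")
    triples = [
        (work, iri(RDF_TYPE), iri(BF + "Work")),
        (instance, iri(RDF_TYPE), iri(BF + "Instance")),
        (instance, iri(BF + "instanceOf"), work),
    ]
    if "a" in record.get("245", {}):  # title statement
        title = iri(f"{BASE}work/{control_number}#title")
        triples += [
            (work, iri(BF + "title"), title),
            (title, iri(RDF_TYPE), iri(BF + "Title")),
            (title, iri(BF + "mainTitle"), literal(record["245"]["a"])),
        ]
    if "a" in record.get("100", {}):  # main entry, personal name
        contribution = iri(f"{BASE}work/{control_number}#contribution")
        agent = iri(f"{BASE}work/{control_number}#agent")
        triples += [
            (work, iri(BF + "contribution"), contribution),
            (contribution, iri(RDF_TYPE), iri(BF + "Contribution")),
            (contribution, iri(BF + "agent"), agent),
            (agent, iri(RDFS_LABEL), literal(record["100"]["a"])),
        ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = marc_to_bibframe({"245": {"a": "Moby Dick"}, "100": {"a": "Melville, Herman"}}, "000123")
print("\n".join(lines))
```

Grouping Instances that point to the same Work through `bf:instanceOf` is one plausible way to assemble worksets for visualization, though the poster's own method is not described here.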


Journal Article · DOI
01 Oct 2020
TL;DR: This poster presents the results of a series of network analyses examining how synthetic biology ethics is evolving. The studies draw on ethics articles from 802 institutions and 2,179 authors and suggest that the field's social structure has become democratized at the individual level but remains dominated by a handful of institutions at the organizational level.
Abstract: Although synthetic biology is now an established field, ethical reflection surrounding it has only recently become a subject of sustained scientific inquiry. This poster displays the results from a series of network analyses examining how synthetic biology ethics is evolving. The studies gather ethics articles from 802 institutions and 2,179 authors and suggest that the field's social structure has become democratized at the individual level but remains dominated by a handful of institutions at the organizational level.

1 citation
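The contrast between a democratized individual level and a concentrated organizational level can be quantified in several ways, and the abstract does not name the measures used. One standard option, assumed here, is Freeman degree centralization, which equals 1.0 for a star-shaped network and 0.0 when every node has the same degree. The sketch builds a co-authorship graph from hypothetical author lists; the same functions would apply to institutions if each paper's affiliations were passed instead of its authors.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def coauthor_graph(papers: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Build an undirected co-authorship graph (adjacency sets) from author lists."""
    graph: Dict[str, Set[str]] = {}
    for authors in papers:
        unique = sorted(set(authors))
        for name in unique:
            graph.setdefault(name, set())
        # Every pair of co-authors on a paper gets an edge.
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centralization(graph: Dict[str, Set[str]]) -> float:
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(graph)
    if n < 3:
        return 0.0
    degrees = [len(neighbors) for neighbors in graph.values()]
    top = max(degrees)
    # Normalize by the maximum possible sum, attained by a star on n nodes.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


star = coauthor_graph([["hub", "a"], ["hub", "b"], ["hub", "c"]])
print(degree_centralization(star))  # 1.0
```

A value near 0 for the author graph alongside a high value for the institution graph would match the pattern the poster reports, under this choice of measure.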


Proceedings Article · DOI
01 Aug 2020
TL;DR: This paper presents a use case built on an English literature dataset of 178,381 volumes curated by the HathiTrust Research Center (HTRC) for measuring change in three literary genres.
Abstract: This paper investigates the limitations and challenges of the curated datasets that digital libraries provide in support of digital humanities (DH) research. We present a use case built on an English literature dataset of 178,381 volumes curated by the HathiTrust Research Center (HTRC) for measuring change in three literary genres. These volumes were selected from over 17 million digitized items in the HathiTrust Digital Library. We demonstrate our methods and workflow for improving the representativeness and scholarly usability of the existing datasets. We analyzed and effectively overcame three common limitations: duplicate volumes, uneven distribution of data, and OCR errors. We suggest that stakeholders of digital libraries should flag and address these limitations to improve the usability of their datasets in the context of digital humanities research.
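Two of the three limitations, duplicate volumes and uneven distribution of data, can be sketched with simple metadata operations. The following is a minimal illustration, assuming each volume is a dictionary with hypothetical `title`, `author`, and `year` fields; the paper's actual matching rules and sampling design are not described in the abstract, and OCR-error handling is not shown. Deduplication keys on normalized title and author, and balancing down-samples any decade that exceeds a fixed cap.

```python
import random
import re
from collections import defaultdict
from typing import Dict, List

Volume = Dict[str, str]  # hypothetical fields: "title", "author", "year"


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def deduplicate(volumes: List[Volume]) -> List[Volume]:
    """Keep the first volume for each (normalized title, normalized author) key."""
    seen = set()
    kept: List[Volume] = []
    for vol in volumes:
        key = (normalize(vol["title"]), normalize(vol["author"]))
        if key not in seen:
            seen.add(key)
            kept.append(vol)
    return kept


def balance_by_decade(volumes: List[Volume], per_decade: int, seed: int = 0) -> List[Volume]:
    """Down-sample so that no decade contributes more than per_decade volumes."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    buckets: Dict[int, List[Volume]] = defaultdict(list)
    for vol in volumes:
        buckets[int(vol["year"]) // 10 * 10].append(vol)
    balanced: List[Volume] = []
    for decade in sorted(buckets):
        group = buckets[decade]
        balanced.extend(group if len(group) <= per_decade else rng.sample(group, per_decade))
    return balanced


vols = [
    {"title": "Moby-Dick", "author": "Melville, Herman", "year": "1851"},
    {"title": "Moby Dick.", "author": "Melville, Herman.", "year": "1851"},
    {"title": "Walden", "author": "Thoreau, Henry David", "year": "1854"},
    {"title": "Emma", "author": "Austen, Jane", "year": "1815"},
]
deduped = deduplicate(vols)
print(len(deduped))  # 3
print(len(balance_by_decade(deduped, per_decade=1)))  # 2
```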