Showing papers by "J. Stephen Downie" published in 2019


Journal Article
01 Jan 2019
TL;DR: It is demonstrated that choosing a consolidated or federated approach fundamentally alters the dataset configuration process for cross‐corpora workset building, so should be considered early in deployment specification and design.
Abstract: Linked Data provides a conceptual foundation for creating unified views across Digital Libraries, but implementation challenges must be overcome to realize the vision of computationally assisted cross‐corpus research. We report practical experiences comparing two alternative workset building approaches across combined datasets: the HathiTrust Digital Library and the Early English Books Online Text Creation Partnership. In one experiment we combine both datasets within one triplestore using a single ontology and apply consolidated querying; in the other we build two distributed triplestores, each dataset conforming to its own ontology, and connected through federated querying. Each solution presents tradeoffs in complexity, system efficiency and responsiveness, and in the workload of configuring new methods providing access to Digital Libraries. We demonstrate that choosing a consolidated or federated approach fundamentally alters the dataset configuration process for cross‐corpora workset building, so should be considered early in deployment specification and design. As both approaches provide equivalent functionality to the end‐user, the practice and experience documented here inform design and development of distributed Linked Data Digital Libraries offering combined collection querying.
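
The contrast between the two querying styles can be sketched as follows. This is only an illustration, not the authors' configuration: the endpoint URL, the `ex:` ontology prefix, and the choice of `dcterms:creator` as the shared predicate are all assumptions. The one standard element is the `SERVICE` keyword from SPARQL 1.1 Federated Query.

```python
# Hypothetical remote endpoint; not taken from the paper.
EEBO_ENDPOINT = "http://example.org/eebo/sparql"


def consolidated_query(author: str) -> str:
    """One triplestore, one ontology: a single pattern covers both corpora."""
    return f'''PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?work WHERE {{
  ?work dcterms:creator "{author}" .
}}'''


def federated_query(author: str, remote_endpoint: str = EEBO_ENDPOINT) -> str:
    """Two triplestores, two ontologies: the remote pattern is delegated
    to the second endpoint and uses that endpoint's own (assumed) predicate."""
    return f'''PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/eebo-ontology#>
SELECT ?work WHERE {{
  {{ ?work dcterms:creator "{author}" . }}
  UNION
  {{ SERVICE <{remote_endpoint}> {{ ?work ex:author "{author}" . }} }}
}}'''


if __name__ == "__main__":
    print(consolidated_query("Ann Author"))
    print(federated_query("Ann Author"))
```

In this sketch the federated query has to name each store's vocabulary explicitly, which is one place where the configuration workload described in the abstract could show up.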

3 citations


Journal Article
TL;DR: This article presents the conceptual design and implementation of Capisco, a low-cost approach to concept-based access to digital libraries that avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network.
Abstract: In this article, we present the conceptual design and report on the implementation of Capisco—a low-cost approach to concept-based access to digital libraries. Capisco avoids the need for complete semantic document markup using ontologies by leveraging an automatically generated Concept-in-Context (CiC) network. The network is seeded by a priori analysis of Wikipedia texts and identification of semantic metadata. Our Capisco system disambiguates the semantics of terms in the documents by their semantics and context and identifies the relevant CiC concepts. Supplementary to this, the disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. For established digital library systems, completely replacing, or even making significant changes to the document retrieval mechanism (document analysis, indexing strategy, query processing, and query interface) would require major technological effort and would most likely be disruptive. In addition to presenting Capisco, we describe ways to harness the results of our developed semantic analysis and disambiguation, while retaining the existing keyword-based search and lexicographic index. We engineer this so the output of semantic analysis (performed off-line) is suitable for import directly into existing digital library metadata and index structures, and thus incorporated without the need for architecture modifications.

3 citations


Proceedings Article
09 Nov 2019
TL;DR: An analysis of popular songs indicates that the concreteness of song lyrics fell from the middle of the 1960s until the 1990s and rose after that, and that the advent of Hip-Hop/Rap and the number of words in song lyrics are highly correlated with the rise in concreteness after the early 1990s.
Abstract: Recently, music complexity has drawn attention from researchers in the Music Digital Libraries area. In particular, computational methods to measure music complexity have been studied to provide better music services in large-scale music digital libraries. However, the majority of music complexity research has focused on audio-related facets of music, while song lyrics have rarely been considered. Based on the observation that most popular songs contain lyrics, whose different levels of complexity contribute to the overall music complexity, this paper investigates song lyric complexity and how it might be measured computationally. In particular, this paper examines the concreteness of song lyrics using trend analysis. Our analysis of popular songs indicates that the concreteness of popular song lyrics fell from the middle of the 1960s until the 1990s and rose after that. The advent of Hip-Hop/Rap and the number of words in song lyrics are highly correlated with the rise in concreteness after the early 1990s.
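
The abstract does not say how concreteness is scored. One plausible approach, sketched below, averages per-word ratings from a concreteness lexicon and then aggregates by release year. The ratings, the 1-to-5 scale, and the function names are invented for illustration and are not the paper's data or method.

```python
from collections import defaultdict
from statistics import mean

# Toy ratings on an assumed 1 (abstract) to 5 (concrete) scale; invented values.
TOY_RATINGS = {"car": 4.9, "street": 4.8, "love": 2.0, "dream": 2.5}


def lyric_concreteness(lyrics, ratings=TOY_RATINGS):
    """Mean rating of the words found in the lexicon, or None if none are rated."""
    scores = [ratings[word] for word in lyrics.lower().split() if word in ratings]
    return mean(scores) if scores else None


def yearly_trend(songs, ratings=TOY_RATINGS):
    """songs: iterable of (year, lyrics). Returns {year: mean song concreteness}."""
    by_year = defaultdict(list)
    for year, lyrics in songs:
        score = lyric_concreteness(lyrics, ratings)
        if score is not None:  # skip songs with no rated words
            by_year[year].append(score)
    return {year: mean(values) for year, values in sorted(by_year.items())}


if __name__ == "__main__":
    sample = [(1965, "love and a dream"), (1995, "car on the street")]
    print(yearly_trend(sample))
```

A real analysis would also need tokenization, lemmatization, and a published ratings lexicon. The sketch only shows the shape of the per-year aggregation that a trend analysis would start from.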

2 citations


Proceedings Article
31 Mar 2019
TL;DR: This paper investigates practical routes to globally unique identifiers for the medieval manuscripts of the Bodleian Library, considering how Archival Resource Keys (ARKs), a type of URI, can be applied to the Medieval Manuscript catalog and how ARKs can support MMM's research goals.
Abstract: In data management, the use of identifiers is essential for disambiguation and referencing. The scope of the use of identifiers varies. For example, disambiguation within an institution using integer identifiers may be sufficient for operational procedures, whereas digital scholarship using global resources relies on universally unique identifiers. In this paper we investigate practical routes to globally unique identifiers for the medieval manuscripts of the Bodleian Library. The Oxford Linked Open Data (OxLOD) and Mapping Manuscript Migrations (MMM) projects require unique identifiers for the transformation of the medieval manuscripts catalogue into linked data, in an effort to increase discoverability and consistency across platforms. We consider how Archival Resource Keys (ARKs), a type of URI, can be applied to the Medieval Manuscript catalog and determine how ARKs can support MMM's research goals. We begin by examining the Text Encoding Initiative (TEI) catalogue records to understand the data provided and to identify and describe entities which do not presently have identifiers. Further, we evaluate ARKs for producing identifiers, prioritizing those which are required to answer common research questions.
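
For readers unfamiliar with ARKs, the sketch below splits an identifier into its usual parts: an optional resolver, the Name Assigning Authority Number (NAAN), the assigned name, and any qualifier. The pattern is deliberately simplified and assumes an all-digit NAAN. The example identifiers are made up and are not Bodleian identifiers.

```python
import re

# Simplified pattern; it does not implement the full ARK specification.
ARK_PATTERN = re.compile(
    r"^(?:(?P<resolver>https?://[^/]+)/)?"  # optional resolver, e.g. https://n2t.net
    r"ark:/?(?P<naan>\d+)/"                 # label and NAAN (assumed to be digits)
    r"(?P<name>[^/.?#]+)"                   # name assigned by the authority
    r"(?P<qualifier>.*)$"                   # optional sub-part or variant
)


def parse_ark(identifier):
    """Return the ARK components as a dict, or None if the string does not match."""
    match = ARK_PATTERN.match(identifier)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    print(parse_ark("https://n2t.net/ark:/12345/x6np1wh8k/c3.xsl"))
    print(parse_ark("ark:12345/abc"))
```

Because the resolver is optional, the same core identifier can be compared across hosts, which is the property that makes ARKs candidates for persistent cross-platform references.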

1 citation


Proceedings Article
02 Jun 2019
TL;DR: This case study details the steps necessary, and the challenges that had to be overcome, to replicate the work given in a publicly available blog using the HathiTrust Research Center's virtual machine Data Capsule platform.
Abstract: We report on a case study to independently reproduce the work given in a publicly available blog on how to develop a topic model sourced from a collection of texts, where both the data set and source code used are readily available. More specifically, we detail the steps necessary, and the challenges that had to be overcome, to replicate the work using the HathiTrust Research Center's virtual machine Data Capsule platform. From this we make recommendations for authors to follow, based on the lessons learned. We also show that the Data Capsule model can be put to work in a way that is of benefit to those interested in supporting computational reproducibility within their organizations.

Journal Article
01 Jan 2019
TL;DR: Wang et al. identify how users from different cultural backgrounds seek and use cultural heritage information on e‐Dunhuang, as well as their perceptions of and opinions about the platform.
Abstract: E‐Dunhuang is a digital library, centered on visual material, of a UNESCO World Heritage site. This study aims to identify how users from different cultural backgrounds seek and use cultural heritage information on e‐Dunhuang, as well as their perceptions of and opinions about the platform. As a preliminary report, this poster focuses on the visual content (i.e., images and panoramas) of e‐Dunhuang, which is arguably the most prominent component of this platform. Results of usability tests and follow‐up interviews reveal that users had polarized opinions. The findings can inform cross‐cultural access to cultural heritage collections dominated by visual content.