
Showing papers by "J. Stephen Downie published in 2018"


Proceedings ArticleDOI
23 May 2018
TL;DR: This work distills from prior user studies three key objectives for worksets (extra-digital library manipulation, intra-item properties, and robust representations) and describes how HTRC's implementation of its RDF-compliant workset model helps to satisfy these objectives.
Abstract: Scholars using digital libraries and archives routinely create worksets -- aggregations of digital objects -- as a way to segregate resources of interest for in-depth scrutiny. To illustrate how worksets can enhance the scholarly utility of digital library content, we distill from prior user studies three key objectives for worksets (extra-digital library manipulation, intra-item properties, and robust representations), and discuss how they motivated the workset model being developed at the HathiTrust Research Center (HTRC). We describe how HTRC's implementation of its RDF-compliant workset model helps to satisfy these objectives.
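
As an illustration of the general idea (not HTRC's published model), here is a minimal Python sketch of a workset expressed as an RDF aggregation with rdflib; the wst: vocabulary, URIs, and volume IDs are invented for this example:

    # Sketch: representing a workset as an RDF aggregation with rdflib.
    # The wst: vocabulary below is a hypothetical stand-in for HTRC's model.
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DC, RDF

    WST = Namespace("http://example.org/workset#")  # assumed namespace
    g = Graph()
    g.bind("wst", WST)
    g.bind("dc", DC)

    workset = URIRef("http://example.org/worksets/victorian-novels")
    g.add((workset, RDF.type, WST.Workset))
    g.add((workset, DC.creator, Literal("J. Scholar")))
    g.add((workset, DC.title, Literal("Victorian novels, 1860-1880")))

    # Each gathered item points back to the digital object it aggregates,
    # so the workset can be manipulated outside the originating library.
    for vol_id in ["mdp.39015012345678", "uc1.b000987654"]:
        item = URIRef(f"http://example.org/worksets/victorian-novels#{vol_id}")
        g.add((workset, WST.gathers, item))
        g.add((item, WST.sourceVolume, Literal(vol_id)))

    print(g.serialize(format="turtle"))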

3 citations


Proceedings ArticleDOI
23 May 2018
TL;DR: This work investigates a data-driven vector representation of word embeddings for the task of classifying song lyrics into their semantic topics and adopts the averaged word vectors from the lyrics and users' interpretations of them, which are short in general.
Abstract: In this work we investigate a data-driven vector representation of word embeddings for the task of classifying song lyrics into their semantic topics. Previous research on topic classification of song lyrics has used traditional frequency-based text representations. In contrast, empirically driven word embeddings have shown notable performance improvements on text classification tasks because of their ability to capture semantic relationships between words from big data. As averaging the word vectors of a short text is known to work reasonably well compared to other, more comprehensive models that exploit word order, we adopt the averaged word vectors from the lyrics and users' interpretations of them, which are short in general, as the feature for this classification task. This simple approach achieved a promising classification accuracy of 57%. From this, we envision the potential of data-driven approaches to creating features, such as sequences of word vectors and doc2vec models, to improve the performance of the system.
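
To make the averaged-word-vector feature concrete, here is a minimal Python sketch; the toy embedding dictionary stands in for a pretrained word2vec-style lookup, and scikit-learn's logistic regression substitutes for whatever classifier the authors used:

    # Sketch: classify lyrics by topic via averaged word embeddings.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy stand-in for a pretrained embedding lookup (assumed, not real data).
    rng = np.random.default_rng(0)
    vocab = ["love", "heart", "dance", "party", "night", "tears"]
    embeddings = {w: rng.normal(size=50) for w in vocab}

    def average_vector(text, dim=50):
        # Average vectors of in-vocabulary tokens; zero vector if none hit.
        vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    lyrics = ["love heart tears", "dance party night", "heart love", "party dance"]
    topics = ["love", "party", "love", "party"]

    X = np.vstack([average_vector(t) for t in lyrics])
    clf = LogisticRegression(max_iter=1000).fit(X, topics)
    print(clf.predict([average_vector("tears love night")]))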

3 citations


Proceedings ArticleDOI
23 May 2018
TL;DR: An extended example is presented of a web environment in use that allows users to search over 1 trillion tokens of text in the HathiTrust Part-of-Speech Extracted Features Dataset to help produce worksets for scholarly analysis.
Abstract: We report on the work undertaken developing a web environment that allows users to search over 1 trillion tokens of text -- down to the page-level -- of the HathiTrust Part-of-Speech Extracted Features Dataset to help produce worksets for scholarly analysis. We present an extended example of the web environment in use, along with details about its implementation.
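
By way of illustration only, here is a short Python sketch of page-level indexing over the dataset's files; the field names assume the Extracted Features 1.x JSON schema, and the file path and volume ID are hypothetical:

    # Sketch: a page-level inverted index over HathiTrust Extracted
    # Features files; field names assume the EF 1.x JSON schema.
    import bz2, json
    from collections import defaultdict

    index = defaultdict(list)  # token -> [(volume_id, page_seq), ...]

    def index_volume(path, volume_id):
        with bz2.open(path, "rt", encoding="utf-8") as f:
            ef = json.load(f)
        for page in ef["features"]["pages"]:
            for token in page["body"]["tokenPosCount"]:
                index[token.lower()].append((volume_id, page["seq"]))

    index_volume("mdp.39015012345678.json.bz2", "mdp.39015012345678")
    print(index.get("whale", [])[:10])  # pages on which "whale" occurs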

2 citations


Book ChapterDOI
25 Mar 2018
TL;DR: A large-scale dataset of similar artists recommended in four well-known online music streaming services, namely Spotify, Last.fm, the Echo Nest, and KKBOX, was collected; preliminary results reveal that similar artists in these services were related to the genre and popularity of the artists.
Abstract: In supporting music search, online music streaming services often suggest artists who are deemed as similar to those listened to or liked by users. However, there has been an ongoing debate on what constitutes artist similarity. Approaching this problem from an empirical perspective, this study collected a large-scale dataset of similar artists recommended in four well-known online music streaming services, namely Spotify, Last.fm, the Echo Nest, and KKBOX, on which an exploratory quantitative analysis was conducted. Preliminary results reveal that similar artists in these services were related to the genre and popularity of the artists. The findings shed light on how the concept of artist similarity is manifested in widely adopted real-world applications, which will in turn help enhance our understanding of music similarity and recommendation.
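
One simple way to quantify cross-service agreement of this kind, sketched here in Python with invented artist lists (not the study's data), is pairwise Jaccard overlap between the services' similar-artist sets:

    # Sketch: agreement between services' similar-artist lists via Jaccard.
    from itertools import combinations

    similar = {  # service -> artists recommended as similar to one seed artist
        "Spotify":  {"Artist A", "Artist B", "Artist C"},
        "Last.fm":  {"Artist B", "Artist C", "Artist D"},
        "EchoNest": {"Artist A", "Artist C", "Artist E"},
        "KKBOX":    {"Artist C", "Artist D", "Artist F"},
    }

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    for (s1, r1), (s2, r2) in combinations(similar.items(), 2):
        print(f"{s1} vs {s2}: {jaccard(r1, r2):.2f}")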

2 citations


Journal ArticleDOI
TL;DR: This paper provides insight into the mood metadata on Chinese music websites and uniquely contributes to existing knowledge of culturally diversified music access.
Abstract: Purpose: Music mood is an important metadata type on online music repositories and streaming music services worldwide. Many existing studies on mood metadata have focused on music websites and services in the Western world to the exclusion of those serving users in other cultures. The purpose of this paper is to bridge this gap by exploring mood labels on influential Chinese music websites. Design/methodology/approach: Mood labels and the associated song titles were collected from six Chinese music websites and analyzed in relation to mood models and findings in the literature. An online music listening test was conducted to solicit users' feedback on the mood labels on two popular Chinese music websites. Mood label selections on 30 songs from 64 Chinese listeners were collected and compared to those given by the two websites. Findings: Mood labels, although extensively employed on Chinese music websites, may be insufficient in meeting listeners' needs; more mood labels of high-arousal semantics are needed. Song language and users' familiarity with the songs influence their selection of the mood labels given by the websites. Practical implications: Suggestions are proposed for future development of mood metadata and mood-enabled user interfaces in the context of global online music access. Originality/value: This paper provides insight into the mood metadata on Chinese music websites and uniquely contributes to existing knowledge of culturally diversified music access.
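
A minimal Python sketch of the kind of agreement computation implied by the listening test; the songs, labels, and listener choices below are invented placeholders, not the paper's data:

    # Sketch: how often listeners' mood choices match a site's label.
    listener_choices = {  # song_id -> mood labels chosen by listeners
        "song01": ["sad", "sad", "calm"],
        "song02": ["happy", "excited", "happy"],
    }
    website_label = {"song01": "sad", "song02": "happy"}

    for song, choices in listener_choices.items():
        agree = sum(c == website_label[song] for c in choices) / len(choices)
        print(f"{song}: {agree:.0%} of listeners agree with the site label")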

2 citations


Proceedings ArticleDOI
28 Sep 2018
TL;DR: This paper highlights critical processes of data curation for digital libraries, including quality assessment of the ingested datasets, describes research questions enabled by JazzCats, raises musicological implications, and offers suggestions to overcome current limitations.
Abstract: Applying Linked Data techniques to musical metadata can facilitate new paths of musicological inquiry. JazzCats: Jazz Collection of Aggregated Triples is a prototype project interlinking four discrete jazz performance datasets and external sources as references. Tabular, relational, and graph legacy datasets have necessitated different RDF production and ingestion workflows to support scholarly study of performance traditions. This paper highlights critical processes of data curation for digital libraries, including quality assessment of the ingested datasets. In addition, we describe research questions enabled by JazzCats, raise musicological implications, and offer suggestions to overcome current limitations.
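
As a hedged illustration of quality assessment over an ingested RDF dataset (not the project's actual workflow), here is a short rdflib sketch that counts triples and flags untyped subjects in a hypothetical ingest dump:

    # Sketch: simple RDF quality checks with rdflib.
    from rdflib import Graph
    from rdflib.namespace import RDF

    g = Graph()
    g.parse("jazzcats_ingest.ttl", format="turtle")  # hypothetical dump file

    print(f"{len(g)} triples, {len(set(g.predicates()))} distinct predicates")

    # Subjects without an rdf:type often signal incomplete ingestion.
    untyped = {s for s in g.subjects() if (s, RDF.type, None) not in g}
    print(f"{len(untyped)} subjects without an rdf:type")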

1 citation


Proceedings ArticleDOI
09 Oct 2018
TL;DR: The jazz performance metadata prototype JazzCats: Jazz Collection of Aggregated Triples uses Linked Data to bridge four discrete jazz music datasets and is a new digital resource that can be used to support and enrich scholarship and research in musicology and performance studies.
Abstract: The jazz performance metadata prototype JazzCats: Jazz Collection of Aggregated Triples uses Linked Data to bridge four discrete jazz music datasets: Linked Jazz, with prosopographical and interpersonal information about musicians; the Weimar Jazz Database (WJazzD), containing musicological metadata; BodyS, a discography of the jazz standard "Body and Soul"; and J-DISC, a fourth independent but complementary and extensive discographic project. Through the use of custom-built ontological structures, the data, originally stored in various information structures, has been converted to RDF and merged in a single triplestore. The result is a new digital resource that can be used to support and enrich scholarship and research in musicology and performance studies.
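
To suggest how such a merged triplestore might be queried, here is a Python sketch using SPARQLWrapper; the endpoint URL and the performedIn property are assumptions for illustration, not JazzCats' published vocabulary:

    # Sketch: querying a merged triplestore with SPARQL via SPARQLWrapper.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.org/jazzcats/sparql")  # hypothetical
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name (COUNT(?performance) AS ?n)
        WHERE {
            ?musician foaf:name ?name ;
                      <http://example.org/vocab/performedIn> ?performance .
        }
        GROUP BY ?name ORDER BY DESC(?n) LIMIT 10
    """)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["name"]["value"], row["n"]["value"])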

1 citation


01 Jan 2018
TL;DR: This project seeks to pilot a classification process using manually assigned ground truth on a subset of volumes from the HathiTrust, and suggests full-scale deployment of a statistical classifier on a large corpus of literature in order to assemble a disability corpus.
Abstract: As literary texts open to researchers for distant reading (the computational analysis of large corpora of text for literary scholarship), problems beyond typical data science roadblocks, such as data scale and the statistical significance of findings, have emerged. For scholars studying character and social representation in literature, the identification of characters within the given classes of study is crucial, painstaking, and often a manual process. However, for characters with disabilities, manual identification is prohibitively difficult to undertake at scale, and especially challenging given the coded textual markers that can be used to refer to disability. There currently exists no corpus of characters with disabilities in fiction, which is the first step toward at-scale computational study of this topic. This project seeks to pilot a classification process using manually assigned ground truth on a subset of volumes from the HathiTrust. Having successfully built and evaluated a Naïve Bayes classifier, we suggest full-scale deployment of a statistical classifier on a large corpus of literature in order to assemble a disability corpus. This project also covers preliminary exploratory textual analysis of characters with disabilities to yield potential research questions for further exploration.
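
A minimal scikit-learn sketch of the kind of bag-of-words Naive Bayes pipeline piloted here; the passages and labels are toy placeholders, not the project's ground truth:

    # Sketch: Naive Bayes over bag-of-words features with scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    passages = [
        "he walked with a crutch after the accident",
        "she sang all evening at the fair",
        "confined to a wheelchair, he watched from the window",
        "they danced until midnight",
    ]
    labels = [1, 0, 1, 0]  # 1 = passage marks a character with a disability

    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    clf.fit(passages, labels)
    print(clf.predict(["the old sailor leaned on his wooden leg"]))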

1 citation


Proceedings ArticleDOI
23 May 2018
TL;DR: This work introduces the seeding algorithm and explores seeding strategies for identifying initial concepts in text volumes, such as books, that are stored in a digital library.
Abstract: Capisco identifies the concepts represented by words and phrases in a text, for which we use an automatically generated Concept-in-Context (CiC) network. Words and phrases rarely belong to a single concept; disambiguation in Capisco relies on the interplay between words that are in close vicinity in the text. The disambiguation starts with a seeding process that identifies the first concepts, which then form the context for further disambiguation steps. This paper introduces the seeding algorithm and explores seeding strategies for identifying these initial concepts in text volumes, such as books, that are stored in a digital library.
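
One plausible reading of the seeding step, sketched in Python over a toy Concept-in-Context network; the data structure and the unambiguous-term heuristic are assumptions for illustration, not the paper's algorithm:

    # Sketch: seed disambiguation with terms that map to a single concept.
    cic = {  # term -> set of candidate concepts (assumed structure)
        "java":        {"Java_(language)", "Java_(island)"},
        "python":      {"Python_(language)", "Python_(snake)"},
        "compiler":    {"Compiler"},
        "interpreter": {"Interpreter_(computing)"},
    }

    def seed(terms):
        # Unambiguous terms become seeds; they anchor later disambiguation
        # of ambiguous neighbors in the surrounding text.
        return {t: next(iter(cic[t])) for t in terms
                if t in cic and len(cic[t]) == 1}

    page_terms = ["java", "compiler", "interpreter"]
    print(seed(page_terms))
    # -> {'compiler': 'Compiler', 'interpreter': 'Interpreter_(computing)'}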

1 citation


Proceedings ArticleDOI
23 May 2018
TL;DR: A bespoke proxying technique called Meddle -- for ModifiED Digital Library Environment -- is detailed: a lightweight, agile technique that operates independently of the originating digital library and helps address identified pitfalls in a DL search interface.
Abstract: We document how surprisingly easy it is for user misconceptions to arise when using digital library search interfaces, and the significant unseen impact this can have on the user's interpretation of search results. Further, we detail a bespoke proxying technique we have devised called Meddle -- for ModifiED Digital Library Environment -- a lightweight, agile technique that operates independently of the originating digital library and helps address identified pitfalls in a DL search interface.
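
A minimal sketch of a Meddle-style rewriting proxy in Python with Flask and requests; the upstream URL and the rewrite rule are invented for illustration, and this is not the authors' implementation:

    # Sketch: a lightweight proxy that rewrites a DL's search pages in transit.
    import requests
    from flask import Flask, request, Response

    app = Flask(__name__)
    UPSTREAM = "https://dl.example.org"  # hypothetical digital library

    @app.route("/<path:path>")
    def proxy(path):
        upstream = requests.get(f"{UPSTREAM}/{path}", params=request.args)
        body = upstream.text
        if "text/html" in upstream.headers.get("Content-Type", ""):
            # Example rewrite: surface how the DL actually treats the query,
            # correcting a misconception the original interface invites.
            body = body.replace("Results for your search",
                                "Results for your search (quotes are ignored)")
        return Response(body, status=upstream.status_code,
                        content_type=upstream.headers.get("Content-Type"))

    if __name__ == "__main__":
        app.run(port=8080)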

Proceedings ArticleDOI
23 May 2018
TL;DR: Within the context of mass-scale digital libraries, this panel will explore methodologies and uses for -- as well as the results of -- conceiving of "data as collections" and "collections as data" through use cases involving data mining of the HathiTrust Digital Library.
Abstract: Within the context of mass-scale digital libraries, this panel will explore methodologies and uses for -- as well as the results of -- conceiving of "data as collections" and "collections as data." The panel will explore the implications of these concepts through use cases involving data mining of the HathiTrust Digital Library, particularly major projects developed at the HathiTrust Research Center. Featured will be the Workset Creation for Scholarly Analysis + Data Capsules (WCSA+DC) project, the Solr Extracted Features project, and the Image Analysis for Archival Discovery (Aida) project. Each of these projects focuses on various aspects of text, image and data mining and analysis of mass-scale digital library collections.