Author

J. Stephen Downie

Bio: J. Stephen Downie is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Music information retrieval & Digital library. The author has an h-index of 30 and has co-authored 164 publications receiving 4135 citations. Previous affiliations of J. Stephen Downie include University of Western Ontario & National Center for Supercomputing Applications.


Papers
Journal ArticleDOI
TL;DR: This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets and presents one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion-work HathiTrust Digital Library.
Abstract: The emergence of large multi‐institutional digital libraries has opened the door to aggregate‐level examinations of the published word. Such large‐scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.

10 citations

Proceedings Article
01 Jan 2012
TL;DR: In this paper, the authors assess contemporary MIR solutions for the reuse and interoperability of methods and tools, align them with the emerging notion of Research Objects for reproducible research in other domains, and propose the adoption of Research Objects as a route to reuse in MIR.
Abstract: Many solutions for the reuse and remixing of MIR methods and the tools implementing them have been introduced over recent years. Proposals for achieving the necessary interoperability have ranged from shared software libraries and interfaces, through common frameworks and portals, to standardised file formats and metadata. Each proposal shares the desire to reuse and combine repurposable components into assemblies (or “workflows”) that can be used in novel and possibly more ambitious ways. Reuse and remixing also have great implications for the process of MIR research. The encapsulation of any algorithm and its operation ‐ including inputs, parameters, and outputs ‐ is fundamental to the repeatability and reproducibility of any experiment. This is desirable both for the open and reliable evaluation of algorithms (e.g. in MIREX) and for the advancement of MIR by building more effectively upon prior research. At present there is no clear best practice widely adopted throughout the community. Should this be considered a failure? Are there limits to interoperability unique to MIR, and how might they be overcome? In this paper we assess contemporary MIR solutions to these issues, aligning them with the emerging notion of Research Objects for reproducible research in other domains, and propose their adoption as a route to reuse in MIR.

10 citations

DOI
01 Mar 2017
TL;DR: The Extracted Features (EF) dataset is developed, a dataset of quantitative counts for every page of nearly 5 million scanned books that includes unigram counts, part of speech tagging, header and footer extraction, counts of characters at both sides of the page, and more.
Abstract: Consortial collections have led to unprecedented scales of digitized corpora, but the insights that they enable are hampered by the complexities of access, particularly to in-copyright or orphan works. Pursuing a principle of non-consumptive access, we developed the Extracted Features (EF) dataset, a dataset of quantitative counts for every page of nearly 5 million scanned books. The EF includes unigram counts, part of speech tagging, header and footer extraction, counts of characters at both sides of the page, and more. Distributing book data with features already extracted saves resource costs associated with large-scale text use, improves the reproducibility of research done on the dataset, and opens the door to datasets on copyrighted books. We describe the coverage of the dataset and demonstrate its useful application through duplicate book alignment and identification of their cleanest scans, topic modeling, word list expansion, and multifaceted visualization.

10 citations
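
The per-page features named in the abstract above (unigram counts and character counts at the edges of the page) can be illustrated with a minimal sketch. This is not the Extracted Features schema or the authors' pipeline: the function name `page_features`, the dictionary keys, the simple regex tokenizer, and the reading of "both sides of the page" as the first and last character of each non-empty line are all assumptions made for illustration.

```python
import re
from collections import Counter


def page_features(page_text):
    """Return simple count-based features for one page of text.

    Hypothetical helper: key names and tokenization are illustrative
    assumptions, not the published Extracted Features format.
    """
    # Keep only non-empty lines, with surrounding whitespace removed.
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    # Lowercased word tokens (a deliberately crude tokenizer).
    tokens = re.findall(r"\w+", page_text.lower())
    return {
        "token_count": len(tokens),
        "unigram_counts": Counter(tokens),
        # First and last character of each line, counted across the page.
        "begin_line_chars": Counter(ln[0] for ln in lines),
        "end_line_chars": Counter(ln[-1] for ln in lines),
    }


features = page_features("The cat sat.\nThe dog ran!\n")
print(features["unigram_counts"]["the"])
```

Because only counts are kept, the page text itself cannot be read back from the output, which is the idea behind the non-consumptive access the abstract describes.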

Proceedings ArticleDOI
21 Jun 2015
TL;DR: An automatic topic discovery system that uses web-mined, user-generated interpretations of songs to provide subject access to a music digital library is proposed, along with filtering techniques to identify high-quality topics.
Abstract: The assignment of subject metadata to music is useful for organizing and accessing digital music collections. Since manual subject annotation of large-scale music collections is labor-intensive, automatic methods are preferred. Topic modeling algorithms can be used to automatically identify latent topics from appropriate text sources. Candidate text sources such as song lyrics are often too poetic, resulting in lower-quality topics. Users' interpretations of song lyrics provide an alternative source. In this paper, we propose an automatic topic discovery system from web-mined user-generated interpretations of songs to provide subject access to a music digital library. We also propose and evaluate filtering techniques to identify high-quality topics. In our experiments, we use 24,436 popular songs that exist in both the Million Song Dataset and songmeanings.com. Topic models are generated using Latent Dirichlet Allocation (LDA). To evaluate the coherence of learned topics, we calculate the Normalized Pointwise Mutual Information (NPMI) of the top ten words in each topic based on occurrences in Wikipedia. Finally, we evaluate the resulting topics using a subset of 422 songs that have been manually assigned to six subjects. Using this system, 71% of the manually assigned subjects were correctly identified. These results demonstrate that topic modeling of song interpretations is a promising method for subject metadata enrichment in music digital libraries. It also has implications for affording similar access to collections of poetry and fiction.

10 citations
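
The coherence measure mentioned in the abstract above, NPMI over the top words of a topic, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes document-level co-occurrence in a small reference corpus (the paper used Wikipedia), averages over all word pairs, and scores pairs that never co-occur as -1. The function name `npmi_coherence` and the toy corpus are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of top_words.

    documents is a list of token lists; probabilities are estimated as
    the fraction of documents containing a word (or both words).
    """
    if len(top_words) < 2 or not documents:
        raise ValueError("need at least two words and one document")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        hits = sum(1 for d in doc_sets if all(w in d for w in words))
        return hits / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # assumed convention: never co-occur
        elif p12 == 1.0:
            scores.append(1.0)   # always co-occur; avoids dividing by log(1) = 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy reference corpus, for illustration only.
corpus = [["love", "heart"], ["love", "heart"], ["war", "death"], ["war", "death"]]
print(npmi_coherence(["love", "heart"], corpus))
print(npmi_coherence(["love", "war"], corpus))
```

Scores lie between -1 and 1, with higher values indicating that a topic's top words tend to appear together in the reference corpus.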

Journal ArticleDOI
14 Oct 2016
TL;DR: It is found that user‐generated interpretations always outperformed lyrics in terms of classification accuracy, suggesting that user interpretations are more useful in the subject classification task than lyrics because the semantically ambiguous poetic nature of lyrics tends to confuse classifiers.
Abstract: That music seekers consider song subject metadata to be helpful in their searching/browsing experience has been noted in prior published research. In an effort to develop a subject-based tagging system, we explored the creation of automatically generated song subject classifications. Our classifications were derived from two different sources of song-related text: 1) lyrics; and 2) user interpretations of lyrics collected from songmeanings.com. While both sources contain subject-related information, we found that user-generated interpretations always outperformed lyrics in terms of classification accuracy. This suggests that user interpretations are more useful in the subject classification task than lyrics because the semantically ambiguous poetic nature of lyrics tends to confuse classifiers. An examination of top-ranked terms and confusion matrices supported our contention that users' interpretations work better for detecting the meaning of songs than what is conveyed through lyrics.

10 citations


Cited by
Journal ArticleDOI
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.

1,652 citations

Book
19 Apr 2012
TL;DR: This book introduces concepts relevant to information behavior; models, paradigms, and theories in the study of information behavior; methods for studying information behavior; and research results and reflections.
Abstract: Abbreviated Contents: Figures and Tables; Preface; Introduction and Examples; Concepts Relevant to Information Behavior; Models, Paradigms, and Theories in the Study of Information Behavior; Methods for Studying Information Behavior; Research Results and Reflections; Appendix: Glossary; Appendix: Questions for Discussion and Application; References; Index.

1,347 citations

Book
01 Jan 1972
TL;DR: Reading "Invisible Colleges: Diffusion of Knowledge in Scientific Communities" is presented as one of the collective books that gives many advantages, not only for the reader but for other people as well.
Abstract: No matter what your activities are, reading will always be needed. It is not only to fulfil the duties that you need to finish by a deadline. Reading will encourage your mind and thoughts. Of course, reading will greatly develop your experiences about everything. Reading "Invisible Colleges: Diffusion of Knowledge in Scientific Communities" is also a way, as one of the collective books that gives many advantages. The advantages are not only for you, but for other people, with those meaningful benefits.

1,262 citations

Book
14 Apr 2006
TL;DR: In the book "Sweet Anticipation", a theory of expectation is used to explain how music evokes various emotions; the book is aimed at readers interested in cognitive science and evolutionary psychology as well as music.
Abstract: A theory of expectations is used to explain how music evokes various emotions for readers interested in cognitive science and evolutionary psychology as well as music. The psychological theory of expectation that David Huron proposes in "Sweet Anticipation" grew out of experimental efforts to understand how music evokes emotions. These efforts evolved into a general theory of expectation that will prove informative to readers interested in cognitive science and evolutionary psychology as well as those interested in music. The book describes a set of psychological mechanisms and illustrates how these mechanisms work in the case of music. All examples of notated music can be heard on the Web. Huron proposes that emotions evoked by expectation involve five functionally distinct response systems: reactive responses (which engage defensive reflexes); tension responses (where uncertainty leads to stress); predictive responses (which reward accurate prediction); imaginative responses (which facilitate deferred gratification); and appraisal responses (which occur after conscious thought is engaged). For real-world events, these five response systems typically produce a complex mixture of feelings. The book identifies some of the aesthetic possibilities afforded by expectation, and shows how common musical devices (such as syncopation, cadence, meter, tonality, and climax) exploit the psychological opportunities. The theory also provides new insights into the physiological psychology of awe, laughter, and "spine-tingling chills." Huron traces the psychology of expectations from the patterns of the physical/cultural world through imperfectly learned heuristics used to predict that world to the phenomenal qualia experienced by those who apprehend the world.

1,158 citations