Author

J. Stephen Downie

Bio: J. Stephen Downie is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Music information retrieval & Digital library. The author has an h-index of 30 and has co-authored 164 publications receiving 4135 citations. Previous affiliations of J. Stephen Downie include the University of Western Ontario & the National Center for Supercomputing Applications.


Papers
Proceedings ArticleDOI
20 Jun 2022
TL;DR: This exploratory study proposes a prototype sentence-level parallel corpus to support studying optical character recognition (OCR) quality in curated digitized library collections and conducts an analysis of OCR errors with a specific focus on their associations with the source text metadata.
Abstract: This exploratory study proposes a prototype sentence-level parallel corpus to support studying optical character recognition (OCR) quality in curated digitized library collections. Existing data resources, such as ICDAR2019 [21] and GT4HistOCR [23], generally align content by artifact publishing characteristics such as documents or lines, which limits their use for exploring OCR noise at natural-language granularities such as sentences and chapters. Building upon an existing volume-aligned corpus that collected human-proofread texts from Project Gutenberg and paired OCR views from HathiTrust Digital Library, we extracted and aligned 167,079 sentences from 189 sampled books in four domains published from 1793 to 1984. To support downstream research on OCR quality, we conducted an analysis of OCR errors with a specific focus on their associations with the source text metadata. We found that sampled data in agriculture has a higher ratio of real-word errors than other domains, while sentences from social-science volumes contain more non-word errors. In addition, data sampled from early volumes tend to have a high ratio of non-word errors, while samples from recently published volumes are likely to have more real-word errors. Following our findings, we suggest that scholars should consider the potential influence of source data characteristics on their findings in the study of OCR quality issues. CCS Concepts: • Information systems → Digital libraries and archives; • Applied computing → Document management and text processing; Document capture.
Journal ArticleDOI
01 Jan 2019
TL;DR: Wang et al. identify how users from different cultural backgrounds seek and use cultural heritage information on e-Dunhuang, as well as their perceptions and opinions of it.
Abstract: E-Dunhuang is a visual-material-centered digital library of a UNESCO World Heritage site. This study aims to identify how users from different cultural backgrounds seek and use cultural heritage information on e-Dunhuang, as well as their perceptions and opinions of it. As a preliminary report, this poster focuses on the visual content (i.e., images, panoramas) of e-Dunhuang, which is arguably the most prominent component of this platform. Results of usability tests and follow-up interviews reveal that users had polarized opinions. The findings can inform cross-cultural access to cultural heritage collections dominated by visual content.
Proceedings ArticleDOI
21 Jun 2010
TL;DR: This demonstration shows a recommender system for the Music Information Retrieval (MIR) research community that extracts the key topics and tags by analyzing the ten-year cumulative ISMIR proceedings, and recommends papers and research colleagues to users in an interactive way.
Abstract: In this demonstration, we show a recommender system for the Music Information Retrieval (MIR) research community. We extract the key topics and tags by analyzing the ten-year cumulative ISMIR proceedings, and recommend papers and research colleagues to users in an interactive way.
Proceedings ArticleDOI
01 Aug 2020
TL;DR: This paper provides a use case utilizing an English literature dataset of 178,381 volumes curated by the HathiTrust Research Center (HTRC) for measuring the change of three literature genres.
Abstract: This paper investigates the limitations and challenges of the curated datasets provided by digital libraries in support of digital humanities (DH) research. Our presented work provides a use case utilizing an English literature dataset of 178,381 volumes curated by the HathiTrust Research Center (HTRC) for measuring the change of three literature genres. These volumes were selected from over 17 million digitized items in the HathiTrust Digital Library. We demonstrate our methods and workflow for improving the representativeness and scholarly usability of the existing datasets. We analyzed and effectively overcame three common limitations: duplicate volumes, uneven distribution of data and OCR errors. We suggest that stakeholders of digital libraries should flag and address these limitations to improve their provisions' usability in the context of digital humanities research.
Journal ArticleDOI
TL;DR: The development of the synthetic biology knowledge system (SBKS) text processing pipeline is described, which uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers.
Abstract: Scientific articles contain a wealth of information about experimental methods and results describing biological designs. Due to its unstructured nature and multiple sources of ambiguity and variability, extracting this information from text is a difficult task. In this paper, we describe the development of the synthetic biology knowledge system (SBKS) text processing pipeline. The pipeline uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers. Specifically, we apply named entity recognition, relation extraction, concept grounding, and topic modeling to extract information from published literature to link articles to elements within our knowledge system. Our results show the efficacy of each of the components on synthetic biology literature and provide future directions for further advancement of the pipeline.

Cited by
Journal ArticleDOI
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.

1,652 citations

Book
19 Apr 2012
TL;DR: This book introduces concepts relevant to information behavior; models, paradigms, and theories in the study of information behavior; methods for studying information behavior; and research results and reflections.
Abstract: Abbreviated Contents: Figures and Tables; Preface; Introduction and Examples; Concepts Relevant to Information Behavior; Models, Paradigms, and Theories in the Study of Information Behavior; Methods for Studying Information Behavior; Research Results and Reflections; Appendix: Glossary; Appendix: Questions for Discussion and Application; References; Index.

1,347 citations

Book
01 Jan 1972
TL;DR: "Invisible Colleges: Diffusion of Knowledge in Scientific Communities" is presented as one of the books that offers many advantages, not only for the reader but also for other people.
Abstract: No matter what your activities are, reading will always be needed. It is not only a way to fulfil duties that must be finished by a deadline. Reading will encourage your mind and thoughts. Of course, reading will greatly develop your experience of everything. Reading "Invisible Colleges: Diffusion of Knowledge in Scientific Communities" is also presented as one of the books that offers many advantages. The advantages are not only for you, but also for other people, with meaningful benefits.

1,262 citations

Book
14 Apr 2006
TL;DR: In "Sweet Anticipation", David Huron uses a psychological theory of expectation to explain how music evokes emotions, written for readers interested in cognitive science and evolutionary psychology as well as music.
Abstract: A theory of expectations is used to explain how music evokes various emotions for readers interested in cognitive science and evolutionary psychology as well as music. The psychological theory of expectation that David Huron proposes in "Sweet Anticipation" grew out of experimental efforts to understand how music evokes emotions. These efforts evolved into a general theory of expectation that will prove informative to readers interested in cognitive science and evolutionary psychology as well as those interested in music. The book describes a set of psychological mechanisms and illustrates how these mechanisms work in the case of music. All examples of notated music can be heard on the Web. Huron proposes that emotions evoked by expectation involve five functionally distinct response systems: reactive responses (which engage defensive reflexes); tension responses (where uncertainty leads to stress); predictive responses (which reward accurate prediction); imaginative responses (which facilitate deferred gratification); and appraisal responses (which occur after conscious thought is engaged). For real-world events, these five response systems typically produce a complex mixture of feelings. The book identifies some of the aesthetic possibilities afforded by expectation, and shows how common musical devices (such as syncopation, cadence, meter, tonality, and climax) exploit the psychological opportunities. The theory also provides new insights into the physiological psychology of awe, laughter, and "spine-tingling chills." Huron traces the psychology of expectations from the patterns of the physical/cultural world through imperfectly learned heuristics used to predict that world to the phenomenal qualia experienced by those who apprehend the world.

1,158 citations