Author

Ichiro Fujinaga

Bio: Ichiro Fujinaga is an academic researcher at McGill University. He has contributed to research on optical music recognition and music information retrieval, has an h-index of 29, and has co-authored 164 publications receiving 3,353 citations. His previous affiliations include Marianopolis College and Johns Hopkins University.


Papers
Journal ArticleDOI
TL;DR: An overview of the literature concerning the automatic analysis of images of printed and handwritten musical scores is presented, along with a reference scheme for any researcher wanting to compare new OMR algorithms against well-known ones.
Abstract: For centuries, music has been shared and remembered through two traditions: aural transmission and written documents, normally called musical scores. Many of these scores exist only as unpublished manuscripts and hence are in danger of being lost through the normal ravages of time. Preserving the music requires some form of typesetting or, ideally, a computer system that can automatically decode the symbolic images and create new scores. Programs analogous to optical character recognition systems, called optical music recognition (OMR) systems, have been under intensive development for many years. However, the results to date are far from ideal. Because each of the proposed methods emphasizes different properties, it is difficult to evaluate their competitive advantages effectively. This article provides an overview of the literature concerning the automatic analysis of images of printed and handwritten musical scores. For self-containment and for the benefit of the reader, an introduction to OMR processing systems precedes the literature overview. The study also presents a reference scheme for any researcher wanting to compare new OMR algorithms against well-known ones.

246 citations

Proceedings Article
01 Jan 2005
TL;DR: jAudio is a new framework for feature extraction designed to eliminate the duplication of effort in calculating features from an audio signal; it provides a unique method of handling multidimensional features and a new dependency-handling mechanism that prevents duplicate calculations.
Abstract: jAudio is a new framework for feature extraction designed to eliminate the duplication of effort in calculating features from an audio signal. This system meets the needs of MIR researchers by providing a library of analysis algorithms that are suitable for a wide array of MIR tasks. In order to provide these features with a minimal learning curve, the system implements a GUI that makes the process of selecting desired features straightforward. A command-line interface is also provided to manipulate jAudio via scripting. Furthermore, jAudio provides a unique method of handling multidimensional features and a new mechanism for dependency handling to prevent duplicate calculations. The system takes a sequence of audio files as input. In the GUI, users select the features that they wish to have extracted (letting jAudio take care of all dependency problems) and either execute directly from the GUI or save the settings for batch processing. The output is either an ACE XML file or an ARFF file, depending on the user's preference.
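
To illustrate the dependency-handling idea, here is a minimal Python sketch of feature extraction in which each feature declares its dependencies and is computed at most once per signal. The registry, feature names, and formulas are illustrative assumptions, not jAudio's actual (Java) API.

    # Sketch of dependency-aware feature extraction; not jAudio code.
    import numpy as np

    FEATURES = {}  # feature name -> (dependency names, computing function)

    def feature(name, deps=()):
        def register(fn):
            FEATURES[name] = (deps, fn)
            return fn
        return register

    @feature("rms")
    def rms(signal, deps):
        return float(np.sqrt(np.mean(signal ** 2)))

    @feature("spectrum")
    def spectrum(signal, deps):
        return np.abs(np.fft.rfft(signal))

    @feature("spectral_centroid", deps=("spectrum",))
    def spectral_centroid(signal, deps):
        mag = deps["spectrum"]  # reused from the cache, never recomputed
        bins = np.arange(len(mag))
        return float(np.sum(bins * mag) / (np.sum(mag) + 1e-12))

    def extract(signal, requested, cache=None):
        # Resolve dependencies recursively; each feature is computed once.
        cache = {} if cache is None else cache
        result = {}
        for name in requested:
            if name not in cache:
                deps, fn = FEATURES[name]
                cache[name] = fn(signal, extract(signal, deps, cache))
            result[name] = cache[name]
        return result

    one_second_a440 = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
    print(extract(one_second_a440, ["rms", "spectral_centroid"]))

If "spectrum" were requested alongside "spectral_centroid", the shared cache would still compute it only once, which is the duplicate-calculation problem the paper addresses.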

181 citations

Proceedings Article
01 Jan 2004
TL;DR: This paper presents a system that extracts 109 musical features from symbolic (MIDI) recordings and uses them to classify the recordings by genre; it also argues for the importance of using high-level musical features, something that has been largely neglected in automatic classification systems to date in favour of low-level features.
Abstract: This paper presents a system that extracts 109 musical features from symbolic recordings (MIDI, in this case) and uses them to classify the recordings by genre. The features used here are based on instrumentation, texture, rhythm, dynamics, pitch statistics, melody and chords. The classification is performed hierarchically using different sets of features at different levels of the hierarchy. Which features are used at each level, and their relative weightings, are determined using genetic algorithms. Classification is performed using a novel ensemble of feedforward neural networks and k-nearest neighbour classifiers. Arguments are presented emphasizing the importance of using high-level musical features, something that has been largely neglected in automatic classification systems to date in favour of low-level features. The effect on classification performance of varying the number of candidate features is examined in order to empirically demonstrate the importance of using a large variety of musically meaningful features. Two differently sized hierarchies are used in order to test the performance of the system under different conditions. Very encouraging classification success rates of 98% for root genres and 90% for leaf genres are obtained for a hierarchical taxonomy consisting of 9 leaf genres.
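
Below is a minimal scikit-learn sketch of the two-level hierarchical scheme with an ensemble of a feedforward neural network and a k-nearest-neighbour classifier. The synthetic features stand in for the 109 MIDI-derived features, the toy taxonomy is an illustrative assumption, and the paper's genetic-algorithm feature selection and weighting is omitted.

    # Sketch of two-level hierarchical genre classification; toy data.
    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    taxonomy = {"classical": ["baroque", "romantic"], "jazz": ["bebop", "swing"]}
    leaves = [l for ls in taxonomy.values() for l in ls]

    # Synthetic stand-in for extracted features: one cluster per leaf genre.
    X = np.vstack([rng.normal(i, 0.5, size=(40, 10)) for i in range(len(leaves))])
    y_leaf = np.repeat(leaves, 40)
    y_root = np.array([r for r, ls in taxonomy.items() for _ in range(40 * len(ls))])

    def ensemble():
        # Hard-voting ensemble of a feedforward net and a k-NN classifier.
        return VotingClassifier([
            ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ])

    root_clf = ensemble().fit(X, y_root)
    leaf_clfs = {r: ensemble().fit(X[y_root == r], y_leaf[y_root == r])
                 for r in taxonomy}

    def classify(x):
        root = root_clf.predict([x])[0]                # level 1: root genre
        return root, leaf_clfs[root].predict([x])[0]   # level 2: leaf genre

    print(classify(X[0]))

Classifying root and leaf levels with separately trained classifiers mirrors the paper's observation that different feature subsets discriminate best at different levels of the hierarchy.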

180 citations

Proceedings ArticleDOI
08 Oct 2006
TL;DR: A number of counterarguments that emphasize the importance of continuing research in automatic genre classification are presented and specific strategies for overcoming current performance limitations are discussed.
Abstract: Research in automatic genre classification has been producing increasingly small performance gains in recent years, with the result that some have suggested that such research should be abandoned in favor of more general similarity research. It has been further argued that genre classification is of limited utility as a goal in itself because of the ambiguities and subjectivity inherent to genre. This paper presents a number of counterarguments that emphasize the importance of continuing research in automatic genre classification. Specific strategies for overcoming current performance limitations are discussed, and a brief review of background research in musicology and psychology relating to genre is presented. Insights from these highly relevant fields are generally absent from discourse within the MIR community, and it is hoped that this review will help to encourage a more multi-disciplinary approach to automatic genre classification in the future.

156 citations

Journal ArticleDOI
TL;DR: A quantitative comparison of different algorithms for the removal of stafflines from music images is presented and a new skeletonization-based approach is suggested.
Abstract: This paper presents a quantitative comparison of different algorithms for the removal of stafflines from music images. It contains a survey of previously proposed algorithms and suggests a new skeletonization-based approach. We define three different error metrics, compare the algorithms with respect to these metrics, and measure their robustness with respect to certain image defects. Our test images are computer-generated scores on which we apply various image deformations typically found in real-world data. In addition to modern western music notation, our test set also includes historic music notation such as mensural notation and lute tablature. Our general approach and evaluation methodology are not specific to staff removal but are applicable to other segmentation problems as well.
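
For orientation, here is a minimal Python sketch of one of the simpler algorithm families such comparisons cover: projection-based staffline removal (not the paper's skeletonization-based approach). The synthetic image, the 80% row-density threshold, and the one-pixel line thickness are illustrative assumptions.

    # Sketch of projection-based staffline removal on a synthetic image.
    import numpy as np

    def remove_stafflines(img, thickness=1):
        # Rows that are almost entirely black are treated as stafflines;
        # pixels supported by ink above or below (e.g. stems, noteheads)
        # are kept so symbols crossing the staff survive.
        out = img.copy()
        staff_rows = np.where(img.sum(axis=1) > 0.8 * img.shape[1])[0]
        for r in staff_rows:
            for c in range(img.shape[1]):
                if img[r, c]:
                    above = img[max(r - thickness, 0), c]
                    below = img[min(r + thickness, img.shape[0] - 1), c]
                    if not (above or below):   # isolated line pixel: erase
                        out[r, c] = 0
        return out

    # Synthetic score fragment: five stafflines plus one vertical "stem".
    img = np.zeros((40, 100), dtype=np.uint8)
    img[[10, 16, 22, 28, 34], :] = 1
    img[8:30, 50] = 1
    cleaned = remove_stafflines(img)
    print(img.sum(), "->", cleaned.sum())      # the stem pixels survive

A sketch like this also shows why the paper's deformation tests matter: even mild rotation or line-thickness variation breaks the assumption that a staffline occupies a single dense pixel row.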

133 citations


Cited by
Journal ArticleDOI


08 Dec 2001, BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: an odd beast, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Proceedings ArticleDOI
25 Oct 2010
TL;DR: The openSMILE feature extraction toolkit is introduced, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities and has a modular, component based architecture which makes extensions via plug-ins easy.
Abstract: We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture which makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
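
The descriptor-then-functional pattern the abstract describes can be sketched in a few lines of Python. This illustrates the pattern only, using frame energy as a stand-in low-level descriptor; it is not openSMILE code, and the frame and hop sizes are illustrative assumptions.

    # Sketch of the low-level descriptor -> delta -> functionals pattern.
    import numpy as np

    def frame_energy(signal, frame=512, hop=256):
        # A simple per-frame low-level descriptor (stand-in for MFCCs etc.).
        n = 1 + (len(signal) - frame) // hop
        return np.array([np.sum(signal[i*hop : i*hop + frame] ** 2)
                         for i in range(n)])

    def delta(contour):
        # First-order delta regression over the descriptor contour.
        return np.gradient(contour)

    def functionals(contour):
        # Map a variable-length contour to a fixed-size statistical summary.
        return {"mean": contour.mean(), "std": contour.std(),
                "min": contour.min(), "max": contour.max()}

    sr = 16000
    signal = np.sin(2 * np.pi * 220 * np.arange(sr) / sr) * np.linspace(0, 1, sr)
    lld = frame_energy(signal)
    print(functionals(lld), functionals(delta(lld)))

Applying functionals to both the descriptor and its delta is what turns a variable-length audio signal into the fixed-length feature vector most classifiers expect.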

2,286 citations

Journal ArticleDOI
TL;DR: A novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework is proposed, achieving remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks.
Abstract: Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences of arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over prior art. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which clearly verifies its generality.
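
A minimal PyTorch sketch of the convolutional-recurrent-transcription design the abstract describes is given below. The layer sizes, the 32-pixel input height, and the 36-class alphabet are illustrative assumptions, far smaller than the published model.

    # Sketch of a tiny CRNN: CNN features -> BiLSTM -> CTC transcription.
    import torch
    import torch.nn as nn

    class TinyCRNN(nn.Module):
        def __init__(self, n_classes):
            super().__init__()
            self.cnn = nn.Sequential(                  # feature extraction
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.rnn = nn.LSTM(64 * 8, 128, bidirectional=True,
                               batch_first=True)       # sequence modeling
            self.fc = nn.Linear(256, n_classes + 1)    # +1 for the CTC blank

        def forward(self, x):                  # x: (batch, 1, 32, width)
            f = self.cnn(x)                    # (batch, 64, 8, width/4)
            f = f.permute(0, 3, 1, 2).flatten(2)  # columns become timesteps
            seq, _ = self.rnn(f)
            return self.fc(seq).log_softmax(-1)   # per-timestep class scores

    model = TinyCRNN(n_classes=36)
    logits = model(torch.randn(2, 1, 32, 100))     # -> (2, 25, 37)
    loss = nn.CTCLoss()(logits.permute(1, 0, 2),   # CTC expects (T, N, C)
                        torch.randint(1, 37, (2, 5)),
                        torch.tensor([25, 25]), torch.tensor([5, 5]))
    print(logits.shape, loss.item())

Because CTC aligns the per-column predictions to the label sequence during training, no character segmentation is needed, which is exactly the segmentation-free property claimed in point (2).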

2,184 citations