Journal ArticleDOI

An Automatic Synthesis of Musical Phrases from Multi-Pitch Samples

TL;DR: Contrary to common practice, this work proposes to record and utilize sound samples containing not a single pitch but short sequences of pitches, so that natural pitch transitions are preserved and phrases sound much smoother despite a very limited set of performance rules.
Abstract: Sound synthesizers are a natural element of a musician's toolset. In music arrangement, samplers can often produce satisfactory results, but this requires a combination of manual and automatic methods that can be arduous at times. Concatenative Sound Synthesis reproduces many natural performance- and expression-related nuances, but at the cost of a high demand for processing power. Here, another method of musical phrase synthesis aimed at music arrangement is presented that addresses these and other related issues. Contrary to common practice, we propose to record and utilize sound samples containing not a single pitch but short sequences of pitches. In effect, natural pitch transitions are preserved and phrases sound much smoother, despite using a very limited set of performance rules. The proof-of-concept implementation of the proposed method is discussed in detail, along with attempts at optimizing the resulting sound based on auditory tests. The limitations and future applications of the synthesizer are also discussed.
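The core idea can be illustrated with a short sketch: cover a target phrase with recorded samples that each contain a short sequence of pitches, so that as many pitch transitions as possible fall inside a sample, where they were played naturally. This is only a minimal illustration under assumed sample names and a greedy covering strategy, not the authors' implementation.

```python
# Illustrative sketch (not the paper's implementation): greedily cover a target
# phrase with recorded multi-pitch samples so that as many pitch transitions as
# possible fall *inside* a sample, where they were recorded naturally.

from typing import Dict, List, Tuple

# Hypothetical sample library: each key is the pitch sequence a recording contains
# (as MIDI note numbers), each value identifies the corresponding audio file.
SampleLibrary = Dict[Tuple[int, ...], str]

def cover_phrase(phrase: List[int], library: SampleLibrary,
                 max_len: int = 4) -> List[str]:
    """Return identifiers of samples whose pitch sequences cover the phrase."""
    chosen: List[str] = []
    i = 0
    while i < len(phrase):
        # Prefer the longest recorded pitch sequence starting at position i.
        for length in range(min(max_len, len(phrase) - i), 0, -1):
            segment = tuple(phrase[i:i + length])
            if segment in library:
                chosen.append(library[segment])
                i += length
                break
        else:
            raise ValueError(f"No sample covers pitch {phrase[i]}")
    return chosen

if __name__ == "__main__":
    library = {(60, 62, 64): "c4_d4_e4.wav",   # hypothetical file names
               (64, 65): "e4_f4.wav",
               (67,): "g4.wav"}
    print(cover_phrase([60, 62, 64, 64, 65, 67], library))
```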
Citations
Journal ArticleDOI
TL;DR: A comparative study is presented of music features derived from audio recordings: the same music pieces represented in different music genres, excerpts performed by different musicians, and songs performed by a musician whose style evolved over time.
Abstract: The paper presents a comparative study of music features derived from audio recordings: the same music pieces represented in different music genres, excerpts performed by different musicians, and songs performed by a musician whose style evolved over time. First, the origin and background of the division into music genres are briefly presented. Then, several objective parameters of an audio signal are recalled that have a straightforward interpretation in the context of perceptual relevance. Within the study, parameter values were extracted from music excerpts, gathered, and compared to determine to what extent they are similar within songs of the same performer or among samples representing the same piece.
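As a rough illustration of this kind of comparison, the sketch below extracts two simple signal parameters (RMS level and spectral centroid) from mono excerpts and measures how close they are; the feature set and distance measure are assumptions for illustration, not the parameters used in the cited study.

```python
# Illustrative sketch: extract simple audio features from excerpts and compare them.
# The feature set and distance measure here are examples, not those of the cited study.

import numpy as np

def features(signal: np.ndarray, sr: int) -> np.ndarray:
    """RMS level and spectral centroid (Hz) of a mono excerpt."""
    rms = np.sqrt(np.mean(signal ** 2))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, centroid])

def feature_distance(a: np.ndarray, b: np.ndarray, sr: int = 44100) -> float:
    """Normalized Euclidean distance between feature vectors (smaller = more similar)."""
    fa, fb = features(a, sr), features(b, sr)
    scale = np.maximum(np.abs(fa), np.abs(fb)) + 1e-12
    return float(np.linalg.norm((fa - fb) / scale))
```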

4 citations


Cites background from "An Automatic Synthesis of Musical P..."

  • ...One of the ways the songs can be described is by using their origin (J-rock, Brit pop), time when they are composed or performed, performance (Pluta et al., 2017), mood of music (Barthet et al., 2017; Plewa, Kostek, 2015), instruments used (symphonic, acoustic, rock), music techniques (riff, rap)…...

    [...]

Book ChapterDOI
04 Apr 2018
TL;DR: An algorithm is presented that uses the Kalman filter to combine simple phrase structure models with observed pitch differences within the phrase, refining the phrase model and hence adjusting the loudness and tempo qualities of the melody line.
Abstract: In this paper, we present an algorithm that uses the Kalman filter to combine simple phrase structure models with observed differences in pitch within the phrase to refine the phrase model and hence adjust the loudness level and tempo qualities of the melody line. We show how similar adjustments may be made to the accompaniment to introduce expressive attributes into a MIDI file representation of a score. In the paper, we show that subjects had some difficulty distinguishing between the resulting expressive renderings and human performances of the same score.
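Since the abstract does not give the model equations, the following is only a generic, hypothetical illustration of the idea: a scalar Kalman filter whose prediction follows a simple phrase-arch prior, whose observations are per-note pitch differences, and whose filtered state is mapped onto a loudness scaling.

```python
# Generic, hypothetical illustration of a Kalman-filter-based phrase model:
# the prediction follows a simple phrase "arch", observations are per-note pitch
# differences, and the filtered state is mapped to a loudness scaling factor.
# This is NOT the algorithm from the cited paper, only a sketch of the idea.

import numpy as np

def phrase_arch_prior(n_notes: int) -> np.ndarray:
    """Simple phrase-arch shape: rise towards the middle, fall at the end."""
    x = np.linspace(0.0, 1.0, n_notes)
    return np.sin(np.pi * x)            # 0 -> 1 -> 0 over the phrase

def kalman_loudness(pitches: np.ndarray, q: float = 0.01, r: float = 0.5) -> np.ndarray:
    """Fuse the arch prior with observed pitch differences; return loudness scale."""
    n = len(pitches)
    prior = phrase_arch_prior(n)
    obs = np.diff(pitches, prepend=pitches[0]) / 12.0    # pitch change in octaves
    x, p = prior[0], 1.0                                  # state estimate and variance
    out = np.empty(n)
    for i in range(n):
        # Predict: drift the state towards the phrase-arch prior.
        x, p = x + 0.5 * (prior[i] - x), p + q
        # Update: treat the pitch difference as a noisy observation of the state.
        k = p / (p + r)
        x, p = x + k * (obs[i] - x), (1.0 - k) * p
        out[i] = x
    return 0.8 + 0.4 * out     # map state to a loudness scaling around 1.0

if __name__ == "__main__":
    print(kalman_loudness(np.array([60, 62, 64, 65, 67, 65, 64, 62, 60])))
```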
References
Journal ArticleDOI
TL;DR: This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis.
Abstract: The automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices. Progress in this area has been made possible by advances in linguistic theory, acoustic-phonetic characterization of English sound patterns, perceptual psychology, mathematical modeling of speech production, structured programming, and computer hardware design. This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis. Examples of rules are used liberally to illustrate the state of the art. Many of the examples are taken from Klattalk, a text-to-speech system developed by the author. A number of scientific problems are identified that prevent current systems from achieving the goal of completely human-sounding speech. While the emphasis is on rule programs that drive a formant synthesizer, alternatives such as articulatory synthesis and waveform concatenation are also reviewed. An extensive bibliography has been assembled to show both the breadth of synthesis activity and the wealth of phenomena covered by rules in the best of these programs. A recording of selected examples of the historical development of synthetic speech, enclosed as a 33 1/3-rpm record, is described in the Appendix.

843 citations


"An Automatic Synthesis of Musical P..." refers methods in this paper

  • ...In particular the Concatenative Sound Synthesis (CSS), inspired by a speech synthesis method (Klatt, 1983; Prudon, 2003), seems promising....

    [...]

Book
02 Sep 2004
TL;DR: Meter is treated as a kind of attentional behavior, and the book surveys research on rhythmic perception and production, the neurobiology and development of rhythm, meter-rhythm interactions, metric well-formedness, and non-isochronous meters.
Abstract: Table of Contents: Introduction; Meter as a Kind of Attentional Behavior; Relevant Research on Rhythmic Perception and Production; The Neurobiology and Development of Rhythm; Meter-Rhythm Interactions I: Ground Rules; Metric Representations and Metric Well-Formedness; Meter-Rhythm Interactions II: Problems; Metric Flux in Beethoven's Fifth; Non-Isochronous Meters; NI-Meters in Theory and Practice; The Many Meters Hypothesis; Conclusion; Notes; Bibliography; Index.

550 citations


"An Automatic Synthesis of Musical P..." refers background in this paper

  • ...Such values are acceptable in comparison to 100 ms – a value considered to be the fastest perceptual musical separation possible (London, 2004)....

    [...]

Journal ArticleDOI
TL;DR: The KTH rule system models performance principles used by musicians when performing a musical score, within the realm of Western classical, jazz and popular music, by using selections of rules and rule quantities to model semantic descriptions such as emotional expressions.
Abstract: The KTH rule system models performance principles used by musicians when performing a musical score, within the realm of Western classical, jazz and popular music. An overview is given of the major rules involving phrasing, micro-level timing, metrical patterns and grooves, articulation, tonal tension, intonation, ensemble timing, and performance noise. By using selections of rules and rule quantities, semantic descriptions such as emotional expressions can be modeled. A recent real-time implementation provides the means for controlling the expressive character of the music. The communicative purpose and meaning of the resulting performance variations are discussed as well as limitations and future improvements.
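The published KTH rules come with precise quantities; the snippet below is only a schematic, hypothetical example of the general mechanism (per-note adjustments of amplitude and timing driven by a rule with a user-controlled quantity k) and does not reproduce any actual KTH rule values.

```python
# Schematic illustration of a rule-based performance adjustment (not actual KTH
# rule values): scale per-note amplitude and inter-onset spacing according to
# a "phrase arch" rule with a user-controlled rule quantity k.

from dataclasses import dataclass, replace
from typing import List
import math

@dataclass
class Note:
    pitch: int          # MIDI note number
    onset: float        # seconds
    duration: float     # seconds
    amplitude: float    # linear gain, 1.0 = nominal

def apply_phrase_arch(notes: List[Note], k: float = 0.2) -> List[Note]:
    """Soften and slightly lengthen notes near phrase boundaries (schematic rule)."""
    n = len(notes)
    out: List[Note] = []
    t = notes[0].onset
    for i, note in enumerate(notes):
        pos = i / max(n - 1, 1)                    # 0 at start, 1 at end of phrase
        arch = math.sin(math.pi * pos)             # 0 at boundaries, 1 mid-phrase
        amp = note.amplitude * (1.0 + k * (arch - 0.5))
        stretch = 1.0 + k * (0.5 - arch)           # slower near phrase boundaries
        out.append(replace(note, onset=t, amplitude=amp,
                           duration=note.duration * stretch))
        t += note.duration * stretch               # next onset follows the stretch
    return out
```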

214 citations


"An Automatic Synthesis of Musical P..." refers background or methods in this paper

  • ...The current proposal: Through our survey of the literature, we noticed that a great number of performance rules described in the literature (Friberg et al., 2006; Delekta, Pluta, 2015) refers to note transitions....

    [...]

  • ...In case of synthesizers which generate individual pitches separately, the rules are applied to the following parameters: signal amplitude, inter-onset duration (note spacing), offset to onset duration (note duration), vibrato amplitude, and deviation from 12-TET tuning (Friberg et al., 2006)....

    [...]

  • ...A vast amount of research was carried out in order to establish a quantitative description of this phenomenon (Bresin, 1998; Friberg et al., 2006; Gabrielsson, 1985; Widmer, 1995; 2002;Widmer, Tobudic, 2003), which resulted in a number of universal “performance rules”....

    [...]

Proceedings Article
01 Jan 1991
TL;DR: It appears inevitable that sampling synthesis will migrate toward spectral modeling; physical models provide more compact algorithms for generating familiar classes of sounds, such as strings and woodwinds, while wavelet transforms could conceivably help solve the transient modeling problem.
Abstract: Abstract-algorithm synthesis seems destined to diminish in importance due to the lack of analysis support. As many algorithmic synthesis attempts showed us long ago, it is difficult to find a wide variety of musically pleasing sounds by exploring the parameters of a mathematical expression. Apart from a musical context that might impart some meaning, most sounds are simply uninteresting. The most straightforward way to obtain interesting sounds is to draw on past instrument technology or natural sounds. Both spectral-modeling and physical-modeling synthesis techniques can model such sounds. In both cases the model is determined by an analysis procedure that computes optimal model parameters to approximate a particular input sound. The musician manipulates the parameters to create musical variations. Obtaining better control of sampling synthesis will require more general sound transformations. To proceed toward this goal, transformations must be understood in terms of what we hear. The best way we know to understand a sonic transformation is to study its effect on the short-time spectrum, where the spectrum-analysis parameters are tuned to match the characteristics of hearing as closely as possible. Thus, it appears inevitable that sampling synthesis will migrate toward spectral modeling. If abstract methods disappear and sampling synthesis is absorbed into spectral modeling, this leaves only two categories: physical modeling and spectral modeling. This boils all synthesis techniques down to those which model either the source or the receiver of the sound. Some characteristics of each case are listed in the following table:

Spectral Modeling                 Physical Modeling
Fully general                     Specialized case by case
Any basilar membrane skyline      Any instrument at some cost
Time and frequency domains        Time and space domains
Numerous time-freq envelopes      Several physical variables
Memory requirements large         More compact description
Large operation-count/sample      Small to large complexity
Stochastic part initially easy    Stochastic part usually tricky
Attacks difficult                 Attacks natural
Articulations difficult           Articulations natural
Expressivity limited              Expressivity unlimited
Nonlinearities difficult          Nonlinearities natural
Delay/reverb hard                 Delay/reverb natural
Can calibrate to nature           Can calibrate to nature
Can calibrate to any sound        May calibrate to own sound
Physics not too helpful           Physics very helpful
Cartoons from pictures            Working models from all clues
Evolution restricted              Evolution unbounded
Represents sound receiver         Represents sound source

Since spectral modeling directly constructs the spectrum received along the basilar membrane of the ear, its scope is inherently broader than that of physical modeling. However, physical models provide more compact algorithms for generating familiar classes of sounds, such as strings and woodwinds. Also, they are generally more efficient at producing effects in the spectrum arising from attack articulations, long delays, pulsed noise, or nonlinearity in the physical instrument. It is also interesting to pause and consider how invariably performing musicians have interacted with resonators since the dawn of time in music.
When a resonator has an impulse-response duration greater than that of a spectral frame (nominally the "integration time" of the ear), as happens with any string, implementation of the resonator directly in the short-time spectrum becomes inconvenient. A resonator is much easier to implement as a recursion than as a super-thin formant in a short-time spectrum. Of course, as Orion Larson says (private communication, 1976): "Anything is possible in software." Spectral modeling has unsolved problems in the time domain: it is not yet known how best to modify a short-time Fourier analysis in the vicinity of an attack or other phase-sensitive transient. Phase is important during transients and not during steady-state intervals; a proper time-varying spectrum model should retain phase only where needed for accurate synthesis. The general question of timbre perception of non-stationary sounds becomes important. Wavelet transforms support more general signal building blocks that could conceivably help solve the transient modeling problem. Most activity with wavelet transforms to date has been confined to basic constant-Q spectrum analysis, where the analysis filters are aligned to a logarithmic frequency grid and have a constant ratio of bandwidth to center frequency, or Q. Spectral models are also not yet sophisticated; sinusoids and filtered noise with piecewise-linear envelopes are a good start, but surely there are other good primitives. (J.O. Smith, "Viewpoints on the History of Digital Synthesis," Proc. Int. Computer Music Conf. (ICMC-91), Montreal, pp. 1-10, Oct. 1991.)
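As an aside to the resonator remark above, a two-pole resonator implemented as a time-domain recursion takes only a few lines; the sketch below is a standard textbook form, not code from the cited paper.

```python
# Standard two-pole resonator implemented as a time-domain recursion:
#   y[n] = 2*r*cos(theta)*y[n-1] - r^2*y[n-2] + x[n]
# where theta = 2*pi*f0/fs sets the resonance frequency and r < 1 its bandwidth.

import math
from typing import Iterable, List

def resonator(x: Iterable[float], f0: float, fs: float, r: float = 0.995) -> List[float]:
    theta = 2.0 * math.pi * f0 / fs
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    y: List[float] = []
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

if __name__ == "__main__":
    # The impulse response rings at roughly 440 Hz for a 44.1 kHz sample rate.
    impulse = [1.0] + [0.0] * 999
    print(max(resonator(impulse, 440.0, 44100.0)))
```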

118 citations


"An Automatic Synthesis of Musical P..." refers background in this paper

  • ...Even though not all consider sampling as a sound synthesis method, arguing that it is merely a playback of recorded sounds, others classify it in the same group of methods as wavetable and granular synthesis (Smith, 1991)....

    [...]

01 Sep 2006
TL;DR: The concatenative real-time sound synthesis system CataRT plays grains from a large corpus of segmented and descriptor-analysed sounds according to proximity to a target position in the descriptor space as a content-based extension to granular synthesis providing direct access to specific sound characteristics.
Abstract: The concatenative real-time sound synthesis system CataRT plays grains from a large corpus of segmented and descriptor-analysed sounds according to their proximity to a target position in the descriptor space. This can be seen as a content-based extension to granular synthesis providing direct access to specific sound characteristics. CataRT is implemented in Max/MSP using the FTM library and an SQL database. Segmentation and MPEG-7 descriptors are loaded from SDIF files or generated on the fly. CataRT allows the user to explore the corpus interactively or via a target sequencer, to resynthesise an audio file or live input with the source sounds, or to experiment with expressive speech synthesis and gestural control.
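The unit-selection principle behind CataRT (choose the grain whose descriptors lie closest to a target position in descriptor space) can be sketched in a few lines; the descriptor set and data layout below are assumptions for illustration, not CataRT's actual implementation, which runs in Max/MSP with FTM and an SQL database.

```python
# Sketch of corpus-based unit selection: pick the grain whose descriptor vector
# is nearest to the target position in descriptor space. The descriptors and
# data layout are illustrative; CataRT itself runs in Max/MSP with FTM.

import numpy as np

def select_grain(target: np.ndarray, corpus_descriptors: np.ndarray) -> int:
    """Return the index of the corpus grain nearest to the target descriptors."""
    distances = np.linalg.norm(corpus_descriptors - target, axis=1)
    return int(np.argmin(distances))

if __name__ == "__main__":
    # Each row: (pitch in Hz, loudness in dB, spectral centroid in Hz) -- illustrative.
    corpus = np.array([[220.0, -12.0,  900.0],
                       [440.0,  -6.0, 1800.0],
                       [330.0,  -9.0, 1200.0]])
    target = np.array([400.0, -7.0, 1500.0])
    print(select_grain(target, corpus))   # -> 1
```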

113 citations


"An Automatic Synthesis of Musical P..." refers methods in this paper

  • ...There are various implementations of CSS. Corpus-Based Concatenative Synthesis (Schwarz et al., 2006) is a content-based extension to granular synthesis....

    [...]