Using Self-Similarity for Sound/Music Synthesis

Published in Proceedings of ICMC, Montreal, pp. 423-424, ICMA: San Francisco, 1991.

Abstract:

This paper describes work in progress exploring the use of self-similarity for sound synthesis. The work is motivated by a desire to explore self-similar structures in the auditory domain (e.g., finding relationships between sound and music). It is a preliminary step toward the design of an alternative music notation system that uses principles of hierarchy and recursion to organize structures and musical ideas. In any creative work, the relationship between the whole and the parts is established through constant movement between different domains (e.g., foreground and background, micro and macro levels, form and content). Currently we are experimenting with fractals as a tool for capturing some of the systematic nature of the creative process in music, as well as for finding a new language for timbre synthesis.

In this paper, we give a brief summary of manifestations of self-similarity in music. We then describe the design and implementation of a program we have developed for sound synthesis and music score generation. The program uses a paradigm similar to Lindenmayer's L-system[4] to develop the synthesis parameters. These parameters are coded in multi-layered structures, each level specifying the rewriting rules (e.g., time segmentation, frequency and amplitude progressions). Different node rewriting algorithms are used to create self-similar or self-affine structures. The vertical harmonics are created according to the horizontal frequency development.

Introduction

A Programmable Score Editor

The decision to represent structures of different perceptual layers in the same way was a programming decision; however, it seems to have interesting musical connotations. If we define an icon using its own definition, we segment time into smaller and smaller pieces and eventually reach the timbre level. Stockhausen[9] studied and experimented with the relationship between sound and music. He designed a rather elaborate method of composing timbre from rhythms, using what he calls ``phase duration'' (which is supposed to act as a parallel to pitch). It is not far-fetched to think that the higher-level musical understanding of our mind has evolved according to the lower-level structures of sound. Following this path, a subjective question arises: ``Could an entity such as music exist without sound?''

We can also view this issue from the angle of form and content. Schroeder[7] points out that the Cantor set¹ can be a resolution to the seeming paradox of the infinite divisibility of matter. Although it is naive to simply look at sound as material and music as form, it may be a good starting point for a model. Koblyakov[3] predicts that in new music, material and organization are going to be inseparable and new parameters (e.g., sound quality) are going to emerge. It is already rather difficult to separate many of the traditional parameters in computer music.

Some Instances of Self-Similarity in Music

Although Shepard applies a formant-like envelope to the frequency domain representation of the signal, this is done to smooth the perceptual transition and sustain the paradox effect. The paradox arises from the fact that we try to extract a one-dimensional² variable (pitch) out of a multidimensional signal (timbre). We can think of pitch as a value which identifies the relationship between the partials of a signal in a one-dimensional way. The frequency components of the Shepard Tone are in geometrical relationships with each other. If we view the frequency domain representation of the signal, time scaling by the same geometrical ratio $\beta$ does not change the ``body'' of the signal but only its boundary conditions; therefore we hear the same pitch and not a pitch scaled according to the scale factor.
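The claim that scaling by $\beta$ changes only the boundaries can be checked on a bare list of partial frequencies. The following is a minimal sketch, not the construction used by Shepard or by the authors: the base frequency `f0`, the number of partials `n`, the choice $\beta = 2$, and the helper names are all assumptions made for illustration, and no spectral envelope is applied.

```python
import math

def shepard_partials(f0, beta, n):
    """Partial frequencies in geometric progression: f0, f0*beta, f0*beta^2, ..."""
    return [f0 * beta ** k for k in range(n)]

def shared(a, b, rel_tol=1e-9):
    """Frequencies of `a` that also occur in `b` (up to rounding)."""
    return [x for x in a if any(math.isclose(x, y, rel_tol=rel_tol) for y in b)]

# Assumed illustrative values, not taken from the paper.
f0, beta, n = 27.5, 2.0, 9

original = shepard_partials(f0, beta, n)
scaled = [beta * f for f in original]      # spectrum after time scaling by beta
common = shared(original, scaled)

print(len(common), "of", n, "partials are unchanged")
print("lost at the bottom:", min(original), "Hz")
print("gained at the top:", max(scaled), "Hz")
```

Only the lowest partial disappears and one new partial appears at the top; every interior component maps onto a component that was already present. An envelope that fades the extremes of the spectrum would make exactly these two boundary components the quietest ones.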

The self-similarity of such signals is quite obvious in the time domain. We can explain the paradox phenomenon by stating that we hear the same pitch when the signal is time-scaled by the appropriate amount because the signal is scale invariant (within limits).
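The scale invariance can be written out for an idealized signal. As a sketch, assume (this is not stated in the source) $N$ equal-amplitude sinusoidal partials at frequencies $f_0\beta^k$ with no spectral envelope; $f_0$ and $N$ are illustrative symbols:

$$x(t) = \sum_{k=0}^{N-1} \sin\left(2\pi \beta^{k} f_0 t\right)$$

$$x(\beta t) = \sum_{k=0}^{N-1} \sin\left(2\pi \beta^{k+1} f_0 t\right) = \sum_{k=1}^{N} \sin\left(2\pi \beta^{k} f_0 t\right) = x(t) - \sin\left(2\pi f_0 t\right) + \sin\left(2\pi \beta^{N} f_0 t\right)$$

Under these assumptions, time scaling by $\beta$ leaves the $N-1$ interior terms untouched and alters only the lowest and highest partials, which is one way to read the qualification that the invariance holds only within limits.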

Voss and Clarke[11] have found some applications of $1/f$ noise in music. $1/f$ noise, which is characterized by the slope of its power spectrum (on a log-log graph), has been found in many natural phenomena, from electric components to the flood levels of the river Nile[12]. Dodge[1] finds fractals and $1/f$ noise to be an interesting paradigm for computer-aided composition. He also suggests that the ``memory'' of $1/f$ noise can account for its success. This ``memory'' can be explained by its scale invariance and its long-term autocorrelation. Voss and Clarke have shown that most music, regardless of culture, behaves very much like $1/f$ noise. They have also reported that listeners found $1/f$ noise to be the most ``music-like'', white noise to be too random, and $1/f^{2}$ noise to be too correlated. One can think of $1/f$ noise as a border between randomness and predictability. Short utterances of $1/f$ noise can masquerade as music; however, longer listening leaves the listener unsatisfied, since naturally there is no thought or culture behind the signal. Mandelbrot[5, page 375] believes that this scheme does not extend below the note level, since the high-frequency energy in instruments (e.g., fiddle body, woodwind pipes, and the resonance of the lungs) is governed by a different mechanism; therefore the high-frequency energy spectrum is more like $1/f^{2}$ than $1/f$.
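The idea of $1/f$ noise as a border between randomness and predictability can be made concrete by comparing how strongly each kind of noise is correlated with itself one step later. The sketch below is an illustration only and is not taken from the cited studies. White noise is drawn independently, $1/f^{2}$ noise is approximated by a running sum of white noise, and $1/f$ noise is approximated by a commonly used scheme often attributed to Voss, in which several random sources are summed and source $k$ is refreshed every $2^k$ samples. The function names, the number of sources, the sequence length, and the seed are assumptions.

```python
import random

def white(n, rng):
    """Independent samples: no memory at all."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def brown(n, rng):
    """Running sum of white noise: approximates a 1/f^2 spectrum."""
    out, x = [], 0.0
    for _ in range(n):
        x += rng.uniform(-1.0, 1.0)
        out.append(x)
    return out

def pink_voss(n, rng, sources=8):
    """Sum of `sources` random values; source k is redrawn every 2**k samples.
    This only approximates a 1/f spectrum over a limited range."""
    values = [rng.uniform(-1.0, 1.0) for _ in range(sources)]
    out = []
    for t in range(n):
        for k in range(sources):
            if t % (2 ** k) == 0:
                values[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(values))
    return out

def lag1_autocorrelation(x):
    """Sample correlation between the sequence and itself shifted by one step."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 4096
r_white = lag1_autocorrelation(white(n, rng))
r_pink = lag1_autocorrelation(pink_voss(n, rng))
r_brown = lag1_autocorrelation(brown(n, rng))
print("white:", round(r_white, 3), "pink:", round(r_pink, 3), "brown:", round(r_brown, 3))
```

The white sequence should show almost no correlation, the running sum should be correlated almost perfectly, and the $1/f$ approximation should fall between the two.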

Vaughn[10] has studied emotion in some recorded Karelian Laments. In her study, she views the pitch contour as an analog of musical behavior, and she treats the pitch contour as a set of shapes (a signal) rather than a series of notes. She has investigated the boundary properties of repetitive melodies in the Laments and has found self-similar structures in the pitch contour at the point where the singer passes into a trance-like state near the ritual moment.

A Synthesis Program

Originally we developed the software without any knowledge of the L-system. The programming decisions were made according to the design of the previously described score editor. The paradigm itself was based on our perception of physical matter, which is perhaps one of the most basic manifestations of structure in our consciousness. The initial intent for the design of the language was to make it possible to keep a library of structures, make structures context sensitive, and connect different structures hierarchically or recursively. This language would then be used to store what the user specifies as the score in a graphical score editor. Currently, only the recursive and hierarchical definition of the parameters is implemented in the language.

The Synthesis Language

The time and frequency development is illustrated for the first two levels of recursion in Figure 1. With a ``stop recursion'' value of 0.01 seconds, some parts are developed to as many as 57 levels.
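The role of the ``stop recursion'' value can be illustrated with a small recursive segmentation. This is a sketch under stated assumptions rather than the authors' language: the seed `SEED`, its fractions and frequency ratios, the starting frequency, and the rule that a child is developed only while its duration is at least the stop value are all hypothetical.

```python
# Hypothetical seed (not from the paper): each rule is
# (fraction of the parent's duration, frequency ratio relative to the parent).
SEED = [(0.5, 1.0), (0.25, 1.5), (0.25, 2.0)]

def develop(start, duration, freq, stop, level=0, out=None):
    """Record this segment, then rewrite it into children using SEED.
    A child is developed further only if its duration is at least `stop`."""
    if out is None:
        out = []
    out.append((level, start, duration, freq))
    child_start = start
    for fraction, ratio in SEED:
        child_duration = duration * fraction
        if child_duration >= stop:
            develop(child_start, child_duration, freq * ratio, stop, level + 1, out)
        child_start += child_duration
    return out

# One second of sound, developed until segments would fall below 0.01 s.
segments = develop(start=0.0, duration=1.0, freq=220.0, stop=0.01)
deepest = max(s[0] for s in segments)
level_one = [s for s in segments if s[0] == 1]

print("segments:", len(segments), "deepest level:", deepest)
for level, start, duration, freq in level_one:
    print(level, start, duration, freq)
```

In this sketch the depth of a branch depends on how quickly its durations shrink: the branch that always takes the largest fraction goes deepest. A seed containing a fraction close to 1 would therefore produce far deeper branches than this one for the same stop value.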

We experimented with a few different output production and parameter development schemes. Currently the parameters can develop according to either geometric or arithmetic relationships. The arithmetic development is done according to a constant value (which is found in the main seed). Once we understand the behavior of the system better, we can make this relationship programmable as well.
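The difference between the two development schemes can be shown on a single parameter. The text does not give the exact formulas, so this sketch assumes that geometric development multiplies the parent value by the rule's factor and that arithmetic development adds the rule's factor times the constant from the main seed; the function name and all numeric values are hypothetical.

```python
def develop_parameter(parent, factor, mode, constant=0.0):
    """Derive a child parameter from its parent.
    geometric:  child = parent * factor
    arithmetic: child = parent + constant * factor   (assumed reading)"""
    if mode == "geometric":
        return parent * factor
    if mode == "arithmetic":
        return parent + constant * factor
    raise ValueError("unknown development mode: " + mode)

def develop_chain(value, factor, mode, levels, constant=0.0):
    """Apply the same rule repeatedly, once per level of recursion."""
    chain = [value]
    for _ in range(levels):
        value = develop_parameter(value, factor, mode, constant)
        chain.append(value)
    return chain

geometric = develop_chain(220.0, 1.5, "geometric", levels=3)
arithmetic = develop_chain(220.0, 1.0, "arithmetic", levels=3, constant=110.0)
print("geometric: ", geometric)
print("arithmetic:", arithmetic)
```

Under these assumptions the geometric chain keeps a constant frequency ratio between levels, so it preserves musical intervals, while the arithmetic chain keeps a constant difference in Hz.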

For output production we first used the parameters as instantaneous frequencies and kept a linear phase, which means that we only used the last-level parameters. Because of the Devil's Staircase (Cantor function) effect[5, page 82], we had a difficult time with noise created by sudden changes in frequency or amplitude. We then experimented with ``stop recursion'' values small enough to be less than the period of the instantaneous frequencies. In this way we were shaping the wave according to the lower-level parameters. This approach may seem technically simple-minded or naive. The motivation behind the experiment was to see if we could establish a structural relationship between the normal level of music perception[3] and the shape of the auditory signal.
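One reading of the first output scheme is an oscillator whose frequency is piecewise constant over the last-level segments and whose phase is carried across segment boundaries. The sketch below follows that reading, which is an assumption; the function name, the sample rate, and the segment list are hypothetical.

```python
import math

def render(segments, sample_rate=8000):
    """segments: list of (duration_seconds, frequency_hz).
    The phase is accumulated across segments, so the waveform itself
    never jumps even though the frequency changes abruptly."""
    samples = []
    phase = 0.0
    for duration, freq in segments:
        step = 2.0 * math.pi * freq / sample_rate
        for _ in range(int(round(duration * sample_rate))):
            samples.append(math.sin(phase))
            phase = (phase + step) % (2.0 * math.pi)
    return samples

# Hypothetical last-level segments: a small frequency staircase.
staircase = [(0.05, 440.0), (0.05, 660.0), (0.05, 880.0)]
signal = render(staircase)

largest_jump = max(abs(b - a) for a, b in zip(signal, signal[1:]))
print(len(signal), "samples, largest sample-to-sample change:", round(largest_jump, 3))
```

Carrying the phase removes discontinuities in the waveform, but the frequency still moves in steps. That stepwise motion is still audible, which is consistent with the noise the text attributes to the staircase effect.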

We also experimented with using every level of the development parameters as values for partials of the sound, which means that at every level every point adds a partial for its period of time. Again because of the Devil's Staircase effect, there was some noise which we could not avoid. We solved this problem by applying a window function to every partial. With this configuration we were able to obtain very interesting auditory results. Figure 2 is the spectrogram of the sound created by the sample score.⁵
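The windowed-partial scheme can be sketched as additive synthesis in which every node of every level contributes one sinusoid for the duration of its segment. The paper does not name the window, so a Hann window is assumed here; the node list, the amplitude rule of 1/(level + 1), and the sample rate are likewise hypothetical.

```python
import math

def add_partial(buffer, start, duration, freq, amp, sample_rate):
    """Add one Hann-windowed sinusoid to `buffer` over [start, start + duration)."""
    first = int(round(start * sample_rate))
    n = int(round(duration * sample_rate))
    for i in range(n):
        if first + i >= len(buffer):
            break
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)  # rises from 0, falls back to 0
        buffer[first + i] += amp * window * math.sin(2.0 * math.pi * freq * i / sample_rate)

def render_levels(nodes, total_duration, sample_rate=8000):
    """nodes: list of (level, start, duration, freq); every node adds a partial."""
    buffer = [0.0] * int(round(total_duration * sample_rate))
    for level, start, duration, freq in nodes:
        add_partial(buffer, start, duration, freq, 1.0 / (level + 1), sample_rate)
    return buffer

# Hypothetical two-level development of a 0.2 s sound.
NODES = [
    (0, 0.0, 0.2, 220.0),   # whole sound
    (1, 0.0, 0.1, 220.0),   # first half
    (1, 0.1, 0.1, 330.0),   # second half
]
sound = render_levels(NODES, total_duration=0.2)
print(len(sound), "samples, peak:", round(max(abs(s) for s in sound), 3))
```

Because each partial starts and ends at zero amplitude, a segment boundary no longer produces an abrupt change in amplitude. The higher-level node supplies the static part of the spectrum and the lower-level nodes supply the changing part.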

As we can see, time is fractally segmented according to the given structure. This is true for any length of time (the boundaries have to be picked correctly, and the lower limit is defined by the ``stop recursion'' value). The frequency content of every segment has a static part, which is created by the higher-level developments, and a changing part, which is created by the lower levels. The changing part of the frequency content changes according to the same structure. If we assume that the hierarchy of time is one of the hierarchies of our perception, then we have a sound which can manifest the same structure in different layers of our perception.

It is perhaps too early to conclude anything from this work. We are currently in the process of redesigning the language to implement the ability to have libraries of structures. We are also thinking about how different structures can interact with each other in the development process.
