Perceptually Motivated Video Sonification

What is sonification?

Sonification is the act of expressing in sound that which is inaudible. For instance, metal detectors use sound to display changes in magnetic fields.

Why sonification?

There are a number of advantages to using sonification. For example, certain types of patterns are easier to pick up by ear than by eye, and using sound to display information does not obscure the field of view. Beyond these practical advantages, sonification also offers interesting creative possibilities. Art can be thought of as the search for new forms, and translating forms from one dimension to another can be a useful way to carry out that search.

I have chosen to work with images and video because they can contain vast amounts of varied data. Using brushes or cameras, one can seek to capture the forms and beauty of the external world. Using largely the same tools in the same manner, but displaying these forms to the ear instead of the eye, I seek a new way to appreciate and express their beauty. Furthermore, since sound and music can only exist in time, it was natural to focus my efforts first on video images.

Perceptually Motivated Sonification

There have already been a number of attempts at sonifying images, ranging from tools that assist sight-impaired people to artistic uses. However, in most cases there is very little apparent relationship between the resulting sounds and the images used. For this reason, the purpose of this research is to find ways of carrying out image sonification that create perceptual links between image and sound.

How?

Complex images usually contain “feature points.” For example, a plain white wall lacks any feature that the eye can cling to, but the corners of a door or a window in that wall can be easily and precisely identified. These feature points play a very important role not only in human perception but also in computer vision and image analysis. From a limited set of features, it is possible to describe the structure of a complex image. Furthermore, once image features have been identified, it is possible to track their position over time and thus know how the objects in the image are moving or changing.

Just as a complex image can be described using a set of very simple features, complex sounds can be synthesized by adding together several simple sounds. If we assign the control of one of those simple sonic components to a single moving image feature, we can achieve “motion flow field sonification.” If we look at an image showing only the position of the feature points we nevertheless see lines and circles and various other shapes. However, since there are only points in the image, we must conclude that those shapes are in fact the result of our brain’s interpretation of this image. Gestalt psychology gives some explanations as to how simple objects group together to form perceptual objects. The Gestalt rules that describe these groupings can also be applied to the perception of sound. Hence, it should be possible to achieve a level of resemblance between an image and its sonification by taking care to conserve the perceptual relationships between the visual features when translating them to sound.

In practice, the “simple sounds” can be almost anything. Since there is no single correct way to sonify an image artistically, the choice of the precise type of sound to use is left to the creator.
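As a rough illustration of the idea, here is a minimal Python sketch in which each tracked feature point drives one sine oscillator. It is not the implementation used for the videos on this page. The function names, the frequency range, the choice of vertical position for pitch and horizontal position for stereo placement, and the normalized coordinates (0 to 1) are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # assumed audio sample rate, in Hz


def position_to_frequency(y, f_low=110.0, f_high=1760.0):
    """Map a normalized vertical position (0 = bottom, 1 = top) to Hz.

    The mapping is exponential so that equal distances on screen
    correspond to equal musical intervals.
    """
    y = min(max(y, 0.0), 1.0)
    return f_low * (f_high / f_low) ** y


def render_block(features, phases, n_samples):
    """Render one block of stereo audio, one sine partial per feature.

    features: dict of feature id -> (x, y), both normalized to 0..1
    phases:   dict of feature id -> oscillator phase in radians; it is
              updated in place so a feature keeps a continuous phase
              from one video frame to the next
    """
    left = [0.0] * n_samples
    right = [0.0] * n_samples
    if not features:
        return left, right
    gain = 1.0 / len(features)  # keep the mix within -1..1
    for fid, (x, y) in features.items():
        step = 2.0 * math.pi * position_to_frequency(y) / SAMPLE_RATE
        phase = phases.get(fid, 0.0)
        pan = min(max(x, 0.0), 1.0)
        # Equal-power panning: x = 0 is hard left, x = 1 is hard right.
        gain_left = math.cos(pan * math.pi / 2.0) * gain
        gain_right = math.sin(pan * math.pi / 2.0) * gain
        for n in range(n_samples):
            sample = math.sin(phase)
            left[n] += gain_left * sample
            right[n] += gain_right * sample
            phase += step
        phases[fid] = phase % (2.0 * math.pi)
    return left, right
```

Calling `render_block` once per video frame with the current feature positions gives each feature its own gliding partial, so features that move together on screen also move together in pitch and in the stereo field.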

In this video both the horizontal and vertical axes have been mapped to pitch. Note how the branch in the middle and the floating objects appear as distinct perceptual objects both in sound and in image.

To create more traditionally musical sounds, the pitches in this video have been quantized to those of chords that follow a progression as the video advances. Nevertheless, there is still a very tight relationship between the visual and sonic forms.
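One possible way to do this kind of quantization is sketched below in Python. It is only an illustration under stated assumptions: the chord progression, the two-second chord duration, and the function names are hypothetical and are not taken from the video.

```python
import math


def hz_to_midi(freq):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq / 440.0)


def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


def quantize_to_chord(freq, chord_pitch_classes):
    """Snap a frequency to the nearest note belonging to the chord.

    chord_pitch_classes is a non-empty set of pitch classes
    (0 = C, 1 = C sharp, ..., 11 = B).
    """
    note = hz_to_midi(freq)
    base = math.floor(note)
    # Searching one octave either side always finds a chord tone.
    candidates = [n for n in range(base - 12, base + 13)
                  if n % 12 in chord_pitch_classes]
    nearest = min(candidates, key=lambda n: abs(n - note))
    return midi_to_hz(nearest)


# Hypothetical progression: C major, F major, G major, C major.
PROGRESSION = [{0, 4, 7}, {5, 9, 0}, {7, 11, 2}, {0, 4, 7}]


def chord_at(time_s, seconds_per_chord=2.0):
    """Return the chord that is active at a given time in the video."""
    index = int(time_s // seconds_per_chord) % len(PROGRESSION)
    return PROGRESSION[index]
```

For example, `quantize_to_chord(freq, chord_at(t))` would replace the raw frequency of a feature at time `t` with the closest chord tone, so the contour of the motion is kept while the harmony follows the progression.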

In this video, pitch is mapped to the vertical axis to match the primary direction of movement in the video. The result is a dense texture due to the high number of features in the image. Furthermore, the multiple overlapping glissandi approximate the Shepard tone auditory illusion.

For this video, I tried to see whether it would be possible to create wind sounds from an image of an object being blown by… the wind. To achieve this effect, I used the motion flow field to control a large bank of band-pass filters that processed a single noise source. By changing the width of the filters, I could adjust the sound from more noisy (like wind in leaves) to more flute-like. What you hear is a mix of four different mappings. The result was actually too wind-like for my taste, so I added some comb filtering to two of the voices. Pay attention not only to the plant in the middle but also to the branches at the top left and bottom right, and how they “appear” in the stereo field.
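The filter-bank idea can be sketched in a few lines of Python. This is a simplified illustration, not the patch used for the video: it uses a standard second-order (biquad) band-pass filter, keeps the center frequencies fixed for the duration of a block, and leaves out the comb filtering and stereo placement. The function names and parameter values are assumptions.

```python
import math
import random


def bandpass_coeffs(center_hz, q, sample_rate=44100):
    """Biquad band-pass coefficients with unity gain at the center.

    A higher q gives a narrower band (more flute-like); a lower q
    gives a wider band (more noise-like).
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0,
            -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)


def filter_block(samples, coeffs):
    """Run one biquad filter over a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def wind_voice(center_freqs, q, n_samples, seed=0):
    """Filter a single noise source through one band per center frequency.

    In a full system each center frequency would be driven by one
    moving feature; here they are fixed for the whole block.
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    mix = [0.0] * n_samples
    for center in center_freqs:
        band = filter_block(noise, bandpass_coeffs(center, q))
        for i, value in enumerate(band):
            mix[i] += value / len(center_freqs)
    return mix
```

Calling `wind_voice` with a low `q` gives a broad, leafy noise, while raising `q` makes each band ring closer to a pitched tone.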

1 Comment

Greg Beller says:

April 22, 2021 at 11:46 pm

Hi Jean-Marc,

I’m reading your blog while downloading the latest release of cv.jit.
Here is a similar example I made (with some video shot in Japan as well), without cv.jit, using a basic video -> fft mapping.
http://www.gregbeller.com/2013/05/sonification/

Looking forward to using (once again) cv.jit in a new project.

Thanks for that!
