Sonification Demos

Perceptually Motivated Video Sonification
What is sonification?

Sonification is the act of expressing in sound that which is inaudible. For instance, metal detectors use sound to display changes in magnetic fields.

Why sonification?

There are a number of advantages to using sonification. For example, certain types of patterns are easier for the ear to pick up than for the eye, and using sound to display information does not obscure the field of view. Sonification also opens up interesting creative possibilities. Art can be thought of as the search for new forms, and translating forms from one dimension to another can be a useful way to achieve this.

I have chosen to work with images and video because they can contain vast amounts of varied data. Using brushes or cameras, one can seek to capture the forms and beauty of the external world. By using largely the same tools in the same manner, but displaying these forms to the ear instead of the eye, I seek a new way to appreciate and express their beauty. Furthermore, since sound and music can only exist in time, it was natural to focus my efforts first on video.

Perceptually Motivated Sonification

There have already been a number of attempts at sonifying images, from tools to assist sight-impaired people to artistic projects. In most cases, however, there is very little apparent relationship between the resulting sounds and the source images. For this reason, the purpose of this research is to find ways of sonifying images that create perceptual links between image and sound.

Complex images usually contain “feature points.” For example, a plain white wall lacks any feature that the eye can cling to, but the corners of a door or a window in that wall can be identified easily and precisely. These feature points play a very important role not only in human perception but also in computer vision and image analysis: from a limited set of features, it is possible to describe the structure of a complex image. Furthermore, once image features have been identified, their positions can be tracked over time, revealing how the objects in the image are moving or changing.
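To make the idea concrete, here is a minimal sketch of feature-point detection: a Harris-style corner response in pure Python, which scores corners higher than edges and flat regions. The synthetic test image, the 3×3 window, and the constant k are illustrative choices; a real system would use an optimized library detector.

```python
def harris_response(img, k=0.05):
    """Corner-response map for a 2D list of grayscale pixel values."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference image gradients.
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum gradient products over a 3x3 window (the structure matrix M).
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Harris measure: det(M) - k * trace(M)^2.
            resp[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return resp

# A white square on a black background: corners, edges, and flat areas.
img = [[255 if 3 <= y <= 6 and 3 <= x <= 6 else 0 for x in range(10)]
       for y in range(10)]
resp = harris_response(img)
```

On this image the square's corners get the strongest response, its edges a weaker one, and the flat background none at all, which is exactly the distinction between a blank wall and a window corner made above.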

Just as a complex image can be described using a set of very simple features, complex sounds can be synthesized by adding together several simple sounds. If we assign the control of one of these simple sonic components to a single moving image feature, we can achieve “motion flow field sonification.”

If we look at an image showing only the positions of the feature points, we nevertheless see lines, circles, and various other shapes. Since there are only points in the image, those shapes must be the result of our brain’s interpretation of it. Gestalt psychology offers some explanation of how simple objects group together to form perceptual objects, and the Gestalt rules that describe these groupings also apply to the perception of sound. Hence, it should be possible to achieve a degree of resemblance between an image and its sonification by taking care to preserve the perceptual relationships between the visual features when translating them to sound.
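As a sketch of this one-feature-per-component idea, the fragment below assigns one sine oscillator to each tracked feature and sums them. The mapping of normalized vertical position to a frequency range, along with the sample rate and per-frame block size, are assumptions of mine for illustration, not a fixed part of the method.

```python
import math

def sonify_features(feature_tracks, sr=8000, f_lo=200.0, f_hi=2000.0):
    """Additive motion-flow-field sonification: one sine per feature.

    feature_tracks: list of tracks, one per feature; each track is a
    list of (x, y) positions in normalized [0, 1] coordinates, one per
    video frame. Vertical position is mapped to pitch (x is unused here,
    but could drive stereo panning, for example).
    """
    block = sr // 100                  # samples of audio per video frame
    n_frames = len(feature_tracks[0])
    out = [0.0] * (block * n_frames)
    for track in feature_tracks:
        phase = 0.0                    # each oscillator keeps its own phase
        for f_idx, (x, y) in enumerate(track):
            # Higher point in the frame -> higher pitch.
            freq = f_lo + (f_hi - f_lo) * y
            for n in range(block):
                out[f_idx * block + n] += math.sin(phase)
                phase += 2 * math.pi * freq / sr
    # Normalize the mix to [-1, 1].
    peak = max(abs(s) for s in out) or 1.0
    return [s / peak for s in out]
```

Because each feature keeps a continuous oscillator across frames, a smoothly moving feature produces a glissando, so a group of features moving together is heard as one gesture, mirroring the Gestalt grouping described above.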

In practice, the “simple sounds” can be almost anything. Since there is no single correct way to sonify an image artistically, the choice of the precise type of sound to use is left to the creator.

In this video, both the horizontal and vertical axes have been mapped to pitch. Note how the branch in the middle and the floating objects appear as distinct perceptual objects in both sound and image.

To create more traditionally musical sounds, the pitches in this video have been quantized to the notes of a chord progression that advances with the video. Nevertheless, there is still a very tight relationship between the visual and sonic forms.
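Pitch quantization of this kind can be sketched in a few lines: snap each continuous frequency to the nearest tone of the current chord, in any octave. The chord used here (a C major triad, as MIDI note numbers) and the nearest-note rule are illustrative choices, not the actual progression from the video.

```python
import math

def quantize_to_chord(freq, chord_midi=(60, 64, 67)):
    """Snap a frequency in Hz to the nearest chord tone in any octave."""
    # Chord tones replicated across several octaves.
    candidates = [m + 12 * o for m in chord_midi for o in range(-4, 5)]
    # Convert Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69.
    target = 69 + 12 * math.log2(freq / 440.0)
    nearest = min(candidates, key=lambda m: abs(m - target))
    # Convert the chosen MIDI note back to Hz.
    return 440.0 * 2 ** ((nearest - 69) / 12)

quantize_to_chord(261.0)   # snaps to C4, ~261.63 Hz
quantize_to_chord(440.0)   # A is not in the C triad; snaps to G4, ~392.0 Hz
```

To follow a progression, one would simply swap in a different `chord_midi` tuple as the video advances.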

In this video, pitch is mapped to the vertical axis to match the primary direction of movement. The high number of features detected in the image produces a very dense texture. Furthermore, the many overlapping glissandi approximate the Shepard tone auditory illusion.

For this video, I tried to see whether it would be possible to create wind sounds from an image of an object being blown by… the wind. To achieve this effect, I used the motion flow field to control a large bank of band-pass filters processing a single noise source. By changing the width of the filters, I could adjust the sound from noisy (like wind in leaves) to more flute-like. What you hear is a mix of four different mappings. The result was actually too wind-like for my taste, so I added some comb filtering to two of the voices, giving them a slightly metallic quality. Pay attention not only to the plant in the middle but also to the branches at the top left and bottom right, and how they “appear” in the stereo field.
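The filter-bank idea can be sketched with a simple two-pole resonator applied to one noise source. The center frequencies, the Q value, and the static settings below are illustrative stand-ins for the motion-flow control described above.

```python
import math
import random

def resonator(signal, center_hz, q, sr=8000):
    """Two-pole resonant band-pass filter applied to a sample list."""
    w = 2 * math.pi * center_hz / sr
    # Pole radius from the bandwidth center_hz / q.
    r = math.exp(-math.pi * center_hz / (q * sr))
    a1 = -2 * r * math.cos(w)
    a2 = r * r
    gain = 1 - r                      # rough level compensation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

random.seed(1)
# One second of white noise: the single source shared by the whole bank.
noise = [random.uniform(-1, 1) for _ in range(8000)]
# A small bank of bands mixed together; in the demo, the motion flow
# field would move these center frequencies over time.
bands = [resonator(noise, f, q=20) for f in (300, 700, 1500)]
wind = [sum(s) / len(bands) for s in zip(*bands)]
```

Raising `q` narrows each band, moving the result from breathy noise toward the flute-like end of the range mentioned above.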

Video sonification can also be used to generate sonic material that is then worked into larger compositions, as in this piece. (Click on the picture to listen.) All the material in the piece was generated from a video sequence similar to the one used for the river sonification demo, yielding very dense textures and sounds reminiscent of the cicada songs heard near where the footage was shot.