The idea for this post came from two interesting programmatic transformations of images and sound:

  • Here Dmitry Ulyanov describes applying the popular neural-network style transfer methods to audio rather than to images. Combining two sound clips this way produced some curious results.

  • Here Robert Foss shows off a script that transforms images into sound and applies common sound effects to them before converting them back into images, with interesting results.

For the latter, here’s a gif I made of the Tate Modern, with the pitch of the intermediate sound messed with to produce a surreal video effect…

The goal here was to see if the core of these two could be combined into something interesting. We’ve seen that we can take the style from a piece of music and apply it to another sound. We’ve also seen that we can transform images into sound data and mess with them. So why not use the intermediate sound from raw image data as one of our style transfer samples?
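To make the "image as sound" step concrete, here is a minimal sketch of one way to do it: treat the raw image bytes as 8-bit PCM samples and wrap them in a WAV container. This is an assumption for illustration, not necessarily how the linked scripts do it; the function names and the tiny fake "image" are hypothetical, and only Python's standard `wave` module is used.

```python
import io
import wave


def bytes_to_wav(raw: bytes, sample_rate: int = 44100) -> bytes:
    """Wrap raw bytes in a mono, 8-bit unsigned PCM WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(1)            # 1 byte per sample
        w.setframerate(sample_rate)
        w.writeframes(raw)           # each image byte becomes one sample
    return buf.getvalue()


def wav_to_bytes(wav_data: bytes) -> bytes:
    """Recover the raw sample bytes from WAV data."""
    with wave.open(io.BytesIO(wav_data), "rb") as w:
        return w.readframes(w.getnframes())


# Hypothetical stand-in for raw pixel data: a 4x4 RGB image (48 bytes).
pixels = bytes((x * 16 + c * 5) % 256 for x in range(16) for c in range(3))

wav_data = bytes_to_wav(pixels)   # "image as sound"
restored = wav_to_bytes(wav_data)  # and back again
```

With no effect applied in between, the round trip is lossless. Anything done to the samples in the middle (pitch shifting, echo, style transfer) shows up as changed pixel bytes when they are read back as an image.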

Initially my thought was that the resulting sound would be just noise, and that we could turn it back into an image at the end. This turned out to be wishful thinking. Perhaps with the right tuning it would be possible to end up with an interesting image; however, the basic concept doesn’t bring us anywhere close. The sound left over after the style transfer (or whatever you want to call it), though, sounds pretty interesting to me. While I was describing this process to a friend, they quite aptly described it as “music mixed with chaos”.
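As a rough illustration of what "style" means here: image style transfer usually matches Gram matrices of feature maps, and the sketch below assumes the audio version does something similar with spectrogram-like features (channels over time). That is an assumption rather than a description of the linked code. The toy feature lists are made up, and a real run would optimise the generated sound to reduce this loss alongside a content term.

```python
def gram_matrix(features):
    """features: list of channels, each a list of values over time.

    Returns the channel-by-channel correlation matrix, averaged over time.
    """
    n_frames = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n_frames for row_j in features]
        for row_i in features
    ]


def style_loss(generated, style):
    """Sum of squared differences between the two Gram matrices."""
    g_gen = gram_matrix(generated)
    g_sty = gram_matrix(style)
    return sum(
        (x - y) ** 2
        for row_gen, row_sty in zip(g_gen, g_sty)
        for x, y in zip(row_gen, row_sty)
    )


# Toy two-channel "spectrograms" (hypothetical values).
style = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
shifted = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]  # same pattern, shifted in time
other = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]    # different channel balance

loss_shifted = style_loss(shifted, style)
loss_other = style_loss(other, style)
```

The Gram matrix throws away where in time things happen and keeps only which channels tend to be active together, which is why the time-shifted clip scores as the same "style" while the differently balanced one does not.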

Chaos Music

Radiohead - OK Computer - Karma Police

What better example to look at than Radiohead’s OK Computer…

Let’s take the album cover and make it sound:

becomes… (beware it gets pretty loud)

Now let’s run that through some style transfer with some clips from a song from the album, Karma Police:

becomes

Taking a louder bit of the song…

becomes

Converting the sound back to images doesn’t really give anything worth looking at; it ends up pretty much as noise with some slightly visible features:

Training this took quite a while, as I was travelling at the time and had no access to a GPU. For that reason I switched to smaller source images…


Broken Social Scene - Broken Social Scene - Fire eye’d boy

Image as sound:

Song sample:

Result:


Feist - The Reminder - I feel it all

For something more chill…

Image as sound:

Song sample:

Result:


Muse - Origin of Symmetry - Plug in Baby

For a reasonably isolated riff…

Image as sound:

Song sample:

Result:


I find the resulting sounds interesting and surprisingly cool to listen to… however, I don’t have another use case to suggest. Some things are just interesting by themselves…

Code lives here (sans data):

https://github.com/Hugh-OBrien/chaos_music