<h1>Style Transfer Between Sound and Images</h1>
<p>2017-08-28</p>
<script>
function showImage(button) {
  var img = document.getElementById('tate-gif');
  if (img.style.display == '') {
    img.style.display = 'none';
    button.innerHTML = 'Show gif';
  } else {
    img.style.display = '';
    button.innerHTML = 'Hide gif';
  }
}
</script>
<p>The idea for this post came from two interesting programmatic transformations of images and sound:</p>
<ul>
<li>
<p><a href="https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/">Here</a> Dmitry Ulyanov describes using the popular style transfer methods using neural networks for audio, rather than for images. The combination of two sound clips was interesting and provided some curious results.</p>
</li>
<li>
<p><a href="https://github.com/robertfoss/audio_shop">Here</a> Robert Foss shows off a script that transforms images to sound and applies common sound effect transformations on them, before converting them back into images, with interesting results.</p>
</li>
</ul>
<p>For the latter, here’s a gif I made of the Tate Modern, with the pitch of the intermediate sound shifted to produce a surreal video effect…</p>
<p><img id="tate-gif" src="/assets/images/tate_pitch.gif" />
<button onclick="javascript:showImage(this)">Hide gif</button></p>
<p>The goal here was to see if the core of these two could be combined into something interesting. We’ve seen that we can take the style from a piece of music and apply it to another sound. We’ve also seen that we can transform images into sound data and mess with them. So why not use the intermediate sound from raw image data as one of our style transfer samples?</p>
<p>My initial thought was that the resulting sound would be just noise, which we could then transfer back into an image at the end. This turned out to be wishful thinking. Perhaps with the right tuning it would be possible to end up with an interesting image; the basic concept, however, doesn’t bring us anywhere close. The sound left over after the style transfer (or whatever you want to call it) does sound pretty interesting to me, though. When I described the process to a friend, they aptly summed it up as “music mixed with chaos”.</p>
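<p>For the curious, the “style” being matched here boils down to Gram matrices of feature activations, as in image style transfer, but computed from features of the audio’s spectrogram. A minimal sketch of that loss (the shapes and names are illustrative, not Ulyanov’s exact code):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Style loss sketch: match Gram matrices of feature maps produced by a
# conv net run over spectrogram frames. Shapes here are illustrative.
import numpy as np

def gram(features):
    # features: (channels, time) activations for one audio clip
    f = features.reshape(features.shape[0], -1)
    return f @ f.T / f.shape[1]

def style_loss(generated, style):
    # squared Frobenius distance between the two Gram matrices
    return np.sum((gram(generated) - gram(style)) ** 2)

# e.g. two stand-in feature maps with 64 channels over 512 frames
loss = style_loss(np.random.randn(64, 512), np.random.randn(64, 512))
</code></pre></div></div>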
<h2 id="chaos-music">Chaos Music</h2>
<p><strong>Radiohead - OK Computer - Karma Police</strong></p>
<p>What better example to look at than Radiohead’s OK Computer…</p>
<p>Let’s take the album cover and turn it into sound:</p>
<p><img src="/assets/sounds/radiohead/ok_computer.jpg" alt="" /></p>
<p>becomes… (beware, it gets pretty loud)</p>
<audio controls="">
<source src="/assets/sounds/radiohead/cover.mp3" type="audio/mpeg" />
</audio>
<p>Now let’s run that through some style transfer with some clips from a song from the album, Karma Police. First, a quiet passage:</p>
<audio controls="">
<source src="/assets/sounds/radiohead/quiet.mp3" type="audio/mpeg" />
</audio>
<p>becomes</p>
<audio controls="">
<source src="/assets/sounds/radiohead/kp_quiet_transferred.mp3" type="audio/mpeg" />
</audio>
<p>Taking a louder bit of the song…</p>
<audio controls="">
<source src="/assets/sounds/radiohead/loud.mp3" type="audio/mpeg" />
</audio>
<p>becomes</p>
<audio controls="">
<source src="/assets/sounds/radiohead/kp_loud_transferred.mp3" type="audio/mpeg" />
</audio>
<p>Converting the sound back into an image doesn’t really give anything worth looking at; it ends up as pretty much noise with some slightly visible features:</p>
<p><img src="/assets/sounds/radiohead/kp_loud_transferred.png" alt="" /></p>
<p>Training this took quite a while, as I was travelling and without access to a GPU. For that reason I switched to smaller source images…</p>
<p><br /></p>
<p><strong>Broken Social Scene - Broken Social Scene - Fire eye’d boy</strong></p>
<p><img src="/assets/sounds/broken/broken.jpg" alt="" /></p>
<p>Image as sound:</p>
<audio controls="">
<source src="/assets/sounds/broken/broken_social_scene_style.mp3" type="audio/mpeg" />
</audio>
<p>Song sample:</p>
<audio controls="">
<source src="/assets/sounds/broken/fire_start.mp3" type="audio/mpeg" />
</audio>
<p>Result:</p>
<audio controls="">
<source src="/assets/sounds/broken/fire_eyed_start_transferred.mp3" type="audio/mpeg" />
</audio>
<p><br /></p>
<p><strong>Feist - The Reminder - I feel it all</strong></p>
<p>For something more chill…</p>
<p><img src="/assets/sounds/feist/thereminder.jpg" alt="" /></p>
<p>Image as sound:</p>
<audio controls="">
<source src="/assets/sounds/feist/readable_sound.mp3" type="audio/mpeg" />
</audio>
<p>Song sample:</p>
<audio controls="">
<source src="/assets/sounds/feist/feel.mp3" type="audio/mpeg" />
</audio>
<p>Result:</p>
<audio controls="">
<source src="/assets/sounds/feist/transferred.mp3" type="audio/mpeg" />
</audio>
<p><br /></p>
<p><strong>Muse - Origin of Symmetry - Plug in Baby</strong></p>
<p>For a reasonably isolated riff…</p>
<p><img src="/assets/sounds/muse/origin.jpg" alt="" /></p>
<p>Image as sound:</p>
<audio controls="">
<source src="/assets/sounds/muse/origin_sound.mp3" type="audio/mpeg" />
</audio>
<p>Song sample:</p>
<audio controls="">
<source src="/assets/sounds/muse/plug_start.mp3" type="audio/mpeg" />
</audio>
<p>Result:</p>
<audio controls="">
<source src="/assets/sounds/muse/plug_start_transferred.mp3" type="audio/mpeg" />
</audio>
<p><br /></p>
<p>I find the resulting sounds interesting and surprisingly cool to listen to… but I don’t have another use case to suggest. Some things are just interesting by themselves…</p>
<p>Code lives here (sans data):</p>
<p><a href="https://github.com/Hugh-OBrien/chaos_music">https://github.com/Hugh-OBrien/chaos_music</a></p>The idea for this post came from two interesting programmatic transformations of images and sound: Here Dmitry Ulyanov describes using the popular style transfer methods using neural networks for audio, rather than for images. The combination of two sound clips was interesting and provided some curious results. Here Robert Foss shows off a script that transforms images to sound and applies common sound effect transformations on them, before converting them back into images, with interesting results. For the latter here’s a gif I made of the Tate Modern with the pitch of the intermediate sound messed with to produce a surreal video effect… Hide gif The goal here was to see if the core of these two could be combined into something interesting. We’ve seen that we can take the style from a piece of music and apply it to another sound. We’ve also seen that we can transform images into sound data and mess with them. So why not use the intermediate sound from raw image data as one of our style transfer samples? Initially my thought was that what we’d end up with sound wise would be just noise and we could transfer that into an image at the end. This turned out to be wishful thinking. Perhaps with the right tuning it would be possible to end up with an interesting image; however the basic concept doesn’t bring us anywhere close. The sound leftover after the style transfer (or whatever you want to call it) to me sounds pretty interesting. While describing this process to a friend they quite aptly described it as, “Music mixed with chaos”. Chaos Music Radiohead - OK Computer - Karma Police What better an example to look at than Radiohead’s OK Computer… Let’s take the album cover and make it sound: becomes… (beware it gets pretty loud) Now lets run that through some style transfer with some clips from a song from the album - Karma Police becomes Taking a louder bit of the song… becomes Convering the sound back to images doesn’t give anything worth looking at really, it ends up pretty much as noise with some slightly visible features: Training this took quite a while as I was doing this while travelling and without access to a GPU. For that reason I started going with smaller source images… Broken Social Scene - Broken Social Scene - Fire eye’d boy Image as sound: Song sample: Result: Feist - The Reminder- I feel it all For something more chill Image as sound: Song sample: Result: Muse - Origin of Symmetry - Plug in Baby For a reasonably isolated riff… Image as sound: Song sample: Result: I find these resulting sounds interesting and surprisingly cool to listen to… however I don’t have another suggested usecase. Some things are just interesting by themselves sometimes… Code lives here (sans data): https://github.com/Hugh-OBrien/chaos_musicCompiling Tensorflow for Unity3d2017-02-12T00:00:00+00:002017-02-12T00:00:00+00:00/blog/2017/02/12/tensorflow-for-unity<p>I’ve been thinking for a while about how best to combine machine learning knowledge I’ve built up and my other hobby - making video games. To this end I’ve been looking into using Tensorflow with Unity3d. I forsee a lot of issues around performance at runtime, along with cross platform issues issues down the road.</p>
<p><img src="/assets/images/tensor-unity.png" alt="Tensor-and-unity" /></p>
<!--more-->
<p>As a start, here’s a quick rundown of compiling Tensorflow so that a trained graph can be run from a C# Unity script via the C++ API.</p>
<hr />
<h3 id="1-getting-a-graph-to-use">1. Getting a Graph to Use</h3>
<p>The idea here is to keep things to the bare minimum: the goal is to make sure TF will run at all, rather than spending time making it do something useful. For these first couple of steps I borrowed heavily from <a href="https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#">Jim Fleming’s excellent Medium post</a> on the basics of using the C++ API, which we’ll need. Using a simple Python script you can generate a protobuf file to store your graph; I used:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import tensorflow as tf
import numpy as np
with tf.Session() as sess:
a = tf.Variable(5.0, name='a')
b = tf.Variable(6.0, name='b')
c = tf.maximum(a, b, name="c")
sess.run(tf.initialize_all_variables())
print a.eval()
print b.eval()
print c.eval()
tf.train.write_graph(sess.graph_def, 'models/', 'graph.pb', as_text=False)
</code></pre></div></div>
<p>which performs the very impressive task of finding the max of two numbers and storing the result in a node named <code class="highlighter-rouge">c</code>.</p>
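<p>Before touching the C++ side it’s worth a quick sanity check that the protobuf actually loads back. This snippet is my own addition rather than part of the original workflow; the node names match the script above.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sanity check: parse the saved GraphDef and list its node names.
import tensorflow as tf

graph_def = tf.GraphDef()
with open('models/graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

tf.import_graph_def(graph_def, name='')
print([n.name for n in graph_def.node])  # expect 'a', 'b' and 'c' among them
</code></pre></div></div>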
<h3 id="2-compiling-tensorflow-to-a-shared-libaray-so">2. Compiling Tensorflow to a shared libaray (.so)</h3>
<p>We can use the TF C++ API to read and execute graphs we’ve already saved pretty easily. In addition, Google’s documentation for their build tool, <a href="https://bazel.build/">Bazel</a>, is pretty good, so compiling the shared library isn’t really too much of an issue. Here I used Jim Fleming’s build script with a few changes.</p>
<h3 id="debugging">Debugging</h3>
<p>Using this from Unity isn’t the same as running a script locally: we have to recompile just to add debug messages, and there’s no simple way to print to the Unity editor console from the library anyway, since we’re constrained by the return type of our function. This matters even in this very simple example because we’ve hard-coded the path to the protobuf file, which will cause problems down the road - Unity isn’t handling the paths, so it won’t put the files in sensible places when building, for instance. To get some useful output on errors I’ve added logging to a file instead of standard out, plus known return values, so it’s obvious where in the code an error occurred.</p>
<h3 id="shared-library">Shared library</h3>
<p>The function is encased in an <code class="highlighter-rouge">extern "C" {}</code> block. Since we’re not building an executable we can’t just have a main function that returns 0; we want to return our max number. The C++ compiler, however, doesn’t preserve function names (it mangles them), so calling our function by name after compilation won’t work - we have to tell the compiler to declare our function with C linkage. The <a href="https://docs.unity3d.com/Manual/PluginsForDesktop.html">Unity documentation</a> does a fine job of explaining why we need to do this.</p>
<p>Here’s the code put inside the Tensorflow repo at <code class="highlighter-rouge">/tensorflow/loader/loader.cc</code></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "tensorflow/core/public/session.h"
#include "tensorflow/core/platform/env.h"
#include <fstream>
using namespace tensorflow;
extern "C" {
const int run() {
// Initialize a tensorflow session
Session* session;
Status status = NewSession(SessionOptions(), &session);
if (!status.ok()) {
return 2;
}
std::ofstream logFile;
logFile.open ("loader_output.txt");
// Read in the protobuf graph we exported
// (The path seems to be relative to the cwd. Keep this in mind
// when using `bazel run` since the cwd isn't where you call
// `bazel run` but from inside a temp folder.)
// -----------------------------------------------------------------------------------
// for Unity these paths are also important, the location of the graph.pb file matters
// and will change depending on whether you're running from the editor or a build
GraphDef graph_def;
status = ReadBinaryProto(Env::Default(), "models/graph.pb", &graph_def);
if (!status.ok()) {
logFile << status.ToString() << "\n";
return 4;
}
// Add the graph to the session
status = session->Create(graph_def);
if (!status.ok()) {
logFile << status.ToString() << "\n";
return 5;
}
// Setup inputs and outputs:
// Our graph doesn't require any inputs, since it specifies default values,
// but we'll change an input to demonstrate.
Tensor a(DT_FLOAT, TensorShape());
a.scalar<float>()() = 3.0;
Tensor b(DT_FLOAT, TensorShape());
b.scalar<float>()() = 2.0;
std::vector<std::pair<string, tensorflow::Tensor>> inputs = {
{ "a", a },
{ "b", b },
};
// The session will initialize the outputs
std::vector<tensorflow::Tensor> outputs;
// Run the session, evaluating our "c" operation from the graph
status = session->Run(inputs, {"c"}, {}, &outputs);
if (!status.ok()) {
logFile << status.ToString() << "\n";
return 6;
}
// Grab the first output (we only evaluated one graph node: "c")
// and convert the node to a scalar representation.
auto output_c = outputs[0].scalar<float>();
// Free any resources used by the session
session->Close();
logFile << status.ToString() << "\n";
logFile.close();
// return the output number as an integer
return (int)output_c();
}
}
</code></pre></div></div>
<p>We’ve hard-coded things so that it should output 3, which is the max of 3.0 and 2.0 cast to an <code class="highlighter-rouge">int</code>.</p>
<p>We can compile this using Bazel to either an executable or a shared library. The former is good for making sure the above code works at all: just throw the function into a main function (see the sketch after the BUILD file below) and compile it. Once that’s working we can make our shared library. The Bazel BUILD file for that looks like:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cc_binary(
name = "loader.so",
linkshared = 1,
srcs = ["loader.cc"],
deps = [
"//tensorflow/core:tensorflow",
]
)
</code></pre></div></div>
<p>If you keep this inside the <code class="highlighter-rouge">loader</code> folder with the .cc file, we can build from that folder using <code class="highlighter-rouge">bazel build :loader.so</code>. The actual .so we need will end up under Bazel’s <code class="highlighter-rouge">bazel-bin</code> output directory.</p>
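<p>For reference, the executable sanity check mentioned above can be as simple as the following wrapper - a hypothetical harness of my own, assuming it’s compiled and linked alongside <code class="highlighter-rouge">loader.cc</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical test harness: call run() from a main() so graph loading
// can be verified outside Unity. Expects loader.cc to be linked in.
#include <cstdio>

extern "C" const int run();

int main() {
  std::printf("run() returned %d\n", run());  // expect 3 for our graph
  return 0;
}
</code></pre></div></div>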
<h3 id="3-running-in-the-unity-editor">3: Running in the Unity Editor</h3>
<p>You can access the shared library using the <code class="highlighter-rouge">[DllImport ...]</code> attribute in C# - it doesn’t matter that this isn’t a DLL! This works the same as any Unity plugin; I have it in <code class="highlighter-rouge">assets/plugins</code>. Here’s my very simple Unity script:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class PluginImport : MonoBehaviour {
[DllImport ("loader")]
private static extern int run ();
//Lets make our calls from the Plugin
void Start () {
int output = run ();
Debug.Log (output);
gameObject.GetComponent<Text> ().text = output.ToString();
}
}
</code></pre></div></div>
<p>All this is doing is:</p>
<ul>
<li>Loading our library</li>
<li>Running the <code class="highlighter-rouge">run</code> function from the library on scene start</li>
<li>Setting the resulting integer as the text of an attached UI element.</li>
</ul>
<p>Because of the way we hard-coded the model loading in the C++, and how the Unity editor resolves paths, the protobuf needs to be in the root of the Unity project itself - not the plugins folder, not <code class="highlighter-rouge">/assets</code>. Now we can run this from the editor…</p>
<p><img src="/assets/images/unity-3.png" alt="Running in Unity Editor" /></p>
<p>… Very impressive Unity, 3 is the right answer…</p>
<h3 id="4-running-in-a-build">4: Running in a build</h3>
<p>As far as I know, the shared library we’ve made will only work on Linux; we need to compile different plugins for other platforms. Within the Unity editor we can set which builds a plugin is included in from the asset’s inspector. By default everything gets included, which is fine if we’re only building for Linux, but our project size will get pretty out of control if three or four TF libraries are included in every build.</p>
<p>The default settings will work for the build; however, this is where our lazy hard-coding of paths becomes a problem. In this case the root path is going to be wherever your executable is run from, so the <code class="highlighter-rouge">/models</code> folder needs to go there - not anywhere within the <code class="highlighter-rouge">/data</code> folder!</p>
<p>It’s not particularly impressive, but it works - we can access Tensorflow from a built Unity3d project! Now to run some more exciting graphs…</p>