adrianseeley/GPU_MSDA_FF_ANN.html

## GPU_MSDA_FF_ANN.html
<!DOCTYPE html>
<html>
<pre id="page" style="font-family: monospace; white-space: pre-wrap;">
<h3>A Sinus Activated Multi-Stochastic-Descending/Ascending (MSD/A) Feed Forward Artificial Neural Network (ANN) Computed by GPU via JavaScript and WebGL GLSL (GATO 2014)</h3>This web page outlines an implementation for solving non-linearly-separable (NLS) classification and function approximation problems using a sinus (sine) activated feed forward neural network, trained via multi-stochastic-descension/ascension (MSD/A), and evaluated on the GPU using JavaScript and WebGL GLSL source code.

In order to overcome NLS using MSD/A, a sine activation function, <b>sin(x)</b>, has been used in place of the sigmoid, <b>1 / (1 + exp(-x))</b>, hyperbolic tangent, <b>tanh(x)</b>, and/or averaging, <b>sum / count</b>, activation functions.
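
One way to see why a periodic activation helps with NLS problems is that a single sine neuron can already represent XOR. The sketch below is only an illustration: the helper name sine_neuron is hypothetical, and the weights are picked by hand rather than found by MSD/A training.

```javascript
// Hypothetical helper, not part of the network code on this page:
// one neuron computing sin(bias + sum of weight * input).
function sine_neuron(inputs, weights, bias) {
	var sum = bias;
	for (var i = 0; i < inputs.length; i++)
		sum += weights[i] * inputs[i];
	return Math.sin(sum);
}

// Hand-picked (not trained) parameters: both weights are pi / 2, bias is 0.
var half_pi = Math.PI / 2;
var xor_inputs = [[0, 0], [0, 1], [1, 0], [1, 1]];
var xor_outputs = xor_inputs.map(function (pair) {
	return sine_neuron(pair, [half_pi, half_pi], 0);
});
// xor_outputs is approximately [0, 1, 1, 0]
```

A single sigmoid or hyperbolic tangent neuron cannot do this, because it is monotonic in its weighted sum.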

Although ANNs capable of overcoming NLS problems are said to be capable of entering any computationally complete state, actually finding and entering the specific state required to solve a real world problem is the hard part, and is the responsibility of the training algorithm.

Training via backpropagation has proved remarkably successful for a number of production ANNs, and remains the de facto standard for finding (at least local) optima via gradient descent, momentum cultivation and rectification of a chaotic initialization state. Backpropagation requires the standard forward pass to evaluate a vector, but then also requires the calculation of an error delta per neuron in a backwards pass. In this way all neurons are slowly brought into an optimized state in a fashion similar to that of an annealing particle swarm optimization. Backpropagation can have difficulties with 'sharp' problems, often described as 'finding a needle in a haystack', on which training almost exclusively learns an identity function, as opposed to some comprehensive emulation of the true function that produced the training data.

Training via genetics is currently the alternative de facto standard in production grade environments. By considering the weight and bias values of the ANN to be individual genes, and their combination to be an ANN DNA, a population of ANN DNAs can be created, mutated and, most importantly, bred to produce offspring, effectively merging and crossing over the DNA from multiple parents. Genetics are an incredibly powerful and virtually unbounded means of training an ANN into literally any potential state. Genetics also reduce the chaos assumption by gaining traction on approachable solutions in the same manner as backpropagation, but reduce it further still by 1) including fully random DNA offspring, 2) relative value snapping, and 3) backwards traversal of the genetic family tree to exit local optima. Storing ANN DNA populations for a small network is a trivial space requirement, but as networks approach billions of neurons that requirement becomes a major concern for small scale computation which might otherwise only be capable of holding a single network in memory.
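
As a rough sketch of the breeding step described above, uniform crossover with occasional random mutation is one common choice (it is an assumption here, not a method this page implements). The name breed and its parameters are hypothetical; rand is any function returning a value in [0, 1), such as Math.random.

```javascript
// Hypothetical sketch: breed one child "DNA" (a flat array of weights and
// biases) from two parents of equal length.
function breed(parent_a, parent_b, mutation_rate, rand) {
	var child = [];
	for (var gene = 0; gene < parent_a.length; gene++) {

		// uniform crossover: each gene comes from either parent with equal chance
		var inherited = rand() < 0.5 ? parent_a[gene] : parent_b[gene];

		// occasionally replace the inherited gene with a fully random value
		child.push(rand() < mutation_rate ? rand() : inherited);
	}
	return child;
}
```

Note that this needs at least two full parameter sets in memory at once, which is the space cost the paragraph above refers to.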

Finally the simpler and more naive method used here, MSD/A, builds a chain of fully stochastic mutation changes (a series of address marked random value writes), then applies those changes to the ANN in an attempt to descend the error surface. If the multi-stochastic change fails to provide a descension, the chain is rewound, effectively undoing those changes to the ANN and recovering its original more advantageous position. In this way the ANN continuously attempts to make a step in a better direction, immediately undoing worse steps; this is colloquially referred to as 'hill climbing', but applied here using space optimizing methods borrowed from version control change calculations. MSD/A benefits from the same advantageous gains as genetics, however because only a single ANN is ever stored in memory, breeding, merging and crossover have no other parents to operate against, which removes the potential to make some intelligent large jumps on certain error surfaces. MSD/A makes up for the lack of breeding by not only requiring the least space to train, but also the least computations per epoch (or iteration). MSD/A does not claim to be the fastest, nor the most effective - but it is arguably the simplest working solution to implement effectively (even with a low grade compute shader), and the easiest to scale laterally across an essentially unlimited number of machines (machines receive updated gains and post only updated gains, with messages easily consisting of fewer than 8 bytes of transport data).
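
The mutate, evaluate and rewind cycle described above can be sketched on a plain array of parameters, independent of the GPU code. The names msda_step, error_fn and rand are hypothetical and chosen for illustration; rand is any function returning a value in [0, 1), such as Math.random.

```javascript
// Hypothetical sketch of a single MSD/A step on a flat parameter array.
// Returns the new best error; params is modified in place only if the
// chain of mutations lowered the error.
function msda_step(params, error_fn, best_error, chain_length, rand) {

	// build and apply the chain, recording an undo entry per write
	var undo = [];
	for (var m = 0; m < chain_length; m++) {
		var address = Math.floor(rand() * params.length);
		undo.push({address: address, original_value: params[address]});
		params[address] = rand();
	}

	// keep the changes only if they strictly descend the error surface
	var mutated_error = error_fn(params);
	if (mutated_error < best_error) return mutated_error;

	// otherwise rewind the chain backwards, so repeated writes to the same
	// address restore the oldest value last
	for (var u = undo.length - 1; u >= 0; u--)
		params[undo[u].address] = undo[u].original_value;
	return best_error;
}
```

Only the undo chain is stored besides the network itself, which is where the space saving over a genetic population comes from.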

For almost all real world classification and forecasting problems, backpropagation and/or genetics are the obviously superior training methodologies. However there is a very specific subset of problems that suffer from inherent difficulties in training, commonly due to a recursive data source not expressible in a closed form, or the need to isolate an exact constant required to exhibit a target behavior (notably primes and factoring). Backpropagation and genetics both fall prey to false positives or 'red herrings', and are usually unable (or are extremely unwilling) to train into the global optimum. The 'unapproachability' of the error surface produces what are effectively junk local optima that are unusable as discrete data, such as in password auditing.

By removing the bias of approachability, whilst maintaining advantageous position gaining, and by using a stabilizing activation function (<b>sin(x)</b>), MSD/A shows remarkable potential in production environments for WPA auditing, PRNG auditing and the reverse engineering of complex recursive functions not expressible in a closed form - although a great deal of further research is still required to understand the nature of convergence times in very deeply layered networks.

</pre>
<pre id="cons" style="font-family: monospace; white-space: pre-wrap;"></pre>
<div id="btn" style="text-align:center;"><button onclick="go();">Prove It</button></div>
<script id="vs" type="text/plain">
	attribute vec2 VTXPOS;  // the position of this vertex
    varying   vec2 TEXPOS;  // the UV coordinates of this vertex (also known as texCoord or texture coordinates)

    // render a quad, wooo exciting *confetti*
    void main() {
        TEXPOS = (VTXPOS / 2.) + vec2(0.5, 0.5);
        gl_Position = vec4(VTXPOS, 0, 1);
    }
</script>
<script id="fs" type="text/plain">
	precision mediump float;			// standard GLSL declarator of medium precision floating point numbers
    varying vec2      TEXPOS;           // the UV coordinates of this fragment (also known as texCoord or texture coordinates)
    uniform sampler2D TEX_CARRIES;      // a texture containing the signal being carried through the network (outputs)
    uniform sampler2D TEX_WEIGHTS;      // a texture containing the weights of the neurons in this layer of the network
    uniform sampler2D TEX_BIASES;       // a texture containing the biases of the neurons in this layer of the network
    uniform float     TOTAL_CARRIES;    // the total number of carry values on the texture (NOTE:A)
    uniform float     TOTAL_NEURONS;    // the total number of neurons in this layer of the network (NOTE:A)

    /* NOTE:A
		This implementation sizes its textures to power of two dimensions (1, 2, 4, 128, 1024, etc.);
		WebGL 1 only supports other sizes with restrictions (no mipmaps, clamp-to-edge wrapping).
		Because the input and output layers may have a different number of neurons than the padded
		texture size, we are required to include an actual hard set uniform value indicating the
		number of carried values (TOTAL_CARRIES) and the number of neurons (TOTAL_NEURONS).
    */

    const   float     CARRY_TEXTURE_HEIGHT  = {{CARRY_TEXTURE_HEIGHT}};  // {{double-moustache values are
    const   float     WEIGHT_TEXTURE_WIDTH  = {{WEIGHT_TEXTURE_WIDTH}};  //   string replaced in, then compiled}}
    const   float     WEIGHT_TEXTURE_HEIGHT = {{WEIGHT_TEXTURE_HEIGHT}}; // dimensions of sampler textures
    const   float     BIAS_TEXTURE_WIDTH    = {{BIAS_TEXTURE_WIDTH}};    // (NOTE:B)


    /* NOTE:B
		You can't use a uniform value as a for loop comparator, for example:

			for (float carry_index = 0.0; carry_index < CARRY_TEXTURE_HEIGHT; carry_index++)

		If CARRY_TEXTURE_HEIGHT was a uniform this shader would throw an error on compilation.
		It's still not very happy about looping over 128.0 here using floats, but this
		implementation is designed as a pragmatic demonstration for teaching purposes only.

		In order to get the space efficiency from using the smallest textures needed, while
		still being able to compile, we first calculate the dimensions, then replace the
		{{appropriate markers}} with the appropriate values. The GLSL compiler only sees
		literal float values (i.e. 420.420) not {{mustaches}}.
    */

    // WebGL GLSL doesn't do the bitwise operations we need to encode floats
    // use algebra instead
	float extract_bits (float num, float from, float to) {
	    from = floor(from + 0.5);
	    to = floor(to + 0.5);
	    return mod(floor((floor(num) + 0.5) / exp2(from)), floor(1.0 * exp2(to - from) + 0.5));
	}

	// WebGL GLSL renders to a non-floating point texture where each byte represents
	// a color channel. Instead we use those 4 bytes to encode a float using the
	// IEEE 754 single-precision layout.
	vec4 encode_float (float val) {

		// it works, don't touch it
	    if (val == 0.0) return vec4(0, 0, 0, 0);
	    float sign = val > 0.0 ? 0.0 : 1.0;
	    val = abs(val);
	    float exponent = floor(log2(val));
	    float biased_exponent = exponent + 127.0;
	    float fraction = ((val / exp2(exponent)) - 1.0) * 8388608.0;
	    float t = biased_exponent / 2.0;
	    float last_bit_of_biased_exponent = fract(t) * 2.0;
	    float remaining_bits_of_biased_exponent = floor(t);
	    float byte4 = extract_bits(fraction, 0.0, 8.0) / 255.0;
	    float byte3 = extract_bits(fraction, 8.0, 16.0) / 255.0;
	    float byte2 = (last_bit_of_biased_exponent * 128.0 + extract_bits(fraction, 16.0, 23.0)) / 255.0;
	    float byte1 = (sign * 128.0 + remaining_bits_of_biased_exponent) / 255.0;
	    return vec4(byte4, byte3, byte2, byte1);
	}

	// fragment compute shader entry point
    void main() {

    	// determine neuron's index by its UV 'y' coordinate (TEXPOS)
    	float neuron_index = floor(TEXPOS.y * CARRY_TEXTURE_HEIGHT);

    	// if the index of this neuron is over the total neurons return early
    	// this occurs when we have a non power of 2 total neurons, or on
    	// the input and output layers where we may have wonky neuron counts
    	if (neuron_index >= TOTAL_NEURONS) return;

    	// sample bias pixel
		vec4 bias_pixel = texture2D(TEX_BIASES, vec2(floor(neuron_index / 4.0) / (BIAS_TEXTURE_WIDTH / 4.0), 0.5));

		// calculate which component of bias pixel is bias value
		float bias_value = floor(mod(neuron_index, 4.0));

		// determine bias value (this is a switch case for rgb or a)
		if      (bias_value == 0.0) bias_value = bias_pixel[0];
		else if (bias_value == 1.0) bias_value = bias_pixel[1];
		else if (bias_value == 2.0) bias_value = bias_pixel[2];
		else if (bias_value == 3.0) bias_value = bias_pixel[3];

		// iterate through carried outputs (inputs for this neuron)
		for (float carry_index = 0.0; carry_index < CARRY_TEXTURE_HEIGHT; carry_index++) {

			// since we can't 'for' to a non-const variable, we use this to short out extra carries
			// caused by having a non power of 2 number of inputs
			if (carry_index >= TOTAL_CARRIES) continue;

			// sample weight pixel
			vec4 weight_pixel = texture2D(TEX_WEIGHTS, vec2(floor(carry_index / 4.0) / (WEIGHT_TEXTURE_WIDTH / 4.0), neuron_index / WEIGHT_TEXTURE_HEIGHT));

			// calculate which component of weight pixel is weight value
			float weight_value = floor(mod(carry_index, 4.0));

			// determine weight value (this is a switch case for rgb or a)
			if      (weight_value == 0.0) weight_value = weight_pixel[0];
			else if (weight_value == 1.0) weight_value = weight_pixel[1];
			else if (weight_value == 2.0) weight_value = weight_pixel[2];
			else if (weight_value == 3.0) weight_value = weight_pixel[3];

			// add weighted carried output to bias value
			bias_value += weight_value * texture2D(TEX_CARRIES, vec2(0.5, carry_index / CARRY_TEXTURE_HEIGHT)).r;
		}

		// average activation
		//bias_value /= TOTAL_CARRIES + 1.0;

		// sigmoid activation
		//bias_value = 1.0 / (1.0 + exp(-bias_value));

		// hypertan activation
		//bias_value = exp(bias_value);
		//bias_value = (bias_value - 1.0 / bias_value) / (bias_value + 1.0 / bias_value);

		// sin activation
		bias_value = sin(bias_value);

		// set the fragment color to the activated (bias value + weighted transfer)
		// but as an encoded float to be converted in JS by typed array buffer conversion
		gl_FragColor = encode_float(bias_value);
    }
</script>
<script>
	// program entry point, use this function to create a configured network
	// that comes with packaged training functions
	function GPU_MSDA_FF_ANN (cfg) {

		// ensure all the configuration values are there
		if (!cfg)                throw 'expecting cfg object';
		if (!cfg.inputs)         throw 'expecting cfg.inputs: 1->128';                    // the number of inputs per vector
		if (!cfg.outputs)        throw 'expecting cfg.outputs: 1->128';                   // the number of outputs per vector
		if (!cfg.hidden_layers)  throw 'expecting cfg.hidden_layers: 1->MemoryException'; // the number of hidden layers
		if (!cfg.hidden_neurons) throw 'expecting cfg.hidden_neurons: 1->128';            // the number of hidden neurons per hidden layer

		// this is one of the places a little dirty optimization sneaks in:
		// in order to reduce the texture space needed we use the next highest
		// power of two texture, but each texture can sneak off little corners
		// thanks to things like the input layer not needing weights or biases.
		// don't touch this unless you are absolutely sure you know what's up.
		cfg.carry_texture_width   = 1;
		cfg.carry_texture_height  = Math.pow(2, Math.ceil(Math.log(          Math.max(cfg.inputs, cfg.outputs, cfg.hidden_neurons)    )  / Math.log(2)));
		cfg.weight_texture_width  = Math.pow(2, Math.ceil(Math.log(Math.ceil(Math.max(cfg.inputs,              cfg.hidden_neurons) / 4)) / Math.log(2)));
		cfg.weight_texture_height = Math.pow(2, Math.ceil(Math.log(          Math.max(            cfg.outputs, cfg.hidden_neurons)    )  / Math.log(2)));
		cfg.bias_texture_width    = Math.pow(2, Math.ceil(Math.log(Math.ceil(Math.max(cfg.outputs,             cfg.hidden_neurons) / 4)) / Math.log(2)));
		cfg.bias_texture_height   = 1;
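
		/*
		The four sizing expressions above share one pattern: round a count up to the
		next power of two. A minimal sketch of that pattern, assuming a hypothetical
		helper name (next_pow2) that does not exist in this file, and using an integer
		loop instead of logarithms:

```javascript
// Hypothetical helper: smallest power of two that is >= n (for n >= 1).
function next_pow2(n) {
	var p = 1;
	while (p < n) p *= 2;
	return p;
}

// Worked example for the XOR demo at the bottom of this file
// (2 inputs, 1 output, 3 hidden neurons per hidden layer).
var example_sizes = {
	carry_texture_height:  next_pow2(Math.max(2, 1, 3)),             // 4
	weight_texture_width:  next_pow2(Math.ceil(Math.max(2, 3) / 4)), // 1
	weight_texture_height: next_pow2(Math.max(1, 3)),                // 4
	bias_texture_width:    next_pow2(Math.ceil(Math.max(1, 3) / 4))  // 1
};
```

		For those counts the loop gives the same sizes as the log based expressions
		above. The widths are divided by 4 first because each RGBA pixel packs 4 values.
		*/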

		// create buffers for all the layer textures and for reading values back out
		cfg.carry_texture_buffer      = new Float32Array(cfg.carry_texture_width * cfg.carry_texture_height * 4);
		cfg.carry_texture_buffer_read = new Uint8Array  (cfg.carry_texture_width * cfg.carry_texture_height * 4);
		cfg.weight_texture_buffers    = [];
		cfg.bias_texture_buffers      = [];
		for (var hidden_layer = 0; hidden_layer < cfg.hidden_layers + 1; hidden_layer++) { // the output layer is a hidden layer (+1)
			cfg.weight_texture_buffers.push(new Float32Array(cfg.weight_texture_width * cfg.weight_texture_height * 4));
			cfg.bias_texture_buffers  .push(new Float32Array(cfg.bias_texture_width   * cfg.bias_texture_height   * 4));
		}

		// initialize whole network to 1
		var init_to = 1;
		for (var hidden_layer = 0; hidden_layer < cfg.hidden_layers + 1; hidden_layer++) { // the output layer is a hidden layer (+1)
			for (var w = 0; w < cfg.weight_texture_buffers[hidden_layer].length; w++)
				cfg.weight_texture_buffers[hidden_layer][w] = init_to;
			for (var b = 0; b < cfg.bias_texture_buffers[hidden_layer].length; b++)
				cfg.bias_texture_buffers[hidden_layer][b] = init_to;
		}

		// create a canvas to hook WebGL draw calls to
		cfg.render_canvas        = document.createElement('canvas');

		// the canvas needs to be precisely sized so that every pixel
		// of our textures renders with exact values (no interpolation),
		// this way each pixel represents a single neuron
		cfg.render_canvas.width  = cfg.carry_texture_width;
		cfg.render_canvas.height = cfg.carry_texture_height;

		// create a helper function for compiling shaders
		function create_shader (gl, str, type) {
			var shader = gl.createShader(type);
			gl.shaderSource(shader, str);
			gl.compileShader(shader);
			if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) throw gl.getShaderInfoLog(shader);
			return shader;
		};

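		// hook WebGL (prefer the standard context name, fall back to the legacy one)
		cfg.render_canvas.gl     = cfg.render_canvas.getContext('webgl') || cfg.render_canvas.getContext('experimental-webgl');
		if (!cfg.render_canvas.gl) throw 'no WebGL support';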

		// acquire input texture float support (no output support *sad face*)
		cfg.render_canvas.gl.oes_float = cfg.render_canvas.gl.getExtension('OES_texture_float');
		if (!cfg.render_canvas.gl.oes_float) throw 'OES_texture_float extension is not supported, cannot upload float textures';

		// make a gl program
		cfg.render_canvas.gl.program = cfg.render_canvas.gl.createProgram();

		// read and compile vertex shader
		cfg.render_canvas.gl.attachShader(
			cfg.render_canvas.gl.program,
			create_shader(
				cfg.render_canvas.gl,
				document.getElementById('vs').textContent,
				cfg.render_canvas.gl.VERTEX_SHADER));

		// read and compile the fragment shader {{replace mustache values too}}
		cfg.render_canvas.gl.attachShader(
			cfg.render_canvas.gl.program,
			create_shader(
				cfg.render_canvas.gl,
				document.getElementById('fs').textContent
					.split('{{CARRY_TEXTURE_HEIGHT}}') .join(cfg.carry_texture_height  + '.')  // the '.'s are because GLSL
					.split('{{WEIGHT_TEXTURE_WIDTH}}') .join(cfg.weight_texture_width  + '.')  // likes its float values
					.split('{{WEIGHT_TEXTURE_HEIGHT}}').join(cfg.weight_texture_height + '.')  // to be expressed with a dot;
					.split('{{BIAS_TEXTURE_WIDTH}}')   .join(cfg.bias_texture_width    + '.'), // for example: '420.'
					cfg.render_canvas.gl.FRAGMENT_SHADER));
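
		/*
		The mustache replacement above can also be written as one small generic
		helper. This is only a sketch: fill_template is a hypothetical name that
		does not exist in this file, and it assumes every value is a whole number,
		so that appending a dot yields a valid GLSL float literal:

```javascript
// Hypothetical helper: replace every {{KEY}} marker in a shader source string
// with the matching whole-number value written as a GLSL float (e.g. 4 -> '4.').
function fill_template(source, values) {
	Object.keys(values).forEach(function (key) {
		source = source.split('{{' + key + '}}').join(values[key] + '.');
	});
	return source;
}

var example_line = fill_template(
	'const float CARRY_TEXTURE_HEIGHT = {{CARRY_TEXTURE_HEIGHT}};',
	{CARRY_TEXTURE_HEIGHT: 4});
// example_line is 'const float CARRY_TEXTURE_HEIGHT = 4.;'
```

		Markers with no matching key are left in the source untouched, which makes
		the shader fail to compile rather than silently use a wrong size.
		*/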

		// link shader programs
		cfg.render_canvas.gl.linkProgram(cfg.render_canvas.gl.program);

		// give up writing WebGL shaders because apparently everything is an error,
		// check for linker errors
		if (!cfg.render_canvas.gl.getProgramParameter(cfg.render_canvas.gl.program, cfg.render_canvas.gl.LINK_STATUS)) throw cfg.render_canvas.gl.getProgramInfoLog(cfg.render_canvas.gl.program);

		// create the vertex buffer for a screen shaped quad
		cfg.render_canvas.gl.program.VTXPOSbuffer          = cfg.render_canvas.gl.createBuffer();
		cfg.render_canvas.gl.program.VTXPOSbuffer.itemSize = 2;
	    cfg.render_canvas.gl.program.VTXPOSbuffer.numItems = 4;
		cfg.render_canvas.gl.bindBuffer(cfg.render_canvas.gl.ARRAY_BUFFER, cfg.render_canvas.gl.program.VTXPOSbuffer);
		cfg.render_canvas.gl.bufferData(cfg.render_canvas.gl.ARRAY_BUFFER, new Float32Array([-1, -1, 1, -1, -1, 1, 1, 1]), cfg.render_canvas.gl.STATIC_DRAW);

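		// mark the compiled program as the one to use
		cfg.render_canvas.gl.useProgram(cfg.render_canvas.gl.program);

		// attach vertex buffer to program (look up the attribute location by name
		// rather than relying on undefined values being coerced to location 0)
		cfg.render_canvas.gl.program.VTXPOS = cfg.render_canvas.gl.getAttribLocation(cfg.render_canvas.gl.program, 'VTXPOS');
		cfg.render_canvas.gl.enableVertexAttribArray(cfg.render_canvas.gl.program.VTXPOS);
		cfg.render_canvas.gl.vertexAttribPointer(cfg.render_canvas.gl.program.VTXPOS, 2, cfg.render_canvas.gl.FLOAT, false, 0, 0);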

		// pull all the uniform locations for the fragment shader
		cfg.render_canvas.gl.program.TEX_CARRIES           = cfg.render_canvas.gl.getUniformLocation(cfg.render_canvas.gl.program, 'TEX_CARRIES');
		cfg.render_canvas.gl.program.TEX_WEIGHTS           = cfg.render_canvas.gl.getUniformLocation(cfg.render_canvas.gl.program, 'TEX_WEIGHTS');
		cfg.render_canvas.gl.program.TEX_BIASES            = cfg.render_canvas.gl.getUniformLocation(cfg.render_canvas.gl.program, 'TEX_BIASES');
		// CARRY_TEXTURE_HEIGHT, WEIGHT_TEXTURE_WIDTH, WEIGHT_TEXTURE_HEIGHT and BIAS_TEXTURE_WIDTH are
		// compile-time constants string replaced into the shader source, not uniforms, so they have no locations to pull
		cfg.render_canvas.gl.program.TOTAL_CARRIES         = cfg.render_canvas.gl.getUniformLocation(cfg.render_canvas.gl.program, 'TOTAL_CARRIES');
		cfg.render_canvas.gl.program.TOTAL_NEURONS         = cfg.render_canvas.gl.getUniformLocation(cfg.render_canvas.gl.program, 'TOTAL_NEURONS');

		// create textures references for all three textures
		cfg.render_canvas.gl.program.TEX_CARRIES_REF = cfg.render_canvas.gl.createTexture();
		cfg.render_canvas.gl.program.TEX_WEIGHTS_REF = cfg.render_canvas.gl.createTexture();
		cfg.render_canvas.gl.program.TEX_BIASES_REF  = cfg.render_canvas.gl.createTexture();

		// define the function that will actually evaluate a network given input
		cfg.run = function (inputs) {

			// define a helper function to load textures using pixel perfect settings
			function load_texture (texture_unit, tex_ref, tex_width, tex_height, tex_buffer, tex_sampler) {

				// set the active texture unit (0, 1, or 2 in our case)
				cfg.render_canvas.gl.activeTexture(cfg.render_canvas.gl.TEXTURE0 + texture_unit);

				// bind the texture reference
				cfg.render_canvas.gl.bindTexture(cfg.render_canvas.gl.TEXTURE_2D, tex_ref);

				// copy the buffer of the texture pixel data to the GPU
				cfg.render_canvas.gl.texImage2D(cfg.render_canvas.gl.TEXTURE_2D, 0, cfg.render_canvas.gl.RGBA, tex_width, tex_height, 0, cfg.render_canvas.gl.RGBA, cfg.render_canvas.gl.FLOAT, tex_buffer);

				// use pixel perfect right to the edges rendering
				cfg.render_canvas.gl.texParameteri(cfg.render_canvas.gl.TEXTURE_2D, cfg.render_canvas.gl.TEXTURE_MAG_FILTER, cfg.render_canvas.gl.NEAREST);
				cfg.render_canvas.gl.texParameteri(cfg.render_canvas.gl.TEXTURE_2D, cfg.render_canvas.gl.TEXTURE_MIN_FILTER, cfg.render_canvas.gl.NEAREST);
				cfg.render_canvas.gl.texParameteri(cfg.render_canvas.gl.TEXTURE_2D, cfg.render_canvas.gl.TEXTURE_WRAP_S, cfg.render_canvas.gl.CLAMP_TO_EDGE);
				cfg.render_canvas.gl.texParameteri(cfg.render_canvas.gl.TEXTURE_2D, cfg.render_canvas.gl.TEXTURE_WRAP_T, cfg.render_canvas.gl.CLAMP_TO_EDGE);

				// set the sampler to point at the correct texture unit
				cfg.render_canvas.gl.uniform1i(tex_sampler, texture_unit);
			};

			// define a helper function to render each layer of the network
			function render_layer (layer_index) {

				// load carries, weights and biases into texture units 0 1 and 2 respectively
				load_texture(0, cfg.render_canvas.gl.program.TEX_CARRIES_REF, cfg.carry_texture_width,  cfg.carry_texture_height,  cfg.carry_texture_buffer,                cfg.render_canvas.gl.program.TEX_CARRIES);
				load_texture(1, cfg.render_canvas.gl.program.TEX_WEIGHTS_REF, cfg.weight_texture_width, cfg.weight_texture_height, cfg.weight_texture_buffers[layer_index], cfg.render_canvas.gl.program.TEX_WEIGHTS);
				load_texture(2, cfg.render_canvas.gl.program.TEX_BIASES_REF,  cfg.bias_texture_width,   cfg.bias_texture_height,   cfg.bias_texture_buffers[layer_index],   cfg.render_canvas.gl.program.TEX_BIASES);

				// make draw call
				cfg.render_canvas.gl.drawArrays(cfg.render_canvas.gl.TRIANGLE_STRIP, 0, 4);

				// read output of draw call to ubyte buffer
				cfg.render_canvas.gl.readPixels(0, 0, cfg.carry_texture_width, cfg.carry_texture_height, cfg.render_canvas.gl.RGBA, cfg.render_canvas.gl.UNSIGNED_BYTE, cfg.carry_texture_buffer_read);

				// view output of draw call as floats
				var asfloats = new Float32Array(cfg.carry_texture_buffer_read.buffer);

				// transpose floats back into carry 'r' values for next layer
				for (var f = 0; f < asfloats.length; f++)
					cfg.carry_texture_buffer[f * 4] = asfloats[f];
			};
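
			/*
			The read-back above works because the four bytes written per pixel by
			encode_float are the bytes of an IEEE 754 single-precision float, low byte
			first. A minimal sketch of that byte layout in plain JavaScript, assuming
			hypothetical helper names (float_to_bytes, bytes_to_float) and using a
			DataView so the little-endian byte order is explicit:

```javascript
// Hypothetical helper: the 4 bytes of a 32 bit float, least significant first.
function float_to_bytes(value) {
	var view = new DataView(new ArrayBuffer(4));
	view.setFloat32(0, value, true); // true = little-endian
	return [view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3)];
}

// Hypothetical helper: rebuild the float from those 4 bytes.
function bytes_to_float(bytes) {
	var view = new DataView(new Uint8Array(bytes).buffer);
	return view.getFloat32(0, true); // true = little-endian
}

var one_as_bytes = float_to_bytes(1);              // [0, 0, 128, 63]
var round_trip   = bytes_to_float(one_as_bytes);   // 1
```

			Viewing the Uint8Array through a Float32Array, as render_layer does, gives
			the same result on little-endian hosts.
			*/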

			// load inputs into carry, zeros for rest (we only use 'r' on carry writes,
			// we read carries out as floats from .RGBA though)
			for (var carry = 0; carry < cfg.carry_texture_buffer.length; carry += 4)
				cfg.carry_texture_buffer[carry] = inputs[carry / 4] || 0;

			// setup shader texture bounds for the input layer to the first hidden layer
			cfg.render_canvas.gl.uniform1f(cfg.render_canvas.gl.program.TOTAL_CARRIES, inputs.length);
			cfg.render_canvas.gl.uniform1f(cfg.render_canvas.gl.program.TOTAL_NEURONS, cfg.hidden_neurons);

			// render hidden layer 0
			render_layer(0);

			// setup shader texture bounds for a hidden layer to a hidden layer
			cfg.render_canvas.gl.uniform1f(cfg.render_canvas.gl.program.TOTAL_CARRIES, cfg.hidden_neurons);
			cfg.render_canvas.gl.uniform1f(cfg.render_canvas.gl.program.TOTAL_NEURONS, cfg.hidden_neurons);

			// render all hidden layers, exempting output layer which may have
			// a different number of neurons
			for (var hidden_layer = 1; hidden_layer < cfg.hidden_layers; hidden_layer++)
				render_layer(hidden_layer);

			// set uniforms for output layer
			cfg.render_canvas.gl.uniform1f(cfg.render_canvas.gl.program.TOTAL_CARRIES, cfg.hidden_neurons);
			cfg.render_canvas.gl.uniform1f(cfg.render_canvas.gl.program.TOTAL_NEURONS, cfg.outputs);

			// render output layer
			render_layer(cfg.hidden_layers);

			// collect outputs
			var output = [];
			for (var o = 0; o < cfg.outputs; o++)
				output.push(cfg.carry_texture_buffer[o * 4]);

			// return output
			return output;
		};

		// define a passable euclidean error function
		cfg.error_euclidean = function (a, b) {
			var distance = 0;
			for (var component = 0; component < a.length; component++)
				distance += Math.pow(a[component] - b[component], 2);
			return Math.sqrt(distance);
		};

		// define a passable absolute error function
		cfg.error_absolute = function (a, b) {
			var distance = 0;
			for (var component = 0; component < a.length; component++)
				distance += Math.abs(a[component] - b[component]);
			return distance;
		};

		// define a useable fitness total function
		cfg.fitness_total = function (test, fn_error) {
			var error = 0;
			for (var t = 0; t < test.length; t++)
				error += fn_error(cfg.run(test[t][0]), test[t][1]);
			return error;
		};

		// define a useable fitness average function
		cfg.fitness_average = function (test, fn_error) {
			var error = 0;
			for (var t = 0; t < test.length; t++)
				error += fn_error(cfg.run(test[t][0]), test[t][1]);
			return error / test.length;
		};

		// define a useable mutation function
		cfg.mutate = function (total_mutations) {

			// create an array to hold an undo changes record
			var mutations = [];

			// iterate through mutations requested
			for (var m = 0; m < total_mutations; m++) {

				// decide to mutate a weight or a bias
				var weight_or_bias = Math.random() > 0.5 ? 1 : 0;

				// if we decided to mutate a weight
				if (weight_or_bias == 1) {

					// randomly choose a hidden layer's texture index
					var texture_index = Math.floor(Math.random() * cfg.weight_texture_buffers.length);

					// randomly choose a float in that layer's texture
					var float_index = Math.floor(Math.random() * cfg.weight_texture_buffers[texture_index].length);

					// add an undo record
					mutations.push({weight_or_bias: weight_or_bias, texture_index: texture_index, float_index: float_index, original_value: cfg.weight_texture_buffers[texture_index][float_index]});

					// mutate
					cfg.weight_texture_buffers[texture_index][float_index] = Math.random();

				// otherwise we decided to mutate a bias
				} else {

					// randomly choose a hidden layer's texture index
					var texture_index = Math.floor(Math.random() * cfg.bias_texture_buffers.length);

					// randomly choose a float in that layer's texture
					var float_index = Math.floor(Math.random() * cfg.bias_texture_buffers[texture_index].length);

					// add an undo record
					mutations.push({weight_or_bias: weight_or_bias, texture_index: texture_index, float_index: float_index, original_value: cfg.bias_texture_buffers[texture_index][float_index]});

					// mutate
					cfg.bias_texture_buffers[texture_index][float_index] = Math.random();
				}
			}

			// return undoable record of mutations
			return mutations;
		};

		// define a useable unmutation function
		cfg.unmutate = function (mutations) {

			// iterate backwards (the direction is important) over mutations made
			for (var m = mutations.length - 1; m >= 0; m--)

				// if this was a weight mutation
				if (mutations[m].weight_or_bias == 1)

					// return the weight to its original value
					cfg.weight_texture_buffers[mutations[m].texture_index][mutations[m].float_index] = mutations[m].original_value;

				// otherwise this was a bias mutation
				else

					// return the bias to its original value
					cfg.bias_texture_buffers[mutations[m].texture_index][mutations[m].float_index] = mutations[m].original_value;
		};

		// return the configured network with baked in functions
		return cfg;
	};

	// hook the DOM for writing results like a console
	var cons = document.getElementById('cons');

	// create an XOR test set
	var test = [
		[[0, 0], [0]],
		[[0, 1], [1]],
		[[1, 0], [1]],
		[[1, 1], [0]]
	];

	// create a GPU_MSDA_FF_ANN using 2 inputs, 1 output, 3 hidden layers each with 3 hidden neurons
	var gnn = GPU_MSDA_FF_ANN({inputs: test[0][0].length, outputs: test[0][1].length, hidden_layers: 3, hidden_neurons: 3});

	// calculate the starting fitness as a benchmark
	var best_fitness = gnn.fitness_average(test, gnn.error_euclidean);

	// define a background iteration function
	function iter () {

		// define a helper function for expressing all outputs values and errors
		function express () {

			// define a helper function for padding the left of a string
			function pad_left (str, len) {

				// ensure numbers are strings
				str = str + '';

				// while the string isn't long enough
				while (str.length < len)

					// add spaces to the left
					str = ' ' + str;

				// return the padded string
				return str;
			};

			// iterate through all tests to express outputs
			for (var t = 0; t < test.length; t++) {

				// generate an estimate by running the test input through the network
				var esti = gnn.run(test[t][0])[0];

				// log the result to the console
				cons.innerHTML = 'inpu: ' + test[t][0] + ', real: ' + test[t][1] + ', esti: ' + pad_left(esti.toFixed(8), 11) + ', erro: ' + pad_left(Math.abs(test[t][1][0] - esti).toFixed(8), 11) + '<br>' + cons.innerHTML;
			}

			// log the overall best fitness to the console
			cons.innerHTML = '<br>(fitness: ' + best_fitness + ')<br>' + cons.innerHTML;
		};

		// for every pass (or epoch) add a dot to the console to show the user
		// the network is training
		cons.innerHTML = '. ' + cons.innerHTML;

		// perform a single mutation (XOR is easy, some larger networks can use values
		// as high as 1000->10000 depending on how many neurons are involved)
		var mutations = gnn.mutate(1);

		// calculate the fitness of the mutated network
		var mutated_fitness = gnn.fitness_average(test, gnn.error_euclidean);

		// if the mutated fitness is no better than the original fitness
		if (mutated_fitness >= best_fitness)

			// undo mutation changes
			gnn.unmutate(mutations);

		// otherwise the mutated fitness is better than the original fitness
		else {

			// associate a new best fitness and keep changes
			best_fitness = mutated_fitness;

			// tell the user we made an improvement and show them the results
			express();
		}

		// if the best fitness is worse than 1% error
		if (best_fitness > 0.01)

			// keep iterating in the background to not lock up the DOM
			setTimeout(iter, 1);

		// otherwise the fitness is better than 1% error
		else {

			// tell the user how good the end result was
			express();

			// tell the user we are done!
			cons.innerHTML = 'done!<br><br>' + cons.innerHTML;
		}
	};

	// create an entry point function
	function go () {

		// hide button
		document.getElementById('btn').style.display = 'none';

		// begin iteration in the background to not lock up the DOM
		setTimeout(iter, 1);
	};
</script>
</html>