Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Two spiral neural network using particle swarm optimization and differential evolution

Neural Network for Solving the Two-Spiral Problem

This is a simple implementation of a 2-M-1 neural network trained using different optimization algorithms in order to solve the two-spiral problem. The two-spiral problem is a particularly difficult problem that requires separating two logistic spirals from one another [1] [2].

Two Spiral Problem

Files Included:

Neural Network:

  1. twin_spiral_vanilla.m - the main file containing the neural network structure. From here, different functions are called.
  2. nnCostFunction.m - function that contains the feedforward and backpropagation methods for the neural network. Here, both the cost and the gradient is computed and returned back to the main function via a handling variable.
  3. randInitializeWeights.m - function that randomly initializes the weights of the neural network in order for it to break symmetry.
  4. predict.m - function that predicts the classes of the input data given the trained parameters for the neural network.
  5. visualizeData.m - maps the generalization ability of the neural network given the trained parameters.

Optimization Algorithms:

  1. particleSwarmOptimization.m - takes the objective function, initial parameters, and hyperparameters and finds the optima using a global best PSO algorithm.
  2. differentialEvolution.m - takes the objective function, initial parameters, and hyperparameters and finds the optima using the differential evolution algorithm.
  3. fmincg.m - stands for Function minimize nonlinear conjugant gradient, an off-the-shelf algorithm that was used as a benchmark solution

Installation:

First, make sure that you have MATLAB installed. Compatibility to open-source software, such as Octave, has not yet been tested. If you're set, then just clone this repository:

$ git clone https://github.com/ljvmiranda921/two-spiral-neural-net.git

Usage:

Parameter Setting:

Both PSO and DE have different parameters that one can experiment on. In this implementation, the parameters present are summarized in the table below

Particle Swarm Optimization

Parameter Description
maxIter Number of iterations that the PSO algorithm will run
swarmSize Number of particles in the search space.
epsilon Scattering degree of the particles during initialization
c_1 Cognitive component (exploration parameter).
c_2 Social component (exploitation parameter).
inertiaWeight Inertia weight that controls the movement of particles.

Differential Evolution

Parameter Description
genMax Number of generations that the DE algorithm will run
population Number of particles in the search space.
epsilon_de Scattering degree of the particles during initialization
mutationF Degree of mutation effect (exploration parameter)
recombinationC Degree of recombination effect (exploitation parameter)

Implementation Notes:

Generalization Ability

  1. Particle Swarm Optimization (84.86%)
    Cost Function PSO Generalization ability of PSO
  2. Differential Evolution (84.102%)
    Generalization ability of DE
  3. Minimize Nonlinear Conjugant Gradient (100.00%)
    Generalization ability of FMINCG

License

This project is licensed under the MIT License - see the LICENSE.txt file for details

References

[1] Kevin J. Lang and Michael J. Witbrock: Learning to Tell Two Spirals Apart. In: Proceedings of the 1988 Connectionist Models Summer School, Morgan Kauffman, 1998.
[2] http://www.ibiblio.org/pub/academic/computer-science/neural-networks/programs/bench/two-spirals

(A more comprehensive explanation of the theory and the implementation results for the various optimization algorithms will be linked here as a separate documentation)

function [nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, initial_nn_params, epsilon, nvars, populationSize, generation, f, cr)
%DIFFERENTIALEVOLUTION Finds the optimal solution of an objective
% function using the Differential Evolution algorithm.
%
% DE Steps:
% 1. Initialization - initialize particles.
% 2. Mutation - randomly select three vectors and compute for trial vector.
% 3. Recombination - from a donor vector, do some exchange of genes using
% the probability C.
% 4. Selection - do tournament selection.
%% =============================== Initialize Parameters ===================================
% Particle Vector
particleVec = [];
for iter = 1:populationSize
temp = epsilon.* randn(nvars,1) + initial_nn_params;
particleVec(iter,:) = vertcat(temp);
end
% Cost Tracker
J = [];
% Cost History
J_hist = [];
for iter = 1:generation
%% =============================== Mutation ================================================
% For mutation, you choose three random vectors, and relate them in order
% to build the donor vector (donorVec). This method is controlled by the
% parameter f.
randIndex = randperm(populationSize,3);
donorVec = particleVec(randIndex(3),:) + f * (particleVec(randIndex(1),:) - particleVec(randIndex(2),:));
%% ============================ Recombination ==============================================
% For recombination, you build your trial vector (trialVec) using the
% elements of the donor vector (donorVec) and the elements of your target vector.
% This means that for each particle in the population (for each row of
% particle) and for each dimension of that particle (for each column of
% each row of particle), you have to do the comparison.
trialVec = zeros(populationSize,nvars);
randomNumVec = rand(populationSize,nvars);
i_rand = randperm(nvars,1);
for particleIndex = 1:size(particleVec,1)
for dimensionIndex = 1:nvars
if randomNumVec(particleIndex,dimensionIndex) <= cr || dimensionIndex == i_rand
trialVec(particleIndex,dimensionIndex) = donorVec(1,dimensionIndex);
elseif randomNumVec(particleIndex,dimensionIndex) > cr && dimensionIndex ~= i_rand
trialVec(particleIndex,dimensionIndex) = particleVec(particleIndex,dimensionIndex);
end
end
end
%% ============================ Selection ==============================================
% For selection, the target vector (particle) is then compared against the
% trial vector (trialVec) if costFunction(trialVec(particleIndex,:)) is less than the computed
% costFunction(particleVec(particleIndex,:)), then you just replace that
% particle with the one captured from trialVec. After all particles are
% tested, we then have the second generation.
for particleIndex = 1:size(particleVec,1)
if costFunction(trialVec(particleIndex,:)') <= costFunction(particleVec(particleIndex,:)')
particleVec(particleIndex,:) = trialVec(particleIndex,:);
end
end
%% ============================ Get Fittest Gene for this Generation =========================================
for particleIndex = 1:size(particleVec,1)
J(particleIndex) = costFunction(particleVec(particleIndex,:)');
end
[minimaJ, minimaJIndex] = min(J);
J_hist = [J_hist' [iter; minimaJ]]';
Y = ['Generation ',num2str(iter),'| Fittest Gene Cost: ', num2str(minimaJ)];
disp(Y);
end
nn_params = particleVec(minimaJIndex,:)';
cost = min(J);
trigger = 2;
end
function [nn_params, cost] = gradientDescentNN(costFunction, initial_nn_params, learning_rate, num_iters)
% [gradientDescentNN] Performs gradient descent to learn theta
for iter = 1 : num_iters
[computedCost, gradient] = costFunction(initial_nn_params);
fprintf('\nIteration %i | Cost: %j \n',iter,computedCost);
disp(computedCost);
initial_nn_params = initial_nn_params - learning_rate * gradient;
end
nn_params = initial_nn_params;
cost = computedCost;
end
function [J, grad] = nnCostFunction(nn_params, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, ...
X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%
% Parameters:
% nn_params - for first-use, put the initial neural network parameters you
% have
% input_layer_size - number of nodes in the input layer
% hidden_layer_size - number of nodes in the hidden layer
% num_labels - the number of categories you have
% X - input variables
% y - target variables
% lambda - regularization parameter.
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
m = size(X, 1);
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
%% ============================================== FEEDFORWARD IMPLEMENTATION =========================================
a_1 = [ones(m, 1) X];
z_2 = a_1 * Theta1';
a_2 = sin(z_2);
a_2 = [ones(size(a_2,1), 1) a_2];
z_3 = a_2 * Theta2';
a_3 = sigmoid(z_3);
hypothesis = a_3;
y_i = zeros(m,num_labels);
for i = 1:m
y_i(i,y(i)) = 1;
end
J = 1/m * sum(sum(-1 * y_i .* log(hypothesis)-(1-y_i) .* log(1-hypothesis)));
% J = 1/m * sumsqr(hypothesis - y_i);
%J = (1/m) * sum(sum(-1 * ((y_i + 1) / 2) .* log((hypothesis + 1) / 2) - (1 - (y_i / 2)) .* log(1 - (hypothesis + 1) / 2)));
regularization_term = (lambda / (2 * m) * (sum(sumsqr(Theta1(:,2:input_layer_size+1))) + sum(sumsqr(Theta2(:,2:hidden_layer_size+1)))));
J = J + regularization_term;
%% ============================================== BACKPROPAGATION IMPLEMENTATION =========================================
for t = 1:m
a_1 = [1; X(t,:)']; % Attach bias vector (with ones) on a1
z_2 = Theta1 * a_1; % Compute pre-activation z2
a_2 = [1; sin(z_2)]; % Compute activation a2 (using sin), attach bias vector
z_3 = Theta2 * a_2; % Compute pre-activation z3
a_3 = sigmoid(z_3); % Compute activation a3 (using sigmoid)
target_output = ([1:num_labels]==y(t))';
delta_3 = a_3 - target_output; % Compute the difference between the activation result and the target output
delta_2 = (Theta2' * delta_3) .* [1; cos(z_2)]; % Compute delta_2 , always get the g' of layer 2.
delta_2 = delta_2(2:end);
% Accumulate Delta
Theta1_grad = Theta1_grad + delta_2 * a_1';
Theta2_grad = Theta2_grad + delta_3 * a_2';
end
% Compute for the partial derivative terms
Theta1_grad = (1/m) * Theta1_grad + (lambda/m) * [zeros(size(Theta1, 1), 1) Theta1(:,2:end)];
Theta2_grad = (1/m) * Theta2_grad + (lambda/m) * [zeros(size(Theta2, 1), 1) Theta2(:,2:end)];
grad = [Theta1_grad(:) ; Theta2_grad(:)];
% ======================================================================================================================
end
function [nn_params, cost, J_hist, trigger] = particleSwarmOptimization(costFunction, initial_nn_params, epsilon, nvars, swarmSize, maxIter, c_1, c_2, inertia, v_initial)
% PARTICLESWARMOPTIMIZATION Finds the optimal solution of an objective
% function using the PSO optimization algorithm.
%
%
% Parameters:
% epsilon - defines the spread of the particles in the search area.
% nvars - defines the number of parameters to be minimized in the objective
% function
% swarmSize - defines the number of particles found in the swarm.
% maxIter - number of iterations for the algorithm to run.
% c_1 - cognitive coefficient (exploration parameter)
% c_2 - social coefficient (exploitation parameter)
% inertiaWeight - defines the inertia for the particle movement.
% v_initial - sets the initial nudge of the particles.
%% =================== Initialize Different Matrices =======================
% currentPos
currentPos = [];
for iter = 1:swarmSize
tempSwarm = epsilon.* randn(82,1) + initial_nn_params;
currentPos(iter,:) = vertcat(tempSwarm);
end
% pbestPos
pbestPos = currentPos;
% gbestPos
for particle = 1:swarmSize
temp = currentPos(particle,:)';
[J] = costFunction(temp);
jMat(particle,:) = vertcat(J);
end
[~, minIndex] = min(jMat);
gbestPos = currentPos(minIndex,:);
% velocityMatrix
velocityMatrix = v_initial * ones(swarmSize,nvars);
% J_hist and other graphing vectors
J_hist = [];
pBestJVec = [];
currentPosJVec = [];
%% ========================================= Actual PSO ====================================================
filename = 'pso_finalrun_1.gif';
for iter = 1:maxIter
for particle = 1:swarmSize
% Set the personal best option
currentPosJVec = [currentPosJVec costFunction(currentPos(particle,:))];
if costFunction(currentPos(particle,:)') < costFunction(pbestPos(particle,:)')
pbestPos(particle,:) = currentPos(particle,:);
end
pBestJVec = [pBestJVec costFunction(pbestPos(particle,:))];
% Set the global best option
if costFunction(currentPos(particle,:)') < costFunction(gbestPos')
gbestPos = currentPos(particle,:);
end
end
%-------------------------------------------------------------------------------------------------------------------------------------
p = 1;
myCurrent = costFunction(currentPos(p,:)');
myBest = costFunction(pbestPos(p,:)');
ourBest = costFunction(gbestPos');
X = ['Iteration ',num2str(iter),'| currentPos: ', num2str(myCurrent), ' | pbestPos: ', num2str(myBest), ' | gbestPos: ', num2str(ourBest)];
disp(X);
scatter(currentPos(:,1),currentPos(:,2), 'MarkerEdgeColor','k','MarkerFaceColor',[1 1 0]);
xlim([7.0 7.4]);
ylim([6.1 6.5]);
xlabel('\theta_{11}');ylabel('\theta_{12}');
axis square;
title(['Plot of Particle Movement']);
dim = [.55 .30 .6 .6];
str = {'Parameters',strcat('Swarm size: ', num2str(swarmSize)), strcat('Particle spread: ', num2str(epsilon)), strcat('Inertia: ', num2str(inertia)), strcat('c: ', num2str(c_1)), strcat('s: ',num2str(c_2))};
annotation('textbox',dim,'String',str,'FitBoxToText','on','Tag' , 'particleMovement');
pause(.001) %// pause 0.1 seconds to slow things down
drawnow
frame = getframe(1);
im{iter} = frame2im(frame);
[A,map] = rgb2ind(im{iter},256);
if iter == 1
imwrite(A,map,filename,'gif','LoopCount',Inf,'DelayTime',0.50);
else
imwrite(A,map,filename,'gif','WriteMode','append','DelayTime',0.50);
end
%-------------------------------------------------------------------------------------------------------------------------------------
for particle = 1:swarmSize
% Update velocity
cognitive_component = c_1 * rand(1,nvars) .* (pbestPos(particle,:) - currentPos(particle,:));
social_component = c_2 * rand(1,nvars) .* (gbestPos - currentPos(particle,:));
velocityMatrix(particle,:) = inertia * velocityMatrix(particle,:) + cognitive_component + social_component;
% Update position
currentPos(particle,:) = currentPos(particle,:) + velocityMatrix(particle,:);
end
J_hist = [J_hist' [iter; costFunction(gbestPos'); mean(pBestJVec);mean(currentPosJVec)]]';
end
nn_params = gbestPos';
cost = costFunction(gbestPos');
trigger = 1;
end
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
m = size(X, 1);
num_labels = size(Theta2, 1);
p = zeros(size(X, 1), 1);
h1 = sin([ones(m, 1) X] * Theta1');
h2 = sigmoid([ones(m, 1) h1] * Theta2');
[dummy, p] = max(h2, [], 2);
% =========================================================================
end
function W = randInitializeWeights( L_in, L_out )
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%W = zeros(L_out, 1 + L_in);
epsilon = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon - epsilon;
end
function W = randomizeInputHidden( L_in, L_out )
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
% W = ones(L_out, 1 + L_in);
epsilon = 0.12;
W = 7 + rand(L_out, 1 + L_in) * 2 * epsilon - epsilon;
end
function g = sigmoid(z)
g = 1 ./ (1 + exp(-z));
end
function g = sigmoidGradient(z)
g = zeros(size(z));
g = sigmoid(z) .* (1 - sigmoid(z));
end
function g = tanhGradient(z)
%SIGMOIDGRADIENT returns the gradient of the tanh function
%evaluated at z
% g = TANHGRADIENT(z) computes the gradient of the tanh function
% evaluated at z. This should work regardless if z is a matrix or a
% vector. In particular, if z is a vector or matrix, it should return
% the gradient for each element.
g = zeros(size(z));
g = 1 - tanh(z).^2;
end
%**************************************************************************
% CI: Twin Spirals Problem using Neural Networks
% -------------------------------------------------------------------------
% Name: MIRANDA, Lester James Validad
% ID No: 44161652-3
% -------------------------------------------------------------------------
%% Initialization
clear; close all; clc
%% Set-up the parameters
input_layer_size = 2;
hidden_layer_size = 16;
num_labels = 2;
%% ============================= Part 0: Load two_spiral dataset from csv file ===================
% In this part, we load the two_spiral dataset from the csv file found in
% the same folder.
fprintf('Status: Displaying data\n');
cd('C:\Users\Lj Miranda\Documents\Waseda\2016-01 Fall\110400F Computational Intelligence\reports\report-ci-final\twin-spirals')
two_spiral = csvread('two_spiral.csv');
X = two_spiral(:,1:2);
y = two_spiral(:,3);
% gscatter(X(:,1),X(:,2),y,'rb','xo');
% xlabel('x');
% ylabel('y');
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ============================= Part 1: Initial Weight Parameters ===============================
% In this part, we initialize the weight parameters. We call the function
% randomizeInputHidden( ) and randInitializeWeights( ) in order to give
% a random initialization for both the I-H and H-O weights. It is important
% to randomize this in order to break symmetry. Afterwards, we unroll the
% initial parameters into a variable initial_nn_params
fprintf('\nStatus: Initializing Neural Network Weight Parameters\n');
initial_Theta1 = randomizeInputHidden(input_layer_size,hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size,num_labels);
% Unroll parameters
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ============================== Part 2: Train the Neural Network ================================
% We train the neural network using different optimization techniques that
% we have built. This part is still divided into different sections, in
% order to highlight different steps in neural network training.
fprintf('\nStatus: Training Neural Network\n');
%------------------------------------ 2.1 Training Parameters --------------------------------------
% This section puts all training parameters for all of the optimization
% techniques that we have.
trigger = 0;
% 1. Minimum Unconstrained Gradient Descent Parameters:
options_fmincg = optimset('MaxIter', 10000);
%options_fminunc = optimoptions('fminunc','Algorithm','trust-region','GradObj','on','Display','iter');
% 2. Stochastic Gradient Descent Parameters:
lambda = 0;
learning_rate = 1.2;
epochs = 20000;
% 3. Particle Swarm Optimization Parameters:
epsilon = 0.1;
nvars = 82;
swarmSize = 50;
maxIter = 100;
c_1 = 1.5; % Cognitive component - exploration parameter
c_2 = 0.9; % Social component - exploitation parameter
inertiaWeight = 0.7; % Inertia Weight, smaller values makes your particles easy to converge,
% Larger values makes your particles explore more. A
% recommended rule of thumb would be:
% inertiaWeight > 0.5 * (cogEff + socEff) -1
v_initial = 0.08;
% 4. Differential Evolution Parameters:
epsilon_de = 0.1;
nvars_de = 82;
population = 20;
genMax = 300;
mutationF = 0.19; % Mutation Parameter - exploration parameter [0->2]
recombinationC = 0.07; % Recombination parameter - exploitation parameter [0->1]
%---------------------------------------------------------------------------------------------------
%-------------------------------- 2.1 Neural Network Training --------------------------------------
% In this part, we create a handle for our cost function. Thus, we have a
% costFunction that is a function that only takes one argument (the neural
% network parameters). Now, in order to make a new optimization algorithm,
% just follow these guidelines:
% 1. Let this function have the costFunction as its input argument.
% 2. The output arguments must include both the final parameters
% nn_params, and the cost.
% 3. Make sure you have a J_hist array to keep record of the costs
% so that you can graph them later on.
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
% ----------------------------------------------------------------
% Optimization Algorithms (just uncomment the one you will use):
% ----------------------------------------------------------------
% 1. Minimum Unconstrained Gradient Descent
%[nn_params, cost] = fmincg(costFunction, initial_nn_params, options_fmincg);
% 2. Particle Swarm Optimization
getBenchmark1 = load('85_fmincg_theta1.mat');
getBenchmark2 = load('85_fmincg_theta2.mat');
benchmarkParams = [getBenchmark1.Theta1(:); getBenchmark2.Theta2(:)];
%[nn_params, cost, J_hist, trigger] = particleSwarmOptimization(costFunction, benchmarkParams, epsilon, nvars, swarmSize, maxIter, c_1, c_2, inertiaWeight, v_initial);
% 3. Differential Evolution
getBenchmark1 = load('85_fmincg_theta1.mat');
getBenchmark2 = load('85_fmincg_theta2.mat');
benchmarkParams = [getBenchmark1.Theta1(:); getBenchmark2.Theta2(:)];
[nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, benchmarkParams, epsilon_de, nvars_de, population, genMax, mutationF, recombinationC);
%[nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, initial_nn_params, epsilon_de, nvars_de, population, genMax, mutationF, recombinationC);
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ======================================================== Part 3: Implement Predict ====================================================================
% After training the neural network, we would like to use it to predict
% the labels. The "predict" function is use in the
% neural network to predict the labels of the training set. This lets
% you compute the training set accuracy.
pred = predict(Theta1, Theta2, X);
accuracy = mean(double(pred == y)) * 100;
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ======================================================== Part 4: Graph cost vs. iterations ====================================================================
fprintf('\nStatus: Plotting Cost History\n');
if trigger == 1
my_iter = int16(J_hist(:,1));
plot(my_iter,J_hist(:,2),'DisplayName','Global Best');
title('Cost History');
xlabel('Iterations');
ylabel('Cost, J(\theta)');
axis square
hold on
plot(my_iter,J_hist(:,3),'DisplayName','Personal Best (Mean)');
plot(my_iter,J_hist(:,4),'DisplayName','Current Position (Mean)');
legend('show')
delete(findall(gcf,'Tag','particleMovement'))
elseif trigger == 2
my_iter = int16(J_hist(:,1));
plot(my_iter,J_hist(:,2));
title('Cost History');
xlabel('Generations');
ylabel('Cost, J(\theta)');
axis square;
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment