ljvmiranda921/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Neural Network for Solving the Two-Spiral Problem

This is a simple implementation of a 2-M-1 neural network trained using different optimization algorithms in order to solve the two-spiral problem. The two-spiral problem is a particularly difficult problem that requires separating two logistic spirals from one another [1] [2].

Files Included:

Neural Network:


twin_spiral_vanilla.m - the main file containing the neural network structure. From here, different functions are called.
nnCostFunction.m - function that contains the feedforward and backpropagation methods for the neural network. Here, both the cost and the gradient is computed and returned back to the main function via a handling variable.
randInitializeWeights.m - function that randomly initializes the weights of the neural network in order for it to break symmetry.
predict.m - function that predicts the classes of the input data given the trained parameters for the neural network.
visualizeData.m - maps the generalization ability of the neural network given the trained parameters.

Optimization Algorithms:


particleSwarmOptimization.m - takes the objective function, initial parameters, and hyperparameters and finds the optima using a global best PSO algorithm.
differentialEvolution.m - takes the objective function, initial parameters, and hyperparameters and finds the optima using the differential evolution algorithm.
fmincg.m - stands for Function minimize nonlinear conjugant gradient, an off-the-shelf algorithm that was used as a benchmark solution

Installation:

First, make sure that you have MATLAB installed. Compatibility to open-source software, such as Octave, has not yet been tested. If you're set, then just clone this repository:
$ git clone https://github.com/ljvmiranda921/two-spiral-neural-net.git
Usage:

Parameter Setting:

Both PSO and DE have different parameters that one can experiment on. In this implementation, the parameters present are summarized in the table below
Particle Swarm Optimization


Parameter
Description


maxIter
Number of iterations that the PSO algorithm will run


swarmSize
Number of particles in the search space.


epsilon
Scattering degree of the particles during initialization


c_1
Cognitive component (exploration parameter).


c_2
Social component (exploitation parameter).


inertiaWeight
Inertia weight that controls the movement of particles.


Differential Evolution


Parameter
Description


genMax
Number of generations that the DE algorithm will run


population
Number of particles in the search space.


epsilon_de
Scattering degree of the particles during initialization


mutationF
Degree of mutation effect (exploration parameter)


recombinationC
Degree of recombination effect (exploitation parameter)


Implementation Notes:

Generalization Ability


Particle Swarm Optimization (84.86%)

 
Differential Evolution (84.102%)


Minimize Nonlinear Conjugant Gradient (100.00%)


License

This project is licensed under the MIT License - see the LICENSE.txt file for details
References

[1] Kevin J. Lang and Michael J. Witbrock: Learning to Tell Two Spirals Apart. In: Proceedings of the 1988 Connectionist Models Summer School, Morgan Kauffman, 1998.

[2] http://www.ibiblio.org/pub/academic/computer-science/neural-networks/programs/bench/two-spirals
(A more comprehensive explanation of the theory and the implementation results for the various optimization algorithms will be linked here as a separate documentation)

  
## differentialEvolution.m
function [nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, initial_nn_params, epsilon, nvars, populationSize, generation, f, cr)
%DIFFERENTIALEVOLUTION Finds the optimal solution of an objective
% function using the Differential Evolution algorithm.
%
% DE Steps:
% 1. Initialization - initialize particles.
% 2. Mutation - randomly select three vectors and compute for trial vector.
% 3. Recombination - from a donor vector, do some exchange of genes using
%                     the probability C.
% 4. Selection - do tournament selection.


%% =============================== Initialize Parameters ===================================
% Particle Vector
particleVec = [];
for iter = 1:populationSize
    temp = epsilon.* randn(nvars,1) + initial_nn_params;
    particleVec(iter,:) = vertcat(temp);
end

% Cost Tracker
J = [];

% Cost History
J_hist = [];

for iter = 1:generation
%% =============================== Mutation ================================================
% For mutation, you choose three random vectors, and relate them in order
% to build the donor vector (donorVec). This method is controlled by the
% parameter f.

    randIndex = randperm(populationSize,3);
    donorVec = particleVec(randIndex(3),:) + f * (particleVec(randIndex(1),:) - particleVec(randIndex(2),:));

%% ============================ Recombination ==============================================
% For recombination, you build your trial vector (trialVec) using the
% elements of the donor vector (donorVec) and the elements of your target vector.
% This means that for each particle in the population (for each row of
% particle) and for each dimension of that particle (for each column of
% each row of particle), you have to do the comparison.

    trialVec = zeros(populationSize,nvars);
    randomNumVec = rand(populationSize,nvars);
    i_rand = randperm(nvars,1);
    for particleIndex = 1:size(particleVec,1)
        for dimensionIndex = 1:nvars
            if randomNumVec(particleIndex,dimensionIndex) <= cr || dimensionIndex == i_rand
                trialVec(particleIndex,dimensionIndex) = donorVec(1,dimensionIndex);
            elseif randomNumVec(particleIndex,dimensionIndex) > cr && dimensionIndex ~= i_rand
                trialVec(particleIndex,dimensionIndex) = particleVec(particleIndex,dimensionIndex);
            end
        end
    end

%% ============================ Selection ==============================================
% For selection, the target vector (particle) is then compared against the
% trial vector (trialVec) if costFunction(trialVec(particleIndex,:)) is less than the computed
% costFunction(particleVec(particleIndex,:)), then you just replace that
% particle with the one captured from trialVec. After all particles are
% tested, we then have the second generation.

    for particleIndex = 1:size(particleVec,1)
        if costFunction(trialVec(particleIndex,:)') <= costFunction(particleVec(particleIndex,:)')
            particleVec(particleIndex,:) = trialVec(particleIndex,:);
        end
    end

%% ============================ Get Fittest Gene for this Generation =========================================
    for particleIndex = 1:size(particleVec,1)
        J(particleIndex) = costFunction(particleVec(particleIndex,:)');
    end

    [minimaJ,  minimaJIndex] = min(J);
    J_hist = [J_hist' [iter; minimaJ]]';


    Y = ['Generation ',num2str(iter),'| Fittest Gene Cost: ', num2str(minimaJ)];
    disp(Y);

end

nn_params = particleVec(minimaJIndex,:)';
cost = min(J);
trigger = 2;

end

## gradientDescentNN.m
function [nn_params, cost] = gradientDescentNN(costFunction, initial_nn_params, learning_rate, num_iters)
% [gradientDescentNN] Performs gradient descent to learn theta

for iter = 1 : num_iters
    [computedCost, gradient] = costFunction(initial_nn_params);
    fprintf('\nIteration %i | Cost: %j \n',iter,computedCost);
    disp(computedCost);
    initial_nn_params = initial_nn_params - learning_rate * gradient;
end

nn_params = initial_nn_params;
cost = computedCost;

end

## nnCostFunction.m
function [J, grad] = nnCostFunction(nn_params, ...
                                    input_layer_size, ...
                                    hidden_layer_size, ...
                                    num_labels, ...
                                    X, y, lambda)

%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%
% Parameters:
% nn_params - for first-use, put the initial neural network parameters you
%             have
% input_layer_size - number of nodes in the input layer
% hidden_layer_size - number of nodes in the hidden layer
% num_labels - the number of categories you have
% X - input variables
% y - target variables
% lambda - regularization parameter.

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));


m = size(X, 1);

J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


%% ============================================== FEEDFORWARD IMPLEMENTATION =========================================

a_1 = [ones(m, 1) X];
z_2 = a_1 * Theta1';
a_2 = sin(z_2);
a_2 = [ones(size(a_2,1), 1) a_2];
z_3 = a_2 * Theta2';
a_3 = sigmoid(z_3);
hypothesis = a_3;


y_i = zeros(m,num_labels);
for i = 1:m
    y_i(i,y(i)) = 1;
end

J = 1/m * sum(sum(-1 * y_i .* log(hypothesis)-(1-y_i) .* log(1-hypothesis)));
% J = 1/m * sumsqr(hypothesis - y_i);
%J = (1/m) * sum(sum(-1 * ((y_i + 1) / 2) .* log((hypothesis + 1) / 2) - (1 - (y_i / 2)) .* log(1 - (hypothesis + 1) / 2)));
regularization_term = (lambda / (2 * m) * (sum(sumsqr(Theta1(:,2:input_layer_size+1))) + sum(sumsqr(Theta2(:,2:hidden_layer_size+1)))));
J = J + regularization_term;

%% ============================================== BACKPROPAGATION IMPLEMENTATION =========================================

for t = 1:m
	a_1 = [1; X(t,:)'];      % Attach bias vector (with ones) on a1
	z_2 = Theta1 * a_1;       % Compute pre-activation z2
	a_2 = [1; sin(z_2)];     % Compute activation a2 (using sin), attach bias vector
	z_3 = Theta2 * a_2;       % Compute pre-activation z3
	a_3 = sigmoid(z_3);       % Compute activation a3 (using sigmoid)
	target_output = ([1:num_labels]==y(t))';

	delta_3 = a_3 - target_output;   % Compute the difference between the activation result and the target output
	delta_2 = (Theta2' * delta_3) .* [1; cos(z_2)]; % Compute delta_2 , always get the g' of layer 2.
	delta_2 = delta_2(2:end);

	% Accumulate Delta
	Theta1_grad = Theta1_grad + delta_2 * a_1';
	Theta2_grad = Theta2_grad + delta_3 * a_2';
end

% Compute for the partial derivative terms
Theta1_grad = (1/m) * Theta1_grad + (lambda/m) * [zeros(size(Theta1, 1), 1) Theta1(:,2:end)];
Theta2_grad = (1/m) * Theta2_grad + (lambda/m) * [zeros(size(Theta2, 1), 1) Theta2(:,2:end)];

grad = [Theta1_grad(:) ; Theta2_grad(:)];
% ======================================================================================================================


end

## particleSwarmOptimization.m
function [nn_params, cost, J_hist, trigger] = particleSwarmOptimization(costFunction, initial_nn_params, epsilon, nvars, swarmSize, maxIter, c_1, c_2, inertia, v_initial)
% PARTICLESWARMOPTIMIZATION Finds the optimal solution of an objective
% function using the PSO optimization algorithm.
%
%
% Parameters:
% epsilon - defines the spread of the particles in the search area.
% nvars - defines the number of parameters to be minimized in the objective
%         function
% swarmSize - defines the number of particles found in the swarm.
% maxIter - number of iterations for the algorithm to run.
% c_1 - cognitive coefficient (exploration parameter)
% c_2 - social coefficient (exploitation parameter)
% inertiaWeight - defines the inertia for the particle movement.
% v_initial - sets the initial nudge of the particles.

%% =================== Initialize Different Matrices =======================
% currentPos
currentPos = [];
for iter = 1:swarmSize
tempSwarm = epsilon.* randn(82,1) + initial_nn_params;
currentPos(iter,:) = vertcat(tempSwarm);
end

% pbestPos
pbestPos = currentPos;

% gbestPos
for particle = 1:swarmSize
    temp = currentPos(particle,:)';
    [J] = costFunction(temp);
    jMat(particle,:) = vertcat(J);
end
[~, minIndex] = min(jMat);
gbestPos = currentPos(minIndex,:);


% velocityMatrix
velocityMatrix = v_initial * ones(swarmSize,nvars);

% J_hist and other graphing vectors
J_hist = [];
pBestJVec = [];
currentPosJVec = [];


%% ========================================= Actual PSO ====================================================
filename = 'pso_finalrun_1.gif';
for iter = 1:maxIter
    for particle = 1:swarmSize
        % Set the personal best option
        currentPosJVec = [currentPosJVec costFunction(currentPos(particle,:))];
        if costFunction(currentPos(particle,:)') < costFunction(pbestPos(particle,:)')
            pbestPos(particle,:) = currentPos(particle,:);
        end
        pBestJVec = [pBestJVec costFunction(pbestPos(particle,:))];
        % Set the global best option
        if costFunction(currentPos(particle,:)') < costFunction(gbestPos')
             gbestPos = currentPos(particle,:);
        end
    end

%-------------------------------------------------------------------------------------------------------------------------------------
p = 1;
myCurrent = costFunction(currentPos(p,:)');
myBest = costFunction(pbestPos(p,:)');
ourBest = costFunction(gbestPos');
X = ['Iteration ',num2str(iter),'| currentPos: ', num2str(myCurrent), ' | pbestPos: ', num2str(myBest), ' | gbestPos: ', num2str(ourBest)];
disp(X);


scatter(currentPos(:,1),currentPos(:,2), 'MarkerEdgeColor','k','MarkerFaceColor',[1 1 0]);
xlim([7.0 7.4]);
ylim([6.1 6.5]);
xlabel('\theta_{11}');ylabel('\theta_{12}');
axis square;
title(['Plot of Particle Movement']);
dim = [.55 .30 .6 .6];
str = {'Parameters',strcat('Swarm size: ', num2str(swarmSize)), strcat('Particle spread: ', num2str(epsilon)), strcat('Inertia: ', num2str(inertia)), strcat('c: ', num2str(c_1)), strcat('s: ',num2str(c_2))};
annotation('textbox',dim,'String',str,'FitBoxToText','on','Tag' , 'particleMovement');
pause(.001) %// pause 0.1 seconds to slow things down
drawnow
frame = getframe(1);
im{iter} = frame2im(frame);
[A,map] = rgb2ind(im{iter},256);
if iter == 1
   imwrite(A,map,filename,'gif','LoopCount',Inf,'DelayTime',0.50);
else
   imwrite(A,map,filename,'gif','WriteMode','append','DelayTime',0.50);
end

%-------------------------------------------------------------------------------------------------------------------------------------
    for particle = 1:swarmSize
        % Update velocity
        cognitive_component = c_1 * rand(1,nvars) .* (pbestPos(particle,:) - currentPos(particle,:));
        social_component = c_2 * rand(1,nvars) .* (gbestPos - currentPos(particle,:));
        velocityMatrix(particle,:) = inertia * velocityMatrix(particle,:) + cognitive_component + social_component;
        % Update position
        currentPos(particle,:) = currentPos(particle,:) + velocityMatrix(particle,:);
    end

J_hist = [J_hist' [iter; costFunction(gbestPos'); mean(pBestJVec);mean(currentPosJVec)]]';
end

nn_params = gbestPos';
cost = costFunction(gbestPos');
trigger = 1;


end

## predict.m
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)


m = size(X, 1);
num_labels = size(Theta2, 1);

p = zeros(size(X, 1), 1);

h1 = sin([ones(m, 1) X] * Theta1');
h2 = sigmoid([ones(m, 1) h1] * Theta2');
[dummy, p] = max(h2, [], 2);

% =========================================================================

end

## randInitializeWeights.m
function W = randInitializeWeights( L_in, L_out )
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections

%W = zeros(L_out, 1 + L_in);
epsilon = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon - epsilon;

end

## randomizeInputHidden.m
function W = randomizeInputHidden( L_in, L_out )
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections

% W = ones(L_out, 1 + L_in);
epsilon = 0.12;
 W = 7 + rand(L_out, 1 + L_in) * 2 * epsilon - epsilon;

end

## sigmoid.m
function g = sigmoid(z)

g = 1 ./ (1 + exp(-z));

end

## sigmoidGradient.m
function g = sigmoidGradient(z)
g = zeros(size(z));
g = sigmoid(z) .* (1 - sigmoid(z));

end

## tanhGradient.m
function g = tanhGradient(z)
%SIGMOIDGRADIENT returns the gradient of the tanh function
%evaluated at z
%   g = TANHGRADIENT(z) computes the gradient of the tanh function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, it should return
%   the gradient for each element.

g = zeros(size(z));
g = 1 - tanh(z).^2;

end

## twin_spiral_vanilla.m
%**************************************************************************
%           CI: Twin Spirals Problem using Neural Networks
% -------------------------------------------------------------------------
% Name: MIRANDA, Lester James Validad
% ID No: 44161652-3
% -------------------------------------------------------------------------

%% Initialization
clear; close all; clc

%% Set-up the parameters
input_layer_size = 2;
hidden_layer_size = 16;
num_labels = 2;

%% ============================= Part 0: Load two_spiral dataset from csv file ===================
% In this part, we load the two_spiral dataset from the csv file found in
% the same folder.

fprintf('Status: Displaying data\n');
cd('C:\Users\Lj Miranda\Documents\Waseda\2016-01 Fall\110400F Computational Intelligence\reports\report-ci-final\twin-spirals')
two_spiral = csvread('two_spiral.csv');
X = two_spiral(:,1:2);
y = two_spiral(:,3);
% gscatter(X(:,1),X(:,2),y,'rb','xo');
% xlabel('x');
% ylabel('y');
fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============================= Part 1: Initial Weight Parameters ===============================
% In this part, we initialize the weight parameters. We call the function
% randomizeInputHidden( ) and randInitializeWeights( ) in order to give
% a random initialization for both the I-H and H-O weights. It is important
% to randomize this in order to break symmetry. Afterwards, we unroll the
% initial parameters into a variable initial_nn_params

fprintf('\nStatus: Initializing Neural Network Weight Parameters\n');

initial_Theta1 = randomizeInputHidden(input_layer_size,hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size,num_labels);

% Unroll parameters
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============================== Part 2: Train the Neural Network ================================
% We train the neural network using different optimization techniques that
% we have built. This part is still divided into different sections, in
% order to highlight different steps in neural network training.

fprintf('\nStatus: Training Neural Network\n');

%------------------------------------ 2.1 Training Parameters --------------------------------------
% This section puts all training parameters for all of the optimization
% techniques that we have.
trigger = 0;
% 1. Minimum Unconstrained Gradient Descent Parameters:
options_fmincg = optimset('MaxIter', 10000);
%options_fminunc = optimoptions('fminunc','Algorithm','trust-region','GradObj','on','Display','iter');

% 2. Stochastic Gradient Descent Parameters:
lambda = 0;
learning_rate = 1.2;
epochs = 20000;

% 3. Particle Swarm Optimization Parameters:
epsilon = 0.1;
nvars = 82;
swarmSize = 50;
maxIter = 100;
c_1 = 1.5; % Cognitive component - exploration parameter
c_2 = 0.9; % Social component - exploitation parameter
inertiaWeight = 0.7; % Inertia Weight, smaller values makes your particles easy to converge,
                     % Larger values makes your particles explore more. A
                     % recommended rule of thumb would be:
                     % inertiaWeight > 0.5 * (cogEff + socEff) -1
v_initial = 0.08;

% 4. Differential Evolution Parameters:
epsilon_de = 0.1;
nvars_de = 82;
population = 20;
genMax = 300;
mutationF = 0.19; % Mutation Parameter - exploration parameter [0->2]
recombinationC = 0.07; % Recombination parameter - exploitation parameter [0->1]

%---------------------------------------------------------------------------------------------------

%-------------------------------- 2.1 Neural Network Training --------------------------------------
% In this part, we create a handle for our cost function. Thus, we have a
% costFunction that is a function that only takes one argument (the neural
% network parameters). Now, in order to make a new optimization algorithm,
% just follow these guidelines:
%       1. Let this function have the costFunction as its input argument.
%       2. The output arguments must include both the final parameters
%          nn_params, and the cost.
%       3. Make sure you have a J_hist array to keep record of the costs
%          so that you can graph them later on.

costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);

% ----------------------------------------------------------------
% Optimization Algorithms (just uncomment the one you will use):
% ----------------------------------------------------------------

% 1. Minimum Unconstrained Gradient Descent
%[nn_params, cost] = fmincg(costFunction, initial_nn_params, options_fmincg);

% 2. Particle Swarm Optimization
getBenchmark1 = load('85_fmincg_theta1.mat');
getBenchmark2 = load('85_fmincg_theta2.mat');
benchmarkParams = [getBenchmark1.Theta1(:); getBenchmark2.Theta2(:)];
%[nn_params, cost, J_hist, trigger] = particleSwarmOptimization(costFunction, benchmarkParams, epsilon, nvars, swarmSize, maxIter, c_1, c_2, inertiaWeight, v_initial);

% 3. Differential Evolution
 getBenchmark1 = load('85_fmincg_theta1.mat');
 getBenchmark2 = load('85_fmincg_theta2.mat');
 benchmarkParams = [getBenchmark1.Theta1(:); getBenchmark2.Theta2(:)];
[nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, benchmarkParams, epsilon_de, nvars_de, population, genMax, mutationF, recombinationC);
%[nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, initial_nn_params, epsilon_de, nvars_de, population, genMax, mutationF, recombinationC);


% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));

fprintf('Program paused. Press enter to continue.\n');
pause;


%% ======================================================== Part 3: Implement Predict ====================================================================
%  After training the neural network, we would like to use it to predict
%  the labels. The "predict" function is use in the
%  neural network to predict the labels of the training set. This lets
%  you compute the training set accuracy.

pred = predict(Theta1, Theta2, X);
accuracy = mean(double(pred == y)) * 100;
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================================================== Part 4: Graph cost vs. iterations ====================================================================
fprintf('\nStatus: Plotting Cost History\n');
if trigger == 1
    my_iter = int16(J_hist(:,1));
    plot(my_iter,J_hist(:,2),'DisplayName','Global Best');
    title('Cost History');
    xlabel('Iterations');
    ylabel('Cost, J(\theta)');
    axis square
    hold on
    plot(my_iter,J_hist(:,3),'DisplayName','Personal Best (Mean)');
    plot(my_iter,J_hist(:,4),'DisplayName','Current Position (Mean)');
    legend('show')
    delete(findall(gcf,'Tag','particleMovement'))

elseif trigger == 2
    my_iter = int16(J_hist(:,1));
    plot(my_iter,J_hist(:,2));
    title('Cost History');
    xlabel('Generations');
    ylabel('Cost, J(\theta)');
    axis square;
end
Parameter	Description
`maxIter`	Number of iterations that the PSO algorithm will run
`swarmSize`	Number of particles in the search space.
`epsilon`	Scattering degree of the particles during initialization
`c_1`	Cognitive component (exploration parameter).
`c_2`	Social component (exploitation parameter).
`inertiaWeight`	Inertia weight that controls the movement of particles.
Parameter	Description
`genMax`	Number of generations that the DE algorithm will run
`population`	Number of particles in the search space.
`epsilon_de`	Scattering degree of the particles during initialization
`mutationF`	Degree of mutation effect (exploration parameter)
`recombinationC`	Degree of recombination effect (exploitation parameter)
	function [nn_params, cost, J_hist, trigger] = differentialEvolution(costFunction, initial_nn_params, epsilon, nvars, populationSize, generation, f, cr)
	%DIFFERENTIALEVOLUTION Finds the optimal solution of an objective
	% function using the Differential Evolution algorithm.
	%
	% DE Steps:
	% 1. Initialization - initialize particles.
	% 2. Mutation - randomly select three vectors and compute for trial vector.
	% 3. Recombination - from a donor vector, do some exchange of genes using
	% the probability C.
	% 4. Selection - do tournament selection.



	%% =============================== Initialize Parameters ===================================
	% Particle Vector
	particleVec = [];
	for iter = 1:populationSize
	temp = epsilon.* randn(nvars,1) + initial_nn_params;
	particleVec(iter,:) = vertcat(temp);
	end

	% Cost Tracker
	J = [];

	% Cost History
	J_hist = [];

	for iter = 1:generation
	%% =============================== Mutation ================================================
	% For mutation, you choose three random vectors, and relate them in order
	% to build the donor vector (donorVec). This method is controlled by the
	% parameter f.

	randIndex = randperm(populationSize,3);
	donorVec = particleVec(randIndex(3),:) + f * (particleVec(randIndex(1),:) - particleVec(randIndex(2),:));

	%% ============================ Recombination ==============================================
	% For recombination, you build your trial vector (trialVec) using the
	% elements of the donor vector (donorVec) and the elements of your target vector.
	% This means that for each particle in the population (for each row of
	% particle) and for each dimension of that particle (for each column of
	% each row of particle), you have to do the comparison.

	trialVec = zeros(populationSize,nvars);
	randomNumVec = rand(populationSize,nvars);
	i_rand = randperm(nvars,1);
	for particleIndex = 1:size(particleVec,1)
	for dimensionIndex = 1:nvars
	if randomNumVec(particleIndex,dimensionIndex) <= cr \|\| dimensionIndex == i_rand
	trialVec(particleIndex,dimensionIndex) = donorVec(1,dimensionIndex);
	elseif randomNumVec(particleIndex,dimensionIndex) > cr && dimensionIndex ~= i_rand
	trialVec(particleIndex,dimensionIndex) = particleVec(particleIndex,dimensionIndex);
	end
	end
	end

	%% ============================ Selection ==============================================
	% For selection, the target vector (particle) is then compared against the
	% trial vector (trialVec) if costFunction(trialVec(particleIndex,:)) is less than the computed
	% costFunction(particleVec(particleIndex,:)), then you just replace that
	% particle with the one captured from trialVec. After all particles are
	% tested, we then have the second generation.

	for particleIndex = 1:size(particleVec,1)
	if costFunction(trialVec(particleIndex,:)') <= costFunction(particleVec(particleIndex,:)')
	particleVec(particleIndex,:) = trialVec(particleIndex,:);
	end
	end

	%% ============================ Get Fittest Gene for this Generation =========================================
	for particleIndex = 1:size(particleVec,1)
	J(particleIndex) = costFunction(particleVec(particleIndex,:)');
	end

	[minimaJ, minimaJIndex] = min(J);
	J_hist = [J_hist' [iter; minimaJ]]';


	Y = ['Generation ',num2str(iter),'\| Fittest Gene Cost: ', num2str(minimaJ)];
	disp(Y);

	end

	nn_params = particleVec(minimaJIndex,:)';
	cost = min(J);
	trigger = 2;

	end
	function [nn_params, cost] = gradientDescentNN(costFunction, initial_nn_params, learning_rate, num_iters)
	% [gradientDescentNN] Performs gradient descent to learn theta

	for iter = 1 : num_iters
	[computedCost, gradient] = costFunction(initial_nn_params);
	fprintf('\nIteration %i \| Cost: %j \n',iter,computedCost);
	disp(computedCost);
	initial_nn_params = initial_nn_params - learning_rate * gradient;
	end

	nn_params = initial_nn_params;
	cost = computedCost;

	end
	function [J, grad] = nnCostFunction(nn_params, ...
	input_layer_size, ...
	hidden_layer_size, ...
	num_labels, ...
	X, y, lambda)

	%NNCOSTFUNCTION Implements the neural network cost function for a two layer
	%neural network which performs classification
	%
	% Parameters:
	% nn_params - for first-use, put the initial neural network parameters you
	% have
	% input_layer_size - number of nodes in the input layer
	% hidden_layer_size - number of nodes in the hidden layer
	% num_labels - the number of categories you have
	% X - input variables
	% y - target variables
	% lambda - regularization parameter.

	Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
	hidden_layer_size, (input_layer_size + 1));

	Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
	num_labels, (hidden_layer_size + 1));


	m = size(X, 1);

	J = 0;
	Theta1_grad = zeros(size(Theta1));
	Theta2_grad = zeros(size(Theta2));



	%% ============================================== FEEDFORWARD IMPLEMENTATION =========================================

	a_1 = [ones(m, 1) X];
	z_2 = a_1 * Theta1';
	a_2 = sin(z_2);
	a_2 = [ones(size(a_2,1), 1) a_2];
	z_3 = a_2 * Theta2';
	a_3 = sigmoid(z_3);
	hypothesis = a_3;


	y_i = zeros(m,num_labels);
	for i = 1:m
	y_i(i,y(i)) = 1;
	end

	J = 1/m * sum(sum(-1 * y_i .* log(hypothesis)-(1-y_i) .* log(1-hypothesis)));
	% J = 1/m * sumsqr(hypothesis - y_i);
	%J = (1/m) * sum(sum(-1 * ((y_i + 1) / 2) .* log((hypothesis + 1) / 2) - (1 - (y_i / 2)) .* log(1 - (hypothesis + 1) / 2)));
	regularization_term = (lambda / (2 * m) * (sum(sumsqr(Theta1(:,2:input_layer_size+1))) + sum(sumsqr(Theta2(:,2:hidden_layer_size+1)))));
	J = J + regularization_term;

	%% ============================================== BACKPROPAGATION IMPLEMENTATION =========================================

	for t = 1:m
	a_1 = [1; X(t,:)']; % Attach bias vector (with ones) on a1
	z_2 = Theta1 * a_1; % Compute pre-activation z2
	a_2 = [1; sin(z_2)]; % Compute activation a2 (using sin), attach bias vector
	z_3 = Theta2 * a_2; % Compute pre-activation z3
	a_3 = sigmoid(z_3); % Compute activation a3 (using sigmoid)
	target_output = ([1:num_labels]==y(t))';

	delta_3 = a_3 - target_output; % Compute the difference between the activation result and the target output
	delta_2 = (Theta2' * delta_3) .* [1; cos(z_2)]; % Compute delta_2 , always get the g' of layer 2.
	delta_2 = delta_2(2:end);

	% Accumulate Delta
	Theta1_grad = Theta1_grad + delta_2 * a_1';
	Theta2_grad = Theta2_grad + delta_3 * a_2';
	end

	% Compute for the partial derivative terms
	Theta1_grad = (1/m) * Theta1_grad + (lambda/m) * [zeros(size(Theta1, 1), 1) Theta1(:,2:end)];
	Theta2_grad = (1/m) * Theta2_grad + (lambda/m) * [zeros(size(Theta2, 1), 1) Theta2(:,2:end)];

	grad = [Theta1_grad(:) ; Theta2_grad(:)];
	% ======================================================================================================================


	end
	function [nn_params, cost, J_hist, trigger] = particleSwarmOptimization(costFunction, initial_nn_params, epsilon, nvars, swarmSize, maxIter, c_1, c_2, inertia, v_initial)
	% PARTICLESWARMOPTIMIZATION Finds the optimal solution of an objective
	% function using the PSO optimization algorithm.
	%
	%
	% Parameters:
	% epsilon - defines the spread of the particles in the search area.
	% nvars - defines the number of parameters to be minimized in the objective
	% function
	% swarmSize - defines the number of particles found in the swarm.
	% maxIter - number of iterations for the algorithm to run.
	% c_1 - cognitive coefficient (exploration parameter)
	% c_2 - social coefficient (exploitation parameter)
	% inertiaWeight - defines the inertia for the particle movement.
	% v_initial - sets the initial nudge of the particles.

	%% =================== Initialize Different Matrices =======================
	% currentPos
	currentPos = [];
	for iter = 1:swarmSize
	tempSwarm = epsilon.* randn(82,1) + initial_nn_params;
	currentPos(iter,:) = vertcat(tempSwarm);
	end

	% pbestPos
	pbestPos = currentPos;

	% gbestPos
	for particle = 1:swarmSize
	temp = currentPos(particle,:)';
	[J] = costFunction(temp);
	jMat(particle,:) = vertcat(J);
	end
	[~, minIndex] = min(jMat);
	gbestPos = currentPos(minIndex,:);


	% velocityMatrix
	velocityMatrix = v_initial * ones(swarmSize,nvars);

	% J_hist and other graphing vectors
	J_hist = [];
	pBestJVec = [];
	currentPosJVec = [];


	%% ========================================= Actual PSO ====================================================
	filename = 'pso_finalrun_1.gif';
	for iter = 1:maxIter
	for particle = 1:swarmSize
	% Set the personal best option
	currentPosJVec = [currentPosJVec costFunction(currentPos(particle,:))];
	if costFunction(currentPos(particle,:)') < costFunction(pbestPos(particle,:)')
	pbestPos(particle,:) = currentPos(particle,:);
	end
	pBestJVec = [pBestJVec costFunction(pbestPos(particle,:))];
	% Set the global best option
	if costFunction(currentPos(particle,:)') < costFunction(gbestPos')
	gbestPos = currentPos(particle,:);
	end
	end

	%-------------------------------------------------------------------------------------------------------------------------------------
	p = 1;
	myCurrent = costFunction(currentPos(p,:)');
	myBest = costFunction(pbestPos(p,:)');
	ourBest = costFunction(gbestPos');
	X = ['Iteration ',num2str(iter),'\| currentPos: ', num2str(myCurrent), ' \| pbestPos: ', num2str(myBest), ' \| gbestPos: ', num2str(ourBest)];
	disp(X);



	scatter(currentPos(:,1),currentPos(:,2), 'MarkerEdgeColor','k','MarkerFaceColor',[1 1 0]);
	xlim([7.0 7.4]);
	ylim([6.1 6.5]);
	xlabel('\theta_{11}');ylabel('\theta_{12}');
	axis square;
	title(['Plot of Particle Movement']);
	dim = [.55 .30 .6 .6];
	str = {'Parameters',strcat('Swarm size: ', num2str(swarmSize)), strcat('Particle spread: ', num2str(epsilon)), strcat('Inertia: ', num2str(inertia)), strcat('c: ', num2str(c_1)), strcat('s: ',num2str(c_2))};
	annotation('textbox',dim,'String',str,'FitBoxToText','on','Tag' , 'particleMovement');
	pause(.001) %// pause 0.1 seconds to slow things down
	drawnow
	frame = getframe(1);
	im{iter} = frame2im(frame);
	[A,map] = rgb2ind(im{iter},256);
	if iter == 1
	imwrite(A,map,filename,'gif','LoopCount',Inf,'DelayTime',0.50);
	else
	imwrite(A,map,filename,'gif','WriteMode','append','DelayTime',0.50);
	end

	%-------------------------------------------------------------------------------------------------------------------------------------
	for particle = 1:swarmSize
	% Update velocity
	cognitive_component = c_1 * rand(1,nvars) .* (pbestPos(particle,:) - currentPos(particle,:));
	social_component = c_2 * rand(1,nvars) .* (gbestPos - currentPos(particle,:));
	velocityMatrix(particle,:) = inertia * velocityMatrix(particle,:) + cognitive_component + social_component;
	% Update position
	currentPos(particle,:) = currentPos(particle,:) + velocityMatrix(particle,:);
	end

	J_hist = [J_hist' [iter; costFunction(gbestPos'); mean(pBestJVec);mean(currentPosJVec)]]';
	end

	nn_params = gbestPos';
	cost = costFunction(gbestPos');
	trigger = 1;


	end
	function p = predict(Theta1, Theta2, X)
	%PREDICT Predict the label of an input given a trained neural network
	% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
	% trained weights of a neural network (Theta1, Theta2)


	m = size(X, 1);
	num_labels = size(Theta2, 1);

	p = zeros(size(X, 1), 1);

	h1 = sin([ones(m, 1) X] * Theta1');
	h2 = sigmoid([ones(m, 1) h1] * Theta2');
	[dummy, p] = max(h2, [], 2);

	% =========================================================================

	end
	function W = randInitializeWeights( L_in, L_out )
	%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
	%incoming connections and L_out outgoing connections

	%W = zeros(L_out, 1 + L_in);
	epsilon = 0.12;
	W = rand(L_out, 1 + L_in) * 2 * epsilon - epsilon;

	end
	function W = randomizeInputHidden( L_in, L_out )
	%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
	%incoming connections and L_out outgoing connections

	% W = ones(L_out, 1 + L_in);
	epsilon = 0.12;
	W = 7 + rand(L_out, 1 + L_in) * 2 * epsilon - epsilon;

	end
	function g = sigmoidGradient(z)
	g = zeros(size(z));
	g = sigmoid(z) .* (1 - sigmoid(z));

	end