@Jeraldy
Last active July 25, 2019 14:58
Implementing an Artificial Neural Network in Pure Java (No external dependencies)
/**
 *
 * @author Deus Jeraldy
 * @Email: deusjeraldy@gmail.com
 * BSD License
 */
// np.java -> https://gist.github.com/Jeraldy/7d4262db0536d27906b1e397662512bc
import java.util.Arrays;

public class NN {

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[][] Y = {{0}, {1}, {1}, {0}};

        int m = 4;
        int nodes = 400;

        X = np.T(X);
        Y = np.T(Y);

        double[][] W1 = np.random(nodes, 2);
        double[][] b1 = new double[nodes][m];
        double[][] W2 = np.random(1, nodes);
        double[][] b2 = new double[1][m];

        for (int i = 0; i < 4000; i++) {
            // Forward Prop
            // LAYER 1
            double[][] Z1 = np.add(np.dot(W1, X), b1);
            double[][] A1 = np.sigmoid(Z1);

            // LAYER 2
            double[][] Z2 = np.add(np.dot(W2, A1), b2);
            double[][] A2 = np.sigmoid(Z2);

            double cost = np.cross_entropy(m, Y, A2);
            //costs.getData().add(new XYChart.Data(i, cost));

            // Back Prop
            // LAYER 2
            double[][] dZ2 = np.subtract(A2, Y);
            double[][] dW2 = np.divide(np.dot(dZ2, np.T(A1)), m);
            double[][] db2 = np.divide(dZ2, m);

            // LAYER 1
            double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(1.0, np.power(A1, 2)));
            double[][] dW1 = np.divide(np.dot(dZ1, np.T(X)), m);
            double[][] db1 = np.divide(dZ1, m);

            // Gradient Descent (learning rate = 0.01)
            W1 = np.subtract(W1, np.multiply(0.01, dW1));
            b1 = np.subtract(b1, np.multiply(0.01, db1));
            W2 = np.subtract(W2, np.multiply(0.01, dW2));
            b2 = np.subtract(b2, np.multiply(0.01, db2));

            if (i % 400 == 0) {
                print("==============");
                print("Cost = " + cost);
                print("Predictions = " + Arrays.deepToString(A2));
            }
        }
    }
}
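
For reference, these are the helpers the code pulls from the companion np.java gist linked above. The signatures below are inferred from the calls in this file, so treat the linked gist as authoritative:

// Inferred from usage (see the linked np.java gist for the real implementations):
// np.T(a)                    -> transpose of a double[][]
// np.dot(a, b)               -> matrix product
// np.add / np.subtract / np.multiply / np.divide -> element-wise ops
//                               (subtract/multiply/divide also take a scalar operand, as used above)
// np.power(a, p)             -> element-wise power
// np.sigmoid(z)              -> element-wise 1 / (1 + e^-z)
// np.cross_entropy(m, Y, A)  -> scalar cross-entropy loss over m examples
// np.random(rows, cols)      -> randomly initialized weight matrix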
@spmasterman

Hi - I saw your article on Medium - thanks.

You didn't specify any dependency for

costs.getData().add(new XYChart.Data(i, cost));

Also "print" should be np.print I think?

@Jeraldy
Author

Jeraldy commented Aug 19, 2018

Thanks for spotting that. I used it to plot the losses on a chart; you can just comment it out.
As for np.print, I imported it with "import static np.print;" so I can just call print("val").
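
For anyone following along: the costs series is presumably the JavaFX chart API (javafx.scene.chart.XYChart.Series / XYChart.Data), which is why no extra dependency is listed in the gist, and the commented-out line can simply stay commented out. A minimal sketch of the static import, assuming np.print takes a String:

import static np.print;

// then, anywhere in the class:
print("Cost = " + cost);  // equivalent to np.print("Cost = " + cost)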

@kobezorro

I think
double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(1.0, np.power(A1, 2)));
should probably be
double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(A1, np.power(A1, 2)));
theoretically?
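
For reference, the reasoning (assuming A1 = np.sigmoid(Z1), as in the forward pass): the derivative of the sigmoid is sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), so the element-wise factor should be A1 * (1 - A1) = A1 - A1^2, i.e. np.subtract(A1, np.power(A1, 2)). The factor 1 - A1^2 in the original line is the derivative of tanh, so that line would only be correct if the hidden layer used tanh instead of sigmoid.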

@gmanjon

gmanjon commented Dec 23, 2018

Great article! Thanks!

It would be very useful to have meaningful names for the variables instead of X, Y, m, etc.; it would make the code much more readable for beginners like me. I will try to suggest this with a pull request (hopefully in the near future).

Thanks

@IsraelCgutierrez

I also get dZ1 = dZ2 W2 (1-A1)A1
I did the derivatives many times.

@h-sadegh

Hi,
this helped me, but how can I test this network?

@astrojr1

astrojr1 commented Apr 3, 2019

Hi, when I try to test the NN, it misbehaves. I am preserving W1, b1, W2, and b2 (the weights and biases learned by backprop); the As and Zs are recalculated. Here is the output with your original XOR training data and the test I added in main().
I don't understand why the single test behaves differently.

Single pair of X1 X2 testing:

Expected Results

X1  X2  Prediction
0   1   > .9
1   1   < .1
1   0   > .9
0   0   < .1

Actual Results

X1  X2  Prediction  Analysis
0   1   .01         Wrong
1   1   .02         Right
1   0   .02         Wrong
0   0   .02         Right

The output from the test for 0,1 follows:

run:

Cost = NaN
Predictions = [[1.0, 1.0, 1.0, 1.0]]

Cost = 0.2969025436010251
Predictions = [[0.29144265296508315, 0.8154787569733192, 0.6852817198123105, 0.22985828140028192]]

Cost = 0.16802804016304362
Predictions = [[0.1538408192545183, 0.8621884062632224, 0.8315669091608777, 0.15830654731289984]]

Cost = 0.11413292681227041
Predictions = [[0.100817252589133, 0.8976364304662805, 0.8876819432980427, 0.11585198272332121]]

Cost = 0.08493924982058358
Predictions = [[0.07375924489343984, 0.9207673719914042, 0.9167687321976236, 0.08943320369408177]]

Cost = 0.06689411340161885
Predictions = [[0.05758306011535834, 0.9362877197000213, 0.9344502020033382, 0.07192206109098881]]

Cost = 0.05476093241324496
Predictions = [[0.0469131616725125, 0.9471897271303298, 0.9462716036200162, 0.05965924778335137]]

Cost = 0.046106028390985966
Predictions = [[0.03938979283558176, 0.9551735726479308, 0.9546934657492366, 0.050680984981294905]]

Cost = 0.03965499406430109
Predictions = [[0.03382311552031198, 0.961227563955176, 0.960974113482648, 0.04386840008668915]]

Cost = 0.03468063619422532
Predictions = [[0.029551254900865218, 0.9659518219563465, 0.9658230501302526, 0.03854728631364289]]

test input=[[0.0], [1.0]]

Cost = 0.008072757222331885
Test Prediction = [[0.03177524030447454]]
BUILD SUCCESSFUL (total time: 1 second)

=======================================================================
Source Code

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package nn3;

import java.util.Arrays;

public class NN3 {

// np.java -> https://gist.github.com/Jeraldy/7d4262db0536d27906b1e397662512bc

public static void main(String[] args) {

    double[][] W1;
    double[][] b1;

    double[][] W2;
    double[][] b2;

    double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    double[][] Y = {{0}, {1}, {1}, {0}};

    int m = 4;
    int nodes = 400;

    X = np.T(X);
    Y = np.T(Y);

    W1 = np.random(nodes, 2);
    b1 = new double[nodes][m];

    W2 = np.random(1, nodes);
    b2 = new double[1][m];

    for (int i = 0; i < 4000; i++) {
        // Forward Prop
        // LAYER 1
        double[][] Z1 = np.add(np.dot(W1, X), b1);
        double[][] A1 = np.sigmoid(Z1);

        //LAYER 2
        double[][] Z2 = np.add(np.dot(W2, A1), b2);
        double[][] A2 = np.sigmoid(Z2);

        double cost = np.cross_entropy(m, Y, A2);
        //costs.getData().add(new XYChart.Data(i, cost));
     
        // Back Prop
        //LAYER 2
        double[][] dZ2 = np.subtract(A2, Y);
        double[][] dW2 = np.divide(np.dot(dZ2, np.T(A1)), m);
        double[][] db2 = np.divide(dZ2, m);

        //LAYER 1
        double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(A1, np.power(A1, 2)));

        // double[][] dZ1 = np.multiply(np.dot(np.T(W2), dZ2), np.subtract(1.0, np.power(A1, 2)));
        double[][] dW1 = np.divide(np.dot(dZ1, np.T(X)), m);
        double[][] db1 = np.divide(dZ1, m);

        // G.D
        W1 = np.subtract(W1, np.multiply(0.01, dW1));
        b1 = np.subtract(b1, np.multiply(0.01, db1));

        W2 = np.subtract(W2, np.multiply(0.01, dW2));
        b2 = np.subtract(b2, np.multiply(0.01, db2));

        if (i % 400 == 0) {
            System.out.println("==============");
            System.out.println("Cost = " + cost);
            System.out.println("Predictions = " + Arrays.deepToString(A2));
        }

 } // end of training

        // now to test
        // X is the new input

        double[][] tX = {{0,1}};
        tX = np.T(tX);
        System.out.println("\r\n");
        System.out.println("test input="+Arrays.deepToString(tX));

        // Forward Prop
        // LAYER 1
        double[][] tZ1 = np.add(np.dot(W1, tX), b1);
        double[][] tA1 = np.sigmoid(tZ1);

        // LAYER 2
        double[][] tZ2 = np.add(np.dot(W2, tA1), b2);
        double[][] tA2 = np.sigmoid(tZ2); // Prediction (Get Output here)

        double cost = np.cross_entropy(m, Y, tA2);
        //costs.getData().add(new XYChart.Data(i, cost));

        System.out.println("==============");
        System.out.println("Cost = " + cost);
        System.out.println("Test Prediction = " + Arrays.deepToString(tA2));

}

}

@astrojr1

astrojr1 commented Apr 3, 2019

However, what is happening is that the NN hasn't learned what I thought it was going to learn. When I try four inputs, no matter what I give it, it outputs the learned pattern:

test input=[[1.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]]

Cost = 0.02185745791739911
Test Prediction = [[0.003408446482782547, 0.9872463542830693, 0.9452257141301419, 0.014738669159170737]]

test input=[[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

Cost = 0.03372416671312117
Test Prediction = [[0.002095908947609844, 0.9235729681325365, 0.998439571534914, 0.05041614843514693]]

So I think the NN has "learned" to output the static pattern from the training data rather than perform an XOR operation on a single pair.

When I test with 3 pairs, it outputs the first three static results of the training data:
test input=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

Cost = 0.0018916002564269888
Test Prediction = [[0.00336438364815533, 0.9975507610865536, 0.9982574181545878]]
I can make each prediction wrong by inverting the test data from the training data.

It doesn't seem like the X inputs are making a difference, only the Y.
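
One possible explanation (an assumption on my part, not verified against np.java): the biases are allocated per training column, b1 = new double[nodes][m] and b2 = new double[1][m], so during training each column of the bias matrices can memorize the label of the example sitting in that column. At test time those same column-indexed biases are reused, which would make each output column track the training label at that position rather than the test inputs. The conventional layout is one bias column per layer, broadcast across all m examples, roughly like the hypothetical helper below (addBias is not part of np.java):

// Hypothetical broadcast add (not in np.java): adds an (n x 1) bias column b
// to every column of an (n x m) matrix Z.
static double[][] addBias(double[][] Z, double[][] b) {
    double[][] out = new double[Z.length][Z[0].length];
    for (int r = 0; r < Z.length; r++) {
        for (int c = 0; c < Z[0].length; c++) {
            out[r][c] = Z[r][c] + b[r][0];
        }
    }
    return out;
}

// With that, the biases would be per layer rather than per example:
//   double[][] b1 = new double[nodes][1];
//   double[][] b2 = new double[1][1];
//   double[][] Z1 = addBias(np.dot(W1, X), b1);
// and db1/db2 would become the row-wise sums of dZ1/dZ2 divided by m.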
