JPlayground Neural Network
NB. Neural Network Demo
NB. (roughly based on chapters 1-3 of http://neuralnetworksanddeeplearning.com/ by Michael Nielsen)
NB. The main differences from the book are:
NB. - the removal of the minibatch function and loops in favor of matrix functions
NB. - joining bias and weight calculations.
NB. *** Preparations ***
NB. install addons needed from github for reading CSV files
{{install^:(-.fexist'~addons/tables/',y)'github:jsoftware/tables_',y}}&>;:'dsv csv'
NB. require the needed addons
require 'tables/csv plot viewmat stats/base'
NB. Onehot and back (inverse)
onehot_z_ =: (0 1 (#"1)~ 1,.~ ]) :. (i.&1"1)
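NB. Illustration (session transcript, not executed): onehot of class indices
NB. gives one row per index; the obverse defined via :. recovers the indices.
NB.    onehot 0 1 2
NB. 1 0 0
NB. 0 1 0
NB. 0 0 1
NB.    (onehot inv) onehot 0 1 2
NB. 0 1 2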
NB. Split data into a train/test pair; x: the number of random samples to set aside as test data
splitdat =: [ |.@split ({~ ?~@#)@]
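NB. Usage sketch (session transcript): set 3 of 10 rows aside as test data.
NB.    'train test' =. 3 splitdat i. 10 2
NB.    ($train) ,: $test
NB. 7 2
NB. 3 2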
NB. *** Data download & preprocessing ***
httpget^:(-.fexist'~temp/spirals.csv')'https://raw.githubusercontent.com/reisanar/datasets/master/spiral.csv'
num =: (/: {:"1) ".&> }. readcsv '~temp/spiral.csv' NB. read data; discard column headers
data =: <: +: (%"1 >./)@(-"1 <./) }:"1 num NB. normalised: _1 <: data <: 1
labs =: onehot <: {:"1 num NB. labels one-hot encoded
jit =: 15%~normalrand ($data), 3 NB. data jitter to be added 3x
datext=: ,/ 0 2 1|: data + jit NB. new dataset with 3x jittered data
labext=: 3 # labs NB. corresponding labels
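NB. Shape note: jit has shape ($data),3, so data + jit broadcasts every sample
NB. over a trailing axis of 3 noisy copies; 0 2 1 |: and ,/ then lay these out
NB. as a (3 * #data) x 2 array with each sample's copies adjacent, matching 3 # labs.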
NB. *** Neural network class ***
cocurrent 'nn' NB. wrap in a J class.
destroy =: codestroy NB. to discard a network, do destroy__obj ''
sig =: %@:>:@:^@:- NB. sigmoid activation; note: @: vs. @ makes an enormous difference
dsig =: (* -.)@:sig NB. d sig(y) / d y (note: scalar)
loss =: (+/)@:-:@:*:@:- NB. Quadratic loss; assumes features in rows.
dLoss =: -~ NB. gradient of loss
mpb =: +/ .* 1 , ] NB. biased matrix product
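NB. In formula form (as implemented above): sig y is % 1 + ^ - y, i.e. 1/(1+e^-y);
NB. dsig y is (sig y) * 1 - sig y; x loss y is +/ -: *: x - y, the half squared
NB. error summed over classes. For a weight array W of shape (out, 1+in) and
NB. activations A of shape (in, #instances), W mpb A is W +/ .* 1 , A: the
NB. prepended row of ones picks up the bias column, giving an out x #instances result.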
NB. Create object; called by conew; could be used to re-initialise weights.
NB. takes y: array of layer sizes, from input (=#feature dims) to output (=#~.labels)
create =: {{
sizes =: y
bw =: ([: <@normalrand@|. 1 0&+)"1] 2 ]\ sizes NB. bias ,. weight arrays.
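NB. e.g. for sizes 2 15 8 3 this yields boxes of shapes 15 3, 8 16 and 3 9:
NB. one row per output neuron and 1 + #inputs columns, with the bias in the
NB. first column (matching the row of ones that mpb prepends to the activations).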
repint =: 1000 NB. reporting every repint epochs in sgd training
0 0 $ histloss =:2 0$0 NB. keeps training & test loss
}}
NB. Forward pass; assumes features in rows; returns labels with classes in rows
fwd =: {{for_p. bw do. y =. sig (>p) mpb y end.}}
pred =: fwd&.:|: NB. shortcut for operating on instances in rows
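NB. Usage sketch (session transcript, assuming a net created from base with sizes 2 15 8 3):
NB.    $ pred__net 10 2 $ 0.5      NB. 10 instances of 2 features each
NB. 10 3
NB. i.e. one row of 3 class scores per instance; (= >./"1) would mark the winning class per row.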
NB. Stochastic Gradient Descent.
NB. x takes #epochs, batch size and learning rate;
NB. if batch size = _, it performs backprop on the entire dataset at once.
NB. y is the training data, or training data ; testing data (the testing data is used for display only).
NB. Data arrays have instances in rows, and columns are: features,. labels.
NB. The number of label columns is derived from sizes__obj.
sgd =: {{
'ep bs lr' =. x NB. epochs, batch size and learning rate
'tr ts' =. 2 {. boxopen y NB. split training and test data (if present)
tr=. ({~ ?~@#) tr NB. randomly shuffle the training data
'fttr lbtr' =. ({.sizes) split |: tr NB. first {.sizes rows are the features, the rest labels
'ftts lbts' =. ({.sizes) split |: ts NB. the same for testing data
losstr =. lossts =. ep$0 NB. for keeping training and testing loss statistics
for_e. i. ep do. NB. repeat for all epochs
(-bs) lr&backprop\ tr NB. run backprop on successive non-overlapping batches of at most bs rows
NB. Do forward pass to compute and record loss for stats.
losstr =. losstr e}~ lltr=. (+/%#) lbtr loss fwd fttr
if. #ts do. NB. the same for testing data if present.
lossts =. lossts e}~ llts=. (+/%#) lbts loss fwd ftts end.
if. 0=repint|e do. NB. echo state at each repint interval
if. #ts do. echo 'Epoch complete: ',(":e),' losses: ',(": lltr,llts)
else. echo 'Epoch complete: ',(":e),' loss: ' ,(": lltr) end.
end.
end.
0 0 $ histloss=: histloss,.losstr,:lossts NB. append losses
}}
NB. Backpropagation to update biases & weights
NB. x: learning rate
NB. y: data ,. labels (rows = instances; the number of label columns is derived from sizes)
backprop =: {{
'act lab' =. ({.sizes) split |: y NB. y: inst x ft -> ft x inst
NB. ---Forward pass---
zs =. 0 $ as =. <act NB. keep z and activations; first activation is input data.
delta =. ($bw) $ a: NB. per-layer updates, to be filled in during the backward pass
for_p. bw do. NB. forward pass as in fwd above, but keeping zs and as per layer for the backward pass
act =. sig z =. (>p) mpb act NB. activations and z's for layer
as =. as,<act NB. store them for reuse in backward pass
zs =. zs,<z
end.
NB. ---Backward pass---
last=.1 NB. for triggering special treatment of output layer as first factor
for_L. |.i.<:#sizes do. NB. for all layers, going backwards (|.)
if. last do. NB. first factor f1 in err depends on whether the layer is last or not.
last=.0 [ f1 =. lab dLoss _1 {:: as NB. output layer: loss gradient w.r.t. the final activations
else. f1=. (|: }."1 > bw {~ L+1) +/ .* err end. NB. else, update from previous err(L+1)
err =. f1 * dsig L {:: zs NB. error, i.e. the small delta in the book linked above
delta =. delta L}~ < err +/ .* |: 1,L {:: as NB. get update multiplying error with activation, 1 for bias
end.
bs =. #y NB. batch size (number of instances) for normalization
bw =: bw (- bs %~ x&*)&.> delta NB. update bw__obj for each layer
}}
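NB. In equation form, per layer L (what the loops above compute):
NB.   output layer:  err = (a_out - labels) * dsig z_out              (dLoss, then dsig)
NB.   hidden layers: err = ((W_L+1 minus its bias column)^T +/ .* err_L+1) * dsig z_L
NB.   gradient:      delta_L = err +/ .* |: 1 , (activations feeding layer L)
NB.   update:        W_L =: W_L - (lr % batch size) * delta_L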
nn_z_ =: conew&'nn' NB. shortcut function for creation.
NB. Report results on data, taken as y=boxed training/test data split
report =: {{
echo 'NN training report'
'dtr ltr'=. ({.sizes) split|: 0{:: y NB. split training in data and labels
'dts lts'=. ({.sizes) split|: 1{:: y NB. Same for testing data
NB. data plots
pd 'reset'
pd 'sub 1 2'
pd 'new'
pd 'type marker; aspect 1; title #Training = ',":{:$ltr
pd&> ;/@|:&.> dtr (</.~ #.)&|: ltr
pd 'use'
pd 'type marker; aspect 1; title #Testing = ',":{:$lts
pd&> ;/@|:&.> dts (</.~ #.)&|: lts
pd 'endsub'
pd 'show'
NB. Echo accuracies on training and testing data, and the confusion matrices
echo 'training accuracy (%) : ',": 100*(+/%#) *./ ltr=prtr =. (="1 >./) fwd dtr
echo 'testing accuracy (%) : ',": 100*(+/%#) *./ lts=prts =. (="1 >./) fwd dts
echo 'confusion matrices training; testing' NB. convert one-hot labels back to indices, append every possible label combination, count with <:@#/.~ and reshape to KxK
echo conf =. (ltr;lts) ([: ($~ ,~@%:@#) [: <:@#/.~ (,~ ,/@:>@{@,~@:<@~.@,)@/:~@ ,.&(onehot inv@|:) ) each prtr;prts
NB. Show training / testing loss evolution plot.
'title Loss;xcaption epoch; key train test;keypos rti' plot histloss
echo 'decision boundary' NB. shows RGB for the classes; the brighter/purer/crisper, the better. NOTE: works only for 3 classes, since it uses RGB
viewrgb (,~@>: |.@|:@$ 256 #. 255 <.@:* [: (%"1 >./)@:(-"1 <./) [: pred [: ,/@:>@{@,~@:<@i: 1+j.) 300
NB. entropy of predicted labels: higher entropy = higher uncertainty = decision boundary
'density; mesh 0;aspect 1' plot (,~@>: $ [: -@:(+/"1@:(*^.)) [: (%"1 >./)@:(-"1 <./) [: pred [: ,/@:>@{@,~@:<@i: 1+j.) 300
}}
cocurrent 'base'
NB. Patch enclength in the jzlib locale (raising a length limit) to fix errors that prevent larger viewmat pictures:
enclength_jzlib_=: 3 : 0
ix=. <: +/({:"1 lz_length)<:y
code=. 257 + ix
ex=. y-(<ix,1){lz_length
assert. (0<:ex)*.(300>:ex) NB. upped limit here.
code, 1000+ex
)
NB. Create neural network. First and last neuron count should be 2 and 3, respectively,
NB. for matching data dimensionality and label count
net =: nn 2 15 8 3
5000 _ 0.1 sgd__net spl =: 300 splitdat datext ,. labext
report__net spl