# Neural PCA/Compression in R
In this document I will walk through a small experiment showing how a neural network can be used as a PCA-like compression algorithm. It should also help explain how neural networks work and why they can be such a powerful machine learning technique.
The code is written in R and needs the following dependencies.
### Packages Used
```{r packages}
library(RSNNS)
library(ggplot2)
library(devtools)
library(reshape)
library(parallel)
source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
```
### Generating Bits
I will start out by generating random bits and storing them in a data frame.
```{r init, fig.width=7, fig.height=6}
set.seed(1)
num.vars = 6
num.obs = 500
df = data.frame(replicate(num.vars,sample(0:1,num.obs,rep=TRUE)))
head(df)
```
These bits will then be used as input **and** output for a neural network. The goal is to see whether a neural network can be trained to reconstruct the data in its original format while only using a limited number of nodes in the hidden layer. Such a neural network might look like this:
```{r neuralplot}
# train a network that maps the bits onto themselves through 4 hidden nodes
mod = mlp(df, df, size=c(4))
plot.nnet(mod)
```
We can check the performance of such a neural network.
```{r acc_cat}
# fraction of individual bits reconstructed correctly (sum(df == df) is simply the total number of bits)
bit.acc = sum( round(mod$fitted.values) == df ) / sum( df == df )
# fraction of rows where all 6 bits are reconstructed correctly
row.acc = sum(rowSums(round(mod$fitted.values) == df) == 6)/nrow(df)
cat(bit.acc, row.acc)
```
Notice that in this case about 91% of the bits and about 58% of the rows are estimated correctly. That means we can reconstruct almost 60% of the rows perfectly while using only 4 hidden nodes (out of 6 input bits). This is interesting because we are effectively using a neural network as a compression method, which gives us an alternative to eigenvalue-based principal component analysis.
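For reference, here is a minimal sketch (not part of the original experiment) of how the same two accuracy numbers could be computed for a classical PCA reconstruction via `prcomp` with 4 components, so the neural result has an eigenvalue-based baseline to compare against.
```{r pca_compare}
# Reconstruct the bits from the first 4 principal components and score the
# reconstruction with the same bit/row accuracy as above.
pca = prcomp(df, center = TRUE)
k = 4
recon = pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
recon = sweep(recon, 2, pca$center, "+")   # add the column means back
pca.bit.acc = sum(round(recon) == df) / (nrow(df) * ncol(df))
pca.row.acc = sum(rowSums(round(recon) == df) == ncol(df)) / nrow(df)
cat(pca.bit.acc, pca.row.acc)
```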
### Repeating for different network sizes
To investigate this effect further we need to set up a proper simulation run. Neural network training usually starts from random initial weights, so I will fit multiple networks per hidden layer size. Below is the simulation function I will use throughout the document. The version shown here runs the estimations sequentially; a sketch using the ```parallel``` package follows after it.
```{r def_sim}
simulation = function(df, num.sims){
  results = data.frame(vars = as.numeric(c()), acc = as.numeric(c()), type = as.factor(c()))
  num.vars = ncol(df)
  for(i in 1:num.vars){      # loop over hidden layer sizes
    for(j in 1:num.sims){    # repeat num.sims times per size
      # cat("hidden nodes :", i, "sample #", j, "\n")  # uncomment for progress output
      mod = mlp(df, df, size = c(i))
      # fraction of bits and fraction of complete rows reconstructed correctly
      bit.acc = sum( round(mod$fitted.values) == df ) / sum( df == df )
      row.acc = sum(rowSums(round(mod$fitted.values) == df) == num.vars)/nrow(df)
      results = rbind(results, data.frame(vars = i, acc = bit.acc, type = "bit"))
      results = rbind(results, data.frame(vars = i, acc = row.acc, type = "row"))
    }
  }
  results
}
```
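The loop above runs everything sequentially even though `parallel` is loaded. As a rough sketch of how the repetitions could be spread over cores with `mclapply` (the name `simulation.par` and the `cores` argument are my own, and `mc.cores > 1` only works on Unix-alike systems):
```{r def_sim_par}
simulation.par = function(df, num.sims, cores = 2){
  num.vars = ncol(df)
  # for every hidden layer size, fit num.sims networks on separate cores
  do.call(rbind, lapply(1:num.vars, function(i){
    reps = mclapply(1:num.sims, function(j){
      mod = mlp(df, df, size = c(i))
      bit.acc = sum(round(mod$fitted.values) == df) / sum(df == df)
      row.acc = sum(rowSums(round(mod$fitted.values) == df) == num.vars) / nrow(df)
      rbind(data.frame(vars = i, acc = bit.acc, type = "bit"),
            data.frame(vars = i, acc = row.acc, type = "row"))
    }, mc.cores = cores)
    do.call(rbind, reps)
  }))
}
```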
I will also need a function to plot my results.
```{r def_plot}
plotsim = function(results, title){
  # average accuracy per network size and accuracy type, drawn as a line over the raw points
  agg = aggregate(data = results, acc ~ vars + type, FUN = mean)
  p = ggplot() + geom_point(data = results, aes(vars, acc), alpha = 0.6)
  p = p + geom_line(data = agg, aes(vars, acc), alpha = 0.3, size = 2)
  p = p + facet_grid(. ~ type)
  p + ggtitle(title)
}
```
### Simulation 1
```{r sim1}
# simulation 1, random values
num.vars = 8
num.obs = 200
df = data.frame(replicate(num.vars,sample(0:1,num.obs,rep=TRUE)))
sims = simulation(df, 5)
plotsim(sims, 'simulation 1')
```
We see the effect that we would expect. Predicting a single bit correctly is easy: random guessing already gives a 50% chance per bit, so it is not surprising that the bit accuracy is always above 50%. Reconstructing an entire row proves much more difficult; we need six out of eight hidden nodes to get the row accuracy above 50%.
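To make that baseline explicit: guessing at random gets each individual bit right with probability 0.5, but a whole row of 8 independent bits only with probability 0.5^8. A quick sanity check:
```{r chance_baseline}
# expected accuracy under pure random guessing for simulation 1 (8 bits per row)
c(bit = 0.5, row = 0.5^8)
```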
### Simulation 2
We will now repeat the simulation on a much larger dataset, which also means much larger networks.
```{r sim2}
# simulation 2, random values very large dataset
num.vars = 20
num.obs = 200
df = data.frame(replicate(num.vars,sample(0:1,num.obs,rep=TRUE)))
sims = simulation(df, 5)
plotsim(sims, 'simulation 2')
```
We see a very similar effect.
### Simulation 3
We will now generate the data slightly differently and introduce correlation between columns: each of the four original columns gets a noisy copy that agrees with it 80% of the time.
```{r sim3}
# simulation 3, clustered random values
num.vars = 4
num.obs = 200
df = data.frame(replicate(num.vars,sample(0:1,num.obs,rep=TRUE)))
# append a noisy copy of each original column: identical 80% of the time, random otherwise
df = cbind(df, sapply(df$X1, function(x) if(runif(1)<0.8) x else sample(0:1)[1]))
df = cbind(df, sapply(df$X2, function(x) if(runif(1)<0.8) x else sample(0:1)[1]))
df = cbind(df, sapply(df$X3, function(x) if(runif(1)<0.8) x else sample(0:1)[1]))
df = cbind(df, sapply(df$X4, function(x) if(runif(1)<0.8) x else sample(0:1)[1]))
sims = simulation(df, 5)
plotsim(sims, 'simulation 3 : correlation between columns')
```
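As a quick check (my addition) that this construction actually induces the intended dependence, the correlation matrix should show each copied column correlating at roughly 0.8 with the original it was derived from:
```{r sim3_check}
# agreement is 0.8 + 0.2 * 0.5 = 0.9, which for 50/50 bits corresponds to a correlation of about 0.8
round(cor(df), 2)
```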
### Simulation 4
Same as simulation 2, but now the random process is no longer 50/50: ones are sampled three times as often as zeros, giving a 75/25 bias.
```{r sim4}
# simulation 4, biased random bits, large dataset
num.vars = 20
num.obs = 200
df = data.frame(replicate(num.vars,sample(c(0,1,1,1),num.obs,rep=TRUE)))
sims = simulation(df, 5)
plotsim(sims, 'simulation 4 : biased input 75/25')
```
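A quick check (again my addition) that the input really is biased: the overall proportion of ones should be close to 0.75.
```{r sim4_check}
# proportion of ones in the biased dataset, sampled from c(0, 1, 1, 1)
mean(as.matrix(df))
```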
## Conclusion
I suspect that only a small niche of humanity cares about this sort of thing. Nonetheless, I think this is yet another example showing that neural networks are really, really cool, simply because you can apply them to, and approach, a phenomenon from so many different angles.
Feel free to comment/edit.