An Artificial Neural Network (ANN) with a Multi-Layer Structure can be seen as a Cascade of Layers, each computing a Linear Combination of its inputs followed by a Neuron Transfer Function.
In case of a Binary Classification Task, the implicit goal of the ANN is to learn a Function which separates the ANN Input Space into 2 regions, one for each possible label. Let's call this function the Input Space Separation Function.
What if the Neuron Transfer Function were linear?
In that case the whole Cascade of Layers would collapse into a single Linear Map, so the ANN would only be able to learn a Linear Input Space Separation Function, hence:
- it would only be able to solve Binary Classification Tasks which are Linearly Separable, as in Fig.1
- it would never be able to solve inherently Nonlinear Classification Tasks, such as the XOR Classification Problem in Fig.2
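The XOR case can be checked directly. The sketch below (plain NumPy, with hand-picked hyperparameters and a hypothetical 2-8-1 architecture, not taken from the figures) trains a small MLP with a tanh hidden layer on the four XOR points; the hidden nonlinearity is what lets it bend the separation function, something no purely linear unit can do for this dataset.

```python
import numpy as np

# XOR dataset: the four points are NOT linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)

# 2-8-1 MLP; the tanh hidden layer supplies the nonlinearity
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(20000):
    # forward pass
    h = np.tanh(X @ W1 + b1)           # hidden activations
    p = sigmoid(h @ W2 + b2)           # predicted probabilities
    # backward pass (cross-entropy loss with sigmoid output)
    d2 = p - y
    dW2, db2 = h.T @ d2, d2.sum(axis=0)
    d1 = (d2 @ W2.T) * (1.0 - h ** 2)  # tanh derivative
    dW1, db1 = X.T @ d1, d1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()
print(pred)
```

With these settings the network reproduces the XOR labels (0, 1, 1, 0); a single linear neuron can classify at most 3 of the 4 points correctly.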
Introducing Nonlinearities in the Neuron Transfer Functions allows the ANN to learn a much larger class of Input Space Separation Functions, but it brings other kinds of problems, such as the difficulty of performing Gradient-based Training with Saturating Nonlinearities (e.g. Sigmoid-like Transfer Functions), whose Gradients become vanishingly small far from the origin.
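The saturation issue is easy to see numerically: the Sigmoid's derivative is σ'(z) = σ(z)(1 − σ(z)), which peaks at 0.25 for z = 0 and collapses towards zero as |z| grows, starving Gradient-based Training of useful signal (the helper names below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

# the gradient shrinks rapidly once the neuron saturates
for z in [0.0, 2.5, 5.0, 10.0]:
    print(f"z = {z:5.1f}  sigmoid'(z) = {sigmoid_grad(z):.6f}")
```

At z = 10 the gradient is already below 1e-4, so a neuron driven deep into its saturated region barely updates during Backpropagation.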
Fig.1 - Binary Classification solvable with Linear Separation
Fig.2 - The XOR Problem