Fully Connected Neural Network

To overcome the problems arising from the simple Perceptron model, we can join multiple Perceptron units into a more complex network in which the output of each neuron is fed forward as input to the next one. This is the Multi-Layer Perceptron (MLP) configuration, and if the graph is fully connected, i.e. each neuron in a layer is connected to every neuron in the adjacent layers, we talk about fully connected neural networks (or dense neural networks, DNN).

Given the Perceptron formulas, the extension to the MLP architecture is straightforward and is given by

$y = \sigma\left(X \cdot W + W_0 \right)$

where we simply pass from the vector formulation to the matrix one. The update rule consequently becomes

$\delta W = \delta W + X^T \cdot \left( \frac{\partial f(y)}{\partial y} \cdot \delta^l \right) \quad\quad \delta W_0 = \sum_{i=0}^{m}\frac{\partial f(y)}{\partial y_i} \cdot \delta_i^l$

where, also in this case, we simply pass to the matrix formalism and move from a discrete formulation to a continuous one, i.e. with continuous values the error is expressed through a partial derivative (here $f$ denotes the activation function, written $\sigma$ above). In the above equation $\delta^l$ represents the error passed back from the next layer in the network structure\footnote{In the Back-Propagation Algorithm the error is passed by each layer to the previous one, starting from the output error computed according to the chosen loss function.}.
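The forward formula and the update rule above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the implementation discussed in this work: the activation is assumed to be the logistic function, the loss is assumed to be the squared error (so that the output error is `y - t`), and the names `forward`, `backward` and the array shapes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation (an assumption: the text leaves sigma generic)
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, W0):
    # y = sigma(X . W + W_0), with X of shape (m, n_in) and W of shape (n_in, n_out)
    return sigmoid(X @ W + W0)

def backward(X, y, W, delta, dW):
    # Derivative of the logistic activation written in terms of its output y,
    # multiplied element-wise by the error coming from the next layer
    grad = y * (1.0 - y) * delta
    dW = dW + X.T @ grad          # delta W = delta W + X^T . (f'(y) * delta)
    dW0 = grad.sum(axis=0)        # sum over the m samples
    delta_prev = grad @ W.T       # error handed to the previous layer (standard back-propagation step)
    return dW, dW0, delta_prev

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))       # m = 4 samples, 3 input features
W = rng.normal(size=(3, 2))       # 2 output units
W0 = np.zeros(2)
t = rng.uniform(size=(4, 2))      # hypothetical targets

y = forward(X, W, W0)
delta = y - t                     # output error for an assumed squared-error loss
dW, dW0, delta_prev = backward(X, y, W, delta, np.zeros_like(W))
```

Each update has the same shape as the parameter it refers to, and `delta_prev` has the shape of the layer input, so it can be used directly as the `delta` of the preceding layer.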

By repeating such structures we can chain multiple fully connected layers, each with its own number of units, and so obtain networks with different levels of complexity (an input layer followed by multiple hidden layers).
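As a rough illustration of this chaining, the sketch below feeds the output of each layer into the next one. The layer sizes, the logistic activation and the helper name `mlp_forward` are assumptions chosen only for the example.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, assumed here for every layer
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, layers):
    # Feed X through a list of (W, W0) pairs: each output is the next layer's input
    out = X
    for W, W0 in layers:
        out = sigmoid(out @ W + W0)
    return out

rng = np.random.default_rng(42)
sizes = [5, 8, 4, 2]              # hypothetical: 5 inputs, hidden layers of 8 and 4 units, 2 outputs
layers = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

X = rng.normal(size=(10, sizes[0]))   # 10 samples
y = mlp_forward(X, layers)
print(y.shape)                        # (10, 2)
```

The only constraint when stacking layers is that the number of columns of each weight matrix matches the number of rows of the following one.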

Fully connected Neural Networks overcome the Perceptron problems mentioned above by combining multiple single Perceptron units, and they have several useful properties:

If the activation functions of all the hidden units in the Neural Network are linear, then the network architecture is equivalent to a network without hidden units.
If the number of hidden units is smaller than either the number of input units or the number of output units, then the network cannot generate the most general possible transformations from inputs to outputs, since information is lost in the dimensionality reduction performed by the hidden units.
We can find multiple weight configurations, i.e. multiple $W$ matrices, that give the same mapping function from inputs to outputs.
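These properties can be checked numerically on a toy network. The sketch below is only illustrative: the sizes are arbitrary, the bottleneck check assumes linear hidden units (where the information loss shows up as a bound on the rank of the equivalent weight matrix), and the `tanh` activation in the last check is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 6, 2, 5   # hypothetical sizes, with a hidden bottleneck
X = rng.normal(size=(20, n_in))
W1, b1 = rng.normal(size=(n_in, n_hidden)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), rng.normal(size=n_out)

# 1) Linear hidden units: the two layers collapse into a single linear layer
two_layers = (X @ W1 + b1) @ W2 + b2
W_eq, b_eq = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W_eq + b_eq
print(np.allclose(two_layers, one_layer))          # True

# 2) Bottleneck: the equivalent matrix cannot have rank larger than n_hidden,
#    so not every linear map from n_in to n_out units can be represented
print(np.linalg.matrix_rank(W_eq) <= n_hidden)     # True

# 3) Different weights, same mapping: permuting the hidden units
#    (columns of W1, entries of b1, rows of W2) leaves the output unchanged
perm = np.arange(n_hidden)[::-1]
out = np.tanh(X @ W1 + b1) @ W2 + b2
out_perm = np.tanh(X @ W1[:, perm] + b1[perm]) @ W2[perm, :] + b2
print(np.allclose(out, out_perm))                  # True
```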

Having covered the theory behind this kind of model, we can now move on to practical (numerical) considerations about its implementation.


PhD thesis in Applied Physics