
Activation Functions

Activation functions (or transfer functions) are linear or non-linear equations that process the output of a Neural Network neuron and bound it into a limited range of values (commonly $$[0, 1]$$ or $$[-1, 1]$$). The output of a simple neuron 1 can be computed as the dot product of the input and the neuron weights (see previous section); in this case the output values range from $$-\infty$$ to $$+\infty$$ and are equivalent to a simple linear function. Linear functions are very simple to treat, but they are limited in their complexity and thus in their learning power: Neural Networks without activation functions are just simple linear regression models (see the fully connected Neural Network properties in the previous section). Neural Networks are considered Universal Function Approximators, so the introduction of non-linearity allows them to model a wide range of functions and to learn more complex relations in the pattern data. From a biological point of view, the activation functions model the on/off state of a neuron in the output decision process.
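As a minimal illustration of this bounding effect (a NumPy sketch under simple assumptions, not the actual NumPyNet implementation; the input, weight and bias values are arbitrary), the snippet below computes the raw output of a single fully connected neuron as a dot product and then squashes it into $$(0, 1)$$ with a Logistic activation:

```python
import numpy as np

# arbitrary input and weights of a single fully connected neuron
x = np.array([0.5, -1.2, 3.0, 0.7])   # input values
w = np.array([0.1, 0.4, -0.2, 0.8])   # neuron weights
b = 0.5                               # bias term

# raw (linear) output: an unbounded value in (-inf, +inf)
z = np.dot(x, w) + b

# Logistic activation bounds the output into (0, 1)
y = 1. / (1. + np.exp(-z))

print('linear output: {:.3f}, activated output: {:.3f}'.format(z, y))
```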

Many activation functions have been proposed over the years and each one has its own characteristics, but not a well-defined application field. The best activation function to use in a given situation (for a particular problem) is still an open question: each one has its pros and cons, so every Neural Network library implements a wide range of them and leaves it to the user to perform their own tests. In the following Table we show the list of activation functions implemented in our NumPyNet and Byron libraries, with their mathematical formulation and corresponding derivative (ref. activations.py for the code implementation). An important feature of any activation function, in fact, is that it should be differentiable, since the main procedure of model optimization implies the back-propagation of the error gradients.

| Name | Equation | Derivative |
| :--- | :---: | :---: |
| Linear | $$f(x) = x$$ | $$f'(x) = 1$$ |
| Logistic | $$f(x) = \frac{1}{1 + e^{-x}}$$ | $$f'(x) = (1 - f(x)) * f(x)$$ |
| Loggy | $$f(x) = \frac{2}{1 + e^{-x}} - 1$$ | $$f'(x) = 2 * \left(1 - \frac{f(x) + 1}{2}\right) * \frac{f(x) + 1}{2}$$ |
| Relu | $$f(x) = \max(0, x)$$ | $$f'(x) = (x > 0)$$ |
| Elu | $$f(x) = (x \ge 0) * x + (x < 0) * (e^{x} - 1)$$ | $$f'(x) = (x \ge 0) + (x < 0) * (f(x) + 1)$$ |
| Relie | $$f(x) = (x > 0) * x + (x \le 0) * 0.01x$$ | $$f'(x) = (x > 0) + (x \le 0) * 0.01$$ |
| Ramp | $$f(x) = (x > 0) * x + 0.1x$$ | $$f'(x) = (x > 0) + 0.1$$ |
| Tanh | $$f(x) = \tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$$ | $$f'(x) = 1 - f(x)^{2}$$ |
| Plse | $$f(x) = (x < -4) * 0.01(x + 4) + (x > 4) * [0.01(x - 4) + 1] + (\lvert x \rvert \le 4) * (0.125x + 0.5)$$ | $$f'(x) = (\lvert x \rvert \le 4) * 0.125 + (\lvert x \rvert > 4) * 0.01$$ |
| Leaky | $$f(x) = (x > 0) * x + (x \le 0) * 0.1x$$ | $$f'(x) = (x > 0) + (x \le 0) * 0.1$$ |
| HardTan | $$f(x) = (x < -1) * (-1) + (x > 1) * 1 + (\lvert x \rvert \le 1) * x$$ | $$f'(x) = (-1 < x < 1)$$ |
| LhTan | $$f(x) = (x < 0) * 0.001x + (x > 1) * [0.001(x - 1) + 1] + (0 \le x \le 1) * x$$ | $$f'(x) = (0 < x < 1) + (x \le 0 \ \lor\ x \ge 1) * 0.001$$ |
| Selu | $$f(x) = (x \ge 0) * 1.0507x + (x < 0) * 1.0507 * 1.6732 * (e^{x} - 1)$$ | $$f'(x) = (x \ge 0) * 1.0507 + (x < 0) * (f(x) + 1.0507 * 1.6732)$$ |
| SoftPlus | $$f(x) = \log(1 + e^{x})$$ | $$f'(x) = \frac{e^{x}}{1 + e^{x}}$$ |
| SoftSign | $$f(x) = \frac{x}{\lvert x \rvert + 1}$$ | $$f'(x) = \frac{1}{(\lvert x \rvert + 1)^{2}}$$ |
| Elliot | $$f(x) = \frac{\frac{1}{2} * S * x}{1 + \lvert x * S \rvert} + \frac{1}{2}$$ | $$f'(x) = \frac{\frac{1}{2} * S}{(1 + \lvert x * S \rvert)^{2}}$$ |
| SymmElliot | $$f(x) = \frac{S * x}{1 + \lvert x * S \rvert}$$ | $$f'(x) = \frac{S}{(1 + \lvert x * S \rvert)^{2}}$$ |

As can be seen in the table (where $$S$$ in the Elliot formulations is a tunable steepness parameter), for many functions it is easier to compute the derivative of the activation as a function of its output $$f(x)$$ rather than of its input. This is a well-known and important optimization in computational terms, since it reduces the number of operations and allows the backward gradient to be applied directly to the already-computed forward output.
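As a sketch of this optimization (using plain NumPy, not the NumPyNet code itself), the Logistic gradient below is evaluated directly on the already-computed activation output and then checked against a finite-difference approximation of the derivative:

```python
import numpy as np

def logistic(x):
  return 1. / (1. + np.exp(-x))

def logistic_gradient(y):
  # derivative written as a function of the output y = f(x):
  # no further exponential evaluation is required
  return (1. - y) * y

x = np.linspace(-3., 3., num=13)
y = logistic(x)                       # forward pass output
analytic = logistic_gradient(y)       # gradient from the output alone

# finite-difference check of the same derivative
eps = 1e-5
numeric = (logistic(x + eps) - logistic(x - eps)) / (2. * eps)

assert np.allclose(analytic, numeric, atol=1e-8)
```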

To better understand the effects of activation functions, we can apply them to a test image. This can easily be done using the example scripts provided with our NumPyNet library. In Fig. 1 the effects of the previously described functions are reported on a test image: for each function we show the output of the activation function and of its gradient. For visualization purposes the image values have been rescaled before being passed to the functions.

Fig. 1: Activation functions applied to a test image. **(top)** Elu function and corresponding gradient. **(center)** Logistic function and corresponding gradient. **(bottom)** Relu function and corresponding gradient.
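A minimal sketch of this kind of visualization is reported below; it relies on plain NumPy and Matplotlib rather than on the NumPyNet example scripts, and it uses a random array in place of the real test image (the rescaling into $$[-1, 1]$$ is an assumption made for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# stand-in for the test image, rescaled into [-1, 1] before activation
img = np.random.uniform(low=0., high=255., size=(64, 64))
img = 2. * (img - img.min()) / (img.max() - img.min()) - 1.

# Relu activation and its gradient, applied pixel-wise
relu_out  = np.maximum(0., img)
relu_grad = (img > 0.).astype(float)

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))
ax1.imshow(relu_out, cmap='gray')
ax1.set_title('Relu output')
ax2.imshow(relu_grad, cmap='gray')
ax2.set_title('Relu gradient')
plt.show()
```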

From the results shown in Fig. 1 we can better appreciate the differences between the several mathematical formulations: a simple Logistic function does not produce evident effects on the test image, while a Relu activation tends to overshadow the image pixels. This behavior of the Relu activation function is very useful in Neural Network models and it also determines important theoretical consequences, which have led it to become one of the most prominent choices for many Neural Network models.

The ReLU (Rectified Linear Unit) activation function is, in fact, the most used in modern Neural Network models. Its diffusion is due to its numerical efficiency and to the benefits it brings [Glorot2011Relu], such as the sparsity of the produced activations, a cheaper computation compared to saturating functions and a reduced likelihood of vanishing gradients.
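The sparsity benefit, for instance, is easy to verify numerically (a small sketch on synthetic data, not one of the thesis experiments): on zero-centered random inputs roughly half of the Relu outputs are exactly zero, while every surviving unit keeps a constant unit gradient.

```python
import numpy as np

np.random.seed(42)
x = np.random.randn(100000)           # synthetic zero-centered inputs

y = np.maximum(0., x)                 # Relu activation
sparsity = np.mean(y == 0.)           # fraction of inactive units (~50%)

grad = (x > 0.).astype(float)         # unit gradient on every active unit

print('inactive units: {:.1%}'.format(sparsity))
print('unique gradient values:', np.unique(grad))
```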

In the next sections we will discuss different kinds of Neural Network models, and in all of them we choose to use the Relu activation function in the majority of the layers.


  1. We assume for simplicity a fully connected Neural Network neuron.