Layers
In this section, we will cover all relevant layers implemented by Slugnet, and their specific use cases. This includes convolutional deep networks and layers associated with them.
Fully Connected Deep Networks
Slugnet implements fully connected deep networks via the Dense
layer. When operating in feedforward mode, the dense layer computes the
following term

$$\hat{y} = f(Wx + b)$$

where $\hat{y}$ is the activated output, $f$ is the activation function,
$W$ is the matrix of weights, $x$ is the input, and $b$ is the bias term.
The dense layer does not implement any activation function itself;
instead, one is injected at runtime via the activation parameter.
This means that on feedforward, the dense layer is incredibly simple:
it performs matrix multiplication between an input matrix and a matrix
of weights, then adds a bias vector, and that's it.
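To make the feedforward computation concrete, here is a minimal NumPy sketch; the shapes, names, and ReLU default are illustrative assumptions, not Slugnet's actual implementation.

import numpy as np

# Minimal sketch of the dense feedforward step described above.
# Shapes: X is (batch, inshape), W is (inshape, outshape), b is (outshape,).
def dense_forward(X, W, b, activation=lambda z: np.maximum(z, 0)):
    z = X @ W + b          # the linear term Wx + b, computed for every row of X
    return activation(z)   # the injected activation function, ReLU here

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 784))
W = rng.normal(scale=0.01, size=(784, 200))
b = np.zeros(200)
y_hat = dense_forward(X, W, b)   # shape (4, 200)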
On feed backward, or backpropagation, the dense layer is responsible for calculating two values. The value defined by $\delta = \nabla_{\hat{y}} \ell \odot f'(z)$ will be used to calculate the weight and bias gradients at this layer. The value $W^\top \delta$ will be used to calculate gradients at all previous layers. This process is easy to follow in the backpropagation algorithm given in the introduction section of this documentation.
When looking at the implementation of Dense, there is a notable absence
of $W$ and $\delta$ from the following layer.
This is because their dot product is calculated in the previous layer during backpropagation.
The model propagates that gradient to the current layer.
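A rough NumPy sketch of this split, using hypothetical names rather than Slugnet's internal API, might look like the following.

# Sketch of the dense backward step. propagated_grad is the gradient of the loss
# with respect to this layer's activated output, handed down by the layer above
# (its W^T delta product, already computed there). All arguments are NumPy arrays.
def dense_backward(X, W, z, propagated_grad, activation_prime):
    delta = propagated_grad * activation_prime(z)   # local delta at this layer
    grad_W = X.T @ delta                            # weight gradient
    grad_b = delta.sum(axis=0)                      # bias gradient
    grad_input = delta @ W.T                        # propagated to the previous layer
    return grad_W, grad_b, grad_input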
class slugnet.layers.Dense(outshape, inshape=None, activation=<slugnet.activation.Noop object>, init=<slugnet.initializations.GlorotUniform object>)

Bases: slugnet.layers.Layer
A common densely connected neural network layer.
Parameters:
- outshape (int) – The output shape at this layer.
- inshape (int) – The input shape at this layer.
- activation (slugnet.activation.Activation) – The activation function to be used at the layer.
- init (slugnet.initializations.Initializer) – The initialization function to be used.
An example of using two dense layers to train a multi-layer deep network to classify MNIST data can be seen below.
from slugnet.activation import ReLU, Softmax
from slugnet.layers import Dense
from slugnet.loss import SoftmaxCategoricalCrossEntropy as SCCE
from slugnet.model import Model
from slugnet.optimizers import RMSProp
from slugnet.data.mnist import get_mnist
X, y = get_mnist()
model = Model(lr=0.01, n_epoch=3, loss=SCCE(),
metrics=['loss', 'accuracy'],
optimizer=RMSProp())
model.add_layer(Dense(200, inshape=784, activation=ReLU()))
model.add_layer(Dense(10, activation=Softmax()))
model.fit(X, y)
If you have slugnet installed locally, this script can be
executed by running the following command. It will output
training and validation statistics to stdout
as the
model is trained.
$ python3 -m slugnet.examples.mnist
Note this snippet makes use of several components that have not yet been reviewed, such as loss and optimization functions. There are corresponding documentation sections for these components, and jumping ahead to learn about them is encouraged.
Dropout
Dropout is a method of regularization that trains subnetworks by turning off non-output nodes with some probability $p$.
This approximates bagging, which involves training an ensemble of models to overcome weaknesses in any given model and prevent overfitting [1].
We can formalize dropout by representing the subnetworks created by dropout with a mask vector $\mu$. Now, we note each subnetwork defines a new probability distribution of $y$ as $p(y \mid x, \mu)$ [1]. If we define $p(\mu)$ as the probability distribution of mask vectors $\mu$, we can write the mean of all subnetworks as

$$\sum_{\mu} p(\mu) \, p(y \mid x, \mu)$$
The problem with evaluating this term is the exponential number of mask vectors. In practice, we approximate this probability distribution by including all nodes during inference, multiplying each output by $1 - p$, the probability that any node is included in the network during training, and running the feedforward operation just once. This rule is called the weight scaling inference rule [1].
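The weight scaling inference rule is easy to express directly. The sketch below assumes $p$ is the drop probability, as in the Dropout class documented next; the function names are illustrative only and not part of Slugnet's API.

import numpy as np

def dropout_train(a, p, rng):
    # During training, sample a binary mask and drop units with probability p.
    mask = rng.random(a.shape) >= p
    return a * mask

def dropout_inference(a, p):
    # Weight scaling inference rule: keep every unit and scale the output by the
    # probability (1 - p) that a unit was kept during training.
    return a * (1.0 - p)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 200))          # activations from a hidden layer
a_train = dropout_train(a, 0.5, rng)   # roughly half the units zeroed out
a_test = dropout_inference(a, 0.5)     # all units kept, outputs halved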
class slugnet.layers.Dropout(p=0.0, *args, **kwargs)

Bases: slugnet.layers.Layer
A layer that removes units from a network with probability $p$.

Parameters:
- p (float) – The probability of a non-output node being removed from the network.
An example of using a Dropout
layer with slugnet is presented below.
from slugnet.activation import ReLU, Softmax
from slugnet.layers import Dense, Dropout
from slugnet.loss import SoftmaxCategoricalCrossEntropy as SCCE
from slugnet.model import Model
from slugnet.optimizers import RMSProp
from slugnet.data.mnist import get_mnist
X, y = get_mnist()
model = Model(lr=0.01, n_epoch=3, loss=SCCE(),
metrics=['loss', 'accuracy'],
optimizer=RMSProp())
model.add_layer(Dense(200, inshape=784, activation=ReLU()))
model.add_layer(Dropout(0.5))
model.add_layer(Dense(10, activation=Softmax()))
model.fit(X, y)
If you have slugnet installed locally, this script can be
run by executing the following command. It will output training
and validation statistics to stdout
as the model
is trained. Note that this model is slower to train than
the model without dropout. This is widely noted in the
literature [2].
$ python3 -m slugnet.examples.mnist_dropout
Convolutional Deep Networks
Convolutional deep networks are most often used in image classification tasks. There are several specialized layers used in these networks. The most obvious is the convolution layer; less obvious are the pooling layers, specifically max-pooling and mean-pooling. In this section we will mathematically review all of these layers in depth.
Convolution Layer
In the general case, a discrete convolution operation implements the function

$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a) \, w(t - a)$$

where $x$ is the input and $w$ is the kernel, or in some cases the weighting function.
In the case of convolutional deep networks, the input is typically a two dimensional image $I$, and it follows that we have a two dimensional kernel $K$. Now we can write our convolution function with both axes:

$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n) \, K(i - m, j - n)$$

Note that we can write the infinite sums over the domains of $m$ and $n$ as finite sums because we assume that the kernel is zero everywhere but the set of points in which we store data [1].
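As a rough illustration of this sliding-window computation, the sketch below applies a small kernel over a two dimensional input with no padding and stride one. Like most deep learning libraries it does not flip the kernel, so strictly speaking it computes a cross-correlation; the function name is illustrative and not part of Slugnet's API.

import numpy as np

def convolve2d_valid(image, kernel):
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1      # output size with no padding, stride 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # elementwise product of the kernel with the patch beneath it, summed
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(convolve2d_valid(image, kernel).shape)   # (4, 4)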
The motivation for using the convolution operation in a deep network is best described using an example of an image. In a densely connected deep network, each node at layer $l$ is connected to every node at layer $l - 1$. This does not lend itself to image processing, where the location of a shape relative to another shape is important. For instance, finding a right angle involves detecting two edges that are perpendicular and whose lines cross one another. If we make the nonzero parts of the kernel smaller than the input image, we can process parts of the image at a time, thereby ensuring locality of the input signals. To process the entire image, we slide the kernel over the input, along both axes. At each step, an output is produced which will be used as input for the next layer. This configuration allows us to learn the parameters of the kernel the same way we'd learn ordinary parameters in a densely connected deep network.
From figure 3, we can see that the output size can be determined from the input size and kernel size. The equation is given by

$$o = i - k + 1$$

where $o$ is the output size, $i$ is the input size, and $k$ is the kernel size.
Figure 3 features a one dimensional input and output. As we mentioned earlier, most convolutional deep networks feature two dimensional inputs and outputs, such as images. In figure 4, we show how the convolution operation behaves when we are using two dimensional inputs, kernels, and outputs.
The stride width determines how far the kernel moves at each step. Of course, to learn anything interesting, we require multiple kernels at each layer. These are all configurable hyperparameters that can be set upon network instantiation. When the network is operating in feedforward mode, the output at each layer is a three dimensional tensor, rather than a matrix. This is due to the fact that each kernel produces its own two dimensional output, and there are multiple kernels at every layer.
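Accounting for stride, the output size along each axis generalizes to $\lfloor (i - k) / s \rfloor + 1$ for an unpadded convolution. A quick check of that formula, with hypothetical names:

def conv_output_size(i, k, s=1):
    # Output size along one axis for an unpadded convolution with stride s.
    return (i - k) // s + 1

print(conv_output_size(28, 3))        # 26, as in the first Convolution layer of the example below
print(conv_output_size(28, 3, s=2))   # 13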
class slugnet.layers.Convolution(nb_kernel, kernel_size, stride=1, inshape=None, init=<slugnet.initializations.GlorotUniform object>, activation=None)

Bases: slugnet.layers.Layer
A layer that implements the convolution operation.
Parameters:
- nb_kernel (int) – The number of kernels to use.
- kernel_size ((int, int)) – The size of the kernel as a tuple, height by width.
- stride (int) – The stride width to use.
- init (slugnet.initializations.Initializer) – The initializer to use.
- activation (slugnet.activation.Activation) – The activation function to be used at the layer.
Pooling
Pooling is a method of downsampling typically used in convolutional deep networks. Pooling makes the representations at a subsequent layer approximately invariant to translations of the output from the previous layer [1]. This is useful when we care about the presence of some feature but not necessarily the exact location of the feature within the input.
The mean pooling operation implements the function

$$P(i, j) = \frac{1}{|\mathbf{m}| \, |\mathbf{n}|} \sum_{m \in \mathbf{m}} \sum_{n \in \mathbf{n}} S(i + m, j + n)$$

where $\mathbf{m}$ and $\mathbf{n}$ are vectors of kernel indices over the image. This operation is depicted in figure 2.
The max-pooling operation implements the function

$$P(i, j) = \max_{m \in \mathbf{m},\, n \in \mathbf{n}} S(i + m, j + n)$$

where $\mathbf{m}$ and $\mathbf{n}$ are vectors of kernel indices over the image. This operation is depicted in figure 3.
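A compact NumPy sketch of both operations over non-overlapping 2×2 windows is shown below; it assumes the input height and width are divisible by the pool size and is not Slugnet's implementation.

import numpy as np

def pool2d(S, pool=2, mode="mean"):
    h, w = S.shape
    # Group the image into non-overlapping pool x pool blocks.
    blocks = S.reshape(h // pool, pool, w // pool, pool)
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    return blocks.max(axis=(1, 3))

S = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(S, mode="mean"))   # 2x2 output of window means
print(pool2d(S, mode="max"))    # 2x2 output of window maxima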
class slugnet.layers.MeanPooling(pool_size, inshape=None)

Bases: slugnet.layers.Layer
Pool outputs using the arithmetic mean.
We have now documented all the necessary parts of a convolutional deep network. This makes training one to classify MNIST data simple.
import numpy as np
from slugnet.activation import ReLU, Softmax
from slugnet.layers import Convolution, Dense, MeanPooling, Flatten
from slugnet.loss import SoftmaxCategoricalCrossEntropy as SCCE
from slugnet.model import Model
from slugnet.optimizers import SGD
from slugnet.data.mnist import get_mnist
X, y = get_mnist()
X = X.reshape((-1, 1, 28, 28)) / 255.0  # (N, channels, height, width), pixels scaled to [0, 1]

# Shuffle X and y with the same seed so they stay aligned, keeping 1000 examples.
np.random.seed(100)
X = np.random.permutation(X)[:1000]
np.random.seed(100)
y = np.random.permutation(y)[:1000]
model = Model(lr=0.001, n_epoch=100, batch_size=3, loss=SCCE(),
metrics=['loss', 'accuracy'], optimizer=SGD())
model.add_layer(Convolution(1, (3, 3), inshape=(None, 1, 28, 28)))
model.add_layer(MeanPooling((2, 2)))
model.add_layer(Convolution(2, (4, 4)))
model.add_layer(MeanPooling((2, 2)))
model.add_layer(Flatten())
model.add_layer(Dense(10, activation=Softmax()))
model.fit(X, y)
Note that because Slugnet is implemented in NumPy, and thus runs on a single CPU core, training this model is very slow.
[1] Goodfellow, Bengio, and Courville (2016). Deep Learning, Chapter 9. http://www.deeplearningbook.org
[2] S. Wang and C. D. Manning. Fast dropout training. In Proceedings of the 30th International Conference on Machine Learning, pages 118–126. ACM, 2013.