Optimization Functions

class slugnet.optimizers.SGD(lr=0.001, clip=-1, decay=0.0, lr_min=0.0, lr_max=inf)

Bases: slugnet.optimizers.Optimizer

Optimize model parameters using standard stochastic gradient descent.
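As a rough illustration of the update rule behind this class, the following minimal sketch applies plain stochastic gradient descent with NumPy. It does not use slugnet itself: `sgd_step` is a hypothetical helper written for this example, and only the `lr` argument is modelled. The `clip`, `decay`, `lr_min` and `lr_max` arguments are left out because their exact behaviour is not described here.

```python
import numpy as np


def sgd_step(params, grads, lr=0.001):
    """Return new parameters after one plain SGD step (hypothetical helper).

    Each parameter array moves against its gradient, scaled by lr.
    """
    return [p - lr * g for p, g in zip(params, grads)]


# Toy problem: minimise f(w) = sum(w ** 2), whose gradient is 2 * w.
w = [np.array([1.0, -2.0])]
for _ in range(100):
    grads = [2.0 * w[0]]
    w = sgd_step(w, grads, lr=0.1)
```

In this toy problem each step multiplies the weights by a constant factor below one, so they shrink steadily toward the minimum at zero.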

class slugnet.optimizers.RMSProp(rho=0.9, epsilon=1e-06, *args, **kwargs)

Bases: slugnet.optimizers.Optimizer

RMSProp updates. Scales learning rates by dividing by a moving average of the root mean squared (RMS) gradients. See [1] for further description.

  • rho (float) – Gradient moving average decay factor.
  • epsilon (float) – Small value added for numerical stability.

rho should be between 0 and 1. A value close to 1 decays the moving average slowly, and a value close to 0 decays it quickly. Given the step size \eta, the decay factor \rho, and the current gradient g, the moving average r_t and the learning rate \eta_t are calculated as:

\begin{align*}
r_t &= \rho r_{t-1} + (1-\rho) g^2 \\
\eta_t &= \frac{\eta}{\sqrt{r_t + \epsilon}}
\end{align*}
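The two formulas above can be sketched in NumPy as follows. This is an illustrative sketch rather than slugnet's implementation: `rmsprop_step` is a hypothetical helper, the argument `lr` is assumed to play the role of the step size \eta, and the final parameter update (subtracting \eta_t times the gradient) is the usual RMSProp step, which the formulas themselves do not spell out.

```python
import numpy as np


def rmsprop_step(param, grad, r, lr=0.001, rho=0.9, epsilon=1e-6):
    """One RMSProp update for a single parameter array (hypothetical helper).

    r is the moving average of squared gradients from the previous step.
    Returns the updated parameter and the updated moving average.
    """
    # r_t = rho * r_{t-1} + (1 - rho) * g^2
    r = rho * r + (1.0 - rho) * grad ** 2
    # eta_t = eta / sqrt(r_t + epsilon), computed element-wise
    eta_t = lr / np.sqrt(r + epsilon)
    # Assumed parameter update: step against the gradient, scaled by eta_t.
    return param - eta_t * grad, r


# Toy problem: minimise f(w) = sum(w ** 2), whose gradient is 2 * w.
w = np.array([1.0, -2.0])
r = np.zeros_like(w)
for _ in range(500):
    grad = 2.0 * w
    w, r = rmsprop_step(w, grad, r, lr=0.01)
```

Because the gradient is divided by the root of its own moving average, the size of each step depends mostly on lr rather than on the raw gradient magnitude, so both coordinates approach zero at a similar pace despite starting at different distances.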

[1] Tieleman, T. and Hinton, G. (2012): Neural Networks for Machine Learning, Lecture 6.5 - rmsprop. Coursera. http://www.youtube.com/watch?v=O3sxAc4hxZU (formula @5:20)