Learning Rate Scheduler#
Poly#
Scala:
val lrScheduler = Poly(power=0.5, maxIteration=1000)
Python:
lr_scheduler = Poly(power=0.5, max_iteration=1000, bigdl_type="float")
A learning rate decay policy, where the effective learning rate follows a polynomial decay and reaches zero at maxIteration. Calculation: base_lr * (1 - iter/maxIteration) ^ power
power: coefficient of decay; see the calculation formula
maxIteration: the iteration at which the learning rate becomes zero
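For instance, plugging the settings of the Scala example below (base learning rate 0.1, power = 3, maxIteration = 100) into the formula gives, at the second iteration, 0.1 * (1 - 1/100) ^ 3 = 0.0970299, which matches the second printed rate (currentRate is reported with a negative sign, as a descent step).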
Scala example:
import com.intel.analytics.bigdl.dllib.optim.SGD._
import com.intel.analytics.bigdl.dllib.optim._
import com.intel.analytics.bigdl.dllib.tensor.{Storage, Tensor}
import com.intel.analytics.bigdl.dllib.tensor.TensorNumericMath.TensorNumeric.NumericFloat
import com.intel.analytics.bigdl.dllib.utils.T
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = Poly(3, 100)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.0970299
Python example:
optim_method = SGD(learningrate=0.1, leaningrate_schedule=Poly(3, 100))
Default#
It is the default learning rate schedule. For each iteration, the learning rate would update with the following formula:
l_{n + 1} = l / (1 + n * learning_rate_decay), where l is the initial learning rate
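For example, with an initial learning rate of 0.1 and learning_rate_decay 0.1 (the settings of the example below), the rates after one and two iterations are 0.1 / (1 + 1 * 0.1) = 0.0909... and 0.1 / (1 + 2 * 0.1) = 0.0833..., matching the printed values.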
Scala:
val lrScheduler = Default()
Python:
lr_scheduler = Default()
Scala example:
val optimMethod = new SGD[Double](0.1, 0.1)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.09090909090909091
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.08333333333333334
Python example:
optim_method = SGD(leaningrate_schedule=Default())
NaturalExp#
A learning rate schedule which rescales the learning rate by exp(-gamma * iter / decay_step), similar to TensorFlow's natural_exp_decay.
decay_step: how often to apply the decay
gamma: the decay rate, e.g. 0.96
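For example, with decay_step = 1, gamma = 1 and a base learning rate of 0.1 (as in the example below), the rates over the first three iterations are 0.1 * exp(0) = 0.1, 0.1 * exp(-1) = 0.0367879... and 0.1 * exp(-2) = 0.0135335..., matching the printed values.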
Scala:
val learningRateScheduler = NaturalExp(1, 1)
Scala example:
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = NaturalExp(1, 1)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
(0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.036787944117144235
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.013533528323661271
Exponential#
A learning rate schedule which rescales the learning rate by lr_{n + 1} = lr * decayRate ^ (iter / decayStep)
decayStep: the interval for learning rate decay
decayRate: the decay rate
stairCase: if true, iter / decayStep is an integer division and the decayed learning rate follows a staircase function
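For example, with decayStep = 10, decayRate = 0.96 and a base learning rate of 0.05 (as in the example below), the second iteration gives 0.05 * 0.96 ^ (1/10) = 0.0497963...; since stairCase is false here, the exponent is fractional rather than an integer step.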
Scala:
val learningRateSchedule = Exponential(10, 0.96)
Python:
exponential = Exponential(100, 0.1)
Scala example:
val optimMethod = new SGD[Double](0.05)
optimMethod.learningRateSchedule = Exponential(10, 0.96)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
(0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.05
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.049796306069892535
Python example:
optim_method = SGD(leaningrate_schedule=Exponential(100, 0.1))
Plateau#
Plateau is a learning rate schedule that reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This schedule monitors a quantity, and if no improvement is seen for a 'patience' number of epochs, the learning rate is reduced.
monitor: quantity to be monitored; can be Loss or score
factor: factor by which the learning rate will be reduced: new_lr = lr * factor
patience: number of epochs with no improvement after which the learning rate will be reduced
mode: one of {min, max}. In min mode, the learning rate will be reduced when the monitored quantity has stopped decreasing; in max mode, when it has stopped increasing
epsilon: threshold for measuring a new optimum, to focus only on significant changes
cooldown: number of epochs to wait before resuming normal operation after the learning rate has been reduced
minLr: lower bound on the learning rate
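For instance, with the settings used below (base learning rate 0.05, factor = 0.1, patience = 10), once the monitored quantity fails to improve by more than epsilon for 10 consecutive epochs, the rate would be reduced to 0.05 * 0.1 = 0.005, with further reductions stopping at minLr.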
Scala:
val learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
Python:
plateau = Plateau("score", factor=0.1, patience=10, mode="min", epsilon=1e-4, cooldown=0, minLr=0)
Scala example:
val optimMethod = new SGD[Double](0.05)
optimMethod.learningRateSchedule = Plateau(monitor="score", factor=0.1, patience=10, mode="min", epsilon=1e-4f, cooldown=0, minLr=0)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
(0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0, "evalCounter" -> 0)
optimMethod.state = state
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.05
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.05
Python example:
optim_method = SGD(leaningrate_schedule=Plateau("score"))
Warmup#
A gradual learning rate warm-up policy, where the effective learning rate increases by delta after each iteration. Calculation: base_lr + delta * iteration
delta: the increase applied after each iteration
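For example, with a base learning rate of 0.1 and delta = 0.3 (as in the example below), the first iterations use 0.1, then 0.1 + 0.3 = 0.4, then 0.1 + 2 * 0.3 = 0.7, and so on. Warmup is typically combined with another schedule through SequentialSchedule, as shown below.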
Scala:
val learningRateSchedule = Warmup(delta = 0.05)
Python:
warmup = Warmup(delta=0.05)
Scala example:
val lrSchedules = new SequentialSchedule(100)
lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.4
Python example:
optim_method = SGD(leaningrate_schedule=Warmup(0.05))
SequentialSchedule#
A learning rate scheduler which can stack several learning rate schedulers.
iterationPerEpoch: the number of iterations per epoch
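In the example below, Warmup(0.3) runs for the first 3 iterations, producing rates 0.1, 0.4 and 0.7; Poly(3, 100) then takes over with the warmed-up rate 0.1 + 3 * 0.3 = 1.0 as its base, so the next two rates are 1.0 and 1.0 * (1 - 1/100) ^ 3 = 0.970299, matching the printed values.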
Scala:
val learningRateSchedule = SequentialSchedule(iterationPerEpoch=100)
Python:
sequentialSchedule = SequentialSchedule(iteration_per_epoch=5)
Scala example:
val lrSchedules = new SequentialSchedule(100)
lrSchedules.add(Warmup(0.3), 3).add(Poly(3, 100), 100)
val optimMethod = new SGD[Double](learningRate = 0.1, learningRateSchedule = lrSchedules)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.1
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.4
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.7
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-1.0
optimMethod.optimize(feval, x)
> print(optimMethod.learningRateSchedule.currentRate)
-0.9702989999999999
Python example:
sequentialSchedule = SequentialSchedule(5)
poly = Poly(0.5, 2)
sequentialSchedule.add(poly, 5)
EpochDecay#
Scala:
def decay(epoch: Int): Double =
if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0
val learningRateSchedule = EpochDecay(decay)
It is an epoch decay learning rate schedule. The learning rate is rescaled through a function of the number of epochs run: l = base_lr * 0.1 ^ decayType(epoch)
decayType: a function taking the number of epochs run as its argument and returning the decay exponent
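With the settings of the example below (base learning rate 1000 and the decay function above), the rates are 1000 * 0.1 ^ 2 = 10 for epoch 1, 1000 * 0.1 ^ 1 = 100 for epoch 2, and 1000 * 0.1 ^ 0 = 1000 afterwards, reported with a negative sign as currentRate like the other schedules.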
Scala example:
def decay(epoch: Int): Double =
if (epoch == 1) 2.0 else if (epoch == 2) 1.0 else 0.0
val optimMethod = new SGD[Double](1000)
optimMethod.learningRateSchedule = EpochDecay(decay)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0)
for(e <- 1 to 3) {
state("epoch") = e
optimMethod.state = state
optimMethod.optimize(feval, x)
if(e <= 1) {
assert(optimMethod.learningRateSchedule.currentRate == -10)
} else if (e <= 2) {
assert(optimMethod.learningRateSchedule.currentRate == -100)
} else {
assert(optimMethod.learningRateSchedule.currentRate == -1000)
}
}
Regime#
A structure for specifying hyper parameters by start epoch and end epoch. Usually works with [[EpochSchedule]].
startEpoch: start epoch
endEpoch: end epoch
config: a config table containing the hyper parameters
EpochSchedule#
A learning rate schedule which configures the learning rate according to pre-defined [[Regime]]s. If the running epoch falls within the interval [r.startEpoch, r.endEpoch] of a regime r, the learning rate takes the "learningRate" value in r.config.
regimes: an array of pre-defined [[Regime]]s
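For instance, with the regimes defined below, epochs 1-3 run with learningRate 1e-2 and weightDecay 2e-4, epochs 4-7 with learningRate 5e-3 and weightDecay 2e-4, and epochs 8-10 with learningRate 1e-3 and no weight decay.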
Scala:
val regimes: Array[Regime] = Array(
Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
)
val learningRateScheduler = EpochSchedule(regimes)
Scala example:
val regimes: Array[Regime] = Array(
Regime(1, 3, T("learningRate" -> 1e-2, "weightDecay" -> 2e-4)),
Regime(4, 7, T("learningRate" -> 5e-3, "weightDecay" -> 2e-4)),
Regime(8, 10, T("learningRate" -> 1e-3, "weightDecay" -> 0.0))
)
val state = T("epoch" -> 0)
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = EpochSchedule(regimes)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
return (0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
for(e <- 1 to 10) {
state("epoch") = e
optimMethod.state = state
optimMethod.optimize(feval, x)
if(e <= 3) {
assert(optimMethod.learningRateSchedule.currentRate==-1e-2)
assert(optimMethod.weightDecay==2e-4)
} else if (e <= 7) {
assert(optimMethod.learningRateSchedule.currentRate==-5e-3)
assert(optimMethod.weightDecay==2e-4)
} else if (e <= 10) {
assert(optimMethod.learningRateSchedule.currentRate==-1e-3)
assert(optimMethod.weightDecay==0.0)
}
}
EpochStep#
A learning rate schedule which rescales the learning rate by gamma every stepSize epochs.
stepSize: the number of epochs between learning rate updates
gamma: the rescale factor
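For example, with EpochStep(1, 0.5) and a base learning rate of 0.1 (as in the example below), the rate at epoch e is 0.1 * 0.5 ^ e: 0.05 after epoch 1, 0.025 after epoch 2, and so on, which is exactly what the assert in the example checks.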
Scala:
val learningRateScheduler = EpochStep(1, 0.5)
Scala example:
val optimMethod = new SGD[Double](0.1)
optimMethod.learningRateSchedule = EpochStep(1, 0.5)
def feval(x: Tensor[Double]): (Double, Tensor[Double]) = {
(0.1, Tensor[Double](Storage(Array(1.0, 1.0))))
}
val x = Tensor[Double](Storage(Array(10.0, 10.0)))
val state = T("epoch" -> 0)
for(e <- 1 to 10) {
state("epoch") = e
optimMethod.state = state
optimMethod.optimize(feval, x)
assert(optimMethod.learningRateSchedule.currentRate==(-0.1 * Math.pow(0.5, e)))
}