DLlib in 5 minutes#

Overview#

DLlib is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs).

It includes the functionalities of the original BigDL project, and provides following high-level APIs for distributed deep learning on Spark:

Scala Example#

This section show a single example of how to use dllib to build a deep learning application on Spark, using Keras APIs

LeNet Model on MNIST using Keras-Style API#

This tutorial is an explanation of what is happening in the lenet example

A bigdl-dllib program starts with initialize as follows.

val conf = Engine.createSparkConf()
  .setAppName("Train Lenet on MNIST")
  .set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)
Engine.init

After the initialization, we need to:

Load train and validation data by creating the DataSet (e.g., SampleToGreyImg, GreyImgNormalizer and GreyImgToBatch):

val trainSet = (if (sc.isDefined) {
    DataSet.array(load(trainData, trainLabel), sc.get, param.nodeNumber)
  } else {
    DataSet.array(load(trainData, trainLabel))
  }) -> SampleToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(
    param.batchSize)

val validationSet = DataSet.array(load(validationData, validationLabel), sc) ->
    BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(
    param.batchSize)

We then define Lenet model using Keras-style api

val input = Input(inputShape = Shape(28, 28, 1))
val reshape = Reshape(Array(1, 28, 28)).inputs(input)
val conv1 = Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5").inputs(reshape)
val pool1 = MaxPooling2D().inputs(conv1)
val conv2 = Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5").inputs(pool1)
val pool2 = MaxPooling2D().inputs(conv2)
val flatten = Flatten().inputs(pool2)
val fc1 = Dense(100, activation = "tanh").setName("fc1").inputs(flatten)
val fc2 = Dense(classNum, activation = "softmax").setName("fc2").inputs(fc1)
Model(input, fc2)

After that, we configure the learning process. Set the optimization method and the Criterion (which, given input and target, computes gradient per given loss function):

model.compile(optimizer = optimMethod,
        loss = ClassNLLCriterion[Float](logProbAsInput = false),
        metrics = Array(new Top1Accuracy[Float](), new Top5Accuracy[Float](), new Loss[Float]))

Finally we train the model by calling model.fit:

model.fit(trainSet, nbEpoch = param.maxEpoch, validationData = validationSet)

Python Example#

Initialize NN Context#

NNContext is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop.

An dlllib program usually starts with the initialization of NNContext as follows:

from bigdl.dllib.nncontext import *
init_nncontext()

In init_nncontext, the user may specify cluster mode for the dllib program:

Cluster mode=: “local”, “yarn-client”, “yarn-cluster”, “k8s-client”, “standalone” and “spark-submit”. Default to be “local”.

The dllib program simply runs init_nncontext on the local machine, which will automatically provision the runtime Python environment and distributed execution engine on the underlying computing environment (such as a single laptop, a large K8s or Hadoop cluster, etc.).

Autograd Examples using bigdl-dllb keras Python API#

This tutorial describes the Autograd.

The example first do the initializton using init_nncontext():

sc = init_nncontext()

It then generate the input data X_, Y_

data_len = 1000
X_ = np.random.uniform(0, 1, (1000, 2))
Y_ = ((2 * X_).sum(1) + 0.4).reshape([data_len, 1])

It then define the custom loss

def mean_absolute_error(y_true, y_pred):
    result = mean(abs(y_true - y_pred), axis=1)
    return result

After that, the example creates the model as follows and set the criterion as the custom loss:

a = Input(shape=(2,))
b = Dense(1)(a)
c = Lambda(function=add_one_func)(b)
model = Model(input=a, output=c)

model.compile(optimizer=SGD(learningrate=1e-2),
              loss=mean_absolute_error)

Finally the example trains the model by calling model.fit:

model.fit(x=X_,
          y=Y_,
          batch_size=32,
          nb_epoch=int(options.nb_epoch),
          distributed=False)