DLlib User Guide

1. Overview

DLlib is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs).

It includes the functionalities of the original BigDL project and provides high-level Keras-style APIs (in both Scala and Python) for distributed deep learning on Spark.

2. Scala user guide

2.1 Install and Run

Please refer to the Scala guide for details.


2.2 Get started


This section shows a single example of how to use dllib to build a deep learning application on Spark with the Keras-style API.


LeNet Model on MNIST using Keras-Style API

This tutorial explains what is happening in the LeNet example.

A bigdl-dllib program starts with the initialization as follows:

      // Create a Spark configuration with BigDL-specific settings and initialize the Engine
      val conf = Engine.createSparkConf()
        .setAppName("Train Lenet on MNIST")
        .set("spark.task.maxFailures", "1")
      val sc = new SparkContext(conf)
      Engine.init

After the initialization, we need to:

  1. Load the training and validation data by creating the DataSet and then applying a series of transformers (e.g., SampleToGreyImg, GreyImgNormalizer and GreyImgToBatch):

    val trainSet = (if (sc.isDefined) {
        DataSet.array(load(trainData, trainLabel), sc.get, param.nodeNumber)
      } else {
        DataSet.array(load(trainData, trainLabel))
      }) -> SampleToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(
        param.batchSize)

    val validationSet = DataSet.array(load(validationData, validationLabel), sc) ->
        BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(
        param.batchSize)
  2. We then define the LeNet model using the Keras-style API:

    val input = Input(inputShape = Shape(28, 28, 1))
    val reshape = Reshape(Array(1, 28, 28)).inputs(input)
    val conv1 = Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5").inputs(reshape)
    val pool1 = MaxPooling2D().inputs(conv1)
    val conv2 = Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5").inputs(pool1)
    val pool2 = MaxPooling2D().inputs(conv2)
    val flatten = Flatten().inputs(pool2)
    val fc1 = Dense(100, activation = "tanh").setName("fc1").inputs(flatten)
    val fc2 = Dense(classNum, activation = "softmax").setName("fc2").inputs(fc1)
    Model(input, fc2)
  3. After that, we configure the learning process: set the optimization method and the Criterion (which, given the input and target, computes the gradient per the given loss function):

  model.compile(optimizer = optimMethod,
          loss = ClassNLLCriterion[Float](logProbAsInput = false),
          metrics = Array(new Top1Accuracy[Float](), new Top5Accuracy[Float](), new Loss[Float]))

Finally, we train the model by calling model.fit:

  model.fit(trainSet, nbEpoch = param.maxEpoch, validationData = validationSet)

3. Python user guide

3.1 Install

3.1.1 Official Release

Run the commands below to install bigdl-dllib.

conda create -n my_env python=3.7
conda activate my_env
pip install bigdl-dllib

3.1.2 Nightly build

You can install the latest nightly build of bigdl-dllib as follows:

pip install --pre --upgrade bigdl-dllib

3.2 Run

3.2.1 Interactive Shell

You may test if the installation is successful using the interactive Python shell as follows:

  • Type python in the command line to start a REPL.

  • Try to run the example code below to verify the installation:

    from bigdl.dllib.utils.nncontext import *
    
    sc = init_nncontext()  # Initialization of bigdl-dllib on the underlying cluster.
    

3.2.2 Jupyter Notebook

You can start the Jupyter notebook as you normally do using the following command and run bigdl-dllib programs directly in a Jupyter notebook:

jupyter notebook --notebook-dir=./ --ip=* --no-browser

3.2.3 Python Script

You can directly write bigdl-dllib programs in a Python file (e.g. script.py) and run it from the command line as a normal Python program:

python script.py
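
For instance, a minimal script.py might look like the sketch below. It only initializes bigdl-dllib and prints the Spark version to confirm the context is up; sc.version and sc.stop() are standard SparkContext calls used here purely for illustration, and the import follows the shell example above.

    # script.py: a minimal sketch of a standalone bigdl-dllib program
    from bigdl.dllib.utils.nncontext import *

    sc = init_nncontext()  # initialize bigdl-dllib and obtain the SparkContext
    print(sc.version)      # confirm that the underlying Spark context is running
    sc.stop()              # release resources when the script finishes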

3.3 Get started

NN Context

NNContext is the main entry point for provisioning the dllib program on the underlying cluster (such as a K8s or Hadoop cluster), or just on a single laptop.

A dllib program usually starts with the initialization of NNContext as follows:

from bigdl.dllib.nncontext import *
init_nncontext()

In init_nncontext, the user may specify the cluster mode for the dllib program:

  • Cluster mode: “local”, “yarn-client”, “yarn-cluster”, “k8s-client”, “standalone” or “spark-submit”. Defaults to “local”.

The dllib program simply runs init_nncontext on the local machine, which will automatically provision the runtime Python environment and distributed execution engine on the underlying computing environment (such as a single laptop, a large K8s or Hadoop cluster, etc.).
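
For example, the mode could be passed when initializing the NNContext; the sketch below assumes the keyword argument is named cluster_mode (check the dllib API docs for the exact signature):

    from bigdl.dllib.nncontext import *

    # cluster_mode is an assumed keyword name; "local" is the documented default
    sc = init_nncontext(cluster_mode="local")  # or "yarn-client", "k8s-client", ...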

Autograd Examples using the bigdl-dllib Keras Python API

This tutorial describes the autograd example, in which a custom loss function is defined using autograd operations.

The example first does the initialization using init_nncontext():

  sc = init_nncontext()

It then generates the input data X_ and Y_:

    import numpy as np

    data_len = 1000
    X_ = np.random.uniform(0, 1, (data_len, 2))
    Y_ = ((2 * X_).sum(1) + 0.4).reshape([data_len, 1])

It then defines the custom loss using autograd operations:

# mean and abs here are autograd operations from bigdl-dllib's Keras autograd API
# (not Python built-ins); they build the loss symbolically from y_true and y_pred.
def mean_absolute_error(y_true, y_pred):
    result = mean(abs(y_true - y_pred), axis=1)
    return result

After that, the example creates the model as follows and sets the criterion to the custom loss:

    a = Input(shape=(2,))
    b = Dense(1)(a)
    # add_one_func is a user-defined function from the full example, applied through the Lambda layer
    c = Lambda(function=add_one_func)(b)
    model = Model(input=a, output=c)

    model.compile(optimizer=SGD(learningrate=1e-2),
                  loss=mean_absolute_error)

Finally, the example trains the model by calling model.fit:

    model.fit(x=X_,
              y=Y_,
              batch_size=32,
              nb_epoch=int(options.nb_epoch),
              distributed=False)
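
After training, the fitted model can be used for inference. The sketch below is only illustrative; the exact predict signature (in particular the distributed flag) is an assumption to verify against the dllib Keras API docs:

    # Sketch only: run local (non-distributed) inference on the training inputs
    predictions = model.predict(X_, distributed=False)
    print(predictions[:5])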