Install Cluster Serving#


We recommend installing Cluster Serving by pulling the pre-built Docker image to your local node; the image packages all the required dependencies. Alternatively, you can manually install Cluster Serving (through either pip or direct download) and Redis on the local node.


docker pull intelanalytics/bigdl-cluster-serving

Then start the container (or run docker run directly; it pulls the image if it does not exist locally):

docker run --name cluster-serving -itd --net=host intelanalytics/bigdl-cluster-serving:0.9.0

Log into the container

docker exec -it cluster-serving bash

Run cd ./cluster-serving to see the prepared environment.

Manual installation#


Non-Docker users need to install Flink 1.10.0+ (1.10.0 by default) and Redis 5.0.0+ (5.0.5 by default).

For users who do not have the dependencies above, we provide the following commands for a quick setup.


$ export REDIS_VERSION=5.0.5
$ wget${REDIS_VERSION}.tar.gz && \
    tar xzf redis-${REDIS_VERSION}.tar.gz && \
    rm redis-${REDIS_VERSION}.tar.gz && \
    cd redis-${REDIS_VERSION} && \
    make


$ export FLINK_VERSION=1.11.2
$ wget${FLINK_VERSION}/flink-${FLINK_VERSION}-bin-scala_2.11.tgz && \
    tar xzf flink-${FLINK_VERSION}-bin-scala_2.11.tgz && \
    rm flink-${FLINK_VERSION}-bin-scala_2.11.tgz

After preparing the dependencies above, make sure the environment variables $FLINK_HOME (/path/to/flink-FLINK_VERSION-bin) and $REDIS_HOME (/path/to/redis-REDIS_VERSION) are set before continuing with the following steps.
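Before moving on, it can help to confirm that both variables are set and point at real directories. The following is a minimal sketch; check_serving_env is a hypothetical helper name and is not part of Cluster Serving.

```shell
# Hypothetical helper: verify that FLINK_HOME and REDIS_HOME are set and
# point at existing directories. The function body runs in a subshell so
# its working variables do not leak into the calling shell.
check_serving_env() (
    status=0
    for name in FLINK_HOME REDIS_HOME; do
        # Read the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    exit "$status"
)
```

A typical use would be check_serving_env && echo "environment looks ready" before running the installation steps.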

Install release version#

pip install bigdl-serving

Install nightly version#

Download the package from here, then run the following command to install Cluster Serving:

pip install bigdl_serving-*.whl

For users who need to deploy and start Cluster Serving, run cluster-serving-init to download and prepare dependencies.

For users who only need to run inference (that is, predict on data), the environment is ready.


Set up cluster#

Cluster Serving runs on a Flink cluster. Make sure you have one available, as described in Installation.

For Docker users, the cluster should already be started. You can run netstat -tnlp | grep 8081 to check whether the Flink REST port is listening; if it is not, call $FLINK_HOME/bin/ to start the Flink cluster.

If you need to start Flink on YARN or on K8s, refer to Flink on Yarn or Flink on K8s, respectively, in the official Flink documentation.

If you use Flink in standalone mode, call $FLINK_HOME/bin/ to start the Flink cluster.
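The port check mentioned above can be wrapped in a small function that returns an exit status suitable for scripting. This is a sketch under assumptions: port_is_listening is a hypothetical name, it expects Linux-style netstat -tnl output on standard input (local address in column 4, state in column 6), and 8081 is used as the default only because the text above refers to it.

```shell
# Hypothetical helper: succeed if the netstat output read from stdin shows
# a TCP socket in LISTEN state on the given port (default 8081).
port_is_listening() {
    awk -v port="${1:-8081}" '
        # Column 4 is the local address (for example 0.0.0.0:8081 or :::8081),
        # column 6 is the socket state.
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage on a machine with netstat installed:
#   netstat -tnl | port_is_listening 8081 && echo "Flink REST port is up"
```

Anchoring the match on the end of the address avoids false positives from ports such as 18081 that merely contain the same digits.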

Configuration file#

After installation, you will see a config file named config.yaml in your current working directory. This file contains all the settings that you can customize for your Cluster Serving. See the example config.yaml below.

## BigDL Cluster Serving Config Example
# model path must be provided
modelPath: /path/to/model
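Since modelPath must be provided, a quick way to see what a config file currently points at is to extract that line. The sketch below uses a hypothetical helper, model_path_from_config, and assumes a simple top-level line of the form modelPath: value with no quotes or trailing comment; it is not a general YAML parser.

```shell
# Hypothetical helper: print the modelPath value from a config file
# (defaults to ./config.yaml in the current directory).
# Only the first matching top-level "modelPath:" line is used.
model_path_from_config() {
    sed -n 's/^modelPath:[[:space:]]*//p' "${1:-config.yaml}" | head -n 1
}

# Usage:
#   path=$(model_path_from_config config.yaml)
#   [ -d "$path" ] || echo "modelPath does not point at a directory: $path" >&2
```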

Preparing Model#

Currently, BigDL Cluster Serving supports TensorFlow, OpenVINO, PyTorch, BigDL, and Caffe models. Supported types are listed below.

Put your model file into a directory laid out as shown below for your model type. Note that only one model is allowed per directory. Then set modelPath: /path/to/dir in the config.yaml file.

TensorFlow

TensorFlow SavedModel

|-- model
   |-- saved_model.pb
   |-- variables
       |-- variables.index

TensorFlow Frozen Graph

|-- model
   |-- frozen_inference_graph.pb
   |-- graph_meta.json

Note: the .pb file is the weight file and must be named frozen_inference_graph.pb. The .json file defines the inputs and outputs and must be named graph_meta.json, with contents like {"input_names":["input:0"],"output_names":["output:0"]}.
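The graph_meta.json file can be generated from the shell. The following sketch assumes one input and one output tensor; write_graph_meta is a hypothetical helper name, and tensor names containing quotes or backslashes are not handled.

```shell
# Hypothetical helper: write graph_meta.json into a model directory.
#   $1: model directory
#   $2: input tensor name  (for example input:0)
#   $3: output tensor name (for example output:0)
write_graph_meta() {
    printf '{"input_names":["%s"],"output_names":["%s"]}\n' "$2" "$3" \
        > "$1/graph_meta.json"
}

# Usage:
#   write_graph_meta /path/to/model "input:0" "output:0"
```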

TensorFlow Checkpoint

Please refer to the freeze checkpoint example.


PyTorch

|-- model

Running a PyTorch model requires extra dependencies and configuration. Refer to here to install the dependencies, and set the environment variable $PYTHONHOME to your Python installation, so that Python can be run as $PYTHONHOME/bin/python and its libraries are under $PYTHONHOME/lib/.


OpenVINO

|-- model
   |-- xx.xml
   |-- xx.bin


BigDL

|-- model
   |-- xx.model


Caffe

|-- model
   |-- xx.prototxt
   |-- xx.caffemodel
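To check a directory against the layouts above before setting modelPath, a rough sketch follows. The helper names has_file and detect_model_layout, and the labels they print, are hypothetical. Mapping .xml/.bin to OpenVINO and .model to BigDL is an assumption based on the list of supported types. The PyTorch layout is not detected because no file name is given for it above, and the one-model-per-directory rule is not enforced.

```shell
# Hypothetical helper: succeed if at least one argument is an existing
# regular file. An unmatched glob stays literal and therefore fails the test.
has_file() {
    for f in "$@"; do
        if [ -f "$f" ]; then return 0; fi
    done
    return 1
}

# Hypothetical helper: print a label for the layout found in directory $1,
# or print "unknown" and return 1 if no known layout matches.
detect_model_layout() {
    if [ -f "$1/saved_model.pb" ] && [ -d "$1/variables" ]; then
        echo "tensorflow-savedmodel"
    elif [ -f "$1/frozen_inference_graph.pb" ] && [ -f "$1/graph_meta.json" ]; then
        echo "tensorflow-frozen-graph"
    elif has_file "$1"/*.xml && has_file "$1"/*.bin; then
        echo "openvino"      # assumed: .xml plus .bin pair
    elif has_file "$1"/*.prototxt && has_file "$1"/*.caffemodel; then
        echo "caffe"
    elif has_file "$1"/*.model; then
        echo "bigdl"         # assumed: single .model file
    else
        echo "unknown"
        return 1
    fi
}
```

For example, detect_model_layout /path/to/dir prints the matching label, and its non-zero exit status can stop a deployment script before Cluster Serving is started with an unrecognized directory.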

Other Configuration#

The field params contains your inference parameter configuration.

  • core_number: the batch size used for model inference. The number of cores on your machine is usually recommended, so you can simply provide that number in this field. We recommend a value no smaller than 4 and no larger than 512. In general, a larger batch size means higher throughput, but it also increases the latency between batches accordingly.
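The core_number guidance above (use the machine's core count, kept between 4 and 512) can be sketched as a small function. suggest_core_number is a hypothetical helper name, and reading the core count with getconf _NPROCESSORS_ONLN is an assumption that holds on typical Linux and macOS systems.

```shell
# Hypothetical helper: print a core_number suggestion.
#   $1 (optional): core count to use instead of the detected one.
# The result is clamped to the recommended range of 4 to 512.
suggest_core_number() {
    n="${1:-$(getconf _NPROCESSORS_ONLN)}"
    if [ "$n" -lt 4 ]; then n=4; fi
    if [ "$n" -gt 512 ]; then n=512; fi
    echo "$n"
}

# Usage:
#   echo "core_number: $(suggest_core_number)"
```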