# Trusted Cluster Serving with Graphene on Kubernetes
## Prerequisites
Prior to deploying PPML Cluster Serving, please make sure the following are set up:

- Hardware that supports SGX
- A fully configured Kubernetes cluster
- Intel SGX Device Plugin, to use SGX in the K8S cluster (install following the instructions here)
- Java
## Deploy Trusted Realtime ML for Kubernetes
1. Pull the docker image from Docker Hub:

    ```bash
    $ docker pull intelanalytics/bigdl-ppml-trusted-realtime-ml-scala-graphene:2.1.0-SNAPSHOT
    ```
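    Optionally, confirm that the image is now available locally (the repository name is the one used in the pull command above):

    ```bash
    # List the local tags of the pulled image.
    $ docker images intelanalytics/bigdl-ppml-trusted-realtime-ml-scala-graphene
    ```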
2. Pull the source code of BigDL and enter the PPML graphene k8s directory:

    ```bash
    $ git clone https://github.com/intel-analytics/BigDL.git
    $ cd BigDL/ppml/trusted-realtime-ml/scala/docker-graphene/kubernetes
    ```
3. Generate secure keys and passwords, and deploy them as secrets (refer here for details).

    1. Generate keys and passwords.

        Note: Make sure to add `${JAVA_HOME}/bin` to `$PATH` to avoid a `keytool: command not found` error.

        ```bash
        $ sudo ../../../../scripts/generate-keys.sh
        $ openssl genrsa -3 -out enclave-key.pem 3072
        $ ../../../../scripts/generate-password.sh <used_password_when_generate_keys>
        ```
    2. Deploy them as secrets for Kubernetes:

        ```bash
        $ kubectl apply -f keys/keys.yaml
        $ kubectl apply -f password/password.yaml
        ```
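    To verify, you can list the secrets in the cluster (an optional sanity check; the secret names are defined in the two YAML files above):

    ```bash
    # The secrets created from keys/keys.yaml and password/password.yaml
    # should appear in this list.
    $ kubectl get secrets
    ```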
4. In `values.yaml`, configure the pulled image name, the path of `enclave-key.pem` generated in step 3, and the path of the script `start-all-but-flink.sh`.

5. If the kernel version is 5.11+ with built-in SGX support, create soft links for the SGX device:

    ```bash
    $ sudo ln -s /dev/sgx_enclave /dev/sgx/enclave
    $ sudo ln -s /dev/sgx_provision /dev/sgx/provision
    ```
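    If the links were created correctly, both device nodes should now be visible under `/dev/sgx` (an optional check):

    ```bash
    # Both symlinks should resolve to the kernel's built-in SGX device nodes.
    $ ls -l /dev/sgx/enclave /dev/sgx/provision
    ```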
## Configure SGX mode
In `templates/flink-configuration-configmap.yaml`, set `sgx.mode` to `sgx` or `nonsgx` to determine whether the workload runs with SGX.
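For example, the relevant entry might look like the following sketch (the surrounding Flink configuration keys in your file may differ):

```yaml
# Excerpt from templates/flink-configuration-configmap.yaml:
# set to "nonsgx" to run the workload outside SGX.
sgx.mode: sgx
```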
## Configure Resources for Components
1. Configure the jobmanager resource allocation in `templates/jobmanager-deployment.yaml`:

    ```yaml
    ...
    env:
      - name: SGX_MEM_SIZE
        value: "16G"
    ...
    resources:
      requests:
        cpu: 2
        memory: 16Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 16Gi
      limits:
        cpu: 2
        memory: 16Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 16Gi
    ...
    ```
2. Configure the taskmanager resource allocation.

    Memory allocation in `templates/flink-configuration-configmap.yaml`:

    ```yaml
    taskmanager.memory.managed.size: 4gb
    taskmanager.memory.task.heap.size: 5gb
    xmx.size: 5g
    ```
    Pod resource allocation: use `taskmanager-deployment.yaml` instead of `taskmanager-statefulset.yaml` for a functionality test:

    ```bash
    $ mv templates/taskmanager-statefulset.yaml ./
    $ mv taskmanager-deployment.yaml.back templates/taskmanager-deployment.yaml
    ```
    Configure the resources in `templates/taskmanager-deployment.yaml` (this example allocates 16 cores; adjust according to your scenario):

    ```yaml
    ...
    env:
      - name: CORE_NUM
        value: "16"
      - name: SGX_MEM_SIZE
        value: "32G"
    ...
    resources:
      requests:
        cpu: 16
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
      limits:
        cpu: 16
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
    ...
    ```
3. Configure the Redis and client resource allocation.

    SGX memory allocation in `start-all-but-flink.sh`:

    ```bash
    ...
    cd /ppml/trusted-realtime-ml/java
    export SGX_MEM_SIZE=16G
    test "$SGX_MODE" = sgx && ./init.sh
    echo "java initiated"
    ...
    ```
    Pod resource allocation in `templates/master-deployment.yaml`:

    ```yaml
    ...
    env:
      - name: CORE_NUM  # batch size per instance
        value: "16"
    ...
    resources:
      requests:
        cpu: 12
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
      limits:
        cpu: 12
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
    ...
    ```
## Deploy Cluster Serving
1. Deploy all components and start the job.

    1. Download helm from the release page and install it.
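        For example, a minimal sketch assuming Linux amd64 and helm v3.9.0 (substitute the current version and platform from the release page):

        ```bash
        # Download the helm binary and place it on the PATH
        # (version and platform here are assumptions).
        $ wget https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz
        $ tar -zxvf helm-v3.9.0-linux-amd64.tar.gz
        $ sudo mv linux-amd64/helm /usr/local/bin/helm
        $ helm version
        ```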
    2. Deploy cluster serving:

        ```bash
        $ helm install ppml ./
        ```
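    After the chart is installed, you can watch the components start up; the jobmanager, taskmanager, and master pods should eventually reach `Running`:

    ```bash
    # Watch pod status in the current namespace until all pods are Running.
    $ kubectl get pods -w
    ```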
2. Set up port forwarding.

    To access the Flink WebUI on the host, forward the jobmanager REST port. Run `kubectl port-forward <flink-jobmanager-pod> --address 0.0.0.0 8081:8081` to forward the jobmanager's web UI port to 8081 on the host, then navigate to `http://<host-IP>:8081` in a web browser to check the status of the Flink cluster and job.
3. Run the performance benchmark:

    ```bash
    $ kubectl exec <master-deployment-pod> -it -- bash
    $ cd /ppml/trusted-realtime-ml/java/work/benchmark/
    $ bash init-benchmark.sh
    $ python3 e2e_throughput.py -n <image_num> -i ../data/ILSVRC2012_val_00000001.JPEG
    ```
    The `e2e_throughput.py` script pushes the test image `-n` times (default 1000 if not set manually), times the process from pushing the images (enqueue) to retrieving all inference results (dequeue), and computes the Cluster Serving end-to-end throughput. The output should look like:

    ```
    Served xxx images in xxx sec, e2e throughput is xxx images/sec
    ```