View the runnable example on GitHub
Accelerate TensorFlow Keras Training using Multiple Instances
BigDL-Nano provides bigdl.nano.tf.keras.Model and bigdl.nano.tf.keras.Sequential, which extend tf.keras.Model and tf.keras.Sequential respectively with various optimizations. To use multi-instance training on a server with multiple CPU cores or sockets, replace tf.keras.Model/Sequential in your code with bigdl.nano.tf.keras.Model/Sequential, and call fit with num_processes specified.
Before starting your TensorFlow Keras application, it is highly recommended to run source bigdl-nano-init to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance increase for most TensorFlow Keras training workloads.
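To confirm that the initialization script took effect in the current shell, the environment can be inspected from Python before training starts. The sketch below is only illustrative: the variable names are common CPU threading and allocator settings chosen here as an assumption, not a confirmed list of what bigdl-nano-init exports, and report_env is a hypothetical helper.

```python
import os

# Assumed examples of CPU tuning variables; the exact set exported by
# bigdl-nano-init may differ, so adjust this list to match the script's output.
CANDIDATE_VARS = ["OMP_NUM_THREADS", "KMP_AFFINITY", "KMP_BLOCKTIME", "LD_PRELOAD"]


def report_env(names, environ=None):
    """Return a dict mapping each name to its value, or None if it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in names}


if __name__ == "__main__":
    for name, value in report_env(CANDIDATE_VARS).items():
        print(f"{name} = {value if value is not None else '<not set>'}")
```

If every entry prints as not set, the script was probably run in a different shell session than the one launching the training job.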
To enable multi-instance training, import Model or Sequential from bigdl.nano.tf.keras instead of tf.keras. Let’s take the Model class here as an example:
    # from tensorflow.keras import Model
    from bigdl.nano.tf.keras import Model
Suppose we would like to train a ResNet50 model (pretrained on the ImageNet dataset) on the imagenette dataset. We first need to create the corresponding train/test datasets and define the model:
    # create train/test datasets
    train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)

    # Model creation steps are the same as using tf.keras.Model
    inputs, outputs = define_model_inputs_outputs(
        num_classes=ds_info.features['label'].num_classes,
        img_size=224)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=['accuracy'])
The definitions of create_datasets and define_model_inputs_outputs can be found in the runnable example.
You can then call the fit method with num_processes set to an integer larger than 1 to launch that number of processes for data-parallel training:
    model.fit(train_ds,
              epochs=10,
              steps_per_epoch=(ds_info.splits['train'].num_examples // 32),
              num_processes=2)
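The steps_per_epoch argument in the call above is the number of training examples divided by the dataset batch size, using floor division. A minimal sketch of that arithmetic follows; the function name and the example count of 1000 are hypothetical and not taken from the imagenette dataset.

```python
def steps_per_epoch(num_examples, batch_size):
    """Number of full batches in one pass over the data (floor division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return num_examples // batch_size


if __name__ == "__main__":
    # Hypothetical dataset of 1000 examples with the batch size of 32 used above.
    print(steps_per_epoch(1000, 32))  # 31 full batches; the last 8 examples form no full batch
```

Because the division rounds down, a trailing partial batch is not counted as a step.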
When num_processes is set, CPU cores are automatically and evenly distributed among the processes to avoid conflicts and maximize training throughput.
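To make the idea of an even core distribution concrete, here is a minimal sketch that partitions a list of core IDs into contiguous groups, one per process. It only illustrates the concept under simple assumptions; split_cores is a hypothetical helper and is not how Nano assigns cores internally.

```python
def split_cores(core_ids, num_processes):
    """Split core_ids into num_processes contiguous, near-equal groups."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if num_processes > len(core_ids):
        raise ValueError("more processes than available cores")
    base, extra = divmod(len(core_ids), num_processes)
    groups, start = [], 0
    for rank in range(num_processes):
        # The first `extra` processes take one additional core each.
        size = base + (1 if rank < extra else 0)
        groups.append(core_ids[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # Hypothetical 8-core machine shared by 2 training processes.
    for rank, cores in enumerate(split_cores(list(range(8)), 2)):
        print(f"process {rank}: cores {cores}")
```

Giving each process a disjoint set of cores is what prevents the processes from competing for the same cores.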
During Nano TensorFlow Keras multi-instance training, the effective batch size is still the batch_size specified in the datasets (32 in this example). This matches the semantics of TensorFlow distributed training (MultiWorkerMirroredStrategy), which splits each batch into multiple sub-batches for the different workers.
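The batch-size semantics described above can be sketched as simple arithmetic. The helper names below are hypothetical, and the sketch assumes the dataset batch size divides evenly among the processes.

```python
def per_process_batch(dataset_batch_size, num_processes):
    """Sub-batch size each process handles when the dataset batch is split evenly."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if dataset_batch_size % num_processes != 0:
        # This sketch only covers the evenly divisible case.
        raise ValueError("dataset_batch_size must be divisible by num_processes")
    return dataset_batch_size // num_processes


def dataset_batch_for(target_per_process_batch, num_processes):
    """Dataset batch size needed so each process sees the target sub-batch size."""
    return target_per_process_batch * num_processes


if __name__ == "__main__":
    print(per_process_batch(32, 2))  # 16 samples per process per step
    print(dataset_batch_for(32, 2))  # 64 keeps 32 samples per process
```

Under this reading, keeping 32 samples per process with num_processes=2 would mean creating the datasets with a batch size of 64.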