View the runnable example on GitHub
Accelerate TensorFlow Keras Training using Multiple Instances
BigDL-Nano provides `bigdl.nano.tf.keras.Model` and `bigdl.nano.tf.keras.Sequential`, which extend `tf.keras.Model` and `tf.keras.Sequential` respectively with various optimizations. To use multi-instance training on a server with multiple CPU cores or sockets, you just replace `tf.keras.Model`/`Sequential` in your code with `bigdl.nano.tf.keras.Model`/`Sequential`, and call `fit` with a specified `num_processes`.
📝 Note

Before starting your TensorFlow Keras application, it is highly recommended to run `source bigdl-nano-init` to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance improvement for most TensorFlow Keras applications on training workloads.
First, import `Model` or `Sequential` from `bigdl.nano.tf.keras` instead of `tf.keras`. Let's take the `Model` class here as an example:
[ ]:
# from tensorflow.keras import Model
from bigdl.nano.tf.keras import Model
Suppose we would like to train a ResNet50 model (pretrained on the ImageNet dataset) on the imagenette dataset. We need to create the corresponding train/test datasets and define the model:
[ ]:
# create train/test datasets
train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)
# Model creation steps are the same as using tf.keras.Model
inputs, outputs = define_model_inputs_outputs(num_classes=ds_info.features['label'].num_classes,
                                               img_size=224)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
The definitions of `create_datasets` and `define_model_inputs_outputs` can be found in the runnable example.
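For reference, below is a minimal sketch of what these two helpers might look like. It assumes the imagenette dataset is loaded through `tensorflow_datasets` and a ResNet50 backbone pretrained on ImageNet is used; the actual implementation in the runnable example may differ:

```python
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.applications import ResNet50

def create_datasets(img_size, batch_size):
    # Load the imagenette dataset and return batched train/test splits plus dataset info
    (train_ds, test_ds), ds_info = tfds.load('imagenette/320px-v2',
                                             split=['train', 'validation'],
                                             with_info=True,
                                             as_supervised=True)
    num_classes = ds_info.features['label'].num_classes

    def preprocess(img, label):
        # Resize to the target size and one-hot encode the label for categorical_crossentropy
        img = tf.image.resize(img, (img_size, img_size))
        img = tf.keras.applications.resnet50.preprocess_input(img)
        return img, tf.one_hot(label, num_classes)

    train_ds = train_ds.map(preprocess).batch(batch_size)
    test_ds = test_ds.map(preprocess).batch(batch_size)
    return train_ds, test_ds, ds_info

def define_model_inputs_outputs(num_classes, img_size):
    # Attach a new classification head to a ResNet50 backbone pretrained on ImageNet
    backbone = ResNet50(weights='imagenet', include_top=False,
                        input_shape=(img_size, img_size, 3))
    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = backbone(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    return inputs, outputs
```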
You could then call the `fit` method with `num_processes` set to an integer larger than 1 to launch the specified number of processes for data-parallel training:
[ ]:
model.fit(train_ds,
          epochs=10,
          steps_per_epoch=(ds_info.splits['train'].num_examples // 32),
          num_processes=2)
📝 Note

By setting `num_processes`, CPU cores will be automatically and evenly distributed among processes to avoid conflicts and maximize training throughput.

During Nano TensorFlow Keras multi-instance training, the effective batch size is still the `batch_size` specified in the datasets (32 in this example). This matches the semantics of TensorFlow distributed training (`MultiWorkerMirroredStrategy`), which splits each batch into multiple sub-batches for different workers.
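For example, with `batch_size=32` and `num_processes=2`, each process handles a sub-batch of 16 samples per step, so every optimizer update still reflects 32 samples in total.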