
BigDL-Nano Hyperparameter Tuning (TensorFlow Sequential/Functional API) Quickstart#

In this notebook we demonstrate how to use Nano HPO to tune hyperparameters in TensorFlow training. The model is built using either the TensorFlow Keras Sequential API or the Functional API.

Step 0: Prepare Environment#

You can install the latest pre-release version of BigDL-Nano with the commands below.

We recommend running the commands below, especially source bigdl-nano-init, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.

[ ]:
# Install latest pre-release version of bigdl-nano
!pip install --pre bigdl-nano[tensorflow]
# Pin dependency versions required by this release
!pip install setuptools==58.0.4
!pip install protobuf==3.20.1
# Source the Nano init script to set environment variables for the optimizations
!source bigdl-nano-init
[ ]:
# Install other dependencies for Nano HPO
!pip install ConfigSpace
!pip install "optuna<=3.1.1"  # quoted so the shell does not treat <= as a redirection

Step 1: Init Nano AutoML#

We need to enable Nano HPO before we use it for TensorFlow training.

[ ]:
import bigdl.nano.automl as automl
automl.hpo_config.enable_hpo_tf()

Step 2: Prepare data#

We use the MNIST dataset for demonstration.

[2]:
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

CLASSES = 10
img_x, img_y = x_train.shape[1], x_train.shape[2]
input_shape = (img_x, img_y, 1)
# Add a channel dimension and scale pixel values to [0, 1]
x_train = x_train.reshape(-1, img_x, img_y, 1).astype("float32") / 255
x_test = x_test.reshape(-1, img_x, img_y, 1).astype("float32") / 255
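As a quick sanity check (plain NumPy, nothing Nano-specific), the prepared arrays should have the following shapes.

[ ]:
# Verify the prepared arrays: 60,000 training and 10,000 test images of 28x28x1
print(x_train.shape, y_train.shape)  # (60000, 28, 28, 1) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28, 1) (10000,)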

Step 3: Build model and specify search spaces#

We now create our model.

Change the imports from tensorflow.keras to bigdl.nano as below, and you will be able to specify search spaces as you define the model. For how to specify a search space, refer to the user doc.

[ ]:
from bigdl.nano.automl.tf.keras import Sequential
from bigdl.nano.tf.keras.layers import Dense, Flatten, Conv2D
from bigdl.nano.tf.keras import Input
from bigdl.nano.automl.tf.keras import Model
import bigdl.nano.automl.hpo.space as space

The two cells below show how to define the model with search spaces using the Sequential API and the Functional API respectively. You only need to run one of them.

[ ]:
model = Sequential()
model.add(Conv2D(
    filters=space.Categorical(32, 64),
    kernel_size=space.Categorical(3, 5),
    strides=space.Categorical(1, 2),
    activation=space.Categorical("relu", "linear"),
    input_shape=input_shape))
model.add(Flatten())
model.add(Dense(CLASSES, activation="softmax"))
[ ]:
inputs = Input(shape=(28,28,1))
x = Conv2D(
    filters=space.Categorical(32, 64),
    kernel_size=space.Categorical(3, 5),
    strides=space.Categorical(1, 2),
    activation=space.Categorical("relu", "linear"),
    input_shape=input_shape)(inputs)
x = Flatten()(x)
outputs = Dense(CLASSES, activation="softmax")(x)
model = Model(inputs=inputs, outputs=outputs, name="mnist_model")
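space.Categorical is used above to enumerate discrete choices. The space module also provides ranged spaces; the sketch below assumes space.Real and space.Int follow the pattern described in the user doc, so verify the exact signatures there before relying on them.

[ ]:
import bigdl.nano.automl.hpo.space as space

# Discrete choices, as used in the model definitions above
filters = space.Categorical(32, 64)
# Assumed ranged spaces (check the user doc for exact signatures):
dropout_rate = space.Real(0.1, 0.5)   # float sampled from a range
dense_units = space.Int(64, 256)      # integer sampled from a range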

Step 4: Compile model#

We now compile our model with a loss function, an optimizer and metrics. If you want to tune the learning rate and batch size as well, refer to the user guide; a hedged sketch of that idea follows the compile cell below.

[5]:
from tensorflow.keras.optimizers import RMSprop
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=RMSprop(learning_rate=0.001),
    metrics=["accuracy"]
)
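The user guide also describes tuning the learning rate and batch size. The cell below is only a hypothetical illustration of that idea; it assumes the optimizer's learning_rate and search()'s batch_size accept space objects, so follow the user guide for the supported form.

[ ]:
# Hypothetical sketch: search spaces for learning rate and batch size.
# Assumes learning_rate and batch_size accept space objects as described in the user guide.
import bigdl.nano.automl.hpo.space as space
from tensorflow.keras.optimizers import RMSprop

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=RMSprop(learning_rate=space.Real(1e-4, 1e-2)),  # assumed form
    metrics=["accuracy"]
)
# batch_size=space.Categorical(64, 128) would then be passed to model.search(...)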

Step 6: (Optional) Resume training from memory#

After a search has completed, you can resume it by setting resume=True in a later model.search() call. Refer to the user doc for more details. A minimal first search is sketched below; the cell after it resumes the search with a HyperBand pruner.
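The sketch below shows what such an initial search might look like (n_trials is arbitrary here); it uses the same model.search() arguments as the resumed call that follows.

[ ]:
# Sketch of a minimal first search; the next cell resumes it with a pruner.
model.search(
    n_trials=8,
    target_metric='val_accuracy',
    direction="maximize",
    x=x_train,
    y=y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=False
)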

[ ]:
%%time
# PrunerType is assumed to be importable from bigdl.nano.automl.hpo.backend (see the user doc)
from bigdl.nano.automl.hpo.backend import PrunerType

model.search(
    n_trials=4,
    target_metric='val_accuracy',
    direction="maximize",
    pruner=PrunerType.HyperBand,
    pruner_kwargs={'min_resource': 1, 'max_resource': 100, 'reduction_factor': 3},
    x=x_train,
    y=y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.2,
    verbose=False,
    resume=True
)
[9]:
print(model.search_summary())
Number of finished trials: 12
Best trial:
  Value: 0.9822499752044678
  Params:
    activation▁choice: 0
    filters▁choice: 0
    kernel_size▁choice: 1
    strides▁choice: 0
<optuna.study.study.Study object at 0x7f1de549de50>
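Since search_summary() returns a plain Optuna Study (as the printed object above shows), the best result can also be read programmatically via standard Optuna attributes.

[ ]:
# study is the optuna Study returned by model.search_summary()
study = model.search_summary()
print(study.best_value)          # best val_accuracy found during the search
print(study.best_trial.params)   # encoded choice index for each search space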

Step 7: Fit with the best hyperparameters#

After the search, model.fit will automatically use the best hyperparameters found during the search to fit the model.

[10]:
history = model.fit(x_train, y_train,
                    batch_size=128, epochs=5, validation_split=0.2)

test_scores = model.evaluate(x_test, y_test, verbose=2)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Epoch 1/5
375/375 [==============================] - 10s 25ms/step - loss: 0.2363 - accuracy: 0.9301 - val_loss: 0.0978 - val_accuracy: 0.9741
Epoch 2/5
375/375 [==============================] - 10s 26ms/step - loss: 0.0786 - accuracy: 0.9768 - val_loss: 0.0781 - val_accuracy: 0.9775
Epoch 3/5
375/375 [==============================] - 9s 25ms/step - loss: 0.0559 - accuracy: 0.9835 - val_loss: 0.0673 - val_accuracy: 0.9811
Epoch 4/5
375/375 [==============================] - 9s 23ms/step - loss: 0.0444 - accuracy: 0.9869 - val_loss: 0.0617 - val_accuracy: 0.9825
Epoch 5/5
375/375 [==============================] - 9s 23ms/step - loss: 0.0372 - accuracy: 0.9891 - val_loss: 0.0617 - val_accuracy: 0.9827
313/313 - 6s - loss: 0.0482 - accuracy: 0.9840 - 6s/epoch - 19ms/step
Test loss: 0.04822046682238579
Test accuracy: 0.984000027179718

Step 8: HPO Result Analysis and Visualization#

Check out the summary of the model. The model has already been built with the best hyperparameters found by Nano HPO.

[11]:
print(model.summary())
study = model.search_summary()
Model: "mnist_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0

 conv2d_1 (Conv2D)           (None, 24, 24, 32)        832

 flatten_1 (Flatten)         (None, 18432)             0

 dense_1 (Dense)             (None, 10)                184330

=================================================================
Total params: 185,162
Trainable params: 185,162
Non-trainable params: 0
_________________________________________________________________
None
Number of finished trials: 12
Best trial:
  Value: 0.9822499752044678
  Params:
    activation▁choice: 0
    filters▁choice: 0
    kernel_size▁choice: 1
    strides▁choice: 0
[12]:
study.trials_dataframe(attrs=("number", "value", "params", "state"))
[12]:
number value params_activation▁choice params_filters▁choice params_kernel_size▁choice params_strides▁choice state
0 0 0.979083 0 0 1 1 COMPLETE
1 1 0.919500 1 1 0 1 PRUNED
2 2 0.980750 0 1 1 1 COMPLETE
3 3 0.920583 1 1 0 0 COMPLETE
4 4 0.917000 1 0 1 0 PRUNED
5 5 0.982250 0 0 1 0 COMPLETE
6 6 0.975917 0 1 1 1 PRUNED
7 7 0.921917 1 1 0 1 COMPLETE
8 8 0.972500 0 0 0 1 COMPLETE
9 9 0.925167 1 1 0 1 PRUNED
10 10 0.979500 0 1 0 0 PRUNED
11 11 0.917583 1 0 1 0 PRUNED
[13]:
from bigdl.nano.automl.hpo.visualization import plot_optimization_history
plot_optimization_history(study)
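Because study is a standard Optuna Study, Optuna's own visualization utilities can also be applied to it directly; the cell below uses two standard optuna.visualization functions (they require plotly to render).

[ ]:
import optuna.visualization as vis

# Relative importance of each tuned hyperparameter
vis.plot_param_importances(study)
# Per-epoch metric values reported to the pruner during each trial
vis.plot_intermediate_values(study)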