BigDL-Nano Hyperparameter Tuning (TensorFlow Sequential/Functional API) Quickstart#
In this notebook we demonstrate how to use Nano HPO to tune hyperparameters in TensorFlow training. The model is built using either the TensorFlow Keras Sequential API or the Functional API.
Step 0: Prepare Environment#
You can install the latest pre-release version with Nano support using the commands below.
We recommend running these commands, especially source bigdl-nano-init,
before the Jupyter kernel is started, otherwise some of the optimizations may not take effect.
[ ]:
# Install latest pre-release version of bigdl-nano
!pip install --pre bigdl-nano[tensorflow]
!pip install setuptools==58.0.4
!pip install protobuf==3.20.1
!source bigdl-nano-init
[ ]:
# Install other dependencies for Nano HPO
!pip install ConfigSpace
!pip install "optuna<=3.1.1"
Step 1: Init Nano AutoML#
We need to enable Nano HPO before using it for TensorFlow training.
[ ]:
import bigdl.nano.automl as automl
automl.hpo_config.enable_hpo_tf()
Step 2: Prepare data#
We use MNIST dataset for demonstration.
[2]:
from tensorflow import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
CLASSES = 10
img_x, img_y = x_train.shape[1], x_train.shape[2]
input_shape = (img_x, img_y, 1)
x_train = x_train.reshape(-1, img_x, img_y, 1).astype("float32") / 255
x_test = x_test.reshape(-1, img_x, img_y, 1).astype("float32") / 255
Step 3: Build model and specify search spaces#
We now create our model.
Change the imports from tensorflow.keras to bigdl.nano as shown below, and you will be able to specify search spaces as you define the model. For details on how to specify search spaces, refer to the user doc.
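As a quick illustration, space.Categorical defines a discrete choice among listed options (this is what the model cells below use); space.Int and space.Real are assumed here to be available for integer and continuous ranges, so check the user doc for the exact signatures. A minimal sketch:
[ ]:
import bigdl.nano.automl.hpo.space as space

# a discrete choice among the listed options (used in the model cells below)
filters = space.Categorical(32, 64)
# an integer range (assumed available; see the user doc for the exact signature)
units = space.Int(64, 256)
# a continuous range (assumed available; see the user doc for the exact signature)
lr = space.Real(0.0001, 0.01)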
[ ]:
from bigdl.nano.automl.tf.keras import Sequential
from bigdl.nano.tf.keras.layers import Dense, Flatten, Conv2D
from bigdl.nano.tf.keras import Input
from bigdl.nano.automl.tf.keras import Model
import bigdl.nano.automl.hpo.space as space
The two cells below show how to define the model with search spaces using the Sequential API and the Functional API respectively. You can choose either one to run.
[ ]:
model = Sequential()
model.add(Conv2D(
    filters=space.Categorical(32, 64),
    kernel_size=space.Categorical(3, 5),
    strides=space.Categorical(1, 2),
    activation=space.Categorical("relu", "linear"),
    input_shape=input_shape))
model.add(Flatten())
model.add(Dense(CLASSES, activation="softmax"))
[ ]:
inputs = Input(shape=(28,28,1))
x = Conv2D(
    filters=space.Categorical(32, 64),
    kernel_size=space.Categorical(3, 5),
    strides=space.Categorical(1, 2),
    activation=space.Categorical("relu", "linear"),
    input_shape=input_shape)(inputs)
x = Flatten()(x)
outputs = Dense(CLASSES, activation="softmax")(x)
model = Model(inputs=inputs, outputs=outputs, name="mnist_model")
Step 4: Compile model#
We now compile our model with a loss function, an optimizer and metrics. If you want to tune the learning rate and batch size as well, refer to the user guide (see also the sketch after the next cell).
[5]:
from tensorflow.keras.optimizers import RMSprop
model.compile(
loss="sparse_categorical_crossentropy",
optimizer=RMSprop(learning_rate=0.001),
metrics=["accuracy"]
)
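A minimal sketch of tuning the learning rate and batch size as well, assuming that a search space can also be passed to the optimizer's learning_rate in compile and to batch_size in search; check the user guide for the exact usage:
[ ]:
# Hedged sketch -- verify the exact usage in the BigDL-Nano HPO user guide.
import bigdl.nano.automl.hpo.space as space
from tensorflow.keras.optimizers import RMSprop

model.compile(
    loss="sparse_categorical_crossentropy",
    # learning rate sampled from a continuous range (assumed supported)
    optimizer=RMSprop(learning_rate=space.Real(0.0001, 0.01)),
    metrics=["accuracy"]
)
# batch size sampled from a categorical space (assumed supported),
# passed to model.search along with the other arguments:
# model.search(..., batch_size=space.Categorical(64, 128), ...)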
Step 5: Run hyperparameter search#
Run hyperparameter search by calling model.search. Set n_trials to the number of trials you want to run, and set target_metric and direction so that HPO optimizes the target_metric in the specified direction. Each trial uses a different set of hyperparameters sampled from the search spaces. After the search completes, you can use search_summary to retrieve the search results for analysis. For more details, refer to the user doc.
[ ]:
%%time
from bigdl.nano.automl.hpo.backend import PrunerType
model.search(
n_trials=8,
target_metric='val_accuracy',
direction="maximize",
pruner=PrunerType.HyperBand,
pruner_kwargs={'min_resource':1, 'max_resource':100, 'reduction_factor':3},
x=x_train,
y=y_train,
batch_size=128,
epochs=5,
validation_split=0.2,
verbose=False
)
[7]:
print(model.search_summary())
Number of finished trials: 8
Best trial:
Value: 0.9822499752044678
Params:
activation▁choice: 0
filters▁choice: 0
kernel_size▁choice: 1
strides▁choice: 0
<optuna.study.study.Study object at 0x7f1de549de50>
Step 6: (Optional) Resume search from memory#
You can resume a previous search after it completes by setting resume=True. Refer to the user doc for more details.
[ ]:
%%time
model.search(
n_trials=4,
target_metric='val_accuracy',
direction="maximize",
pruner=PrunerType.HyperBand,
pruner_kwargs={'min_resource':1, 'max_resource':100, 'reduction_factor':3},
x=x_train,
y=y_train,
batch_size=128,
epochs=5,
validation_split=0.2,
verbose=False,
resume=True
)
[9]:
print(model.search_summary())
Number of finished trials: 12
Best trial:
Value: 0.9822499752044678
Params:
activation▁choice: 0
filters▁choice: 0
kernel_size▁choice: 1
strides▁choice: 0
<optuna.study.study.Study object at 0x7f1de549de50>
Step 7: Fit with the best hyperparameters#
After the search, model.fit will automatically use the best hyperparameters found during the search to fit the model.
[10]:
history = model.fit(x_train, y_train,
batch_size=128, epochs=5, validation_split=0.2)
test_scores = model.evaluate(x_test, y_test, verbose=2)
print("Test loss:", test_scores[0])
print("Test accuracy:", test_scores[1])
Epoch 1/5
375/375 [==============================] - 10s 25ms/step - loss: 0.2363 - accuracy: 0.9301 - val_loss: 0.0978 - val_accuracy: 0.9741
Epoch 2/5
375/375 [==============================] - 10s 26ms/step - loss: 0.0786 - accuracy: 0.9768 - val_loss: 0.0781 - val_accuracy: 0.9775
Epoch 3/5
375/375 [==============================] - 9s 25ms/step - loss: 0.0559 - accuracy: 0.9835 - val_loss: 0.0673 - val_accuracy: 0.9811
Epoch 4/5
375/375 [==============================] - 9s 23ms/step - loss: 0.0444 - accuracy: 0.9869 - val_loss: 0.0617 - val_accuracy: 0.9825
Epoch 5/5
375/375 [==============================] - 9s 23ms/step - loss: 0.0372 - accuracy: 0.9891 - val_loss: 0.0617 - val_accuracy: 0.9827
313/313 - 6s - loss: 0.0482 - accuracy: 0.9840 - 6s/epoch - 19ms/step
Test loss: 0.04822046682238579
Test accuracy: 0.984000027179718
Step 8: HPO Result Analysis and Visualization#
Check out the summary of the model. The model has already been built with the best hyperparameters found by Nano HPO. Note that the ▁choice values in the search summary are indices into the corresponding Categorical search spaces; for example, kernel_size▁choice: 1 selects the second option (5).
[11]:
print(model.summary())
study = model.search_summary()
Model: "mnist_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 28, 28, 1)] 0
conv2d_1 (Conv2D) (None, 24, 24, 32) 832
flatten_1 (Flatten) (None, 18432) 0
dense_1 (Dense) (None, 10) 184330
=================================================================
Total params: 185,162
Trainable params: 185,162
Non-trainable params: 0
_________________________________________________________________
None
Number of finished trials: 12
Best trial:
Value: 0.9822499752044678
Params:
activation▁choice: 0
filters▁choice: 0
kernel_size▁choice: 1
strides▁choice: 0
[12]:
study.trials_dataframe(attrs=("number", "value", "params", "state"))
[12]:
|    | number | value    | params_activation▁choice | params_filters▁choice | params_kernel_size▁choice | params_strides▁choice | state    |
|----|--------|----------|--------------------------|-----------------------|---------------------------|-----------------------|----------|
| 0  | 0      | 0.979083 | 0                        | 0                     | 1                         | 1                     | COMPLETE |
| 1  | 1      | 0.919500 | 1                        | 1                     | 0                         | 1                     | PRUNED   |
| 2  | 2      | 0.980750 | 0                        | 1                     | 1                         | 1                     | COMPLETE |
| 3  | 3      | 0.920583 | 1                        | 1                     | 0                         | 0                     | COMPLETE |
| 4  | 4      | 0.917000 | 1                        | 0                     | 1                         | 0                     | PRUNED   |
| 5  | 5      | 0.982250 | 0                        | 0                     | 1                         | 0                     | COMPLETE |
| 6  | 6      | 0.975917 | 0                        | 1                     | 1                         | 1                     | PRUNED   |
| 7  | 7      | 0.921917 | 1                        | 1                     | 0                         | 1                     | COMPLETE |
| 8  | 8      | 0.972500 | 0                        | 0                     | 0                         | 1                     | COMPLETE |
| 9  | 9      | 0.925167 | 1                        | 1                     | 0                         | 1                     | PRUNED   |
| 10 | 10     | 0.979500 | 0                        | 1                     | 0                         | 0                     | PRUNED   |
| 11 | 11     | 0.917583 | 1                        | 0                     | 1                         | 0                     | PRUNED   |
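Since trials_dataframe returns a plain pandas DataFrame, you can analyze it further with standard pandas operations, for example:
[ ]:
# sort trials by validation accuracy to see the best configurations first
df = study.trials_dataframe(attrs=("number", "value", "params", "state"))
print(df.sort_values("value", ascending=False).head())

# keep only the trials that ran to completion (i.e. were not pruned)
print(df[df["state"] == "COMPLETE"])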
[13]:
from bigdl.nano.automl.hpo.visualization import plot_optimization_history
plot_optimization_history(study)
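The object returned by search_summary is a standard Optuna Study (as shown in the printed output above), so Optuna's own visualization utilities can also be applied to it. A minimal sketch (these plots may require plotly and scikit-learn to be installed):
[ ]:
import optuna.visualization as vis

# relative importance of each hyperparameter for the target metric
vis.plot_param_importances(study)

# per-trial intermediate values, useful for inspecting how pruning behaved
vis.plot_intermediate_values(study)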