BigDL-Nano TensorFlow SparseEmbedding and SparseAdam#
In this guide we demonstrate how to use SparseEmbedding and SparseAdam to obtain stronger performance with sparse gradients.
Step 0: Prepare Environment#
We recommend using conda to prepare the environment. Please refer to the install guide for more details.
conda create -n py37 python==3.7.10 setuptools==58.0.4
conda activate py37
# nightly built version
pip install --pre --upgrade bigdl-nano[tensorflow]
# set env variables for your conda environment
source bigdl-nano-init
pip install tensorflow-datasets
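To quickly verify the installation (an optional check, not part of the original setup), make sure the classes used later in this guide can be imported:
python -c "from bigdl.nano.tf.keras import Model, Sequential"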
Step 1: Import BigDL-Nano#
The optimizations in BigDL-Nano are delivered through BigDL-Nano’s Model
and Sequential
classes. For most cases, you can just replace your tf.keras.Model
with bigdl.nano.tf.keras.Model
and tf.keras.Sequential
with bigdl.nano.tf.keras.Sequential
to benefit from BigDL-Nano.
from bigdl.nano.tf.keras import Model, Sequential
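As a quick illustration (a minimal sketch, not part of the original guide), these classes are drop-in replacements, so ordinary Keras code keeps working unchanged:
import tensorflow as tf
from bigdl.nano.tf.keras import Sequential

# Build a toy model exactly as you would with tf.keras.Sequential;
# only the import changes.
toy_model = Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
toy_model.compile(optimizer="adam", loss="mse")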
Step 2: Load the data#
We demonstrate with imdb_reviews, a large movie review dataset.
import tensorflow_datasets as tfds
(raw_train_ds, raw_val_ds, raw_test_ds), info = tfds.load(
    "imdb_reviews",
    split=['train[:80%]', 'train[80%:]', 'test'],
    as_supervised=True,
    batch_size=32,
    shuffle_files=False,
    with_info=True
)
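If you want to sanity-check what was loaded (an optional inspection step, not in the original guide), you can peek at one batch of raw reviews and labels:
# Each element is a batch of 32 (text, label) pairs because of
# batch_size=32 and as_supervised=True above.
for text_batch, label_batch in raw_train_ds.take(1):
    print(text_batch.shape, label_batch.shape)
    print(text_batch[0].numpy()[:100])
    print(label_batch[0].numpy())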
Step 3: Prepare the Data#
In particular, we remove the HTML line-break tags (<br />).
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization
import string
import re
def custom_standardization(input_data):
    lowercase = tf.strings.lower(input_data)
    stripped_html = tf.strings.regex_replace(lowercase, "<br />", " ")
    return tf.strings.regex_replace(
        stripped_html, f"[{re.escape(string.punctuation)}]", ""
    )
max_features = 20000
embedding_dim = 128
sequence_length = 500
vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=max_features,
    output_mode="int",
    output_sequence_length=sequence_length,
)
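To see what the standardizer does (an optional check, not in the original guide), run it on a sample string:
# Lowercases, strips "<br />" tags, and removes punctuation.
print(custom_standardization(tf.constant("Great movie!<br />Loved it.")))
# expected: b'great movie loved it'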
# Let's make a text-only dataset (no labels):
text_ds = raw_train_ds.map(lambda x, y: x)
# Let's call `adapt`:
vectorize_layer.adapt(text_ds)
def vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return vectorize_layer(text), label
# Vectorize the data.
train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)
test_ds = raw_test_ds.map(vectorize_text)
# Do async prefetching / buffering of the data for best performance on GPU.
train_ds = train_ds.cache().prefetch(buffer_size=10)
val_ds = val_ds.cache().prefetch(buffer_size=10)
test_ds = test_ds.cache().prefetch(buffer_size=10)
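As an optional check (not in the original guide), each vectorized batch now has shape (32, 500), i.e. 32 reviews padded or truncated to 500 token ids:
for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape, x_batch.dtype)  # (32, 500) int64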
Step 4: Build Model#
bigdl.nano.tf.keras.layers.Embedding
is a slightly modified version of the tf.keras.layers.Embedding
layer: it applies the regularizer only to the output of the embedding layer, so that the gradient with respect to the embedding table stays sparse. bigdl.nano.tf.optimizers.SparseAdam
is a variant of the Adam
optimizer that handles sparse updates more efficiently.
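To see the sparsity concretely (a minimal sketch, not from the original guide): TensorFlow already returns the gradient of an embedding lookup as a tf.IndexedSlices covering only the looked-up rows, while regularizing the whole embedding table would densify it; that is why the output is regularized instead.
# With a plain embedding lookup, the gradient w.r.t. the embedding
# table is a tf.IndexedSlices (only the rows that were looked up),
# not a dense tensor over all 20000 rows.
emb = tf.keras.layers.Embedding(max_features, embedding_dim)
ids = tf.constant([[1, 2, 3]])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(emb(ids))
grad = tape.gradient(loss, emb.embeddings)
print(type(grad))  # tf.IndexedSlices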
Here we create a model using this Embedding layer and the SparseAdam optimizer; for comparison, a baseline using the stock Keras Embedding layer and Adam optimizer is sketched after the code below.
from tensorflow.keras import layers
from bigdl.nano.tf.keras.layers import Embedding
from bigdl.nano.tf.optimizers import SparseAdam
inputs = tf.keras.Input(shape=(None,), dtype="int64")
# The Embedding layer can only be used as the first layer in a model;
# here it consumes the int64 token ids produced by the Input layer.
x = Embedding(max_features, embedding_dim)(inputs)
x = layers.Dropout(0.5)(x)
# Conv1D + global max pooling
x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
x = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(x)
x = layers.GlobalMaxPooling1D()(x)
# We add a vanilla hidden layer:
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
# We project onto a single unit output layer, and squash it with a sigmoid:
predictions = layers.Dense(1, activation="sigmoid", name="predictions")(x)
model = Model(inputs, predictions)
# Compile the model with binary crossentropy loss and a SparseAdam optimizer.
model.compile(loss="binary_crossentropy", optimizer=SparseAdam(), metrics=["accuracy"])
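For the comparison mentioned above, a baseline counterpart could look like this (a sketch for timing comparisons, assuming the same data pipeline; it is not part of the model compiled above):
# Same architecture, but with the stock Keras Embedding layer and
# Adam optimizer, as a dense-gradient baseline.
b_inputs = tf.keras.Input(shape=(None,), dtype="int64")
b = tf.keras.layers.Embedding(max_features, embedding_dim)(b_inputs)
b = layers.Dropout(0.5)(b)
b = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(b)
b = layers.Conv1D(128, 7, padding="valid", activation="relu", strides=3)(b)
b = layers.GlobalMaxPooling1D()(b)
b = layers.Dense(128, activation="relu")(b)
b = layers.Dropout(0.5)(b)
b_predictions = layers.Dense(1, activation="sigmoid")(b)
baseline_model = tf.keras.Model(b_inputs, b_predictions)
baseline_model.compile(loss="binary_crossentropy",
                       optimizer=tf.keras.optimizers.Adam(),
                       metrics=["accuracy"])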
Step 5: Training#
# Fit the model using the train and val datasets.
model.fit(train_ds, validation_data=val_ds, epochs=3)
model.evaluate(test_ds)
You can find the detailed training results here