View the runnable example on GitHub

# Apply `SparseAdam` Optimizer for Large Embeddings

Embedding layers are often used to encode categorical items in deep learning applications. However, in applications such as recommendation systems, the embedding may become huge due to the large number of items or users, leading to high computational and memory costs.

For large embeddings, the batch size could be orders of magnitude smaller than the embedding matrix size, so the gradients of the embedding matrix in each batch are typically sparse. Taking advantage of this, BigDL-Nano provides `bigdl.nano.tf.keras.layers.Embedding` and `bigdl.nano.tf.optimizers.SparseAdam` to accelerate training with large embeddings. `bigdl.nano.tf.optimizers.SparseAdam` is a variant of Adam that handles updates of sparse tensors more efficiently. `bigdl.nano.tf.keras.layers.Embedding` avoids applying the regularizer function directly to the embedding matrix, which would otherwise make the sparse gradient dense.
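To see why these gradients are sparse, here is a plain-TensorFlow sketch (not Nano-specific): TensorFlow represents the gradient of an embedding lookup as a `tf.IndexedSlices` object that only covers the rows actually looked up in the batch.

```python
import tensorflow as tf

# Plain-TensorFlow illustration: the gradient of an embedding lookup is
# sparse -- it is a tf.IndexedSlices covering only the looked-up rows.
embedding = tf.Variable(tf.random.normal((20000, 128)))  # large embedding matrix
token_ids = tf.constant([[3, 17, 3, 42]])                # a tiny batch of token ids

with tf.GradientTape() as tape:
    vectors = tf.nn.embedding_lookup(embedding, token_ids)
    loss = tf.reduce_sum(vectors)

grad = tape.gradient(loss, embedding)
print(type(grad).__name__)   # IndexedSlices, not a dense Tensor
print(grad.indices.numpy())  # only the looked-up rows carry gradient
```

A dense update would instead touch all \(20000\) rows on every step, which is what Nano's `Embedding` and `SparseAdam` avoid.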

📝 **Note**

Before starting your TensorFlow Keras application, it is highly recommended to run `source bigdl-nano-init` to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance improvement for most TensorFlow Keras training workloads.

To optimize your model for large embeddings, you need to **import Nano’s `Embedding` and `SparseAdam` first:**


```
import tensorflow as tf

from bigdl.nano.tf.keras.layers import Embedding
from bigdl.nano.tf.optimizers import SparseAdam
# import Model from bigdl.nano.tf.keras instead of tf.keras
from bigdl.nano.tf.keras import Model
```

📝 **Note**

You could import `Model`/`Sequential` from `bigdl.nano.tf.keras` instead of `tf.keras` to gain more optimizations from Nano. Please refer to the API documentation for more information.

Let’s take the imdb_reviews dataset as an example, and suppose we would like to train a model to classify movie reviews as positive/negative. Assuming that the vocabulary size of the reviews is \(20000\) and we fix the word vector length to \(128\), we would have a large embedding matrix of size \(20000 \times 128\).
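A quick back-of-envelope calculation shows how small a batch's footprint is relative to the matrix (the batch size and sequence length below are illustrative assumptions, not values fixed by the example):

```python
# Back-of-envelope sizes for the example embedding (illustrative arithmetic)
vocab_size, embedding_dim = 20000, 128
n_params = vocab_size * embedding_dim
print(n_params)                  # 2,560,000 trainable parameters
print(n_params * 4 / 1e6, "MB")  # ~10.24 MB at float32

# Assume a batch of 32 reviews, each truncated/padded to 200 tokens:
# it touches at most 32 * 200 = 6400 rows (far fewer after deduplication),
# so the per-batch gradient covers only a small slice of the matrix.
max_rows_touched = 32 * 200
print(max_rows_touched / vocab_size)  # at most 32% of rows, before dedup
```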

To prepare the data for training, we need to process the samples as sequences of positive integers:


```
train_ds, val_ds, test_ds = create_datasets()
```

*The definition of* `create_datasets` *can be found in the runnable example.*
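For reference, a minimal sketch of what such a function might look like (the function body, split sizes, and defaults below are assumptions for illustration, not the example's actual code):

```python
import tensorflow as tf

# Hypothetical sketch of create_datasets: load the IMDB reviews already
# encoded as positive-integer word indices, keep the top `vocab_size`
# words, and pad each review to a fixed length.
def create_datasets(vocab_size=20000, maxlen=200, batch_size=32):
    (x_train, y_train), (x_test, y_test) = \
        tf.keras.datasets.imdb.load_data(num_words=vocab_size)
    x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
    x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
    # hold out the last 5000 training samples for validation (arbitrary split)
    train_ds = tf.data.Dataset.from_tensor_slices(
        (x_train[:-5000], y_train[:-5000])).batch(batch_size)
    val_ds = tf.data.Dataset.from_tensor_slices(
        (x_train[-5000:], y_train[-5000:])).batch(batch_size)
    test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)
    return train_ds, val_ds, test_ds
```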

We could then define the model. The same as with `tf.keras.layers.Embedding`, you could **instantiate a Nano `Embedding` layer** as the first layer in the model:


```
inputs = tf.keras.Input(shape=(None,), dtype="int64")
# 20000 is the vocabulary size,
# 128 is the embedding dimension
x = Embedding(input_dim=20000, output_dim=128)(inputs)
```

📝 **Note**

If you would like to apply a regularizer function to the embedding matrix by setting `embeddings_regularizer`, Nano will apply the regularizer to the output tensors of the embedding layer instead, to avoid making the sparse gradient dense (if `activity_regularizer=None`).

Please refer to the API document for more information on `bigdl.nano.tf.keras.layers.Embedding`.

Next, you could define the remaining parts of the model, and **configure the model for training with the `SparseAdam` optimizer**:


```
# define the remaining layers of the model
predictions = make_backbone()(x)
model = Model(inputs, predictions)
# Configure the model with Nano's SparseAdam optimizer
model.compile(loss="binary_crossentropy", optimizer=SparseAdam(), metrics=["accuracy"])
```

*The definition of* `make_backbone` *can be found in the runnable example.*
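For reference, an illustrative `make_backbone` (an assumption for this sketch, not the example's actual code) could pool the embedded sequence and end in a single sigmoid unit for binary classification:

```python
import tensorflow as tf

# Hypothetical backbone: consumes the (batch, seq_len, 128) embedded
# sequence and produces a positive/negative score per review.
def make_backbone():
    return tf.keras.Sequential([
        tf.keras.layers.GlobalAveragePooling1D(),        # (batch, seq, 128) -> (batch, 128)
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # matches binary_crossentropy
    ])
```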

📝 **Note**

`SparseAdam` is a variant of `tf.keras.optimizers.Adam`. It only updates the moments that show up in the gradient, and applies only those portions of the gradient to the trainable variables.

Please refer to the API document for more information on `bigdl.nano.tf.optimizers.SparseAdam`.
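The sparse-update principle can be sketched with a plain scatter update (illustrative only, not bigdl-nano's actual implementation):

```python
import tensorflow as tf

# Illustration of a row-wise sparse update: only the rows that appear in
# the gradient's indices are touched; all other rows stay untouched.
var = tf.Variable(tf.ones((5, 2)))
grad = tf.IndexedSlices(values=tf.fill((2, 2), 0.5),
                        indices=tf.constant([1, 3]),
                        dense_shape=tf.constant([5, 2]))
lr = 0.1
# scatter-subtract: rows 1 and 3 change, rows 0, 2, 4 are left as-is
var.scatter_sub(tf.IndexedSlices(grad.values * lr, grad.indices))
print(var.numpy())  # rows 1 and 3 become 0.95, the rest stay 1.0
```

A dense optimizer step would instead read and write every row of `var` (and its moment buffers), which is exactly the cost `SparseAdam` avoids for large embeddings.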

You could then train and evaluate your model as normal:


```
model.fit(train_ds, validation_data=val_ds, epochs=10)
model.evaluate(test_ds)
```
