Choose the Number of Processes for Multi-Instance Training

BigDL-Nano supports multi-instance training on a server with multiple CPU cores or sockets. With Nano, you can launch a user-defined number of processes to perform data-parallel training. When choosing the number of processes, follow these 3 empirical recommendations for better training performance:

There should be at least 7 CPU cores assigned to each process.
For multiple sockets, the CPU cores assigned to each process should belong to the same socket, to avoid NUMA (non-uniform memory access) overhead from reading memory attached to another socket. That is, the number of CPU cores per process should be a divisor of the number of CPU cores in each socket.
Only physical CPU cores should be considered (do not count the extra logical cores exposed by hyperthreading).
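The three recommendations above can be combined into a quick sanity check for a proposed process count. The sketch below is illustrative only and is not part of the Nano API: the function name `check_num_processes` is hypothetical, and it assumes every socket has the same number of physical cores and that cores are split evenly among processes.

```python
from typing import List


def check_num_processes(num_processes: int, sockets: int, cores_per_socket: int,
                        min_cores: int = 7) -> List[str]:
    """Return the recommendations that a proposed process count violates.

    Hypothetical helper. `cores_per_socket` must count physical cores only
    (no hyperthreads). An empty list means all recommendations are met.
    """
    problems = []
    total_cores = sockets * cores_per_socket
    # Assume cores are split evenly, so the total must divide cleanly.
    if total_cores % num_processes != 0:
        problems.append("cores cannot be split evenly among processes")
        return problems
    c = total_cores // num_processes  # physical cores per process
    # Recommendation 1: enough cores per process.
    if c < min_cores:
        problems.append(f"only {c} cores per process (< {min_cores})")
    # Recommendation 2: a process must not span two sockets.
    if cores_per_socket % c != 0:
        problems.append(
            f"{c} cores per process does not divide {cores_per_socket} cores per socket")
    return problems
```

For instance, `check_num_processes(7, 2, 28)` reports that 8 cores per process does not divide 28, because one process would straddle both sockets.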

Note

By default, Nano will distribute CPU cores evenly among processes.
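To make "evenly" concrete, the hypothetical helper `assign_cores` below splits physical core IDs into equal contiguous chunks, one per process. This is not Nano's actual implementation; it assumes socket `s` owns core IDs `s * cores_per_socket` through `(s + 1) * cores_per_socket - 1`, which is common but not guaranteed (check the topology reported by `lscpu` on your machine).

```python
from typing import List


def assign_cores(num_processes: int, sockets: int, cores_per_socket: int) -> List[List[int]]:
    """Split physical core IDs evenly among processes (illustrative sketch).

    Assumes (hypothetically) contiguous core numbering per socket.
    """
    total_cores = sockets * cores_per_socket
    if total_cores % num_processes != 0:
        raise ValueError("cores cannot be split evenly among processes")
    c = total_cores // num_processes  # cores per process
    # Contiguous chunks of size c; if c divides cores_per_socket,
    # no chunk crosses a socket boundary.
    return [list(range(i * c, (i + 1) * c)) for i in range(num_processes)]
```

The design choice shows why the divisor rule matters: on a 2-socket server with 28 cores per socket, 4 processes get cores 0 to 13, 14 to 27, 28 to 41 and 42 to 55, each on a single socket, while 7 processes get 8 cores each and the chunk 24 to 31 straddles both sockets.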

Here is an example. Suppose we have a server with 2 sockets, each with 28 physical CPU cores. In this case, the number of CPU cores per process, c, should satisfy:

\[\begin{split}\begin{cases} c \text{ is a divisor of } 28 \\ c \ge 7 \\ \end{cases} \Rightarrow c \in \{7, 14, 28\}\end{split}\]

Based on that, the number of processes np can be calculated as:

\[\begin{split}\begin{cases} np = \frac{28+28}{c}\ , c \in \{7, 14, 28\} \\ np > 1 \\ \end{cases} \Rightarrow np = \text{8 or 4 or 2}\end{split}\]

That is, empirically, we can set the number of processes to 2, 4, or 8 here for good training performance.
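The same derivation can be reproduced programmatically. The sketch below (the name `candidate_num_processes` is hypothetical, not a Nano API) enumerates the divisors of the per-socket core count that are at least 7, converts each into a process count, and keeps only counts greater than 1. It assumes identical sockets and counts physical cores only.

```python
from typing import List


def candidate_num_processes(sockets: int, cores_per_socket: int,
                            min_cores: int = 7) -> List[int]:
    """Enumerate process counts that follow the empirical recommendations."""
    total_cores = sockets * cores_per_socket
    # c: physical cores per process; must divide cores_per_socket and be >= min_cores.
    valid_c = [c for c in range(min_cores, cores_per_socket + 1)
               if cores_per_socket % c == 0]
    # np = total_cores / c; keep only np > 1 (multi-instance training).
    return sorted(total_cores // c for c in valid_c if total_cores // c > 1)


print(candidate_num_processes(2, 28))  # [2, 4, 8]
```

With a single 28-core socket, the same rule yields `[2, 4]`, since c = 28 would leave only one process.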