BigDL-Nano Features
Feature | Meaning |
---|---|
Intel-openmp | Use the Intel-openmp library to improve the performance of multithreaded programs |
Jemalloc | Use jemalloc as the memory allocator |
Tcmalloc | Use tcmalloc as the memory allocator |
Neural-Compressor | Neural-Compressor int8 quantization |
OpenVINO | OpenVINO fp32/bf16/fp16/int8 acceleration on CPU/GPU/VPU |
ONNXRuntime | ONNXRuntime fp32/int8 acceleration |
CUDA patch | Run CUDA code even without a GPU |
JIT | PyTorch JIT optimization |
Channel last | Channel last memory format |
BF16 | BFloat16 mixed precision training and inference |
IPEX | Intel-extension-for-pytorch optimization |
Multi-instance | Multi-process training and inference |
ray | Use ray as the multi-process backend |
Common Feature Support (can be used in both PyTorch and TensorFlow)
Feature | Ubuntu (20.04/22.04) | CentOS7 | MacOS (Intel chip) | MacOS (M-series chip) | Windows |
---|---|---|---|---|---|
Intel-openmp | ✅ | ✅ | ✅ | ② | ✅ |
Jemalloc | ✅ | ✅ | ✅ | ❌ | ❌ |
Tcmalloc | ✅ | ❌ | ❌ | ❌ | ❌ |
Neural-Compressor | ✅ | ✅ | ❌ | ❌ | ? |
OpenVINO | ✅ | ① | ❌ | ❌ | ④ |
ONNXRuntime | ✅ | ① | ✅ | ❌ | ✅ |
ray | ✅ | ? | ? | ? | ④ |
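
To read these tables for a given machine, you first need to know which platform column applies. The following is a minimal sketch using only Python's standard `platform` module. The function name `table_column` and its `distro_id` parameter are hypothetical helpers for illustration and are not part of BigDL-Nano. `distro_id` is assumed to be the `ID` field from `/etc/os-release`. The sketch does not check OS versions, and a Python interpreter running under Rosetta on an M-series Mac may report an Intel architecture.

```python
import platform


def table_column(system=None, machine=None, distro_id=""):
    """Map a machine to a column header of the support tables.

    Hypothetical helper: arguments default to the running machine, and
    `distro_id` is assumed to be the ID field of /etc/os-release.
    Returns None when no column matches (e.g. an untested Linux distro).
    """
    system = system or platform.system()
    machine = machine or platform.machine()

    if system == "Windows":
        return "Windows"
    if system == "Darwin":
        # Apple-silicon Macs report "arm64"; Intel Macs report "x86_64".
        if machine == "arm64":
            return "MacOS (M-series chip)"
        return "MacOS (Intel chip)"
    if system == "Linux":
        # Version numbers (20.04/22.04, 7) are not verified in this sketch.
        if distro_id == "ubuntu":
            return "Ubuntu (20.04/22.04)"
        if distro_id == "centos":
            return "CentOS7"
    return None
```

Calling `table_column()` with no arguments inspects the current machine. On Linux you would pass the distribution ID yourself, because this sketch does not read `/etc/os-release`.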
PyTorch Feature Support
Feature | Ubuntu (20.04/22.04) | CentOS7 | MacOS (Intel chip) | MacOS (M-series chip) | Windows |
---|---|---|---|---|---|
CUDA patch | ✅ | ✅ | ✅ | ? | ✅ |
JIT | ✅ | ✅ | ✅ | ? | ✅ |
Channel last | ✅ | ✅ | ✅ | ? | ✅ |
BF16 | ✅ | ✅ | ⭕ | ⭕ | ✅ |
IPEX | ✅ | ✅ | ❌ | ❌ | ❌ |
Multi-instance | ✅ | ✅ | ② | ② | ② |
TensorFlow Feature Support
Feature | Ubuntu (20.04/22.04) | CentOS7 | MacOS (Intel chip) | MacOS (M-series chip) | Windows |
---|---|---|---|---|---|
BF16 | ✅ | ✅ | ⭕ | ⭕ | ✅ |
Multi-instance | ③ | ③ | ②③ | ②③ | ❌ |
Symbol Meaning
Symbol | Meaning |
---|---|
✅ | Supported |
❌ | Not supported |
⭕ | No Mac machine (Intel or M-series chip) supports the bf16 instruction set, so this feature offers no benefit |
① | This feature is only supported when used together with jemalloc |
② | This feature is supported, but without any performance guarantee |
③ | Only multi-instance training is supported for now |
④ | This feature is only supported when using PyTorch |
? | Not tested |
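
Some cells combine several symbols (for example `②③`), so each symbol has to be looked up separately. The sketch below shows one way to decode a cell. It transcribes the legend (wording abbreviated) and the two rows of the TensorFlow table into plain Python dictionaries. The names `LEGEND`, `PLATFORMS`, `TENSORFLOW_SUPPORT`, and `explain` are illustrative only and are not part of any BigDL-Nano API.

```python
# Legend transcribed from the "Symbol Meaning" table (wording abbreviated).
LEGEND = {
    "✅": "supported",
    "❌": "not supported",
    "⭕": "no benefit: Macs lack the bf16 instruction set",
    "①": "only supported together with jemalloc",
    "②": "supported, but without any performance guarantee",
    "③": "only multi-instance training is supported for now",
    "④": "only supported when using PyTorch",
    "?": "not tested",
}

# Column order shared by all support tables.
PLATFORMS = [
    "Ubuntu (20.04/22.04)",
    "CentOS7",
    "MacOS (Intel chip)",
    "MacOS (M-series chip)",
    "Windows",
]

# Rows transcribed from the "TensorFlow Feature Support" table.
TENSORFLOW_SUPPORT = {
    "BF16": ["✅", "✅", "⭕", "⭕", "✅"],
    "Multi-instance": ["③", "③", "②③", "②③", "❌"],
}


def explain(feature, platform_name):
    """Return the legend text for every symbol in one table cell."""
    cell = TENSORFLOW_SUPPORT[feature][PLATFORMS.index(platform_name)]
    # A cell may hold more than one symbol, so decode them one by one.
    return [LEGEND[symbol] for symbol in cell]
```

For instance, `explain("Multi-instance", "MacOS (Intel chip)")` returns two entries, one for `②` and one for `③`.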