# BigDL-Nano Features

| Feature               | Meaning                                                                        |
| --------------------- | ------------------------------------------------------------------------------ |
| **Intel-openmp**      | Use the Intel OpenMP library to improve the performance of multithreaded programs |
| **Jemalloc**          | Use jemalloc as the memory allocator                                           |
| **Tcmalloc**          | Use tcmalloc as the memory allocator                                           |
| **Neural-Compressor** | Intel Neural Compressor int8 quantization                                      |
| **OpenVINO**          | OpenVINO fp32/bf16/fp16/int8 acceleration on CPU/GPU/VPU                       |
| **ONNXRuntime**       | ONNX Runtime fp32/int8 acceleration                                            |
| **CUDA patch**        | Run CUDA code even without a GPU                                               |
| **JIT**               | PyTorch JIT optimization                                                       |
| **Channel last**      | Channels-last memory format                                                    |
| **BF16**              | BFloat16 mixed-precision training and inference                                |
| **IPEX**              | Intel Extension for PyTorch (IPEX) optimization                                |
| **Multi-instance**    | Multi-process training and inference                                           |
| **ray**               | Use Ray as the multi-process backend                                           |

## Common Feature Support (available in both PyTorch and TensorFlow)

| Feature               | Ubuntu (20.04/22.04) | CentOS 7 | macOS (Intel chip) | macOS (M-series chip) | Windows |
| --------------------- | -------------------- | -------- | ------------------ | --------------------- | ------- |
| **Intel-openmp**      | ✅                   | ✅       | ✅                 | ②                    | ✅      |
| **Jemalloc**          | ✅                   | ✅       | ✅                 | ❌                    | ❌      |
| **Tcmalloc**          | ✅                   | ❌       | ❌                 | ❌                    | ❌      |
| **Neural-Compressor** | ✅                   | ✅       | ❌                 | ❌                    | ?       |
| **OpenVINO**          | ✅                   | ①       | ❌                 | ❌                    | ④      |
| **ONNXRuntime**       | ✅                   | ①       | ✅                 | ❌                    | ✅      |
| **ray**               | ✅                   | ?        | ?                  | ?                     | ④      |

## PyTorch Feature Support

| Feature            | Ubuntu (20.04/22.04) | CentOS 7 | macOS (Intel chip) | macOS (M-series chip) | Windows |
| ------------------ | -------------------- | -------- | ------------------ | --------------------- | ------- |
| **CUDA patch**     | ✅                   | ✅       | ✅                 | ?                     | ✅      |
| **JIT**            | ✅                   | ✅       | ✅                 | ?                     | ✅      |
| **Channel last**   | ✅                   | ✅       | ✅                 | ?                     | ✅      |
| **BF16**           | ✅                   | ✅       | ⭕                 | ⭕                    | ✅      |
| **IPEX**           | ✅                   | ✅       | ❌                 | ❌                    | ❌      |
| **Multi-instance** | ✅                   | ✅       | ②                 | ②                    | ②      |

## TensorFlow Feature Support

| Feature            | Ubuntu (20.04/22.04) | CentOS 7 | macOS (Intel chip) | macOS (M-series chip) | Windows |
| ------------------ | -------------------- | -------- | ------------------ | --------------------- | ------- |
| **BF16**           | ✅                   | ✅       | ⭕                 | ⭕                    | ✅      |
| **Multi-instance** | ③                   | ③       | ②③                | ②③                   | ❌      |

## Symbol Meaning

| Symbol | Meaning                                                                                                  |
| ------ | -------------------------------------------------------------------------------------------------------- |
| ✅     | Supported                                                                                                 |
| ❌     | Not supported                                                                                             |
| ⭕     | Mac machines (both Intel and M-series chips) lack the bf16 instruction set, so this feature does not apply |
| ①     | This feature is only supported when used together with jemalloc                                           |
| ②     | This feature is supported, but without any performance guarantee                                          |
| ③     | Only multi-instance training is supported for now                                                         |
| ④     | This feature is only supported when using PyTorch                                                         |
| ?      | Not tested                                                                                                |
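
## Quick Example (PyTorch, illustrative)

For orientation, the sketch below shows how a few of the features above (IPEX, BF16, multi-instance training, and ONNX Runtime inference acceleration) are typically switched on through the `bigdl.nano.pytorch` `Trainer` and `InferenceOptimizer` entry points. It is a minimal sketch, not a definitive recipe: the toy model and data are placeholders, and exact parameter names may differ between BigDL-Nano versions.

```python
# Minimal sketch assuming the bigdl.nano.pytorch Trainer / InferenceOptimizer
# entry points; parameter names may vary slightly across BigDL-Nano versions.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from bigdl.nano.pytorch import Trainer, InferenceOptimizer

# Toy model and data, only to keep the example self-contained.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,))),
    batch_size=32,
)

# Training: IPEX + BF16 mixed precision + multi-process ("Multi-instance").
# Trainer.compile wraps a plain nn.Module with a loss and optimizer so it can
# be trained by the (pytorch_lightning-compatible) Trainer.
lit_model = Trainer.compile(
    model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters()),
)
trainer = Trainer(
    max_epochs=1,
    use_ipex=True,      # IPEX optimization
    precision="bf16",   # BF16 mixed precision
    num_processes=2,    # multi-instance training
)
trainer.fit(lit_model, loader)

# Inference: trace to an accelerated backend. "onnxruntime" is used here;
# "openvino" works the same way on platforms where OpenVINO is supported.
ort_model = InferenceOptimizer.trace(
    model,
    accelerator="onnxruntime",
    input_sample=torch.randn(1, 32),
)
pred = ort_model(torch.randn(1, 32))
```

Which of these options actually take effect on a given machine is governed by the support matrices above; for example, `use_ipex=True` is a no-op on macOS and Windows, and `precision="bf16"` is only meaningful on CPUs with the bf16 instruction set.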