The BigDL Project#


BigDL-LLM#

bigdl-llm is a library for running LLMs (large language models) on Intel XPU (from laptop to GPU to cloud) using INT4/FP4/INT8/FP8 with very low latency [1] (for any PyTorch model).
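For example, the low-bit optimization can be applied to a generic PyTorch model in a single call. The sketch below assumes the optimize_model helper exported by the bigdl.llm package; check your installed version for the exact name.

#a minimal sketch: apply low-bit (INT4 by default) optimization to any PyTorch model
from transformers import AutoModelForCausalLM
from bigdl.llm import optimize_model

model = AutoModelForCausalLM.from_pretrained('/path/to/model/')  # load as usual
model = optimize_model(model)  # returns the model with low-bit optimizations applied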

Note

It is built on top of the excellent work of llama.cpp, gptq, bitsandbytes, qlora, etc.

Latest update 🔥#

  • [2024/03] LangChain added support for bigdl-llm; see the details here.

  • [2024/02] bigdl-llm now supports directly loading models from ModelScope (魔搭); see the loading sketch after this list.

  • [2024/02] bigdl-llm added initial INT2 support (based on the llama.cpp IQ2 mechanism), which makes it possible to run large LLMs (e.g., Mixtral-8x7B) on Intel GPUs with 16GB VRAM.

  • [2024/02] Users can now use bigdl-llm through the Text-Generation-WebUI GUI.

  • [2024/02] bigdl-llm now supports Self-Speculative Decoding, which in practice reduces inference latency by ~30% for FP16 on Intel GPU and BF16 on Intel CPU.

  • [2024/02] bigdl-llm now supports a comprehensive set of LLM finetuning techniques on Intel GPU (including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA).

  • [2024/01] Using bigdl-llm QLoRA, we managed to finetune LLaMA2-7B in 21 minutes and LLaMA2-70B in 3.14 hours on 8 Intel Max 1550 GPUs for Stanford-Alpaca (see the blog here).

  • [2024/01] 🔔🔔🔔 The default bigdl-llm GPU Linux installation has switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. (See the GPU installation guide for more details.)

  • [2023/12] bigdl-llm now supports ReLoRA (see “ReLoRA: High-Rank Training Through Low-Rank Updates”).

  • [2023/12] bigdl-llm now supports Mixtral-8x7B on both Intel GPU and CPU.

  • [2023/12] bigdl-llm now supports QA-LoRA (see “QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models”).

  • [2023/12] bigdl-llm now supports FP8 and FP4 inference on Intel GPU.

  • [2023/11] Initial support for directly loading GGUF, AWQ and GPTQ models into bigdl-llm is available.

  • [2023/11] bigdl-llm now supports vLLM continuous batching on both Intel GPU and CPU.

  • [2023/10] bigdl-llm now supports QLoRA finetuning on both Intel GPU and CPU.

  • [2023/10] bigdl-llm now supports FastChat serving on both Intel CPU and GPU.

  • [2023/09] bigdl-llm now supports Intel GPU (including Arc, Flex and MAX).

  • [2023/09] bigdl-llm tutorial is released.

  • Over 30 models have been verified on bigdl-llm, including LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS and more; see the complete list here.
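Several of the updates above are one-argument changes at load time. For instance, loading directly from ModelScope (the [2024/02] item referenced above) looks like the sketch below; the model_hub parameter and the repo id are assumptions based on the API reported at the time.

#a sketch: load a model directly from ModelScope instead of Hugging Face
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('ZhipuAI/chatglm2-6b',
                                             load_in_4bit=True,
                                             model_hub='modelscope')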

bigdl-llm demos#

See the optimized performance of chatglm2-6b and llama-2-13b-chat models on 12th Gen Intel Core CPU and Intel Arc GPU below.

[Demo animations: chatglm2-6b and llama-2-13b-chat running on a 12th Gen Intel Core CPU, and the same two models running on an Intel Arc GPU]

bigdl-llm quickstart#

CPU Quickstart#

You may install bigdl-llm on Intel CPU as follows:

Note

See the CPU installation guide for more details.

pip install --pre --upgrade bigdl-llm[all]

Note

bigdl-llm has been tested on Python 3.9, 3.10 and 3.11.

You can then apply INT4 optimizations to any Hugging Face Transformers models as follows.

#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model_path = '/path/to/model/'
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)

#run the optimized model on Intel CPU
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)  # input_str: your prompt string
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids)
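
Continuing the snippet above, the converted weights can be saved once and reloaded directly on later runs, skipping the quantization step. The save_low_bit/load_low_bit pair below follows the bigdl-llm convenience API; treat the exact names as assumptions and verify them against your installed version.

#persist the INT4-converted model once, then reload it directly next time
model.save_low_bit('/path/to/low-bit-model/')
model = AutoModelForCausalLM.load_low_bit('/path/to/low-bit-model/')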

GPU Quickstart#

You may install bigdl-llm on Intel GPU as follows:

Note

See the GPU installation guide for more details.

# the command below will install intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
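
On Linux, the oneAPI runtime typically needs to be activated in each new shell before running on GPU (the default installation path is assumed below; see the GPU installation guide for your setup):

# activate oneAPI runtime libraries (default Linux install path assumed)
source /opt/intel/oneapi/setvars.sh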

Note

bigdl-llm has been tested on Python 3.9, 3.10 and 3.11.

You can then apply INT4 optimizations to any Hugging Face Transformers models on Intel GPU as follows.

#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model_path = '/path/to/model/'
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)

#run the optimized model on Intel GPU
model = model.to('xpu')

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...).to('xpu')  # input_str: your prompt string
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu())
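
Other low-bit formats announced above (e.g., FP4 and FP8 on GPU) are selected the same way at load time. The load_in_low_bit argument in this sketch follows the bigdl-llm transformers-style API; treat the exact value strings as assumptions for your version.

#continuing the snippet above: pick another low-bit format at load time
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit='fp8')
model = model.to('xpu')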

For more details, please refer to the bigdl-llm Document, Readme, Tutorial and API Doc.


Overview of the complete BigDL project#

BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:

  • LLM: Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU

  • Orca: Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray

  • Nano: Transparent Acceleration of TensorFlow & PyTorch Programs on Intel CPU/GPU

  • DLlib: “Equivalent of Spark MLlib” for Deep Learning

  • Chronos: Scalable Time Series Analysis using AutoML

  • Friesian: End-to-End Recommendation Systems

  • PPML: Secure Big Data and AI (with SGX Hardware Security)


Choosing the right BigDL library#

digraph BigDLDecisionTree {
    graph [pad=0.1 ranksep=0.3 tooltip=" "]
    node [color="#0171c3" shape=box fontname="Arial" fontsize=14 tooltip=" "]
    edge [tooltip=" "]

    Feature1 [label="Hardware Secured Big Data & AI?"]
    Feature2 [label="Python vs. Scala/Java?"]
    Feature3 [label="What type of application?"]
    Feature4 [label="Domain?"]

    LLM [href="https://github.com/intel-analytics/BigDL/blob/main/python/llm" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-LLM document"]
    Orca [href="../doc/Orca/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Orca document"]
    Nano [href="../doc/Nano/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Nano document"]
    DLlib1 [label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"]
    DLlib2 [label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"]
    Chronos [href="../doc/Chronos/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Chronos document"]
    Friesian [href="../doc/Friesian/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Friesian document"]
    PPML [href="../doc/PPML/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-PPML document"]

    ArrowLabel1 [label="No" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel2 [label="Yes" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel3 [label="Python" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel4 [label="Scala/Java" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel5 [label="Large Language Model" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel6 [label="Big Data + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel7 [label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel8 [label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel9 [label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel10 [label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
    ArrowLabel11 [label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]

    Feature1 -> ArrowLabel1 [dir=none]
    ArrowLabel1 -> Feature2
    Feature1 -> ArrowLabel2 [dir=none]
    ArrowLabel2 -> PPML
    Feature2 -> ArrowLabel3 [dir=none]
    ArrowLabel3 -> Feature3
    Feature2 -> ArrowLabel4 [dir=none]
    ArrowLabel4 -> DLlib1
    Feature3 -> ArrowLabel5 [dir=none]
    ArrowLabel5 -> LLM
    Feature3 -> ArrowLabel6 [dir=none]
    ArrowLabel6 -> Orca
    Feature3 -> ArrowLabel7 [dir=none]
    ArrowLabel7 -> Nano
    Feature3 -> ArrowLabel8 [dir=none]
    ArrowLabel8 -> DLlib2
    Feature3 -> ArrowLabel9 [dir=none]
    ArrowLabel9 -> Feature4
    Feature4 -> ArrowLabel10 [dir=none]
    ArrowLabel10 -> Chronos
    Feature4 -> ArrowLabel11 [dir=none]
    ArrowLabel11 -> Friesian
}


[1] Performance varies by use, configuration and other factors. bigdl-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.