Finetune (QLoRA)

We also support finetuning LLMs (large language models) using QLoRA with BigDL-LLM 4-bit optimizations on Intel GPUs.

Note

Currently, QLoRA finetuning is supported only for Hugging Face Transformers models.

To illustrate the finetuning process, we use the Llama-2-7b-hf model as an example.

Make sure you have prepared the environment by following the instructions here. First of all, import intel_extension_for_pytorch so that the model can run on Intel GPUs:

import intel_extension_for_pytorch as ipex

Next, load the model using the transformers-style API and move it to the Intel GPU with to('xpu'). We specify load_in_low_bit="nf4" to apply the 4-bit NormalFloat optimization. According to the QLoRA paper, "nf4" could yield better model quality than "int4".

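import torch  # needed for torch.float16 below
import intel_extension_for_pytorch as ipex
from bigdl.llm.transformers import AutoModelForCausalLM

# Load the base model with 4-bit NormalFloat weights; lm_head is left unconverted
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             load_in_low_bit="nf4",
                                             optimize_model=False,
                                             torch_dtype=torch.float16,
                                             modules_to_not_convert=["lm_head"],)
# Move the model to the Intel GPU
model = model.to('xpu')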
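To give a rough intuition for what "nf4" means, the following pure-Python sketch quantizes one small block of weights to the 16-level NormalFloat codebook using absmax scaling. It is only an illustration and not BigDL-LLM's actual kernel: the helper names quantize_block and dequantize_block, the sample weights, and the block size of 8 are assumptions made for this example, and the codebook values are those used by the QLoRA reference implementation.

```python
# Illustrative sketch only: blockwise 4-bit quantization against the NF4 codebook.
# This is NOT how BigDL-LLM implements it internally; helper names and the
# block size are assumptions for this example.

NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(values):
    """Return (absmax scale, list of 4-bit codebook indices) for one block."""
    scale = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    codes = [
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - v / scale))
        for v in values
    ]
    return scale, codes


def dequantize_block(scale, codes):
    """Map codebook indices back to approximate float weights."""
    return [NF4_LEVELS[c] * scale for c in codes]


weights = [0.12, -0.40, 0.03, 0.25, -0.07, 0.0, 0.31, -0.18]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored as a 4-bit index plus one shared scale per block, which is where the memory saving comes from. The codebook levels are denser near zero, where normally distributed weights tend to concentrate.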

Then, we have to apply some preprocessing to the model to prepare it for training.

from bigdl.llm.transformers.qlora import prepare_model_for_kbit_training
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

Next, we can obtain a Peft model from the optimized model and a configuration object containing the parameters as follows:

from bigdl.llm.transformers.qlora import get_peft_model
from peft import LoraConfig
config = LoraConfig(r=8, 
                    lora_alpha=32, 
                    target_modules=["q_proj", "k_proj", "v_proj"], 
                    lora_dropout=0.05, 
                    bias="none", 
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
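As a back-of-the-envelope check on this configuration, the sketch below estimates how many parameters the LoRA adapters add and the scaling applied to the adapter output. It assumes Llama-2-7b-hf dimensions that are not stated in this guide (hidden size 4096, 32 decoder layers, square 4096 x 4096 q_proj, k_proj and v_proj) and the usual LoRA scaling of lora_alpha / r; the helper lora_params_per_module is hypothetical.

```python
# Assumed model dimensions for Llama-2-7b-hf (not taken from this guide).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32

# Values copied from the LoraConfig above.
TARGET_MODULES = ["q_proj", "k_proj", "v_proj"]
R = 8
LORA_ALPHA = 32


def lora_params_per_module(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per target module."""
    return r * in_features + out_features * r


trainable = NUM_LAYERS * len(TARGET_MODULES) * lora_params_per_module(
    HIDDEN_SIZE, HIDDEN_SIZE, R
)
scaling = LORA_ALPHA / R  # factor applied to the adapter output

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapters amount to roughly six million trainable parameters, a small fraction of the 7B base model, whose 4-bit weights stay frozen.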

Important

Instead of from peft import prepare_model_for_kbit_training, get_peft_model, as we would for regular QLoRA using bitsandbytes and CUDA, we import them from bigdl.llm.transformers.qlora here to get a BigDL-LLM compatible Peft model. The rest is the same as the regular LoRA finetuning process using peft.
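For the remaining steps, one possible way to continue is the standard Hugging Face Trainer workflow, sketched below. This is a minimal sketch under assumptions rather than a prescribed recipe: build_trainer is a hypothetical helper, the hyperparameters are placeholder values, and tokenizer and train_dataset are assumed to be a tokenizer for the base model and an already tokenized dataset that you prepare yourself. Mixed-precision and optimizer settings are left at their defaults here and may need adjusting for your setup.

```python
def build_trainer(model, tokenizer, train_dataset, output_dir="outputs"):
    """Wrap a Peft model in a Hugging Face Trainer (hypothetical helper)."""
    # Imported lazily so the sketch can be loaded without transformers installed.
    import transformers

    # Placeholder hyperparameters; tune them for your own task and hardware.
    args = transformers.TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        warmup_steps=20,
        max_steps=200,
        learning_rate=2e-4,
        logging_steps=20,
        save_steps=100,
    )

    # The KV cache is not needed for training and is commonly disabled
    # when gradient checkpointing is enabled.
    model.config.use_cache = False

    return transformers.Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        # mlm=False selects the causal language modeling objective.
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
```

With the Peft model from the previous step, usage would look like trainer = build_trainer(model, tokenizer, train_dataset) followed by trainer.train().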