How to Configure a GPU for Working with LLM Models

Configuring a GPU for working with large language models (LLMs) requires attention to several factors: hardware compatibility, driver installation, software configuration, and environment optimization. In this article, we will discuss step by step how to prepare your GPU for efficient work with LLMs.

1. Choosing the Right GPU

Before starting the configuration, it is important to choose the right GPU. LLMs require a lot of GPU memory (VRAM) and computational power. The most popular options are:

  • NVIDIA GeForce RTX 3090 / RTX 4090 (24 GB VRAM): consumer cards suitable for small and medium models.
  • NVIDIA A100 (40/80 GB VRAM) and H100 (80 GB VRAM): data-center cards for large models.
  • AMD Instinct accelerators (e.g., MI210/MI250): AMD's data-center option, supported via ROCm.
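
As a rough rule of thumb, 16-bit inference needs about 2 bytes of VRAM per model parameter, plus overhead for activations and the KV cache. Here is a minimal sketch of that estimate (the model sizes and the 20% overhead factor are illustrative assumptions, not measurements):

    def estimate_vram_gib(num_params_billion, bytes_per_param=2):
        # Weights alone: parameters * bytes per parameter
        raw = num_params_billion * 1e9 * bytes_per_param
        # Add ~20% for activations/KV cache and convert bytes -> GiB
        return raw * 1.2 / 1024**3

    for label, params_b in [("7B", 7), ("13B", 13), ("70B", 70)]:
        print(f"{label} model: ~{estimate_vram_gib(params_b):.0f} GiB in fp16")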

2. Installing GPU Drivers

For NVIDIA Cards

  1. Downloading Drivers:

    • Visit the official NVIDIA driver download page (https://www.nvidia.com/Download/index.aspx) and select your card, or simply use the distribution packages shown in the next step.

  2. Installing Drivers:

    sudo apt update
    sudo apt install -y nvidia-driver-535
    sudo reboot
    
  3. Checking Installation:

    nvidia-smi
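
If Python is already available (the full Python setup is covered in section 4), the driver can also be queried programmatically through the NVML bindings. This optional sketch assumes pip install nvidia-ml-py:

    from pynvml import (nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex,
                        nvmlDeviceGetName, nvmlDeviceGetMemoryInfo)

    nvmlInit()                                   # talks to the driver installed above
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        mem = nvmlDeviceGetMemoryInfo(handle)
        print(nvmlDeviceGetName(handle), f"- {mem.total / 1024**3:.1f} GiB VRAM")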
    

For AMD Cards

  1. Downloading Drivers:

    • Follow AMD's official ROCm installation guide for your distribution (https://rocm.docs.amd.com), or use the distribution packages shown in the next step.

  2. Installing Drivers:

    sudo apt update
    sudo apt install -y rocm-opencl-runtime
    sudo reboot
    
  3. Checking Installation:

    rocminfo
    

3. Installing CUDA and cuDNN

Installing CUDA

  1. Downloading CUDA:

    • Download the CUDA repository package for your distribution from https://developer.nvidia.com/cuda-downloads.

  2. Installing CUDA:

    sudo dpkg -i cuda-repo-<distro>_<version>_amd64.deb
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/<distro>/x86_64/3bf863cc.pub
    sudo apt update
    sudo apt install -y cuda
    
  3. Adding CUDA to the PATH Variable:

    echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc
    

Installing cuDNN

  1. Downloading cuDNN:

    • Register on the NVIDIA cuDNN page.
    • Download the appropriate version for your system.
  2. Installing cuDNN:

    sudo dpkg -i cudnn-local-repo-<distro>_<version>_amd64.deb
    sudo apt update
    sudo apt install -y libcudnn8
    

4. Configuring the Development Environment

Installing Python and Libraries

  1. Installing Python:

    sudo apt update
    sudo apt install -y python3 python3-pip python3-venv
    
  2. Creating a Virtual Environment:

    python3 -m venv llm-env
    source llm-env/bin/activate
    
  3. Installing Libraries:

    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
    pip install transformers datasets accelerate
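
Once these are installed, you can confirm from Python that PyTorch actually sees the GPU, the CUDA runtime, and cuDNN:

    import torch

    print(torch.cuda.is_available())        # True if the driver and CUDA runtime are visible
    print(torch.cuda.get_device_name(0))    # GPU model name
    print(torch.version.cuda)               # CUDA version the wheel was built against
    print(torch.backends.cudnn.version())   # cuDNN version, if available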
    

5. Configuring the LLM Model

Example of Model Configuration with the Transformers Library

from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading the model and tokenizer
model_name = "bigscience/bloom-560m"  # small BLOOM variant; the full bigscience/bloom needs hundreds of GB of VRAM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Preparing the input
input_text = "How to configure GPU for working with LLM models?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generating the response
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
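
If a model does not fit in VRAM, a common option is loading it in 8-bit through the bitsandbytes integration in Transformers. This sketch assumes pip install bitsandbytes has been run:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 8-bit, roughly halving VRAM usage versus fp16
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",  # same model as above
    device_map="auto",
    quantization_config=quant_config,
)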

6. Optimizing Performance

Using the Accelerate Library

The Accelerate library allows for easy scaling of models across multiple GPUs.

from accelerate import Accelerator
import torch

accelerator = Accelerator()
# `model` as loaded in section 5; the AdamW optimizer here is just an example
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)
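
A minimal, hypothetical training step then routes the backward pass through Accelerate (assuming dataloader yields dicts with input_ids and labels and has also been passed through accelerator.prepare):

for batch in dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    accelerator.backward(loss)  # use this instead of loss.backward()
    optimizer.step()
    optimizer.zero_grad()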

Using DeepSpeed

DeepSpeed is a library for training and running large models efficiently, for example by partitioning optimizer state, gradients, and parameters across GPUs (ZeRO). A training script can be launched across several GPUs with:

deepspeed --num_gpus=4 train.py
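
Here train.py stands for your own training script. A minimal sketch of what its DeepSpeed setup might look like (the configuration values are illustrative assumptions, not recommendations):

import deepspeed

# `model` as in section 5; DeepSpeed accepts the config as a plain dict
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO stage 2: shard optimizer state and gradients
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)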

Summary

Configuring a GPU for working with LLMs comes down to a handful of steps: choosing the right hardware, installing drivers, setting up CUDA/ROCm and the software stack, and optimizing the environment. After following the steps in this article, you should be able to prepare your GPU for efficient work with large language models.
