Inference Unlimited

Guide: How to Run Phi-2 on a Computer with 32GB RAM

Introduction

Phi-2 is Microsoft's 2.7-billion-parameter language model. It is small by modern LLM standards, but running it smoothly still takes some care with memory. In this guide, we show how to install and run Phi-2 on a computer with 32GB RAM, covering all the key steps from preparing the environment to running the model.

Prerequisites

Before starting the installation, make sure your system meets the following requirements:

- 32GB of system RAM
- an NVIDIA GPU with CUDA support (the PyTorch install command below uses CUDA 11.8 wheels)
- Python 3.8 or newer
- a Debian/Ubuntu-based Linux distribution (the commands in this guide use apt)
- about 6GB of free disk space for the Phi-2 weights

Environment Installation

1. Installing Python

Phi-2 requires Python 3.8 or newer. On Debian/Ubuntu you can install it with the package manager:

sudo apt update
sudo apt install python3.8 python3.8-venv
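
You can verify that the interpreter is available:

python3.8 --version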

2. Creating a Virtual Environment

Creating a virtual environment will help avoid conflicts with other packages:

python3.8 -m venv phi2_env
source phi2_env/bin/activate

3. Installing Dependencies

Install the necessary packages:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers accelerate bitsandbytes
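
Before going further, it is worth checking that PyTorch was installed with CUDA support and can see your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"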

Downloading the Phi-2 Model

You can download the Phi-2 model with the Hugging Face Transformers library. Recent Transformers releases support Phi-2 natively; with older releases you may need to pass trust_remote_code=True to from_pretrained:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets Accelerate split the model across GPU and CPU;
# load_in_8bit=True quantizes the weights with bitsandbytes, roughly halving memory vs fp16
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)

Memory Configuration

For a computer with 32GB RAM, memory optimizations such as 8-bit quantization are recommended. You can additionally allow layers that do not fit in memory to spill to disk:

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
    offload_folder="offload",    # directory for layers spilled to disk
    offload_state_dict=True,     # stage the checkpoint through disk while loading to reduce peak RAM
)
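
If you want explicit control over how much memory each device may use, you can pass Accelerate's max_memory argument to from_pretrained. This sketch uses fp16 rather than 8-bit weights, since 8-bit layers offloaded to CPU require extra configuration; the limits shown are illustrative, so adjust them to your hardware:

import torch

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,               # fp16 weights: roughly 5.4GB for 2.7B parameters
    max_memory={0: "6GiB", "cpu": "24GiB"},  # key 0 is GPU index 0; remaining layers go to CPU RAM
)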

Running the Model

Now you can run the model and test it:

prompt = "What is the meaning of life?"
# With device_map="auto", send inputs to model.device (where the first layers
# were placed) rather than hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
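
By default, generate decodes greedily. For more natural-sounding text you can enable sampling; the values below are common starting points, not tuned settings:

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,                        # sample from the distribution instead of greedy decoding
    temperature=0.7,                       # lower values make output more deterministic
    top_p=0.9,                             # nucleus sampling: keep the smallest token set covering 90% probability
    pad_token_id=tokenizer.eos_token_id,   # Phi-2's tokenizer defines no pad token, so reuse EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))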

Optimizations

1. Using DeepSpeed

DeepSpeed is Microsoft's library for memory and performance optimization. Its ZeRO stage 3 offload can keep model weights in CPU RAM and stream them to the GPU on demand, which is a good fit for a 32GB machine:

pip install deepspeed
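
DeepSpeed ships a ds_report utility that verifies the installation and shows which of its optimized ops are compatible with your environment:

ds_report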

2. DeepSpeed Configuration

Create a ds_config.json file. For inference you do not need a training optimizer, and the "auto" placeholders seen in many DeepSpeed examples are only resolved by the Hugging Face Trainer integration, so use concrete values here. A minimal ZeRO stage 3 configuration that offloads the model parameters to CPU RAM looks like this:

{
    "train_batch_size": 1,
    "fp16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        }
    }
}

3. Running with DeepSpeed

import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import deepspeed

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the weights in half precision; ZeRO stage 3 will offload them to CPU RAM
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Reuse the ds_config.json file created in step 2 instead of duplicating it here
with open("ds_config.json") as f:
    ds_config = json.load(f)

model_engine, _, _, _ = deepspeed.initialize(
    model=model,
    config=ds_config
)
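
Generation then goes through the engine's wrapped module. The following is a minimal single-process sketch; on multiple GPUs you would also pass synced_gpus=True to generate so that all ranks step through the partitioned parameters together:

model_engine.module.eval()

prompt = "What is the meaning of life?"
inputs = tokenizer(prompt, return_tensors="pt").to(model_engine.device)
with torch.no_grad():
    outputs = model_engine.module.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))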

Summary

Running Phi-2 on a computer with 32GB RAM requires proper environment preparation and memory optimizations. In this guide we covered the key steps: installing Python, creating a virtual environment, downloading the model, configuring memory with 8-bit quantization and offloading, and optionally running inference through DeepSpeed's ZeRO offload. With these steps in place, you should be able to run Phi-2 comfortably on your hardware.
