Guide: How to Run Phi-2 on a Computer with 32GB RAM
Introduction
Phi-2 is a 2.7-billion-parameter language model from Microsoft that delivers strong results for its size, but it still benefits from capable hardware. In this guide, we show how to install and run Phi-2 on a computer with 32GB RAM, covering all the key steps from preparing the environment to running the model.
Prerequisites
Before starting the installation, make sure your system meets the following requirements:
- Operating System: Linux (Ubuntu 20.04/22.04 recommended) or Windows 10/11
- Processor: Intel Core i7 or AMD Ryzen 7 class, or better
- RAM: 32GB (64GB recommended for better performance)
- Graphics Card: NVIDIA RTX 3060 or newer (RTX 4090 recommended for better performance)
- Disk Space: at least 50GB of free space
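Before installing anything, you can confirm these numbers from a Linux shell:
free -h       # total RAM
nvidia-smi    # GPU model and VRAM
df -h .       # free disk space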
Environment Installation
1. Installing Python
Phi-2 requires Python 3.8 or newer. Ubuntu 20.04 ships Python 3.8 and Ubuntu 22.04 ships Python 3.10 as the default python3, so the standard packages are sufficient:
sudo apt update
sudo apt install python3 python3-venv python3-pip
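You can confirm the installed version with:
python3 --version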
2. Creating a Virtual Environment
Creating a virtual environment will help avoid conflicts with other packages:
python3 -m venv phi2_env
source phi2_env/bin/activate
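On Windows, activate the environment with the corresponding script instead:
phi2_env\Scripts\activate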
3. Installing Dependencies
Install the necessary packages. Phi-2 is supported natively in transformers 4.37 and later; with older versions you would have to pass trust_remote_code=True when loading the model:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install "transformers>=4.37" accelerate bitsandbytes
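After installation, verify that PyTorch was built with CUDA support and can see your GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"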
Downloading the Phi-2 Model
You can download the Phi-2 model using the Hugging Face Transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # quantization_config replaces the deprecated load_in_8bit kwarg
)
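The first call downloads the weights (roughly 5GB) into the Hugging Face cache. As a quick sanity check, the parameter count should come out to about 2.7 billion:
print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e9:.1f}B")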
Memory Configuration
For a computer with 32GB RAM, it is recommended to combine 8-bit quantization with offloading, so that layers that do not fit on the GPU spill to CPU RAM and disk:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    offload_folder="offload",     # directory for layers offloaded to disk
    offload_state_dict=True,      # stage the state dict on disk while loading to reduce peak RAM
)
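To see how much memory the quantized model actually occupies, transformers provides a helper method:
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")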
Running the Model
Now you can run the model and test it:
prompt = "What is the meaning of life?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # send inputs to the same device as the model
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
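generate() also accepts decoding parameters. For more varied output you can enable sampling; the values below are illustrative starting points, not tuned defaults:
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,                        # sample instead of greedy decoding
    temperature=0.7,                       # lower values give more deterministic text
    top_p=0.9,                             # nucleus sampling
    pad_token_id=tokenizer.eos_token_id,   # silences the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))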
Optimizations
1. Using DeepSpeed
DeepSpeed is a Microsoft library for optimizing memory use and performance; its ZeRO optimizations can offload model weights to CPU RAM:
pip install deepspeed
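DeepSpeed ships a diagnostic command that reports which of its optimized ops are compatible with your environment; it is worth running once after installation:
ds_report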
2. DeepSpeed Configuration
Create a ds_config.json file. Note that "auto" placeholder values are only resolved by the Hugging Face Trainer integration; when the config is passed to deepspeed.initialize directly (as in the next step), it needs concrete values. For inference we also omit the optimizer and offload_optimizer sections, which only apply to training:
{
  "train_micro_batch_size_per_gpu": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
ZeRO stage 3 with offload_param keeps the model weights in CPU RAM and streams them to the GPU as needed, which is what makes a 32GB machine comfortable.
3. Running with DeepSpeed
from transformers import AutoModelForCausalLM, AutoTokenizer
import deepspeed
import json

model_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# reuse the inference config written to ds_config.json above
with open("ds_config.json") as f:
    ds_config = json.load(f)

# for inference only the engine is needed; ZeRO-3 keeps the weights offloaded in CPU RAM
model_engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
model_engine.module.eval()
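Generation then runs through the engine's wrapped module. A minimal sketch, assuming a single GPU and the config above:
inputs = tokenizer("What is the meaning of life?", return_tensors="pt").to(model_engine.device)
outputs = model_engine.module.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))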
Summary
Running Phi-2 on a computer with 32GB RAM requires proper environment preparation and memory optimizations. In this guide, we covered the key steps: installing Python, creating a virtual environment, downloading the model, configuring memory with 8-bit quantization and offloading, and optionally running under DeepSpeed. With these steps, you should be able to run Phi-2 and make full use of its capabilities.