How to Run LLM Models with Ollama on a GPU
baixin edited this page 2024-04-25 10:57:22 +08:00

Note: Running Ollama in GPU mode requires an NVIDIA GPU.

1. Install the NVIDIA Container Toolkit

We use Ubuntu 22.04 as an example (for other systems, see the official NVIDIA documentation).

  • Configure the apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  • Update the package index
sudo apt-get update
  • Install the toolkit
sudo apt-get install -y nvidia-container-toolkit
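
After installing the toolkit, Docker still needs to be configured to use the NVIDIA runtime. Per NVIDIA's documentation this is done with nvidia-ctk, and a throwaway container can confirm that the GPU is visible (a minimal sketch; the ubuntu image tag is an example):

```shell
# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify: nvidia-smi inside a container should list your GPU
docker run --rm --gpus all ubuntu nvidia-smi
```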

2. Run Ollama with GPU support

docker run --gpus all -d -v /opt/ai/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
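
Once the container is up, you can check that the service answers on port 11434 and that the GPU is visible from inside the container (assuming the container name `ollama` from the command above; the NVIDIA runtime mounts nvidia-smi into the container):

```shell
# The root endpoint should reply "Ollama is running"
curl http://localhost:11434

# Confirm the GPU is visible inside the Ollama container
docker exec ollama nvidia-smi
```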

3. Download a model with Ollama

docker exec -it ollama ollama run qwen:7b
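
After the model is pulled, a quick way to test it without an interactive shell is Ollama's REST API, which is also the interface MaxKB connects to (sketch assuming the `qwen:7b` model and default port from the steps above):

```shell
# Non-streaming generation request against the local Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen:7b",
  "prompt": "Hello, introduce yourself.",
  "stream": false
}'
```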

4. Add the model in MaxKB's model settings to complete the integration
