linyinli<yinlin@gpustack.ai>

docs: fix typos and minor corrections

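GPUStack is an open-source GPU cluster manager for running AI models.

Key Features

Broad GPU compatibility: Seamlessly supports GPUs from various vendors on Apple Macs, Windows PCs, and Linux servers.
Extensive model support: Supports a wide range of models, including LLMs, multimodal VLMs, image models, audio models, text embedding models, and reranker models.
Flexible inference backends: Integrates with multiple inference backends, including llama-box (llama.cpp and stable-diffusion.cpp), vox-box, vLLM, and Ascend MindIE.
Multi-version backend support: Runs multiple versions of an inference backend side by side to meet the differing runtime dependencies of different models.
Distributed inference: Supports single-node and multi-node multi-GPU parallel inference, including heterogeneous GPUs across vendors and runtime environments.
Scalable GPU architecture: Scales easily by adding more GPUs or nodes to your infrastructure.
Robust model stability: Ensures high availability through automatic failure recovery, multi-instance redundancy, and load balancing of inference requests.
Intelligent deployment evaluation: Automatically assesses model resource requirements, backend and architecture compatibility, OS compatibility, and other deployment-related factors.
Automated scheduling: Dynamically allocates models based on available resources.
Lightweight Python package: Minimal dependencies and low operational overhead.
OpenAI-compatible APIs: Fully compatible with OpenAI's API specification for seamless integration.
User and API key management: Simplifies the management of users and API keys.
Real-time GPU monitoring: Tracks GPU performance and utilization in real time.
Token and rate metrics: Monitors token usage and API request rates.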

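Installation

Linux or macOS

GPUStack provides an installation script that installs it as a systemd service on Linux or a launchd service on macOS. The default port is 80. To install GPUStack this way, run the following command: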


curl -sfL https://get.gpustack.ai | INSTALL_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple sh -s -

Windows

Run PowerShell as administrator (avoid PowerShell ISE), then run the following commands to install GPUStack:


$env:INSTALL_INDEX_URL = "https://pypi.tuna.tsinghua.edu.cn/simple"
Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content

Other Installation Methods

For pip installation, Docker installation, or detailed configuration options, see the installation documentation.

Getting Started

Run the llama3.2 model from the command line and chat with it:


gpustack chat llama3.2 "tell me a joke."

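Run the stable-diffusion-v3-5-large-turbo model and generate an image:

💡 Tip

This command downloads the model (about 12 GB) from Hugging Face. Download time depends on your network speed. Make sure you have enough disk space and VRAM (12 GB) to run the model. If you run into problems, you can skip this step and move on to the next one.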


gpustack draw hf.co/gpustack/stable-diffusion-v3-5-large-turbo-GGUF:stable-diffusion-v3-5-large-turbo-Q4_0.gguf \
"A minion holding a sign that says 'GPUStack'. The background is filled with futuristic elements like neon lights, circuit boards, and holographic displays. The minion is wearing a tech-themed outfit, possibly with LED lights or digital patterns. The sign itself has a sleek, modern design with glowing edges. The overall atmosphere is high-tech and vibrant, with a mix of dark and neon colors." \
--sample-steps 5 --show

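When the command completes, the generated image opens in the default viewer. You can try changing the prompt and the CLI options to customize the output.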
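When experimenting with different prompts and options, it can help to assemble the command programmatically. The sketch below is only an illustration: the helper names `build_draw_command` and `preview` are hypothetical, and the only flags used are `--sample-steps` and `--show` from the command above.

```python
import shlex


def build_draw_command(model, prompt, sample_steps=None, show=False):
    """Assemble the argument list for a `gpustack draw` invocation.

    Only the flags that appear in this README are used here.
    """
    cmd = ["gpustack", "draw", model, prompt]
    if sample_steps is not None:
        cmd += ["--sample-steps", str(sample_steps)]
    if show:
        cmd.append("--show")
    return cmd


def preview(cmd):
    """Render the argument list as a shell-quoted string for inspection."""
    return shlex.join(cmd)
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the command without shell quoting problems in long prompts.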

Generated Image

Open http://your_host_ip in a browser to access the GPUStack UI. Log in with the username "admin" and the default password. Run the following command to get the default password:

Linux or macOS


cat /var/lib/gpustack/initial_admin_password

Windows


Get-Content -Path "$env:APPDATA\gpustack\initial_admin_password" -Raw
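For scripts that need to locate the password file on either platform, the two commands above can be folded into one helper. This is a minimal sketch that assumes the file locations are exactly those shown above. The function name `initial_password_path` is hypothetical and not part of GPUStack.

```python
import os
import platform
from pathlib import PurePosixPath, PureWindowsPath


def initial_password_path(system=None, env=None):
    """Return the initial admin password file path for the given platform.

    `system` defaults to platform.system() and `env` to os.environ.
    Both can be overridden, which keeps the helper easy to test.
    """
    system = system or platform.system()
    env = os.environ if env is None else env
    if system == "Windows":
        # Mirrors "$env:APPDATA\gpustack\initial_admin_password"
        path = PureWindowsPath(env["APPDATA"]) / "gpustack" / "initial_admin_password"
        return str(path)
    # Linux and macOS use the same location in this README
    return str(PurePosixPath("/var/lib/gpustack/initial_admin_password"))
```

Reading the file is then `open(initial_password_path()).read().strip()`, which may need elevated privileges depending on the file's permissions.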

Click "Playground - Chat" in the navigation menu. You can now chat with the LLM in the playground.

Playground Screenshot

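Click "API Keys" in the navigation menu, then click the "New API Key" button.
Fill in the "Name" field, then click the "Save" button.
Copy the generated API key and store it somewhere safe. Note that the key is visible only at creation time.
You can now use the API key to access the OpenAI-compatible API. For example, with curl: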


export GPUSTACK_API_KEY=your_api_key
curl http://your_gpustack_server_url/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GPUSTACK_API_KEY" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'
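The same request can be sketched with only the Python standard library. The function names below are hypothetical. The parsing assumes the server streams in the usual OpenAI format (lines of the form `data: {...}`, terminated by `data: [DONE]`), which this README implies but does not spell out.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, stream=True):
    """Build the POST request that the curl example sends."""
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1-openai/chat/completions",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_sse_line(raw_line):
    """Return the text fragment carried by one streamed line, or None."""
    line = raw_line.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None  # blank keep-alive lines and other fields
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # assumed end-of-stream marker
    choices = json.loads(payload).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(request):
    """Yield text fragments as they arrive from the server."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            text = parse_sse_line(raw_line)
            if text:
                yield text
```

Iterating over `stream_chat(build_chat_request(...))` and printing each fragment with `end=""` shows the reply as it is generated.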

Supported Platforms

macOS
Linux
Windows

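Supported Accelerators

Supported Models

GPUStack uses llama-box (built on llama.cpp and stable-diffusion.cpp), vLLM, Ascend MindIE, and vox-box as backends and supports a wide range of models. Models can be deployed from the following sources: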

Hugging Face
ModelScope
Local file path

Example Models

Category	Models
Large Language Models (LLMs)	Qwen, LLaMA, Mistral, DeepSeek, Phi, Gemma
Vision Language Models (VLMs)	Llama3.2-Vision, Pixtral, Qwen2.5-VL, LLaVA, InternVL2.5
Diffusion Models	Stable Diffusion, FLUX
Embedding Models	BGE, BCE, Jina
Reranker Models	BGE, BCE, Jina
Audio Models	Whisper (Speech-to-Text), CosyVoice (Text-to-Speech)

For the full list of supported models, see the Supported Models section of the inference backends documentation.

OpenAI-Compatible APIs

GPUStack serves OpenAI-compatible APIs under the /v1-openai path.

For example, you can call the APIs with the official OpenAI Python API library:


from openai import OpenAI
client = OpenAI(base_url="http://your_gpustack_server_url/v1-openai", api_key="your_api_key")

completion = client.chat.completions.create(
  model="llama3.2",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message)
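The curl example earlier sets `"stream": true`, while the snippet above waits for the full reply. A streaming variant with the same library might look like the sketch below. The helper names are hypothetical, and the chunk shape (`choices[0].delta.content`) is that of the OpenAI Python library's streamed chat completions, assumed here to carry over unchanged to GPUStack.

```python
def collect_stream_text(chunks):
    """Join the incremental text pieces of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


def stream_chat(base_url, api_key, model, prompt):
    """Send one user message and return the streamed reply as a string."""
    # Imported lazily so collect_stream_text works without the library.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key=api_key)
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream_text(stream)
```

For example, `stream_chat("http://your_gpustack_server_url/v1-openai", "your_api_key", "llama3.2", "Hello!")` would return the assembled reply text. To display text as it arrives, print each piece inside the loop instead of joining at the end.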