DeepSeek-OCR vLLM Docker Image

This Docker image packages the DeepSeek-OCR model with vLLM to serve OCR requests through an OpenAI-compatible API.

Prerequisites

Before building the image, clone the model repository into a local directory named model:

git clone https://cnb.cool/ai-models/deepseek-ai/DeepSeek-OCR model

Build the Image

docker build -t deepseek-ocr/deepseek-ocr:latest .

Run the Container

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  deepseek-ocr/deepseek-ocr:latest

Run the Prebuilt Image Directly

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  docker.cnb.cool/ai-models/deepseek-ai/deepseek-ocr-vllm:latest

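Limit GPU Memory

Docker's --gpus option selects devices but has no memory key, so cap memory with vLLM's --gpu-memory-utilization flag instead. It takes the fraction of the GPU's memory vLLM may use and goes after the image name. The value 0.5 below is only an example; choose the fraction that matches your memory budget:

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus '"device=0"' \
  --ipc=host \
  docker.cnb.cool/ai-models/deepseek-ai/deepseek-ocr-vllm:latest \
  --gpu-memory-utilization 0.5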
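
After any of the docker run commands above, the API may not answer until vLLM has finished loading the model. The following is a minimal sketch of a readiness check. It assumes the server answers GET /v1/models with HTTP 200 once it is ready; the function names and the timeout values are illustrative and not part of this image.

```python
import time
import urllib.request


def server_is_up(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, connection resets and timeouts are all OSError subclasses.
        return False


def wait_until_ready(url, timeout_s=600.0, interval_s=5.0, probe=server_is_up):
    """Call `probe(url)` repeatedly until it succeeds or `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready("http://localhost:8080/v1/models") before sending the first OCR request avoids connection errors during startup. The probe parameter exists only so the loop can be exercised without a running server.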

Usage Example

List the available models:

curl http://localhost:8080/v1/models
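
The request examples in this README use two different model values ("deepseek-ocr" in the curl commands and "/workspace/model" in the Python client), so it is worth reading the id the server actually reports. A small sketch, assuming the standard OpenAI-style response shape with a top-level "data" list; the helper names are hypothetical:

```python
import json
import urllib.request


def model_ids(models_response: dict) -> list:
    """Extract the model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8080/v1") -> list:
    """GET {base_url}/models and return the ids the server advertises."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return model_ids(json.load(resp))
```

Pass one of the returned ids as the "model" field in the requests that follow.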

Run OCR:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          {
            "type": "text",
            "text": "Free OCR."
          }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0,
    "skip_special_tokens": false,
    "vllm_xargs": {
      "ngram_size": 30,
      "window_size": 90,
      "whitelist_token_ids": [128821, 128822]
    }
  }'

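Describe an image (the Chinese prompt "这是一张" means "This is a picture of" and is left for the model to complete):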

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          {
            "type": "text",
            "text": "\n 这是一张"
          }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0,
    "skip_special_tokens": false,
    "vllm_xargs": {
      "ngram_size": 30,
      "window_size": 90,
      "whitelist_token_ids": [128821, 128822]
    }
  }'
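
For scripts that should not depend on the openai package, the same request can be sketched with the Python standard library. Note that extra_body is an argument of the OpenAI Python client, which merges its contents into the top level of the JSON body. This sketch therefore assumes the server reads skip_special_tokens and vllm_xargs from the top level. The function names build_payload and post_chat are hypothetical.

```python
import json
import urllib.request


def build_payload(model: str, image_url: str, prompt: str) -> dict:
    """Build the chat-completions body for one image plus one text prompt."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.0,
        # Assumption: over raw HTTP these extra fields sit at the top level,
        # not inside an "extra_body" object.
        "skip_special_tokens": False,
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            "whitelist_token_ids": [128821, 128822],
        },
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8080/v1") -> str:
    """POST the payload to /chat/completions and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=3600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```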

The same OCR request with the OpenAI Python client:

import time
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8080/v1",
    timeout=3600
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                }
            },
            {
                "type": "text",
                "text": "Free OCR."
            }
        ]
    }
]

start = time.time()
response = client.chat.completions.create(
    # Must match an id returned by GET /v1/models (the curl examples use "deepseek-ocr").
    model="/workspace/model",
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
    extra_body={
        "skip_special_tokens": False,
        # args used to control custom logits processor
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            # whitelist: <td>, </td>
            "whitelist_token_ids": [128821, 128822],
        },
    },
)
print(f"Response costs: {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")
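
The examples above fetch the image from a public URL. For a file on local disk, one option is to inline it as a base64 data URL in the image_url field. The sketch below assumes the server accepts data URLs there, which this README does not state; image_to_data_url is a hypothetical helper name.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL usable in an image_url field."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string replaces the "url" value in the messages list, for example {"url": image_to_data_url("receipt.png")}.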

Alternative Usage with Grounding

For document-to-markdown conversion:

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "your_image_url"
                }
            },
            {
                "type": "text",
                "text": "<|grounding|>Convert the document to markdown."
            }
        ]
    }
]
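
A sketch of the full call with the grounding prompt, reusing the sampling settings from the OCR example. The function name document_to_markdown is hypothetical, and client is expected to be an openai.OpenAI instance configured as in the Python example (api_key="EMPTY", base_url="http://localhost:8080/v1").

```python
GROUNDING_PROMPT = "<|grounding|>Convert the document to markdown."


def document_to_markdown(client, model: str, image_url: str) -> str:
    """Send one image with the grounding prompt and return the generated markdown."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": GROUNDING_PROMPT},
            ],
        }
    ]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=2048,
        temperature=0.0,
        extra_body={
            "skip_special_tokens": False,
            # Same custom logits processor arguments as the OCR example.
            "vllm_xargs": {
                "ngram_size": 30,
                "window_size": 90,
                "whitelist_token_ids": [128821, 128822],
            },
        },
    )
    return response.choices[0].message.content
```

Taking the client as a parameter keeps the function independent of how the connection is configured.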

Custom Parameters

You can pass additional vLLM parameters when running the container:

docker run -d \
  --name deepseek-ocr \
  -p 8080:8080 \
  --gpus all \
  --ipc=host \
  deepseek-ocr/deepseek-ocr:latest \
  --max-model-len 8192 \
  --max-num-batched-tokens 4096

Model Sizes

DeepSeek-OCR supports different processing modes:

  • Tiny: base_size = 512, image_size = 512
  • Small: base_size = 640, image_size = 640
  • Base: base_size = 1024, image_size = 1024
  • Large: base_size = 1280, image_size = 1280
  • Gundam: base_size = 1024, image_size = 640, crop_mode = True
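
For reference in scripts, the modes above can be kept as a small lookup table. This is only a sketch: the list states crop_mode for Gundam alone, so crop_mode = False for the other four modes is an assumption. This README also does not describe how these values are passed to the vLLM server, so the table does not configure anything by itself.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    base_size: int
    image_size: int
    crop_mode: bool


# crop_mode=False for the non-Gundam modes is an assumption, not stated in the list.
MODES = {
    "tiny": Mode(base_size=512, image_size=512, crop_mode=False),
    "small": Mode(base_size=640, image_size=640, crop_mode=False),
    "base": Mode(base_size=1024, image_size=1024, crop_mode=False),
    "large": Mode(base_size=1280, image_size=1280, crop_mode=False),
    "gundam": Mode(base_size=1024, image_size=640, crop_mode=True),
}
```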
