Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.
Layout Analysis | OCR + Bounding Boxes | Structured HTML and markdown | VLM Processing controls
pip install chunkr-ai
from chunkr_ai import Chunkr
# Initialize with your API key from chunkr.ai
chunkr = Chunkr(api_key="your_api_key")
# Upload a document (URL or local file path)
url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
task = chunkr.upload(url)
# Export results in various formats
html = task.html(output_file="output.html")
markdown = task.markdown(output_file="output.md")
content = task.content(output_file="output.txt")
task.json(output_file="output.json")
# Clean up
chunkr.close()
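Once a task has finished, the JSON export is the easiest thing to feed into a RAG pipeline. Below is a minimal sketch using only the standard library; it assumes the export contains chunks whose segments carry markdown/content fields, so check the output schema in the docs for the exact field names:
import json

# Minimal sketch: read the exported JSON and pull chunk text for a RAG pipeline.
# Assumption: the export contains chunks whose segments carry "markdown"/"content"
# fields -- verify the exact output schema in the docs before relying on this.
with open("output.json") as f:
    data = json.load(f)

chunks = (data.get("output") or {}).get("chunks", [])
for i, chunk in enumerate(chunks):
    text = "\n".join(
        seg.get("markdown") or seg.get("content", "")
        for seg in chunk.get("segments", [])
    )
    print(f"--- chunk {i}: {len(text)} chars ---")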
Visit our docs for more information and examples.
Prerequisites: Docker and Docker Compose (and, for GPU deployment, an NVIDIA GPU with the NVIDIA Container Toolkit).
Clone the repo:
git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr
# Copy the example environment file
cp .env.example .env
# Configure your LLM models
cp models.example.yaml models.yaml
For more information on how to set up LLMs, see the LLM configuration section below.
# For GPU deployment, use the following command:
docker compose up -d
# For CPU deployment, use the following command:
docker compose -f compose-cpu.yaml up -d
# For Mac ARM architecture (e.g. M2, M3, etc.) deployment, use the following command:
docker compose -f compose-cpu.yaml -f compose-mac.yaml up -d
Access the services:
- Web UI: http://localhost:5173
- API: http://localhost:8000
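You can also point the Python SDK at the self-hosted API instead of the hosted service. Below is a minimal sketch, assuming the client constructor accepts a base-URL override (check the SDK docs for the exact parameter or environment variable):
from chunkr_ai import Chunkr

# Assumption: `url` overrides the API base URL for self-hosted deployments;
# verify the exact parameter name in the SDK docs.
chunkr = Chunkr(url="http://localhost:8000", api_key="your_api_key")
task = chunkr.upload("path/to/your/document.pdf")
print(task.markdown())
chunkr.close()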
Stop the services when done:
# For GPU deployment, use the following command:
docker compose down
# For CPU deployment, use the following command:
docker compose -f compose-cpu.yaml down
# For Mac ARM architecture (e.g. M2, M3, etc.) deployment, use the following command:
docker compose -f compose-cpu.yaml -f compose-mac.yaml down
This section explains how to set up HTTPS using a self-signed certificate with Docker Compose when hosting Chunkr on a VM. This lets you access the web UI, API, Keycloak (authentication service), and MinIO (object storage service) over HTTPS.
# Create a certs directory
mkdir certs
# Generate the certificate
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout certs/nginx.key -out certs/nginx.crt -subj "/CN=localhost" -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"
Important: In your .env file, replace all instances of "localhost" with your VM's actual IP address. Note that you must use "https://" instead of "http://", and the ports differ from the HTTP setup (no port for the web UI, 8444 for the API, 8443 for Keycloak, 9100 for MinIO):
AWS__PRESIGNED_URL_ENDPOINT=https://your_vm_ip_address:9100
WORKER__SERVER_URL=https://your_vm_ip_address:8444
VITE_API_URL=https://your_vm_ip_address:8444
VITE_KEYCLOAK_POST_LOGOUT_REDIRECT_URI=https://your_vm_ip_address
VITE_KEYCLOAK_REDIRECT_URI=https://your_vm_ip_address
VITE_KEYCLOAK_URL=https://your_vm_ip_address:8443
# For GPU deployment, use the following command:
docker compose --profile proxy up -d
# For CPU deployment, use the following command:
docker compose -f compose-cpu.yaml --profile proxy up -d
# For Mac ARM architecture (e.g. M2, M3, etc.) deployment, use the following command:
docker compose -f compose-cpu.yaml -f compose-mac.yaml --profile proxy up -d
Access the services:
- Web UI: https://your_vm_ip_address
- API: https://your_vm_ip_address:8444
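Because the certificate is self-signed, clients will reject the connection unless they trust certs/nginx.crt. Below is a minimal sketch for the Python SDK, assuming its HTTP backend honours the standard CA-bundle environment variables and that the constructor accepts a base-URL override (both worth verifying in the SDK docs):
import os
from chunkr_ai import Chunkr

# Assumptions: SSL_CERT_FILE (httpx-style) or REQUESTS_CA_BUNDLE (requests-style)
# is read by the SDK's HTTP backend; the paths and IP below are placeholders.
cert_path = "/path/to/chunkr/certs/nginx.crt"
os.environ["SSL_CERT_FILE"] = cert_path
os.environ["REQUESTS_CA_BUNDLE"] = cert_path

chunkr = Chunkr(url="https://your_vm_ip_address:8444", api_key="your_api_key")
task = chunkr.upload("path/to/your/document.pdf")
chunkr.close()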
Stop the services when done:
# For GPU deployment, use the following command:
docker compose --profile proxy down
# For CPU deployment, use the following command:
docker compose -f compose-cpu.yaml --profile proxy down
# For Mac ARM architecture (e.g. M2, M3, etc.) deployment, use the following command:
docker compose -f compose-cpu.yaml -f compose-mac.yaml --profile proxy down
For production environments, we provide a Helm chart and detailed deployment instructions:
See kube/README.md.

For enterprise support and deployment assistance, contact us.
Chunkr supports two ways to configure LLMs: a models.yaml file or environment variables in your .env file.
For more flexible configuration with multiple models, default/fallback options, and rate limits, use a models.yaml file:
cp models.example.yaml models.yaml
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "your_openai_api_key_here"
    default: true
    rate-limit: 200 # requests per minute - optional
Benefits of using models.yaml:
- Configure multiple models, each with its own provider URL and API key
- Mark a default model and define fallback options
- Set per-model rate limits

Read the models.example.yaml file for more information on the available options.
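Before starting the stack, you can sanity-check that models.yaml parses and that each entry has the basic fields shown above. A quick sketch, assuming PyYAML is available:
import yaml  # pip install pyyaml

# Parse models.yaml and flag entries missing the basic fields from the example above.
with open("models.yaml") as f:
    config = yaml.safe_load(f)

for m in config.get("models", []):
    missing = [k for k in ("id", "model", "provider_url", "api_key") if k not in m]
    if missing:
        print(f"model {m.get('id', '<no id>')} is missing: {', '.join(missing)}")

defaults = [str(m.get("id")) for m in config.get("models", []) if m.get("default")]
print("default model(s):", ", ".join(defaults) or "none set")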
You can use any OpenAI API compatible endpoint by setting the following variables in your .env file:
- LLM__KEY: API key for your LLM provider
- LLM__MODEL: model name to request (e.g. gpt-4o)
- LLM__URL: OpenAI-compatible chat completions endpoint
Below is a table of common LLM providers and their configuration details to get you started:
| Provider | API URL | Documentation |
|---|---|---|
| OpenAI | https://api.openai.com/v1/chat/completions | OpenAI Docs |
| Google AI Studio | https://generativelanguage.googleapis.com/v1beta/openai/chat/completions | Google AI Docs |
| OpenRouter | https://openrouter.ai/api/v1/chat/completions | OpenRouter Models |
| Self-Hosted | http://localhost:8000/v1 | vLLM or Ollama |
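To sanity-check a provider and key before putting them in .env, you can send a single OpenAI-style chat completion request to the endpoint you plan to use for LLM__URL. A minimal sketch using only the Python standard library (the URL, key, and model below are placeholders):
import json
import urllib.request

# Placeholders: use the same values you intend to set for LLM__URL / LLM__KEY / LLM__MODEL.
url = "https://api.openai.com/v1/chat/completions"
api_key = "your_api_key"
model = "gpt-4o"

payload = {
    "model": model,
    "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])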
The core of this project is dual-licensed under the GNU AGPL-3.0 and a commercial license.
To use Chunkr without complying with the AGPL-3.0 license terms, you can contact us or visit our website.