The LightRAG Server is designed to provide a Web UI and API support. The Web UI facilitates document indexing, knowledge graph exploration, and a simple RAG query interface. LightRAG Server also provides an Ollama-compatible interface, aiming to emulate LightRAG as an Ollama chat model. This allows AI chat bots, such as Open WebUI, to access LightRAG easily.



```bash
### Install LightRAG Server as a tool using uv (recommended)
uv tool install "lightrag-hku[api]"

### Or using pip with a virtual environment
# python -m venv .venv
# source .venv/bin/activate  # Windows: .venv\Scripts\activate
# pip install "lightrag-hku[api]"
```
```bash
# Clone the repository
git clone https://github.com/HKUDS/lightrag.git

# Change to the repository directory
cd lightrag

# Using uv (recommended)
# Note: uv sync automatically creates a virtual environment in .venv/
uv sync --extra api
source .venv/bin/activate  # Activate the virtual environment (Linux/macOS)
# Or on Windows: .venv\Scripts\activate

# Or using pip with a virtual environment
# python -m venv .venv
# source .venv/bin/activate  # Windows: .venv\Scripts\activate
# pip install -e ".[api]"

# Build front-end artifacts
cd lightrag_webui
bun install --frozen-lockfile
bun run build
cd ..
```
LightRAG requires both an LLM (Large Language Model) and an embedding model to perform document indexing and querying. Before the first launch of the LightRAG Server, you must configure the settings for both the LLM and the embedding model. LightRAG supports binding to various LLM/embedding backends.
It is recommended to configure the LightRAG Server through environment variables. An example environment variable file named env.example is provided in the root directory of the project. Copy this file to your startup directory, rename it to .env, and then adjust the LLM and embedding parameters in it. Note that the LightRAG Server loads the variables from .env into the system environment on each start, and settings already present in the system environment take precedence over those in the .env file.
Since VS Code with the Python extension may automatically load the .env file in the integrated terminal, please open a new terminal session after each modification to the .env file.
Here are some examples of common settings for LLM and Embedding models:
```
LLM_BINDING=openai
LLM_MODEL=gpt-4o
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your_api_key

EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
# EMBEDDING_BINDING_API_KEY=your_api_key
```
When targeting Google Gemini, set `LLM_BINDING=gemini`, choose a model such as `LLM_MODEL=gemini-flash-latest`, and provide your Gemini key via `LLM_BINDING_API_KEY` (or `GEMINI_API_KEY`).
```
LLM_BINDING=ollama
LLM_MODEL=mistral-nemo:latest
LLM_BINDING_HOST=http://localhost:11434
# LLM_BINDING_API_KEY=your_api_key

### Ollama Server context length (must be larger than MAX_TOTAL_TOKENS + 2000)
OLLAMA_LLM_NUM_CTX=16384

EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
# EMBEDDING_BINDING_API_KEY=your_api_key
```
Important Note: The Embedding model must be determined before document indexing, and the same model must be used during the document query phase. For certain storage solutions (e.g., PostgreSQL), the vector dimension must be defined upon initial table creation. Therefore, when changing embedding models, it is necessary to delete the existing vector-related tables and allow LightRAG to recreate them with the new dimensions.
Instead of editing env.example by hand, you can use the interactive setup wizard to generate a configured .env and, when needed, docker-compose.final.yml:
```bash
make env-base           # Required first step: LLM, embedding, reranker
make env-storage        # Optional: storage backends and database services
make env-server         # Optional: server port, auth, and SSL
make env-security-check # Optional: audit the current .env for security risks
```
For a full description of every target and what each flow does, see docs/InteractiveSetup.md. The setup wizards update configuration only; run `make env-security-check` separately to audit the current .env for security risks before deployment.
The LightRAG Server supports two operational modes:

```bash
# Uvicorn mode (single process)
lightrag-server

# Gunicorn + Uvicorn preload mode (multiprocess)
lightrag-gunicorn --workers 4
```
When starting LightRAG, the current working directory must contain the .env configuration file. It is intentionally designed that the .env file must be placed in the startup directory. The purpose of this is to allow users to launch multiple LightRAG instances simultaneously and configure different .env files for different instances. After modifying the .env file, you need to reopen the terminal for the new settings to take effect. This is because each time LightRAG Server starts, it loads the environment variables from the .env file into the system environment variables, and system environment variables have higher precedence.
During startup, configurations in the .env file can be overridden by command-line parameters. Common command-line parameters include:
- `--host`: Server listening address (default: 0.0.0.0)
- `--port`: Server listening port (default: 9621)
- `--timeout`: LLM request timeout (default: 150 seconds)
- `--log-level`: Log level (default: INFO)
- `--working-dir`: Database persistence directory (default: ./rag_storage)
- `--input-dir`: Directory for uploaded files (default: ./inputs)
- `--workspace`: Workspace name, used to logically isolate data between multiple LightRAG instances (default: empty)

Using Docker Compose is the most convenient way to deploy and run the LightRAG Server.
1. Copy the docker-compose.yml file from the LightRAG repository into your project directory.
2. Create a .env file: duplicate the sample file env.example to create a customized .env file, and configure the LLM and embedding parameters according to your specific requirements.
3. Start the services:

```bash
docker compose up
# If you want the program to run in the background after startup, add the -d parameter at the end of the command.
```
You can get the official docker compose file from here: docker-compose.yml. For historical versions of LightRAG docker images, visit this link: LightRAG Docker Images. For more details about docker deployment, please refer to DockerDeployment.md.
When using Nginx as a reverse proxy in front of LightRAG Server, you need to configure client_max_body_size for the /documents/upload endpoint to handle large file uploads. Without this configuration, Nginx will reject files larger than 1MB (the default limit) with a 413 Request Entity Too Large error before the request reaches LightRAG.
Recommended Configuration:
```nginx
server {
    listen 80;
    server_name your-domain.com;

    # Global default: 8MB for LLM queries with long context
    client_max_body_size 8M;

    # Upload endpoint: 100MB for large file uploads
    location /documents/upload {
        client_max_body_size 100M;
        proxy_pass http://localhost:9621;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Increase timeouts for large file uploads
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }

    # Streaming endpoints: LLM response streaming
    location ~ ^/(query/stream|api/chat|api/generate) {
        gzip off;  # Disable compression for streaming responses
        proxy_pass http://localhost:9621;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Long timeout for LLM generation
        proxy_read_timeout 300s;
    }

    # Other endpoints
    location / {
        proxy_pass http://localhost:9621;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
Key Points:
- Keep the Nginx upload limit in sync with `MAX_UPLOAD_SIZE` in your .env file. The default `MAX_UPLOAD_SIZE` is 100MB.
- Disable compression (`gzip off`) for streaming endpoints to ensure real-time response delivery. LightRAG automatically sets the `X-Accel-Buffering: no` header to disable response buffering.
- For very large uploads or slow connections, increase `proxy_read_timeout` and `proxy_send_timeout` accordingly.
- Nginx enforces `client_max_body_size` by checking the `Content-Length` header first, so oversized uploads are rejected before the request body reaches LightRAG.

Official LightRAG Docker images are fully compatible with offline or air-gapped environments. If you want to build your own offline environment, please refer to the Offline Deployment Guide.
There are two ways to start multiple LightRAG instances. The first way is to configure a completely independent working environment for each instance. This requires creating a separate working directory for each instance and placing a dedicated .env configuration file in that directory. The server listening ports in the configuration files of different instances cannot be the same. Then, you can start the service by running lightrag-server in the working directory.
The second way is for all instances to share the same set of .env configuration files, and then use command-line arguments to specify different server listening ports and workspaces for each instance. You can start multiple LightRAG instances in the same working directory with different command-line arguments. For example:
```bash
# Start instance 1
lightrag-server --port 9621 --workspace space1

# Start instance 2
lightrag-server --port 9622 --workspace space2
```
The purpose of a workspace is to achieve data isolation between different instances. Therefore, the workspace parameter must be different for different instances; otherwise, it will lead to data confusion and corruption.
When launching multiple LightRAG instances via Docker Compose, simply specify unique WORKSPACE and PORT environment variables for each container within your docker-compose.yml. Even if all instances share a common .env file, the container-specific environment variables defined in Compose will take precedence, ensuring independent configurations for each instance.
Configuring an independent working directory and a dedicated .env file for each instance generally ensures that the locally persisted files of the in-memory databases are saved in their respective working directories, achieving data isolation. By default, LightRAG uses in-memory databases for all storage types, and this method of data isolation is sufficient. However, if different instances access the same external database instance, you must use workspaces to achieve data isolation; otherwise, the data of different instances will conflict and be corrupted.
The command-line workspace argument and the WORKSPACE environment variable in the .env file can both be used to specify the workspace name for the current instance, with the command-line argument having higher priority. Here is how workspaces are implemented for different types of storage:
- For local file-based storage, data isolation is achieved through workspace subdirectories: JsonKVStorage, JsonDocStatusStorage, NetworkXStorage, NanoVectorDBStorage, FaissVectorDBStorage.
- For databases that store data in collections, it is achieved by adding a workspace prefix to the collection name: RedisKVStorage, RedisDocStatusStorage, MilvusVectorDBStorage, MongoKVStorage, MongoDocStatusStorage, MongoVectorDBStorage, MongoGraphStorage, PGGraphStorage.
- QdrantVectorDBStorage uses shared collections with payload filtering for unlimited workspace scalability.
- For relational databases, a workspace field is added to the tables for logical data separation: PGKVStorage, PGVectorStorage, PGDocStatusStorage.
- For graph databases, it is achieved through labels: Neo4JStorage, MemgraphStorage.
- For OpenSearch, a workspace prefix is added to the index name: OpenSearchKVStorage, OpenSearchDocStatusStorage, OpenSearchGraphStorage, OpenSearchVectorDBStorage.

To maintain compatibility with legacy data, the default workspace for PostgreSQL is `default` and for Neo4j is `base` when no workspace is configured. For all external storages, the system provides dedicated workspace environment variables to override the common `WORKSPACE` environment variable. These storage-specific workspace environment variables are: REDIS_WORKSPACE, MILVUS_WORKSPACE, QDRANT_WORKSPACE, MONGODB_WORKSPACE, POSTGRES_WORKSPACE, NEO4J_WORKSPACE, MEMGRAPH_WORKSPACE, OPENSEARCH_WORKSPACE.
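For example, when two instances share one PostgreSQL server but must keep their data separate, each instance can pin its own storage-specific workspace in its .env file (the workspace names here are illustrative):

```
# Instance 1 (.env)
POSTGRES_WORKSPACE=space1

# Instance 2 (.env)
POSTGRES_WORKSPACE=space2
```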
The LightRAG Server can operate in the Gunicorn + Uvicorn preload mode. Gunicorn's multi-worker (multiprocess) capability prevents document indexing tasks from blocking RAG queries. CPU-intensive document extraction tools, such as docling, can otherwise block the entire system in pure Uvicorn mode.
Though LightRAG Server uses one worker to process the document indexing pipeline, with the async task support of Uvicorn, multiple files can be processed in parallel. The bottleneck of document indexing speed mainly lies with the LLM. If your LLM supports high concurrency, you can accelerate document indexing by increasing the concurrency level of the LLM. Below are several environment variables related to concurrent processing, along with their default values:
```
### Number of worker processes, not greater than (2 x number_of_cores) + 1
WORKERS=2
### Number of parallel files to process in one batch
MAX_PARALLEL_INSERT=2
### Max concurrent requests to the LLM
MAX_ASYNC=4
```
Create your service file lightrag.service from the sample file lightrag.service.example, then modify the start options in the service file:
```
# Set Environment to your Python virtual environment
Environment="PATH=/home/netman/lightrag-xyj/venv/bin"
WorkingDirectory=/home/netman/lightrag-xyj
# ExecStart=/home/netman/lightrag-xyj/venv/bin/lightrag-server
ExecStart=/home/netman/lightrag-xyj/venv/bin/lightrag-gunicorn
```
The ExecStart command must be either `lightrag-gunicorn` or `lightrag-server`; no wrapper scripts are allowed, because service termination requires the main process to be one of these two executables.
Install LightRAG service. If your system is Ubuntu, the following commands will work:
```bash
sudo cp lightrag.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start lightrag.service
sudo systemctl status lightrag.service
sudo systemctl enable lightrag.service
```
We provide Ollama-compatible interfaces for LightRAG, aiming to emulate LightRAG as an Ollama chat model. This allows AI chat frontends supporting Ollama, such as Open WebUI, to access LightRAG easily.
After starting the lightrag-server, you can add an Ollama-type connection in the Open WebUI admin panel. And then a model named lightrag:latest will appear in Open WebUI's model management interface. Users can then send queries to LightRAG through the chat interface. You should install LightRAG as a service for this use case.
Open WebUI uses an LLM for session title and session keyword generation, so the Ollama chat completion API detects Open WebUI session-related requests and forwards them directly to the underlying LLM. Screenshot from Open WebUI:

The default query mode is hybrid if you send a message (query) from the Ollama interface of LightRAG. You can select query mode by sending a message with a query prefix.
A query prefix in the query string can determine which LightRAG query mode is used to generate the response for the query. The supported prefixes include:
/local /global /hybrid /naive /mix /bypass /context /localcontext /globalcontext /hybridcontext /naivecontext /mixcontext
For example, the chat message /mix What's LightRAG? will trigger a mix mode query for LightRAG. A chat message without a query prefix will trigger a hybrid mode query by default.
/bypass is not a LightRAG query mode; it will tell the API Server to pass the query directly to the underlying LLM, including the chat history. So the user can use the LLM to answer questions based on the chat history. If you are using Open WebUI as a front end, you can just switch the model to a normal LLM instead of using the /bypass prefix.
/context is also not a LightRAG query mode; it will tell LightRAG to return only the context information prepared for the LLM. You can check the context if it's what you want, or process the context by yourself.
When using LightRAG for content queries, avoid combining the search process with unrelated output processing, as this significantly impacts query effectiveness. User prompt is specifically designed to address this issue — it does not participate in the RAG retrieval phase, but rather guides the LLM on how to process the retrieved results after the query is completed. We can append square brackets to the query prefix to provide the LLM with the user prompt:
```
/[Use mermaid format for diagrams] Please draw a character relationship diagram for Scrooge
/mix[Use mermaid format for diagrams] Please draw a character relationship diagram for Scrooge
```
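As an illustration of this prefix convention, here is a minimal parser sketch. The helper name and regex are hypothetical, not the server's actual implementation; it merely shows how a chat message splits into mode, optional user prompt, and query:

```python
import re

# Query-mode prefixes supported by the Ollama-compatible API
PREFIXES = {
    "local", "global", "hybrid", "naive", "mix", "bypass", "context",
    "localcontext", "globalcontext", "hybridcontext", "naivecontext",
    "mixcontext",
}

def split_query(message: str):
    """Split '/mode[user prompt] query' into (mode, user_prompt, query)."""
    m = re.match(r"^/(\w*)(?:\[(.*?)\])?\s*(.*)$", message, re.DOTALL)
    if m and (m.group(1) in PREFIXES or m.group(1) == ""):
        # A bare '/[...]' prefix keeps the default hybrid mode
        return m.group(1) or "hybrid", m.group(2), m.group(3)
    return "hybrid", None, message  # no prefix: default hybrid mode

print(split_query("/mix[Use mermaid format] Draw a diagram"))
# ('mix', 'Use mermaid format', 'Draw a diagram')
```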
By default, the LightRAG Server can be accessed without any authentication. We can configure the server with an API Key or account credentials to secure it.
```
LIGHTRAG_API_KEY=your-secure-api-key-here
WHITELIST_PATHS=/health,/api/*
```
Health check and Ollama emulation endpoints are excluded from the API Key check by default. For security reasons, remove `/api/*` from `WHITELIST_PATHS` if the Ollama service is not required.
The API key is passed using the request header X-API-Key. Below is an example of accessing the LightRAG Server via API:
```bash
curl -X 'POST' \
  'http://localhost:9621/documents/scan' \
  -H 'accept: application/json' \
  -H 'X-API-Key: your-secure-api-key-here-123' \
  -d ''
```
LightRAG API Server implements JWT-based authentication using the HS256 algorithm. To enable secure access control, the following environment variables are required:
```
# For JWT auth
AUTH_ACCOUNTS='admin:admin123,user1:pass456'
TOKEN_SECRET='your-key'
TOKEN_EXPIRE_HOURS=4
```
Currently, only the configuration of an administrator account and password is supported. A comprehensive account system is yet to be developed and implemented.
If account credentials are not configured, the Web UI will access the system as a guest. Therefore, even if only an API Key is configured, all APIs can still be accessed through the guest account, which remains insecure. Hence, to safeguard the API, it is necessary to configure both authentication methods simultaneously.
Azure OpenAI API can be created using the following commands in Azure CLI (you need to install Azure CLI first from https://docs.microsoft.com/en-us/cli/azure/install-azure-cli):
```bash
# Change the resource group name, location, and OpenAI resource name as needed
RESOURCE_GROUP_NAME=LightRAG
LOCATION=swedencentral
RESOURCE_NAME=LightRAG-OpenAI

az login
az group create --name $RESOURCE_GROUP_NAME --location $LOCATION
az cognitiveservices account create --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --kind OpenAI --sku S0 --location swedencentral
az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name gpt-4o --model-name gpt-4o --model-version "2024-08-06" --sku-capacity 100 --sku-name "Standard"
az cognitiveservices account deployment create --resource-group $RESOURCE_GROUP_NAME --model-format OpenAI --name $RESOURCE_NAME --deployment-name text-embedding-3-large --model-name text-embedding-3-large --model-version "1" --sku-capacity 80 --sku-name "Standard"
az cognitiveservices account show --name $RESOURCE_NAME --resource-group $RESOURCE_GROUP_NAME --query "properties.endpoint"
az cognitiveservices account keys list --name $RESOURCE_NAME -g $RESOURCE_GROUP_NAME
```
The output of the last command will give you the endpoint and the key for the OpenAI API. You can use these values to set the environment variables in the .env file.
```
# Azure OpenAI Configuration in .env
LLM_BINDING=azure_openai
LLM_BINDING_HOST=your-azure-endpoint
LLM_MODEL=your-model-deployment-name
LLM_BINDING_API_KEY=your-azure-api-key
### API version is optional, defaults to latest version
AZURE_OPENAI_API_VERSION=2024-08-01-preview
### If using Azure OpenAI for embeddings
EMBEDDING_BINDING=azure_openai
EMBEDDING_MODEL=your-embedding-deployment-name
```
The API Server can be configured in three ways (highest priority first):

1. Command-line arguments
2. Environment variables (including those loaded from the .env file)
3. Configuration file (config.ini)
Most of the configurations come with default settings; check out the details in the sample file env.example. Data storage configuration can also be set by config.ini. A sample file config.ini.example is provided for your convenience.
LightRAG supports binding to various LLM/Embedding backends:
Use environment variables LLM_BINDING or CLI argument --llm-binding to select the LLM backend type. Use environment variables EMBEDDING_BINDING or CLI argument --embedding-binding to select the Embedding backend type.
For LLM and embedding configuration examples, please refer to the env.example file in the project's root directory. To view the complete list of configurable options for OpenAI and Ollama-compatible LLM interfaces, use the following commands:
```bash
lightrag-server --llm-binding openai --help
lightrag-server --llm-binding ollama --help
lightrag-server --embedding-binding ollama --help
```
Please use the OpenAI-compatible method to access LLMs deployed by OpenRouter or vLLM/SGLang. You can pass additional parameters to OpenRouter or vLLM/SGLang through the `OPENAI_LLM_EXTRA_BODY` environment variable to disable reasoning mode or achieve other personalized controls.
Set max_tokens to prevent excessively long or endless output loops in LLM responses during the entity-relationship extraction phase. The purpose of the max_tokens parameter is to truncate LLM output before timeouts occur, thereby preventing document extraction failures. This addresses issues where certain text blocks (e.g., tables or citations) containing numerous entities and relationships can lead to overly long or even endless outputs from LLMs. This setting is particularly crucial for locally deployed, smaller-parameter models. A suitable max_tokens value can be calculated as LLM_TIMEOUT × llm_output_tokens_per_second (e.g., 180 s × 50 tokens/s = 9000).
```
# For vLLM/SGLang deployed models, or most OpenAI-compatible API providers
OPENAI_LLM_MAX_TOKENS=9000
# For Ollama deployed models
OLLAMA_LLM_NUM_PREDICT=9000
# For OpenAI o1-mini or newer models
OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
```
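As a quick sanity check of the formula above, the budget can be computed directly. The 50 tokens/s throughput figure is an assumed value for a local model; measure your own deployment's output speed:

```python
def max_tokens_budget(llm_timeout_s: int, tokens_per_second: int) -> int:
    """Truncation budget: tokens the LLM can emit before the timeout fires."""
    return llm_timeout_s * tokens_per_second

# Example from the text: LLM_TIMEOUT=180 s at ~50 tokens/s
print(max_tokens_budget(180, 50))  # 9000
```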
It's very common to set ENABLE_LLM_CACHE_FOR_EXTRACT to true for a test environment to reduce the cost of LLM calls.
LightRAG uses 4 types of storage for different purposes:

- KV_STORAGE: LLM response cache, text chunks, and document information
- VECTOR_STORAGE: entity vectors, relation vectors, and chunk vectors
- GRAPH_STORAGE: the entity-relationship graph
- DOC_STATUS_STORAGE: document indexing status
LightRAG Server offers various storage implementations, with the default being an in-memory database that persists data to the WORKING_DIR directory. Additionally, LightRAG supports a wide range of storage solutions including PostgreSQL, MongoDB, FAISS, Milvus, Qdrant, Neo4j, Memgraph, and Redis. For detailed information on supported storage options, please refer to the storage section in the README.md file located in the root directory.
Milvus Index Configuration: LightRAG now supports configurable index types for Milvus vector storage (AUTOINDEX, HNSW, HNSW_SQ, IVF_FLAT, etc.) through environment variables. HNSW_SQ requires Milvus 2.6.8+ and provides significant memory savings. See the "Using Milvus for Vector Storage" section in the main README.md for complete configuration options.
You can select the storage implementation by configuring environment variables. For instance, prior to the initial launch of the API server, you can set the following environment variable to specify your desired storage implementation:
```
LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_VECTOR_STORAGE=PGVectorStorage
LIGHTRAG_GRAPH_STORAGE=PGGraphStorage
LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage
```
You cannot change storage implementation selection after adding documents to LightRAG. Data migration from one storage implementation to another is not supported yet. For further information, please read the sample env file or config.ini file.
When switching the storage implementation in LightRAG, the LLM cache can be migrated from the existing storage to the new one. Subsequently, when re-uploading files to the new storage, the pre-existing LLM cache will significantly accelerate file processing. For detailed instructions on using the LLM cache migration tool, please refer to README_MIGRATE_LLM_CACHE.md.
| Parameter | Default | Description |
|---|---|---|
| --host | 0.0.0.0 | Server host |
| --port | 9621 | Server port |
| --working-dir | ./rag_storage | Working directory for RAG storage |
| --input-dir | ./inputs | Directory containing input documents |
| --max-async | 4 | Maximum number of async operations |
| --log-level | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| --verbose | - | Verbose debug output (True, False) |
| --key | None | API key for authentication. Protects the LightRAG server against unauthorized access |
| --ssl | False | Enable HTTPS |
| --ssl-certfile | None | Path to SSL certificate file (required if --ssl is enabled) |
| --ssl-keyfile | None | Path to SSL private key file (required if --ssl is enabled) |
| --llm-binding | ollama | LLM binding type (lollms, ollama, openai, openai-ollama, azure_openai, aws_bedrock) |
| --embedding-binding | ollama | Embedding binding type (lollms, ollama, openai, azure_openai, aws_bedrock) |
Reranking query-recalled chunks can significantly enhance retrieval quality by re-ordering documents based on an optimized relevance scoring model. LightRAG currently supports the following rerank providers:
- cohere: via the v2/rerank endpoint. As vLLM provides a Cohere-compatible reranker API, all reranker models deployed via vLLM are also supported.
- jina: Jina AI rerank API.
- aliyun: Aliyun DashScope rerank service.

The rerank provider is configured via the .env file. Below is an example configuration for a rerank model deployed locally using vLLM:
```
RERANK_BINDING=cohere
RERANK_MODEL=BAAI/bge-reranker-v2-m3
RERANK_BINDING_HOST=http://localhost:8000/rerank
RERANK_BINDING_API_KEY=your_rerank_api_key_here
```
Here is an example configuration for utilizing the Reranker service provided by Aliyun:
```
RERANK_BINDING=aliyun
RERANK_MODEL=gte-rerank-v2
RERANK_BINDING_HOST=https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank
RERANK_BINDING_API_KEY=your_rerank_api_key_here
```
For comprehensive reranker configuration examples, please refer to the env.example file.
Reranking can be enabled or disabled on a per-query basis.
The /query and /query/stream API endpoints include an enable_rerank parameter, which is set to true by default, controlling whether reranking is active for the current query. To change the default value of the enable_rerank parameter to false, set the following environment variable:
```
RERANK_BY_DEFAULT=False
```
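For a per-query override, the flag travels in the request body of /query. Here is a sketch of the JSON payload only (field names as described above; actually sending it requires a running server):

```python
import json

# Disable rerank for this single query, regardless of RERANK_BY_DEFAULT
payload = {
    "query": "What is LightRAG?",
    "mode": "mix",
    "enable_rerank": False,
}
body = json.dumps(payload)
print(body)
```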
By default, the /query and /query/stream endpoints return references with only reference_id and file_path. For evaluation, debugging, or citation purposes, you can request the actual retrieved chunk content to be included in references.
The include_chunk_content parameter (default: false) controls whether the actual text content of retrieved chunks is included in the response references. This is particularly useful for:
Important: The content field is an array of strings, where each string represents a chunk from the same file. A single file may correspond to multiple chunks, so the content is returned as a list to preserve chunk boundaries.
Example API Request:

```json
{
  "query": "What is LightRAG?",
  "mode": "mix",
  "include_references": true,
  "include_chunk_content": true
}
```
Example Response (with chunk content):

```json
{
  "response": "LightRAG is a graph-based RAG system...",
  "references": [
    {
      "reference_id": "1",
      "file_path": "/documents/intro.md",
      "content": [
        "LightRAG is a retrieval-augmented generation system that combines knowledge graphs with vector similarity search...",
        "The system uses a dual-indexing approach with both vector embeddings and graph structures for enhanced retrieval..."
      ]
    },
    {
      "reference_id": "2",
      "file_path": "/documents/features.md",
      "content": [
        "The system provides multiple query modes including local, global, hybrid, and mix modes..."
      ]
    }
  ]
}
```
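Client-side, each reference's content array can be flattened while keeping chunk boundaries visible. A small sketch over the response shape shown above (the placeholder chunk strings are illustrative):

```python
# References as parsed from a /query response
references = [
    {"reference_id": "1", "file_path": "/documents/intro.md",
     "content": ["chunk A", "chunk B"]},
    {"reference_id": "2", "file_path": "/documents/features.md",
     "content": ["chunk C"]},
]

# Join each reference's chunks with a blank line to preserve chunk boundaries
joined = {r["reference_id"]: "\n\n".join(r["content"]) for r in references}
print(joined["1"])
```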
Notes:

- Chunk content is only returned when `include_references=true`. Setting `include_chunk_content=true` without including references has no effect.
- Earlier versions returned `content` as a single concatenated string. Now it returns an array of strings to preserve individual chunk boundaries. If you need a single string, join the array elements with your preferred separator (e.g., `"\n\n".join(content)`).

### Server Configuration
```
# HOST=0.0.0.0
PORT=9621
WORKERS=2

### Settings for document indexing
ENABLE_LLM_CACHE_FOR_EXTRACT=true
SUMMARY_LANGUAGE=Chinese
MAX_PARALLEL_INSERT=2

### LLM Configuration (use a valid host; for local services installed with Docker, you can use host.docker.internal)
TIMEOUT=150
MAX_ASYNC=4
LLM_BINDING=openai
LLM_MODEL=gpt-4o-mini
LLM_BINDING_HOST=https://api.openai.com/v1
LLM_BINDING_API_KEY=your-api-key

### Embedding Configuration (use a valid host; for local services installed with Docker, you can use host.docker.internal)
# See also env.ollama-binding-options.example for fine-tuning Ollama
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://localhost:11434

### For JWT Auth
# AUTH_ACCOUNTS='admin:admin123,user1:pass456'
# TOKEN_SECRET=your-key-for-LightRAG-API-Server-xxx
# TOKEN_EXPIRE_HOURS=48

# LIGHTRAG_API_KEY=your-secure-api-key-here-123
# WHITELIST_PATHS=/api/*
# WHITELIST_PATHS=/health,/api/*
```
The document processing pipeline in LightRAG is somewhat complex and is divided into two primary stages: the Extraction stage (entity and relationship extraction) and the Merging stage (entity and relationship merging). Two key parameters control pipeline concurrency: the maximum number of files processed in parallel (MAX_PARALLEL_INSERT) and the maximum number of concurrent LLM requests (MAX_ASYNC); LLM requests from all concurrently processed files share the MAX_ASYNC budget. Large files should be divided into smaller segments to enable incremental processing. Reprocessing of failed files can be initiated by pressing the "Scan" button on the Web UI.
All servers (LoLLMs, Ollama, OpenAI, and Azure OpenAI) provide the same REST API endpoints for RAG functionality. When the API Server is running, visit the interactive API documentation (Swagger UI) at http://localhost:9621/docs.
You can test the API endpoints using the provided curl commands or through the Swagger UI interface. Make sure the server is running and, if authentication is configured, include your API key or token in each request.
LightRAG implements asynchronous document indexing to enable frontend monitoring and querying of document processing progress. Upon uploading files or inserting text through designated endpoints, a unique Track ID is returned to facilitate real-time progress monitoring.
API Endpoints Supporting Track ID Generation:
- /documents/upload
- /documents/text
- /documents/texts

Document Processing Status Query Endpoint:

- /track_status/{track_id}

This endpoint provides comprehensive status information including: