LightRAG/lightrag/tools/README_CLEAN_LLM_QUERY_CACHE.md

yangdx<gzdaniel@me.com>

Add LLM query cache cleanup tool for KV storage backends

1485cb82


LLM Query Cache Cleanup Tool - User Guide

Overview

This tool deletes LightRAG's LLM query caches from the supported KV storage backends. It targets only the caches generated during RAG query operations (modes: mix, hybrid, local, global), covering both query result caches and keywords extraction caches.

Supported Storage Types

JsonKVStorage - File-based JSON storage
RedisKVStorage - Redis database storage
PGKVStorage - PostgreSQL database storage
MongoKVStorage - MongoDB database storage

Cache Types

The tool cleans up the following query cache types:

Query Cache Modes (4 types)

mix:* - Mixed mode query caches
hybrid:* - Hybrid mode query caches
local:* - Local mode query caches
global:* - Global mode query caches

Cache Content Types (2 types)

*:query:* - Query result caches
*:keywords:* - Keywords extraction caches

Cache Key Format


<mode>:<cache_type>:<hash>

Examples:

mix:query:5ce04d25e957c290216cee5bfe6344fa
mix:keywords:fee77b98244a0b047ce95e21060de60e
global:query:abc123def456...
local:keywords:789xyz...

Important Note: This tool does NOT clean extraction caches (default:extract:* and default:summary:*). Use the migration tool or manual deletion for those caches.
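
To make the key format and the scope of the tool concrete, here is a minimal sketch of how a key could be classified. It is not the tool's actual code: the names `parse_cache_key` and `is_query_cache_key` are hypothetical, and it assumes the key carries no storage-specific prefix.

```python
from typing import NamedTuple, Optional

# Modes and cache types listed in this guide.
QUERY_MODES = {"mix", "hybrid", "local", "global"}
QUERY_CACHE_TYPES = {"query", "keywords"}


class CacheKey(NamedTuple):
    mode: str
    cache_type: str
    digest: str


def parse_cache_key(key: str) -> Optional[CacheKey]:
    """Split '<mode>:<cache_type>:<hash>' into its parts, or return None."""
    parts = key.split(":", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return CacheKey(*parts)


def is_query_cache_key(key: str) -> bool:
    """True only for query caches; extraction caches (default:*) are excluded."""
    parsed = parse_cache_key(key)
    return (
        parsed is not None
        and parsed.mode in QUERY_MODES
        and parsed.cache_type in QUERY_CACHE_TYPES
    )
```

With this sketch, `mix:query:...` and `local:keywords:...` keys are classified as query caches, while `default:extract:...` and `default:summary:...` keys are not.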

Prerequisites

The tool reads storage configuration from environment variables or config.ini
Ensure the target storage is properly configured and accessible
Back up important data before running cleanup operations

Usage

Basic Usage

Run from the LightRAG project root directory:


python -m lightrag.tools.clean_llm_query_cache
# or
python lightrag/tools/clean_llm_query_cache.py

Interactive Workflow

The tool guides you through the following steps:

1. Select Storage Type


============================================================
LLM Query Cache Cleanup Tool - LightRAG
============================================================

=== Storage Setup ===

Supported KV Storage Types:
[1] JsonKVStorage
[2] RedisKVStorage
[3] PGKVStorage
[4] MongoKVStorage

Select storage type (1-4) (Press Enter to exit): 1

Note: You can press Enter or type 0 at any prompt to exit gracefully.

2. Storage Validation

The tool will:

Check required environment variables
Auto-detect workspace configuration
Initialize and connect to storage
Verify connection status


Checking configuration...
✓ All required environment variables are set

Initializing storage...
- Storage Type: JsonKVStorage
- Workspace: space1
- Connection Status: ✓ Success

3. View Cache Statistics

The tool displays a detailed breakdown of query caches by mode and type:


Counting query cache records...

📊 Query Cache Statistics (Before Cleanup):
┌────────────┬────────────┬────────────┬────────────┐
│ Mode       │ Query      │ Keywords   │ Total      │
├────────────┼────────────┼────────────┼────────────┤
│ mix        │      1,234 │        567 │      1,801 │
│ hybrid     │        890 │        423 │      1,313 │
│ local      │      2,345 │      1,123 │      3,468 │
│ global     │        678 │        345 │      1,023 │
├────────────┼────────────┼────────────┼────────────┤
│ Total      │      5,147 │      2,458 │      7,605 │
└────────────┴────────────┴────────────┴────────────┘
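
The per-mode breakdown shown above can be reproduced from a list of cache keys. The following is a rough sketch under the assumption that keys follow the `<mode>:<cache_type>:<hash>` format; the function name `count_query_caches` is hypothetical and is not part of the tool.

```python
from collections import Counter
from typing import Dict, Iterable

MODES = ("mix", "hybrid", "local", "global")
CACHE_TYPES = ("query", "keywords")


def count_query_caches(keys: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count keys per mode and cache type, skipping anything that is not a query cache."""
    counts: Counter = Counter()
    for key in keys:
        parts = key.split(":", 2)
        if len(parts) == 3 and parts[0] in MODES and parts[1] in CACHE_TYPES:
            counts[(parts[0], parts[1])] += 1

    table: Dict[str, Dict[str, int]] = {}
    for mode in MODES:
        # One row per mode, with a per-row total like the table above.
        row = {cache_type: counts[(mode, cache_type)] for cache_type in CACHE_TYPES}
        row["total"] = sum(row.values())
        table[mode] = row
    return table
```

Keys that do not match a known mode and cache type (for example `default:extract:*`) are ignored, which mirrors the scope of the tool.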

4. Select Cleanup Scope

Choose what type of caches to delete:


=== Cleanup Options ===
[1] Delete all query caches (both query and keywords)
[2] Delete query caches only (keep keywords)
[3] Delete keywords caches only (keep query)
[0] Cancel

Select cleanup option (0-3): 1

Cleanup Types:

Option 1 (all): Deletes both query and keywords caches across all modes
Option 2 (query): Deletes only query caches, preserves keywords caches
Option 3 (keywords): Deletes only keywords caches, preserves query caches
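
One way to picture the three options is as a mapping from cleanup type to the key prefixes that get deleted. This is an illustrative sketch only; `prefixes_for` and `CACHE_TYPES_BY_CLEANUP` are hypothetical names, not the tool's internals.

```python
from typing import List

MODES = ("mix", "hybrid", "local", "global")

# Which cache types each cleanup option covers.
CACHE_TYPES_BY_CLEANUP = {
    "all": ("query", "keywords"),
    "query": ("query",),
    "keywords": ("keywords",),
}


def prefixes_for(cleanup_type: str) -> List[str]:
    """Return the '<mode>:<cache_type>:' prefixes covered by a cleanup option."""
    if cleanup_type not in CACHE_TYPES_BY_CLEANUP:
        raise ValueError(f"Unknown cleanup type: {cleanup_type!r}")
    return [
        f"{mode}:{cache_type}:"
        for mode in MODES
        for cache_type in CACHE_TYPES_BY_CLEANUP[cleanup_type]
    ]
```

Under this mapping, option 1 expands to eight prefixes (four modes times two cache types), and options 2 and 3 expand to four prefixes each.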

5. Confirm Deletion

Review the cleanup plan and confirm:


============================================================
Cleanup Confirmation
============================================================
Storage: JsonKVStorage (workspace: space1)
Cleanup Type: all
Records to Delete: 7,605 / 7,605

⚠️  WARNING: This will delete ALL query caches across all modes!

Continue with deletion? (y/n): y

6. Execute Cleanup

The tool performs batch deletion with real-time progress:

JsonKVStorage Example:


=== Starting Cleanup ===
💡 Processing 1,000 records at a time from JsonKVStorage

Batch 1/8: ████░░░░░░░░░░░░░░░░ 1,000/7,605 (13.1%) ✓
Batch 2/8: ████████░░░░░░░░░░░░ 2,000/7,605 (26.3%) ✓
...
Batch 8/8: ████████████████████ 7,605/7,605 (100.0%) ✓

Persisting changes to storage...
✓ Changes persisted successfully

RedisKVStorage Example:


=== Starting Cleanup ===
💡 Processing Redis keys in batches of 1,000

Batch 1: Deleted 1,000 keys (Total: 1,000) ✓
Batch 2: Deleted 1,000 keys (Total: 2,000) ✓
...

PostgreSQL Example:


=== Starting Cleanup ===
💡 Executing PostgreSQL DELETE query

✓ Deleted 7,605 records in 0.45s

MongoDB Example:


=== Starting Cleanup ===
💡 Executing MongoDB deleteMany operations

Pattern 1/8: Deleted 1,234 records ✓
Pattern 2/8: Deleted 567 records ✓
...
Total deleted: 7,605 records

7. Review Cleanup Report

The tool provides a comprehensive final report:

Successful Cleanup:


============================================================
Cleanup Complete - Final Report
============================================================

📊 Statistics:
  Total records to delete:  7,605
  Total batches:            8
  Successful batches:       8
  Failed batches:           0
  Successfully deleted:     7,605
  Failed to delete:         0
  Success rate:             100.00%

📈 Before/After Comparison:
  Total caches before:      7,605
  Total caches after:       0
  Net reduction:            7,605

============================================================
✓ SUCCESS: All records cleaned up successfully!
============================================================

📊 Query Cache Statistics (After Cleanup):
┌────────────┬────────────┬────────────┬────────────┐
│ Mode       │ Query      │ Keywords   │ Total      │
├────────────┼────────────┼────────────┼────────────┤
│ mix        │          0 │          0 │          0 │
│ hybrid     │          0 │          0 │          0 │
│ local      │          0 │          0 │          0 │
│ global     │          0 │          0 │          0 │
├────────────┼────────────┼────────────┼────────────┤
│ Total      │          0 │          0 │          0 │
└────────────┴────────────┴────────────┴────────────┘

Cleanup with Errors:


============================================================
Cleanup Complete - Final Report
============================================================

📊 Statistics:
  Total records to delete:  7,605
  Total batches:            8
  Successful batches:       7
  Failed batches:           1
  Successfully deleted:     6,605
  Failed to delete:         1,000
  Success rate:             86.85%

📈 Before/After Comparison:
  Total caches before:      7,605
  Total caches after:       1,000
  Net reduction:            6,605

⚠️  Errors encountered: 1

Error Details:
------------------------------------------------------------

Error Summary:
  - ConnectionError: 1 occurrence(s)

First 5 errors:

  1. Batch 3
     Type: ConnectionError
     Message: Connection timeout after 30s
     Records lost: 1,000

============================================================
⚠️  WARNING: Cleanup completed with errors!
   Please review the error details above.
============================================================

Technical Details

Workspace Handling

The tool retrieves workspace in the following priority order:

1. Storage-specific workspace environment variables
   - PGKVStorage: POSTGRES_WORKSPACE
   - MongoKVStorage: MONGODB_WORKSPACE
   - RedisKVStorage: REDIS_WORKSPACE
2. Generic workspace environment variable
   - WORKSPACE
3. Default value
   - Empty string (uses storage's default workspace)
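
The priority order above can be sketched as a small lookup function. This is a hedged illustration, not the tool's implementation: `resolve_workspace` is a hypothetical name, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Storage-specific workspace variables named in this guide.
WORKSPACE_ENV_VARS = {
    "PGKVStorage": "POSTGRES_WORKSPACE",
    "MongoKVStorage": "MONGODB_WORKSPACE",
    "RedisKVStorage": "REDIS_WORKSPACE",
}


def resolve_workspace(storage_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Storage-specific variable first, then WORKSPACE, then an empty string."""
    env = os.environ if env is None else env
    specific_var = WORKSPACE_ENV_VARS.get(storage_name)
    # Assumption: an empty value counts as "not set".
    if specific_var and env.get(specific_var):
        return env[specific_var]
    return env.get("WORKSPACE", "")
```

For example, with both `POSTGRES_WORKSPACE=pg_space` and `WORKSPACE=space1` set, the sketch resolves `PGKVStorage` to `pg_space` and `JsonKVStorage` to `space1`.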

Batch Deletion

Default batch size: 1000 records/batch
Prevents memory overflow and connection timeouts
Each batch is processed independently
Failed batches are logged but don't stop cleanup
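
The batching behavior described above (independent batches, failures recorded without stopping the run) can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: `delete_in_batches`, `CleanupStats`, and the `delete_batch` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class CleanupStats:
    total_records: int = 0
    successful_batches: int = 0
    failed_batches: int = 0
    deleted: int = 0
    failed: int = 0
    errors: List[str] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        """Percentage of records deleted; 100.0 when there was nothing to delete."""
        if not self.total_records:
            return 100.0
        return 100.0 * self.deleted / self.total_records


def delete_in_batches(
    keys: Sequence[str],
    delete_batch: Callable[[Sequence[str]], None],
    batch_size: int = 1000,
) -> CleanupStats:
    """Delete keys batch by batch; a failed batch is recorded and the loop continues."""
    stats = CleanupStats(total_records=len(keys))
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        batch_no = start // batch_size + 1
        try:
            delete_batch(batch)
        except Exception as exc:  # record the failure, keep going
            stats.failed_batches += 1
            stats.failed += len(batch)
            stats.errors.append(f"Batch {batch_no}: {type(exc).__name__}: {exc}")
        else:
            stats.successful_batches += 1
            stats.deleted += len(batch)
    return stats
```

With 7,605 keys, a batch size of 1,000, and a callback that fails only on the third batch, this sketch reports 7 successful batches, 1 failed batch, and 6,605 deleted records, matching the figures in the "Cleanup with Errors" report.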

Storage-Specific Deletion Strategies

JsonKVStorage

Collects all matching keys first (snapshot approach)
Deletes in batches with lock protection
Fast in-memory operations

RedisKVStorage

Uses SCAN with pattern matching
Pipeline DELETE for batch operations
Cursor-based iteration for large datasets
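
The cursor-based iteration can be sketched as the loop below. This is not the tool's code: it assumes a client whose `scan(cursor=..., match=..., count=...)` and `delete(*keys)` coroutines behave like those of `redis.asyncio.Redis`, and for brevity it issues one multi-key delete per SCAN page instead of the pipeline mentioned above. `FakeRedis` is a hypothetical in-memory stand-in that exists only so the sketch runs without a server.

```python
import asyncio
import fnmatch
from typing import Any, Iterable, List, Tuple


async def scan_and_delete(client: Any, pattern: str, batch_size: int = 1000) -> int:
    """Delete every key matching `pattern`, following the SCAN cursor until it returns 0."""
    deleted = 0
    cursor = 0
    while True:
        cursor, keys = await client.scan(cursor=cursor, match=pattern, count=batch_size)
        if keys:
            deleted += await client.delete(*keys)
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            break
    return deleted


class FakeRedis:
    """Hypothetical in-memory stand-in; it only mimics scan() and delete()."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._snapshot = sorted(keys)  # fixed order so integer cursors stay stable
        self.keys = set(self._snapshot)

    async def scan(self, cursor: int = 0, match: str = "*", count: int = 10) -> Tuple[int, List[str]]:
        window = self._snapshot[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._snapshot):
            next_cursor = 0
        return next_cursor, [k for k in window if fnmatch.fnmatchcase(k, match)]

    async def delete(self, *keys: str) -> int:
        removed = self.keys.intersection(keys)
        self.keys -= removed
        return len(removed)
```

A call such as `asyncio.run(scan_and_delete(FakeRedis(keys), "ns:mix:query:*"))` exercises the loop. Against a real server, the same function would be passed a connected client instead of `FakeRedis`.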

PostgreSQL

Single DELETE query with OR conditions
Efficient server-side bulk deletion
Uses LIKE patterns for mode/type matching

MongoDB

Multiple deleteMany operations (one per pattern)
Regex-based document matching
Returns exact deletion counts

Pattern Matching Implementation

JsonKVStorage:


# Direct key prefix matching (str.startswith accepts a tuple of prefixes)
is_query_cache = key.startswith(("mix:query:", "mix:keywords:"))

RedisKVStorage:


# SCAN with namespace-prefixed patterns
pattern = f"{namespace}:mix:query:*"
cursor, keys = await redis.scan(cursor, match=pattern)

PostgreSQL:


-- SQL LIKE conditions
WHERE id LIKE 'mix:query:%' OR id LIKE 'mix:keywords:%'

MongoDB:


# Regex queries on _id field
{"_id": {"$regex": "^mix:query:"}}
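
The four matching styles above can all be derived from the same list of key prefixes. The helpers below are a sketch under assumptions, not the tool's implementation: the function names are hypothetical, and the `$1, $2, ...` placeholder style assumes an asyncpg-like PostgreSQL driver.

```python
import re
from typing import Dict, List, Tuple


def redis_patterns(namespace: str, prefixes: List[str]) -> List[str]:
    """Glob patterns for SCAN, with the namespace prepended to each prefix."""
    return [f"{namespace}:{prefix}*" for prefix in prefixes]


def postgres_where(prefixes: List[str]) -> Tuple[str, List[str]]:
    """A WHERE clause of OR-ed LIKE conditions plus its bound parameters."""
    if not prefixes:
        raise ValueError("at least one prefix is required")
    clause = " OR ".join(f"id LIKE ${i}" for i in range(1, len(prefixes) + 1))
    return f"WHERE {clause}", [f"{prefix}%" for prefix in prefixes]


def mongo_filters(prefixes: List[str]) -> List[Dict[str, Dict[str, str]]]:
    """One anchored regex filter on _id per prefix, for one deleteMany call each."""
    return [{"_id": {"$regex": "^" + re.escape(prefix)}} for prefix in prefixes]
```

Binding the LIKE patterns as parameters rather than formatting them into the SQL string is a design choice of this sketch. The prefixes used here contain no `%` or `_`, so no LIKE escaping is needed.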

Error Handling & Resilience

The tool implements comprehensive error tracking:

Batch-Level Error Tracking

Each batch is independently error-checked
Failed batches are logged with full details
Successful batches commit even if later batches fail
Real-time progress shows ✓ (success) or ✗ (failed)

Error Reporting

After cleanup completes, a detailed report includes:

Statistics: Total records, success/failure counts, success rate
Before/After Comparison: Net reduction in cache count
Error Summary: Grouped by error type with occurrence counts
Error Details: Batch number, error type, message, and the number of records in the failed batch (reported as "Records lost")
Recommendations: Clear indication of success or need for review

Verification

Post-cleanup count verification
Before/after statistics comparison
Identifies partial cleanup scenarios

Important Notes

1. Irreversible Operation
   - Deleted caches cannot be recovered
   - Always back up important data before cleanup
   - Test on non-production data first
2. Performance Impact
   - Query performance may degrade temporarily after cleanup
   - Caches will rebuild on subsequent queries
   - Consider cleanup during off-peak hours
3. Selective Cleanup
   - Choose cleanup scope carefully
   - Keywords caches may be valuable for future queries
   - Query caches rebuild faster than keywords caches
4. Workspace Isolation
   - Cleanup only affects the selected workspace
   - Other workspaces remain untouched
   - Verify workspace before confirming cleanup
5. Interrupt and Resume
   - Cleanup can be interrupted at any time (Ctrl+C)
   - Already deleted records cannot be recovered
   - No automatic resume; run the tool again to clean the remaining caches

Storage Configuration

The tool supports multiple configuration methods with the following priority:

Environment variables (highest priority)
config.ini file (medium priority)
Default values (lowest priority)

Environment Variable Configuration

Configure storage settings in your .env file:

Workspace Configuration (Optional)


# Generic workspace (shared by all storages)
WORKSPACE=space1

# Or configure independent workspace for specific storage
POSTGRES_WORKSPACE=pg_space
MONGODB_WORKSPACE=mongo_space
REDIS_WORKSPACE=redis_space

Workspace Priority: Storage-specific > Generic WORKSPACE > Empty string

JsonKVStorage


WORKING_DIR=./rag_storage

RedisKVStorage


REDIS_URI=redis://localhost:6379

PGKVStorage


POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=your_database

MongoKVStorage


MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG

config.ini Configuration

Alternatively, create a config.ini file in the project root:


[redis]
uri = redis://localhost:6379

[postgres]
host = localhost
port = 5432
user = postgres
password = yourpassword
database = lightrag

[mongodb]
uri = mongodb://root:root@localhost:27017/
database = LightRAG

Note: Environment variables take precedence over config.ini settings.
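
The precedence rule (environment variable, then config.ini, then default) can be sketched with the standard library `configparser`. This is an illustration only: `get_setting` is a hypothetical helper, and treating an empty environment variable as unset is an assumption.

```python
import configparser
import os
from typing import Mapping, Optional


def get_setting(
    env_var: str,
    section: str,
    option: str,
    default: str,
    config: configparser.ConfigParser,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Environment variable first, then config.ini, then the default value."""
    env = os.environ if env is None else env
    if env.get(env_var):
        return env[env_var]
    # fallback covers both a missing section and a missing option
    return config.get(section, option, fallback=default)


# Example: load config.ini if present; a missing file simply leaves the parser empty.
config = configparser.ConfigParser()
config.read("config.ini")
redis_uri = get_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379", config)
```

With `REDIS_URI` exported, its value wins. Otherwise the `[redis]` section's `uri` is used, and if neither exists the default is returned.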

Troubleshooting

Missing Environment Variables


⚠️  Warning: Missing environment variables: POSTGRES_USER, POSTGRES_PASSWORD

Solution: Add the missing variables to your .env file, or configure them in config.ini

Connection Failed


✗ Initialization failed: Connection refused

Solutions:

Check if database service is running
Verify connection parameters (host, port, credentials)
Check firewall settings
Ensure network connectivity for remote databases

No Caches Found


⚠️  No query caches found in storage

Possible Reasons:

No queries have been run yet
Caches were already cleaned
Wrong workspace selected
Different storage type was used for queries

Partial Cleanup


⚠️  WARNING: Cleanup completed with errors!

Solutions:

Check error details in the report
Verify storage connection stability
Re-run tool to clean remaining caches
Check storage capacity and permissions

Use Cases

Use Case 1: Clean All Query Caches

Scenario: Free up storage space by removing all query caches


# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Option 1 (all) -> Confirm (y)

Result: All query and keywords caches deleted, maximum storage freed

Use Case 2: Refresh Query Caches Only

Scenario: Force query cache rebuild while keeping keywords


# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Option 2 (query only) -> Confirm (y)

Result: Query caches deleted, keywords preserved for faster rebuild

Use Case 3: Clean Stale Keywords

Scenario: Remove outdated keywords while keeping recent query results


# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Option 3 (keywords only) -> Confirm (y)

Result: Keywords deleted, query caches preserved

Use Case 4: Workspace-Specific Cleanup

Scenario: Clean caches for a specific workspace


# Configure workspace
export WORKSPACE=development

# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Cleanup option -> Confirm (y)

Result: Only development workspace caches cleaned

Best Practices

1. Back Up Before Cleanup
   - Always back up your storage before major cleanup
   - Test cleanup on non-production data first
   - Document cleanup decisions
2. Monitor Performance
   - Watch storage metrics during cleanup
   - Monitor query performance after cleanup
   - Allow time for cache rebuild
3. Scheduled Cleanup
   - Clean caches periodically (weekly/monthly)
   - Automate cleanup for development environments
   - Keep production cleanup manual for safety
4. Selective Deletion
   - Consider cleanup scope based on needs
   - Keywords caches are harder to rebuild
   - Query caches rebuild automatically
5. Storage Capacity
   - Monitor storage usage trends
   - Clean caches before reaching capacity limits
   - Archive old data if needed

Comparison with Migration Tool

| Feature | Cleanup Tool | Migration Tool |
|---------|--------------|----------------|
| Purpose | Delete query caches | Migrate extraction caches |
| Modes | mix/hybrid/local/global | default |
| Cache Types | query, keywords | extract, summary |
| Operation | Deletion | Copy between storages |
| Reversible | No | Yes (source unchanged) |
| Use Case | Free storage, refresh caches | Change storage backend |

Limitations

1. Single Storage Operation
   - Can only clean one storage type at a time
   - To clean multiple storages, run tool multiple times
2. No Dry Run Mode
   - Deletion is immediate after confirmation
   - No preview-only mode available
   - Test on non-production first
3. No Selective Mode Cleanup
   - Cannot clean only specific modes (e.g., only mix)
   - Cleanup applies to all modes for selected cache type
   - All-or-nothing per cache type
4. No Scheduled Cleanup
   - Manual execution required
   - No built-in scheduling
   - Use cron/scheduler if automation needed
5. Verification Limitations
   - Post-cleanup verification may fail in error scenarios
   - Manual verification recommended for critical operations

Future Enhancements

Potential improvements for future versions:

Selective mode cleanup (e.g., clean only mix mode)
Age-based cleanup (delete caches older than X days)
Size-based cleanup (delete largest caches first)
Dry run mode for safe preview
Automated scheduling support
Cache statistics export
Incremental cleanup with pause/resume

Support

For issues, questions, or feature requests:

Check the error details in the cleanup report
Review storage configuration
Verify workspace settings
Test with a small dataset first
Report bugs through project issue tracker
