LLM Vision plugin for MarkItDown that extracts text from images embedded in PDF, DOCX, PPTX, and XLSX files.
Uses the same llm_client / llm_model pattern that MarkItDown already supports for image descriptions — no new ML libraries or binary dependencies required.
```bash
pip install markitdown-ocr
```
The plugin uses whatever OpenAI-compatible client you already have. Install one if you don't have it yet:
```bash
pip install openai
```
```bash
markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o
```
Pass llm_client and llm_model to MarkItDown() exactly as you would for image descriptions:
```python
from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
)

result = md.convert("document_with_images.pdf")
print(result.text_content)
```
If no `llm_client` is provided, the plugin still loads, but OCR is silently skipped and files fall back to the standard built-in converters.
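Conceptually, that skip is just a None-check before any LLM call. A minimal sketch, assuming a hypothetical `should_run_ocr` helper (the name is ours, not part of the plugin's API):

```python
def should_run_ocr(llm_client, llm_model) -> bool:
    """OCR requires both a client and a model; otherwise skip quietly."""
    return llm_client is not None and llm_model is not None

print(should_run_ocr(None, "gpt-4o"))      # False: no client, OCR is skipped
print(should_run_ocr(object(), "gpt-4o"))  # True: both settings present
```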
Override the default extraction prompt for specialized documents:
```python
md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),
    llm_model="gpt-4o",
    llm_prompt="Extract all text from this image, preserving table structure.",
)
```
Works with any client that follows the OpenAI API:
```python
from markitdown import MarkItDown
from openai import AzureOpenAI

md = MarkItDown(
    enable_plugins=True,
    llm_client=AzureOpenAI(
        api_key="...",
        azure_endpoint="https://your-resource.openai.azure.com/",
        api_version="2024-02-01",
    ),
    llm_model="gpt-4o",
)
```
When `MarkItDown(enable_plugins=True, llm_client=..., llm_model=...)` is called:

1. MarkItDown discovers the plugin through the `markitdown.plugin` entry point group.
2. It calls the plugin's `register_converters()`, forwarding all kwargs, including `llm_client` and `llm_model`.
3. The plugin builds an `LLMVisionOCRService` from those kwargs.

When a file is converted:

- **PDF**: embedded images (`page.images` / page XObjects) are extracted and OCR'd inline, interleaved with the surrounding text in vertical reading order.
- **DOCX**: images are pulled from the document relationships (`doc.part.rels`), and their OCR text is appended as `*[Image OCR]...[End OCR]*` blocks after conversion.
- **PPTX**: when an `llm_client` is configured, the LLM is asked for an image description first; OCR is used as the fallback when no description is returned.
- **XLSX**: anchored images (`sheet._images`) are extracted per sheet. OCR results appear in a `### Images in this sheet:` section after the sheet's data table; they are not interleaved into the table rows.

Every extracted OCR block is wrapped as:

```
*[Image OCR] <extracted text> [End OCR]*
```
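As a rough sketch of the interleaving and wrapping described above, assuming a hypothetical `render_page` helper and `(top_y, kind, text)` item shape (the plugin's actual internals may differ):

```python
def render_page(items):
    """Interleave text lines and image-OCR results in vertical reading order.

    `items` is a list of (top_y, kind, text) tuples, where kind is "text"
    or "image"; a real converter would read these from the PDF parser.
    """
    parts = []
    for _, kind, text in sorted(items):
        if kind == "image":
            parts.append(f"*[Image OCR] {text} [End OCR]*")
        else:
            parts.append(text)
    return "\n".join(parts)

page = [
    (120, "image", "Figure 1: revenue by region"),
    (40, "text", "Introduction"),
    (200, "text", "As shown above, ..."),
]
print(render_page(page))
# Introduction
# *[Image OCR] Figure 1: revenue by region [End OCR]*
# As shown above, ...
```

Sorting by the vertical coordinate before joining is what produces the "vertical reading order" behavior for PDFs.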
If no OCR text appears in the output, the most likely cause is a missing `llm_client` or `llm_model`. Verify:
```python
from openai import OpenAI
from markitdown import MarkItDown

md = MarkItDown(
    enable_plugins=True,
    llm_client=OpenAI(),  # required
    llm_model="gpt-4o",   # required
)
```
Confirm the plugin is installed and discovered:
```bash
markitdown --list-plugins  # should show: ocr
```
The plugin propagates LLM API errors as warnings and continues conversion. Check your API key, quota, and that the chosen model supports vision inputs.
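A plausible sketch of that warn-and-continue behavior, assuming a hypothetical `ocr_image` helper (the plugin's real error handling may differ):

```python
import warnings

def ocr_image(llm_call, image_bytes):
    """Try to OCR one image; on any LLM/API failure, warn and keep converting."""
    try:
        return llm_call(image_bytes)
    except Exception as exc:  # e.g. auth, quota, or model-capability errors
        warnings.warn(f"OCR skipped for one image: {exc}")
        return None  # caller omits the OCR block and continues

# Usage with a client call that always fails:
def broken_call(_):
    raise RuntimeError("invalid API key")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ocr_image(broken_call, b"...")

print(result)       # None
print(len(caught))  # 1
```

Returning `None` instead of raising is what lets a single failed image degrade gracefully rather than abort the whole document conversion.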
Run the test suite from a repository checkout:

```bash
cd packages/markitdown-ocr
pytest tests/ -v
```

To install the plugin from source:

```bash
git clone https://github.com/microsoft/markitdown.git
cd markitdown/packages/markitdown-ocr
pip install -e .
```
Contributions are welcome! See the MarkItDown repository for guidelines.
MIT — see LICENSE.