You need to install a CUDA version that is compatible with torch's requirements. For details, please refer to the official PyTorch website.
If Anaconda is already installed, you can skip this step.
Download link: https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Windows-x86_64.exe
conda create -n mineru 'python=3.12' -y
conda activate mineru
pip install -U magic-pdf[full]
IMPORTANT
After installation, you can check the version of magic-pdf using the following command:
magic-pdf --version
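The same check can be done from inside the activated environment with Python's standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and it assumes the distribution name is `magic-pdf`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "magic-pdf"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


print(installed_version())
```

If this prints `None`, the package is not installed in the environment that is currently active, which usually means `conda activate mineru` was skipped.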
Refer to the detailed instructions on how to download the model files.
After you complete the model download step (step 5, Download Models), the script automatically generates a magic-pdf.json file in the user directory and configures the default model path.
You can find the magic-pdf.json file in your user directory.
TIP
The user directory for Windows is "C:/Users/username".
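To confirm where the file ended up, a short Python check can resolve the user directory for the current account. This is a sketch only; `config_path` is a hypothetical helper, and it assumes the file sits directly in the user directory as described above.

```python
from pathlib import Path


def config_path(home=None) -> Path:
    """Build the expected location of magic-pdf.json.

    `home` lets you pass a directory explicitly; by default the current
    user's home directory is used (for example C:/Users/username on Windows).
    """
    base = Path(home) if home is not None else Path.home()
    return base / "magic-pdf.json"


path = config_path()
print(path, "exists" if path.is_file() else "not found")
```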
Download a sample file from the repository and test it.
wget https://github.com/opendatalab/MinerU/raw/master/demo/pdfs/small_ocr.pdf -O small_ocr.pdf
magic-pdf -p small_ocr.pdf -o ./output
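If `wget` is not available in your Windows shell, the sample file can also be fetched with Python's standard library. The following is a minimal sketch, not part of MinerU; `filename_from_url` and `download` are hypothetical helper names, and the URL is the one given above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve

SAMPLE_URL = "https://github.com/opendatalab/MinerU/raw/master/demo/pdfs/small_ocr.pdf"


def filename_from_url(url: str) -> str:
    """Take the last path segment of the URL as the local file name."""
    return PurePosixPath(urlparse(url).path).name


def download(url: str = SAMPLE_URL, dest=None) -> str:
    """Download `url` to `dest` (defaults to the file name in the URL)."""
    dest = dest or filename_from_url(url)
    urlretrieve(url, dest)  # follows HTTP redirects
    return dest


# Usage (requires network access):
# download()  # saves small_ocr.pdf in the current directory
```

After the download, run `magic-pdf -p small_ocr.pdf -o ./output` as shown above.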
If your graphics card has at least 6GB of VRAM, follow these steps to test CUDA-accelerated parsing performance.
Reinstall torch and torchvision with builds that support CUDA. (Select the appropriate index-url for your CUDA version. For more details, refer to the official PyTorch website.)
pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cu124
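Before changing the configuration, it can help to confirm that the reinstalled torch build actually sees the GPU. The sketch below uses `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`; the function name `cuda_report` is hypothetical, and device index 0 is an assumption for single-GPU machines.

```python
def cuda_report() -> str:
    """Summarize whether torch can use CUDA and how much VRAM device 0 has."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA is not available"

    # Assumption: the GPU of interest is device 0.
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    return f"torch {torch.__version__}: {props.name}, {vram_gb:.1f} GB VRAM"


print(cuda_report())
```

If the report says CUDA is not available, the installed wheel is probably a CPU-only build or does not match the installed CUDA version, so recheck the index-url used above.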
Modify the value of "device-mode" in the magic-pdf.json configuration file located in your user directory.
{
"device-mode": "cuda"
}
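The edit can also be scripted so that the other keys in the file are left untouched. This is a minimal sketch under the assumption that magic-pdf.json is plain JSON with a top-level "device-mode" key, as shown above; `set_device_mode` is a hypothetical helper, not part of magic-pdf.

```python
import json
from pathlib import Path


def set_device_mode(config_file, mode: str = "cuda") -> dict:
    """Set "device-mode" in the given JSON config and keep all other keys."""
    path = Path(config_file)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["device-mode"] = mode
    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config


# Usage, assuming the file is in the user directory:
# set_device_mode(Path.home() / "magic-pdf.json", "cuda")
```

Reading and rewriting the whole file, rather than replacing text with a pattern, keeps the default model path written by the download script intact.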
Run the following command to test CUDA acceleration:
magic-pdf -p small_ocr.pdf -o ./output