Installation

Install Structure-D via pip with optional extras, or build the Rust CLI from source.

Requirements

  • Python 3.10 or higher
  • pip 23+ (for editable installs with pyproject.toml)
  • Rust 1.80+ (optional — only if building the CLI)
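
The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `check_requirements` are illustrative and not part of Structure-D. It assumes `rustc` is on `PATH` when Rust is installed, and leaves the Rust entry out when it is not.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import sys
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from text, e.g. 'rustc 1.80.0' -> (1, 80, 0)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_requirements() -> dict[str, bool]:
    """Report whether each documented requirement is met in this environment."""
    results = {"python>=3.10": sys.version_info >= (3, 10)}

    # pip is looked up through its installed package metadata.
    try:
        results["pip>=23"] = parse_version(metadata.version("pip")) >= (23,)
    except metadata.PackageNotFoundError:
        results["pip>=23"] = False

    # Rust is optional, so the key is only added when rustc is found.
    rustc = shutil.which("rustc")
    if rustc is not None:
        proc = subprocess.run(
            [rustc, "--version"], capture_output=True, text=True, check=False
        )
        try:
            results["rustc>=1.80"] = parse_version(proc.stdout) >= (1, 80)
        except ValueError:
            results["rustc>=1.80"] = False
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{'ok     ' if ok else 'MISSING'} {name}")
```

Run it with the interpreter you plan to install into, since the pip check reads the metadata of that environment.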

Python package

Install from PyPI (or directly from the repo) with only the extras you need:

bash
# Minimal — pipeline core only (no parsers, no LLM SDK)
pip install structure-d

# Recommended for most users
pip install "structure-d[ingestion,api,llm]"

# Everything
pip install "structure-d[all]"

Extras reference

| Extra | Contents | When to use |
|---|---|---|
| ingestion | pymupdf, pdfplumber, pytesseract, python-docx, openpyxl, beautifulsoup4 | Processing PDFs, images, Office files, HTML |
| llm | openai ≥1.30, anthropic ≥0.28, google-generativeai ≥0.8 | Using cloud LLM providers |
| inference | vllm ≥0.6, transformers, torch | Self-hosted vLLM inference |
| retrieval | sentence-transformers, chromadb, langchain, llama-index | RAG pipelines and vector search |
| storage | sqlalchemy, asyncpg, pgvector, asyncmy | Writing to databases |
| connectors | boto3, google-cloud-storage, azure-storage-blob, paramiko | Reading from S3, GCS, Azure, SFTP |
| destinations | snowflake-connector-python, google-cloud-bigquery | Writing to Snowflake, BigQuery |
| api | fastapi, uvicorn, python-multipart | Running the HTTP service |
| monitoring | prometheus-client, opentelemetry | Metrics and tracing |
| dev | pytest, pytest-asyncio, ruff, mypy, pre-commit | Development and testing |
| all | All of the above | Full feature set |
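
To see which extras are actually usable in the current environment, one option is to probe for their import names. This is a sketch, not a Structure-D API: `EXTRA_MODULES` and `missing_modules` are hypothetical names, the mapping covers only a few extras, and it relies on the fact that some packages import under a different name than they install under (python-docx as `docx`, beautifulsoup4 as `bs4`, pymupdf as `fitz`).

```python
from __future__ import annotations

from importlib.util import find_spec

# Import names (not distribution names) for a subset of the extras table.
# This mapping is an assumption for illustration; extend it as needed.
EXTRA_MODULES: dict[str, list[str]] = {
    "ingestion": ["fitz", "pdfplumber", "pytesseract", "docx", "openpyxl", "bs4"],
    "llm": ["openai", "anthropic"],
    "api": ["fastapi", "uvicorn"],
}


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    for extra, modules in EXTRA_MODULES.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra:<10} {status}")
```

`find_spec` only locates a module, so the check stays fast even for heavy packages; it does not confirm that the installed version satisfies the minimums in the table.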

Rust CLI

The structure-d CLI is a native Rust binary for high-throughput batch extraction. It ships the same pipeline logic without the Python overhead.

bash
# Build with Cargo (requires Rust 1.80+)
cargo install --path cli/

# Or use the Makefile shortcut
make install

Verify the CLI is available:

bash
structure-d --version
# structure-d 0.2.0

structure-d formats   # list supported file formats
structure-d models    # list registered models

From source

bash
git clone https://github.com/jagguvarma15/Structure-D.git
cd Structure-D

# Set up virtual env
python -m venv .venv
source .venv/bin/activate

# Install in editable mode with all extras
pip install -e ".[all]"

# Build the Rust CLI (optional; requires Rust 1.80+)
make install

Verify the install

bash
python -c "import structure_d; print(structure_d.__version__)"
# 0.2.0

# Run the test suite
pytest tests/ -v -m "not slow and not integration"
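
If importing the package fails, for example because an optional dependency is missing, the installed version can still be read from package metadata. A small sketch, assuming the distribution is named `structure-d` as in the pip commands above; `installed_version` is an illustrative helper, not part of the package.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("structure-d")
    print(version if version is not None else "structure-d is not installed")
```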

OCR dependencies

The tesseract_image and ocr_pdf parsers require the Tesseract binary to be installed on your system. Install it with brew install tesseract on macOS or apt install tesseract-ocr on Debian/Ubuntu.
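
A quick way to confirm that Tesseract is visible to Python is to look it up on `PATH`. The sketch below uses only the standard library and assumes the binary is named `tesseract`; the helper `tesseract_version` is hypothetical and is not how Structure-D itself locates the binary.

```python
from __future__ import annotations

import shutil
import subprocess


def tesseract_version(binary: str = "tesseract") -> str | None:
    """Return the first line of `<binary> --version`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    proc = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some older Tesseract releases print the version to stderr instead of stdout.
    lines = (proc.stdout or proc.stderr).splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version is not None else "tesseract not found on PATH")
```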