Installation
Install Structure-D via pip with optional extras, or build the Rust CLI from source.
Requirements
- Python 3.10 or higher
- pip 23+ (for editable installs with pyproject.toml)
- Rust 1.80+ (optional; only needed to build the CLI)
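The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names (`python_ok`, `pip_major`, `cargo_available`) are illustrative and not part of Structure-D. It only tests whether `cargo` is on PATH and does not check the Rust toolchain version.

```python
import shutil
import sys
from importlib import metadata

# Minimum versions taken from the requirements list above.
MIN_PYTHON = (3, 10)
MIN_PIP_MAJOR = 23


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = tuple(version or sys.version_info[:2])
    return version >= tuple(minimum)


def pip_major(version_string):
    """Extract the major version from a pip version string such as '23.3.1'."""
    return int(version_string.split(".")[0])


def cargo_available():
    """Return True if a `cargo` executable is on PATH (only needed for the CLI build)."""
    return shutil.which("cargo") is not None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    try:
        print("pip OK:", pip_major(metadata.version("pip")) >= MIN_PIP_MAJOR)
    except metadata.PackageNotFoundError:
        print("pip OK: pip is not installed in this environment")
    print("cargo found:", cargo_available())
```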
Python package
Install from PyPI (or directly from the repo) with only the extras you need:
bash
# Minimal — pipeline core only (no parsers, no LLM SDK)
pip install structure-d
# Recommended for most users
pip install "structure-d[ingestion,api,llm]"
# Everything
pip install "structure-d[all]"
Extras reference
| Extra | Contents | When to use |
|---|---|---|
| ingestion | pymupdf, pdfplumber, pytesseract, python-docx, openpyxl, beautifulsoup4 | Processing PDFs, images, Office files, HTML |
| llm | openai ≥1.30, anthropic ≥0.28, google-generativeai ≥0.8 | Using cloud LLM providers |
| inference | vllm ≥0.6, transformers, torch | Self-hosted vLLM inference |
| retrieval | sentence-transformers, chromadb, langchain, llama-index | RAG pipelines and vector search |
| storage | sqlalchemy, asyncpg, pgvector, asyncmy | Writing to databases |
| connectors | boto3, google-cloud-storage, azure-storage-blob, paramiko | Reading from S3, GCS, Azure, SFTP |
| destinations | snowflake-connector-python, google-cloud-bigquery | Writing to Snowflake, BigQuery |
| api | fastapi, uvicorn, python-multipart | Running the HTTP service |
| monitoring | prometheus-client, opentelemetry | Metrics and tracing |
| dev | pytest, pytest-asyncio, ruff, mypy, pre-commit | Development and testing |
| all | All of the above | Full feature set |
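To see which extras are usable in the current environment, one option is to probe for the packages they pull in. The sketch below is illustrative only: the `EXTRA_MODULES` mapping is a hand-written subset of the table above, not something Structure-D exposes. Note that some import names differ from their distribution names (python-docx imports as `docx`, beautifulsoup4 as `bs4`).

```python
from importlib.util import find_spec

# Illustrative subset of the extras table, keyed by extra name.
# Values are top-level import names, which can differ from the pip names.
EXTRA_MODULES = {
    "ingestion": ["pdfplumber", "pytesseract", "docx", "openpyxl", "bs4"],
    "llm": ["openai", "anthropic"],
    "api": ["fastapi", "uvicorn"],
}


def missing_modules(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    for extra, modules in EXTRA_MODULES.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra:10s} {status}")
```

Probing with `find_spec` avoids actually importing heavy packages, so the check stays fast even when large dependencies are installed.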
Rust CLI
The structure-d CLI is a native Rust binary for high-throughput batch extraction.
It ships the same pipeline logic without the Python overhead.
bash
# Build with Cargo (requires Rust 1.80+)
cargo install --path cli/
# Or use the Makefile shortcut
make install
Verify the CLI is available:
bash
structure-d --version
# structure-d 0.2.0
structure-d formats # list supported file formats
structure-d models # list registered models
From source
bash
git clone https://github.com/jagguvarma15/Structure-D.git
cd Structure-D
# Set up virtual env
python -m venv .venv
source .venv/bin/activate
# Install in editable mode with all extras
pip install -e ".[all]"
# Build Rust CLI
make install
Verify the install
bash
python -c "import structure_d; print(structure_d.__version__)"
# 0.2.0
# Run the test suite
pytest tests/ -v -m "not slow and not integration"
OCR dependencies
The tesseract_image and ocr_pdf parsers require the Tesseract binary to be installed on your system. Install it with brew install tesseract on macOS or apt install tesseract-ocr on Debian/Ubuntu Linux.
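A quick way to confirm that Tesseract is reachable before running the OCR parsers is to look it up on PATH. This is a minimal sketch with hypothetical helper names (`find_tesseract`, `tesseract_version`), and it assumes the binary is called `tesseract`. It does not exercise Structure-D's parsers themselves.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


def tesseract_version(path):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract releases print version info to stderr instead of stdout.
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; OCR parsers will not work.")
    else:
        print("Found", path, "->", tesseract_version(path))
```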