HTTP API
REST API reference for the Structure-D FastAPI service — endpoints, request format, and response schema.
Structure-D ships a production-ready FastAPI application that exposes the pipeline over HTTP. Use it when you need to call Structure-D from any language, integrate it into an existing microservice mesh, or deploy it as a standalone extraction service.
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/extract | Upload a document and extract structured data |
| GET | /api/v1/health | Liveness / readiness check |
| GET | /api/v1/models | List all registered LLM models |
| GET | /api/v1/schemas | List all available extraction schemas |
| GET | /api/v1/formats | List supported file formats and extensions |

Starting the server
# Development (auto-reload)
uvicorn structure_d.api.app:create_app --factory --reload --port 8080
# Production
uvicorn structure_d.api.app:create_app \
--factory \
--host 0.0.0.0 \
--port 8080 \
--workers 4
# Docker
docker compose up api
Or create the app programmatically:
from structure_d.api.app import create_app
app = create_app() # returns a FastAPI instance
# use with any ASGI server: Uvicorn, Hypercorn, Daphne
Authentication
Authentication is optional in v0.2. Set the SD_API_KEY environment variable to enable bearer token auth. When it is set, every request must include:
Authorization: Bearer <your-api-key>
Full API key + JWT auth ships in v0.4.0.
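A client can build this header in one place so that the same code works whether or not auth is enabled on the server. The sketch below is illustrative only: build_auth_headers is a hypothetical helper, and reading SD_API_KEY from the client's own environment is an assumption (the source only says the server reads that variable).

```python
import os
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding a bearer token only when a key is available."""
    # Assumption: the client reuses the SD_API_KEY variable name for convenience.
    key = api_key if api_key is not None else os.environ.get("SD_API_KEY")
    if not key:
        # Auth is optional in v0.2, so send no Authorization header at all.
        return {}
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as the headers argument of an HTTP client, for example httpx.AsyncClient(headers=build_auth_headers()).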
POST /api/v1/extract
Upload a document file and receive extracted structured data. Accepts multipart/form-data.
Request: multipart/form-data
file: the document to upload.
schema: key_value, table, entity, classification, summary, form, document_structure, generic. Defaults to key_value.
schema_json: … schema if provided.
Parser selection: pymupdf, pdfplumber, ocr_pdf, tesseract_image, html, docx, etc. Defaults to auto.
provider: openai, anthropic, gemini, ollama, vllm. Uses the configured default if omitted.
Saved output format: jsonl, csv. Omit to skip saving.
Response: 200 OK
{
"status": "success",
"filename": "invoice.pdf",
"schema": "key_value",
"results": [
{
"result_id": "3e4f2a1b-...",
"document_id": "7c8d9e0f-...",
"is_valid": true,
"structured_output": {
"pairs": [
{"key": "vendor", "value": "Acme Corp", "confidence": 0.97},
{"key": "total", "value": "1240.00", "confidence": 0.99},
{"key": "due_date", "value": "2026-04-15", "confidence": 0.91}
]
},
"model_used": "gpt-4o",
"latency_ms": 812,
"token_usage": {
"prompt_tokens": 420,
"completion_tokens": 88,
"total_tokens": 508
},
"validation_errors": [],
"source_format": "PDF"
}
],
"chunk_count": 1,
"processing_ms": 1340
}
cURL example
curl -X POST http://localhost:8080/api/v1/extract \
-F "file=@invoice.pdf" \
-F "schema=key_value" \
-F "provider=openai"
GET /api/v1/health
Liveness and readiness check. Returns provider connectivity status.
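A deployment script might wait on this endpoint before sending traffic. The following is a minimal sketch using only the standard library; is_ready and wait_until_ready are hypothetical helpers, and treating "healthy" plus "reachable" as the only ready state is an assumption based on the example response for this endpoint, since other possible values are not documented here.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(payload: dict) -> bool:
    """True when the service reports itself healthy and its provider reachable."""
    # Assumption: these two values are the ones shown in the example response.
    return (
        payload.get("status") == "healthy"
        and payload.get("provider_status") == "reachable"
    )


def wait_until_ready(base_url: str, attempts: int = 10, delay_s: float = 1.0) -> bool:
    """Poll /api/v1/health until is_ready() passes or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/api/v1/health", timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not accepting connections yet, or body was not JSON
        time.sleep(delay_s)
    return False
```

For example, wait_until_ready("http://localhost:8080") returns False if the service never reports ready within ten attempts.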
Response: 200 OK
{
"status": "healthy",
"version": "0.2.0",
"provider": "openai",
"provider_status": "reachable",
"uptime_seconds": 3602
}
GET /api/v1/models
Returns all models registered in the ModelRegistry.
{
"models": [
{
"alias": "llama-3.1-8b",
"model_id": "meta-llama/Llama-3.1-8B-Instruct",
"provider": "vllm",
"supported_tasks": ["extraction", "classification", "summarisation"],
"context_length": 131072
},
{
"alias": "gpt-4o",
"model_id": "gpt-4o",
"provider": "openai",
"supported_tasks": ["extraction", "classification", "summarisation", "reasoning"],
"context_length": 128000
}
]
}
GET /api/v1/schemas
Returns all built-in extraction schemas with their JSON Schema definitions.
{
"schemas": [
{
"name": "key_value",
"class": "KeyValueExtraction",
"description": "Extracts key-value pairs with confidence scores",
"json_schema": { "..." }
}
]
}
GET /api/v1/formats
Returns all supported file extensions and their default parsers.
{
"formats": [
{"extension": ".pdf", "parser": "pymupdf", "format": "PDF" },
{"extension": ".png", "parser": "tesseract_image", "format": "IMAGE" },
{"extension": ".docx", "parser": "docx", "format": "DOCX" },
{"extension": ".html", "parser": "html", "format": "HTML" }
]
}
Error responses
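All errors return a JSON body with error, detail, and code. In the sample body for this section, error carries the symbolic name listed in the table's code column, code repeats the HTTP status, and a request_id is also included.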
| HTTP status | code | When it happens |
|---|---|---|
| 400 | UNSUPPORTED_FORMAT | File extension is not in the supported formats list |
| 400 | INVALID_SCHEMA | schema_json is not valid JSON Schema |
| 401 | UNAUTHORIZED | API key is missing or invalid (when auth is enabled) |
| 422 | VALIDATION_FAILED | All retry attempts exhausted; extraction result is invalid |
| 429 | RATE_LIMITED | Too many requests (v0.4.0+) |
| 500 | INFERENCE_ERROR | Provider returned an error or is unreachable |
| 500 | PARSER_ERROR | Parser failed to process the uploaded file |
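Clients usually need to decide which of these errors are worth retrying. The sketch below shows one possible policy; error_name and is_retryable are hypothetical helpers, and the choice of RATE_LIMITED and INFERENCE_ERROR as the retryable set is an assumption of this example, not documented service behaviour. Because the table labels the symbolic name as code while the sample body carries it in error, the helper accepts either field.

```python
# Assumption: only these two conditions are treated as transient.
RETRYABLE = {"RATE_LIMITED", "INFERENCE_ERROR"}


def error_name(body: dict) -> str:
    """Pick the symbolic error name out of an error response body."""
    # The sample body stores the name in "error" and the HTTP status in "code",
    # so take the first of the two fields that holds a string.
    for field in ("error", "code"):
        value = body.get(field)
        if isinstance(value, str):
            return value
    return "UNKNOWN"


def is_retryable(body: dict) -> bool:
    """True when the error looks transient under the policy assumed above."""
    return error_name(body) in RETRYABLE
```

With this policy a 400 INVALID_SCHEMA or 422 VALIDATION_FAILED response is surfaced to the caller immediately, while a rate-limited or provider failure can be retried with backoff.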
{
"error": "INFERENCE_ERROR",
"detail": "OpenAI API returned 429: rate limit exceeded",
"code": 500,
"request_id": "req_abc123"
}
Python client
Call the HTTP API from Python using httpx:
import asyncio
from pathlib import Path

import httpx

BASE_URL = "http://localhost:8080"


async def extract(file_path: Path, schema: str = "key_value") -> dict:
    """Upload one file to /api/v1/extract and return the parsed JSON response."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        with open(file_path, "rb") as f:
            response = await client.post(
                f"{BASE_URL}/api/v1/extract",
                files={"file": (file_path.name, f, "application/octet-stream")},
                data={"schema": schema},
            )
        # Raise httpx.HTTPStatusError on any 4xx/5xx response
        response.raise_for_status()
        return response.json()


# Use it
result = asyncio.run(extract(Path("contract.pdf"), schema="key_value"))
print(result["results"][0]["structured_output"])

OpenAPI / Swagger UI
The running service exposes an interactive OpenAPI UI at http://localhost:8080/docs and a ReDoc view at http://localhost:8080/redoc.
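The same OpenAPI document that backs these pages can also be read by a script, for instance to check which endpoints a deployed instance exposes. The sketch below assumes the schema is served at /openapi.json, which is FastAPI's default route and may have been changed or disabled in a given deployment; list_operations and fetch_spec are hypothetical helper names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> list:
    """Return sorted (METHOD, path) pairs found in an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


def fetch_spec(base_url: str = "http://localhost:8080") -> dict:
    """Download the OpenAPI document from a running service."""
    # Assumption: the schema lives at FastAPI's default /openapi.json route.
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

Calling list_operations(fetch_spec()) against a running server yields the method and path pairs to compare with the endpoint table at the top of this page.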