Models & Providers

LLM providers (OpenAI, Anthropic, Gemini, Ollama, vLLM), ModelRegistry, and ModelRouter API reference.

BaseLLMProvider

Abstract base class for all providers. Implement generate() and generate_text() to create a custom provider.

python
class BaseLLMProvider(ABC):
    async def generate(
        self,
        prompt: str,
        schema: Type[BaseModel],
        system_prompt: str | None = None,
        temperature: float = 0.0,
        max_tokens: int = 2048,
        model: str | None = None,
        **kwargs,
    ) -> ProviderResult: ...

    async def generate_text(
        self,
        prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.0,
        max_tokens: int = 2048,
        model: str | None = None,
        **kwargs,
    ) -> str: ...

@dataclass
class ProviderResult:
    output: BaseModel           # validated Pydantic instance
    raw_text: str               # raw model output
    model_used: str
    token_usage: dict[str, int] # prompt_tokens, completion_tokens, total_tokens
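As a sketch of this contract, here is a self-contained, duck-typed provider that returns canned JSON — useful for offline tests. `CannedProvider`, the `Person` schema, and the local `ProviderResult` mirror are illustrative stand-ins, not part of the library; in real use you would subclass `BaseLLMProvider` and pass a Pydantic model as the schema.

```python
import asyncio
import json
from dataclasses import dataclass

# Local mirror of the ProviderResult dataclass above, so this
# sketch runs without the library installed.
@dataclass
class ProviderResult:
    output: object
    raw_text: str
    model_used: str
    token_usage: dict

class CannedProvider:
    """Offline provider that always returns canned JSON."""

    def __init__(self, canned: dict, model: str = "canned-v0"):
        self.canned = canned
        self.model = model

    async def generate_text(self, prompt, system_prompt=None,
                            temperature=0.0, max_tokens=2048,
                            model=None, **kwargs) -> str:
        return json.dumps(self.canned)

    async def generate(self, prompt, schema, system_prompt=None,
                       temperature=0.0, max_tokens=2048,
                       model=None, **kwargs) -> ProviderResult:
        raw = await self.generate_text(prompt, system_prompt=system_prompt,
                                       temperature=temperature,
                                       max_tokens=max_tokens, model=model)
        # schema is any callable validating keyword arguments
        # (a Pydantic model in real use).
        parsed = schema(**json.loads(raw))
        return ProviderResult(output=parsed, raw_text=raw,
                              model_used=model or self.model,
                              token_usage={"prompt_tokens": 0,
                                           "completion_tokens": 0,
                                           "total_tokens": 0})

@dataclass
class Person:  # stands in for a Pydantic schema
    name: str
    age: int

result = asyncio.run(
    CannedProvider({"name": "Ada", "age": 36}).generate("Who?", Person)
)
print(result.output)  # Person(name='Ada', age=36)
```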

OpenAIProvider

OpenAIProvider(api_key, model, base_url)

Uses OpenAI's structured outputs (response_format) for guaranteed JSON.

Parameters

api_key (str | None, optional): Reads the OPENAI_API_KEY env var if None.
model (str, optional): Default: "gpt-4o".
base_url (str | None, optional): Override for Azure OpenAI or proxy endpoints.

AnthropicProvider

AnthropicProvider(api_key, model)

Uses Anthropic's tool use (not prompt engineering) for structured output — more reliable than JSON-mode prompting.

Parameters

api_key (str | None, optional): Reads the ANTHROPIC_API_KEY env var if None.
model (str, optional): Default: "claude-3-5-sonnet-20241022".

GeminiProvider

GeminiProvider(api_key, model)

Uses Google's Gemini API via google-generativeai.

Parameters

api_key (str | None, optional): Reads the GOOGLE_API_KEY env var if None.
model (str, optional): Default: "gemini-1.5-pro".

OllamaProvider

OllamaProvider(base_url, model)

Connects to a local Ollama server. No API key required — fully air-gapped.

Parameters

base_url (str, optional): Default: "http://localhost:11434".
model (str, optional): Default: "llama3.1:8b". Must be pulled in Ollama first.

VLLMProvider

VLLMProvider(api_base, api_key, model)

Connects to a vLLM server using guided_json constrained decoding — the most reliable structured output method for open models.

Parameters

api_base (str | None, optional): vLLM server URL. Reads the VLLM_API_BASE env var if None.
model (str | None, optional): Model name as registered in vLLM. Uses model routing if None.

FallbackProvider

FallbackProvider(primary, fallback)

Wraps two providers. If the primary raises InferenceError, the call is transparently retried on the fallback. Both providers must implement BaseLLMProvider.
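The retry-on-error contract can be sketched with plain asyncio. Everything here — `InferenceError`, both toy providers, and `FallbackSketch` — is a local stand-in for illustration, not the library's implementation:

```python
import asyncio

class InferenceError(Exception):
    """Local stand-in for the library's InferenceError."""

class FlakyPrimary:
    async def generate_text(self, prompt, **kwargs) -> str:
        raise InferenceError("primary unavailable")

class SteadyFallback:
    async def generate_text(self, prompt, **kwargs) -> str:
        return f"fallback answer to: {prompt}"

class FallbackSketch:
    """Mirrors FallbackProvider's retry-on-InferenceError behaviour."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    async def generate_text(self, prompt, **kwargs) -> str:
        try:
            return await self.primary.generate_text(prompt, **kwargs)
        except InferenceError:
            # Primary failed: transparently retry on the fallback.
            return await self.fallback.generate_text(prompt, **kwargs)

provider = FallbackSketch(FlakyPrimary(), SteadyFallback())
text = asyncio.run(provider.generate_text("ping"))
print(text)  # fallback answer to: ping
```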

Provider factory

python
from structure_d.inference.providers import get_provider, resolve_provider

# By name — returns a configured instance
provider = get_provider("openai", api_key="sk-...", model="gpt-4o-mini")
provider = get_provider("anthropic")
provider = get_provider("ollama", model="mistral:7b")

# From config — reads settings and builds FallbackProvider if configured
from structure_d.config import get_settings
provider = resolve_provider(get_settings())

ModelRegistry

Manages the catalogue of available models loaded from configs/models.yaml.

python
from structure_d.models.registry import ModelRegistry, ModelEntry
from structure_d.schemas.base import TaskType

registry = ModelRegistry.from_yaml("configs/models.yaml")

# Query
registry.list_models()                          # list[ModelEntry]
registry.get("llama-3.1-8b")                    # ModelEntry | None
registry.get_by_task(TaskType.EXTRACTION)        # list[ModelEntry]
registry.get_default_for_task(TaskType.REASONING) # ModelEntry | None
registry.get_multimodal()                        # list[ModelEntry]

# Register a custom model
registry.register(ModelEntry(
    name="mistralai/Mixtral-8x7B-Instruct-v0.1",
    alias="mixtral-8x7b",
    tasks=[TaskType.EXTRACTION, TaskType.SUMMARISATION],
    size_b=46.7,
    quantisation=None,
    max_context=32768,
    cost_per_1k_tokens=0.03,
    supports_structured_output=True,
    multimodal=False,
))

ModelRouter

router.route(task, *, input_tokens, prefer_multimodal, max_cost_per_1k, max_size_b) → ModelEntry

Selects the best registered model for a task, subject to the constraints below.

Parameters

task (TaskType, required): The inference task.
max_cost_per_1k (float | None, optional): Exclude models whose cost per 1k tokens exceeds this threshold.
max_size_b (float | None, optional): Exclude models larger than this many billion parameters.
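The filtering criteria can be illustrated with a plain-Python sketch over registry-like entries. The data, the `route_sketch` helper, and the cheapest-first tie-break are assumptions for illustration, not the router's actual policy:

```python
# Hypothetical entries mirroring a few ModelEntry fields.
MODELS = [
    {"alias": "qwen-1.5b", "tasks": {"classification", "sentiment"},
     "size_b": 1.5, "cost": 0.001},
    {"alias": "mistral-7b", "tasks": {"classification", "extraction"},
     "size_b": 7.0, "cost": 0.005},
    {"alias": "llama-3.1-70b", "tasks": {"extraction", "reasoning"},
     "size_b": 70.0, "cost": 0.06},
]

def route_sketch(task, max_cost_per_1k=None, max_size_b=None):
    # Keep only models that support the task and satisfy the constraints.
    candidates = [
        m for m in MODELS
        if task in m["tasks"]
        and (max_cost_per_1k is None or m["cost"] <= max_cost_per_1k)
        and (max_size_b is None or m["size_b"] <= max_size_b)
    ]
    if not candidates:
        raise LookupError(f"no model satisfies constraints for {task!r}")
    # Cheapest-first tie-break (an assumption for this sketch).
    return min(candidates, key=lambda m: m["cost"])

print(route_sketch("extraction")["alias"])               # mistral-7b
print(route_sketch("reasoning")["alias"])                # llama-3.1-70b
print(route_sketch("extraction", max_size_b=8)["alias"]) # mistral-7b
```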

Registered models

Default models in configs/models.yaml:

| Alias | Model | Size | Default for | Cost/1k |
|-------|-------|------|-------------|---------|
| qwen-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 1.5B | classification, sentiment | $0.001 |
| mistral-7b | mistralai/Mistral-7B-Instruct-v0.3 | 7B | classification, extraction | $0.005 |
| llama-3.1-8b | meta-llama/Llama-3.1-8B-Instruct | 8B | extraction, classification, summarisation | $0.008 |
| llama-3.1-13b | meta-llama/Llama-3.1-13B-Instruct | 13B | extraction, summarisation | $0.015 |
| deepseek-r1-70b | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 70B AWQ | extraction, reasoning | $0.06 |
| llama-3.1-70b | meta-llama/Llama-3.1-70B-Instruct | 70B GPTQ | extraction, reasoning | $0.06 |
| qwen-vl-7b | Qwen/Qwen2-VL-7B-Instruct | 7B | multimodal | $0.01 |

Add your own models

Add entries to configs/models.yaml to register custom or fine-tuned models. The router will include them in its selection pool.
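The exact YAML layout depends on how ModelRegistry.from_yaml parses the file; assuming entries mirror the ModelEntry fields shown above, an addition might look like:

```yaml
# Hypothetical entry: field names mirror ModelEntry. Verify against
# the existing configs/models.yaml before relying on this layout.
- name: mistralai/Mixtral-8x7B-Instruct-v0.1
  alias: mixtral-8x7b
  tasks: [extraction, summarisation]
  size_b: 46.7
  quantisation: null
  max_context: 32768
  cost_per_1k_tokens: 0.03
  supports_structured_output: true
  multimodal: false
```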