# Models & Providers

LLM providers (OpenAI, Anthropic, Gemini, Ollama, vLLM), plus the ModelRegistry and ModelRouter API reference.
## BaseLLMProvider

Abstract base class for all providers. Implement `generate()` and `generate_text()` to create a custom provider.
```python
class BaseLLMProvider(ABC):
    async def generate(
        self,
        prompt: str,
        schema: Type[BaseModel],
        system_prompt: str | None = None,
        temperature: float = 0.0,
        max_tokens: int = 2048,
        model: str | None = None,
        **kwargs,
    ) -> ProviderResult: ...

    async def generate_text(
        self,
        prompt: str,
        system_prompt: str | None = None,
        temperature: float = 0.0,
        max_tokens: int = 2048,
        model: str | None = None,
        **kwargs,
    ) -> str: ...
```
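To implement a custom provider, subclass `BaseLLMProvider` and fill in both methods. A toy, self-contained sketch (the base class here is a stand-in reduced to one method so the snippet runs without the library; a real subclass would import `BaseLLMProvider` from `structure_d` and implement `generate()` as well):

```python
import asyncio
import json
from abc import ABC, abstractmethod

# Stand-in for structure_d's BaseLLMProvider, reduced to one method.
class BaseLLMProvider(ABC):
    @abstractmethod
    async def generate_text(self, prompt: str, **kwargs) -> str: ...

class EchoProvider(BaseLLMProvider):
    """Toy provider: no network calls, just wraps the prompt in JSON."""
    async def generate_text(self, prompt: str, **kwargs) -> str:
        return json.dumps({"echo": prompt})

raw = asyncio.run(EchoProvider().generate_text("hello"))
```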
```python
@dataclass
class ProviderResult:
    output: BaseModel            # validated Pydantic instance
    raw_text: str                # raw model output
    model_used: str
    token_usage: dict[str, int]  # prompt_tokens, completion_tokens, total_tokens
```

## OpenAIProvider
Uses OpenAI's structured outputs (response_format) for guaranteed JSON.
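A hedged sketch of what "structured outputs" means at the request level (assumed payload shape, not taken from this library's source): the schema's JSON Schema travels in the `response_format` field, so decoding is constrained to valid instances.

```python
# Simplified schema a Pydantic model would produce via model_json_schema().
person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,  # required by OpenAI's strict mode
}

# Assumed shape of the request body the provider builds.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract the name: Ada Lovelace"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "Person", "strict": True, "schema": person_schema},
    },
}
```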
### Parameters

- `api_key`: optional; falls back to the `OPENAI_API_KEY` env var if `None`.
- `model`: optional; defaults to `"gpt-4o"`.

## AnthropicProvider
Uses Anthropic's tool use (not prompt engineering) for structured output — more reliable than JSON-mode prompting.
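A hedged sketch of the tool-use mechanism (assumed payload shape, not this library's actual code): the schema becomes a tool's `input_schema`, and `tool_choice` forces the model to call that tool, so the arguments come back as schema-shaped JSON.

```python
# Hypothetical tool name and schema, for illustration only.
extract_tool = {
    "name": "record_person",
    "description": "Record the extracted fields.",
    "input_schema": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}

request_body = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "Extract the name: Ada Lovelace"}],
    "tools": [extract_tool],
    "tool_choice": {"type": "tool", "name": "record_person"},  # force the call
}
```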
### Parameters

- `api_key`: optional; falls back to the `ANTHROPIC_API_KEY` env var if `None`.
- `model`: optional; defaults to `"claude-3-5-sonnet-20241022"`.

## GeminiProvider
Uses Google's Gemini API via google-generativeai.
### Parameters

- `api_key`: optional; falls back to the `GOOGLE_API_KEY` env var if `None`.
- `model`: optional; defaults to `"gemini-1.5-pro"`.

## OllamaProvider
Connects to a local Ollama server. No API key required — fully air-gapped.
### Parameters

- Server URL: optional; defaults to `"http://localhost:11434"`.
- `model`: optional; defaults to `"llama3.1:8b"`. The model must be pulled in Ollama first.

## VLLMProvider
Connects to a vLLM server using guided_json constrained decoding — the most reliable structured output method for open models.
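For reference, a hedged sketch of what a `guided_json` request against vLLM's OpenAI-compatible endpoint looks like (assumed shape; the non-standard `guided_json` field is a vLLM extension that constrains decoding to the schema):

```python
# Assumed raw request body for vLLM's /v1/chat/completions endpoint.
request_body = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Extract the name: Ada Lovelace"}],
    "guided_json": {  # vLLM-specific extra field
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}
```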
Parameters
VLLM_API_BASE if None. optional None. optional FallbackProvider
Wraps two providers and transparently retries against the second when the first raises `InferenceError`. Both providers must implement `BaseLLMProvider`.
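The retry behaviour can be pictured with a self-contained sketch (every class here is a stand-in defined inline; the real `FallbackProvider` and `InferenceError` live in `structure_d`):

```python
import asyncio

class InferenceError(Exception):
    """Stand-in for structure_d's InferenceError."""

class FlakyProvider:
    async def generate_text(self, prompt, **kwargs):
        raise InferenceError("primary is down")

class StableProvider:
    async def generate_text(self, prompt, **kwargs):
        return f"ok: {prompt}"

class FallbackSketch:
    """Illustrative only: try the primary, retry the fallback on InferenceError."""
    def __init__(self, primary, fallback):
        self.primary, self.fallback = primary, fallback

    async def generate_text(self, prompt, **kwargs):
        try:
            return await self.primary.generate_text(prompt, **kwargs)
        except InferenceError:
            return await self.fallback.generate_text(prompt, **kwargs)

result = asyncio.run(FallbackSketch(FlakyProvider(), StableProvider()).generate_text("ping"))
```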
## Provider factory
```python
from structure_d.inference.providers import get_provider, resolve_provider

# By name — returns a configured instance
provider = get_provider("openai", api_key="sk-...", model="gpt-4o-mini")
provider = get_provider("anthropic")
provider = get_provider("ollama", model="mistral:7b")

# From config — reads settings and builds a FallbackProvider if configured
from structure_d.config import get_settings
provider = resolve_provider(get_settings())
```

## ModelRegistry
Manages the catalogue of available models, loaded from `configs/models.yaml`.
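The on-disk format of `configs/models.yaml` is not shown in this reference; the following is a plausible entry shape, inferred from `ModelEntry`'s fields (the `max_context` value is illustrative):

```yaml
models:
  - name: meta-llama/Llama-3.1-8B-Instruct
    alias: llama-3.1-8b
    tasks: [extraction, classification, summarisation]
    size_b: 8.0
    quantisation: null
    max_context: 131072        # illustrative
    cost_per_1k_tokens: 0.008
    supports_structured_output: true
    multimodal: false
```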
```python
from structure_d.models.registry import ModelRegistry, ModelEntry
from structure_d.schemas.base import TaskType

registry = ModelRegistry.from_yaml("configs/models.yaml")

# Query
registry.list_models()                             # list[ModelEntry]
registry.get("llama-3.1-8b")                       # ModelEntry | None
registry.get_by_task(TaskType.EXTRACTION)          # list[ModelEntry]
registry.get_default_for_task(TaskType.REASONING)  # ModelEntry | None
registry.get_multimodal()                          # list[ModelEntry]

# Register a custom model
registry.register(ModelEntry(
    name="mistralai/Mixtral-8x7B-Instruct-v0.1",
    alias="mixtral-8x7b",
    tasks=[TaskType.EXTRACTION, TaskType.SUMMARISATION],
    size_b=46.7,
    quantisation=None,
    max_context=32768,
    cost_per_1k_tokens=0.03,
    supports_structured_output=True,
    multimodal=False,
))
```

## ModelRouter
Selects the best model for a task based on registered criteria.
### Parameters
## Registered models

Default models in `configs/models.yaml`:
| Alias | Model | Size | Default for | Cost/1k |
|---|---|---|---|---|
| qwen-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 1.5B | classification, sentiment | $0.001 |
| mistral-7b | mistralai/Mistral-7B-Instruct-v0.3 | 7B | classification, extraction | $0.005 |
| llama-3.1-8b | meta-llama/Llama-3.1-8B-Instruct | 8B | extraction, classification, summarisation | $0.008 |
| llama-3.1-13b | meta-llama/Llama-3.1-13B-Instruct | 13B | extraction, summarisation | $0.015 |
| deepseek-r1-70b | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 70B AWQ | extraction, reasoning | $0.06 |
| llama-3.1-70b | meta-llama/Llama-3.1-70B-Instruct | 70B GPTQ | extraction, reasoning | $0.06 |
| qwen-vl-7b | Qwen/Qwen2-VL-7B-Instruct | 7B multimodal | multimodal | $0.01 |
## Add your own models

Add entries to `configs/models.yaml` to register custom or fine-tuned models; the router will include them in its selection pool.