Basics
Choose model providers to test with
Empirical can test how different models and model configurations work for your application. You can define which models and configurations to test in the configuration file.
Empirical supports two types of model providers:
model
: API calls to off-the-shelf LLMs, like OpenAI’s GPT4py-script
: Custom models or applications defined by a Python module
To configure a custom model with Python, see the Python guide.
The rest of this doc focuses on the model
type.
Run configuration for LLMs
To test an LLM, specify the following properties in the configuration:
provider
: Name of the inference provider (e.g.openai
, or other supported providers)model
: Name of the model (e.g.gpt-3.5-turbo
orclaude-3-haiku
)prompt
: Prompt sent to the model, with optional placeholdersname
[optional]: A name or label for this run (auto-generated if not specified)
You can configure as many model providers as you like. These models will be shown in a side-by-side comparison view in the web reporter.
Placeholders
Define placeholders in the prompt with Handlebars syntax (like {{user_name}}
) to inject values
from the dataset sample. These placeholders will be replaced with the corresponding input value
during execution.
See dataset to learn more about sample inputs.
Supported providers
Provider | Description |
---|---|
openai | All chat models are supported. Requires OPENAI_API_KEY environment variable. |
azure-openai | All chat models from OpenAI that are hosted on Azure are supported. Requires AZURE_OPENAI_API_KEY and AZURE_OPENAI_RESOURCE_NAME environment variables. |
anthropic | Claude 3 models are supported. Requires ANTHROPIC_API_KEY environment variable. |
mistral | All chat models are supported. Requires MISTRAL_API_KEY environment variable. |
google | Gemini Pro models are supported. Requires GOOGLE_API_KEY environment variable. |
fireworks | Models hosted on Fireworks (e.g. dbrx-instruct ) are supported. Requires FIREWORKS_API_KEY environment variable. |
Environment variables
API calls to model providers require API keys, which are stored as environment variables. The CLI can work with:
- Existing environment variables (using
process.env
) - Environment variables defined in
.env
or.env.local
files, in the current working directory- For .env files that are located elsewhere, you can pass the
--env-file
flag
- For .env files that are located elsewhere, you can pass the
Model parameters
To override parameters like temperature
or max_tokens
, you can pass parameters
alongwith the provider
configuration. All OpenAI parameters (see their API reference)
are supported, except for a few limitations.
For non-OpenAI models, we coerce these parameters to the most appropriate target parameter (e.g. stop
in OpenAI
becomes stop_sequences
for Anthropic.)
You can add other parameters or override this behavior with passthrough.
Passthrough
If your models rely on other parameters, you can still specify them in the configuration. These parameters will be passed as-is to the model.
For example, Mistral models support a safePrompt
parameter for guardrailing.
Configuring request timeout
You can set the timeout duration in milliseconds under model parameters in the empiricalrc.json
file. This might be required for prompt completions that are expected to take more time, for example while running models like Claude Opus. If no specific value is assigned, the default timeout duration of 30 seconds will be applied.
Limitations
- These parameters are not supported today:
logit_bias
,tools
,tool_choice
,user
,stream
If this limitation is blocking your use of Empirical, please file a feature request.