Empirical can test how different models and model configurations work for your application. You can define which models and configurations to test in the configuration file.

Empirical supports two types of model providers:

  • model: API calls to off-the-shelf LLMs, like OpenAI’s GPT-4
  • py-script: Custom models or applications defined by a Python module

To configure a custom model with Python, see the Python guide.

The rest of this doc focuses on the model type.

Run configuration for LLMs

To test an LLM, specify the following properties in the configuration:

  • provider: Name of the inference provider (e.g. openai, or other supported providers)
  • model: Name of the model (e.g. gpt-3.5-turbo or claude-3-haiku)
  • prompt: Prompt sent to the model, with optional placeholders
  • name [optional]: A name or label for this run (auto-generated if not specified)

You can configure as many models as you like. Their outputs are shown side by side in the comparison view of the web reporter.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}"
  }
]
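
As a rough sketch of the side-by-side setup described above, the following Python snippet builds a runs list with two providers and prints it as JSON. It uses only the properties listed in this section. The build_runs helper and the name values are illustrative assumptions, and the output is just the runs fragment, not a complete empiricalrc.json.

```python
import json

PROMPT = "Hey I'm {{user_name}}"


def build_runs(prompt):
    """Return two model runs that share one prompt (hypothetical helper)."""
    return [
        {
            "type": "model",
            "provider": "openai",
            "model": "gpt-3.5-turbo",
            "prompt": prompt,
            "name": "gpt-3.5-turbo baseline",  # optional label, made up for illustration
        },
        {
            "type": "model",
            "provider": "anthropic",
            "model": "claude-3-haiku",
            "prompt": prompt,
            "name": "claude-3-haiku comparison",  # optional label, made up for illustration
        },
    ]


runs = build_runs(PROMPT)
# Print the fragment so it can be pasted into the configuration file.
print(json.dumps({"runs": runs}, indent=2))
```

Sharing one prompt string across runs keeps the comparison focused on the model rather than on prompt differences.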

Placeholders

Define placeholders in the prompt with Handlebars syntax (like {{user_name}}) to inject values from the dataset sample. These placeholders will be replaced with the corresponding input value during execution.

See dataset to learn more about sample inputs.
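
To make the substitution concrete, here is a minimal Python sketch of placeholder replacement. It is not Empirical's implementation: it handles only plain {{name}} placeholders with a regular expression rather than full Handlebars, and raising an error for a missing input is an assumption made for this example. The render_prompt function and the sample inputs are hypothetical.

```python
import re

# Matches {{ name }} with optional inner whitespace; simple identifiers only.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(prompt, inputs):
    """Replace each {{name}} with inputs[name]; raise if a value is missing."""

    def substitute(match):
        key = match.group(1)
        if key not in inputs:
            # Assumption for this sketch: a missing input is treated as an error.
            raise KeyError(f"no input value for placeholder '{key}'")
        return str(inputs[key])

    return PLACEHOLDER.sub(substitute, prompt)


sample_inputs = {"user_name": "Alice"}  # hypothetical inputs of one dataset sample
rendered = render_prompt("Hey I'm {{user_name}}", sample_inputs)
print(rendered)  # prints: Hey I'm Alice
```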

Supported providers

Provider | Description
openai | All chat models are supported. Requires OPENAI_API_KEY environment variable.
azure-openai | All chat models from OpenAI that are hosted on Azure are supported. Requires AZURE_OPENAI_API_KEY and AZURE_OPENAI_RESOURCE_NAME environment variables.
anthropic | Claude 3 models are supported. Requires ANTHROPIC_API_KEY environment variable.
mistral | All chat models are supported. Requires MISTRAL_API_KEY environment variable.
google | Gemini Pro models are supported. Requires GOOGLE_API_KEY environment variable.
fireworks | Models hosted on Fireworks (e.g. dbrx-instruct) are supported. Requires FIREWORKS_API_KEY environment variable.
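
Before starting a run, it can help to confirm that the variables listed above are set. The following Python sketch does that. The variable names are copied from the table, while the missing_env_vars helper is hypothetical and not part of the Empirical CLI.

```python
import os

# Required environment variables per provider, copied from the table above.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "azure-openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_RESOURCE_NAME"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "fireworks": ["FIREWORKS_API_KEY"],
}


def missing_env_vars(provider, env=None):
    """Return the required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown provider: {provider}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
fake_env = {"AZURE_OPENAI_API_KEY": "placeholder-value"}
print(missing_env_vars("azure-openai", fake_env))  # prints: ['AZURE_OPENAI_RESOURCE_NAME']
```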

Environment variables

API calls to model providers require API keys, which are stored as environment variables. The CLI can work with:

  • Existing environment variables (using process.env)
  • Environment variables defined in .env or .env.local files, in the current working directory
    • For .env files that are located elsewhere, you can pass the --env-file flag
npx @empiricalrun/cli --env-file <PATH_TO_ENV_FILE>

Model parameters

To override parameters like temperature or max_tokens, pass them under parameters, along with the provider configuration. All OpenAI parameters (see their API reference) are supported, with a few exceptions listed under Limitations.

For non-OpenAI models, we coerce these parameters to the most appropriate target parameter (e.g. stop in OpenAI becomes stop_sequences for Anthropic).

You can add other parameters or override this behavior with passthrough.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1
    }
  }
]
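
The coercion of OpenAI-style parameters for other providers can be pictured as a key-renaming step. The Python sketch below is an illustration only: the stop to stop_sequences rename for Anthropic is the single mapping taken from the text above, and the coerce_parameters function and RENAMES table are hypothetical rather than Empirical's actual code.

```python
# Only the stop -> stop_sequences rename comes from the text above;
# a real mapping would presumably cover more parameters and providers.
RENAMES = {
    "anthropic": {"stop": "stop_sequences"},
}


def coerce_parameters(provider, parameters):
    """Rename OpenAI-style keys for the target provider; keep other keys as-is."""
    renames = RENAMES.get(provider, {})
    return {renames.get(key, key): value for key, value in parameters.items()}


openai_style = {"temperature": 0.1, "stop": ["END"]}
print(coerce_parameters("anthropic", openai_style))
# prints: {'temperature': 0.1, 'stop_sequences': ['END']}
print(coerce_parameters("openai", openai_style))
# prints: {'temperature': 0.1, 'stop': ['END']}
```

Keys without a rename entry are returned unchanged in this sketch, which mirrors the idea of passing unknown parameters through untouched.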

Passthrough

If your models rely on other parameters, you can still specify them in the configuration. These parameters will be passed as-is to the model.

For example, Mistral models support a safePrompt parameter for guardrailing.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "mistral",
    "model": "mistral-tiny",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1,
      "safePrompt": true
    }
  }
]

Configuring request timeout

Set the request timeout, in milliseconds, with the timeout key under parameters in the empiricalrc.json file. A custom timeout may be needed for completions that are expected to take more time, for example when running models like Claude Opus. If no value is set, the default timeout of 30 seconds applies.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "anthropic",
    "model": "claude-3-opus",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "timeout": 10000
    }
  }
]
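
Because the value is given in milliseconds while the default is stated in seconds, a quick conversion helps when choosing a number. The Python sketch below assumes a hypothetical effective_timeout_ms helper. It shows that the 10000 in the example above corresponds to 10 seconds, which is shorter than the 30-second default, so a slow completion would need a value above 30000.

```python
DEFAULT_TIMEOUT_MS = 30_000  # the 30-second default stated above


def effective_timeout_ms(parameters):
    """Return the configured timeout in milliseconds, or the default if unset."""
    return parameters.get("timeout", DEFAULT_TIMEOUT_MS)


print(effective_timeout_ms({"timeout": 10000}) / 1000)    # prints: 10.0 (seconds)
print(effective_timeout_ms({"temperature": 0.1}) / 1000)  # prints: 30.0 (seconds)
```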

Limitations

  • These parameters are not supported today: logit_bias, tools, tool_choice, user, stream

If this limitation is blocking your use of Empirical, please file a feature request.
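
To catch these cases before a run, a configuration can be checked against the list above. The Python sketch below uses a hypothetical unsupported_parameters helper. It is not part of the Empirical CLI, and the parameter names are copied from the list in this section.

```python
# Parameter names copied from the limitations list above.
UNSUPPORTED = {"logit_bias", "tools", "tool_choice", "user", "stream"}


def unsupported_parameters(parameters):
    """Return the sorted names of parameters that appear in the unsupported list."""
    return sorted(UNSUPPORTED.intersection(parameters))


print(unsupported_parameters({"temperature": 0.1, "stream": True, "tools": []}))
# prints: ['stream', 'tools']
```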