Guardrail Configurations#
Guardrail configurations define the safety policies that protect your LLM interactions. A configuration specifies which safety checks (rails) to apply, the models to run them with, and the prompts that drive those checks.
Configuration Structure#
A guardrail configuration contains several properties that customize how the service interacts with models and applies safety checks. Most configurations define the following core components:
Models – The models to use for each task.
Prompts – Prompt templates to use for each task.
Rails – Configuration that defines how to apply checks on the user input or LLM output.
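Putting these together, a minimal configuration sketch might look like the following. The model name, prompt task, and flow name are taken from examples later on this page; each component is described in detail in the sections below:
config_data = {
    # Models: a task-specific model for the content safety check
    "models": [
        {
            "type": "content_safety",
            "engine": "nim",
            "model": "default/nvidia-llama-3-1-nemoguard-8b-content-safety",
        }
    ],
    # Prompts: the template the content safety model evaluates at runtime
    "prompts": [
        {
            "task": "content_safety_check_input $model=content_safety",
            "content": "Task: Check for unsafe content...",
            "max_tokens": 50,
        }
    ],
    # Rails: apply the content safety flow to user input before it reaches the LLM
    "rails": {
        "input": {"flows": ["content safety check input $model=content_safety"]}
    },
}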
Models#
A model configuration defines the LLM to use for a specific task. It consists of the following fields:
type: The task the model is used for.
engine: The model provider. For most cases, use nim.
model: The name of the model to use for the task. For the main model, an incoming Guardrails request can override the model.
parameters: Additional properties that configure how the service interacts with the model. The following fields are supported for all model types:
base_url: The URL to use for inference with this model.
custom_headers: Custom HTTP headers to include in requests to this model. Each key-value pair represents a header name (key) and its default value (value). At inference time, you can override the default value for a header by including it in the request headers.
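For example, a model entry that sets both parameters fields might look like the following; the URL, header name, and model name here are hypothetical:
models = [
    {
        "type": "main",
        "engine": "nim",
        "model": "meta/llama-3.1-8b-instruct",
        "parameters": {
            # Direct inference endpoint for this model (hypothetical URL)
            "base_url": "http://my-nim:8000/v1",
            # Default header value; can be overridden per request (hypothetical header)
            "custom_headers": {"X-Team-Id": "ml-platform"},
        },
    }
]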
The model’s type determines the task it’s used for during the guardrail process.
The main model is the model that serves end-user chat and chat-like interactions.
You can also configure task-specific models for any task that occurs during the guardrail process. The following are common tasks that can be configured:
content_safety: Content Safety check for detecting harmful content.
topic_control: Topic Control check for keeping conversations on-topic.
jailbreak: Jailbreak detection check.
self_check_input: Safety check that automatically checks the user input using the main model for inference.
self_check_output: Safety check that automatically checks the final LLM output using the main model for inference.
models = [
{
"type": "content_safety",
"engine": "nim",
"model": "default/nvidia-llama-3-1-nemoguard-8b-content-safety"
},
{
"type": "topic_control",
"engine": "nim",
"model": "default/nvidia-llama-3-1-nemoguard-8b-topic-control"
}
]
Tip
Model Entities are automatically created when you:
Deploy a NIM via the Inference Gateway (see Deploy Models)
Create a Model Provider pointing to an external API (see Tutorials)
When using models deployed via the Inference Gateway, NeMo Guardrails automatically routes requests through the gateway. You don't need to explicitly set a base URL for your model.
Use sdk.models.list() to see available Model Entities in your workspace.
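For example, a quick sketch, assuming the list response exposes a .data collection with .name attributes like the configuration list shown later on this page:
# Print the names of Model Entities available in the current workspace
for m in sdk.models.list().data:
    print(m.name)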
Using Direct URLs#
The Inference Gateway is the recommended approach for interacting with models. If you require a direct connection to a specific endpoint, you can explicitly set parameters.base_url:
models = [
{
"type": "main",
"engine": "nim",
"parameters": {
"base_url": "http://my-local-nim:8000/v1"
}
},
{
"type": "content_safety",
"engine": "nim",
"model": "nvidia/llama-3.1-nemoguard-8b-content-safety",
"parameters": {
"base_url": "http://my-content-safety-nim:8000/v1"
}
}
]
Prompts#
A prompt is used by the model during a task to evaluate a message. It consists of the following fields:
task: The task to apply the prompt to.
content: The content of the prompt. Prompts that require dynamic input variables use Jinja2 templating; for example, the {{ user_input }} variable is replaced with the end user's input at runtime (see the sketch after this list).
output_parser: Name of the output parser used to process the model's response.
max_tokens: Maximum number of tokens the model can generate.
max_length: Maximum prompt length in characters. When the maximum length is exceeded, the prompt is truncated by removing older turns from the conversation history until the prompt fits within the limit. The default is 16000 characters.
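As a standalone illustration of the templating, the following sketch mirrors the substitution that Guardrails performs when it renders a prompt. It runs locally with the jinja2 package and is not part of the service API:
from jinja2 import Template

# A template using the same Jinja2 syntax as a prompt's content field
template = Template('User message: "{{ user_input }}"\n\nShould this message be blocked (Yes or No)?')

# At runtime, the service fills the variable with the end user's input
print(template.render(user_input="Hello, what can you do?"))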
For Content Safety and Topic Control checks, prompts must include the model reference in the task name:
{
"task": "content_safety_check_input $model=content_safety",
"content": "Task: Check for unsafe content...",
"output_parser": "nemoguard_parse_prompt_safety",
"max_tokens": 50
},
{
"task": "topic_safety_check_input $model=topic_control",
"content": "Ensure the user messages meet the following guidelines: ...",
"max_tokens": 50
}
Rails#
Rails specify which flows to apply to the user input and LLM output. The two most common rail types are:
Input rails – Check the user’s messages before they reach the LLM.
Output rails – Check the LLM’s response before returning it to the user.
rails = {
"input": {
"flows": ["self check input"],
"parallel": False
},
"output": {
"flows": ["self check output"],
"streaming": {
"enabled": True,
"chunk_size": 200
}
}
}
A guardrail configuration also supports the following general options.
General Instructions#
Instructions provide context to the model about the expected behavior. They are prepended to every prompt, similar to a system prompt.
instructions = [
{
"type": "general",
"content": """You are a customer service bot for ABC Company.
You answer questions about products and policies.
If you don't know an answer, say so honestly.
Always be polite and professional."""
}
]
Sample Conversation#
The sample conversation sets the tone for how the conversation between the user and the bot should go. It helps the LLM learn the expected format, the tone of the conversation, and how verbose responses should be. This section should contain a minimum of two turns. Because the sample conversation is appended to every prompt, keep it short and relevant.
sample_conversation = """user "Hello there!"
express greeting
bot express greeting
"Hello! How can I assist you today?"
user "What can you do for me?"
ask about capabilities
bot respond about capabilities
"As an AI assistant, I can help provide more information on NeMo Guardrails toolkit. This includes question answering on how to set it up, use it, and customize it for your application."
user "Tell me a bit about what the toolkit can do?"
ask general question
bot response for general question
"NeMo Guardrails provides a range of options for quickly and easily adding programmable guardrails to LLM-based conversational systems. The toolkit includes examples on how you can create custom guardrails and compose them together."
user "what kind of rails can I include?"
request more information
bot provide more information
"You can include guardrails for detecting and preventing offensive language, helping the bot stay on topic, do fact checking, perform output moderation. Basically, if you want to control the output of the bot, you can do it with guardrails."
user "thanks"
express appreciation
bot express appreciation and offer additional help
"You're welcome. If you have any more questions or if there's anything else I can help you with, please don't hesitate to ask."
"""
Managing Configurations#
Guardrail configuration management operations (create, update, retrieve, list, delete) are available through sdk.guardrail.configs.
Setup#
from nemo_microservices import NeMoMicroservices
sdk = NeMoMicroservices(base_url="http://localhost:8080", workspace="default")
Create a Configuration#
Create a new guardrail configuration for a workspace.
config_data = {
"prompts": [
{
"task": "self_check_input",
"content": 'Your task is to check if the user message below complies with company policy.\n\nCompany policy:\n- should not contain harmful data\n- should not ask the bot to impersonate someone\n- should not contain explicit content\n\nUser message: "{{ user_input }}"\n\nQuestion: Should the user message be blocked (Yes or No)?\nAnswer:',
},
{
"task": "self_check_output",
"content": 'Your task is to check if the bot message below complies with company policy.\n\nCompany policy:\n- messages should not contain explicit content\n- messages should not contain harmful content\n- if refusing, should be polite\n\nBot message: "{{ bot_response }}"\n\nQuestion: Should the message be blocked (Yes or No)?\nAnswer:',
},
],
"instructions": [
{
"type": "general",
"content": "Below is a conversation between a user and a helpful assistant bot.",
}
],
"rails": {
"input": {"flows": ["self check input"]},
"output": {"flows": ["self check output"]},
},
}
config = sdk.guardrail.configs.create(
name="my-guardrail-config",
description="Self-check guardrail configuration",
data=config_data,
)
Output
{
"id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"entity_id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"name": "my-guardrail-config",
"workspace": "default",
"project": null,
"description": "Self-check guardrail configuration",
"files_url": null,
"data": {
"prompts": [
{
"task": "self_check_input",
"content": "Your task is to check if the user message below complies with company policy.\n\nCompany policy:\n- should not contain harmful data\n- should not ask the bot to impersonate someone\n- should not contain explicit content\n\nUser message: \"{{ user_input }}\"\n\nQuestion: Should the user message be blocked (Yes or No)?\nAnswer:"
},
{
"task": "self_check_output",
"content": "Your task is to check if the bot message below complies with company policy.\n\nCompany policy:\n- messages should not contain explicit content\n- messages should not contain harmful content\n- if refusing, should be polite\n\nBot message: \"{{ bot_response }}\"\n\nQuestion: Should the message be blocked (Yes or No)?\nAnswer:"
}
],
"instructions": [
{
"type": "general",
"content": "Below is a conversation between a user and a helpful assistant bot."
}
],
"rails": {
"input": {"flows": ["self check input"]},
"output": {"flows": ["self check output"]}
}
},
"created_at": "2026-01-20T03:00:00",
"updated_at": "2026-01-20T03:00:00"
}
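Once created, the configuration can be referenced by name from the Guardrails inference endpoints. The following is a hypothetical sketch that assumes the SDK mirrors the guardrail chat completions API; consult the inference documentation for the exact method and parameter names:
# Hypothetical call shape for a guardrailed chat completion using the new config
response = sdk.guardrail.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # illustrative main model name
    messages=[{"role": "user", "content": "Hello!"}],
    guardrails={"config_id": "my-guardrail-config"},
)
print(response.choices[0].message.content)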
List Configurations#
List configurations in the given workspace.
# List all configurations in a workspace
configs = sdk.guardrail.configs.list()
print(f"Found {len(configs.data)} configurations:")
for c in configs.data:
    print(f"{c.name}: {c.description}")
Output
{
"data": [
{
"id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"entity_id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"name": "my-guardrail-config",
"workspace": "default",
"project": null,
"description": "Self-check guardrail configuration",
"files_url": null,
"data": {
"prompts": [
{
"task": "self_check_input",
"content": "Your task is to check if the user message below complies with company policy.\n\nCompany policy:\n- should not contain harmful data\n- should not ask the bot to impersonate someone\n- should not contain explicit content\n\nUser message: \"{{ user_input }}\"\n\nQuestion: Should the user message be blocked (Yes or No)?\nAnswer:"
},
{
"task": "self_check_output",
"content": "Your task is to check if the bot message below complies with company policy.\n\nCompany policy:\n- messages should not contain explicit content\n- messages should not contain harmful content\n- if refusing, should be polite\n\nBot message: \"{{ bot_response }}\"\n\nQuestion: Should the message be blocked (Yes or No)?\nAnswer:"
}
],
"instructions": [
{
"type": "general",
"content": "Below is a conversation between a user and a helpful assistant bot."
}
],
"rails": {
"input": {"flows": ["self check input"]},
"output": {"flows": ["self check output"]}
}
},
"created_at": "2026-01-20T03:00:00",
"updated_at": "2026-01-20T03:00:00"
}
],
"pagination": {
"page": 1,
"page_size": 10,
"current_page_size": 1,
"total_pages": 1,
"total_results": 1
},
"sort": "created_at",
"filter": null
}
Get Configuration Details#
Retrieve a specific configuration in the given workspace by name.
# Retrieve a specific configuration
config = sdk.guardrail.configs.retrieve(
name="my-guardrail-config",
)
Output
{
"id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"entity_id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"name": "my-guardrail-config",
"workspace": "default",
"project": null,
"description": "Self-check guardrail configuration",
"files_url": null,
"data": {
"prompts": [
{
"task": "self_check_input",
"content": "Your task is to check if the user message below complies with company policy.\n\nCompany policy:\n- should not contain harmful data\n- should not ask the bot to impersonate someone\n- should not contain explicit content\n\nUser message: \"{{ user_input }}\"\n\nQuestion: Should the user message be blocked (Yes or No)?\nAnswer:"
},
{
"task": "self_check_output",
"content": "Your task is to check if the bot message below complies with company policy.\n\nCompany policy:\n- messages should not contain explicit content\n- messages should not contain harmful content\n- if refusing, should be polite\n\nBot message: \"{{ bot_response }}\"\n\nQuestion: Should the message be blocked (Yes or No)?\nAnswer:"
}
],
"instructions": [
{
"type": "general",
"content": "Below is a conversation between a user and a helpful assistant bot."
}
],
"rails": {
"input": {"flows": ["self check input"]},
"output": {"flows": ["self check output"]}
}
},
"created_at": "2026-01-20T03:00:00",
"updated_at": "2026-01-20T03:00:00"
}
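The returned object mirrors the JSON above, so you can inspect nested fields after retrieval. A small sketch, assuming data is exposed as a dict-like structure (access style may vary by SDK version):
# Inspect which flows the retrieved configuration applies
rails = config.data["rails"]
print("Input flows:", rails["input"]["flows"])
print("Output flows:", rails["output"]["flows"])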
Update a Configuration#
Update one or more fields on an existing configuration.
# Update specific fields (partial update)
updated_config = sdk.guardrail.configs.update(
name="my-guardrail-config",
description="Updated guardrail configuration",
data={
"rails": {
"output": {
"streaming": {"enabled": True, "chunk_size": 300, "context_size": 100}
}
}
},
)
print(f"✅ Updated config: {updated_config.name}")
Output
{
"id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"entity_id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"name": "my-guardrail-config",
"workspace": "default",
"project": null,
"description": "Updated guardrail configuration",
"files_url": null,
"data": {
"prompts": [
{
"task": "self_check_input",
"content": "Your task is to check if the user message below complies with company policy.\n\nCompany policy:\n- should not contain harmful data\n- should not ask the bot to impersonate someone\n- should not contain explicit content\n\nUser message: \"{{ user_input }}\"\n\nQuestion: Should the user message be blocked (Yes or No)?\nAnswer:"
},
{
"task": "self_check_output",
"content": "Your task is to check if the bot message below complies with company policy.\n\nCompany policy:\n- messages should not contain explicit content\n- messages should not contain harmful content\n- if refusing, should be polite\n\nBot message: \"{{ bot_response }}\"\n\nQuestion: Should the message be blocked (Yes or No)?\nAnswer:"
}
],
"instructions": [
{
"type": "general",
"content": "Below is a conversation between a user and a helpful assistant bot."
}
],
"rails": {
"input": {"flows": ["self check input"]},
"output": {"flows": ["self check output"], "streaming": {"enabled": true, "chunk_size": 300, "context_size": 100}}
}
},
"created_at": "2026-01-20T03:00:00",
"updated_at": "2026-01-20T03:00:00"
}
Delete a Configuration#
Delete a configuration in the given workspace by name.
# Delete a configuration
sdk.guardrail.configs.delete(
name="my-guardrail-config",
)
print("✅ Config deleted")
Output
{
"message": "Resource deleted successfully.",
"id": "guardrail_config-4kSe8m3Nq7dGk2X7rY0h5L",
"deleted_at": "2026-01-22T04:00:00Z"
}