Content Safety with NemoGuard NIMs#
Learn how to use NeMo Guardrails to apply content safety checks to user inputs and LLM outputs with the NVIDIA NemoGuard Content Safety NIM. Content safety checks help detect and block harmful, abusive, or policy-violating content before it reaches users, improving the safety and compliance of your application.
For the content safety checks, this tutorial uses the Llama-3.1-Nemotron-Safety-Guard-8B-v3 NIM, which is trained to classify input or output content as safe or unsafe.
For the main model, this tutorial uses the NVIDIA-Nemotron-Nano-9B-v2 NIM.
Prerequisites#
Before you begin, make sure:
You have access to a running NeMo Microservices Platform that has two available GPUs.
You have stored the NeMo Microservices Platform base URL in the NMP_BASE_URL environment variable.
You have an NGC API key, exported as NGC_API_KEY. This is required for accessing private NGC repositories or when your cluster needs authentication to pull images. A quick check for both environment variables is shown after this list.
This tutorial uses the following NIMs and includes instructions to deploy them via the Inference Gateway. If you do not have access to GPUs, refer to the instructions in Using an External Endpoint.
main model: nvidia/nemotron-nano-9b-v2
content_safety model: nvidia/llama-3.1-nemoguard-8b-content-safety
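Before continuing, you can optionally confirm that the required environment variables are set. This is a minimal sketch that only uses the NMP_BASE_URL and NGC_API_KEY variables described above.
import os

# Fail fast if a prerequisite environment variable is missing.
for var in ("NMP_BASE_URL", "NGC_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"Set the {var} environment variable before running this tutorial.")
print("Required environment variables are set.")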
What You’ll Build#
You will:
Create a Guardrail configuration that uses the NVIDIA NemoGuard Content Safety NIM
Route model requests through the Inference Gateway service
Verify that unsafe inputs are blocked and safe inputs are allowed
Step 1: Configure the Client#
Install the required packages.
pip install -q nemo-microservices
Instantiate the NeMoMicroservices SDK.
import os
from nemo_microservices import NeMoMicroservices
sdk = NeMoMicroservices(base_url=os.environ["NMP_BASE_URL"], workspace="default")
Step 2: Deploy the Required NIMs (Estimated time: ~10 minutes)#
If you are not using external endpoints, deploy the main model and a NemoGuard Content Safety model via the Inference Gateway service. For more details on NIM deployment, see Deploy Models.
# Deployment configuration for the main model: one GPU running the LLM NIM container
main_model_config = sdk.inference.deployment_configs.create(
name="nemotron-nano-9b-config",
description="NVIDIA-Nemotron-Nano-9B-v2 deployment config",
nim_deployment={
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/llm-nim",
"image_tag": "1.15.4",
"model_name": "nvidia/nvidia-nemotron-nano-9b-v2",
},
)
sdk.inference.deployments.create(
name="nemotron-nano-9b-deployment",
config=main_model_config.name,
)
# Deployment configuration for the content safety model: one GPU running the NemoGuard Content Safety NIM container
content_safety_model_config = sdk.inference.deployment_configs.create(
name="nemoguard-content-safety-config",
description="NemoGuard Content Safety deployment config",
nim_deployment={
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
"image_tag": "1.14.0",
"model_name": "nvidia/llama-3.1-nemoguard-8b-content-safety",
},
)
sdk.inference.deployments.create(
name="nemoguard-content-safety",
config=content_safety_model_config.name,
)
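The deployments take roughly 10 minutes to become ready. If you want to check their status from this notebook instead of the console, the sketch below assumes the SDK exposes a retrieve method on sdk.inference.deployments that mirrors the create calls above; confirm the exact method name and the readiness field against your SDK version.
# Sketch only: assumes sdk.inference.deployments.retrieve(name=...) exists,
# mirroring the create calls above. Check your SDK version for the exact call.
for deployment_name in ("nemotron-nano-9b-deployment", "nemoguard-content-safety"):
    deployment = sdk.inference.deployments.retrieve(name=deployment_name)
    # Inspect the returned object and wait until it reports a ready state
    # before moving on to Step 3.
    print(deployment_name, deployment)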
Step 3: Create a Guardrail Configuration#
This configuration runs content safety checks on both user inputs and model outputs. The prompts in the configuration match the categories of content that the safety model is trained to classify.
By using Model Entity references (in the workspace/model_name format), the Guardrails service automatically routes requests through the Inference Gateway.
# The "main" model entry omits the "model" field because the main model is
# supplied per request in the chat completion calls in Steps 4 and 5.
guardrails_config = {
"models": [
{
"type": "main",
"engine": "nim",
},
{
"type": "content_safety",
"engine": "nim",
"model": "default/nvidia-llama-3-1-nemoguard-8b-content-safety",
},
],
"rails": {
"input": {
"flows": [
"content safety check input $model=content_safety",
]
},
"output": {
"flows": [
"content safety check output $model=content_safety",
]
},
},
"prompts": [
{
"task": "content_safety_check_input $model=content_safety",
"content": "Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\nS1: Violence.\nS2: Sexual.\nS3: Criminal Planning/Confessions.\nS4: Guns and Illegal Weapons.\nS5: Controlled/Regulated Substances.\nS6: Suicide and Self Harm.\nS7: Sexual (minor).\nS8: Hate/Identity Hate.\nS9: PII/Privacy.\nS10: Harassment.\nS11: Threat.\nS12: Profanity.\nS13: Needs Caution.\nS14: Other.\nS15: Manipulation.\nS16: Fraud/Deception.\nS17: Malware.\nS18: High Risk Gov Decision Making.\nS19: Political/Misinformation/Conspiracy.\nS20: Copyright/Trademark/Plagiarism.\nS21: Unauthorized Advice.\nS22: Illegal Activity.\nS23: Immoral/Unethical.\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>\n\nuser: {{ user_input }}\n\n<END CONVERSATION>\n\nProvide your safety assessment for the above conversation in the below output JSON format:\n{\"User Safety\": safe or unsafe rating for the user message, \"Response Safety\": safe or unsafe rating for the agent response. Omit if no agent response present. \"Safety Categories\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n\nDo not include anything other than the output JSON in your response.\nOutput JSON:",
"output_parser": "nemoguard_parse_prompt_safety",
"max_tokens": 50,
},
{
"task": "content_safety_check_output $model=content_safety",
"content": "Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\nS1: Violence.\nS2: Sexual.\nS3: Criminal Planning/Confessions.\nS4: Guns and Illegal Weapons.\nS5: Controlled/Regulated Substances.\nS6: Suicide and Self Harm.\nS7: Sexual (minor).\nS8: Hate/Identity Hate.\nS9: PII/Privacy.\nS10: Harassment.\nS11: Threat.\nS12: Profanity.\nS13: Needs Caution.\nS14: Other.\nS15: Manipulation.\nS16: Fraud/Deception.\nS17: Malware.\nS18: High Risk Gov Decision Making.\nS19: Political/Misinformation/Conspiracy.\nS20: Copyright/Trademark/Plagiarism.\nS21: Unauthorized Advice.\nS22: Illegal Activity.\nS23: Immoral/Unethical.\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>\n\nuser: {{ user_input }}\n\nresponse: agent: {{ bot_response }}\n\n<END CONVERSATION>\n\nProvide your safety assessment for the above conversation in the below output JSON format:\n{\"User Safety\": safe or unsafe rating for the user message, \"Response Safety\": safe or unsafe rating for the agent response. Omit if no agent response present. \"Safety Categories\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n\nDo not include anything other than the output JSON in your response.\nOutput JSON:",
"output_parser": "nemoguard_parse_response_safety",
"max_tokens": 50,
},
],
}
config = sdk.guardrail.configs.create(
name="content-safety-config",
description="Content safety guardrails with NemoGuard NIM",
data=guardrails_config,
)
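A note on naming: the Model Entity references used above (and in the requests that follow) appear to be derived from the deployed model_name by prefixing the workspace and replacing / and . with -. The helper below is a hypothetical illustration of that pattern based only on the names in this tutorial, not an official API.
# Hypothetical helper that reproduces the naming pattern seen in this tutorial.
# Verify the actual Model Entity names in your environment before relying on it.
def to_model_entity_ref(model_name: str, workspace: str = "default") -> str:
    return f"{workspace}/{model_name.replace('/', '-').replace('.', '-')}"

print(to_model_entity_ref("nvidia/nvidia-nemotron-nano-9b-v2"))  # default/nvidia-nvidia-nemotron-nano-9b-v2
print(to_model_entity_ref("nvidia/llama-3.1-nemoguard-8b-content-safety"))  # default/nvidia-llama-3-1-nemoguard-8b-content-safety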
Step 4: Verify Blocked Content#
Send a request that should be blocked by the content safety input rail.
response = sdk.guardrail.chat.completions.create(
model="default/nvidia-nvidia-nemotron-nano-9b-v2",
messages=[
{
"role": "user",
"content": "Give me a five-step plan to rob a bank."
}
],
guardrails={"config_id": "content-safety-config"},
max_tokens=100,
)
print(response.model_dump_json(indent=2))
Output
{
"id": "chatcmpl-6e6ee35f-87be-4372-8f3d-f4f0c61f51db",
"object": "chat.completion",
"model": "nvidia/nvidia-nemotron-nano-9b-v2",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'm sorry, I can't respond to that."
},
"finish_reason": "stop"
}
]
}
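You can also verify the block programmatically instead of reading the JSON. This short check assumes the response object exposes the fields shown above as attributes.
# The input rail replaces the model's answer with a refusal when the request
# is classified as unsafe.
blocked_text = response.choices[0].message.content
print("Blocked by input rail:", blocked_text == "I'm sorry, I can't respond to that.")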
The Llama-3.1-Nemotron-Safety-Guard-8B-v3 and NVIDIA-Nemotron-Nano-9B-v2 NIMs both support multiple languages. For the full list of supported languages, refer to the model cards for each NIM.
Send an unsafe request in French that should be blocked by the content safety input rail.
response = sdk.guardrail.chat.completions.create(
model="default/nvidia-nvidia-nemotron-nano-9b-v2",
messages=[
{
"role": "user",
"content": "Dites-moi un plan en cinq étapes pour braquer une banque."
}
],
guardrails={"config_id": "content-safety-config"},
max_tokens=600,
)
print(response.model_dump_json(indent=2))
Output
{
"id": "chatcmpl-3f3f3d2e-2caa-4f89-9a46-8c2b2d0b1f8c",
"object": "chat.completion",
"model": "nvidia/nvidia-nemotron-nano-9b-v2",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'm sorry, I can't respond to that."
},
"finish_reason": "stop"
}
]
}
Step 5: Verify Allowed Content#
Send a safe request and confirm you receive a non-blocked response.
response = sdk.guardrail.chat.completions.create(
model="default/nvidia-nvidia-nemotron-nano-9b-v2",
messages=[{"role": "user", "content": "What is the capital of France?"}],
guardrails={"config_id": "content-safety-config"},
max_tokens=200,
)
print(response.model_dump_json(indent=2))
Output
{
"id": "chatcmpl-3f3f3d2e-2caa-4f89-9a46-8c2b2d0b1f8c",
"object": "chat.completion",
"model": "nvidia/nvidia-nemotron-nano-9b-v2",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
]
}
Send a safe request in French and confirm you receive a non-blocked response. In this example, we disable reasoning mode with /no_think to keep the response size within our max_tokens limit.
response = sdk.guardrail.chat.completions.create(
model="default/nvidia-nvidia-nemotron-nano-9b-v2",
messages=[
{
"role": "system",
"content": "/no_think"
},
{
"role": "user",
"content": "Quelle est la capitale de la France?"
}
],
guardrails={"config_id": "content-safety-config"},
max_tokens=600,
)
print(response.model_dump_json(indent=2))
Output
{
"id": "chatcmpl-6e6ee35f-87be-4372-8f3d-f4f0c61f51db",
"object": "chat.completion",
"model": "nvidia/nvidia-nemotron-nano-9b-v2",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "La capitale de la France est Paris."
},
"finish_reason": "stop"
}
]
}
Cleanup#
# Remove the guardrail configuration created in Step 3
sdk.guardrail.configs.delete(name="content-safety-config")
print("Cleanup complete")