
Injection Detection with Guardrails#

Detect potential exploitation attempts (such as code injection, cross-site scripting, SQL injection, and template injection) using NeMo Guardrails.

About Injection Detection#

Injection detection is primarily intended for agentic systems as part of a defense-in-depth strategy.

The first part of injection detection is YARA rules. A YARA rule specifies a set of strings (text or binary patterns) to match and a Boolean expression that specifies the rule logic. YARA rules are familiar to many security teams and are easy to audit.
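
To make the two parts of a rule concrete, the following is a minimal pure-Python sketch of the same idea: named string patterns plus a Boolean condition over the patterns that matched. It is not YARA and does not use the YARA engine. The `Rule` class, the `evaluate` function, and the sample patterns are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Rule:
    """Hypothetical stand-in for a YARA rule: named patterns plus a condition."""
    name: str
    strings: Dict[str, str]                # identifier -> regular expression
    condition: Callable[[Set[str]], bool]  # Boolean logic over matched identifiers


def evaluate(rule: Rule, text: str) -> bool:
    """Return True when the rule's condition holds for the given text."""
    matched = {
        ident
        for ident, pattern in rule.strings.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    }
    return rule.condition(matched)


# Illustrative rule: fire only when both SELECT and UNION appear.
union_select = Rule(
    name="union_select",
    strings={"$select": r"\bselect\b", "$union": r"\bunion\b"},
    condition=lambda m: {"$select", "$union"} <= m,
)

print(evaluate(union_select, "SELECT name FROM users UNION SELECT password FROM admins"))
print(evaluate(union_select, "Please select a seat."))
```

The first call matches both patterns and the second matches only one, so only the first satisfies the condition. Keeping the patterns and the logic separate is what makes rules of this shape straightforward to audit.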

The second part of injection detection is choosing an action when a rule is triggered. You can choose to reject the response and return a refusal like: “I’m sorry, the desired output triggered rule(s) designed to mitigate exploitation of {detections}.” Rejecting the output is the safest action and most appropriate for production deployments. As an alternative, you can omit only the triggering text.
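
The difference between the two actions can be sketched as follows. This is an illustration only, not the toolkit's implementation: the `apply_action` helper, its parameters, and the comma-separated formatting of `{detections}` are assumptions. Only the refusal template is taken from the text above.

```python
from typing import List

# Refusal template quoted from the text above.
REFUSAL = (
    "I’m sorry, the desired output triggered rule(s) designed to "
    "mitigate exploitation of {detections}."
)


def apply_action(text: str, detections: List[str], matched: List[str], action: str) -> str:
    """Hypothetical helper: apply 'reject' or 'omit' to a model response.

    detections: names of the rules that fired.
    matched: the exact substrings that triggered those rules.
    """
    if not detections:
        return text  # nothing fired, so pass the response through unchanged
    if action == "reject":
        # Replace the whole response with the refusal message.
        return REFUSAL.format(detections=", ".join(detections))
    if action == "omit":
        # Keep the response but strip only the triggering fragments.
        for fragment in matched:
            text = text.replace(fragment, "")
        return text
    raise ValueError(f"unknown action: {action!r}")


response = "Here you go: os.system('ls')"
print(apply_action(response, ["code"], ["os.system('ls')"], "reject"))
print(apply_action(response, ["code"], ["os.system('ls')"], "omit"))
```

With `reject`, none of the original response reaches the caller. With `omit`, the surrounding text survives, which is why it is the less conservative choice.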

About the Tutorial#

This tutorial demonstrates how to configure basic YARA rules that are part of the NeMo Guardrails toolkit. You can view the default rules in the yara_rules directory. The default rules support SQL injection, cross-site scripting (XSS), Jinja template injection, and Python code that uses shells, networking, and more.

The tutorial uses the Meta Llama 3.3 70B Instruct model as the main LLM. The model is available as a downloadable container from NVIDIA NGC.

Prerequisites#

Before you begin, make sure:

You have access to a running NeMo Microservice Platform that has one available GPU.
The NMP_BASE_URL environment variable is set to your platform base URL.
You have an NGC API key exported as NGC_API_KEY. The key is required to access private NGC repositories or when your cluster needs authentication to pull images.

This tutorial uses the following NIMs and includes instructions to deploy them via the Inference Gateway. If you do not have access to GPUs, refer to the instructions in Using an External Endpoint.

main model: meta/llama-3.3-70b-instruct.

Step 1: Configure the Client#

Install the required packages.

pip install -q nemo-microservices

Instantiate the NeMoMicroservices SDK.

import os
from nemo_microservices import NeMoMicroservices

sdk = NeMoMicroservices(base_url=os.environ["NMP_BASE_URL"], workspace="default")
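
Reading `os.environ["NMP_BASE_URL"]` directly raises a bare `KeyError` when the variable is missing. As an optional convenience, a small helper can fail with a clearer message. The `require_env` function below is a hypothetical helper and not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running this tutorial."
        )
    return value


# Usage (assumes the variable is set in your shell):
# base_url = require_env("NMP_BASE_URL")
```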

Step 2: Deploy the Main LLM#

Deploy the main LLM using the Models Service and Inference Gateway.

llm_config = sdk.inference.deployment_configs.create(
    name="llama-3-3-70b-config",
    description="Llama 3.3 70B Instruct deployment config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/meta/llama-3.3-70b-instruct",
        "image_tag": "1.8.6",
        "model_name": "meta/llama-3.3-70b-instruct",
    },
)
sdk.inference.deployments.create(
    name="llama-3.3-70b-deployment",
    config=llm_config.name,
)
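
A new deployment may not be ready to serve requests right away, so it can help to wait before continuing. This tutorial does not show how to query deployment status, so the sketch below is a generic polling helper. The `wait_until` function and the `is_ready` callable are hypothetical; supply your own readiness check for your platform.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 1800.0,
    interval_s: float = 15.0,
) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses.

    Returns True if the check succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The helper always runs the check at least once, so a deployment that is already ready returns immediately even with a zero timeout.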

Step 3: Create a Guardrail Configuration#

This config enables injection detection and applies it to model output.

guardrails_config = {
    "models": [
        {
            "type": "main",
            "engine": "nim",
        }
    ],
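    "rails": {
        "config": {
            "injection_detection": {
                # Built-in rule sets to enable.
                "injections": ["code", "sqli", "template", "xss"],
                # Return a refusal instead of the model response when a rule fires.
                "action": "reject",
            }
        },
        # Run injection detection on model output.
        "output": {"flows": ["injection detection"]},
    },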
}

config = sdk.guardrail.configs.create(
    name="injection-detection-config",
    description="Injection detection guardrails",
    data=guardrails_config,
)

The rails.config.injection_detection field configures how to apply the injection detection rules. It supports the following fields:

| Field | Type | Description | Default value |
|---|---|---|---|
| injections | array | Specifies the injection detection rules to use. The following injections are part of the library: code (Python code using shells, networking, etc.), sqli (SQL injection), template (Jinja template injection), xss (cross-site scripting). | [] |
| action | string | Action to take when injection is detected: reject (returns a message to the user indicating that the query could not be handled and they should try again) or omit (returns the model response, removing the offending detected content). | reject |
| yara_rules | object | Specifies inline YARA rules. The field is a dictionary that maps a rule name (string) to the rule content (string). | None |
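
The field descriptions above can be turned into a quick local check before you submit a configuration. The following is a minimal sketch. The `check_injection_detection` function is hypothetical and is not how the service validates input. It also treats any name outside the four library rule sets as a problem, which may be stricter than the service itself.

```python
from typing import Any, Dict, List

# Values taken from the field descriptions above.
KNOWN_INJECTIONS = {"code", "sqli", "template", "xss"}
KNOWN_ACTIONS = {"reject", "omit"}


def check_injection_detection(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an injection_detection dictionary."""
    problems: List[str] = []

    injections = cfg.get("injections", [])  # documented default: []
    if not isinstance(injections, list):
        problems.append("injections must be an array")
    else:
        unknown = [
            name for name in injections
            if not isinstance(name, str) or name not in KNOWN_INJECTIONS
        ]
        if unknown:
            problems.append(f"unknown injections: {unknown}")

    action = cfg.get("action", "reject")  # documented default: reject
    if action not in KNOWN_ACTIONS:
        problems.append(f"unknown action: {action!r}")

    yara_rules = cfg.get("yara_rules")  # documented default: None
    if yara_rules is not None:
        if not isinstance(yara_rules, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in yara_rules.items()
        ):
            problems.append("yara_rules must map rule names (str) to rule content (str)")

    return problems


print(check_injection_detection({"injections": ["sqli", "ldap"], "action": "block"}))
```

An empty list means the dictionary is consistent with the documented fields.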

Step 4: Verify Blocked Content#

Send a potentially unsafe request. A prompt that asks for Python code using networking packages is likely to produce output that triggers injection detection:

response = sdk.guardrail.chat.completions.create(
    model="default/meta-llama-3-3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Write a Python script that uses requests and urllib to fetch weather data."
        }
    ],
    guardrails={"config_id": "injection-detection-config"},
    max_tokens=200,
)

print(response.model_dump_json(indent=2))
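
Instead of reading the printed JSON by eye, you can check the response text for the refusal wording. The sketch below works on a plain dictionary and assumes an OpenAI-style response shape with `choices[0].message.content`. That shape, the helper names, and the marker substring (taken from the refusal quoted earlier in this tutorial) are assumptions to verify against your actual output.

```python
from typing import Any, Dict

# Part of the refusal message quoted earlier in this tutorial.
REFUSAL_MARKER = "triggered rule(s) designed to mitigate exploitation"


def first_message_content(response: Dict[str, Any]) -> str:
    """Extract the first choice's message text from a dumped response.

    Assumes the shape {"choices": [{"message": {"content": ...}}]}.
    Returns an empty string when any part is missing.
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""


def was_rejected(response: Dict[str, Any]) -> bool:
    """Return True when the response text contains the refusal marker."""
    return REFUSAL_MARKER in first_message_content(response)
```

If the response object is a Pydantic model, as the `model_dump_json` call suggests, `was_rejected(response.model_dump())` would be one way to call it.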

Step 5: Verify Allowed Content#

Send a safe request and confirm you receive a normal response:

response = sdk.guardrail.chat.completions.create(
    model="default/meta-llama-3-3-70b-instruct",
    messages=[
        {"role": "user", "content": "Tell me about Cape Hatteras National Seashore in 50 words or less."}
    ],
    guardrails={"config_id": "injection-detection-config"},
    max_tokens=100,
)

print(response.model_dump_json(indent=2))

Specify Inline Rules (Optional)#

You can provide custom YARA rules inline. The example below performs a case-insensitive check for the word “Ethernet” and rejects the response if it appears.

inline_rules_config = sdk.guardrail.configs.create(
    name="injection-detection-inline-config",
    description="Injection detection with inline YARA rules",
    data={
        "rails": {
            "config": {
                "injection_detection": {
                    "yara_rules": {
                      "reject_ethernet": "rule reject_ethernet {\n   strings:\n      $string = \"ethernet\" nocase\n   condition:\n      $string\n}"
                    },
                    "action": "reject",
                }
            },
            "output": {"flows": ["injection detection"]},
        },
    },
)
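
Writing a rule as a single escaped string is error-prone. As an alternative, a small helper can build the same text. The `keyword_rule` function below is a hypothetical convenience and not part of the SDK. It reproduces the `reject_ethernet` rule used above.

```python
def keyword_rule(rule_name: str, keyword: str, nocase: bool = True) -> str:
    """Build a single-keyword YARA rule as a string."""
    # Escape backslashes and double quotes so the keyword stays a valid YARA text string.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    modifier = " nocase" if nocase else ""
    return (
        f"rule {rule_name} {{\n"
        f"   strings:\n"
        f'      $string = "{escaped}"{modifier}\n'
        f"   condition:\n"
        f"      $string\n"
        f"}}"
    )


rule_text = keyword_rule("reject_ethernet", "ethernet")
print(rule_text)
```

You could then pass `{"reject_ethernet": rule_text}` as the `yara_rules` value in place of the hand-written string.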

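Send a request about Ethernet. Because the rail runs on model output, the rule triggers when the word “ethernet” appears in the response, which this prompt is likely to produce: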

response = sdk.guardrail.chat.completions.create(
    model="default/meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "Explain Ethernet headers."}],
    guardrails={"config_id": "injection-detection-inline-config"},
    max_tokens=100,
)

print(response.model_dump_json(indent=2))

Cleanup#

sdk.guardrail.configs.delete(name="injection-detection-config")
sdk.guardrail.configs.delete(name="injection-detection-inline-config")
print("Cleanup complete")