{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "(guardrails-tutorials-parallel-rails)=\n", "# Parallel Execution of Input and Output Rails\n", "\n", "Run input and output rails in parallel to improve the response time of guardrail checks. This tutorial shows how to enable parallel rails using the {{ngm_short_name}} Python SDK.\n", "\n", "## When to Use Parallel Rails Execution\n", "\n", "Parallel execution is most effective for the following:\n", "\n", "- I/O-bound rails, such as external API calls to models or third-party integrations.\n", "- Independent input or output rails without shared state dependencies.\n", "- Production environments where response latency affects user experience and business metrics.\n", "\n", "```{note}\n", "Input rails that mutate the user message can produce erroneous results during parallel execution, because the order and timing of parallel operations can introduce race conditions. The output can then diverge from what sequential execution would produce. For such cases, use sequential mode.\n", "```\n", "\n", "## When Not to Use Parallel Rails Execution\n", "\n", "Sequential execution is recommended for the following:\n", "\n", "- CPU-bound rails, where parallel execution might not improve performance and can add overhead.\n", "- Development and testing, where sequential execution makes debugging simpler.\n", "\n", "---\n", "\n", "## Prerequisites\n", "\n", "Before you begin, make sure:\n", "\n", "- You have access to a running NeMo Microservices Platform deployment that has three available GPUs.\n", "- You have stored the NeMo Microservices Platform base URL in the `NMP_BASE_URL` environment variable.\n", "- You have an NGC API key, exported as `NGC_API_KEY`. The key is required for accessing private NGC repositories or when your cluster needs authentication to pull images.\n", "\n", "This tutorial uses the following NIMs and includes instructions to deploy them via the Inference Gateway. 
If you do not have access to GPUs, refer to the instructions in {ref}`Using an External Endpoint`.\n", "\n", "- `main` model: `meta/llama-3.3-70b-instruct`.\n", "- `content_safety` model: `nvidia/llama-3.1-nemoguard-8b-content-safety`.\n", "- `topic_control` model: `nvidia/llama-3.1-nemoguard-8b-topic-control`.\n", "\n", "---\n", "\n", "## Step 1: Configure the Client\n", "\n", "Install the NeMo Microservices SDK package." ] }, { "cell_type": "code", "metadata": { "language": "sh", "vscode": { "languageId": "shellscript" } }, "source": [ "pip install -q nemo-microservices" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instantiate the `NeMoMicroservices` SDK client." ] }, { "cell_type": "code", "metadata": {}, "source": [ "import os\n", "from nemo_microservices import NeMoMicroservices\n", "\n", "sdk = NeMoMicroservices(base_url=os.environ[\"NMP_BASE_URL\"], workspace=\"default\")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Step 2: Deploy the Required Models\n", "\n", "Deploy the main NIM and NeMoGuard NIMs using the Models Service and Inference Gateway."
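, "\n", "Deployment is asynchronous, so the NIMs are typically not usable the moment the deployment calls return. After you run the deployment cell that follows, you can poll for readiness with a sketch like the one below. The `deployments.retrieve` accessor and the `status` field are assumptions about the SDK surface, so adjust them to match your SDK version.\n", "\n", "```python\n", "import time\n", "\n", "def wait_for_deployment(sdk, name, timeout_s=1800, poll_s=30):\n", "    # Poll until the deployment reports ready.\n", "    # The accessor and field names here are assumptions, not a confirmed API.\n", "    deadline = time.time() + timeout_s\n", "    while time.time() < deadline:\n", "        dep = sdk.inference.deployments.retrieve(name)\n", "        status = getattr(dep, \"status\", None)\n", "        print(name, status)\n", "        if status == \"ready\":\n", "            return dep\n", "        time.sleep(poll_s)\n", "    raise TimeoutError(f\"{name} was not ready after {timeout_s}s\")\n", "```\n"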
] }, { "cell_type": "code", "metadata": {}, "source": [ "# Main NIM\n", "llm_config = sdk.inference.deployment_configs.create(\n", " name=\"llama-3-3-70b-config\",\n", " description=\"Llama 3.3 70B Instruct deployment config\",\n", " nim_deployment={\n", " \"image_name\": \"nvcr.io/nim/nvidia/llm-nim\",\n", " \"image_tag\": \"1.15.4\",\n", " \"model_name\": \"meta/llama-3.3-70b-instruct\",\n", " \"gpu\": 1,\n", " },\n", ")\n", "\n", "sdk.inference.deployments.create(\n", " name=\"llama-3.3-70b-deployment\",\n", " config=llm_config.name,\n", ")\n", "\n", "# NemoGuard Content Safety\n", "cs_config = sdk.inference.deployment_configs.create(\n", " name=\"nemoguard-content-safety-config\",\n", " description=\"NemoGuard content safety deployment config\",\n", " nim_deployment={\n", " \"image_name\": \"nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3\",\n", " \"image_tag\": \"1.14.0\",\n", " \"model_name\": \"nvidia/llama-3.1-nemoguard-8b-content-safety\",\n", " \"gpu\": 1,\n", " },\n", ")\n", "\n", "sdk.inference.deployments.create(\n", " name=\"nemoguard-content-safety\",\n", " config=cs_config.name,\n", ")\n", "\n", "# NemoGuard Topic Control\n", "tc_config = sdk.inference.deployment_configs.create(\n", " name=\"nemoguard-topic-control-config\",\n", " description=\"NemoGuard topic control deployment config\",\n", " nim_deployment={\n", " \"image_name\": \"nvcr.io/nim/nvidia/llama-3.1-nemoguard-topic-control-8b\",\n", " \"image_tag\": \"1.14.0\",\n", " \"model_name\": \"nvidia/llama-3.1-nemoguard-8b-topic-control\",\n", " \"gpu\": 1,\n", " },\n", ")\n", "\n", "sdk.inference.deployments.create(\n", " name=\"nemoguard-topic-control\",\n", " config=tc_config.name,\n", ")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Step 3: Create a Guardrail Configuration\n", "\n", "Create a configuration that enables parallel execution for input and output rails. 
This example uses both content safety and topic control." ] }, { "cell_type": "code", "metadata": {}, "source": [ "guardrails_config = {\n", " \"models\": [\n", " {\n", " \"type\": \"main\",\n", " \"engine\": \"nim\",\n", " },\n", " {\n", " \"type\": \"content_safety\",\n", " \"engine\": \"nim\",\n", " \"model\": \"default/nvidia-llama-3-1-nemoguard-8b-content-safety\",\n", " },\n", " {\n", " \"type\": \"topic_control\",\n", " \"engine\": \"nim\",\n", " \"model\": \"default/nvidia-llama-3-1-nemoguard-8b-topic-control\",\n", " },\n", " ],\n", " \"rails\": {\n", " \"input\": {\n", " \"parallel\": True,\n", " \"flows\": [\n", " \"content safety check input $model=content_safety\",\n", " \"topic safety check input $model=topic_control\",\n", " ],\n", " },\n", " \"output\": {\n", " \"parallel\": True,\n", " \"flows\": [\n", " \"content safety check output $model=content_safety\",\n", " \"topic safety check output $model=topic_control\",\n", " ],\n", " },\n", " },\n", " \"prompts\": [\n", " {\n", " \"task\": \"content_safety_check_input $model=content_safety\",\n", " \"content\": \"Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. 
If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\\n\\n\\nS1: Violence.\\nS2: Sexual.\\nS3: Criminal Planning/Confessions.\\nS4: Guns and Illegal Weapons.\\nS5: Controlled/Regulated Substances.\\nS6: Suicide and Self Harm.\\nS7: Sexual (minor).\\nS8: Hate/Identity Hate.\\nS9: PII/Privacy.\\nS10: Harassment.\\nS11: Threat.\\nS12: Profanity.\\nS13: Needs Caution.\\nS14: Other.\\nS15: Manipulation.\\nS16: Fraud/Deception.\\nS17: Malware.\\nS18: High Risk Gov Decision Making.\\nS19: Political/Misinformation/Conspiracy.\\nS20: Copyright/Trademark/Plagiarism.\\nS21: Unauthorized Advice.\\nS22: Illegal Activity.\\nS23: Immoral/Unethical.\\n\\n\\n\\n\\nuser: {{ user_input }}\\n\\n\\n\\nProvide your safety assessment for the above conversation in the below output JSON format:\\n{\\\"User Safety\\\": safe or unsafe rating for the user message, \\\"Response Safety\\\": safe or unsafe rating for the agent response. Omit if no agent response present. \\\"Safety Categories\\\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\\n\\nDo not include anything other than the output JSON in your response.\\nOutput JSON:\",\n", " \"output_parser\": \"nemoguard_parse_prompt_safety\",\n", " \"max_tokens\": 50,\n", " },\n", " {\n", " \"task\": \"content_safety_check_output $model=content_safety\",\n", " \"content\": \"Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. 
If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\\n\\n\\nS1: Violence.\\nS2: Sexual.\\nS3: Criminal Planning/Confessions.\\nS4: Guns and Illegal Weapons.\\nS5: Controlled/Regulated Substances.\\nS6: Suicide and Self Harm.\\nS7: Sexual (minor).\\nS8: Hate/Identity Hate.\\nS9: PII/Privacy.\\nS10: Harassment.\\nS11: Threat.\\nS12: Profanity.\\nS13: Needs Caution.\\nS14: Other.\\nS15: Manipulation.\\nS16: Fraud/Deception.\\nS17: Malware.\\nS18: High Risk Gov Decision Making.\\nS19: Political/Misinformation/Conspiracy.\\nS20: Copyright/Trademark/Plagiarism.\\nS21: Unauthorized Advice.\\nS22: Illegal Activity.\\nS23: Immoral/Unethical.\\n\\n\\n\\n\\nuser: {{ user_input }}\\n\\nresponse: agent: {{ bot_response }}\\n\\n\\n\\nProvide your safety assessment for the above conversation in the below output JSON format:\\n{\\\"User Safety\\\": safe or unsafe rating for the user message, \\\"Response Safety\\\": safe or unsafe rating for the agent response. Omit if no agent response present. \\\"Safety Categories\\\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\\n\\nDo not include anything other than the output JSON in your response.\\nOutput JSON:\",\n", " \"output_parser\": \"nemoguard_parse_response_safety\",\n", " \"max_tokens\": 50,\n", " },\n", " {\n", " \"task\": \"topic_safety_check_input $model=topic_control\",\n", " \"content\": \"You are a topic safety evaluator. Determine whether the user message is in scope for the allowed topics. Reply with `allow` or `block` and a short reason.\\n\\nAllowed topics:\\n- Renewable energy\\n- Sustainability\\n- Climate policy\\n\\nUser message: \\\"{{ user_input }}\\\"\\n\\nAnswer (allow or block):\",\n", " \"max_tokens\": 50,\n", " },\n", " {\n", " \"task\": \"topic_safety_check_output $model=topic_control\",\n", " \"content\": \"You are a topic safety evaluator. 
Determine whether the assistant response stays within the allowed topics. Reply with `allow` or `block` and a short reason.\\n\\nAllowed topics:\\n- Renewable energy\\n- Sustainability\\n- Climate policy\\n\\nUser message: \\\"{{ user_input }}\\\"\\n\\nAssistant response: \\\"{{ bot_response }}\\\"\\n\\nAnswer (allow or block):\",\n", " \"max_tokens\": 50,\n", " },\n", " ],\n", "}\n", "\n", "config = sdk.guardrail.configs.create(\n", " name=\"parallel-rails-config\",\n", " description=\"Parallel rails guardrail configuration\",\n", " data=guardrails_config,\n", ")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Step 4: Run Chat Completions via Guardrails\n", "\n", "Test the parallel rails configuration by making both safe and off-topic requests.\n", "\n", "Make a safe request and verify the response is not blocked." ] }, { "cell_type": "code", "metadata": {}, "source": [ "response = sdk.guardrail.chat.completions.create(\n", " model=\"default/meta-llama-3-3-70b-instruct\",\n", " messages=[{\"role\": \"user\", \"content\": \"What can you do for me?\"}],\n", " guardrails={\"config_id\": \"parallel-rails-config\"},\n", " max_tokens=200,\n", ")\n", "\n", "print(response.model_dump_json(indent=2))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make an off-topic request that should be blocked by the topic control input rail." 
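, "\n", "When a rail blocks a request, the call still returns a normal completion object; the refusal appears as the assistant message content. After running the next cell, you can detect the block programmatically. This sketch assumes the OpenAI-style response shape used by the chat completions API; adjust the field names if your response object differs.\n", "\n", "```python\n", "# Denial message returned when a rail blocks the request.\n", "DENIAL = \"I'm sorry, I can't respond to that.\"\n", "\n", "content = response.choices[0].message.content\n", "if content and content.strip() == DENIAL:\n", "    print(\"Blocked by an input rail\")\n", "else:\n", "    print(\"Not blocked\")\n", "```\n"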
] }, { "cell_type": "code", "metadata": {}, "source": [ "response = sdk.guardrail.chat.completions.create(\n", " model=\"default/meta-llama-3-3-70b-instruct\",\n", " messages=[{\"role\": \"user\", \"content\": \"Tell me a joke about quantum gravity.\"}],\n", " guardrails={\"config_id\": \"parallel-rails-config\"},\n", " max_tokens=200,\n", ")\n", "\n", "print(response.model_dump_json(indent=2))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "The off-topic request returns the denial message `I'm sorry, I can't respond to that.`\n", "\n", "---\n", "\n", "## Step 5: Check Messages\n", "\n", "You can also run the parallel rails against messages directly. The check endpoint applies the configured rails and returns the status of each one instead of producing a chat completion." ] }, { "cell_type": "code", "metadata": {}, "source": [ "check_result = sdk.guardrail.check(\n", " model=\"default/meta-llama-3-3-70b-instruct\",\n", " messages=[\n", " {\"role\": \"user\", \"content\": \"What are the benefits of renewable energy?\"}\n", " ],\n", " guardrails={\"config_id\": \"parallel-rails-config\"},\n", ")\n", "\n", "print(check_result.model_dump_json(indent=2))" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "::::::{dropdown} Output\n", ":icon: code-square\n", "" ] }, { "cell_type": "code", "metadata": { "language": "json" }, "source": [ "{\n", " \"status\": \"success\",\n", " \"rails_status\": {\n", " \"input\": {\n", " \"status\": \"success\",\n", " \"details\": []\n", " },\n", " \"output\": {\n", " \"status\": \"success\",\n", " \"details\": []\n", " }\n", " }\n", "}" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "::::::\n", "\n", "---\n", "\n", "## Cleanup\n", "\n", "Delete the guardrail configuration created in this tutorial. The NIM deployments from Step 2 keep running; delete them separately if you no longer need the models." ] }, { "cell_type": "code", "metadata": {}, "source": [ "sdk.guardrail.configs.delete(name=\"parallel-rails-config\")\n", "print(\"Cleanup complete\")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { 
"display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }