{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "# DPO Customization\n", "\n", "Learn how to use the NeMo Microservices Platform to create a DPO (Direct Preference Optimization) job using a custom dataset.\n", "\n", "## About\n", "\n", "DPO is an advanced fine-tuning technique for preference-based alignment. If you're new to fine-tuning, consider starting with [LoRA](./lora-customization-job) or [Full SFT](./sft-customization-job) tutorials first.\n", "\n", "Direct Preference Optimization (DPO) is an RL-free alignment algorithm that operates on preference data. Given a prompt and a pair of chosen and rejected responses, DPO aims to increase the probability of the chosen response and decrease the probability of the rejected response relative to a frozen reference model. The actor is initialized using the reference model. For more details, refer to the [DPO paper](https://arxiv.org/pdf/2305.18290).\n", "\n", "DPO shares similarities with Full SFT training workflows but differs in a few key ways:\n", "\n", "| Aspect | SFT (Supervised Fine-Tuning) | DPO (Direct Preference Optimization) |\n", "| --- | --- | --- |\n", "| Data Requirements | Labeled instruction-response pairs where the desired output is explicitly provided | Pairwise preference data, where for a given input, one response is explicitly preferred over another |\n", "| Learning Objective | Directly teaches the model to generate a specific \"correct\" response | Directly optimizes the model to align with human preferences by maximizing the probability of preferred responses and minimizing rejected ones, without needing an explicit reward model |\n", "| Alignment Focus | Aligns the model with the specific examples present in its training data | Aligns the model with broader human preferences, which can be more effective for subjective tasks or those without a single \"correct\" answer |\n", "| Computational Efficiency | Standard fine-tuning efficiency | More computationally efficient than SFT (especially when compared to full RLHF methods) as it bypasses the need to train a separate reward model |\n", "\n", "**What you can achieve with DPO:**\n", "- **Align with human preferences**: Directly optimize your model to produce outputs that align with subjective human preferences without requiring explicit reward modeling\n", "- **Refine response quality**: Improve helpfulness, harmlessness, honesty, and other nuanced qualities that are easier to compare than to define\n", "- **Control tone and style**: Adjust the model's communication style, verbosity, formality, and other subjective characteristics\n", "- **Implement safety guardrails**: Teach the model to avoid harmful or undesirable responses by training on preferred vs. rejected response pairs\n", "- **Optimize subjective tasks**: Excel at tasks where there are multiple acceptable answers but clear preferences exist (creative writing, dialogue, explanations)\n", "\n", "**When to choose DPO:**\n", "- **Subjective quality matters**: Your task involves style, tone, or other qualities where there's no single \"correct\" answer but clear preferences exist\n", "- **You have preference data**: You can collect pairwise comparisons (preferred vs. 
rejected responses) more easily than perfect labeled examples\n", "- **Refining existing capabilities**: You want to make targeted improvements to an already-trained model without major capability changes\n", "- **Complex evaluation**: Humans find it easier to compare which of two responses is better than to create the ideal response themselves (especially for multi-turn conversations, creative tasks, or nuanced outputs)\n", "- **Robust behavior changes**: You need more reliable behavior modification than prompting can provide, without the complexity of full RLHF\n", "- **Lower compute than RLHF**: You want human preference alignment but with simpler training that doesn't require reinforcement learning infrastructure\n", "\n", "**When to choose SFT:**\n", "- **Clear correct answers**: Your task has objectively correct outputs (code generation, structured data extraction, following specific formats)\n", "- **High-quality examples**: You have well-labeled input-output pairs that demonstrate exactly what the model should produce\n", "- **Imitation learning**: You want the model to closely mimic a specific style, format, or knowledge base from expert demonstrations\n", "- **Foundational capabilities**: You're establishing new task-specific capabilities before fine-tuning preferences (SFT is often done before DPO)\n", "- **Stable, predictable outputs**: You need consistent formatting or structure that's well-defined in your training examples\n", "- **Traditional NLP tasks**: Instruction following, translation, summarization, or classification where gold-standard labels exist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "Before starting this tutorial, ensure you have:\n", "\n", "1. **Completed the [Quickstart](../../get-started/installation.md)** to install and deploy NeMo Microservices locally\n", "2. **Installed the Python SDK** (included with `pip install nemo-microservices`)\n", "3. **Set up organizational entities** (namespaces and projects) if you're new to the platform" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Start\n", "\n", "### 1. Initialize SDK\n", "\n", "The SDK needs to know your NMP server URL. By default, `http://localhost:8080` is used in accordance with the [Quickstart](../../get-started/installation.md) guide. If NMP is running at a custom location, you can override the URL by setting the `NMP_BASE_URL` environment variable:\n", "\n", "```sh\n", "export NMP_BASE_URL=\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from nemo_microservices import NeMoMicroservices, ConflictError\n", "from nemo_microservices.types.customization import (\n", " CustomizationJobInputParam,\n", " CustomizationTargetParamParam,\n", " HyperparametersParam,\n", " DpoConfigParam\n", ")\n", "\n", "NMP_BASE_URL = os.environ.get(\"NMP_BASE_URL\", \"http://localhost:8080\")\n", "sdk = NeMoMicroservices(\n", " base_url=NMP_BASE_URL,\n", " workspace=\"default\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Prepare Dataset\n", "\n", "Create your data in JSONL format - one JSON object per line. The platform auto-detects your data format. 
Supported dataset formats are listed below.\n", "\n", "**Flexible Data Setup:**\n", "- **No validation file?** The platform automatically creates a 10% validation split\n", "- **Multiple files?** Upload to `training/` or `validation/` subdirectories\u2014they'll be automatically merged\n", "- **Format detection:** Your data format is auto-detected at training time\n", "\n", "In this tutorial the following dataset directory structure will be used:\n", "```\n", "my_dataset\n", "`-- training.jsonl\n", "`-- validation.jsonl\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Binary Preference Format\n", "DPO training requires preference pairs with three fields:\n", "- **`prompt`**: The input prompt (can be a string or array of message objects)\n", "- **`chosen`**: The preferred response\n", "- **`rejected`**: The less preferred response" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "language": "json" }, "outputs": [], "source": [ "{\"prompt\": [{\"role\": \"user\", \"content\": \"What is the capital of France?\"}], \"chosen\": \"The capital of France is Paris. It is the largest city in France and serves as the country's political, economic, and cultural center.\", \"rejected\": \"I think the capital of France might be London or Paris, I'm not entirely sure.\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Tulu3 Preference Dataset Format\n", "This format contains complete conversation histories for both the chosen (preferred) and rejected responses.\n", "\n", "Required fields:\n", "- **`chosen`**: Full conversation with the preferred response (list of message objects, last must be assistant)\n", "- **`rejected`**: Full conversation with the rejected response (list of message objects, last must be assistant)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "{\"chosen\": [{\"role\": \"user\", \"content\": \"What is the capital of France?\"}, {\"role\": \"assistant\", \"content\": \"The capital of France is Paris.\"}], \"rejected\": [{\"role\": \"user\", \"content\": \"What is the capital of France?\"}, {\"role\": \"assistant\", \"content\": \"I'm not sure, but I think it might be London or Paris.\"}]}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### HelpSteer Dataset Format\n", "This format uses numeric preference scores to indicate which response is better. The context can be either a simple string or an array of message objects.\n", "\n", "Required fields:\n", "- **`context`**: The input context (can be a string or array of message objects)\n", "- **`response1`**: First response option\n", "- **`response2`**: Second response option\n", "- **`overall_preference`**: Preference score where negative values mean response1 is preferred, positive values mean response2 is preferred, and 0 indicates a tie" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "language": "json" }, "outputs": [], "source": [ "{\"context\": \"Explain how to use git rebase\", \"response1\": \"Git rebase is a command that rewrites commit history by moving or combining commits. Use 'git rebase main' to reapply your branch commits on top of main. This creates a linear history and avoids merge commits.\", \"response2\": \"Use git rebase to change commits. Just type git rebase and it will work.\", \"overall_preference\": -2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. 
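Create Dataset FileSet and Upload Training Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you bring your own preference data instead of the HelpSteer3 download below, a quick local sanity check can catch malformed lines before you upload anything. This is a minimal sketch, not part of the NeMo Microservices SDK; it assumes the binary preference format (`prompt`/`chosen`/`rejected`) and the hypothetical `my_dataset/` layout shown above:\n", "\n", "```python\n", "import json\n", "\n", "# Minimal local check: every line must be valid JSON with the expected keys\n", "with open(\"my_dataset/training.jsonl\") as f:\n", "    for i, line in enumerate(f, start=1):\n", "        record = json.loads(line)\n", "        missing = {\"prompt\", \"chosen\", \"rejected\"} - record.keys()\n", "        assert not missing, f\"line {i} is missing fields: {missing}\"\n", "```\n", "\n", "Skip this check if you use the HelpSteer3 data prepared in the next cells; that data stays in its own HelpSteer format, which the platform auto-detects at training time." ] },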
{ "cell_type": "markdown", "metadata": {}, "source": [ "Install the Hugging Face `datasets` package, if it is not already installed in your Python environment, to download the public [nvidia/HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3) dataset:\n", "\n", "```sh\n", "pip install datasets\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Download nvidia/HelpSteer3 Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from datasets import load_dataset, Dataset\n", "ds = load_dataset(\"nvidia/HelpSteer3\", \"preference\")\n", "\n", "# Adjust these values to change the size of the training and validation sets\n", "# The larger the datasets, the better the model will perform, but the longer training will take\n", "# For the purpose of this tutorial, we'll use a small subset of the dataset\n", "training_size = 3000\n", "validation_size = 300\n", "DATASET_PATH = Path(\"dpo-dataset\").absolute()\n", "\n", "# Get the training and validation splits and verify they are Dataset (not IterableDataset) objects\n", "train_dataset = ds[\"train\"]\n", "validation_dataset = ds[\"validation\"]\n", "assert isinstance(train_dataset, Dataset), \"Expected Dataset type\"\n", "assert isinstance(validation_dataset, Dataset), \"Expected Dataset type\"\n", "\n", "# Select subsets and save to JSONL files\n", "training_ds = train_dataset.select(range(training_size))\n", "validation_ds = validation_dataset.select(range(validation_size))\n", "\n", "# Create directory if it doesn't exist\n", "os.makedirs(DATASET_PATH, exist_ok=True)\n", "\n", "# Save subsets to JSONL files\n", "training_ds.to_json(f\"{DATASET_PATH}/training.jsonl\")\n", "validation_ds.to_json(f\"{DATASET_PATH}/validation.jsonl\")\n", "\n", "print(f\"Saved training.jsonl with {len(training_ds)} rows\")\n", "print(f\"Saved validation.jsonl with {len(validation_ds)} rows\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create fileset to store DPO training data\n", "DATASET_NAME = \"dpo-dataset\"\n", "\n", "try:\n", "    sdk.filesets.create(\n", "        workspace=\"default\",\n", "        name=DATASET_NAME,\n", "        description=\"dpo training data\"\n", "    )\n", "    print(f\"Created fileset: {DATASET_NAME}\")\n", "except ConflictError:\n", "    print(f\"Fileset '{DATASET_NAME}' already exists, continuing...\")\n", "\n", "# Upload training data files individually to ensure correct structure\n", "sdk.filesets.fsspec.put(\n", "    lpath=DATASET_PATH,  # Local directory with your JSONL files\n", "    rpath=f\"default/{DATASET_NAME}/\",\n", "    recursive=True\n", ")\n", "\n", "# Validate training data is uploaded correctly\n", "print(\"Training data:\")\n", "print(sdk.filesets.list_files(name=DATASET_NAME, workspace=\"default\").model_dump_json(indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Secrets Setup\n", "\n", "If you plan to use NGC or HuggingFace models, you'll need to configure authentication:\n", "\n", "- **NGC models** (`ngc://` URIs): Requires NGC API key\n", "- **HuggingFace models** (`hf://` URIs): Requires HF token for gated/private models\n", "\n", "\n", "Configure these as secrets in your platform. 
See [Managing Secrets](../../set-up/manage-secrets.md) for detailed instructions.\n", "\n", "Get your credentials to access base models:\n", "- [NGC API Key](https://ngc.nvidia.com/) (Setup \u2192 Generate API Key)\n", "- [HuggingFace Token](https://huggingface.co/settings/tokens) (Create token with Read access)\n", "\n", "\n", "---\n", "\n", "#### Quick Setup Example\n", "\n", "In this tutorial we are going to work with the [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) model from HuggingFace. Ensure that you have sufficient permissions to download the model. If you cannot see the files on the [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) Hugging Face page, request access.\n", "\n", "**HuggingFace Authentication:**\n", "- For gated models (Llama, Gemma), you must provide a HuggingFace token via the `token_secret` parameter\n", "- Get your token from [HuggingFace Settings](https://huggingface.co/settings/tokens) (requires Read access)\n", "- Accept the model's terms on the HuggingFace model page before using it. Example: [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main)\n", "- For public models, you can omit the `token_secret` parameter when creating a fileset for the model in the next step" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export the HF_TOKEN and NGC_API_KEY environment variables if they are not already set\n", "HF_TOKEN = os.getenv(\"HF_TOKEN\")\n", "NGC_API_KEY = os.getenv(\"NGC_API_KEY\")\n", "\n", "\n", "def create_or_get_secret(name: str, value: str | None, label: str):\n", "    if not value:\n", "        raise ValueError(f\"{label} is not set\")\n", "    try:\n", "        secret = sdk.secrets.create(\n", "            name=name,\n", "            workspace=\"default\",\n", "            data=value,\n", "        )\n", "        print(f\"Created secret: {name}\")\n", "        return secret\n", "    except ConflictError:\n", "        print(f\"Secret '{name}' already exists, continuing...\")\n", "        return sdk.secrets.retrieve(name=name, workspace=\"default\")\n", "\n", "\n", "# Create HuggingFace token secret\n", "hf_secret = create_or_get_secret(\"hf-token\", HF_TOKEN, \"HF_TOKEN\")\n", "print(\"HF_TOKEN secret:\")\n", "print(hf_secret.model_dump_json(indent=2))\n", "\n", "# Create NGC API key secret\n", "# Uncomment the line below if you have an NGC API key and want to fine-tune NGC models\n", "# ngc_api_key = create_or_get_secret(\"ngc-api-key\", NGC_API_KEY, \"NGC_API_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Create Base Model FileSet\n", "\n", "Create a fileset pointing to the [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) model on HuggingFace that we will train with DPO. The model download happens when the DPO fine-tuning job is created; this step only creates a pointer to the Hugging Face repository and does not download the model.\n", "\n", "Note: for public models, you can omit the `token_secret` parameter when creating a model fileset."
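, "\n", "For a public (non-gated) model, the same call can simply omit `token_secret`. The sketch below is illustrative only: the fileset name and repo id are hypothetical placeholders, and `HuggingfaceStorageConfigParam` is imported in the next cell:\n", "\n", "```python\n", "# Sketch only: public models do not need a token secret\n", "public_model = sdk.filesets.create(\n", "    workspace=\"default\",\n", "    name=\"my-public-model\",  # hypothetical fileset name\n", "    storage=HuggingfaceStorageConfigParam(\n", "        type=\"huggingface\",\n", "        repo_id=\"<org>/<public-model>\",  # hypothetical public repo id\n", "        repo_type=\"model\",\n", "    ),\n", ")\n", "```"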
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a fileset pointing to the desired HuggingFace model\n", "from nemo_microservices.types.filesets import HuggingfaceStorageConfigParam\n", "\n", "HF_REPO_ID = \"meta-llama/Llama-3.2-1B-Instruct\"\n", "MODEL_NAME = \"llama-3-2-1b-base\"\n", "\n", "# Ensure you have a HuggingFace token secret created\n", "try:\n", " base_model = sdk.filesets.create(\n", " workspace=\"default\",\n", " name=MODEL_NAME,\n", " description=\"Llama 3.2 1B base model from HuggingFace\",\n", " storage=HuggingfaceStorageConfigParam(\n", " type=\"huggingface\",\n", " # repo_id is the full model name from Hugging Face\n", " repo_id=HF_REPO_ID,\n", " repo_type=\"model\",\n", " # we use the secret created in the previous step\n", " token_secret=hf_secret.name\n", " )\n", " )\n", "except ConflictError as e:\n", " print(f\"Base model fileset already exists. Skipping creation.\")\n", " base_model = sdk.filesets.retrieve(\n", " workspace=\"default\",\n", " name=\"llama-3-2-1b-base\",\n", " )\n", "\n", "print(f\"Base model fileset: fileset://default/{base_model.name}\")\n", "print(\"Base model fileset files list:\")\n", "print((sdk.filesets.list_files(name=\"llama-3-2-1b-base\", workspace=\"default\")).model_dump_json(indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Create DPO Finetuning Job\n", "Create a customization job with an inline target referencing the base model and dataset filesets created in previous steps." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Target `model_uri` Format:**\n", "\n", "Currently, `model_uri` must reference a FileSet:\n", "- **FileSet:** `fileset://workspace/fileset-name`\n", "\n", "Support for direct HuggingFace (`hf://`) and NGC (`ngc://`) URIs is coming soon. For now, create a fileset and upload your base model from these sources as shown in step 4.\n", "\n", "**GPU Requirements:**\n", "- 1B models: 1 GPU (24GB+ VRAM)\n", "- 3B models: 1-2 GPUs \n", "- 8B models: 2-4 GPUs\n", "- 70B models: 8+ GPUs \n", "\n", "Adjust `num_gpus_per_node` and `tensor_parallel_size` based on your model size." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import uuid\n", "job_suffix = uuid.uuid4().hex[:4]\n", "\n", "JOB_NAME = f\"my-dpo-job-{job_suffix}\"\n", "\n", "job = sdk.customization.jobs.create(\n", " name=JOB_NAME,\n", " workspace=\"default\",\n", " spec=CustomizationJobInputParam(\n", " target=CustomizationTargetParamParam(\n", " workspace=\"default\",\n", " model_uri=f\"fileset://default/{base_model.name}\"\n", " ),\n", " dataset=f\"fileset://default/{DATASET_NAME}\",\n", " hyperparameters=HyperparametersParam(\n", " training_type=\"dpo\",\n", " finetuning_type=\"all_weights\",\n", " epochs=1,\n", " batch_size=16,\n", " learning_rate=0.00005,\n", " max_seq_length=4096,\n", " dpo=DpoConfigParam(\n", " ref_policy_kl_penalty=0.1\n", " ),\n", " # GPU and parallelism settings\n", " num_gpus_per_node=1,\n", " num_nodes=1,\n", " tensor_parallel_size=1,\n", " pipeline_parallel_size=1,\n", " micro_batch_size=1,\n", " )\n", " )\n", ")\n", "\n", "print(f\"Job ID: {job.name}\")\n", "print(f\"Output model: {job.spec.output_model}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. 
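{ "cell_type": "markdown", "metadata": {}, "source": [ "You can re-fetch the job entity at any point to inspect its configuration and current state. This is a minimal sketch using the same retrieval call referenced in the Troubleshooting section at the end of this tutorial; the exact fields returned depend on your SDK version:\n", "\n", "```python\n", "# Optional: retrieve the job created above and print its full record\n", "job_info = sdk.customization.jobs.retrieve(name=job.name, workspace=\"default\")\n", "print(job_info.model_dump_json(indent=2))\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. 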
Track Training Progress" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "from IPython.display import clear_output\n", "\n", "# Poll job status every 10 seconds until completed\n", "while True:\n", " status = sdk.audit.jobs.get_status(\n", " name=job.name,\n", " workspace=\"default\"\n", " )\n", " \n", " clear_output(wait=True)\n", " print(f\"Job Status: {status.status}\")\n", "\n", " # Extract training progress from nested steps structure\n", " step: int | None = None\n", " max_steps: int | None = None\n", " training_phase: str | None = None\n", "\n", " for job_step in status.steps or []:\n", " if job_step.name == \"customization-training-job\":\n", " for task in job_step.tasks or []:\n", " task_details = task.status_details or {}\n", " step = task_details.get(\"step\")\n", " max_steps = task_details.get(\"max_steps\")\n", " training_phase = task_details.get(\"phase\")\n", " break\n", " break\n", "\n", " if step is not None and max_steps is not None:\n", " progress_pct = (step / max_steps) * 100\n", " print(f\"Training Progress: Step {step}/{max_steps} ({progress_pct:.1f}%)\")\n", " if training_phase:\n", " print(f\"Training Phase: {training_phase}\")\n", " else:\n", " print(\"Training step not started yet or progress info not available\")\n", " \n", " # Exit loop when job is completed (or failed/cancelled)\n", " if status.status in (\"completed\", \"failed\", \"cancelled\"):\n", " print(f\"\\nJob finished with status: {status.status}\")\n", " break\n", " \n", " time.sleep(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Interpreting DPO Training Metrics:**\n", "\n", "DPO training produces several key metrics:\n", "\n", "| Metric | Description | What to Look For |\n", "|--------|-------------|------------------|\n", "| **loss** | Total training loss (preference_loss + sft_loss) | Should decrease over training |\n", "| **preference_loss** | Core DPO loss measuring preference learning | Starts near ln(2) \u2248 0.693, should decrease |\n", "| **sft_loss** | SFT regularization term (often 0 for pure DPO) | Depends on configuration |\n", "| **accuracy** | Fraction of samples where chosen > rejected | Should increase toward 80-95%+ |\n", "| **rewards_chosen_mean** | Average implicit reward for chosen responses | Should be positive |\n", "| **rewards_rejected_mean** | Average implicit reward for rejected responses | Should be negative |\n", "\n", "**Key Indicators:**\n", "\n", "- **Reward Margin** = `rewards_chosen_mean - rewards_rejected_mean`\n", " - Should be positive and increasing\n", " - Indicates the model is learning to distinguish preferences\n", "\n", "- **Accuracy Interpretation:**\n", " - 50% = random chance (no learning)\n", " - 66-75% = early/moderate learning\n", " - 80%+ = good preference learning\n", " - 95%+ = strong preference alignment\n", "\n", "**Troubleshooting:**\n", "\n", "- **Loss near ln(2) \u2248 0.693**: Model is at random chance level, training just starting or not learning\n", "- **Accuracy stuck at ~50%**: Check data quality, increase learning rate, or verify preference labels\n", "- **Negative reward margin**: Model is learning the wrong direction\u2014check chosen/rejected labels\n", "- **Loss increasing**: Learning rate too high or data quality issues\n", "\n", "**Note:** Training metrics measure optimization progress, not final model quality. Always evaluate the deployed model on your specific use case." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8. 
Deploy Fine-Tuned Model\n", "\n", "Once training completes, deploy using the Deployment Management Service:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Validate model entity exists\n", "model_entity = sdk.models.retrieve(workspace='default', name=job.spec.output_model)\n", "print(model_entity.model_dump_json(indent=2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nemo_microservices.types.inference import NIMDeploymentParam\n", "\n", "# Create deployment config\n", "deploy_suffix = uuid.uuid4().hex[:4]\n", "DEPLOYMENT_CONFIG_NAME = f\"dpo-model-deployment-cfg-{deploy_suffix}\"\n", "DEPLOYMENT_NAME = f\"dpo-model-deployment-{deploy_suffix}\"\n", "\n", "deployment_config = sdk.inference.deployment_configs.create(\n", " workspace=\"default\",\n", " name=DEPLOYMENT_CONFIG_NAME,\n", " nim_deployment=NIMDeploymentParam(\n", " image_name=\"nvcr.io/nim/nvidia/llm-nim\",\n", " image_tag=\"1.13.1\",\n", " gpu=1,\n", " model_name=job.spec.output_model, # ModelEntity name from training,\n", " model_namespace=\"default\", # Workspace where ModelEntity lives\n", " )\n", ")\n", "\n", "# Deploy model using deployment_config created above\n", "deployment = sdk.inference.deployments.create(\n", " workspace=\"default\",\n", " name=DEPLOYMENT_NAME,\n", " config=deployment_config.name\n", ")\n", "\n", "\n", "# Check deployment status\n", "deployment_status = sdk.inference.deployments.retrieve(\n", " name=deployment.name,\n", " workspace=\"default\"\n", ")\n", "\n", "print(f\"Deployment name: {deployment.name}\")\n", "print(f\"Deployment status: {deployment_status.status}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Monitor status of deployment" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "from IPython.display import clear_output\n", "\n", "# Poll deployment status every 15 seconds until ready\n", "TIMEOUT_MINUTES = 30\n", "start_time = time.time()\n", "timeout_seconds = TIMEOUT_MINUTES * 60\n", "\n", "print(f\"Monitoring deployment '{deployment.name}'...\")\n", "print(f\"Timeout: {TIMEOUT_MINUTES} minutes\\n\")\n", "\n", "while True:\n", " deployment_status = sdk.inference.deployments.retrieve(\n", " name=deployment.name,\n", " workspace=\"default\"\n", " )\n", " \n", " elapsed = time.time() - start_time\n", " elapsed_min = int(elapsed // 60)\n", " elapsed_sec = int(elapsed % 60)\n", " \n", " clear_output(wait=True)\n", " print(f\"Deployment: {deployment.name}\")\n", " print(f\"Status: {deployment_status.status}\")\n", " print(f\"Elapsed time: {elapsed_min}m {elapsed_sec}s\")\n", " \n", " # Check if deployment is ready\n", " if deployment_status.status == \"READY\":\n", " print(\"\\nDeployment is ready!\")\n", " break\n", " \n", " # Check for failure states\n", " if deployment_status.status in (\"FAILED\", \"ERROR\", \"TERMINATED\", \"LOST\"):\n", " print(f\"\\nDeployment failed with status: {deployment_status.status}\")\n", " break\n", " \n", " # Check timeout\n", " if elapsed > timeout_seconds:\n", " print(f\"\\nTimeout reached ({TIMEOUT_MINUTES} minutes). 
Deployment may still be in progress.\")\n", "        print(\"You can continue to check status manually or wait longer.\")\n", "        break\n", "    \n", "    time.sleep(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The deployment service automatically:\n", "- Downloads model weights from the Files service\n", "- Provisions storage (PVC) for the weights\n", "- Configures and starts the NIM container\n", "\n", "**Multi-GPU Deployment:**\n", "\n", "For larger models requiring multiple GPUs, configure parallelism with environment variables:\n", "\n", "```python\n", "deployment_config = sdk.inference.deployment_configs.create(\n", "    workspace=\"default\",\n", "    name=\"dpo-model-config-multigpu\",\n", "\n", "    nim_deployment={\n", "        \"image_name\": \"nvcr.io/nim/nvidia/llm-nim\",\n", "        \"image_tag\": \"1.13.1\",\n", "        \"gpu\": 2,  # Total GPUs\n", "        \"additional_envs\": {\n", "            \"NIM_TENSOR_PARALLEL_SIZE\": \"2\",  # Tensor parallelism\n", "            \"NIM_PIPELINE_PARALLEL_SIZE\": \"1\"  # Pipeline parallelism\n", "        }\n", "    }\n", ")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Single-Node Constraint:** Model deployments are limited to a single node. The maximum `gpu` value depends on the total GPUs available on a single node in your cluster. Multi-node deployments are not supported.\n", "\n", "---\n", "\n", "#### GPU Parallelism\n", "\n", "By default, NIM uses all GPUs for tensor parallelism (TP). You can customize this behavior using the `NIM_TENSOR_PARALLEL_SIZE` and `NIM_PIPELINE_PARALLEL_SIZE` environment variables.\n", "\n", "| Strategy | Description | Best For |\n", "|----------|-------------|----------|\n", "| **Tensor Parallel (TP)** | Splits each layer's weights across GPUs | Lowest latency |\n", "| **Pipeline Parallel (PP)** | Splits the model by layers (depth) across GPUs | Highest throughput |\n", "\n", "**Formula:** `gpu` = `NIM_TENSOR_PARALLEL_SIZE` \u00d7 `NIM_PIPELINE_PARALLEL_SIZE`\n", "\n", "---\n", "\n", "#### Example Configurations\n", "\n", "**Default (TP=8, PP=1) \u2014 Lowest Latency**\n", "```\n", "\"gpu\": 8\n", "# NIM automatically sets NIM_TENSOR_PARALLEL_SIZE=8\n", "```\n", "\n", "**Balanced (TP=4, PP=2)**\n", "```\n", "\"gpu\": 8,\n", "\"additional_envs\": {\n", "    \"NIM_TENSOR_PARALLEL_SIZE\": \"4\",\n", "    \"NIM_PIPELINE_PARALLEL_SIZE\": \"2\"\n", "}\n", "```\n", "\n", "**Throughput Optimized (TP=2, PP=4)**\n", "```\n", "\"gpu\": 8,\n", "\"additional_envs\": {\n", "    \"NIM_TENSOR_PARALLEL_SIZE\": \"2\",\n", "    \"NIM_PIPELINE_PARALLEL_SIZE\": \"4\"\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9. 
Evaluate Your Model\n", "\n", "After training, evaluate whether your model meets your requirements:\n", "\n", "#### Quick Manual Evaluation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait for deployment to be ready, then test\n", "messages = [\n", " {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n", " {\"role\": \"user\", \"content\": \"Write a short email to my colleague.\"}\n", "]\n", "\n", "response = sdk.inference.gateway.provider.post(\n", " \"v1/chat/completions\",\n", " name=deployment.name,\n", " workspace=\"default\",\n", " body={\n", " \"model\": f\"default/{job.spec.output_model}\", # Match the model_name from deployment config\n", " \"messages\": messages,\n", " \"temperature\": 0.7,\n", " \"max_tokens\": 256\n", " }\n", ")\n", "\n", "# Display prompt and completion\n", "print(\"=\" * 60)\n", "print(\"PROMPT\")\n", "print(\"=\" * 60)\n", "for msg in messages:\n", " print(f\"[{msg['role'].upper()}]\")\n", " print(msg[\"content\"])\n", " print()\n", "\n", "print(\"=\" * 60)\n", "print(\"COMPLETION\")\n", "print(\"=\" * 60)\n", "print(\"[ASSISTANT]\")\n", "completion = response[\"choices\"][0][\"message\"][\"content\"]\n", "print(completion)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Evaluation Best Practices\n", "\n", "**Manual Evaluation** (Recommended)\n", "- Test with real-world examples from your use case\n", "- Compare responses to base model and expected outputs\n", "- Verify the model exhibits desired behavior changes\n", "- Check edge cases and error handling\n", "\n", "**What to look for:**\n", "- \u2705 Model follows your desired output format\n", "- \u2705 Applies domain knowledge correctly\n", "- \u2705 Maintains general language capabilities\n", "- \u2705 Avoids unwanted behaviors or biases\n", "- \u274c Doesn't hallucinate facts not in training data\n", "- \u274c Doesn't produce repetitive or nonsensical outputs\n", "\n", "---\n", "\n", "## Hyperparameters\n", "\n", "For detailed information on all available hyperparameters, recommended values, and tuning guidance, see the [Hyperparameter Reference](../manage-customization-jobs/hyperparameters.md).\n", "\n", "---\n", "\n", "\n", "## Troubleshooting\n", "\n", "**Job fails during model download:**\n", "- Verify authentication secrets are configured (see [Managing Secrets](../../set-up/manage-secrets.md))\n", "- For gated HuggingFace models (Llama, Gemma), accept the license on the model page\n", "- Check the `model_uri` format is correct (`fileset://`)\n", "- Ensure you have accepted the model's terms of service on HuggingFace\n", "- Check job status and logs: `sdk.customization.jobs.retrieve(name=job.name, workspace=\"default\")`\n", "\n", "**Job fails with OOM (Out of Memory) error:**\n", "1. **First try:** Reduce `micro_batch_size` from 2 to 1\n", "2. **Still OOM:** Reduce `batch_size` from 16 to 8\n", "3. **Still OOM:** Reduce `max_seq_length` from 2048 to 1024 or 512\n", "4. 
**Last resort:** Increase GPU count and use `tensor_parallel_size` for model sharding\n", "\n", "**Loss curves not decreasing (underfitting):**\n", "- Increase training duration: `epochs: 5-10` instead of 3\n", "- Adjust learning rate: Try `1e-5` to `1e-4`\n", "- Add warmup: Set `warmup_steps` to ~10% of total training steps\n", "- Check data quality: Verify formatting, remove duplicates, ensure diversity\n", "\n", "**Training loss decreases but validation loss increases (overfitting):**\n", "- Reduce epochs: Try `epochs: 1-2` instead of 5+\n", "- Lower learning rate: Use `2e-5` or `1e-5`\n", "- Increase dataset size and diversity\n", "- Verify train/validation split has no data leakage\n", "\n", "**Model output quality is poor despite good training metrics:**\n", "- Training metrics optimize for loss, not your actual task\u2014evaluate on real use cases\n", "- Review data quality, format, and diversity\u2014metrics can be misleading with poor data\n", "- Try a different base model size or architecture\n", "- Adjust learning rate and batch size\n", "- Compare to baseline: Test base model to ensure fine-tuning improved performance\n", "\n", "**Deployment fails:**\n", "- Verify output model exists: `sdk.models.retrieve(name=job.spec.output_model, workspace=\"default\")`\n", "- Check deployment logs: `sdk.inference.deployments.get_logs(name=deployment.name, workspace=\"default\")`\n", "- Ensure sufficient GPU resources available for model size\n", "- Verify NIM image tag `1.13.1` is compatible with your model\n", "\n", "\n", "## Next Steps\n", "\n", "- [Monitor training metrics](fine-tune-metrics) in detail\n", "- [Evaluate your fine-tuned model](../../evaluator/index) using the Evaluator service\n", "- Learn about [LoRA customization](./lora-customization-job) for resource-efficient fine-tuning\n", "- Explore [knowledge distillation](./distillation-customization-job) to compress larger models" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 4 }