# NeMo Microservices
## Introduction
NeMo Platform gives enterprises the infrastructure to build and deploy specialized AI agents with open source models. It provides synthetic data generation, model fine-tuning and evaluation, security testing, real-time protection with guardrails, and inference, along with production-grade features like RBAC and observability. Deploy locally with Docker or on Kubernetes, integrate with your existing tools, and customize models for your specific use cases while maintaining control over your AI stack.
## Common use cases
- **Customize and evaluate models** — Generate synthetic training data, fine-tune models, and measure quality. See Example Applications for workflows like creating text-to-code datasets and fine-tuning with synthetic data.
- **Deploy and serve models** — Run inference through the unified gateway and integrate with your existing infrastructure. See About Models and Inference and the installation guide for deployment examples.
- **Test and protect AI agents** — Scan for vulnerabilities with Auditor, then block attacks in real time with Guardrails.
- **Build RAG and search applications** — Fine-tune embedding models for domain-specific retrieval and evaluate with RAG metrics.
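The retrieval step behind the RAG use case above can be sketched in a few lines. This is a conceptual illustration only, not a platform API: the document names and three-dimensional "embeddings" are made-up toy data, whereas in practice the vectors would come from an embedding model you fine-tune and serve on the platform.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" for illustration; real vectors come from your
# embedding model and typically have hundreds of dimensions.
DOCS = {
    "invoice policy": [0.9, 0.1, 0.0],
    "gpu setup guide": [0.1, 0.9, 0.2],
    "security faq": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(
        DOCS.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

A RAG metric such as retrieval recall then reduces to checking whether the expected document appears in the top-k list returned by `retrieve`.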
## Getting up and running
Prerequisites: Python 3.11+, pip, Docker 28.3.0+, an NGC API key, and hardware that meets the Hardware and Software Requirements for NeMo Microservices.
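A short script can sanity-check the version prerequisites above before you install anything. The thresholds mirror the list above; `docker version --format "{{.Server.Version}}"` is the standard Docker CLI way to print the daemon version.

```python
import subprocess
import sys

def version_tuple(v: str) -> tuple:
    """Parse a dotted version string like '28.3.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def version_at_least(found: str, required: str) -> bool:
    """True if `found` meets or exceeds `required` (e.g. '28.3.1' >= '28.3.0')."""
    return version_tuple(found) >= version_tuple(required)

def check_prerequisites() -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    try:
        out = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        if not version_at_least(out, "28.3.0"):
            problems.append(f"Docker 28.3.0+ required, found {out}")
    except (OSError, subprocess.CalledProcessError):
        problems.append("Docker not found or daemon not running")
    return problems
```

Run `check_prerequisites()` and resolve anything it reports before continuing.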
Install the CLI and SDK:

```bash
pip install nemo-microservices
```
Start the quickstart (local platform):

```bash
nmp quickstart up
```
The quickstart prompts for your NGC API key, then downloads and starts the platform. Once complete, try chatting with the configured LLM:
```bash
nmp chat <model-name>
```
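Outside the CLI, you can reach the same model over HTTP. This is a minimal sketch assuming the platform's inference gateway exposes an OpenAI-compatible `/v1/chat/completions` endpoint; the base URL and model name below are placeholders — substitute the address of your deployment and a model reported by `nmp models list`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder: your gateway address
MODEL = "my-model"                  # placeholder: a model from `nmp models list`

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """Send one chat turn to the gateway and return the model's reply."""
    payload = json.dumps(build_chat_request(MODEL, prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example usage (requires the quickstart platform to be running):
# print(chat("Hello!"))
```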
List available models and other key commands:

```bash
nmp models list      # See available models
nmp chat --help      # Chat options
nmp workspaces list  # View your workspaces
nmp --help           # All commands
```
For full install steps, GPU config, and SDK usage, see Quickstart Installation; for all commands, see NeMo Microservices CLI.
## Before you start
- **Workspaces** — All platform resources (models, datasets, jobs, evaluation results) belong to a workspace. Workspaces provide organizational and authorization boundaries: create separate workspaces to isolate teams, users, environments, or clients. The platform includes two built-in workspaces: `default` (general-purpose, editable by all) and `system` (read-only platform resources). When authentication is enabled, users are granted roles (Viewer, Editor, or Admin) within specific workspaces. See Workspaces for creating and managing workspaces.
- **NGC API key** — Required to access models and container images from NVIDIA GPU Cloud. Get your NGC API key. The quickstart prompts for this automatically during `nmp quickstart up`.
- **Configuration** — The CLI stores connection settings, credentials, and preferences in `~/.config/nmp/config.yaml`. For working with multiple environments (local, dev, prod), run `nmp configure` to set up contexts and switch between them. See CLI configuration for details.
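The workspace-scoped roles described above follow a common pattern: a user's access to a resource is decided by their role grant within that resource's workspace. The sketch below is an illustration of that idea only — the real platform stores and enforces grants server-side, and the user and workspace names here are hypothetical.

```python
from enum import IntEnum

class Role(IntEnum):
    # Ordered so a higher role satisfies any lower requirement.
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3

# Hypothetical grants for illustration: user -> {workspace: role}.
GRANTS = {
    "alice": {"default": Role.ADMIN},
    "bob": {"default": Role.VIEWER, "team-ml": Role.EDITOR},
}

def can(user: str, workspace: str, required: Role) -> bool:
    """True if the user's role in the workspace meets the requirement."""
    role = GRANTS.get(user, {}).get(workspace)
    return role is not None and role >= required
```

Under this model, `bob` can edit resources in `team-ml` but only view those in `default`, which is the kind of isolation separate workspaces are meant to provide.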
## Where to go next
Start building:

- Example Applications — End-to-end workflows combining multiple platform capabilities
- Quickstart Installation — Full installation guide with GPU configuration and SDK setup

Learn the platform:

- Core Concepts — Workspaces, projects, and entity organization
- NeMo Microservices CLI — CLI reference and configuration
- NeMo Microservices API Reference — REST API reference

Deploy to production:

- About Platform Setup — Deploy on Kubernetes with Helm
- Authorization — Configure role-based access control
## Architecture
Diagram goes here.