Quickstart Installation#
The nemo-microservices package installs both a command-line interface (CLI) and a Python SDK, which provide convenient access to NeMo Microservices.
Prerequisites#
Please ensure you have the following prerequisites ready:
Python 3.11 or higher
pip package manager
Docker 28.3.0 or higher
An NGC API key, which you can obtain for free at NVIDIA NGC
Your system meets the minimum system requirements
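If you are unsure which interpreter pip will use, a quick standard-library check confirms the Python prerequisite:

```python
import sys

# Report whether the running interpreter meets the Python 3.11+ prerequisite.
meets_requirement = sys.version_info >= (3, 11)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if meets_requirement else 'upgrade required'}")
```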
Installing the CLI and SDK#
The CLI is the easiest and fastest way to deploy NeMo microservices locally and get started. Install both the CLI and SDK with:
pip install nemo-microservices
Note
This will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
You can verify a successful installation and see a list of all available commands in the CLI with:
nmp --help
See the CLI reference for more details.
Installing NeMo Microservices#
Optional Configuration
If you want to customize your installation, run the nmp quickstart configure command first to go through a setup wizard. You can skip this step to proceed with the default configuration, which uses remote NVIDIA LLMs for inference.
nmp quickstart configure
GPU Configuration#
When running GPU workloads locally (such as model deployments or GPU-based jobs), you need to configure which GPU devices are available to the platform. This ensures that jobs and model deployments coordinate GPU allocation and avoid conflicts that could cause out-of-memory errors.
Configure GPU device IDs in your platform configuration file (typically `~/.nmp/config.yaml`, or the file specified during `nmp quickstart configure`):
platform:
  # Shared Docker configuration for jobs and models services
  docker:
    # By default, all detected GPUs are used ("all"). To use specific GPUs,
    # set to a comma-separated list of device IDs.
    # Run `nvidia-smi` to see available device IDs on your system.
    reserved_gpu_device_ids: "0,1,2,3"  # Or "all" to use all detected GPUs
Note
This configuration creates a shared GPU pool for services that run GPU workloads, such as the jobs service (for training, evaluation, and other GPU jobs) and the models service (for local NIM deployments). The platform will not over-schedule workloads: if all configured GPUs are in use, new workloads will wait until a GPU becomes available.
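The "wait until a GPU becomes available" behavior can be pictured with a small, self-contained sketch. This is only an illustration of the semantics described above; the platform's actual scheduler is internal to the services:

```python
import threading

# Minimal sketch of a shared GPU pool: a fixed set of device IDs, where a
# workload blocks until a device is free, then releases it when done.
class GpuPool:
    def __init__(self, device_ids):
        self._free = list(device_ids)
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            while not self._free:
                self._cond.wait()  # block until a GPU is released
            return self._free.pop()

    def release(self, device_id):
        with self._cond:
            self._free.append(device_id)
            self._cond.notify()

# With two reserved devices, a third workload would wait until one is released.
pool = GpuPool([0, 1])
gpu = pool.acquire()
pool.release(gpu)
```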
To find available GPU device IDs on your system, run:
nvidia-smi --list-gpus
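As an illustration, here is a hypothetical helper showing how a `reserved_gpu_device_ids` value (`"all"` or a comma-separated list) could be interpreted and validated against the number of GPUs reported by `nvidia-smi`. The function name and validation behavior are assumptions for this sketch, not part of the platform:

```python
# Hypothetical helper (not part of the platform): turn a reserved_gpu_device_ids
# value into a list of integer device indices, checking each against the number
# of GPUs on the system (e.g. as counted from `nvidia-smi --list-gpus`).
def parse_reserved_gpus(value: str, num_gpus: int) -> list[int]:
    if value.strip().lower() == "all":
        return list(range(num_gpus))
    ids = [int(part) for part in value.split(",") if part.strip()]
    for device_id in ids:
        if device_id < 0 or device_id >= num_gpus:
            raise ValueError(
                f"GPU device id {device_id} not found (system has {num_gpus} GPUs)"
            )
    return ids

print(parse_reserved_gpus("0,1,2,3", 4))  # → [0, 1, 2, 3]
print(parse_reserved_gpus("all", 2))      # → [0, 1]
```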
To deploy the quickstart environment, run the nmp quickstart up command. You will be prompted for your NGC API key.
nmp quickstart up
This will download and start the Docker Compose stack and configure an LLM provider. This step may take a few minutes, depending on your network and hardware. Once completed, the quickstart prints a command you can use to chat with the LLM:
nmp chat <model-name>
You can also list available models:
nmp models list
Congrats! You’ve set up NeMo Microservices and are ready to start building.
Python SDK#
You’ve also installed the Python SDK, which enables easy use of NeMo Microservices from your code. Here’s a sample:
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url="http://localhost:8080",
    workspace="default",
)

# List models
models = client.models.list()
print(models.data)
An asynchronous Python client is also available. See the full SDK reference for more details, or start with one of our example applications.