API Reference

This page documents the NMP-specific APIs for the Data Designer service.

For comprehensive documentation on building configurations (column types, constraints, processors, etc.), see the open-source library documentation.

Client

NeMoDataDesignerClient

The primary interface to the Data Designer service.

Initialization:

from nemo_microservices.data_designer.client import NeMoDataDesignerClient

# Option 1: From SDK instance
from nemo_microservices import NeMoMicroservices
sdk = NeMoMicroservices(base_url="...", workspace="default")
client = NeMoDataDesignerClient(sdk=sdk)

# Option 2: Direct initialization
client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",
    workspace="default"
)

Methods:

preview(config_builder, num_records=10)

Generate a small sample dataset for fast iteration.

Parameters:

  • config_builder (DataDesignerConfigBuilder): Configuration defining the dataset

  • num_records (int, optional): Number of records to generate (default: 10)

Returns: PreviewResults object with:

  • dataset: pandas DataFrame with generated records

  • analysis: Statistical analysis of the dataset

  • display_sample_record(): Method to display a random record

Example:

preview = client.preview(config_builder, num_records=20)
preview.display_sample_record()
df = preview.dataset

create(config_builder, num_records, wait_until_done=False)

Submit a job to generate a full dataset.

Parameters:

  • config_builder (DataDesignerConfigBuilder): Configuration defining the dataset

  • num_records (int): Number of records to generate

  • wait_until_done (bool, optional): Whether to block until the job completes (default: False)

Returns: DataDesignerJobResults object with:

  • wait_until_done(): Block until the job completes

  • load_dataset(): Load the generated dataset as a pandas DataFrame

  • load_analysis(): Load the statistical analysis of the generated dataset

  • job_id: ID of the submitted job

Example:

job = client.create(config_builder, num_records=1000)
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()
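Unless wait_until_done=True is passed, create() returns once the job is submitted, and this page does not document a timeout for wait_until_done(). If a script needs an upper bound on how long it blocks, a polling loop with a deadline is one generic option. This is a sketch, not the SDK's implementation: wait_for_job, the get_status callable, and the status names in TERMINAL_STATES are hypothetical placeholders for whatever job-status lookup your environment exposes.

```python
import time
from typing import Callable, Optional

# Hypothetical terminal status names; check the values your service actually reports.
TERMINAL_STATES = {"completed", "error", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: Optional[float] = None,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout elapses.

    get_status is a hypothetical callable that returns the job's current status.
    Returns the terminal status; raises TimeoutError if the deadline passes first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Simulated status sequence standing in for a real job.
statuses = iter(["created", "active", "completed"])
print(wait_for_job(lambda: next(statuses), poll_interval=0))  # completed
```

Polling with time.monotonic() keeps the deadline unaffected by wall-clock adjustments while the script waits.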

Model Provider Configuration

When defining models for your DataDesignerConfigBuilder, set each ModelConfig's provider to an NMP model provider:

import data_designer.config as dd

model_config = dd.ModelConfig(
    provider="default/build-nvidia",  # Format: workspace/provider-name
    model="nvidia/nemotron-3-nano-30b-a3b",  # Model name from provider
    alias="text",
    inference_parameters=dd.ChatCompletionInferenceParams(
        temperature=1.0,
        top_p=1.0,
        max_tokens=512,
    ),
)

Provider format:

  • Fully qualified: workspace/provider-name (recommended)

  • Implicit workspace: provider-name (uses the client’s workspace)

Model name: Must match the model identifier expected by the external provider (e.g., NVIDIA Build, OpenAI).
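To make the two provider formats concrete, here is a minimal sketch of how a reference could be split into a workspace and a provider name. It assumes a reference contains at most one "/" and that an implicit reference falls back to the client's workspace, as described above. resolve_provider is a hypothetical helper for illustration only, not part of the SDK.

```python
from typing import Tuple


def resolve_provider(provider: str, client_workspace: str) -> Tuple[str, str]:
    """Split a provider reference into (workspace, provider_name).

    Hypothetical helper: it only illustrates the two documented formats and is
    not part of nemo_microservices or data_designer.
    """
    parts = provider.split("/")
    if len(parts) == 2 and all(parts):
        # Fully qualified: "workspace/provider-name"
        return parts[0], parts[1]
    if len(parts) == 1 and parts[0]:
        # Implicit workspace: "provider-name" resolves against the client's workspace
        return client_workspace, parts[0]
    raise ValueError(f"Invalid provider reference: {provider!r}")


# The model field (e.g. "nvidia/nemotron-3-nano-30b-a3b") is a separate value;
# its slash belongs to the external provider's model ID, not to this format.
print(resolve_provider("default/build-nvidia", client_workspace="team-a"))  # ('default', 'build-nvidia')
print(resolve_provider("build-nvidia", client_workspace="team-a"))          # ('team-a', 'build-nvidia')
```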

Seed Sources

FilesetFileSeedSource

Use data from the NMP Files service as seed data.

from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(
    path="default/my-fileset#data.parquet"  # Format: workspace/fileset#file-path
)

config_builder.with_seed_dataset(seed_source)

Path format:

  • Fully qualified: workspace/fileset-name#file-path (recommended)

  • Implicit workspace: fileset-name#file-path (uses the client’s workspace)
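The path format packs three pieces into one string. This sketch shows one way to split it, assuming the first "#" separates the fileset reference from the file path and that the file path may itself contain "/" for files in subdirectories (an assumption; only a top-level file appears in the example). parse_fileset_path is a hypothetical helper, not part of the SDK.

```python
from typing import Tuple


def parse_fileset_path(path: str, client_workspace: str) -> Tuple[str, str, str]:
    """Split a fileset seed path into (workspace, fileset_name, file_path).

    Hypothetical helper illustrating the documented path formats; assumes the
    first "#" separates the fileset reference from the file path inside it.
    """
    fileset_ref, sep, file_path = path.partition("#")
    if not sep or not file_path:
        raise ValueError(f"Missing '#file-path' in {path!r}")
    parts = fileset_ref.split("/")
    if len(parts) == 2 and all(parts):
        workspace, fileset = parts  # Fully qualified: workspace/fileset-name
    elif len(parts) == 1 and parts[0]:
        workspace, fileset = client_workspace, parts[0]  # Implicit workspace
    else:
        raise ValueError(f"Invalid fileset reference: {fileset_ref!r}")
    return workspace, fileset, file_path


print(parse_fileset_path("default/my-fileset#data.parquet", "team-a"))
# ('default', 'my-fileset', 'data.parquet')
print(parse_fileset_path("my-fileset#data/part-0.parquet", "team-a"))
# ('team-a', 'my-fileset', 'data/part-0.parquet')
```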

HuggingFaceSeedSource

Use data from HuggingFace as seed data.

import data_designer.config as dd

seed_source = dd.HuggingFaceSeedSource(
    path="datasets/my-username/my-dataset/data/*.parquet",
    token="default/huggingface-token"  # Reference to NMP secret
)

config_builder.with_seed_dataset(seed_source)

Token: Must reference a secret created via sdk.secrets.create().
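The path in the example uses a wildcard to select every Parquet file in the dataset's data/ directory. Assuming the pattern follows standard shell-style globbing, where "*" matches within a single path segment and does not cross "/", this sketch shows which files such a pattern would select. glob_matches and the file list are hypothetical; this page does not document the service's exact matching rules.

```python
from fnmatch import fnmatchcase


def glob_matches(pattern: str, path: str) -> bool:
    """Return True if path matches pattern segment by segment.

    Hypothetical helper: "*", "?" and "[...]" match within one path segment
    only, and recursive "**" is not handled.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


pattern = "datasets/my-username/my-dataset/data/*.parquet"
repo_files = [
    "datasets/my-username/my-dataset/data/train-00000.parquet",
    "datasets/my-username/my-dataset/data/test-00000.parquet",
    "datasets/my-username/my-dataset/data/extra/other.parquet",  # nested: not matched
    "datasets/my-username/my-dataset/README.md",                 # not Parquet: not matched
]
print([f for f in repo_files if glob_matches(pattern, f)])
```

Under this assumption, Parquet files in subdirectories of data/ would need their own pattern.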

Unsupported seed sources:

  • LocalFileSeedSource (use FilesetFileSeedSource instead)

  • DataFrameSeedSource (upload the data to a fileset first, then use FilesetFileSeedSource)

Configuration Building

Configuration building uses the open-source library’s API. See the library documentation for details on column types, constraints, processors, and other configuration options.

Example:

import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder(model_configs=[...])

# Add columns
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
config_builder.add_column(dd.ExpressionColumnConfig(...))

# Add constraints
config_builder.add_constraint(...)

# Add processors
config_builder.add_processor(...)

# Configure seed data
config_builder.with_seed_dataset(seed_source)

# Build final config
config = config_builder.build()