Tutorials

These tutorials demonstrate how to use the Data Designer service on NMP.

Library vs. Service

Data Designer separates configuration (building dataset schemas) from execution (generating the data).

Part 1: Build Configs (Library)

Use data_designer.config to define your dataset. See the library documentation for comprehensive guides on column types, constraints, and processors.

import data_designer.config as dd

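# model_configs is assumed to be defined earlier (it is not shown in this snippet).
config_builder = dd.DataDesignerConfigBuilder(model_configs)

# Add columns one at a time; the column arguments are elided here.
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))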

Part 2: Execute (Microservice)

Submit your configuration to the Data Designer service for execution:

from nemo_microservices.data_designer.client import NeMoDataDesignerClient

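# Point the client at your deployment; the base URL is elided here.
client = NeMoDataDesignerClient(base_url="...", workspace="default")

# Preview the configuration before committing to a full run.
preview = client.preview(config_builder)

# Create a generation job for 1,000 records.
job = client.create(config_builder, num_records=1000)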

Tip

Migration is simple. If you already use the standalone library, your configuration code stays identical; only the execution client changes. See the migration guide for details.
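
The split between configuration and execution can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in written for this page (`ToyConfigBuilder`, `ToyLocalExecutor`, `ToyServiceExecutor`, and the `example.invalid` URL); none of it is the Data Designer API. It only shows the shape of the idea: the configuration is built once, and the same object is handed to whichever executor is in use.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfigBuilder:
    """Hypothetical stand-in for a config builder: it only records column specs."""

    columns: list = field(default_factory=list)

    def add_column(self, name: str, kind: str) -> "ToyConfigBuilder":
        # Record the column; nothing is generated at configuration time.
        self.columns.append({"name": name, "kind": kind})
        return self


def describe(builder: ToyConfigBuilder, num_records: int) -> str:
    """Summarize what an executor would be asked to generate."""
    names = ", ".join(column["name"] for column in builder.columns)
    return f"{num_records} records with columns [{names}]"


class ToyLocalExecutor:
    """Hypothetical: stands in for running generation in-process."""

    def create(self, builder: ToyConfigBuilder, num_records: int) -> str:
        return "local: " + describe(builder, num_records)


class ToyServiceExecutor:
    """Hypothetical: stands in for submitting the same config to a service."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def create(self, builder: ToyConfigBuilder, num_records: int) -> str:
        return f"service at {self.base_url}: " + describe(builder, num_records)


# Configuration: written once, independent of where it will run.
builder = ToyConfigBuilder()
builder.add_column("rating", kind="sampler").add_column("review", kind="llm-text")

# Execution: only this part differs between the two setups.
local_summary = ToyLocalExecutor().create(builder, num_records=1000)
service_summary = ToyServiceExecutor("http://example.invalid").create(
    builder, num_records=1000
)

print(local_summary)
print(service_summary)
```

Because both executors accept the same builder object, switching from one to the other touches only the last few lines, which mirrors the migration claim in the tip above.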

Service-Specific Considerations

When you use Data Designer as an NMP service, the following differences from the standalone library apply:

Feature	Difference	Details
Inference	Routes through Inference Gateway	Configure model providers once, reference by name
Seed data	Remote sources only	Use HuggingFace or NMP Filesets (no local files/DataFrames)
Validators	Code & HTTP only	Custom Python function validators not supported
Artifacts	NMP artifact storage	Results stored in NMP, not local filesystem
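
Since the service accepts only remote seed sources, it can help to flag local references before submitting a job. The following is a minimal sketch using only the Python standard library. The helper name `looks_local` and all example references are hypothetical, and it assumes seed sources are written as path or URI strings, which may not match how the service actually expects Hugging Face or NMP Fileset sources to be specified.

```python
from urllib.parse import urlparse


def looks_local(seed_ref: str) -> bool:
    """Return True when seed_ref appears to point at the local filesystem.

    Hypothetical helper, not part of any SDK. A reference counts as local if
    it has no URI scheme (a plain path), uses the file:// scheme, or has a
    single-letter "scheme" that is really a Windows drive letter.
    """
    scheme = urlparse(seed_ref).scheme.lower()
    return scheme in ("", "file") or len(scheme) == 1


# Illustrative references only; none of them name a real dataset.
candidates = [
    "data/seed.csv",
    "file:///tmp/seed.csv",
    "C:\\data\\seed.csv",
    "hf://datasets/example-org/example-dataset/seed.csv",
    "https://example.invalid/seed.csv",
]

# Collect the references that would need to be uploaded somewhere remote first.
flagged = [ref for ref in candidates if looks_local(ref)]

for ref in flagged:
    print(f"local seed source, move to a remote location first: {ref}")
```

A check like this only inspects the string; it does not confirm that a remote reference exists or that the service can reach it.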

Prerequisites

Before starting these tutorials, complete the Quick Start guide to:

Install the NMP SDK
Set up inference with a model provider
Understand the basic workflow

Tutorials

The Basics

Generate a product review dataset using samplers and LLM-generated text. Learn the fundamentals of building configurations and executing jobs.

Tags: beginner, data-designer


Seeding

Use external datasets to ground synthetic data generation. Generate realistic patient medical notes from symptom-to-diagnosis data.

Tags: intermediate, data-designer

Seeding with External Datasets