Data Designer Service#

The Data Designer service enables high-quality synthetic data generation at scale through the NeMo Microservices Platform.

Overview#

Data Designer is a framework for orchestrating complex synthetic data generation workflows. It coordinates LLM calls, manages dependencies between data fields, handles batching and parallelization, and validates generated data against specifications.

The service is built on the open-source NVIDIA NeMo Data Designer library (GitHub).

How It Works: Library + Microservice#

Data Designer separates configuration from execution:

1. Build Configs with the Library#

Use the data_designer.config package (installed automatically with nemo-microservices[data-designer]) to define your dataset:

import data_designer.config as dd

# Define models
model_configs = [
    dd.ModelConfig(
        provider="default/build-nvidia",  # NMP model provider
        model="nvidia/nemotron-3-nano-30b-a3b",
        alias="text",
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
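
For concreteness, the elided add_column calls above might be filled in along the following lines. The column names, sampler parameters, and prompt are purely illustrative, and the exact field names (sampler_type, params, model_alias) are assumptions based on the open-source library; confirm them against the library documentation.

# Sampled column: draws one value per record from a fixed category list.
# NOTE: the sampler_type / params field names are assumptions; see the library docs.
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="topic",
        sampler_type="category",
        params=dd.CategorySamplerParams(values=["billing", "shipping", "returns"]),
    )
)

# LLM-generated column: the prompt template can reference other columns by name,
# which is how dependencies between fields are expressed.
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="customer_question",
        prompt="Write a realistic customer support question about {{ topic }}.",
        model_alias="text",  # refers to the ModelConfig alias defined above
    )
)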

The library handles: Dataset schema definition, column types, dependencies, constraints, and validation rules.

Learn more: See the open-source library documentation for comprehensive guides on column types, samplers, constraints, and advanced features.

2. Execute via the Microservice#

Submit your configuration to the Data Designer service using the NMP SDK:

from nemo_microservices.data_designer.client import NeMoDataDesignerClient

client = NeMoDataDesignerClient(base_url="...", workspace="default")

# Fast iteration
preview = client.preview(config_builder)

# Production generation
job = client.create(config_builder, num_records=10000)
job.wait_until_done()
dataset = job.load_dataset()

The microservice handles: Job orchestration, inference routing through NMP’s Inference Gateway, distributed execution, artifact storage, and monitoring.
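
Continuing from the snippet above, a typical follow-up is to spot-check the preview before launching a large job and to persist the final dataset locally. The sketch assumes the preview result and the loaded dataset behave like pandas DataFrames, as their counterparts do in the standalone library; verify the actual return types in the SDK reference.

# ASSUMPTION: `preview.dataset` and `dataset` are pandas DataFrames, mirroring
# the standalone library; check the NMP SDK reference for the actual types.
print(preview.dataset.head())    # spot-check a few generated records before scaling up
print(dataset.columns.tolist())  # columns should match the column definitions in the config
dataset.to_json("synthetic_data.jsonl", orient="records", lines=True)  # persist for downstream use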

Key Differences from Standalone Library#

When using Data Designer as an NMP service:

| Feature | Standalone Library | NMP Service |
|---|---|---|
| Inference | Direct API calls to any OpenAI-compatible endpoint | Routes through NMP Inference Gateway with model providers |
| Execution | Local Python process | Distributed job execution with monitoring |
| Seed Data | Local files, DataFrames, HuggingFace | HuggingFace, NMP Filesets (no local files/DataFrames) |
| Artifacts | Local filesystem | NMP artifact storage |
| Authentication | Direct API keys | NMP Secrets service |

Next Steps#

Quick Start: Set up inference and run your first Data Designer job.

Tutorials: Learn through examples covering basics, seeding, and more.

Migration Guide: Migrate from the standalone library to the NMP service (Migrating from Standalone Library).

API Reference: SDK client methods and configuration options.

Library Documentation: Comprehensive guides on column types, constraints, and advanced features (https://nvidia-nemo.github.io/DataDesigner/0.4.0/).