Data Designer Service#

The Data Designer service enables high-quality synthetic data generation at scale through the NeMo Microservices Platform.

Overview#

Data Designer is a framework for orchestrating complex synthetic data generation workflows. It coordinates LLM calls, manages dependencies between data fields, handles batching and parallelization, and validates generated data against specifications.

The service is built on the open-source NVIDIA NeMo Data Designer library (GitHub).

How It Works: Library + Microservice#

Data Designer separates configuration from execution:

1. Build Configs with the Library#

Use the data_designer.config package (installed automatically with nemo-microservices[data-designer]) to define your dataset:

import data_designer.config as dd

# Define models
model_configs = [
    dd.ModelConfig(
        provider="default/build-nvidia",  # NMP model provider
        model="nvidia/nemotron-3-nano-30b-a3b",
        alias="text",
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
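
For concreteness, the elided add_column calls above might be filled in along the following lines. The column names, sampler parameters, and prompt are purely illustrative, and the exact field names (sampler_type, params, model_alias) are assumptions based on the open-source library; confirm them against the library documentation.

# Sampled column: draws one value per record from a fixed category list.
# NOTE: the sampler_type / params field names are assumptions; see the library docs.
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="topic",
        sampler_type="category",
        params=dd.CategorySamplerParams(values=["billing", "shipping", "returns"]),
    )
)

# LLM-generated column: the prompt template can reference other columns by name,
# which is how dependencies between fields are expressed.
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="customer_question",
        prompt="Write a realistic customer support question about {{ topic }}.",
        model_alias="text",  # refers to the ModelConfig alias defined above
    )
)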

The library handles: Dataset schema definition, column types, dependencies, constraints, and validation rules.

Learn more: See the open-source library documentation for comprehensive guides on column types, samplers, constraints, and advanced features.

2. Execute via the Microservice#

Submit your configuration to the Data Designer service using the NMP SDK:

from nemo_microservices.data_designer.client import NeMoDataDesignerClient

client = NeMoDataDesignerClient(base_url="...", workspace="default")

# Fast iteration
preview = client.preview(config_builder)

# Production generation
job = client.create(config_builder, num_records=10000)
job.wait_until_done()
dataset = job.load_dataset()

The microservice handles: Job orchestration, inference routing through NMP’s Inference Gateway, distributed execution, artifact storage, and monitoring.
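
Continuing from the snippet above, a typical follow-up is to spot-check the preview before launching a large job and to persist the final dataset locally. The sketch assumes the preview result and the loaded dataset behave like pandas DataFrames, as their counterparts do in the standalone library; verify the actual return types in the SDK reference.

# ASSUMPTION: `preview.dataset` and `dataset` are pandas DataFrames, mirroring
# the standalone library; check the NMP SDK reference for the actual types.
print(preview.dataset.head())    # spot-check a few generated records before scaling up
print(dataset.columns.tolist())  # columns should match the column definitions in the config
dataset.to_json("synthetic_data.jsonl", orient="records", lines=True)  # persist for downstream use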

Key Differences from Standalone Library#

When using Data Designer as an NMP service:

| Feature | Standalone Library | NMP Service |
|---|---|---|
| Inference | Direct API calls to any OpenAI-compatible endpoint | Routes through NMP Inference Gateway with model providers |
| Execution | Local Python process | Distributed job execution with monitoring |
| Seed Data | Local files, DataFrames, HuggingFace | HuggingFace, NMP Filesets (no local files/DataFrames) |
| Artifacts | Local filesystem | NMP artifact storage |
| Authentication | Direct API keys | NMP Secrets service |

Next Steps#

Quick Start: Set up inference and run your first Data Designer job.

Tutorials: Learn through examples covering basics, seeding, and more.

Migration Guide: Migrate from the standalone library to the NMP service (Migrating from Standalone Library).

API Reference: SDK client methods and configuration options.

Library Documentation: Comprehensive guides on column types, constraints, and advanced features (https://nvidia-nemo.github.io/DataDesigner/0.4.0/).