Data Designer Service#
The Data Designer service enables high-quality synthetic data generation at scale through the NeMo Microservices Platform.
Overview#
Data Designer is a framework for orchestrating complex synthetic data generation workflows. It coordinates LLM calls, manages dependencies between data fields, handles batching and parallelization, and validates generated data against specifications.
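The dependency management mentioned above can be pictured as a topological ordering of columns: each column is generated only after the columns it references. The sketch below is a hypothetical, stdlib-only illustration of that idea, not the service's actual implementation; the column names are made up.

```python
from graphlib import TopologicalSorter

# Hypothetical column spec: column name -> names of columns it depends on.
# Data Designer resolves a similar ordering so that, e.g., an LLM column
# whose prompt references {topic} is generated after the "topic" column.
columns = {
    "topic": set(),                   # sampled independently
    "question": {"topic"},            # prompt template uses {topic}
    "answer": {"topic", "question"},  # prompt uses both upstream fields
}

# Produce a valid generation order: dependencies always come first.
generation_order = list(TopologicalSorter(columns).static_order())
```

With the dependencies above, `"topic"` is generated first, then `"question"`, then `"answer"`.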
The service is built on the open-source NVIDIA NeMo Data Designer library (GitHub).
How It Works: Library + Microservice#
Data Designer separates configuration from execution:
1. Build Configs with the Library#
Use the data_designer.config package (installed automatically with nemo-microservices[data-designer]) to define your dataset:
```python
import data_designer.config as dd

# Define models
model_configs = [
    dd.ModelConfig(
        provider="default/build-nvidia",  # NMP model provider
        model="nvidia/nemotron-3-nano-30b-a3b",
        alias="text",
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
```
The library handles: Dataset schema definition, column types, dependencies, constraints, and validation rules.
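To make the idea of validation rules concrete, here is a toy, stdlib-only sketch of checking generated records against a declared column specification. This is an illustration of the concept, not the library's actual validator; the `spec` fields and the `validate` helper are hypothetical.

```python
# Hypothetical column specification: field name -> expected Python type.
spec = {
    "topic": str,
    "question": str,
    "difficulty": int,
}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in spec.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

# A record missing "question" and with a string difficulty fails both checks.
problems = validate({"topic": "math", "difficulty": "hard"})
```

The real library expresses constraints and validation declaratively in the column configs; see its documentation for the supported rule types.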
Learn more: See the open-source library documentation for comprehensive guides on column types, samplers, constraints, and advanced features.
2. Execute via the Microservice#
Submit your configuration to the Data Designer service using the NMP SDK:
```python
from nemo_microservices.data_designer.client import NeMoDataDesignerClient

client = NeMoDataDesignerClient(base_url="...", workspace="default")

# Fast iteration
preview = client.preview(config_builder)

# Production generation
job = client.create(config_builder, num_records=10000)
job.wait_until_done()
dataset = job.load_dataset()
```
The microservice handles: Job orchestration, inference routing through NMP’s Inference Gateway, distributed execution, artifact storage, and monitoring.
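As a rough illustration of the batching the service performs internally, here is a simplified sketch of splitting a total record count into batches; this is not the service's actual scheduling logic, and the batch size shown is arbitrary.

```python
def plan_batches(num_records: int, batch_size: int) -> list[int]:
    """Split a record count into batch sizes; the last batch may be short."""
    full, remainder = divmod(num_records, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

# e.g. 10,000 records split into batches of at most 3,000
sizes = plan_batches(10_000, 3_000)
```

In the service, each batch becomes a unit of distributed work, so generation scales out instead of running as one long sequential loop.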
Key Differences from Standalone Library#
When using Data Designer as an NMP service:
| Feature | Standalone Library | NMP Service |
|---|---|---|
| Inference | Direct API calls to any OpenAI-compatible endpoint | Routes through NMP Inference Gateway with model providers |
| Execution | Local Python process | Distributed job execution with monitoring |
| Seed Data | Local files, DataFrames, HuggingFace | HuggingFace, NMP Filesets (no local files/DataFrames) |
| Artifacts | Local filesystem | NMP artifact storage |
| Authentication | Direct API keys | NMP Secrets service |
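To illustrate what seed data does, the sketch below cycles through seed records so every generated row starts from a fixed seed field that downstream (e.g. LLM) columns can reference. It is a simplified, stdlib-only illustration with made-up records, not the service's seed-data API, and cycling is just one plausible sampling strategy.

```python
import itertools

# Hypothetical seed records, as they might come from a HuggingFace
# dataset or an NMP Fileset (simplified here to plain dicts).
seed_records = [
    {"product": "laptop"},
    {"product": "phone"},
]

def seeded_rows(num_records: int) -> list[dict]:
    """Start each generated row from the next seed record, wrapping around."""
    cycle = itertools.cycle(seed_records)
    return [dict(next(cycle)) for _ in range(num_records)]

rows = seeded_rows(5)
```

When running against the NMP service, the seed source must be a HuggingFace dataset or an NMP Fileset, since local files and DataFrames are not available to the remote job.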
Next Steps#
- Set up inference and run your first Data Designer job.
- Learn through examples: basics, seeding, and more.
- Migrate from the standalone library to the NMP service.
- Explore SDK client methods and configuration options.
- Read comprehensive guides on column types, constraints, and advanced features.