Migrating from Standalone Library#
If you’re already using the standalone DataDesigner library, this guide shows you how to migrate to the NMP service.
Key Insight
Your configuration code stays the same. All config_builder code (columns, constraints, processors) works identically. Only the execution interface changes.
Migration Summary#
What changes:
Execution interface (imports and client initialization)
Model provider setup (reference by name instead of direct configuration)
Seed data sources (use Filesets or HuggingFace instead of local files)
What stays the same:
All column configurations (samplers, LLM columns, expressions, etc.)
Constraints, processors, and validation logic
Jinja2 templating and prompt syntax
Method names: preview() and create()
Why migrate: Get distributed execution, job monitoring, centralized secrets, and team collaboration.
Quick Overview#
Standalone Library
from data_designer.interface import DataDesigner
import data_designer.config as dd
# Build config
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)
# Execute locally
data_designer = DataDesigner(artifact_path="./artifacts")
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)
NMP Service
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
import data_designer.config as dd
# Build config (identical)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)
# Execute on NMP
client = NeMoDataDesignerClient(base_url="...", workspace="default")
preview = client.preview(config_builder, num_records=10)
results = client.create(config_builder, num_records=1000)
Step-by-Step Migration#
Step 1: Install the NMP SDK#
Replace or supplement your standalone library installation:
# Remove standalone library (optional)
pip uninstall data-designer
# Install NMP SDK with Data Designer support
pip install nemo-microservices[data-designer]
The [data-designer] extra includes the data_designer.config package, so you can still build configurations the same way.
Note
The nemo-microservices[data-designer] package pins to a specific version of the Data Designer library that matches the service version in your NMP deployment. This ensures compatibility between your configuration code and the service.
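To confirm the installation worked, you can check that both the service client and the configuration package import cleanly. A minimal sanity check (the printed message is just illustrative):

# Verify that the config package and the NMP client are both available
import data_designer.config as dd
from nemo_microservices.data_designer.client import NeMoDataDesignerClient

print("data_designer.config and NeMoDataDesignerClient imported successfully")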
Step 2: Update Imports#
Change your execution imports:
# Before
from data_designer.interface import DataDesigner
# After
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
Keep these imports unchanged:
import data_designer.config as dd # Still works!
Step 3: Set Up Inference#
The NMP service routes inference through the Inference Gateway. You need to configure model providers.
Store Your API Key#
from nemo_microservices import NeMoMicroservices
sdk = NeMoMicroservices(base_url="...", workspace="default")
sdk.secrets.create(
    name="my-api-key",
    data="<your-api-key>",
    description="API key for inference provider"
)
Create a Model Provider#
sdk.inference.providers.create(
    name="my-provider",
    description="External inference provider",
    host_url="https://integrate.api.nvidia.com",  # Or your provider URL
    api_key_secret_name="my-api-key"
)
Step 4: Update Model Configurations#
In the standalone library, you pass ModelProvider objects to the DataDesigner constructor. In the NMP service, you reference model providers by name in your ModelConfig.
# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd
# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="your-api-key"
    )
]

# Model configs reference providers by name
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References the provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Pass providers to DataDesigner constructor
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
# After (NMP service)
import data_designer.config as dd
# Model configs reference NMP model providers
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/build-nvidia",  # workspace/provider-name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]
# No need to pass providers - they're managed by Inference Gateway
Key changes:
ModelProvider objects are no longer defined in your code
Model providers are configured once via the Inference Gateway (see Step 3)
provider in ModelConfig now uses fully qualified names: "workspace/provider-name"
No direct API keys in code - they're managed by the Secrets service
Step 5: Update Client Initialization#
Replace the DataDesigner client with NeMoDataDesignerClient:
# Before (standalone library)
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=[...]  # Optional provider list
)

# After (NMP service)
client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",  # NMP deployment URL
    workspace="default"  # Workspace name
)

# Or using an existing SDK instance
from nemo_microservices import NeMoMicroservices

sdk = NeMoMicroservices(base_url="...", workspace="default")
client = NeMoDataDesignerClient(sdk=sdk)
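In shared environments it can help to read the deployment URL and workspace from the environment rather than hardcoding them. A small sketch (NMP_BASE_URL and NMP_WORKSPACE are arbitrary variable names chosen for this example, not SDK conventions):

import os

# Hypothetical environment variable names - use whatever fits your setup
client = NeMoDataDesignerClient(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=os.environ.get("NMP_WORKSPACE", "default"),
)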
Step 6: Update Seed Data Sources (If Used)#
If you use seed datasets, you need to migrate local sources to remote ones:
Local Files → Filesets#
# Before (standalone library)
from data_designer.config import LocalFileSeedSource
seed_source = LocalFileSeedSource(path="./data/seed.csv")
config_builder.with_seed_dataset(seed_source)
# After (NMP service)
# 1. Upload file to Fileset
sdk.filesets.create(name="my-seed-data")
sdk.filesets.files.upload(
    fileset="my-seed-data",
    local_path="./data/seed.csv",
    remote_path="seed.csv"
)
# 2. Reference Fileset in config
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource
seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.csv")
config_builder.with_seed_dataset(seed_source)
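If you have several local seed files, the same two calls extend naturally to a loop. A sketch assuming the files live under ./data:

from pathlib import Path

sdk.filesets.create(name="my-seed-data")
for csv_path in Path("./data").glob("*.csv"):
    sdk.filesets.files.upload(
        fileset="my-seed-data",
        local_path=str(csv_path),
        remote_path=csv_path.name,  # store each file at the fileset root
    )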
DataFrames → Filesets#
# Before (standalone library)
from data_designer.config import DataFrameSeedSource
import pandas as pd
df = pd.read_csv("data.csv")
seed_source = DataFrameSeedSource(dataframe=df)
config_builder.with_seed_dataset(seed_source)
# After (NMP service)
# 1. Save DataFrame and upload to Fileset
import os
import tempfile
import pandas as pd

df = pd.read_csv("data.csv")
with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    df.to_parquet(tmp.name)

sdk.filesets.create(name="my-seed-data")
sdk.filesets.files.upload(
    fileset="my-seed-data",
    local_path=tmp.name,
    remote_path="seed.parquet"
)
os.unlink(tmp.name)  # Clean up the temporary file

# 2. Reference Fileset in config
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.parquet")
config_builder.with_seed_dataset(seed_source)
HuggingFace (No Changes Needed)#
# Works the same in both!
from data_designer.config import HuggingFaceSeedSource
# Public dataset
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet"
)

# Private dataset - update token to reference an NMP secret
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet",
    token="default/hf-token"  # Reference to NMP secret
)
config_builder.with_seed_dataset(seed_source)
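For the private-dataset case, the "default/hf-token" reference assumes you have already stored the token as an NMP secret, for example with the same secrets call used in Step 3:

# One-time setup: store the HuggingFace token as an NMP secret
sdk.secrets.create(
    name="hf-token",
    data="hf_xxx",  # your HuggingFace access token
    description="HuggingFace access token for private datasets"
)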
Step 7: Update Execution Calls#
The method names stay the same, but the client is different:
# Before (standalone library)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000, dataset_name="my-dataset")
# After (NMP service)
preview = client.preview(config_builder, num_records=10)
results = client.create(config_builder, num_records=1000, wait_until_done=False)
Note: The dataset_name parameter is not available in the service client. Job names are auto-generated.
Step 8: Update Result Access#
Result access is similar but with some differences:
# Preview results (identical)
preview.dataset # pandas DataFrame
preview.analysis.to_report() # Analysis report
preview.display_sample_record() # Display sample
# Create results
# Before (standalone library)
results = data_designer.create(config_builder, num_records=1000)
dataset = results.dataset # Available immediately
analysis = results.analysis
# After (NMP service)
job = client.create(config_builder, num_records=1000)
job.wait_until_done() # Must wait for job completion
dataset = job.load_dataset() # Load from artifact storage
analysis = job.load_analysis()
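Since preview.dataset is a pandas DataFrame, exporting the generated data works with standard pandas I/O. The sketch below assumes job.load_dataset() likewise returns a DataFrame:

# Export the generated dataset for downstream use (assumes a pandas DataFrame)
dataset = job.load_dataset()
dataset.to_parquet("my-dataset.parquet")
dataset.to_csv("my-dataset.csv", index=False)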
What Stays the Same#
All configuration code remains identical. You can copy your existing config_builder code directly without any changes.
Configuration APIs#
| API | Status | Notes |
|---|---|---|
| DataDesignerConfigBuilder | ✅ Identical | Constructor signature unchanged |
| add_column() | ✅ Identical | All column types supported |
| Constraint APIs | ✅ Identical | All constraint types supported |
| Processor APIs | ✅ Identical | All processor types supported |
| with_seed_dataset() | ✅ Identical | Method signature unchanged (seed sources differ) |
| Config builder output | ✅ Identical | Returns the same configuration object |
Column Types#
All column types work identically:
✅ SamplerColumnConfig - All sampler types and parameters
✅ LLMTextColumnConfig - Text generation with prompts
✅ LLMCodeColumnConfig - Code generation
✅ LLMStructuredColumnConfig - JSON generation with schemas
✅ LLMJudgeColumnConfig - Quality scoring
✅ ExpressionColumnConfig - Jinja2 transformations
✅ EmbeddingColumnConfig - Vector embeddings
✅ ValidationColumnConfig - Code and HTTP validation
✅ SeedDatasetColumnConfig - Automatically added with seed data
Other Features#
✅ Jinja2 templating in prompts - Reference other columns with {{ column_name }}
✅ Constraints - All constraint types (scalar, column inequalities)
✅ Processors - All processor types (drop columns, transformations)
✅ Inference parameters - Temperature, top_p, max_tokens, etc.
✅ Sampler parameters - All distributions and configurations
✅ Column dependencies - Automatic resolution based on references (see the sketch after this list)
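Column dependencies in particular need no migration work, because they are derived from the configuration itself rather than the execution backend. The sketch below is assembled from the config snippets in this guide (it assumes the model_configs from Step 4) and shows a column depending on another purely through its prompt reference:

import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        # The {{ category }} reference makes this column depend on "category"
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)
# "description" is generated after "category" - no manual ordering needed,
# whether the config runs locally or on the NMP service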
What Changes#
Required Changes#
These changes are mandatory for migration:
| Component | Standalone Library | NMP Service | Migration Step |
|---|---|---|---|
| Import | data_designer.interface.DataDesigner | nemo_microservices.data_designer.client.NeMoDataDesignerClient | Step 2 |
| Client | DataDesigner(artifact_path=...) | NeMoDataDesignerClient(base_url=..., workspace=...) | Step 5 |
| Model Providers | Direct ModelProvider objects | String references: "workspace/provider-name" | Step 4 |
| Inference | Direct API calls with keys in code | Inference Gateway with Secrets service | Step 3 |
| Local Seed Files | LocalFileSeedSource | FilesetFileSeedSource via Filesets | Step 6 |
| DataFrame Seeds | DataFrameSeedSource | Upload to Fileset, then FilesetFileSeedSource | Step 6 |
Behavioral Changes#
These differences affect how you interact with results:
| Feature | Standalone Library | NMP Service |
|---|---|---|
| Execution | Synchronous (blocks until complete) | Asynchronous jobs (returns immediately) |
| Result Access | results.dataset available immediately | job.load_dataset() after job.wait_until_done() |
| Artifact Storage | Local filesystem | NMP artifact storage |
| Job Tracking | No tracking | Full job status and monitoring |
Note: You can use wait_until_done=True with create() for synchronous behavior similar to the standalone library.
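For a drop-in replacement of the standalone blocking call, pass wait_until_done=True and load the results once create() returns. A sketch, assuming the call still returns the job handle after completion:

# Blocks like the standalone library's create(), then loads artifacts
job = client.create(config_builder, num_records=1000, wait_until_done=True)
dataset = job.load_dataset()
analysis = job.load_analysis()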
Unsupported Features#
The following standalone library features are not available in the NMP service:
| Feature | Status | Workaround |
|---|---|---|
| LocalFileSeedSource | ❌ Not supported | Upload to a Fileset, use FilesetFileSeedSource |
| DataFrameSeedSource | ❌ Not supported | Save to file, upload to a Fileset |
| Custom Python function validators | ❌ Not supported | Use code validators or HTTP validators |
| Local model providers | ❌ Not supported | Use remote inference endpoints via Inference Gateway |
Complete Migration Example#
Here’s a full before/after example:
# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="nvapi-xxx"
    )
]

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

# Access results
dataset = results.dataset
analysis = results.analysis
# After (NMP service)
from nemo_microservices import NeMoMicroservices
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
import data_designer.config as dd

# One-time setup: configure inference
sdk = NeMoMicroservices(base_url="http://localhost:8080", workspace="default")
sdk.secrets.create(
    name="nvidia-api-key",
    data="nvapi-xxx",
    description="NVIDIA API key"
)
sdk.inference.providers.create(
    name="build-nvidia",
    description="NVIDIA Build API",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-api-key"
)

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/build-nvidia",  # Reference NMP provider
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration (identical!)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
client = NeMoDataDesignerClient(sdk=sdk)
preview = client.preview(config_builder, num_records=10)
job = client.create(config_builder, num_records=1000)

# Access results
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()
Benefits of Migration#
Migrating to the NMP service provides:
Scalability: Distributed execution for large datasets
Monitoring: Job tracking and status updates
Artifact Management: Centralized storage and versioning
Team Collaboration: Shared workspaces and resources
Security: Centralized secret management
Infrastructure: Managed inference and compute resources
Getting Help#
Quick Start: See the quickstart guide for setup instructions
Tutorials: Follow the tutorials for hands-on examples
API Reference: Check the API reference for detailed documentation
Library Docs: Refer to the open-source library documentation for configuration details