Migrating from Standalone Library#
If you’re already using the standalone DataDesigner library, this guide shows you how to migrate to the NMP service.
Key Insight
Your configuration code stays the same. All config_builder code (columns, constraints, processors) works identically. Only the execution interface changes.
Migration Summary#
What changes:
Execution interface (imports and client initialization)
Model provider setup (reference by name instead of direct configuration)
Seed data sources (use Filesets or HuggingFace instead of local files)
What stays the same:
All column configurations (samplers, LLM columns, expressions, etc.)
Constraints, processors, and validation logic
Jinja2 templating and prompt syntax
Method names: preview() and create()
Why migrate: Get distributed execution, job monitoring, centralized secrets, and team collaboration.
Quick Overview#
Standalone Library
from data_designer.interface import DataDesigner
import data_designer.config as dd
# Build config
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)
# Execute locally
data_designer = DataDesigner(artifact_path="./artifacts")
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)
NMP Service
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
import data_designer.config as dd
# Build config (identical)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)
# Execute on NMP
client = NeMoDataDesignerClient(base_url="...", workspace="default")
preview = client.preview(config_builder, num_records=10)
results = client.create(config_builder, num_records=1000)
Step-by-Step Migration#
Step 1: Install the NMP SDK#
Replace or supplement your standalone library installation:
# Remove standalone library (optional)
pip uninstall data-designer
# Install NMP SDK with Data Designer support
pip install nemo-microservices[data-designer]
The [data-designer] extra includes the data_designer.config package, so you can still build configurations the same way.
Note
The nemo-microservices[data-designer] package pins to a specific version of the Data Designer library that matches the service version in your NMP deployment. This ensures compatibility between your configuration code and the service.
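To confirm the installation worked, you can check that both the service client and the configuration package import cleanly. A minimal sanity check (the printed message is just illustrative):

# Verify that the config package and the NMP client are both available
import data_designer.config as dd
from nemo_microservices.data_designer.client import NeMoDataDesignerClient

print("data_designer.config and NeMoDataDesignerClient imported successfully")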
Step 2: Update Imports#
Change your execution imports:
# Before
from data_designer.interface import DataDesigner
# After
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
Keep these imports unchanged:
import data_designer.config as dd # Still works!
Step 3: Set Up Inference#
The NMP service routes inference through the Inference Gateway. You need to configure model providers.
Store Your API Key#
from nemo_microservices import NeMoMicroservices
sdk = NeMoMicroservices(base_url="...", workspace="default")
sdk.secrets.create(
    name="my-api-key",
    data="<your-api-key>",
    description="API key for inference provider"
)
Create a Model Provider#
sdk.inference.providers.create(
    name="my-provider",
    description="External inference provider",
    host_url="https://integrate.api.nvidia.com",  # Or your provider URL
    api_key_secret_name="my-api-key"
)
Step 4: Update Model Configurations#
In the standalone library, you pass ModelProvider objects to the DataDesigner constructor. In the NMP service, you reference model providers by name in your ModelConfig.
# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd
# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="your-api-key"
    )
]

# Model configs reference providers by name
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References the provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Pass providers to DataDesigner constructor
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
# After (NMP service)
import data_designer.config as dd
# Model configs reference NMP model providers
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/build-nvidia",  # workspace/provider-name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]
# No need to pass providers - they're managed by Inference Gateway
Key changes:
ModelProvider objects are no longer defined in your code
Model providers are configured once via the Inference Gateway (see Step 3)
provider in ModelConfig now uses fully qualified names: "workspace/provider-name"
No direct API keys in code - they're managed by the Secrets service
Step 5: Update Client Initialization#
Replace the DataDesigner client with NeMoDataDesignerClient:
# Before (standalone library)
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=[...]  # Optional provider list
)

# After (NMP service)
client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",  # NMP deployment URL
    workspace="default"  # Workspace name
)

# Or using an existing SDK instance
from nemo_microservices import NeMoMicroservices

sdk = NeMoMicroservices(base_url="...", workspace="default")
client = NeMoDataDesignerClient(sdk=sdk)
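In shared environments it can help to read the deployment URL and workspace from the environment rather than hardcoding them. A small sketch (NMP_BASE_URL and NMP_WORKSPACE are arbitrary variable names chosen for this example, not SDK conventions):

import os

# Hypothetical environment variable names - use whatever fits your setup
client = NeMoDataDesignerClient(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=os.environ.get("NMP_WORKSPACE", "default"),
)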
Step 6: Update Seed Data Sources (If Used)#
If you use seed datasets, you need to migrate local sources to remote ones:
Local Files → Filesets#
# Before (standalone library)
from data_designer.config import LocalFileSeedSource
seed_source = LocalFileSeedSource(path="./data/seed.csv")
config_builder.with_seed_dataset(seed_source)
# After (NMP service)
# 1. Upload file to Fileset
sdk.filesets.create(name="my-seed-data")
sdk.filesets.files.upload(
    fileset="my-seed-data",
    local_path="./data/seed.csv",
    remote_path="seed.csv"
)
# 2. Reference Fileset in config
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource
seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.csv")
config_builder.with_seed_dataset(seed_source)
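If you have several local seed files, the same two calls extend naturally to a loop. A sketch assuming the files live under ./data:

from pathlib import Path

sdk.filesets.create(name="my-seed-data")
for csv_path in Path("./data").glob("*.csv"):
    sdk.filesets.files.upload(
        fileset="my-seed-data",
        local_path=str(csv_path),
        remote_path=csv_path.name,  # store each file at the fileset root
    )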
DataFrames → Filesets#
# Before (standalone library)
from data_designer.config import DataFrameSeedSource
import pandas as pd
df = pd.read_csv("data.csv")
seed_source = DataFrameSeedSource(dataframe=df)
config_builder.with_seed_dataset(seed_source)
# After (NMP service)
# 1. Save DataFrame and upload to Fileset
import os
import tempfile
import pandas as pd

df = pd.read_csv("data.csv")
with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    df.to_parquet(tmp.name)

sdk.filesets.create(name="my-seed-data")
sdk.filesets.files.upload(
    fileset="my-seed-data",
    local_path=tmp.name,
    remote_path="seed.parquet"
)
os.unlink(tmp.name)  # Clean up the temporary file

# 2. Reference Fileset in config
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.parquet")
config_builder.with_seed_dataset(seed_source)
HuggingFace (No Changes Needed)#
# Works the same in both!
from data_designer.config import HuggingFaceSeedSource
# Public dataset
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet"
)

# Private dataset - update token to reference an NMP secret
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet",
    token="default/hf-token"  # Reference to NMP secret
)
config_builder.with_seed_dataset(seed_source)
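For the private-dataset case, the "default/hf-token" reference assumes you have already stored the token as an NMP secret, for example with the same secrets call used in Step 3:

# One-time setup: store the HuggingFace token as an NMP secret
sdk.secrets.create(
    name="hf-token",
    data="hf_xxx",  # your HuggingFace access token
    description="HuggingFace access token for private datasets"
)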
Step 7: Update Execution Calls#
The method names stay the same, but the client is different:
# Before (standalone library)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000, dataset_name="my-dataset")
# After (NMP service)
preview = client.preview(config_builder, num_records=10)
results = client.create(config_builder, num_records=1000, wait_until_done=False)
Note: The dataset_name parameter is not available in the service client. Job names are auto-generated.
Step 8: Update Result Access#
Result access is similar but with some differences:
# Preview results (identical)
preview.dataset # pandas DataFrame
preview.analysis.to_report() # Analysis report
preview.display_sample_record() # Display sample
# Create results
# Before (standalone library)
results = data_designer.create(config_builder, num_records=1000)
dataset = results.dataset # Available immediately
analysis = results.analysis
# After (NMP service)
job = client.create(config_builder, num_records=1000)
job.wait_until_done() # Must wait for job completion
dataset = job.load_dataset() # Load from artifact storage
analysis = job.load_analysis()
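Since preview.dataset is a pandas DataFrame, exporting the generated data works with standard pandas I/O. The sketch below assumes job.load_dataset() likewise returns a DataFrame:

# Export the generated dataset for downstream use (assumes a pandas DataFrame)
dataset = job.load_dataset()
dataset.to_parquet("my-dataset.parquet")
dataset.to_csv("my-dataset.csv", index=False)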
What Stays the Same#
All configuration code remains identical. You can copy your existing config_builder code directly without any changes.
Configuration APIs#
| API | Status | Notes |
|---|---|---|
| DataDesignerConfigBuilder | ✅ Identical | Constructor signature unchanged |
| add_column() | ✅ Identical | All column types supported |
| Constraint APIs | ✅ Identical | All constraint types supported |
| Processor APIs | ✅ Identical | All processor types supported |
| with_seed_dataset() | ✅ Identical | Method signature unchanged (seed sources differ) |
| Config builder output | ✅ Identical | Returns the same configuration object |
Column Types#
All column types work identically:
✅ SamplerColumnConfig - All sampler types and parameters
✅ LLMTextColumnConfig - Text generation with prompts
✅ LLMCodeColumnConfig - Code generation
✅ LLMStructuredColumnConfig - JSON generation with schemas
✅ LLMJudgeColumnConfig - Quality scoring
✅ ExpressionColumnConfig - Jinja2 transformations
✅ EmbeddingColumnConfig - Vector embeddings
✅ ValidationColumnConfig - Code and HTTP validation
✅ SeedDatasetColumnConfig - Automatically added with seed data
Other Features#
✅ Jinja2 templating in prompts - Reference other columns with {{ column_name }}
✅ Constraints - All constraint types (scalar, column inequalities)
✅ Processors - All processor types (drop columns, transformations)
✅ Inference parameters - Temperature, top_p, max_tokens, etc.
✅ Sampler parameters - All distributions and configurations
✅ Column dependencies - Automatic resolution based on references (see the sketch after this list)
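Column dependencies in particular need no migration work, because they are derived from the configuration itself rather than the execution backend. The sketch below is assembled from the config snippets in this guide (it assumes the model_configs from Step 4) and shows a column depending on another purely through its prompt reference:

import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        # The {{ category }} reference makes this column depend on "category"
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)
# "description" is generated after "category" - no manual ordering needed,
# whether the config runs locally or on the NMP service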
What Changes#
Required Changes#
These changes are mandatory for migration:
| Component | Standalone Library | NMP Service | Migration Step |
|---|---|---|---|
| Import | data_designer.interface.DataDesigner | nemo_microservices.data_designer.client.NeMoDataDesignerClient | Step 2 |
| Client | DataDesigner(artifact_path=...) | NeMoDataDesignerClient(base_url=..., workspace=...) | Step 5 |
| Model Providers | Direct ModelProvider objects | String references: "workspace/provider-name" | Step 4 |
| Inference | Direct API calls with keys in code | Inference Gateway with Secrets service | Step 3 |
| Local Seed Files | LocalFileSeedSource | FilesetFileSeedSource via Filesets | Step 6 |
| DataFrame Seeds | DataFrameSeedSource | Upload to Fileset, then FilesetFileSeedSource | Step 6 |
Behavioral Changes#
These differences affect how you interact with results:
| Feature | Standalone Library | NMP Service |
|---|---|---|
| Execution | Synchronous (blocks until complete) | Asynchronous jobs (returns immediately) |
| Result Access | results.dataset available immediately | job.load_dataset() after job.wait_until_done() |
| Artifact Storage | Local filesystem | NMP artifact storage |
| Job Tracking | No tracking | Full job status and monitoring |
Note: You can use wait_until_done=True with create() for synchronous behavior similar to the standalone library.
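For a drop-in replacement of the standalone blocking call, pass wait_until_done=True and load the results once create() returns. A sketch, assuming the call still returns the job handle after completion:

# Blocks like the standalone library's create(), then loads artifacts
job = client.create(config_builder, num_records=1000, wait_until_done=True)
dataset = job.load_dataset()
analysis = job.load_analysis()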
Unsupported Features#
The following standalone library features are not available in the NMP service:
| Feature | Status | Workaround |
|---|---|---|
| LocalFileSeedSource | ❌ Not supported | Upload to a Fileset, use FilesetFileSeedSource |
| DataFrameSeedSource | ❌ Not supported | Save to file, upload to a Fileset |
| Custom Python function validators | ❌ Not supported | Use code validators or HTTP validators |
| Local model providers | ❌ Not supported | Use remote inference endpoints via Inference Gateway |
Complete Migration Example#
Here’s a full before/after example:
# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="nvapi-xxx"
    )
]

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

# Access results
dataset = results.dataset
analysis = results.analysis
# After (NMP service)
from nemo_microservices import NeMoMicroservices
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
import data_designer.config as dd

# One-time setup: configure inference
sdk = NeMoMicroservices(base_url="http://localhost:8080", workspace="default")
sdk.secrets.create(
    name="nvidia-api-key",
    data="nvapi-xxx",
    description="NVIDIA API key"
)
sdk.inference.providers.create(
    name="build-nvidia",
    description="NVIDIA Build API",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-api-key"
)

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/build-nvidia",  # Reference NMP provider
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration (identical!)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
client = NeMoDataDesignerClient(sdk=sdk)
preview = client.preview(config_builder, num_records=10)
job = client.create(config_builder, num_records=1000)

# Access results
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()
Benefits of Migration#
Migrating to the NMP service provides:
Scalability: Distributed execution for large datasets
Monitoring: Job tracking and status updates
Artifact Management: Centralized storage and versioning
Team Collaboration: Shared workspaces and resources
Security: Centralized secret management
Infrastructure: Managed inference and compute resources
Getting Help#
Quick Start: See the quickstart guide for setup instructions
Tutorials: Follow the tutorials for hands-on examples
API Reference: Check the API reference for detailed documentation
Library Docs: Refer to the open-source library documentation for configuration details