Migrating from Standalone Library#

If you’re already using the standalone DataDesigner library, this guide shows you how to migrate to the NMP service.

Key Insight

Your configuration code stays the same. All config_builder code (columns, constraints, processors) works identically. Only the execution interface changes.

Migration Summary#

What changes:

  • Execution interface (imports and client initialization)

  • Model provider setup (reference by name instead of direct configuration)

  • Seed data sources (use Filesets or HuggingFace instead of local files)

What stays the same:

  • All column configurations (samplers, LLM columns, expressions, etc.)

  • Constraints, processors, and validation logic

  • Jinja2 templating and prompt syntax

  • Method names: preview(), create()

Why migrate: Get distributed execution, job monitoring, centralized secrets, and team collaboration.

Quick Overview#

Standalone Library

from data_designer.interface import DataDesigner
import data_designer.config as dd

# Build config
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)

# Execute locally
data_designer = DataDesigner(artifact_path="./artifacts")
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

NMP Service

from nemo_microservices.data_designer.client import NeMoDataDesignerClient
import data_designer.config as dd

# Build config (identical)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)

# Execute on NMP
client = NeMoDataDesignerClient(base_url="...", workspace="default")
preview = client.preview(config_builder, num_records=10)
results = client.create(config_builder, num_records=1000)

Step-by-Step Migration#

Step 1: Install the NMP SDK#

Replace or supplement your standalone library installation:

# Remove standalone library (optional)
pip uninstall data-designer

# Install NMP SDK with Data Designer support
pip install nemo-microservices[data-designer]

The [data-designer] extra includes the data_designer.config package, so you can still build configurations the same way.

Note

The nemo-microservices[data-designer] package pins to a specific version of the Data Designer library that matches the service version in your NMP deployment. This ensures compatibility between your configuration code and the service.

Step 2: Update Imports#

Change your execution imports:

# Before
from data_designer.interface import DataDesigner

# After
from nemo_microservices.data_designer.client import NeMoDataDesignerClient

Keep these imports unchanged:

import data_designer.config as dd  # Still works!

Step 3: Set Up Inference#

The NMP service routes inference through the Inference Gateway, so you need to configure model providers before running jobs.

Store Your API Key#

from nemo_microservices import NeMoMicroservices

sdk = NeMoMicroservices(base_url="...", workspace="default")

sdk.secrets.create(
    name="my-api-key",
    data="<your-api-key>",
    description="API key for inference provider"
)

Create a Model Provider#

sdk.inference.providers.create(
    name="my-provider",
    description="External inference provider",
    host_url="https://integrate.api.nvidia.com",  # Or your provider URL
    api_key_secret_name="my-api-key"
)

Step 4: Update Model Configurations#

In the standalone library, you pass ModelProvider objects to the DataDesigner constructor. In the NMP service, you reference model providers by name in your ModelConfig.

# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="your-api-key"
    )
]

# Model configs reference providers by name
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References the provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Pass providers to DataDesigner constructor
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)

# After (NMP service)
import data_designer.config as dd

# Model configs reference NMP model providers
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/build-nvidia",  # workspace/provider-name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# No need to pass providers - they're managed by Inference Gateway

Key changes:

  • ModelProvider objects are no longer defined in your code

  • Model providers are configured once via Inference Gateway (see Step 3)

  • provider in ModelConfig now uses fully qualified names: "workspace/provider-name"

  • No direct API keys in code - managed by Secrets service
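The qualified reference is just a plain string joining the workspace and provider name with a slash. A tiny helper (the function name here is hypothetical, for illustration only) can keep the format consistent across configs:

```python
def qualify_provider(workspace: str, provider: str) -> str:
    """Build the "workspace/provider-name" reference used in ModelConfig."""
    return f"{workspace}/{provider}"

# e.g. provider=qualify_provider("default", "build-nvidia") → "default/build-nvidia"
```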

Step 5: Update Client Initialization#

Replace the DataDesigner client with NeMoDataDesignerClient:

# Before (standalone library)
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=[...]  # Optional provider list
)

# After (NMP service)
client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",  # NMP deployment URL
    workspace="default"  # Workspace name
)

# Or using an existing SDK instance
sdk = NeMoMicroservices(base_url="...", workspace="default")
client = NeMoDataDesignerClient(sdk=sdk)

Step 6: Update Seed Data Sources (If Used)#

If you use seed datasets, you need to migrate local sources to remote ones:

Local Files → Filesets#

# Before (standalone library)
from data_designer.config import LocalFileSeedSource

seed_source = LocalFileSeedSource(path="./data/seed.csv")
config_builder.with_seed_dataset(seed_source)

# After (NMP service)
# 1. Upload file to Fileset
sdk.filesets.create(name="my-seed-data")
sdk.filesets.files.upload(
    fileset="my-seed-data",
    local_path="./data/seed.csv",
    remote_path="seed.csv"
)

# 2. Reference Fileset in config
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.csv")
config_builder.with_seed_dataset(seed_source)
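The Fileset path packs three parts into one string: `workspace/fileset-name#file-path`. As a sketch of how that reference decomposes (the helper name is illustrative, not part of the SDK):

```python
def split_fileset_path(path: str) -> tuple[str, str, str]:
    """Split "workspace/fileset-name#file-path" into its three parts."""
    ref, _, file_path = path.partition("#")
    workspace, _, fileset = ref.partition("/")
    return workspace, fileset, file_path

# split_fileset_path("default/my-seed-data#seed.csv")
# → ("default", "my-seed-data", "seed.csv")
```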

DataFrames → Filesets#

# Before (standalone library)
from data_designer.config import DataFrameSeedSource
import pandas as pd

df = pd.read_csv("data.csv")
seed_source = DataFrameSeedSource(dataframe=df)
config_builder.with_seed_dataset(seed_source)

# After (NMP service)
# 1. Save DataFrame to a temporary file and upload to a Fileset
import os
import tempfile
import pandas as pd

df = pd.read_csv("data.csv")

with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    df.to_parquet(tmp.name)

sdk.filesets.create(name="my-seed-data")
sdk.filesets.files.upload(
    fileset="my-seed-data",
    local_path=tmp.name,
    remote_path="seed.parquet"
)
os.unlink(tmp.name)  # Clean up the temporary file

# 2. Reference Fileset in config
from nemo_microservices.data_designer.plugins.fileset_file_seed_source import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.parquet")
config_builder.with_seed_dataset(seed_source)

HuggingFace (Mostly Unchanged)#

# Public datasets work the same in both
from data_designer.config import HuggingFaceSeedSource

# Public dataset
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet"
)

# Private dataset - update the token to reference an NMP secret
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet",
    token="default/hf-token"  # Reference to NMP secret
)

config_builder.with_seed_dataset(seed_source)

Step 7: Update Execution Calls#

The method names stay the same, but the client is different:

# Before (standalone library)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000, dataset_name="my-dataset")

# After (NMP service)
preview = client.preview(config_builder, num_records=10)
results = client.create(config_builder, num_records=1000, wait_until_done=False)

Note: The dataset_name parameter is not available in the service client. Job names are auto-generated.

Step 8: Update Result Access#

Result access is similar but with some differences:

# Preview results (identical)
preview.dataset  # pandas DataFrame
preview.analysis.to_report()  # Analysis report
preview.display_sample_record()  # Display sample

# Create results
# Before (standalone library)
results = data_designer.create(config_builder, num_records=1000)
dataset = results.dataset  # Available immediately
analysis = results.analysis

# After (NMP service)
job = client.create(config_builder, num_records=1000)
job.wait_until_done()  # Must wait for job completion
dataset = job.load_dataset()  # Load from artifact storage
analysis = job.load_analysis()

What Stays the Same#

All configuration code remains identical. You can copy your existing config_builder code directly without any changes.

Configuration APIs#

| API | Status | Notes |
|-----|--------|-------|
| DataDesignerConfigBuilder(model_configs) | ✅ Identical | Constructor signature unchanged |
| config_builder.add_column(...) | ✅ Identical | All column types supported |
| config_builder.add_constraint(...) | ✅ Identical | All constraint types supported |
| config_builder.add_processor(...) | ✅ Identical | All processor types supported |
| config_builder.with_seed_dataset(...) | ✅ Identical | Method signature unchanged (seed sources differ) |
| config_builder.build() | ✅ Identical | Returns the same DataDesignerConfig object |

Column Types#

All column types work identically:

  • SamplerColumnConfig - All sampler types and parameters

  • LLMTextColumnConfig - Text generation with prompts

  • LLMCodeColumnConfig - Code generation

  • LLMStructuredColumnConfig - JSON generation with schemas

  • LLMJudgeColumnConfig - Quality scoring

  • ExpressionColumnConfig - Jinja2 transformations

  • EmbeddingColumnConfig - Vector embeddings

  • ValidationColumnConfig - Code and HTTP validation

  • SeedDatasetColumnConfig - Automatically added with seed data

Other Features#

  • Jinja2 templating in prompts - Reference other columns with {{ column_name }}

  • Constraints - All constraint types (scalar, column inequalities)

  • Processors - All processor types (drop columns, transformations)

  • Inference parameters - Temperature, top_p, max_tokens, etc.

  • Sampler parameters - All distributions and configurations

  • Column dependencies - Automatic resolution based on references
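Because prompt templating is unchanged, you can sanity-check simple column references locally before submitting a job. The snippet below is a rough stand-in that handles only bare `{{ column }}` substitutions (it is not full Jinja2, which is what the service actually uses):

```python
import re

def preview_prompt(template: str, record: dict) -> str:
    """Substitute simple {{ column_name }} references with record values."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(record[m.group(1)]),
        template,
    )

preview_prompt("Describe a {{ category }} product.", {"category": "premium"})
# → "Describe a premium product."
```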

What Changes#

Required Changes#

These changes are mandatory for migration:

| Component | Standalone Library | NMP Service | Migration Step |
|-----------|--------------------|-------------|----------------|
| Import | from data_designer.interface import DataDesigner | from nemo_microservices.data_designer.client import NeMoDataDesignerClient | Step 2 |
| Client | DataDesigner(artifact_path="...") | NeMoDataDesignerClient(base_url="...", workspace="...") | Step 5 |
| Model Providers | Direct ModelProvider objects | String references: "workspace/provider-name" | Step 4 |
| Inference | Direct API calls with keys in code | Inference Gateway with Secrets service | Step 3 |
| Local Seed Files | LocalFileSeedSource | FilesetFileSeedSource (upload first) | Step 6 |
| DataFrame Seeds | DataFrameSeedSource | FilesetFileSeedSource (upload first) | Step 6 |

Behavioral Changes#

These differences affect how you interact with results:

| Feature | Standalone Library | NMP Service |
|---------|--------------------|-------------|
| Execution | Synchronous (blocks until complete) | Asynchronous jobs (returns immediately) |
| Result Access | results.dataset (immediate) | job.load_dataset() (after completion) |
| Artifact Storage | Local filesystem | NMP artifact storage |
| Job Tracking | No tracking | Full job status and monitoring |

Note: You can use wait_until_done=True with create() for synchronous behavior similar to the standalone library.
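If you want the blocking/non-blocking choice to be explicit in your own code, you could wrap the client call. The method names match those shown above, but the wrapper itself is illustrative, not part of the SDK:

```python
def create_dataset(client, config_builder, num_records: int, blocking: bool = True):
    """Run a Data Designer job, optionally blocking until the dataset is ready."""
    if blocking:
        # Synchronous: closest to the standalone library's create()
        job = client.create(config_builder, num_records=num_records, wait_until_done=True)
        return job.load_dataset()
    # Asynchronous: return the job handle; call job.wait_until_done() later
    return client.create(config_builder, num_records=num_records, wait_until_done=False)
```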

Unsupported Features#

The following standalone library features are not available in the NMP service:

| Feature | Status | Workaround |
|---------|--------|------------|
| LocalFileSeedSource | ❌ Not supported | Upload to a Fileset, use FilesetFileSeedSource |
| DataFrameSeedSource | ❌ Not supported | Save to a file, upload to a Fileset |
| Custom Python function validators | ❌ Not supported | Use code validators or HTTP validators |
| Local model providers | ❌ Not supported | Use remote inference endpoints via the Inference Gateway |

Complete Migration Example#

Here’s a full before/after example:

Standalone Library

from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="nvapi-xxx"
    )
]

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

# Access results
dataset = results.dataset
analysis = results.analysis

NMP Service

from nemo_microservices import NeMoMicroservices
from nemo_microservices.data_designer.client import NeMoDataDesignerClient
import data_designer.config as dd

# One-time setup: Configure inference
sdk = NeMoMicroservices(base_url="http://localhost:8080", workspace="default")

sdk.secrets.create(
    name="nvidia-api-key",
    data="nvapi-xxx",
    description="NVIDIA API key"
)

sdk.inference.providers.create(
    name="build-nvidia",
    description="NVIDIA Build API",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-api-key"
)

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/build-nvidia",  # Reference NMP provider
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration (identical!)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
client = NeMoDataDesignerClient(sdk=sdk)
preview = client.preview(config_builder, num_records=10)
job = client.create(config_builder, num_records=1000)

# Access results
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()

Benefits of Migration#

Migrating to the NMP service provides:

  • Scalability: Distributed execution for large datasets

  • Monitoring: Job tracking and status updates

  • Artifact Management: Centralized storage and versioning

  • Team Collaboration: Shared workspaces and resources

  • Security: Centralized secret management

  • Infrastructure: Managed inference and compute resources

Getting Help#